
Parsing HTML with BeautifulSoup in a Python Crawler

Views: 138 | Date: 2022-07-03 08:08:36

BeautifulSoup can parse both HTML and XML strings.

Example:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import re

# String to parse
html_doc = '''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href='http://example.com/elsie' class='sister' rel='external nofollow' id='link1'>Elsie</a>,
<a href='http://example.com/lacie' class='sister' rel='external nofollow' id='link2'>Lacie</a> and
<a href='http://example.com/tillie' class='sister' rel='external nofollow' id='link3'>Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
'''

# Create a BeautifulSoup object from the HTML string.
# html_doc is already a str, so from_encoding (which applies only to bytes input) is omitted.
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the first title tag
print(soup.title)
# Print the tag name of the first title tag
print(soup.title.name)
# Print the text contained in the first title tag
print(soup.title.string)
# Print the tag name of the first title tag's parent
print(soup.title.parent.name)
# Print the first p tag
print(soup.p)
# Print the class attribute of the first p tag
print(soup.p['class'])
# Print the href attribute of the first a tag
print(soup.a['href'])

# Tag attributes can be added, deleted, or modified.
# They are manipulated exactly like dictionary entries.
# Change the href attribute of the first a tag to http://www.baidu.com/
soup.a['href'] = 'http://www.baidu.com/'
# Add a name attribute to the first a tag
soup.a['name'] = '百度'
# Delete the class attribute of the first a tag
del soup.a['class']

# Print all child nodes of the first p tag
print(soup.p.contents)
# Print the first a tag
print(soup.a)
# Print all a tags as a list
print(soup.find_all('a'))
# Print the first tag whose id attribute equals link3
print(soup.find(id='link3'))
# Get all text content
print(soup.get_text())
# Print all attributes of the first a tag
print(soup.a.attrs)

for link in soup.find_all('a'):
    # Get the href attribute of each link
    print(link.get('href'))

# Loop over the child nodes of soup.p
for child in soup.p.children:
    print(child)

# Regex match: tags whose name contains "b"
for tag in soup.find_all(re.compile('b')):
    print(tag.name)
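The calls above can also be combined into filtered searches. The following is a minimal sketch, not part of the original example: `SAMPLE_HTML` and the helper `extract_links` are hypothetical names introduced for illustration, and the markup is a shortened stand-in modelled on the string above. It relies only on `find_all` keyword filters (`class_`, `href`) and `get_text(strip=True)`.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the example above.
SAMPLE_HTML = """
<html><body>
<p class="story">Three sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="/about" id="about">About</a>
</p>
</body></html>
"""


def extract_links(html, css_class=None, href_pattern=None):
    """Return (text, href) pairs for <a> tags matching the optional filters."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only tags that actually carry an href attribute.
    filters = {"href": True}
    if css_class is not None:
        # class_ (with trailing underscore) filters on the CSS class.
        filters["class_"] = css_class
    if href_pattern is not None:
        # A compiled regex is matched against the attribute value.
        filters["href"] = re.compile(href_pattern)
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", **filters)
    ]


if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML, css_class="sister"))
    print(extract_links(SAMPLE_HTML, href_pattern=r"^/"))
```

Returning plain tuples rather than Tag objects keeps the parsing step separate from whatever the crawler does with the links afterwards.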

Crawler design approach:

[Figure: crawler design diagram; image not preserved in this copy]
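One common way to organize a crawler around this kind of parsing, assumed here rather than taken from the article, is a URL queue, a downloader, and a parser. The sketch below follows that layout. The names `parse_page`, `crawl`, and `FAKE_SITE` are hypothetical, and the downloader is passed in as a `fetch` function so the example runs offline against an in-memory dictionary instead of the network.

```python
from collections import deque
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parse_page(base_url, html):
    """Parser: return the page title and the absolute URLs it links to."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    return title, links


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; fetch(url) returns HTML text or None on failure."""
    queue = deque([start_url])   # URLs waiting to be downloaded
    seen = {start_url}           # URLs already queued, to avoid repeats
    results = {}                 # url -> page title
    while queue and len(results) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue             # skip pages that could not be downloaded
        title, links = parse_page(url, html)
        results[url] = title
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results


# Hypothetical in-memory "site" standing in for real HTTP responses.
FAKE_SITE = {
    "http://example.com/": "<html><head><title>Home</title></head>"
                           "<body><a href='/a'>A</a> <a href='/b'>B</a></body></html>",
    "http://example.com/a": "<html><head><title>Page A</title></head>"
                            "<body><a href='/'>home</a></body></html>",
    "http://example.com/b": "<html><head><title>Page B</title></head>"
                            "<body><a href='/missing'>x</a></body></html>",
}

pages = crawl("http://example.com/", FAKE_SITE.get)

if __name__ == "__main__":
    for url, title in pages.items():
        print(url, title)
```

For a real site, `fetch` could wrap an HTTP client of your choice; keeping it as a parameter means the queue and parser logic can be tested without network access.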

Full documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
