HTML Parsing in Python

微笑沉默, 2021-09-28

Tags: Technology

BeautifulSoup

Installation

pip install beautifulsoup4

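Usage

from bs4 import BeautifulSoup

# html is the HTML source as a string; naming the parser explicitly avoids a warning
soup = BeautifulSoup(html, 'html.parser')

ul = soup.find('ul', attrs={'class': 'county'})  # find the first <ul> whose class is "county"
ul.find('li')  # find the first <li> node under that <ul>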
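
The snippet above assumes an `html` variable already exists. As a minimal, self-contained sketch of the same lookup, the example below uses a small made-up HTML string (the markup and the list item texts are assumptions for illustration only) and also collects every `li` with `find_all`:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration only.
html = """
<ul class="county">
  <li>Area</li>
  <li>Population</li>
</ul>
"""

# Use Python's built-in parser so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# First <ul> whose class is "county".
ul = soup.find("ul", attrs={"class": "county"})

# First <li> under that <ul>, then the text of every <li> under it.
first_li = ul.find("li")
all_items = [li.get_text(strip=True) for li in ul.find_all("li")]

print(first_li.get_text())
print(all_items)
```

Note that `find` returns `None` when nothing matches, so on real pages it is worth checking the result before calling `ul.find('li')`.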

More

https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/

lxml

lxml generally parses faster than Beautiful Soup, since it is built on the C libraries libxml2 and libxslt.

Installation

https://lxml.de/installation.html

pip install lxml

The cssselect() method used below also requires the cssselect package:

pip install cssselect

Usage

Official API reference: https://lxml.de/api/index.html

import lxml.html
import lxml.cssselect

tree = lxml.html.fromstring(html)  # html is the HTML source as a string
result = lxml.html.tostring(tree, pretty_print=True, encoding='unicode')  # pretty-printed output as str
print(result)

td = tree.cssselect('tr#places_area__row > td.w2p_fw')[0]  # select the node with a CSS selector
print(td.text_content())
