A first look at tags in BeautifulSoup


Code:

import requests
from bs4 import BeautifulSoup

# Fetch the demo page and keep its HTML source as a string.
r = requests.get("https://python123.io/ws/demo.html")
demo = r.text
print(demo)
print("\n")

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(demo, "html.parser")
print("Walking down the tree:\n")
print(soup.head)
print("\n")
print(soup.head.contents)
print("\n")
print('body is:\n' + str(soup.body))
print('body_content is:' + str(soup.body.contents))
print('Number of children:')
print(len(soup.body.contents))
print("First tag inside body:")
# contents[0] is the newline text node after <body>, so the first tag sits at index 1.
print(soup.body.contents[1])

Output (excerpt):

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py


<p class="title"><b>The demo python introduces several python courses.</b></p>


<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>



Process finished with exit code 0
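
The first child of body is a newline text node rather than a tag, which is why the code above reads contents[1]. The following is a minimal sketch of how to see that and how to keep only the tags. It assumes an inline copy of the demo page (DEMO_HTML) instead of a network request, and the names children and tag_children are chosen only for illustration.

```python
from bs4 import BeautifulSoup, Tag

# Inline copy of the demo page so the sketch runs without network access.
DEMO_HTML = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
</body></html>"""

soup = BeautifulSoup(DEMO_HTML, "html.parser")

# .contents mixes text nodes (here, newlines) with real tags.
children = soup.body.contents
for index, child in enumerate(children):
    print(index, type(child).__name__, repr(str(child))[:40])

# Keep only the children that are tags.
tag_children = [child for child in children if isinstance(child, Tag)]
print(len(children), len(tag_children))
```

Filtering with isinstance avoids hard-coding the index 1, which would break if the whitespace in the page changed.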

First, turn the HTML into a "soup" by parsing it with:

soup = BeautifulSoup(demo, "html.parser")

Once parsed, the document looks like this:

<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
</body></html>
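
For a more readable view of the parsed tree, one option is the prettify() method, which puts each tag on its own line with indentation. This is a small sketch under the assumption that a shortened inline copy of the page (SAMPLE_HTML, a name made up for this example) is enough to show the effect.

```python
from bs4 import BeautifulSoup

# Shortened inline stand-in for the demo page (assumption for this sketch).
SAMPLE_HTML = (
    "<html><head><title>This is a python demo page</title></head>"
    '<body><p class="title"><b>The demo python introduces several python courses.</b></p>'
    "</body></html>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# prettify() returns a string with one tag per line, indented by depth.
pretty = soup.prettify()
print(pretty)
```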

Next, print the title tag and then the parent of title.

For example:

print(soup.title):
<title>This is a python demo page</title>

print(soup.title.parent):
<head><title>This is a python demo page</title></head>
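
Going one step further than .parent, the .parents attribute walks all the way up to the top of the tree. A rough sketch, using a trimmed inline document rather than the live page (the variable names here are illustrative only):

```python
from bs4 import BeautifulSoup

# Trimmed inline document (assumption: only the head matters for this sketch).
html = "<html><head><title>This is a python demo page</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

title = soup.title
print(title.parent.name)

# .parents yields every ancestor, ending with the BeautifulSoup object itself,
# whose name is the special string "[document]".
ancestor_names = [ancestor.name for ancestor in title.parents]
print(ancestor_names)

# The BeautifulSoup object is the root, so it has no parent.
print(soup.parent)
```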

Previous and next siblings of a tag:

import requests
from bs4 import BeautifulSoup

# Fetch and parse the same demo page as before.
r = requests.get("https://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")

# soup.a is the first <a> tag; its siblings can be text nodes as well as tags.
print('\nPrevious sibling of the first <a> tag:')
print(soup.a.previous_sibling)
print('\nNext sibling of the first <a> tag:')
print(soup.a.next_sibling)
print('\nSibling after the next sibling:')
print(soup.a.next_sibling.next_sibling)

Output:

D:\python_install\python.exe D:/pycharmworkspace/temp1/crawler_1.py

Previous sibling of the first <a> tag:
Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:


Next sibling of the first <a> tag:
and

Sibling after the next sibling:
<a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>

Process finished with exit code 0
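
As the output shows, the siblings of the first a tag are plain text, not tags, so reaching the second link takes two next_sibling steps. A possible alternative is find_next_sibling("a"), which skips the text nodes. The sketch below assumes an inline copy of the course paragraph (COURSE_HTML, a name invented for this example) in place of the live page.

```python
from bs4 import BeautifulSoup, NavigableString

# Inline copy of the "course" paragraph from the demo page.
COURSE_HTML = """<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>"""

soup = BeautifulSoup(COURSE_HTML, "html.parser")
first_link = soup.a

# The immediate next sibling is the text between the two links.
print(isinstance(first_link.next_sibling, NavigableString))
print(repr(str(first_link.next_sibling)))

# find_next_sibling("a") jumps straight to the next <a> tag.
second_link = first_link.find_next_sibling("a")
print(second_link["id"])

# next_siblings iterates over everything after the first link, text included.
for sibling in first_link.next_siblings:
    print(repr(str(sibling)))
```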

OK
