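Python has no direct equivalent of PHP's strip_tags function. The standard library does offer HTMLParser, which is more flexible: you subclass it and decide what happens to each tag and each run of text.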
Method 1: HTMLParser
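from html.parser import HTMLParser

class StripTagsHTMLParser(HTMLParser):
    """Collects only the text content, discarding all tags."""

    def __init__(self):
        super().__init__()
        self.data = ""  # per-instance text buffer

    def handle_data(self, data):
        # Called for each run of text found between tags
        self.data += data

    def getData(self):
        return self.data

parser = StripTagsHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
parser.close()  # flush any text still buffered by the parser
data = parser.getData()
print(data)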
Output:
TestParse me!
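Because HTMLParser passes every run of text to handle_data, the class above also returns the contents of <script> and <style> elements as if they were page text. The sketch below shows one way to use the start/end-tag callbacks to skip those elements. The names TextExtractor, SKIP_TAGS and strip_tags are hypothetical, not part of the standard library, and the helper does not reproduce the allowed-tags argument of PHP's strip_tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}  # hypothetical choice of elements to ignore

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)


def strip_tags(html):
    """Hypothetical helper loosely modelled on PHP's strip_tags."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush buffered text
    return parser.get_text()


print(strip_tags("<p>Fish &amp; chips</p><script>alert(1)</script><style>p{}</style>"))
# Fish & chips
```

Collecting chunks in a list and joining once at the end avoids repeated string concatenation on large documents.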
Method 2: w3lib
from w3lib import html
doc = '<html><head><title>Test</title></head><body><h1>Parse me!</h1></body></html>'
result = html.remove_tags(doc)
print(result)
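Besides remove_tags, w3lib.html also provides remove_tags_with_content, remove_comments and remove_entities (newer w3lib releases deprecate remove_entities in favor of replace_entities).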
References
https://docs.python.org/3/library/html.parser.html
https://w3lib.readthedocs.io/en/latest/