
python - How to look up the allowed values for Beautiful Soup's prettify(encoding, formatter="minimal")

Date: 2022-08-24 18:47:42 · Author: 猪猪

Question

from bs4 import BeautifulSoup as bs
soup = bs(html)
html2 = soup.prettify('utf-8', formatter='minimal')

What are all the valid values for formatter, the second parameter of prettify()?

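All we know about is minimal. What else is there? We have no idea. How are you supposed to find these values? This is one of the worst things about Python: if a method's documentation doesn't spell the values out, how is a user supposed to discover them?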

Answers

Answer 1:

The official documentation already explains this in full:

Output formatters

The default is formatter='minimal'. Strings will only be processed enough to ensure that Beautiful Soup generates valid HTML/XML.

If you pass in formatter='html', Beautiful Soup will convert Unicode characters to HTML entities whenever possible.

If you pass in formatter=None, Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML.

Finally, if you pass in a function for formatter, Beautiful Soup will call that function once for every string and attribute value in the document. You can do whatever you want in this function.
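To make the difference between these options concrete, here is a stdlib-only approximation of how each one treats a single string. It is not Beautiful Soup's own code: the helper names substitute_minimal, substitute_html, format_string and the FORMATTERS table are hypothetical, and the "html" branch simply uses Python's html.entities.codepoint2name table as a stand-in for Beautiful Soup's entity list.

```python
from html.entities import codepoint2name

# Stdlib-only approximation of the formatter options described above.
# NOT Beautiful Soup's implementation; all names here are hypothetical.


def substitute_minimal(text):
    """'minimal': escape only what markup requires (&, <, >)."""
    # Replace & first so the entities added afterwards are not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def substitute_html(text):
    """'html': use a named HTML entity for every character that has one."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


# Name -> substitution function; None means "leave strings untouched".
FORMATTERS = {
    "minimal": substitute_minimal,
    "html": substitute_html,
    None: None,
}


def format_string(text, formatter="minimal"):
    """Apply a formatter given by name, as None, or as a callable."""
    if callable(formatter):
        return formatter(text)  # a function is called on each string
    substitute = FORMATTERS[formatter]  # unknown names raise KeyError
    return text if substitute is None else substitute(text)


print(format_string("café <b> & co"))          # café &lt;b&gt; &amp; co
print(format_string("café <b> & co", "html"))  # caf&eacute; &lt;b&gt; &amp; co
print(format_string("café <b> & co", None))    # café <b> & co
print(format_string("café", str.upper))        # CAFÉ
```

The None case shows why the documentation warns about invalid output: the bare `<` and `&` pass through unchanged.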

As for finding the code:

In [1]: import bs4
In [2]: bs4.BeautifulSoup.prettify.__code__
Out[2]: <code object prettify at 0x103f7f5d0, file '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bs4/element.py', line 1198>
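The same lookup can be done without IPython using the standard inspect module. In this sketch json.dumps stands in for bs4.BeautifulSoup.prettify so it runs without Beautiful Soup installed (an assumption for portability); the helper name locate is hypothetical. Swapping in the real method should give the file and line shown above, and inspect.getsource would then print prettify's source, where the accepted formatter values become visible.

```python
import inspect
import json


def locate(func):
    """Return (source file, first line) of a pure-Python function via __code__."""
    code = func.__code__
    return code.co_filename, code.co_firstlineno


# json.dumps stands in for bs4.BeautifulSoup.prettify so the sketch runs
# without bs4; replace it with the real method to inspect Beautiful Soup.
target = json.dumps

path, line = locate(target)
print(f"{path}:{line}")

# The signature lists parameter names and their default values.
print(inspect.signature(target))

# The source itself is where the accepted values are usually spelled out.
print(inspect.getsource(target)[:300])
```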


Answer 2:

Look at the code:

# The keys are the accepted string values of formatter, plus None.
HTML_FORMATTERS = {
    'html' : HTMLAwareEntitySubstitution.substitute_html,
    'minimal' : HTMLAwareEntitySubstitution.substitute_xml,
    None : None
}
XML_FORMATTERS = {
    'html' : EntitySubstitution.substitute_html,
    'minimal' : EntitySubstitution.substitute_xml,
    None : None
}
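Rather than reading the source, you can ask the installed Beautiful Soup for these keys directly. This is a sketch under assumptions: to my knowledge older releases keep the dictionaries quoted above as attributes of bs4.element.PageElement, while newer ones (around 4.7 and later) moved them to bs4.formatter as HTMLFormatter.REGISTRY and XMLFormatter.REGISTRY, and newer releases may register additional names. The helper formatter_names is hypothetical and returns None when bs4 is not installed.

```python
def formatter_names():
    """Return (HTML formatter names, XML formatter names) from the installed bs4.

    Returns None when Beautiful Soup is not installed.
    """
    try:
        import bs4  # noqa: F401  (only checks that bs4 is importable)
    except ImportError:
        return None
    try:
        # Assumed newer layout: registries live on the formatter classes.
        from bs4.formatter import HTMLFormatter, XMLFormatter
        return set(HTMLFormatter.REGISTRY), set(XMLFormatter.REGISTRY)
    except ImportError:
        # Assumed older layout: the dictionaries are class attributes.
        from bs4.element import PageElement
        return set(PageElement.HTML_FORMATTERS), set(PageElement.XML_FORMATTERS)


names = formatter_names()
if names is None:
    print("bs4 is not installed")
else:
    html_names, xml_names = names
    print("HTML formatters:", html_names)
    print("XML formatters:", xml_names)
```

Any key printed here, or any callable, can be passed as formatter to prettify().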


See https://imgur.com/gallery/VkNUv

(I don't know why the image won't display here.)
