Problem accessing attributes in BeautifulSoup

I'm running into a problem in Python 2.7. The code essentially boils down to:

from BeautifulSoup import BeautifulStoneSoup

markup = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulStoneSoup(markup)

for x in z.findAll('el'):
    if 'at' in x:              # attempt 1: always False
    # if hasattr(x, 'at'):     # attempt 2: always True
        print x['at']
    else:
        print 'nothing'

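I expected the first if statement to work (i.e., print "nothing" when at is absent), but it always prints "nothing" (i.e., the test is always False). The second if, on the other hand, is always True, so the code raises a KeyError when it tries to read at from the second <el> element, which of course does not have that attribute.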


4 replies

babsoft

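The in operator is meant for sequence and mapping types; what makes you think the object BeautifulSoup returns implements it correctly? According to the BeautifulSoup documentation, you should access attributes with the [] syntax. As for hasattr, I think you are confusing HTML/XML attributes with Python object attributes. hasattr deals with the latter, and as far as I know BeautifulSoup does not reflect the HTML/XML attributes it parses as attributes of its own objects.

P.S. Note that the Tag object in BeautifulSoup does implement __contains__, so maybe you are trying it on the wrong object? Can you show a complete but minimal example that demonstrates the problem? Running this: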

from BeautifulSoup import BeautifulSoup

markup = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulSoup(markup)

for x in z.findAll('el'):
    print type(x)
    print x['at']

I get:

<class \'BeautifulSoup.Tag\'>
some
<class \'BeautifulSoup.Tag\'>
Traceback (most recent call last):
  File "soup4.py", line 8, in <module>
    print x['at']
  File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'at'

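That is what I would expect: the first el has the at attribute and the second does not, so reading it raises a KeyError.

Update 2: the in operator (Tag.__contains__) looks at the tag's contents, not its attributes. To check whether an attribute exists, use has_key, e.g. x.has_key('at').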
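Both symptoms in the question come from two special methods on BeautifulSoup 3's Tag: __contains__ searches the tag's children (its contents), and __getattr__ treats an unknown name such as x.at as x.find('at'), which returns None instead of raising AttributeError, so hasattr is always True. The sketch below imitates just those two behaviours with a hypothetical MiniTag class (not part of BeautifulSoup) so they can be run without the library:

```python
class MiniTag:
    """Hypothetical stand-in for BeautifulSoup 3's Tag (illustration only)."""

    def __init__(self, name, attrs=None, contents=None):
        self.name = name
        self.attrs = dict(attrs or {})        # parsed XML/HTML attributes
        self.contents = list(contents or [])  # child nodes (strings or tags)

    def __getitem__(self, key):
        # tag['at'] reads an XML attribute; raises KeyError if it is missing
        return self.attrs[key]

    def __contains__(self, item):
        # 'at' in tag checks the children, NOT the attributes
        return item in self.contents

    def find(self, name):
        # return the first child tag with the given name, or None
        for child in self.contents:
            if isinstance(child, MiniTag) and child.name == name:
                return child
        return None

    def __getattr__(self, name):
        # tag.at is treated as tag.find('at'); a missing child gives None
        # rather than AttributeError, so hasattr() is always True
        if name.startswith('__'):
            raise AttributeError(name)
        return self.find(name)


first = MiniTag('el', {'at': 'some'}, ['ABC'])
second = MiniTag('el', {}, ['DEF'])

for x in (first, second):
    # columns: `in` test, hasattr test, real attribute-presence test
    print('at' in x, hasattr(x, 'at'), 'at' in x.attrs)
# False True True
# False True False
```

Only the last column answers the question that was actually being asked, which is why the answer points to [] and has_key rather than in or hasattr.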

漂汀拦

If your code is as simple as what you've shown, you can solve it compactly with:

for x in z.findAll('el'):
    print x.get('at', 'nothing')
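Since dict.has_key no longer exists in Python 3 and BeautifulSoup 4 deprecates has_key in favour of Tag.has_attr, it may help to see both the presence test and the get-with-default idiom outside BeautifulSoup. This sketch swaps in the standard library's xml.etree.ElementTree purely so it runs without extra packages; its Element.attrib is an ordinary dict and Element.get accepts a default. The wrapping <root> element is my addition, because ElementTree needs a single root:

```python
import xml.etree.ElementTree as ET

markup = '<el at="some">ABC</el><el>DEF</el>'
# ElementTree requires a single root element, so wrap the fragment
root = ET.fromstring('<root>' + markup + '</root>')

checked = []
defaulted = []
for x in root.iter('el'):
    # x.attrib is a plain dict, so `in` tests attribute presence
    # (the role has_key plays in BeautifulSoup 3)
    checked.append(x.attrib['at'] if 'at' in x.attrib else 'nothing')
    # Element.get(key, default) mirrors the one-liner above
    defaulted.append(x.get('at', 'nothing'))

print(checked)    # ['some', 'nothing']
print(defaulted)  # ['some', 'nothing']
```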

惭法搽

Since you're only scanning elements by tag name, a pyparsing solution may be more readable (and it doesn't use deprecated APIs such as has_key):

from pyparsing import makeXMLTags

# the input from the question
xmltext = '<el at="some">ABC</el><el>DEF</el>'

# makeXMLTags creates a pyparsing expression that matches tags with
# variations in whitespace, attributes, etc.
el, elEnd = makeXMLTags('el')

# scan the input text and work with elTags
for elTag, tagstart, tagend in el.scanString(xmltext):
    # a missing attribute comes back as an empty string, so this test is safe
    if elTag.at:
        print elTag.at
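If pyparsing is not available, a roughly similar scan can be done with the standard library's html.parser, which also walks the raw text and reports each start tag together with its attributes. This is a swapped-in technique, not pyparsing, and the ElScanner class name is hypothetical:

```python
from html.parser import HTMLParser


class ElScanner(HTMLParser):
    """Record the 'at' value of each <el> start tag that carries one."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == 'el':
            value = dict(attrs).get('at')
            if value is not None:
                self.found.append(value)


scanner = ElScanner()
scanner.feed('<el at="some">ABC</el><el>DEF</el>')
scanner.close()
print(scanner.found)  # ['some']
```

Like the pyparsing version, this never builds a tree; it reacts to tags as they appear in the text.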

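To refine this further, pyparsing lets you define a filtering parse action so that tags only match when a specific attribute value is present (or when the attribute is present with any value):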

# import parse action that will filter by attribute
from pyparsing import withAttribute

# only match el tags having the 'at' attribute, with any value
el.setParseAction(withAttribute(at=withAttribute.ANY_VALUE))

# now loop again, but no need to test for presence of \'at\'
# attribute - there will be no match if \'at\' is not present
for elTag, tagstart, tagend in el.scanString(xmltext):
    print elTag.at
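The standard library offers a comparable filter-at-match-time approach: ElementTree's limited XPath support accepts an [@attr] predicate (element has the attribute, any value) and [@attr='value'] (specific value). The sketch below uses that as a stand-in for withAttribute; it is a different library, and the <root> wrapper is again my addition:

```python
import xml.etree.ElementTree as ET

markup = '<el at="some">ABC</el><el>DEF</el>'
root = ET.fromstring('<root>' + markup + '</root>')

# [@at] keeps only <el> elements that carry an "at" attribute (any value),
# so the loop needs no presence test
matches = root.findall('.//el[@at]')
for el in matches:
    print(el.get('at'))  # prints: some

# [@at='some'] additionally requires a specific value
exact = root.findall(".//el[@at='some']")
```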

脖呐

I usually access attributes with the get() method:

link = soup.find('a')
# get() returns None instead of raising KeyError when the attribute is missing
href = link.get('href')
name = link.get('name')

if name:
    print 'anchor'
if href:
    print 'link'
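One caveat with testing the result of get() for truthiness, as above: an attribute that is present but empty (name="") is also falsy, assuming the parser reports it as an empty string, so it looks the same as a missing one. A sketch with the standard library's xml.etree.ElementTree (swapped in only because it runs without BeautifulSoup; its Element.get also returns None for a missing attribute) shows the difference an explicit is not None check makes. The sample anchor is hypothetical:

```python
import xml.etree.ElementTree as ET

# hypothetical sample: an anchor whose name attribute is present but empty
link = ET.fromstring('<a name="" href="http://example.com/">x</a>')

name = link.get('name')    # '' (present, empty)
title = link.get('title')  # None (absent)

print(bool(name), name is not None)    # False True
print(bool(title), title is not None)  # False False
```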

Tags: beautifulsoup, python, attributes
