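XML data is organized in units of tags (elements); the data is carried either in a tag's content or in its attributes.
Example: storing the data for one order in JSON and in XML
JSON version: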
{
"id": "oder028398923",
"create_time": "2022-6-26 22:34:56",
"pay_time": "2022-6-26 22:36:02",
"goods_list":[
{"goods_id": "9234233", "price": 123.00, "count": 2, "name": "XXX防晒霜"},
{"goods_id": "7281911", "price": 32.00, "count": 1, "name": "拖鞋"}
]
}
XML version:
<order id="oder028398923">
<create_time>2022-6-26 22:34:56</create_time>
<pay_time>2022-6-26 22:36:02</pay_time>
<goods_list>
<goods goods_id="9234233">
<price>123.00</price>
<count>2</count>
<name>XXX防晒霜</name>
</goods>
<goods goods_id="7281911">
<price>32.00</price>
<count>1</count>
<name>拖鞋</name>
</goods>
</goods_list>
</order>
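To make the comparison concrete, here is a minimal sketch that reads a shortened copy of the order above in both formats and computes the order total. It uses the standard-library `json` and `xml.etree.ElementTree` modules rather than lxml, and the helper names `total_from_json` and `total_from_xml` are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copies of the order shown above (only the fields needed for the total).
order_json = '''{
  "id": "oder028398923",
  "goods_list": [
    {"goods_id": "9234233", "price": 123.00, "count": 2},
    {"goods_id": "7281911", "price": 32.00, "count": 1}
  ]
}'''

order_xml = '''<order id="oder028398923">
  <goods_list>
    <goods goods_id="9234233"><price>123.00</price><count>2</count></goods>
    <goods goods_id="7281911"><price>32.00</price><count>1</count></goods>
  </goods_list>
</order>'''


def total_from_json(text):
    # JSON values arrive already typed (float, int, str).
    order = json.loads(text)
    return sum(g["price"] * g["count"] for g in order["goods_list"])


def total_from_xml(text):
    # XML tag content is always text, so numbers must be converted explicitly.
    order = ET.fromstring(text)
    total = 0.0
    for goods in order.iter("goods"):
        total += float(goods.findtext("price")) * int(goods.findtext("count"))
    return total


json_total = total_from_json(order_json)
xml_total = total_from_xml(order_xml)
order_id = ET.fromstring(order_xml).get("id")  # data stored as a tag attribute
print(json_total, xml_total, order_id)
```

The main practical difference visible here is typing: JSON distinguishes numbers from strings, while XML leaves the conversion to the reader of the data.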
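from lxml import etree
# Read the file and parse it; etree.XML returns the root element.
# If the file begins with an XML declaration that names an encoding,
# open it in 'rb' mode instead: lxml rejects str input in that case.
with open('files/超市.xml', encoding='utf-8') as f:
    root = etree.XML(f.read())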
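Path syntax:
1. Absolute path: /path (the path must be written starting from the root node)
2. Relative path: ./path (. is the current node, i.e. the node that xpath() is called on)
../path (.. is the parent of the current node)
Note: if a path starts with './', the './' can be omitted
3. Arbitrary path: //path (matches at any depth in the document)
Note: for absolute and arbitrary paths, neither how the path is written nor which tags are returned depends on the node that xpath() is called on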
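The three kinds of path can be tried on a small in-memory document. The sketch below assumes a made-up supermarket document (not the contents of `files/超市.xml`) and calls every query on a child node to show which paths depend on the calling node.

```python
from lxml import etree

# Hypothetical sample document, used only for this illustration.
root = etree.XML(
    "<supermarket>"
    "<name>Demo Mart</name>"
    "<all_goods>"
    "<goods><name>noodles</name></goods>"
    "<goods><name>sausage</name></goods>"
    "</all_goods>"
    "</supermarket>"
)
all_goods = root.xpath('./all_goods')[0]

# Absolute path: resolved from the document root, whichever node calls xpath().
abs_from_child = [e.text for e in all_goods.xpath('/supermarket/name')]

# Relative path: resolved from the calling node; the leading './' is optional.
rel_names = [e.text for e in all_goods.xpath('./goods/name')]
rel_short = [e.text for e in all_goods.xpath('goods/name')]

# '..' steps up to the parent of the calling node.
parent_tag = all_goods.xpath('..')[0].tag

# Arbitrary path: matches anywhere in the document, not only below all_goods.
any_names = [e.text for e in all_goods.xpath('//name')]

print(abs_from_child, rel_names, parent_tag, any_names)
```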
# Absolute paths
result = root.xpath('/supermarket/name')
print(result)
result = root.xpath('/supermarket/staffs/staff')
print(result)
# Exercise: using an absolute path, get the name tags of all goods
result = root.xpath('/supermarket/all_goods/goods/name')
print(result)
# Relative paths
all_goods = root.xpath('./all_goods')[0]
# result = all_goods.xpath('/supermarket/all_goods/goods/name')
# print(result)
result = all_goods.xpath('./goods/name')
print(result)
# Arbitrary paths
result = root.xpath('//name')
print(result)
result = all_goods.xpath('//name')
print(result)
from lxml import etree
with open('files/超市.xml', encoding='utf-8') as f:
    root = etree.XML(f.read())
# 01. Getting tag content and tag attributes
# 1) Tag content: <path to the tag>/text()
result = root.xpath('//goods/name/text()')
print(result) # ['泡面', '火腿肠', '矿泉水', '巧克力']
# 2) Attribute value: <path to the tag>/@attribute_name
result = root.xpath('//staff/@position')
print(result)
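A path that ends at a tag returns element objects, while a path that ends in `text()` or `@attribute` returns strings. The following sketch shows both side by side on an invented two-person staff list; the element-API calls (`.text`, `.get()`) are included only for comparison.

```python
from lxml import etree

# Hypothetical sample data for illustration.
root = etree.XML(
    '<staffs>'
    '<staff position="cashier">Ann</staff>'
    '<staff position="manager">Bob</staff>'
    '</staffs>'
)

elements = root.xpath('//staff')             # list of element objects
texts = root.xpath('//staff/text()')         # list of strings (tag content)
positions = root.xpath('//staff/@position')  # list of strings (attribute values)

# The same values read through the element API instead of XPath.
same_texts = [e.text for e in elements]
same_positions = [e.get('position') for e in elements]

print(texts, positions)
```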
# Position predicates: [N], [last()], [last()-N], [position() <op> N]
result = root.xpath('//goods[2]/name/text()')
print(result) # ['火腿肠']
result = root.xpath('//goods[last()]/name/text()')
print(result) # ['巧克力']
result = root.xpath('//goods[last()-1]/name/text()')
print(result) # ['矿泉水']
result = root.xpath('//goods[position()<=3]/name/text()')
print(result) # ['泡面', '火腿肠', '矿泉水']
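One detail worth checking by hand: a position predicate counts among siblings under the same parent, not across the whole document. The sketch below uses a made-up document with two `shelf` parents (an assumption; the supermarket file may have only one parent for all goods) to show the difference between `//goods[1]` and `(//goods)[1]`.

```python
from lxml import etree

# Hypothetical document with goods under two different parents.
root = etree.XML(
    '<store>'
    '<shelf>'
    '<goods><name>a</name></goods>'
    '<goods><name>b</name></goods>'
    '<goods><name>c</name></goods>'
    '</shelf>'
    '<shelf>'
    '<goods><name>d</name></goods>'
    '<goods><name>e</name></goods>'
    '</shelf>'
    '</store>'
)

# First goods child of each parent.
first_per_parent = root.xpath('//goods[1]/name/text()')
# First goods in the whole document: parenthesize before indexing.
first_overall = root.xpath('(//goods)[1]/name/text()')
# Last goods child of each parent.
last_per_parent = root.xpath('//goods[last()]/name/text()')
# First two goods children of each parent.
first_two = root.xpath('//goods[position()<=2]/name/text()')

print(first_per_parent, first_overall, last_per_parent, first_two)
```

When all goods share one parent, as the outputs shown above suggest, the two forms give the same result.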
result = root.xpath('//goods[@class="c3"]/name/text()')
print(result)
result = root.xpath('//goods[@id="d1"]/name/text()')
print(result)
result = root.xpath('//goods[@class]/name/text()')
print(result)
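The three attribute predicates above (`[@class="c3"]`, `[@id="d1"]`, `[@class]`) can be checked on a small invented document. The `not(@class)` query is an extra, standard XPath 1.0 form added here only to show the complement of `[@class]`.

```python
from lxml import etree

# Hypothetical sample data: attribute values are invented for illustration.
root = etree.XML(
    '<all_goods>'
    '<goods id="d1" class="c1"><name>a</name></goods>'
    '<goods class="c3"><name>b</name></goods>'
    '<goods><name>c</name></goods>'
    '</all_goods>'
)

# [@attr="value"]: the attribute must exist and equal the value.
by_class = root.xpath('//goods[@class="c3"]/name/text()')
by_id = root.xpath('//goods[@id="d1"]/name/text()')
# [@attr]: the attribute only has to exist, whatever its value.
has_class = root.xpath('//goods[@class]/name/text()')
# not(@attr): the attribute must be absent.
no_class = root.xpath('//goods[not(@class)]/name/text()')

print(by_class, by_id, has_class, no_class)
```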
# Child-tag predicates: [child>value], [child>=value], [child<value], [child<=value], [child=value]
result = root.xpath('//goods[price=1.5]/name/text()')
print(result)
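Child-tag predicates compare the text of a child tag with a value, converting it to a number when the value is numeric. The sketch below assumes each `goods` tag has a `price` child (the sample data is invented) and also shows that a misspelled child name produces an empty list rather than an error, which makes such typos easy to miss.

```python
from lxml import etree

# Hypothetical sample data for illustration.
root = etree.XML(
    '<all_goods>'
    '<goods><name>noodles</name><price>3.5</price></goods>'
    '<goods><name>sausage</name><price>1.5</price></goods>'
    '<goods><name>chocolate</name><price>11.5</price></goods>'
    '</all_goods>'
)

# The child's text "1.5" is converted to a number before comparing.
equal = root.xpath('//goods[price=1.5]/name/text()')
cheaper = root.xpath('//goods[price<5]/name/text()')
# A child name that does not exist matches nothing; no exception is raised.
misspelled = root.xpath('//goods[pirce=1.5]/name/text()')

print(equal, cheaper, misspelled)
```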
result = root.xpath('//goods/name1/text()|//goods/name2/text()')
print(result) # []
result = root.xpath('//goods/name2/text()|//goods/name/text()')
print(result) # ['泡面', '火腿肠', '矿泉水', '巧克力']
result = root.xpath('//goods/name/text()|//staffs/staff/text()')
print(result)
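The `|` operator merges the results of two paths into one list, and a path that matches nothing simply contributes nothing. A minimal sketch on invented data follows. Swapping the operands does not change which nodes are returned; in practice lxml appears to return them in document order rather than operand order, but the sketch does not rely on that.

```python
from lxml import etree

# Hypothetical sample data for illustration.
root = etree.XML(
    '<supermarket>'
    '<staffs><staff>Ann</staff></staffs>'
    '<all_goods><goods><name>noodles</name></goods></all_goods>'
    '</supermarket>'
)

goods_then_staff = root.xpath('//goods/name/text()|//staffs/staff/text()')
staff_then_goods = root.xpath('//staffs/staff/text()|//goods/name/text()')
# Neither name1 nor name2 exists, so the union is empty.
nothing = root.xpath('//goods/name1/text()|//goods/name2/text()')
# One side empty: the union is just the other side.
one_side = root.xpath('//goods/name2/text()|//goods/name/text()')

print(goods_then_staff, staff_then_goods, nothing, one_side)
```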
result = root.xpath('//goods/name/text()')
print(result)
result = root.xpath('//goods/*/text()')
print(result) # ['泡面', '3.5', '120', '火腿肠', '1.5', '305', '矿泉水', '1.5', '1200', '巧克力', '11.5', '50']
result = root.xpath('//*[@class="c2"]')
print(result)
# All attribute values of the first staff tag
result = root.xpath('//staff[1]/@*')
print(result)
# All tags that have any attribute whose value is "c2"
result = root.xpath('//*[@*="c2"]')
print(result)
# The tag whose id is 'd1'
result = root.xpath('//*[@id="d1"]')
print(result)
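The wildcard `*` stands for any tag name, and `@*` for any attribute name. The sketch below tries the four wildcard forms used above on a small invented document; the attribute called `tag` is made up to show that `@*="c2"` matches regardless of the attribute's name.

```python
from lxml import etree

# Hypothetical sample data for illustration.
root = etree.XML(
    '<supermarket>'
    '<staff id="d1" class="c2">Ann</staff>'
    '<goods tag="c2"><name>noodles</name><price>3.5</price></goods>'
    '</supermarket>'
)

# '*' as a path step: every child tag of goods, whatever its name.
child_texts = root.xpath('//goods/*/text()')
# '@*': every attribute value of the first staff tag.
staff_attrs = root.xpath('//staff[1]/@*')
# '//*[@*="c2"]': any tag with any attribute equal to "c2".
any_attr_c2 = [e.tag for e in root.xpath('//*[@*="c2"]')]
# '//*[@id="d1"]': any tag whose id attribute is "d1".
by_id = [e.tag for e in root.xpath('//*[@id="d1"]')]

print(child_texts, staff_attrs, any_attr_c2, by_id)
```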