
[Introduction to basic XML concepts]

XML stands for Extensible Markup Language.

XML is designed to transfer and store data.

Concept one:

<foo>  # start tag of the foo element
</foo>  # end tag of the foo element
 # Note: each start tag must have a corresponding end tag to close it.
 # An element with no content can also be written as <foo />

Concept two:

<foo>  # elements can be nested to arbitrary depth
 <bar></bar>  # the bar element is a child of the foo element
</foo>  # end tag of the parent element foo

Concept three:

<foo lang="en">  # The foo element has a lang attribute whose value is en; each attribute is a name-value pair, like an entry in a Python dictionary.
 <bar id="001" lang="ch"></bar>  # The bar element has a lang attribute with the value ch and an id attribute with the value 001; attribute values are placed in "" or ''.
</foo>  # The lang attribute of the bar element does not conflict with that of the foo element; each element has its own set of attributes.

Concept four:

<title>learning python</title>  # an element can have text content
 # Note: an element with no text content and no child elements is an empty element.

Concept five:

<info>  # the info element is the root node
 <list>a</list>  # the list elements are its child nodes
 <list>b</list>
 <list>c</list>
</info>

Concept six:

<feed xmlns="http://www.w3.org/2005/Atom">  # A default namespace is defined by declaring xmlns; the feed element is in the http://www.w3.org/2005/Atom namespace.
 <title>dive into mark</title>  # The title element is in that namespace too.
</feed>

A namespace declaration affects not only the element that declares it but also all of its child elements.

You can also declare a namespace with xmlns:prefix, which binds it to the name prefix. Each element in that namespace must then be written explicitly with the prefix.

<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">  # feed belongs to the namespace bound to the atom prefix
 <atom:title>dive into mark</atom:title>  # the title element also belongs to that namespace
</atom:feed>  # xmlns stands for XML namespace
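
To see that the default-namespace and prefixed forms describe the same elements, here is a minimal sketch using ElementTree from the standard library. It assumes the official Atom namespace URI, http://www.w3.org/2005/Atom (namespace URIs are case-sensitive). The variable names and the "atom" key in the prefix map are illustrative choices.

```python
import xml.etree.ElementTree as et

ATOM_NS = "http://www.w3.org/2005/Atom"

# The same document written with a default namespace and with a prefix.
default_ns_doc = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>dive into mark</title>'
    '</feed>'
)
prefixed_doc = (
    '<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">'
    '<atom:title>dive into mark</atom:title>'
    '</atom:feed>'
)

default_root = et.fromstring(default_ns_doc)
prefixed_root = et.fromstring(prefixed_doc)

# ElementTree expands every tag to "{namespace-uri}local-name",
# so both spellings produce identical tag names.
print(default_root.tag)
print(prefixed_root.tag)

# A prefix map lets find() use a short prefix instead of the full URI.
# The prefix used here does not have to match the one in the document.
title = default_root.find("atom:title", {"atom": ATOM_NS})
print(title.text)
```

Because the child title element inherits the default namespace, searching for a plain "title" would find nothing; the namespace has to be part of the query.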

[Several ways to parse XML]

The common XML programming interfaces are DOM and SAX. The two handle XML files differently, so they suit different situations.

Python offers three ways to parse XML: SAX, DOM, and ElementTree.

1. SAX (Simple API for XML)

The Python standard library includes a SAX parser. SAX is an event-driven API: it processes an XML file by triggering events during parsing and calling user-defined callback functions. Parsing an XML document with SAX involves two parts: a parser and an event handler.

The parser reads the XML document and sends events, such as element start and element end events, to the event handler. The event handler is responsible for responding to those events.

Advantages: SAX reads the XML file as a stream, so it is fast and uses little memory.

Disadvantages: users need to implement the callback functions (handlers) themselves.
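
The parser and handler split described above can be sketched with the standard library's xml.sax module. The NameHandler class and the inline XML text are hypothetical, made up for illustration; only ContentHandler and parseString come from the library.

```python
import xml.sax


class NameHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <name> element."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.in_name = False  # state we must track ourselves
        self.names = []

    def startElement(self, tag, attrs):
        # Called by the parser for every start tag.
        if tag == "name":
            self.in_name = True
            self.names.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so append rather than assign.
        if self.in_name:
            self.names[-1] += content

    def endElement(self, tag):
        # Called by the parser for every end tag.
        if tag == "name":
            self.in_name = False


xml_text = (
    b"<info>"
    b"<list><name>python check</name></list>"
    b"<list><name>python learn</name></list>"
    b"</info>"
)

handler = NameHandler()
xml.sax.parseString(xml_text, handler)  # the parser drives the handler
print(handler.names)
```

The in_name flag shows the cost of the streaming model: the handler never sees the whole document, so it has to remember where it is.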

2. DOM (Document Object Model)

DOM parses the XML data into a tree in memory, and you manipulate the XML through operations on that tree. A DOM parser reads the entire document at once and stores all of its elements in a tree structure in memory. You can then use the functions DOM provides to read or modify the content and structure of the document, and write the modified content back to an XML file.

Advantages: you do not need to track state yourself, because every node knows its parent and its children.

Disadvantages: DOM has to map the whole XML document to a tree in memory, so it is relatively slow, uses more memory, and is more cumbersome to use.
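
As a rough illustration of the tree model, the sketch below uses xml.dom.minidom, one DOM implementation in the standard library. The inline XML text and the replacement title "python cookbook" are made up for the example.

```python
from xml.dom import minidom

xml_text = (
    "<info>"
    "<list><name>python check</name></list>"
    "<list><name>python learn</name></list>"
    "</info>"
)

# The whole document is parsed into an in-memory tree at once.
dom = minidom.parseString(xml_text)

names = dom.getElementsByTagName("name")
first_name = names[0]

# Every node knows its parent and its children, so no state tracking is needed.
print(first_name.firstChild.data)    # text of the first <name> element
print(first_name.parentNode.tagName)  # name of its parent element

# Modify the tree in place, then serialize it back to XML text.
first_name.firstChild.data = "python cookbook"
modified_xml = dom.documentElement.toxml()
print(modified_xml)
```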

3. ElementTree

ElementTree is like a lightweight DOM with a convenient, friendly API. Code written with it is easy to work with, and it is fast and uses little memory.

In comparison, the third method is both easy and fast, so it is the one we use most. Here is how to use ElementTree to parse XML:

[Parsing with ElementTree]

Two implementations

ElementTree was designed for processing XML, and it has two implementations in the Python standard library.

One is the pure Python implementation: xml.etree.ElementTree

The other is a faster C implementation: xml.etree.cElementTree

Prefer the C implementation where it exists, because it is faster and uses less memory. Since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically, and xml.etree.cElementTree was removed in Python 3.9, so the following fallback import works on both old and new versions:

try:
    import xml.etree.cElementTree as et
except ImportError:
    import xml.etree.ElementTree as et

Common attributes

# To get an element's attribute values, use the attrib attribute (a dictionary).
# To get an element's text content, use the text attribute.
# To get an element's name, use the tag attribute.
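
A small sketch of these three attributes on a single element. The bar element and its values are made up for illustration; get() is the Element method for reading one attribute with an optional default.

```python
import xml.etree.ElementTree as et

node = et.fromstring('<bar id="001" lang="ch">hello</bar>')

print(node.tag)     # element name
print(node.text)    # text content
print(node.attrib)  # all attributes, as a dictionary

# get() reads one attribute and returns a default when it is missing,
# which avoids a KeyError from indexing attrib directly.
lang = node.get("lang")
missing = node.get("color", "n/a")
print(lang, missing)
```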

Sample XML (book.xml)

<?xml version="1.0" encoding="utf-8"?>
<info>
  <intro>book message</intro>
  <list>
    <head>bookone</head>
    <name>python check</name>
    <number>001</number>
    <page>200</page>
  </list>
  <list>
    <head>booktwo</head>
    <name>python learn</name>
    <number>002</number>
    <page>300</page>
  </list>
</info>

############

##Load xml

############

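Method one: load from a file

root = et.parse("book.xml").getroot()  # parse() returns an ElementTree; getroot() gives its root element

Method two: load from a string

root = et.fromstring(xmltext)  # fromstring() returns the root element directly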
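
The two loaders return different types, which is easy to miss. The sketch below uses io.StringIO as a stand-in for a file on disk, since parse() accepts a file object as well as a file name. The short XML text is made up for the example.

```python
import io
import xml.etree.ElementTree as et

xml_text = "<info><intro>book message</intro></info>"

# parse() takes a file name or file object and returns an ElementTree wrapper.
tree = et.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

# fromstring() takes XML text and returns the root Element directly.
root_from_string = et.fromstring(xml_text)

print(type(tree).__name__)
print(root_from_file.tag, root_from_string.tag)
```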

############

##Get node

############

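Method 1: Get the specified nodes -> iter() method (named getiterator() in older versions, which was removed in Python 3.9)

book_nodes = root.iter("list")  # iterates over every list element in the whole tree

Method 2: Get the specified nodes -> findall() method

book_nodes = root.findall("list")  # list of all list elements that are direct children of root

Method 3: Get the specified node -> find() method

first_book = root.find("list")  # first matching direct child, or None if there is no match

Method 4: Get the child nodes -> iterate over or index the element (getchildren() was removed in Python 3.9)

for node in book_nodes:
    first_child = node[0]  # first child of each list element
    print(first_child.tag + " => " + first_child.text)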
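
The difference between these lookups is mainly how deep they search. A minimal sketch on a shortened version of the sample document (the inline XML text is an abbreviation made for this example):

```python
import xml.etree.ElementTree as et

xml_text = (
    "<info>"
    "<intro>book message</intro>"
    "<list><head>bookone</head><name>python check</name></list>"
    "<list><head>booktwo</head><name>python learn</name></list>"
    "</info>"
)
root = et.fromstring(xml_text)

# iter() walks the whole subtree, so it finds the nested head elements.
heads_anywhere = list(root.iter("head"))

# findall() with a bare tag name only looks at direct children of root.
heads_direct = root.findall("head")
books = root.findall("list")

# find() returns the first match, or None when nothing matches.
first_book = root.find("list")
no_match = root.find("nothing")

# A path such as "list/name" reaches one level further down.
first_name = root.findtext("list/name")

print(len(heads_anywhere), len(heads_direct), len(books))
print(first_book[0].text, no_match, first_name)
```

Checking the result of find() against None before using it avoids an AttributeError when the element is absent.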

############

##Example 01

############

#coding=utf-8
try:  # import the module, preferring the C implementation where it exists
    import xml.etree.cElementTree as et
except ImportError:
    import xml.etree.ElementTree as et

root = et.parse("book.xml").getroot()  # parse the XML file and get the root element
books = root.findall("list")  # find all list elements directly under the root
for book_list in books:  # iterate over the search results
    print("=" * 30)  # separator line
    for book in book_list:  # iterate over each child node and print its attributes and text
        if "id" in book.attrib:  # only print the id attribute when it is present
            print("id: " + book.attrib["id"])
        print(book.tag + "=>" + book.text)  # print the tag and text content
print("=" * 30)

Output results:

==============================
head=>bookone
name=>python check
number=>001
page=>200
==============================
head=>booktwo
name=>python learn
number=>002
page=>300
==============================
