
This article introduces four ways to parse XML in detail. XML has become a universal data exchange format, and its platform independence means it is needed in many situations.

Four ways to parse XML files in Java

1 Introduction

1) DOM (JAXP Crimson parser)

DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is based on this hierarchy of information, DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, it can be modified, so the application can change both data and structure. The application can also navigate up and down the tree at any time, rather than getting a single pass as with SAX. DOM is also much simpler to use.
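The navigation described above can be sketched with the DOM API that ships with the JDK. This is a minimal illustration, not part of the original benchmark code: the class name DomNavigationSketch and the method describeFirstNo are made up for the example, and the element names are borrowed from the sample file used later in this article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class DomNavigationSketch {

    // Parses the XML text into an in-memory tree, then moves down, up, and sideways.
    static String describeFirstNo(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Down: jump straight to the first <no> element anywhere in the tree.
            Element no = (Element) doc.getElementsByTagName("no").item(0);

            // Up: the tree is still in memory, so the parent is one call away.
            Node parent = no.getParentNode();

            // Sideways: skip text nodes (such as whitespace) until the next element sibling.
            Node next = no.getNextSibling();
            while (next != null && next.getNodeType() != Node.ELEMENT_NODE) {
                next = next.getNextSibling();
            }

            return parent.getNodeName() + "/" + no.getNodeName() + "=" + no.getTextContent()
                    + ", next=" + (next == null ? "none" : next.getNodeName());
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<result><value><no>a1234</no><addr>xx Road</addr></value></result>";
        System.out.println(describeFirstNo(xml)); // prints: value/no=a1234, next=addr
    }
}
```

Because the whole tree stays in memory, the three moves can be made in any order and repeated as often as needed.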

2) SAX

The advantages of SAX processing are essentially those of streaming. Analysis can begin immediately instead of waiting for all of the data to be processed. And because the application only examines the data as it is read, it does not need to store the data in memory, which is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing once a certain condition is met. Generally speaking, SAX is also much faster than its alternative, DOM.
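Stopping early is usually done by throwing an exception from a callback, since SAX has no dedicated "stop" call. The sketch below assumes that idiom; the class SaxEarlyStopSketch, the marker exception StopParsing, and the method firstNo are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class SaxEarlyStopSketch {

    // Hypothetical marker exception: thrown only to abort parsing once the answer is known.
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("stop requested by handler");
        }
    }

    // Returns the text of the first <no> element and abandons the rest of the document.
    static String firstNo(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean insideNo;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                insideNo = "no".equals(qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (insideNo) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if ("no".equals(qName)) {
                    throw new StopParsing(); // condition met: no need to read any further
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Aborted on purpose; the remaining elements were never reported to the handler.
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<result><value><no>a1234</no></value><value><no>b1234</no></value></result>";
        System.out.println(firstNo(xml)); // prints: a1234
    }
}
```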

DOM or SAX? For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision. DOM accesses an XML document through a tree structure, while SAX uses an event model.

The DOM parser converts the XML document into a tree containing its content, which can then be traversed. The advantage of the DOM model is that it is easy to program: developers only need to call the build instruction and then use the navigation APIs to reach the tree nodes they need. Elements in the tree are easy to add and modify. However, because the DOM parser must process the entire XML document, its performance and memory requirements are relatively high, especially for large XML files. Because of its traversal capabilities, the DOM parser is often used in services where XML documents change frequently.
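To make "add and modify" concrete, here is a small sketch that edits a tree and writes it back out as text, using only JDK classes. The class DomEditSketch and the method appendValue are hypothetical names; the upper-casing step is an arbitrary modification chosen just to show an in-place change.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class DomEditSketch {

    // Parses the XML, changes the tree in memory, and returns the re-serialized document.
    static String appendValue(String xml, String no, String addr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Modify: upper-case the text of every existing <no> element.
            NodeList numbers = doc.getElementsByTagName("no");
            for (int i = 0; i < numbers.getLength(); i++) {
                Node n = numbers.item(i);
                n.setTextContent(n.getTextContent().toUpperCase(Locale.ROOT));
            }

            // Add: build a new <value> subtree and attach it to the root element.
            Element value = doc.createElement("value");
            Element noElement = doc.createElement("no");
            noElement.setTextContent(no);
            Element addrElement = doc.createElement("addr");
            addrElement.setTextContent(addr);
            value.appendChild(noElement);
            value.appendChild(addrElement);
            doc.getDocumentElement().appendChild(value);

            // Serialize the modified tree with an identity transform.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not edit XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<result><value><no>a1234</no><addr>xx Road</addr></value></result>";
        System.out.println(appendValue(xml, "c1234", "xx Town"));
    }
}
```

The same edits would be awkward with SAX, which never holds more than the current event.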

The SAX parser uses an event-based model. It fires a series of events as it parses the XML document; when a given tag is found, it invokes a callback method to tell it that the tag has been found. SAX's memory requirements are usually low because it lets developers decide which tags to process. This flexibility pays off most when only part of the data in the document is needed. However, coding against a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.
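The following sketch shows both sides of that trade-off: the handler keeps only the text of one chosen tag, but it has to track its own state between callbacks to do so. SaxSelectiveSketch and collect are hypothetical names; the text is accumulated in a buffer because a parser may deliver one element's text in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class SaxSelectiveSketch {

    // Collects the text of every element named wantedTag and ignores all other content.
    static List<String> collect(String xml, String wantedTag) {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a wanted element.
            private StringBuilder current;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (wantedTag.equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current != null && wantedTag.equals(qName)) {
                    found.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<result><value><no>a1234</no><addr>xx Road</addr></value>"
                + "<value><no>b1234</no><addr>xx Village</addr></value></result>";
        System.out.println(collect(xml, "no")); // prints: [a1234, b1234]
    }
}
```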

3) JDOM

JDOM aims to be a Java-specific document model that simplifies interaction with XML and is faster than using DOM. As the first Java-specific model, JDOM has been heavily promoted. It is being considered for eventual use as a "Java Standard Extension" through Java Specification Request JSR-102. JDOM development has been under way since early 2000.

There are two main differences between JDOM and DOM. First, JDOM uses only concrete classes and no interfaces. This simplifies the API in some ways, but it also limits flexibility. Second, the API makes extensive use of the Collections classes, which makes it easier for Java developers who already know those classes.

The JDOM documentation states that its purpose is to "solve 80% (or more) of Java/XML problems with 20% (or less) of the effort" (with the 20% presumably referring to the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find the API much easier to understand than DOM. JDOM also checks program behavior fairly extensively to prevent users from doing anything that is meaningless in XML. However, it still requires a thorough understanding of XML to do anything beyond the basics (or even, in some cases, to understand the errors). Gaining that understanding is probably more worthwhile than learning either the DOM or the JDOM interface.

JDOM itself does not include a parser. It usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously constructed DOM representation as input). It includes converters that output the JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

4) dom4j

Although dom4j is now a completely independent development effort, it started out as a kind of intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streaming documents. It also offers the option of building a document representation that can be accessed in parallel through the dom4j API and the standard DOM interfaces. It has been under development since the second half of 2000.
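For readers who have not used XPath, the sketch below shows the kind of query that integrated XPath support makes possible. It deliberately does not use dom4j's own XPath API; as a stand-in that needs no extra library, it uses the JDK's javax.xml.xpath package. The class XPathSketch and the method evaluate are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical class name; this uses the JDK's XPath engine, not dom4j.
class XPathSketch {

    // Evaluates an XPath expression against the XML text and returns the result as a string.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<result><value><no>a1234</no><addr>xx Road</addr></value>"
                + "<value><no>b1234</no><addr>xx Village</addr></value></result>";
        // Select the address of the record whose <no> is b1234, in a single expression.
        System.out.println(evaluate(xml, "/result/value[no='b1234']/addr")); // prints: xx Village
    }
}
```

One expression replaces the loop and the comparisons that a hand-written tree walk would need.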

To support all of these features, dom4j uses interfaces and abstract base classes. dom4j makes heavy use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or a more direct coding style. The direct benefit is that, although dom4j pays the price of a more complex API, it offers much greater flexibility than JDOM.

While adding flexibility, XPath integration, and support for large documents, dom4j shares JDOM's goals: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, one that handles essentially all Java/XML problems. In pursuing that goal, it puts less emphasis than JDOM on preventing incorrect application behavior.

dom4j is a very good Java XML API that combines excellent performance, powerful features, and great ease of use. It is also open source software. More and more Java software now uses dom4j to read and write XML; notably, even Sun's JAXM uses dom4j.

2 Comparison

1) dom4j has the best performance; even Sun's JAXM uses dom4j. Many open source projects use dom4j extensively; for example, the well-known Hibernate uses it to read its XML configuration files. If portability is not a concern, use dom4j.

2) JDOM and DOM performed poorly in performance tests, running out of memory when tested with a 10 MB document. For small documents, DOM and JDOM are still worth considering. Although the JDOM developers have said they intend to focus on performance before the official release, from a performance point of view it has little to recommend it. DOM, on the other hand, remains a very good choice. DOM implementations are widely used across many programming languages. DOM is also the basis for many other XML-related standards, and because it is an official W3C recommendation (as opposed to a non-standard Java-based model), it may be required in some kinds of projects (such as using DOM in JavaScript).

3) SAX performs better, thanks to its event-driven parsing approach. A SAX parser inspects the incoming XML stream without loading it into memory (although, of course, parts of the document are temporarily held in memory while the stream is read).

3 Basic usage of the four XML parsing methods

xml
<?xml version="1.0" encoding="gb2312"?>
<result>
<value>
<no>a1234</no>
<addr>No. xx, Section xx, xx Road, xx Town, xx County, Sichuan Province</addr>
</value>
<value>
<no>b1234</no>
<addr>xx Group, xx Village, xx Town, xx City, Sichuan Province</addr>
</value>
</result>
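The listings that follow read a file named data_10k.xml, which the article does not supply. One way to produce a file of the same shape is sketched below. SampleDataGenerator and build are hypothetical names, the record contents are invented placeholders, and the record count of 100 is an arbitrary assumption rather than the size used in the original tests. The generated file declares UTF-8 instead of gb2312 so that the declaration matches the bytes actually written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original article's code.
class SampleDataGenerator {

    // Builds a document shaped like the sample above with the given number of <value> records.
    static String build(int records) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<result>\n");
        for (int i = 0; i < records; i++) {
            xml.append("<value>\n")
               .append("<no>a").append(1000 + i).append("</no>\n")
               .append("<addr>No. ").append(i).append(", xx Road, xx Town, Sichuan Province</addr>\n")
               .append("</value>\n");
        }
        xml.append("</result>\n");
        return xml.toString();
    }

    public static void main(String[] args) throws IOException {
        // 100 records is an arbitrary choice; raise it to test with larger files.
        Files.write(Paths.get("data_10k.xml"), build(100).getBytes(StandardCharsets.UTF_8));
    }
}
```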

1. DOM

MyXMLReader.java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MyXMLReader {
    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Load the whole document into an in-memory tree.
            Document doc = builder.parse(f);
            NodeList nl = doc.getElementsByTagName("value");
            for (int i = 0; i < nl.getLength(); i++) {
                // Look up <no> and <addr> inside the current <value> element.
                Element value = (Element) nl.item(i);
                System.out.print("License plate number:"
                        + value.getElementsByTagName("no").item(0).getFirstChild().getNodeValue());
                System.out.println("Owner address:"
                        + value.getElementsByTagName("addr").item(0).getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + "ms");
    }
}

2. SAX

MyXMLReader.java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyXMLReader extends DefaultHandler {
    // Stack of the elements that are currently open; the top is the innermost one.
    private final Deque<String> tags = new ArrayDeque<>();

    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            SAXParserFactory sf = SAXParserFactory.newInstance();
            SAXParser sp = sf.newSAXParser();
            MyXMLReader reader = new MyXMLReader();
            sp.parse(new File("data_10k.xml"), reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + "ms");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String tag = tags.peek();
        if ("no".equals(tag)) {
            System.out.print("License plate number:" + new String(ch, start, length));
        }
        if ("addr".equals(tag)) {
            System.out.println("Address:" + new String(ch, start, length));
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        tags.push(qName);
    }

    // Pop on close so that whitespace after </no> or </addr> is not printed as a value.
    @Override
    public void endElement(String uri, String localName, String qName) {
        tags.pop();
    }
}

3. JDOM

MyXMLReader.java
import java.io.File;
import java.util.List;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

public class MyXMLReader {
    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            SAXBuilder builder = new SAXBuilder();
            Document doc = builder.build(new File("data_10k.xml"));
            Element foo = doc.getRootElement();
            // The child list is treated as untyped here, so each item is cast to Element.
            List<?> allChildren = foo.getChildren();
            for (int i = 0; i < allChildren.size(); i++) {
                Element value = (Element) allChildren.get(i);
                System.out.print("License plate number:" + value.getChild("no").getText());
                System.out.println("Owner address:" + value.getChild("addr").getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + "ms");
    }
}

4. dom4j

MyXMLReader.java
import java.io.File;
import java.util.Iterator;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class MyXMLReader {
    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            SAXReader reader = new SAXReader();
            Document doc = reader.read(f);
            Element root = doc.getRootElement();
            // Iterate over the <value> children of the root element.
            for (Iterator<?> i = root.elementIterator("value"); i.hasNext();) {
                Element foo = (Element) i.next();
                System.out.print("License plate number:" + foo.elementText("no"));
                System.out.println("Owner address:" + foo.elementText("addr"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + "ms");
    }
}