Fast XML Access in .NET Using the XmlTextReader Class
XmlTextReader, contained in the .NET Framework's System.Xml namespace, reads data from an XML file quickly without placing high demands on system resources. Use it to read data from an XML file and output it as HTML for display in a browser.
by Peter G. Aitken
What you need:
Some knowledge of the .NET Framework (particularly ASP.NET), as well as XML, HTML, and the C# language.
Microsoft's .NET Framework offers lots of goodies for developers. With the ever-increasing importance of XML, you would expect a full set of powerful XML tools to be included. And you'd be right; the .NET Framework does not disappoint. Grouped together in the System.Xml namespace are the following classes:
- XmlTextReader provides fast, forward-only, non-cached access to XML data. (Forward-only means you can read the XML file from beginning to end but cannot move backwards in the file.)
- XmlValidatingReader is used in conjunction with the XmlTextReader class to provide the capability for DTD, XDR, and XSD schema validation.
- XmlDocument implements random, cached access to XML data following both Level 1 and Level 2 of the W3C Document Object Model specification. Level 1 contains the most fundamental parts of the DOM, while Level 2 adds various enhancements, including support for namespaces and cascading style sheets.
- XmlTextWriter generates XML documents that conform to the W3C XML 1.0 specification.
This article describes the first of these classes, XmlTextReader, which is designed to read data from an XML file quickly without placing high demands on the system's resources (mainly memory and CPU time). The class works by stepping through the XML file one node at a time under the control of the parent program. At each node in the XML file, the program can determine the type of the node, its attributes and data (if any), and other node information. Based on this information, the program can process the node or ignore it, as dictated by the needs of the application. This is called a pull processing model because the program requests, or pulls, each node from the XML file and then deals with it (or not) as needed.
The XmlTextReader class can be compared to the Simple API for XML, or SAX, another technique for reading XML data that is very popular with programmers. XmlTextReader and SAX are similar in that they both provide fast access to XML data without placing heavy demands on resources. In contrast to XmlTextReader's pull model, however, SAX uses a push model, in which the XML processor uses events to inform the host program that node data is available, and the program can respond to these events (or not) as required. In other words, the data is pushed from the SAX processor to the host. Programmers will argue for hours over whether the pull model or the push model is superior, but the bottom line is that both work perfectly well. SAX is not supported by the .NET Framework, but you can use existing SAX tools, such as the MSXML parser, in your .NET programs.
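To make the pull/push distinction concrete, here is a minimal sketch. It is not SAX and not part of the article's listings: the class name PushVersusPull, both method names, and the callback parameter are invented for illustration. The first method is ordinary pull code, where the caller owns the loop. The second only imitates the push style by hiding the loop and handing each element name to a callback supplied by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PushVersusPull
{
    // Pull style: the caller drives the loop and asks the reader for each node.
    public static List<string> PullElementNames(string xml)
    {
        List<string> names = new List<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }

    // Push style (imitated): the loop is hidden from the caller, who only
    // supplies a callback that is invoked whenever an element starts.
    public static void PushElementNames(string xml, Action<string> onStartElement)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    onStartElement(reader.Name);
                }
            }
        }
    }

    public static void Main()
    {
        string xml = "<a><b/><c/></a>"; // hypothetical sample data

        foreach (string name in PullElementNames(xml))
        {
            Console.WriteLine("pulled: " + name);
        }

        PushElementNames(xml, delegate(string name)
        {
            Console.WriteLine("pushed: " + name);
        });
    }
}
```

Both methods visit the same nodes; the difference lies only in which side of the API owns the loop.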
The XmlTextReader class has several constructors that allow for various situations, such as reading from an existing stream or from a URL. Most often, perhaps, you will want to read XML from a file, and there's a constructor for this as well. Here's an example (all my code examples use the C# language, but should be easily translated into Visual Basic if that is your preference):
XmlTextReader myReader;
myReader = new XmlTextReader("c:\\data\\sales.xml");
Set up a repeating loop that calls the Read() method. This method returns true until the end of the file has been reached, then it returns false. In other words, the loop starts at the beginning of the file and reads all the nodes, one at a time, until the end of the file is reached:
while (myReader.Read()) {
...
// Process each node here.
...
}
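The loop can also be tried without a file on disk. The following is a minimal sketch rather than one of the article's listings: the ReadLoopDemo class, the CountNodes method, and the sample XML string are invented for illustration, and the reader is built from a StringReader using the XmlTextReader constructor that accepts a TextReader.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadLoopDemo
{
    // Hypothetical sample data; it stands in for a file such as sales.xml.
    private const string SampleXml =
        "<sales><region>East</region><region>West</region></sales>";

    // Counts how many nodes Read() visits before it returns false.
    public static int CountNodes(string xml)
    {
        int count = 0;
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                count++; // Process each node here.
            }
        }
        return count;
    }

    public static void Main()
    {
        // Four start tags and end tags for <region>, two text nodes,
        // plus the <sales> start and end tags: 8 nodes in total.
        Console.WriteLine(CountNodes(SampleXml));
    }
}
```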
After each successful call to Read(), the XmlTextReader instance contains information about the current node (the node just read from the file). You obtain this information from the XmlTextReader's members, as described in Table 1. You can tell the type of the current node by the NodeType property. Based on the type of node, your code can read its data, check if it has attributes, ignore it, or do whatever is appropriate to the needs of the program.
When using the NodeType property, it is important to understand how nodes relate to XML elements. For example, look at the following XML element:
<city>Chicago</city>
The XmlTextReader sees this element as three nodes, in the following order:
- The <city> tag is read as a type XmlNodeType.Element node. The name of the element, "city," is available in the XmlTextReader's Name property.
- The "Chicago" text data is read as a type XmlNodeType.Text node. The data "Chicago" is available in the XmlTextReader's Value property.
- The </city> tag is read as a type XmlNodeType.EndElement node. Again, the name of the element, "city," is available in the XmlTextReader's Name property.
These are three of the important node types. Other types are detailed in the .NET documentation.
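The three-node breakdown described above can be checked with a short sketch. The NodeTypeDemo class and its DescribeNodes method are hypothetical helpers written for this illustration; they record the NodeType, Name, and Value properties after every call to Read().

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeTypeDemo
{
    // Returns one "NodeType|Name|Value" string for each node the reader visits.
    public static List<string> DescribeNodes(string xml)
    {
        List<string> lines = new List<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                lines.Add(reader.NodeType + "|" + reader.Name + "|" + reader.Value);
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Prints:
        //   Element|city|
        //   Text||Chicago
        //   EndElement|city|
        foreach (string line in DescribeNodes("<city>Chicago</city>"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Note that Name is empty for the text node and Value is empty for the two tag nodes, which is why code usually checks NodeType before deciding which property to read.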
If the XmlTextReader encounters an error, such as an XML syntax violation, it throws an exception of type System.Xml.XmlException. Code that uses this class should always be protected (inside a try...catch block), as you can see in the demo program later.
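As a rough illustration of that protection, the sketch below reads a document to the end and reports the parser's complaint instead of letting the exception escape. The SafeReadDemo class and the IsWellFormed method are names made up for this example; LineNumber, LinePosition, and Message are properties of XmlException.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SafeReadDemo
{
    // Reads every node. Returns true if no syntax error was found;
    // otherwise returns false and describes the problem in 'error'.
    public static bool IsWellFormed(string xml, out string error)
    {
        error = "";
        try
        {
            using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
            {
                while (reader.Read())
                {
                    // Nothing to do; we only care whether Read() throws.
                }
            }
            return true;
        }
        catch (XmlException ex)
        {
            error = "Line " + ex.LineNumber + ", position " + ex.LinePosition
                    + ": " + ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        string error;

        // The end tag does not match the start tag, so Read() throws.
        if (!IsWellFormed("<city>Chicago</town>", out error))
        {
            Console.WriteLine(error);
        }
    }
}
```

Because the reader is forward-only, the exception surfaces only when Read() reaches the faulty part of the document, so any nodes processed before that point have already been handled.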
XmlTextReader in Action
This has been a rather simple introduction to the XmlTextReader class. It has many members that cannot be covered here and provides a great deal of flexibility when it comes to reading XML. Even so, I have presented enough information for you to create a program that demonstrates a task that is frequently needed in the real world, namely to read data from an XML file and output it as HTML for display in a browser.
Figure 1. Browser display of the HTML output created by the script.
This ASP.NET program (script) runs on the server to generate an HTML page that is returned to the browser. The script is shown in Listing 1 and the XML data file it works with appears in Listing 2. You can see that this XML file contains a list of contacts; the goal of the program is to display the list formatted for easy viewing. Figure 1 shows the output of the program viewed in Internet Explorer.
To run the program:
- Save Listing 1 as XmlTextReader.aspx and Listing 2 as XmlData.xml. (Or just download the code here.)
- Place both files in a virtual folder of a Web server with the .NET Framework installed.
- Open Internet Explorer and navigate to the .aspx file. On a local server, for example, the URL will be http://localhost/xmltextreader.aspx.
The bulk of the program's work is done in the XmlDisplay class, specifically in the ProcessXml() method. It reads the XML data one node at a time. For the elements of interest, the node name followed by a colon and the node data are written to the output, along with the appropriate HTML formatting tags. At this stage, the "output" consists of a StringBuilder object where the HTML text is stored temporarily.
The ProcessXml() method is called from the LoadDocument() method. This method performs the tasks of creating an XmlTextReader instance and loading the XML file before calling ProcessXml. It also deals with exceptions, creating an informative error message to be displayed in the browser. Finally the method returns a string containing either the generated HTML or, if an exception occurred, the error message.
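The division of labor between the two methods might look roughly like the sketch below. This is not Listing 1. It is a simplified stand-in written under several assumptions: the class is named XmlDisplaySketch, LoadDocument() takes a TextReader instead of opening a file so the example can run anywhere, every element with text content counts as an "element of interest", and the contact element names in Main are invented.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

public static class XmlDisplaySketch
{
    // Steps through the nodes and appends "<b>name:</b> value<br/>"
    // for every text node, using the name of the enclosing element.
    private static void ProcessXml(XmlTextReader reader, StringBuilder output)
    {
        string currentElement = "";
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    currentElement = reader.Name;
                    break;
                case XmlNodeType.Text:
                    output.Append("<b>")
                          .Append(WebUtility.HtmlEncode(currentElement))
                          .Append(":</b> ")
                          .Append(WebUtility.HtmlEncode(reader.Value))
                          .Append("<br/>");
                    break;
            }
        }
    }

    // Creates the reader, calls ProcessXml(), and returns either the
    // generated HTML or an error message if the XML could not be parsed.
    public static string LoadDocument(TextReader source)
    {
        StringBuilder output = new StringBuilder();
        try
        {
            using (XmlTextReader reader = new XmlTextReader(source))
            {
                // Skip whitespace-only nodes between elements.
                reader.WhitespaceHandling = WhitespaceHandling.None;
                ProcessXml(reader, output);
            }
            return output.ToString();
        }
        catch (XmlException ex)
        {
            return "Error reading XML: " + WebUtility.HtmlEncode(ex.Message);
        }
    }

    public static void Main()
    {
        // Hypothetical contact data; the real file is XmlData.xml.
        string xml = "<contacts><contact><name>Ann</name>"
                   + "<phone>555-0100</phone></contact></contacts>";
        Console.WriteLine(LoadDocument(new StringReader(xml)));
    }
}
```

Encoding the values with WebUtility.HtmlEncode is a choice made for this sketch, so that characters such as < in the data cannot be mistaken for markup in the generated page.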
The program execution begins in the Page_Load() procedure, which is executed automatically when the page is requested by a browser. Code here instantiates the XmlDisplay class and calls its LoadDocument() method. The return value, which will be the formatted HTML if everything works properly, is copied into a <div> tag on the page. The resulting HTML document is returned to the browser for display.
What about the XmlDocument class, the .NET Framework's other class for reading XML? It differs from the XmlTextReader class by creating a node tree of the entire XML document in memory. This permits random access to the XML data (as opposed to linear access with XmlTextReader) and also provides complete flexibility when it comes to modifying the data and structure of the XML file. In addition, XmlDocument permits XSLT transformations to be carried out. These extra capabilities come at the cost of slower operation and greater consumption of system resources.
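For comparison, here is a small sketch of the random access and in-place modification that XmlDocument allows. The XmlDocumentDemo class, its two methods, and the sample city data are invented for illustration; LoadXml(), GetElementsByTagName(), InnerText, and OuterXml are members of the System.Xml DOM classes.

```csharp
using System;
using System.Xml;

public static class XmlDocumentDemo
{
    // Random access: jump straight to the element at the given position
    // without stepping through the nodes that come before it.
    public static string CityAt(string xml, int index)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // builds the whole node tree in memory
        XmlNodeList cities = doc.GetElementsByTagName("city");
        return cities[index].InnerText;
    }

    // Modification: change the first city's text and return the updated XML.
    public static string RenameFirstCity(string xml, string newName)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        doc.GetElementsByTagName("city")[0].InnerText = newName;
        return doc.OuterXml;
    }

    public static void Main()
    {
        // Hypothetical sample data.
        string xml = "<cities><city>Chicago</city><city>Boston</city></cities>";

        Console.WriteLine(CityAt(xml, 1));                  // Boston
        Console.WriteLine(RenameFirstCity(xml, "Denver"));
        // <cities><city>Denver</city><city>Boston</city></cities>
    }
}
```

Neither operation is possible with XmlTextReader alone, since it cannot move backwards and never holds more than the current node.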
Peter G. Aitken has been writing about computers and programming for over 10 years, with some 30 books and hundreds of articles to his credit. Recent book titles include Developing Office Solutions With Office 2000 Components and VBA, Windows Script Host, and the soon to be published XML the Microsoft Way. He is a regular contributor to OfficePro magazine, and for several years was a contributing editor for Visual Developer Magazine where he wrote the popular Visual Basic column. Peter is the proprietor of PGA Consulting, providing custom application and Internet development to business, academia, and government since 1994. You can reach him at peter@pgacon.com.