XML: Leading the March to Web Services
http://www.devx.com/javaSR/articles/jones/jones-1.asp
The concept of Web services wasn't so much a good idea that people backed with new technology as the logical, inevitable result of the creation of a universal file format. Find out why XML is the genesis of just about everything there is to love about future Web-based business and commerce.
by A. Russell Jones, Executive Editor
Web services are getting a lot of attention lately, but it's worth noting that nothing about Web services would have been possible without the rapid evolution of the underlying enabling technology, XML: an evolution that led directly and inevitably to Web services.
It's amazing that the simple ideas behind XML (tagging content with names and arranging it hierarchically) have led to such rapid and resounding change in the risk-averse business community. And Web services are only one facet of XML's evolution.
XML is a revolution in data processing, not because it's a more efficient way to store content, but because it was the first technology that could adequately address the intimidating requirements of complex data. XML embodies several ideas that, up to now, were lacking or difficult to achieve in other data storage implementations:
- information and its relationship to other information is dynamic; therefore you must easily be able to update not only the data content but also the data relationships. XML is excellent at managing relationships among constantly-changing data.
- information is often irregular or hierarchical: much useful information relies either on its parent or child relationship with other information, or has no easily-defined relationship to any other part of an information store. XML handles both irregular and hierarchical data with ease.
- meta-information is needed to explain not only the data type, but also the context and (at its best) the meaning of information in the data store. Meta-information is intrinsic to the XML format.
These advantages drove the rapid adoption of XML, which in turn paved a direct path to the looming potential of Web services. As we enter what will undoubtedly be a watershed year for Web services adoption, the signposts along the path to its creation are noteworthy, both historically and as a guide to future change that will arise from the evolution of XML.
Step 1: Creation of a Universal File Format
Using a very simple text-based markup mechanism, XML lets programs determine what the data is and how individual data items relate to each other. It's such a simple idea that even humans can read XML with few problems. You can take almost any information store and see how you might improve it using XML. For example, most people need to keep lists of contact names and addresses. But such lists tend to grow quickly in complexity, from simple flat lists into intricate hierarchies that will accommodate grouping of contacts by organization, product, location, or any other higher-level organizational category.
Before XML, programmers who wanted to create hierarchical information had two realistic choices: Create a proprietary file format or use a relational database. Fortunately for every computer user, the days of proprietary text file formats are just about over. Relational databases are powerful, but they are usually expensive and difficult to distribute. And they are simply overkill for many small applications that need database-like searching and sorting capabilities.
Before XML, contact information might have been stored in a comma-delimited text file similar to this (I've included the field names to make the data marginally readable, even though many data file formats do not include that information):
--CATEGORIES--
"CategoryID","CategoryName","CategoryParentID"
1,"My Company",0
2,"My Department",1
3,"Customers",0
… more categories
--CONTACTS--
"ContactID","Lastname","Firstname","Email","Date","CategoryID"
1, "Doe","John","jdoe@mycompany.com","11/02/2001",1
2, "Someone","Bob","bsomeone@mycompany.com","11/02/2001",2
3, "Someoneelse","Bob","belse@mycompany.com","11/02/2001",2
4, "Doe","Jane","jndoe@mycompany.com","12/01/2000",1
5, "Jones","Fred","fjones@someplace.com","02/02/2002",3
... more contacts
Looking at the file format, it's difficult to tell exactly where or how contacts would appear in a list. In contrast, the XML representation looks wordy, but it's very easy to see that the contacts are arranged in categories containing contacts, that there are two top-level categories, and that the category named "My Company" contains a sub-category named "My Department."
<contacts>
  <categories>
    <category id="1" name="My Company">
      <contact id="1" lastname="Doe" firstname="John"
               email="jdoe@mycompany.com" date="11/02/2001" />
      <contact id="4" lastname="Doe" firstname="Jane"
               email="jndoe@mycompany.com" date="12/01/2000" />
      <category id="2" name="My Department">
        <contact id="2" lastname="Someone" firstname="Bob"
                 email="bsomeone@mycompany.com"
                 date="11/02/2001" />
        <contact id="3" lastname="Someoneelse" firstname="Bob"
                 email="belse@mycompany.com"
                 date="11/02/2001" />
      </category>
    </category>
    <category id="3" name="Customers">
      <contact id="5" lastname="Jones" firstname="Fred"
               email="fjones@someplace.com" date="02/02/2002" />
    </category>
  </categories>
</contacts>
When XML emerged, large companies and open source groups quickly invested in building robust XML parsers. Because the format rules are extremely simple, XML parsers are not only fairly fast, but any XML parser can read any XML document that conforms to the rules (is "well-formed"). Thus, for the first time in computing history, a program or individual can create a hierarchical well-formed document with completely customized meta-information markup, and another program (even one running on a different operating system and built with a different programming language) can read the tags and content from that document using any XML parser, without even knowing in advance what the file contains. Although that capability is useful in many situations, it's absolutely critical on the Web, and in particular to Web services, where companies cannot control the system choices of their information sources, business partners, and customers.
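As a rough illustration of that last point, the sketch below uses the XML parser in Python's standard library to walk an abbreviated copy of the contact document. The `outline` function is a hypothetical helper written for this example; it knows nothing about contacts or categories and relies only on the document being well-formed.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the contact document shown earlier.
CONTACTS_XML = """
<contacts>
  <categories>
    <category id="1" name="My Company">
      <contact id="1" lastname="Doe" firstname="John"
               email="jdoe@mycompany.com" date="11/02/2001" />
      <category id="2" name="My Department">
        <contact id="2" lastname="Someone" firstname="Bob"
                 email="bsomeone@mycompany.com" date="11/02/2001" />
      </category>
    </category>
    <category id="3" name="Customers">
      <contact id="5" lastname="Jones" firstname="Fred"
               email="fjones@someplace.com" date="02/02/2002" />
    </category>
  </categories>
</contacts>
"""


def outline(element, depth=0):
    """Return one line per element: indented tag name plus its attributes."""
    lines = ["  " * depth + element.tag + " " + str(dict(element.attrib))]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(CONTACTS_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))

# The nesting itself carries the relationships: sub-categories are simply
# <category> elements found inside another <category>.
nested = [c.get("name") for c in root.findall("./categories/category/category")]
print(nested)
```

The same function would print an outline of any other well-formed document, which is the property that matters when you cannot control what your partners' systems produce.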
The big advantages of XML documents are that the syntax and formatting rules are very simple, that XML is text-based, and that it is an accepted standard. But it was the ability for disparate systems and programs to read a common, universal file format that provided the first major step in the evolution of Web services.
Step 2: The Need for Data Types and Schema
Until recently, relational databases had one huge advantage over XML, which is that they store strongly typed data. If you look at the sample XML document shown above, you'll see that it consists of text items, numeric items, and date items, all stored as quoted strings. When you retrieve an item from a database, you almost always know its data type.
From its progenitor, the Standard Generalized Markup Language (SGML), XML inherited the concept of creating an associated document that defines the rules for tags in other documents, called a Document Type Definition, or DTD. The concept was powerful, but DTDs were difficult to parse (because they weren't XML documents) and had some other weaknesses. A W3C-recommended replacement technology, called XML Schema, maintained the DTD concept that one document can define the content of other documents but changed the defining document's format to XML. With XML Schema, a standard parser could read both the defining document, or schema, and the data document itself.
XML Schema documents define a common set of data type names, such as string, integer, and date, so you can create a schema to accompany your XML document that defines the data types that each tag or attribute can hold. By reading the schema first, parsers can then cast data values to the correct types as they read the data in associated documents.
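To make the idea concrete, here is a minimal sketch in Python, under several assumptions. The schema fragment is hypothetical (the article does not supply one), the `CASTS` table and both helper functions are invented for this example, and the code is a hand-rolled stand-in for what a schema-aware parser does, not a real XML Schema processor. Note also that the `xs:date` type expects ISO-style dates such as 2001-11-02, so the sample value is written that way rather than in the 11/02/2001 form used earlier.

```python
import xml.etree.ElementTree as ET
from datetime import date

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment describing the attributes of <contact>.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:attribute name="id" type="xs:integer" />
      <xs:attribute name="lastname" type="xs:string" />
      <xs:attribute name="date" type="xs:date" />
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

DATA = '<contact id="1" lastname="Doe" date="2001-11-02" />'

# Simplification: match on the literal "xs:" prefix used in SCHEMA instead of
# resolving namespace prefixes properly.
CASTS = {"xs:integer": int, "xs:string": str, "xs:date": date.fromisoformat}


def read_attribute_types(schema_text):
    """Read the schema with an ordinary XML parser: {attribute name: type name}."""
    root = ET.fromstring(schema_text)
    return {a.get("name"): a.get("type") for a in root.iter(XS + "attribute")}


def typed_attributes(element, types):
    """Cast each attribute string to the native type the schema declares."""
    return {name: CASTS[types[name]](value) for name, value in element.attrib.items()}


types = read_attribute_types(SCHEMA)
contact = typed_attributes(ET.fromstring(DATA), types)
print(contact)
```

Because the schema is itself XML, the same parser reads both documents; only the interpretation of the schema's contents is extra work.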
In addition, schema documents define boundaries for the order and arrangement of tags within conforming XML documents, the range or set of valid values for attributes, and the minimum/maximum occurrence of tags and attributes. Parsers that can read XML schema and compare the content of XML documents with a schema are called "validating" parsers; an XML document that conforms to the rules of its schema is called a "valid" document.
In other words, after you have the schema for an XML document you can ensure not only that the document is well-formed, but also that it is valid. That's extremely important, because it means that programs using validating XML parsers can pinpoint bad data in XML documents, without the program knowing anything specific about the document in advance. Validation capabilities can reduce or sometimes even eliminate the need to write custom validation code.
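The difference between well-formed and valid can be sketched in a few lines of Python. The parser in Python's standard library checks well-formedness only, so this example hand-codes a small, hypothetical `RULES` table standing in for what a schema would declare; a real validating parser would derive such rules from the schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for declarations a schema would contain.
RULES = {
    "contact": {
        "required": {"id", "lastname", "email"},
        "integer": {"id"},
    }
}


def is_integer(text):
    try:
        int(text)
        return True
    except ValueError:
        return False


def find_problems(root, rules):
    """Return a list of readable problems; an empty list means 'valid'."""
    problems = []
    for element in root.iter():
        rule = rules.get(element.tag)
        if rule is None:
            continue
        label = f"<{element.tag} id={element.get('id')!r}>"
        for name in sorted(rule["required"] - set(element.attrib)):
            problems.append(f"{label}: missing attribute '{name}'")
        for name in sorted(rule["integer"] & set(element.attrib)):
            if not is_integer(element.get(name)):
                problems.append(f"{label}: attribute '{name}' is not an integer")
    return problems


# Both documents are well-formed; only the first satisfies the rules.
GOOD = '<category><contact id="1" lastname="Doe" email="jdoe@mycompany.com" /></category>'
BAD = '<category><contact id="one" lastname="Doe" /></category>'

good_problems = find_problems(ET.fromstring(GOOD), RULES)
bad_problems = find_problems(ET.fromstring(BAD), RULES)
print(good_problems)
print(bad_problems)
```

The checking function is generic; everything specific to contacts lives in the rules, which is the division of labor that schemas make possible.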
Schemas were the second step in the evolution of Web services. Schemas are like a contract, because they both codify the format of documents that systems on either end of a transaction can use, and guarantee that the documents themselves adhere to a specific format and content. Perhaps most important, schemas provide a way for machines to perform codification and validation processes generically and without human intervention.
Step 3: Achieving Firewall Transparency
Once you have XML documents and XML schemas, it isn't a huge conceptual leap to imagine passing typed, structured information generically between disparate systems. So long as systems at both ends of the transaction can translate the data types specified by XML schema from a text representation into a native machine representation, there is nothing to constrain the sharing of data via XML. An XML document created on one system can be read accurately by a program running on a different system.
But passing XML across the Web has inherent inefficiencies because it must be translated at both ends of the connection. In contrast, other types of distributed applications, such as financial systems, object remoting, and terminal emulation programs, were initially conceived when bandwidth was very limited and therefore were designed to transmit information as efficiently as possible. Further, these systems generally transmit non-textual data in binary formats, which are more efficient because they require less translation, and because the formats typically contain less meta-information. But while a binary format is fine for trusted connections, most companies are unwilling to let binary data flow freely into and out of the company over port 80, the standard port used by HTTP. In contrast, it's relatively safe to let text information flow back and forth. And that's exactly how XML Web services behave. Data is first translated on the sender's side into a text representation, transmitted via HTTP, and then retranslated on the receiver's side back into its original form. Translating to and from text is safe, but inefficient.
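The size cost of the text representation is easy to see in a small Python sketch. The `reading` element and its attribute names are hypothetical, and the binary layout (a 4-byte integer followed by an 8-byte float) is just one arbitrary choice made for the comparison.

```python
import struct
import xml.etree.ElementTree as ET

reading_id, reading_value = 42, 3.5

# Binary: a 4-byte integer followed by an 8-byte float, with no field names.
binary = struct.pack("<id", reading_id, reading_value)

# Text: the same two values as an XML element, names included.
element = ET.Element("reading", id=str(reading_id), value=str(reading_value))
text = ET.tostring(element)

# Receiver side: each format is translated back into native values.
from_binary = struct.unpack("<id", binary)
parsed = ET.fromstring(text)
from_text = (int(parsed.get("id")), float(parsed.get("value")))

print(len(binary), len(text))
```

Both sides recover the same values, but the XML form is larger because it carries its own meta-information, and the receiver has to parse and cast text instead of copying bytes.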
To be truly useful, Web services need to handle not only pure data (numbers, text, and dates) but should also handle data bound up in objects. Serialization to XML solves this problem. In combination with a schema, you can take any object and serialize (save) it to XML, pass it to a remote machine and then deserialize it, effectively recreating the object remotely. Obviously, the programs involved need to understand how to use the objects, but object serialization via XML solves the problem of a common object representation very nicely, which is exactly the problem that Distributed COM (DCOM) and CORBA were intended to solve. In some respects, Web services are a step backward: DCOM and CORBA have capabilities that Web services and XML still do not have. But it's just a matter of time until Web services catch up.
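A minimal round trip might look like the following Python sketch. The `Contact` class and the `serialize` and `deserialize` helpers are hypothetical, and the declared field types stand in for the role a schema would play; the approach shown only works for simple types such as `int` and `str`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict, fields


@dataclass
class Contact:
    id: int
    lastname: str
    firstname: str
    email: str


def serialize(contact):
    """Turn the object into XML text, one attribute per field."""
    attributes = {name: str(value) for name, value in asdict(contact).items()}
    return ET.tostring(ET.Element("contact", attributes), encoding="unicode")


def deserialize(xml_text):
    """Rebuild the object, casting each string back to the field's declared type."""
    element = ET.fromstring(xml_text)
    values = {f.name: f.type(element.get(f.name)) for f in fields(Contact)}
    return Contact(**values)


original = Contact(1, "Doe", "John", "jdoe@mycompany.com")
wire = serialize(original)      # this text is what would cross the network
restored = deserialize(wire)    # the receiver recreates an equal object
print(wire)
```

Only the data travels; the receiving program still needs its own definition of `Contact` to do anything useful with the result.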
Despite lingering inefficiencies, XML's text format and its ability to represent serialized objects are the third step in the evolution of Web services. Using existing Web servers, companies can pass XML-formatted data and serialized objects through their firewalls using the ubiquitous HTTP over port 80, a port that is already both exposed (due to the Web) and protected (by firewalls, routers, proxies, and virus scanners) at most companies.
The Final Step: Invoking Remote Components Using SOAP
Taken all together, XML schema, XML documents, and object serialization to XML make the final step toward Web services rather obvious. If you could only get companies to agree on a schema, then one machine could take advantage of code running on any other machine accessible via the Web by sending an HTTP-encoded, XML-based message to a "listener" on the receiver's Web server. The listener would read the message, pass any enclosed message data to the appropriate program or component, and then encode the response in XML and return that to the calling machine, which would parse the return message and extract the response. The message format would need to be simple and flexible, as well as easy to create, parse, and understand; yet robust enough to be useful for arbitrary messages. Of course, that step was taken, and the result is the Simple Object Access Protocol (SOAP). You can find reams of information explaining the SOAP "wrappers" elsewhere, but you don't need to know the details to appreciate the fact that XML is the enabling technology.
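The request, listener, and response cycle can be sketched in Python without any network at all. Everything application-specific here is an assumption made for illustration: the `urn:example:contacts` namespace, the `GetEmail` method, and the helper functions. Only the envelope namespace is the one SOAP 1.1 defines, and in a real deployment the XML strings would travel as the bodies of HTTP requests and responses.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:contacts"                        # hypothetical application namespace


def envelope(body_child):
    """Wrap one element in a SOAP Envelope/Body and return it as text."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(env, f"{{{SOAP_NS}}}Body").append(body_child)
    return ET.tostring(env, encoding="unicode")


def build_request(method, **params):
    """Caller side: encode a method name and its parameters as XML."""
    call = ET.Element(f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return envelope(call)


# Hypothetical component that the listener exposes.
EMAILS = {1: "jdoe@mycompany.com", 4: "jndoe@mycompany.com"}


def get_email(contact_id):
    return EMAILS[int(contact_id)]


METHODS = {"GetEmail": get_email}


def listener(request_xml):
    """Receiver side: read the message, call the component, encode the reply."""
    call = ET.fromstring(request_xml).find(f"{{{SOAP_NS}}}Body")[0]
    method = call.tag.split("}", 1)[1]           # strip the namespace part
    args = {child.tag: child.text for child in call}
    result = METHODS[method](**args)
    response = ET.Element(f"{{{APP_NS}}}{method}Response")
    ET.SubElement(response, "return").text = str(result)
    return envelope(response)


def parse_response(response_xml):
    """Caller side: extract the return value from the reply."""
    body = ET.fromstring(response_xml).find(f"{{{SOAP_NS}}}Body")
    return body[0].findtext("return")


request = build_request("GetEmail", contact_id=1)
reply = parse_response(listener(request))
print(reply)
```

Neither side shares code or object formats with the other; the only agreement between them is the shape of the XML message.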
You can invoke the methods of remote components by sending a SOAP message and parsing the return message, but you must first know the location, the method names, parameter types, and return types of the methods you want to invoke. But how can you find available Web services that match your needs? And once located, how can you discover their method names and parameter types?
XML comes to the rescue again. Two other emerging XML-based standards, Web Services Description Language (WSDL) and Universal Description, Discovery and Integration (UDDI), provide Web service discovery and description services. UDDI is a specification for building XML-document based repositories for finding Web services. Companies store descriptions of their Web services in a repository, and potential customers find the Web services by searching the repository for specific business types, service types, or keywords. You can write a program that searches the service descriptions in a UDDI repository for the location of Web services that match your needs. Thus UDDI repositories solve the first problem: location. After you find a Web service, its WSDL document describes the public interfaces to the service: the method names, parameter types, and return types.
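Because a WSDL document is itself XML, a program can discover a service's interface with an ordinary parser. The Python sketch below reads a hypothetical, heavily abbreviated WSDL 1.1 document (a real one would also contain bindings and a service address) and extracts each operation's parameter and return types; the `describe` function is an invented helper, not part of any WSDL toolkit.

```python
import xml.etree.ElementTree as ET

WSDL = "{http://schemas.xmlsoap.org/wsdl/}"  # WSDL 1.1 namespace

# Hypothetical, heavily abbreviated service description.
WSDL_TEXT = """
<definitions name="ContactService"
             targetNamespace="urn:example:contacts"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example:contacts"
             xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <message name="GetEmailRequest">
    <part name="contact_id" type="xs:integer" />
  </message>
  <message name="GetEmailResponse">
    <part name="return" type="xs:string" />
  </message>
  <portType name="ContactPort">
    <operation name="GetEmail">
      <input message="tns:GetEmailRequest" />
      <output message="tns:GetEmailResponse" />
    </operation>
  </portType>
</definitions>
"""


def describe(wsdl_text):
    """Return {operation name: {"parameters": {...}, "returns": {...}}}."""
    root = ET.fromstring(wsdl_text)
    # Map each message name to its parts: {part name: declared type}.
    messages = {
        m.get("name"): {p.get("name"): p.get("type") for p in m.iter(WSDL + "part")}
        for m in root.iter(WSDL + "message")
    }

    def parts(operation, direction):
        reference = operation.find(WSDL + direction).get("message")
        return messages[reference.split(":", 1)[-1]]  # drop the "tns:" prefix

    return {
        op.get("name"): {
            "parameters": parts(op, "input"),
            "returns": parts(op, "output"),
        }
        for op in root.findall(f"{WSDL}portType/{WSDL}operation")
    }


interface = describe(WSDL_TEXT)
print(interface)
```

With that information in hand, a caller knows which method names exist and how to type the values it places in a SOAP message.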
SOAP obviously fills an immediate need. It has been interesting to see how, with few exceptions (and in contrast to the usual foot-dragging on standards), companies throughout the world stampeded toward agreement on SOAP with unprecedented speed. SOAP and XML, and thus Web services, have been adopted and put into production more quickly than any other computer technology, even more quickly than HTML and Java. And the stampede is just beginning. Over the next few years, although the term "Web services" will probably fade into the background as the use of XML wrappers (such as SOAP and XML-RPC) for calling remote procedures becomes a standard part of interaction between systems, the technology itself will not.
As you can see, XML is the real hero behind Web services. Without XML, Web services wouldn't even exist. It's amazing that the simple ideas behind XML (tagging content with names and arranging it hierarchically) have led to such rapid and resounding change in the risk-averse business community. And Web services are only one facet of XML's evolution.
A. Russell Jones can be reached at rjones@devx.com.