Thursday, 20 November 2008
 
 

XML - XML is a lot like the ubiquitous plastic containers of Tupperware®
 

XML is a lot like the ubiquitous plastic containers of Tupperware®. There is really no better way to keep your food fresh than with those colorful, airtight little boxes. They come in different sizes and shapes so you can choose the one that fits best. They lock tight so you know nothing is leaking out and germs can't get in. You can tell items apart based on the container's color, or even scribble on it with magic marker. They're stackable and can be nested in larger containers (in case you want to take them with you on a picnic). Now, if you think of information as a precious commodity like food, then you can see the need for a containment system like Tupperware®.

XML contains, shapes, labels, structures, and protects information. It does this with symbols embedded in the text, called markup. Markup enhances the meaning of information in certain ways, identifying the parts and how they relate to each other. For example, when you read a newspaper, you can tell articles apart by their spacing and position on the page and the use of different fonts for titles and headings. Markup works in a similar way, except that instead of spaces and lines, it uses symbols.

Markup is important to electronic documents because they are processed by computer programs. If a document has no labels or boundaries, then a program will not know how to distinguish a piece of text from any other piece. Essentially, the program would have to work with the entire document as a unit, severely limiting the interesting things you can do with the content. A newspaper with no space between articles and only one text style would be a huge, uninteresting blob of text. You could probably figure out where one article ends and another starts, but it would be a lot of work. A computer program wouldn't be able to do even that, since it lacks all but the most rudimentary pattern-matching skills.

XML's markup divides a document into separate information containers called elements. Like Tupperware® containers, they seal up the data completely, label it, and provide a convenient package for computer processing. Like boxes, elements nest inside other elements. One big element may contain a whole bunch of elements, which in turn contain other elements, and so on down to the data. This creates an unambiguous hierarchical structure that preserves all kinds of ancillary information: sequence, ownership, position, description, association. An XML document consists of one outermost element that contains all the other elements, plus some optional administrative information at the top.
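The nesting described above can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. The element names (basket, box, item) are invented here to echo the container analogy; they are not part of any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, chosen only to mirror the container analogy.
basket = ET.Element("basket")                      # the one outermost element
box = ET.SubElement(basket, "box", label="lunch")  # a container nested inside it
ET.SubElement(box, "item").text = "sandwich"       # data sits at the bottom
ET.SubElement(box, "item").text = "apple"


def depth(element):
    """Return the number of nesting levels at or below this element."""
    return 1 + max((depth(child) for child in element), default=0)


serialized = ET.tostring(basket, encoding="unicode")
print(serialized)
print(depth(basket))
```

The serialized result is a single basket element wrapping everything else, which is the "one outermost element" rule in miniature.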

 

Example of a typical XML document containing a short telegram:

<?xml version="1.0"?>
<!DOCTYPE telegram SYSTEM "/xml-resources/dtds/telegram.dtd">
<telegram pri="important">
  <to>Sarah Bellum</to>
  <from>Colonel Timeslip</from>
  <subject>Robot-sitting instructions</subject>
  <graphic fileref="figs/me.eps"/>
  <message>Thanks for watching my robot pal
    <name>Zonky</name> while I'm away.
    He needs to be recharged <emphasis>twice a
    day</emphasis> and if he starts to get cranky,
    give him a quart of oil. I'll be back soon,
    after I've tracked down that evil
    mastermind <villain>Dr. Indigo Riceway</villain>.
  </message>
</telegram>

Can you tell the difference between the markup and the data? The markup symbols are delineated by angle brackets (<>). <to> and </villain> are two such symbols, called tags. The data, or content, fills the space between these tags. As you get used to looking at XML, you'll use the tags as signposts to navigate visually through documents.
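A program can make the same distinction between markup and data. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser and a shortened copy of the telegram (the declaration, DOCTYPE, graphic, and message are left out to keep it brief).

```python
import xml.etree.ElementTree as ET

# A shortened copy of the telegram from the example above.
telegram_xml = """<telegram pri="important">
  <to>Sarah Bellum</to>
  <from>Colonel Timeslip</from>
  <subject>Robot-sitting instructions</subject>
</telegram>"""

root = ET.fromstring(telegram_xml)

# Markup: the element names carried by the tags, in document order.
tag_names = [element.tag for element in root.iter()]

# Data: the text between the tags, ignoring whitespace used only for layout.
content = [text.strip() for text in root.itertext() if text.strip()]

print(tag_names)
print(content)
```

The two lists never mix: names such as to and subject come from the tags, while "Sarah Bellum" comes from the content between them.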

At the top of the document is the XML declaration, <?xml version="1.0"?>. It tells an XML-processing program which version of XML the document uses and, optionally, which character encoding it is written in, so the processor knows how to begin reading the document. (The declaration in this example names only the version.) The declaration is optional, but it is a good thing to include in a document.
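To see why the encoding information matters, here is a small sketch using Python's standard xml.etree.ElementTree. The sender name "José" and the choice of ISO-8859-1 are made up for illustration; the point is only that the same bytes are readable with a declaration naming their encoding and rejected without one, because the parser then assumes UTF-8.

```python
import xml.etree.ElementTree as ET

# The same bytes, with and without a declaration naming their encoding.
body = b"<from>Jos\xe9</from>"  # byte 0xE9 is "é" in ISO-8859-1
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body

# With the declaration, the processor knows how to decode the bytes.
name = ET.fromstring(declared).text

# Without it, the default is UTF-8, and 0xE9 is not valid there on its own.
try:
    ET.fromstring(body)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(name, undeclared_ok)
```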

After that comes the document type declaration, which refers to a document describing the grammar, located on the system in the file /xml-resources/dtds/telegram.dtd. That grammar document is known as a document type definition (DTD). <!DOCTYPE...> is one example of a type of markup called a declaration. Declarations are used to constrain grammar and declare pieces of text or resources to be included in the document. This line isn't required unless you want a parser to validate your document's structure against a set of rules you provide in the DTD.
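A parser can report what the document type declaration says even when it does not validate. The sketch below assumes Python's standard xml.dom.minidom, which is a non-validating parser: it records the declared name and system identifier but does not open telegram.dtd or check the document against it. Validation itself would need a validating parser, which is outside this sketch.

```python
from xml.dom import minidom

source = """<?xml version="1.0"?>
<!DOCTYPE telegram SYSTEM "/xml-resources/dtds/telegram.dtd">
<telegram pri="important"><to>Sarah Bellum</to></telegram>"""

document = minidom.parseString(source)
doctype = document.doctype  # the parsed document type declaration

declared_root = doctype.name  # element name the declaration announces
dtd_location = doctype.systemId  # where the DTD is said to live

print(declared_root)
print(dtd_location)
```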

Next, we see the <telegram> tag. This is the start of an element. We say that the element's name or type (not to be confused with a data type) is "telegram," or you could just call it a "telegram element." The end of the element is at the bottom and is represented by the tag </telegram> (note the slash at the beginning). This element contains all of the contents of the document. No wonder, then, that we call it the document element. (It is also sometimes called the root element.) Inside, you'll see more elements with start tags and end tags following a similar pattern.
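The rule that one element contains everything else is enforced by parsers. As a rough sketch with Python's standard xml.etree.ElementTree (using a shortened telegram), the parser hands back the document element, and it refuses text that has two top-level elements.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<telegram pri="important"><to>Sarah Bellum</to>'
    "<from>Colonel Timeslip</from></telegram>"
)
document_element = root.tag  # name of the outermost element
child_names = [child.tag for child in root]  # elements directly inside it

# Two top-level elements: no single element contains all the content.
try:
    ET.fromstring("<to>Sarah Bellum</to><from>Colonel Timeslip</from>")
    two_roots_accepted = True
except ET.ParseError:
    two_roots_accepted = False

print(document_element, child_names, two_roots_accepted)
```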

There is one exception here, the empty tag <graphic.../>, which represents an empty element. Rather than containing data, this element references some other information that should be used in its place, in this case a graphic to be displayed. Empty elements do not mark boundaries around text and other elements the way container elements do, but they still may convey positional information. For example, you might place the graphic inside a mixed-content element, such as the message element in the example, to place the graphic at that position in the text.
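Seen from a program, an empty element has nothing inside it; whatever it conveys travels in its attributes. A minimal sketch, assuming Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

graphic = ET.fromstring('<graphic fileref="figs/me.eps"/>')

has_text = graphic.text is not None  # an empty element holds no text
child_count = len(graphic)  # and no child elements
fileref = graphic.get("fileref")  # the reference lives in an attribute

print(has_text, child_count, fileref)
```

What a program does with the fileref value (for example, loading and displaying the image) is up to the application; the parser only reports it.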

Every element has to have both a start tag and an end tag, or else use the empty form shown for graphic. (It's okay to use a start tag immediately followed by an end tag for an empty element; the empty tag is effectively an abbreviation of that.) The names in start and end tags have to match exactly, even down to the case of the letters. XML is very picky about details like this. This pickiness ensures that the structure is unambiguous and the data is airtight. If start tags or end tags were optional, the computer (or even a human reader) wouldn't know where one element ended and another began, causing problems with parsing.
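This pickiness is easy to observe. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed is made up for this example. It shows a parser rejecting a case mismatch and a missing end tag, and treating the two spellings of an empty element as the same thing.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


matching = is_well_formed("<to>Sarah Bellum</to>")
wrong_case = is_well_formed("<to>Sarah Bellum</To>")  # names differ in case
missing_end = is_well_formed("<to>Sarah Bellum")  # no end tag at all

# The empty form and a start tag followed at once by an end tag are equivalent.
short_form = ET.tostring(ET.fromstring("<graphic/>"), encoding="unicode")
long_form = ET.tostring(ET.fromstring("<graphic></graphic>"), encoding="unicode")

print(matching, wrong_case, missing_end, short_form == long_form)
```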

From this example, you can see a pattern: some tags function as bookends, marking the beginning and ending of regions, while others mark a place in the text. Even the simple document here contains quite a lot of information:

Boundaries

A piece of text starts in one place and ends in another. The tags <telegram> and </telegram> define the start and end of a collection of text and markup.

Roles

What is a region of text doing in the document? Here, the tags <name> and </name> give an obvious purpose to the content of the element: a name, as opposed to any other kind of inline text such as a date or emphasis.

Positions

Elements preserve the order of their contents, which is especially important in prose documents like this.

Containment

The nesting of elements is taken into account by XML-processing software, which may treat content differently depending on where it appears. For example, a title might have a different font size depending on whether it's the title of a newspaper or an article.

Relationships

A piece of text can be linked to a resource somewhere else. For instance, the tag <graphic.../> creates a relationship (link) between the XML fragment and a file named me.eps. The intent is to import the graphic data from the file and display it in this fragment.
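Several of these kinds of information (boundaries, roles, positions) can be recovered together by walking a mixed-content element in document order. The following is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree, an abbreviated version of the message element, and a helper named pieces invented for this example.

```python
import xml.etree.ElementTree as ET

# An abbreviated version of the message element from the telegram.
message = ET.fromstring(
    "<message>Thanks for watching my robot pal "
    "<name>Zonky</name> while I'm away. He needs to be recharged "
    "<emphasis>twice a day</emphasis>.</message>"
)


def pieces(element):
    """Yield (role, text) pairs in document order.

    The role is the name of the element that directly encloses the text.
    """
    if element.text:
        yield (element.tag, element.text)
    for child in element:
        yield from pieces(child)
        if child.tail:  # text that follows the child, inside the parent
            yield (element.tag, child.tail)


ordered = list(pieces(message))
for role, text in ordered:
    print(f"{role:>8}: {text!r}")
```

Joining the text of all the pairs gives back the original sentence in its original order, while the role of each piece says what that stretch of text is doing there.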

An important XML term to understand is document. When you hear that word, you probably think of a sequence of words partitioned into paragraphs, sections, and chapters, comprising a human-readable record such as a book, article, or essay. But in XML, a document is even more general: it's the basic unit of XML information, composed of elements and other markup in an orderly package. It can contain text such as a story or article, but it doesn't have to. Instead, it might consist of a database of numbers, or some abstract structure representing a molecule or equation. In fact, one of the most promising applications of XML is as a format for application-to-application data exchange. Keep in mind that an XML document can have a much wider definition than what you might think of as a traditional document. The following are short examples of documents.
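As one hypothetical illustration of a document that is data rather than prose, the sketch below parses a tiny set of temperature readings with Python's standard xml.etree.ElementTree. The vocabulary (readings, reading, unit, day) and the values are invented for this example and are not taken from the original text.

```python
import xml.etree.ElementTree as ET

# An invented data-oriented document: no prose, just labeled numbers.
readings_xml = """<readings unit="celsius">
  <reading day="1">18.5</reading>
  <reading day="2">21.0</reading>
  <reading day="3">19.5</reading>
</readings>"""

root = ET.fromstring(readings_xml)

# Each element's content is a number, so a program can compute with it.
values = [float(reading.text) for reading in root.findall("reading")]
average = sum(values) / len(values)

print(root.get("unit"), values, round(average, 2))
```

The same containers that hold a telegram's text here hold numbers that one application could write and another could read, which is the data-exchange use mentioned above.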

 
