XML is a lot like the ubiquitous plastic containers of Tupperware®. There is really no better way to keep your food fresh than with those colorful, airtight little boxes. They come in different sizes and shapes so you can choose the one that fits best. They lock tight so you know nothing is leaking out and germs can't get in. You can tell items apart based on the container's color, or even scribble on it with magic marker. They're stackable and can be nested in larger containers (in case you want to take them with you on a picnic). Now, if you think of information as a precious commodity like food, then you can see the need for a containment system like Tupperware®.
XML contains, shapes, labels, structures, and protects information. It does this with symbols embedded in the text, called markup. Markup enhances the meaning of information in certain ways, identifying the parts and how they relate to each other. For example, when you read a newspaper, you can tell articles apart by their spacing and position on the page and the use of different fonts for titles and headings. Markup works in a similar way, except that instead of spaces and lines, it uses symbols.
Markup is important to electronic documents because they are processed by computer programs. If a document has no labels or boundaries, then a program will not know how to distinguish a piece of text from any other piece. Essentially, the program would have to work with the entire document as a unit, severely limiting the interesting things you can do with the content. A newspaper with no space between articles and only one text style would be a huge, uninteresting blob of text. You could probably figure out where one article ends and another starts, but it would be a lot of work. A computer program wouldn't be able to do even that, since it lacks all but the most rudimentary pattern-matching skills.
XML's markup divides a document into separate information containers called elements. Like Tupperware® containers, they seal up the data completely, label it, and provide a convenient package for computer processing. Like boxes, elements nest inside other elements. One big element may contain a whole bunch of elements, which in turn contain other elements, and so on down to the data. This creates an unambiguous hierarchical structure that preserves all kinds of ancillary information: sequence, ownership, position, description, association. An XML document consists of one outermost element that contains all the other elements, plus some optional administrative information at the top.
The following example shows a typical XML document containing a short telegram:
<?xml version="1.0"?>
<!DOCTYPE telegram SYSTEM "/xml-resources/dtds/telegram.dtd">
<telegram pri="important">
  <to>Sarah Bellum</to>
  <from>Colonel Timeslip</from>
  <subject>Robot-sitting instructions</subject>
  <graphic fileref="figs/me.eps"/>
  <message>Thanks for watching my robot pal
    <name>Zonky</name> while I'm away.
    He needs to be recharged <emphasis>twice a
    day</emphasis> and if he starts to get cranky,
    give him a quart of oil. I'll be back soon,
    after I've tracked down that evil
    mastermind <villain>Dr. Indigo Riceway</villain>.
  </message>
</telegram>
Can you tell the difference between
the markup and the data? The markup symbols are delineated by angle brackets
(<>). <to> and </villain> are two such symbols, called tags. The data, or content, fills the
space between these tags. As you get used to looking at XML, you'll use the
tags as signposts to navigate visually through documents.
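To make the split between markup and data concrete, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree module. It parses a shortened version of the telegram and pairs each tag name with the text it encloses. The trimmed document and the names TELEGRAM and fields are assumptions made for this illustration, not part of the original example.

```python
import xml.etree.ElementTree as ET

# A shortened version of the telegram, trimmed for illustration.
TELEGRAM = """<telegram pri="important">
  <to>Sarah Bellum</to>
  <from>Colonel Timeslip</from>
  <subject>Robot-sitting instructions</subject>
</telegram>"""

root = ET.fromstring(TELEGRAM)

# Pair each child's tag name (the markup) with its text (the data).
fields = [(child.tag, child.text) for child in root]

for tag, text in fields:
    print(f"{tag}: {text}")
```

The parser does the work of separating angle-bracketed tags from content, so the program only ever sees names and text.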
At the top of the document is the
XML declaration, <?xml
version="1.0"?>. It tells
an XML-processing program which version of XML the document uses and, when it
includes an encoding declaration, which character encoding the document is
written in, helping the XML processor get started on the
document. It is optional, but a good thing to include in a document.
After that comes the document type
declaration, containing a reference to a grammar-describing document, located
on the system in the file /xml-resources/dtds/telegram.dtd. The referenced
document is known as a document type definition
(DTD). <!DOCTYPE...> is one example of a type of markup called a declaration. Declarations are used to constrain
grammar and to declare pieces of text or resources to be included in the document.
This line isn't required unless you want a parser to validate your document's
structure against the set of rules you provide in the DTD.
Next, we see the <telegram> tag. This is the start of an element. We say that the
element's name or type (not to be confused with a data type) is
"telegram," or you could just call it a "telegram element."
The end of the element is at the bottom and is represented by the tag </telegram> (note the slash at the beginning). This element contains
all of the contents of the document. No wonder, then, that we call it the document
element. (It is also sometimes called the root element.) Inside, you'll see more elements with
start tags and end tags following a similar pattern.
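As a rough sketch of how a program sees the document element, the following Python snippet (standard library only) parses a cut-down telegram and inspects the element at the top of the tree. The abbreviated document and the variable names are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

# A cut-down telegram; the DOCTYPE line is omitted in this sketch.
DOCUMENT = """<?xml version="1.0"?>
<telegram pri="important">
  <to>Sarah Bellum</to>
  <message>Thanks for watching <name>Zonky</name>.</message>
</telegram>"""

# fromstring() returns the document (root) element.
root = ET.fromstring(DOCUMENT)

# Direct children of the document element.
children = [child.tag for child in root]

# Every element in the tree, in document order, starting with the root.
all_elements = [element.tag for element in root.iter()]

print(root.tag, root.attrib)
print(children)
print(all_elements)
```

Everything the parser finds is reachable from that single outermost element, which is why it is also called the root.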
There is one exception here, the
empty tag <graphic.../>, which represents an empty element. Rather than containing
data, this element references some other information that should be used in its
place, in this case a graphic to be displayed. Empty elements do not mark
boundaries around text and other elements the way container elements do, but
they still may convey positional information. For example, you might place the graphic inside a mixed-content element, such as the message element in the example, to place the graphic at that
position in the text.
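The sketch below, which assumes Python's standard xml.etree.ElementTree module, suggests how an empty element looks to a program: it has a name and attributes but no text and no children. It also compares the empty-tag form with a start tag immediately followed by an end tag. The helper summarize is a hypothetical name introduced for this example.

```python
import xml.etree.ElementTree as ET

# The empty-element form used in the telegram example.
short_form = ET.fromstring('<graphic fileref="figs/me.eps"/>')

# The long form: a start tag immediately followed by an end tag.
long_form = ET.fromstring('<graphic fileref="figs/me.eps"></graphic>')


def summarize(element):
    """Return the tag name, attributes, text, and child count of an element."""
    return (element.tag, dict(element.attrib), element.text or "", len(element))


print(summarize(short_form))
print(summarize(long_form))

# Both spellings describe the same empty element.
same = summarize(short_form) == summarize(long_form)
```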
Every element has
to have both a start tag and an end tag, or else use the empty form shown for graphic. (It's okay to use a start tag immediately followed by an
end tag for an empty element; the empty tag is effectively an abbreviation of
that.) The names in start and end tags have to match exactly, even down to the
case of the letters. XML is very picky about details like this. This pickiness
ensures that the structure is unambiguous and the data is airtight. If start
tags or end tags were optional, the computer (or even a human reader) wouldn't
know where one element ended and another began, causing problems with parsing.
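One way to see this pickiness in action is to hand a parser some deliberately broken markup. The following Python sketch relies on the standard library's xml.etree.ElementTree, which raises ParseError for text that is not well-formed. The helper name is_well_formed is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


matching = is_well_formed("<to>Sarah Bellum</to>")       # names match exactly
wrong_case = is_well_formed("<to>Sarah Bellum</To>")     # differs only in case
missing_end = is_well_formed("<to>Sarah Bellum")         # no end tag at all

print(matching, wrong_case, missing_end)
```

Only the first string is accepted; the parser rejects the other two rather than guessing where the element was meant to end.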
From this example, you can see a
pattern: some tags function as bookends, marking the beginning and ending of
regions, while others mark a place in the text. Even the simple document here
contains quite a lot of information:
Boundaries
A piece of text starts in one place and ends in another. The tags <telegram> and </telegram> define the start and end of a collection of text and
markup.
Roles
What is a region of text doing in
the document? Here, the tags <name> and </name> give an obvious purpose to the content of the element: a
name, as opposed to any other kind of inline text such as a date or emphasis.
Positions
Elements preserve the order of their
contents, which is especially important in prose documents like this.
Containment
The nesting of elements is taken
into account by XML-processing software, which may treat content differently
depending on where it appears. For example, a title might have a different font
size depending on whether it's the title of a newspaper or an article.
Relationships
A piece of text can be linked to a
resource somewhere else. For instance, the tag <graphic.../>
creates a relationship (link) between the XML fragment and a file named me.eps.
The intent is to import the graphic data from the file and display it in this
fragment.
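The kinds of information listed above can be read back out of a parsed document. Here is a minimal Python sketch using the standard xml.etree.ElementTree module. The fragment is hypothetical: it is a shortened message with the graphic moved inside it, and the variable names are likewise assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a shortened message with the graphic placed inside it.
message = ET.fromstring(
    "<message>Recharge <name>Zonky</name> "
    "<emphasis>twice a day</emphasis>."
    '<graphic fileref="figs/me.eps"/></message>'
)

# Boundaries and roles: each child marks a region and names its purpose.
roles = [(child.tag, child.text) for child in message]

# Positions: document order is preserved, so the prose can be reassembled.
prose = "".join(message.itertext())

# Containment: map every nested element to the element that encloses it.
parent_of = {child.tag: parent.tag for parent in message.iter() for child in parent}

# Relationships: the attribute names an external file; nothing is loaded here.
linked_file = message.find("graphic").get("fileref")

print(roles)
print(prose)
print(parent_of)
print(linked_file)
```

Note that the sketch only reads the file name from the fileref attribute. Actually importing and displaying the graphic would be up to the application that processes the document.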




