Tools of the Trade: Part Three of a Series. XML Transformation



April 2002


David Walske, David Walske, Inc.

Bite the Wax Tadpole

A classic Ricky and Lucy moment in the film, The Long Long Trailer, finds Lucy playing navigator to Ricky as he sits behind the wheel with a very long travel trailer in tow. As they approach a critical juncture in their travels, Lucy is not sure which way they should turn. Ricky becomes increasingly agitated as they draw ever nearer to the intersection, begging, pleading, demanding direction. “Turn left or turn right?! Turn left or turn right?!! Turn left or turn right?!!!” Arriving at the point of no return, Lucy at last makes up her mind and suddenly shouts “Turn left right now,” causing Ricky to skid wildly into a right turn.

Lucy’s instruction brought about the opposite result of what she had intended. Both navigator and driver were speaking in a common language and clearly understood the nomenclature particular to driving and navigation. Yet there was a complete breakdown of communication.

Mark Pendergrast documents another classic case of miscommunication in his book, For God, Country and Coca-Cola: The Unauthorized History of the Great American Soft Drink and the Company That Makes It. According to Pendergrast, when Coca-Cola was first brought to China the name of the product was transliterated as Ke-kou-ke-la. Not until millions of dollars' worth of marketing materials had been produced was it discovered that Ke-kou-ke-la translates roughly to “Bite the wax tadpole.”

Where there is translation, there is miscommunication. Tools such as Adobe FrameMaker+SGML and SoftQuad XMetaL allow you to work in a view that approximates the final output or, more accurately, one of many possible outputs. Whenever a tool offers such a WYSIWYG (what you see is what you get) view, some kind of translation is occurring between the native code of the document and what you see displayed.

XMetaL saves XML files using their native XML but must necessarily employ a set of Extensible Stylesheet Language (XSL) rules and other support files to produce a suitable WYSIWYG editing view. Many XML implementations involve an XSL transformation (XSLT) of content to one or more output formats before publishing. Whatever tools and processes you use to edit, manage, and publish structured documents, you’ll need effective tagged language translation to ensure that your documentation base doesn’t “Bite the wax tadpole.”

This, the third article in the series, explores how structured content is translated and transformed. In the first two articles of this series, we discussed divining inherent structure in existing unstructured documentation, developing a structural specification or schema, and using FrameMaker+SGML to convert unstructured content to structured files. In this article, we’ll see how tools such as Extensibility XML Authority, SoftQuad XMetaL, and Architag XRay XML Editor can be used to edit and transform XML content for publishing in a display language such as HTML. (See All in the Family in Figure 1.)



Figure 2: SGML, more than just an idle patriarch, is in active use today.

Casting XML

One of the earliest HTML editors to transcend simple plain text editing is SoftQuad’s HoTMetaL. The program name and its unusual capitalization are meant to draw a romantic association between modern HTML display technology and the era of the Gutenberg press, in which movable type was cast in molten lead, sometimes referred to as “hot metal.” HoTMetaL was an instant success. Web page authors liked how its visual editing window relieved the drudgery of the more repetitive and mundane aspects of HTML coding. With the advent of XML came the release of XMetaL. While there are many other XML editors available today, XMetaL remains a popular and reasonably priced choice.

Crossing the generation gap
Okay, we’re finished with the history lesson. Get ready to cross over from SGML to XML. Let’s begin by examining an XML version of the document, “Installing the PrintRight Laser Printer Optimizer.” This is the same sample content for a fictional program that we worked with in article number two of this series, “The Black Box: Converting Legacy Documentation.” We’ll start with the FrameMaker+SGML file that we finished with in the last article. Because this file is in FrameMaker binary SGML format, we’ll need to translate it to XML. Fortunately, with FrameMaker+SGML 6.0, we have the option of saving documents as XML. We’ll save the file as PrintRight.xml and then view it in Notepad.

The first two lines use XML processing instruction (PI) syntax. The first of these lines is the XML declaration (strictly speaking a declaration rather than a PI, although it is written the same way). It identifies the file as an XML document and specifies the UTF-8 character encoding. UTF-8 is an encoding of Unicode in which the standard ASCII plain text characters keep their familiar single-byte values, so a file that contains only ASCII characters is already valid UTF-8. We’ll discuss the second line, a true PI, shortly. The remainder of the file consists of chunks of content encapsulated in element tags. While there are distinct differences between XML and SGML, so far we’re not seeing anything all that different from what we’ve seen in native SGML.
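As a rough illustration of the file layout just described, the following Python sketch parses a small stand-in for PrintRight.xml and lists what sits at the top level of the document. The file contents are assumptions made for this example: only the root element name (Install) and the stylesheet name (PrintRight.css) come from the article, and the Title and Step elements are hypothetical.

```python
from xml.dom import minidom

# Hypothetical stand-in for PrintRight.xml. Only the root element name
# (Install) and the stylesheet file name come from the article.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="PrintRight.css"?>
<Install>
  <Title>Installing the PrintRight Laser Printer Optimizer</Title>
  <Step>Insert the installation disk.</Step>
</Install>
"""


def describe_top_level(xml_text):
    """Describe each processing instruction and element at the top of the document."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    found = []
    for node in doc.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            # target is the PI name; data is everything after it
            found.append(("pi", node.target, node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            # count every element nested inside the root element
            found.append(("element", node.tagName, len(node.getElementsByTagName("*"))))
    return found


for entry in describe_top_level(SAMPLE):
    print(entry)
```

Note that the parser reports only the stylesheet line as a processing instruction. The XML declaration on the first line is consumed by the parser itself and does not appear as a node.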

Validating both syntax and structure
When an SGML or XML file conforms to standard coding syntax conventions, it is said to be well formed. When an SGML or XML file follows the rules of structure as defined in an associated schema, the file is said to be valid. Binary FrameMaker+SGML files should always be validated against an Element Definition Document (EDD). The tagged-text files of native SGML should always be validated against a Document Type Definition (DTD).

When working with XML files, it is not considered mandatory to validate structure. So long as the code follows proper syntax, many XML applications are able to process the file. However, this author highly recommends that you always validate your XML content against a DTD or schema. Document validation is the key to maintaining the orderliness that draws us to structured languages in the first place.
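The difference between the two checks can be sketched in a few lines of Python. This example tests well-formedness only: the standard-library parser used here does not validate against a DTD, so a validating parser or an editor such as those discussed in this article is still needed for the structural check. The element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


good = "<Install><Step>Insert the disk.</Step></Install>"
bad = "<Install><Step>Insert the disk.</Install>"  # Step is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A file that passes this test can still be invalid. For example, it might contain a Step element in a place the DTD does not allow.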

Before we begin working with our sample content as XML, we’ll create a DTD and associate it with the document file. We’ll produce our DTD by sampling the content using XML Authority. This program scans the XML file, deduces its structure based on the arrangement of its elements, displays a graphical depiction of the schema, and allows us to output a ready-to-use XML DTD. (See the figure.)


Figure 3: XML Authority scans the XML file, deduces its structure based on the arrangement of its elements, displays a graphical depiction of the schema, and allows us to output a ready-to-use XML DTD.
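To give a feel for what deducing structure from a sample means, here is a deliberately naive Python sketch. It is not XML Authority's algorithm, which the article does not describe. It simply records which child elements appear inside each element and whether the element holds text, and then writes loose element declarations. The sample element names other than Install are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Install>
  <Title>Installing the PrintRight Laser Printer Optimizer</Title>
  <Step>Insert the installation disk.</Step>
  <Step>Run <Command>setup.exe</Command> from the disk.</Step>
</Install>"""


def infer_dtd(xml_text):
    """Build naive <!ELEMENT> declarations from a single sample document."""
    root = ET.fromstring(xml_text)
    children = {}  # element name -> child names, in first-seen order
    has_text = {}  # element name -> True if any instance holds text
    for elem in root.iter():
        seen = children.setdefault(elem.tag, [])
        for child in elem:
            if child.tag not in seen:
                seen.append(child.tag)
        # text directly inside the element, before or between its children
        texty = bool((elem.text or "").strip()) or any(
            (child.tail or "").strip() for child in elem
        )
        has_text[elem.tag] = has_text.get(elem.tag, False) or texty
    lines = []
    for name, kids in children.items():
        if kids and has_text[name]:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        elif has_text[name]:
            model = "(#PCDATA)"
        else:
            model = "EMPTY"
        lines.append("<!ELEMENT %s %s>" % (name, model))
    return "\n".join(lines)


print(infer_dtd(SAMPLE))
```

A content model inferred this way is only as good as the sample. It says nothing about order or about which elements are required, so a generated DTD should be reviewed and tightened by hand before authors rely on it.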

Now that we’ve created an XML DTD for our file, we need to link our sample content file to this new DTD. We’ll use Notepad to insert the following line immediately after the XML declaration:

<!DOCTYPE Install SYSTEM "printright.dtd">

This line, known as the document type declaration or doctype statement, begins with the name of the highest-level element: Install. The rest of the line indicates the name of the associated DTD preceded by the type of notation used to describe its location. In this case, SYSTEM refers to the fact that printright.dtd is a file on the local computer’s file system. The DTD location can also be described as PUBLIC, in which case a public identifier is specified that stands for a DTD on a remote computer or serves as an alias to a file on the local system. (See Is it a schema or is it a schema?.)


Figure 4
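One quick way to confirm that the doctype statement was inserted correctly is to read it back with a parser. The Python sketch below does this for a small stand-in document. The document body is hypothetical, and the parser used here only records the DTD reference. It neither opens printright.dtd nor validates against it.

```python
from xml.dom import minidom

# Hypothetical stand-in for PrintRight.xml with the doctype statement added.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Install SYSTEM "printright.dtd">
<Install><Title>Installing PrintRight</Title></Install>
"""


def read_doctype(xml_bytes):
    """Return the doctype name, system ID, public ID, and root element name."""
    doc = minidom.parseString(xml_bytes)
    doctype = doc.doctype
    return {
        "name": doctype.name,
        "system_id": doctype.systemId,
        "public_id": doctype.publicId,
        "root": doc.documentElement.tagName,
    }


info = read_doctype(SAMPLE)
print(info)

# The name in the doctype statement must match the highest-level element.
print("doctype matches root:", info["name"] == info["root"])
```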

First look at XMetaL
XMetaL has three standard editing views: Normal, Tags On, and Plain Text. The Normal view provides a WYSIWYG visual editing mode. The Tags On view provides a similar WYSIWYG mode but adds graphical markers that indicate the location of the element boundaries. The Plain Text view provides the same type of view we saw in Notepad. Now we’ll open our sample XML file in XMetaL. By default, XMetaL opens in the Normal view. (See the figure.)


Figure 5: The Normal view provides WYSIWYG visual editing.

The left pane displays the elements and their hierarchical arrangement. The right pane shows a visual editing display. But note that the rendering of the text is not exactly as we would expect or want it to be. The bulleted lists appear without bullets of any kind, and the numbered list of installation instructions appears without sequential numbering.

A Cascading Style Sheet (CSS) determines the formatting of the content displayed in the right pane. The second line of our XML file, the stylesheet PI, specifies that the PrintRight.css stylesheet should be employed in rendering our XML content. Where did this come from? FrameMaker automatically created the PI line and the CSS file that it references when we exported our content from a FrameMaker+SGML file to an XML file.

This automatically generated CSS file is only a rough approximation of possible formatting for the content. The rendering of the XML that it produces is not necessarily the final output of our content. Remember, ideally we’ll be transforming the XML code to some more ubiquitous display language such as HTML before publishing. Even so, for ease of editing it might be important to display this content more accurately in the XMetaL Normal view. We’ll edit the CSS file and then refresh the right pane display. (See the figure.)


Figure 6: CSS determines the formatting of the content in XMetaL.

Now that’s more like it. Remember, this is still an approximation of the final output. The display language to which we output the content will ultimately determine the final rendering of the published content.
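The article does not list the CSS edits themselves, so the following is only a guess at their general shape. In a stylesheet for XML, each selector is an element name from the document. The Python sketch below writes out two such rules that ask for bullets and for numbers. The element names BulletItem and Step are hypothetical, the properties are standard CSS, and whether a given editor honors every one of them is an assumption to verify in your own copy.

```python
# Hypothetical element names; substitute the ones defined in your own DTD.
RULES = {
    "BulletItem": {
        "display": "list-item",
        "list-style-type": "disc",
        "margin-left": "2em",
    },
    "Step": {
        "display": "list-item",
        "list-style-type": "decimal",
        "margin-left": "2em",
    },
}


def render_css(rules):
    """Turn a mapping of element names to declarations into stylesheet text."""
    blocks = []
    for selector, declarations in rules.items():
        body = "\n".join(
            "  %s: %s;" % (prop, value) for prop, value in declarations.items()
        )
        blocks.append("%s {\n%s\n}" % (selector, body))
    return "\n\n".join(blocks) + "\n"


print(render_css(RULES))
```

The printed rules could be pasted into the CSS file that the stylesheet PI references.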

Transforming XML with XSL

Now we’re ready to output our content for publication. To do this, we’ll use an Extensible Stylesheet Language Transformation (XSLT) to create an HTML file for display in virtually any Web browser. We could choose any number of output formats, including RTF or MIF if we wanted to output content for print media.

Because much of the processing performed by XSLT depends upon predictable structure, it is vitally important that authors maintain good structure in their writing. Following an agreed-upon information model is the best assurance that consistent structure is maintained.

We’ll create an XSLT stylesheet to extract the content from the XML file and write it out to an HTML file. Each bit of content from each XML element is isolated, wrapped in HTML code, and written out to an HTML file. There are several programs for working with XML and XSLT stylesheets. We’ll be using Architag XRay XML Editor. This tool allows us to create, edit, and process XML transformations in real time. As you edit the template or XML content file, the XML to HTML transformation is updated instantly. (See the figure.)


Figure 7: Architag XRay XML Editor allows us to create, edit, and process XML transformations in real time.
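The idea of isolating each element's content and wrapping it in HTML can be mimicked in a few lines of Python. This is a stand-in for the concept only. It is not XSLT, and it is not the stylesheet used in the article. The mapping table plays the role of a set of template rules, and the element names other than Install are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

SAMPLE = """<Install>
  <Title>Installing the PrintRight Laser Printer Optimizer</Title>
  <Intro>Follow these steps.</Intro>
  <Step>Insert the installation disk.</Step>
  <Step>Run the setup program.</Step>
</Install>"""

# XML element name -> HTML tag; a rough analogue of one template rule per element.
TEMPLATES = {"Title": "h1", "Intro": "p", "Step": "li"}


def to_html(xml_text):
    """Wrap the text of each child of the root element in its mapped HTML tag."""
    root = ET.fromstring(xml_text)
    parts = ["<html><body>"]
    in_list = False
    for elem in root:
        tag = TEMPLATES.get(elem.tag)
        if tag is None:
            continue  # in this sketch, elements without a rule are skipped
        if tag == "li" and not in_list:
            parts.append("<ol>")  # open a numbered list before the first step
            in_list = True
        elif tag != "li" and in_list:
            parts.append("</ol>")
            in_list = False
        text = escape("".join(elem.itertext()).strip())
        parts.append("<%s>%s</%s>" % (tag, text, tag))
    if in_list:
        parts.append("</ol>")
    parts.append("</body></html>")
    return "\n".join(parts)


print(to_html(SAMPLE))
```

Because the mapping depends on element names, the output is only as predictable as the structure of the input, which is why consistent structure matters so much.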

Using any program or process designed to apply XSLT, we can create a cleanly formatted output file, in HTML or another format, from any XML data file. But XSLT can do far more than just put a pretty face on our content. The powerful XSLT template coding language allows us to manipulate the output to serve any number of purposes. We can rearrange the order in which chunks of content appear in the final output. We can choose to suppress particular content altogether. And we can repeat content that appears only once in the original XML.

Imagine the applications in your documentation. For example, a single XML file could serve many flavors of a document. Our sample content referenced a Windows installation, but if our product were available for Microsoft Windows, Apple Macintosh, and Sun Solaris, we would need three different but similar sets of installation instructions. With XSLT, we could store all of the information in one XML file and then apply three different transformation templates to produce three unique output documents, each specific to one of the three platforms.
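The single-source idea can be sketched as follows. In practice each variant would be produced by its own XSLT template. Here a short Python function stands in for those templates, and the platform attribute used to mark platform-specific steps is a hypothetical convention invented for this example.

```python
import xml.etree.ElementTree as ET

# One source file holding steps for every platform. The "platform"
# attribute is a hypothetical marker, not something defined in the article.
SOURCE = """<Install>
  <Title>Installing PrintRight</Title>
  <Step platform="windows">Run setup.exe.</Step>
  <Step platform="macintosh">Double-click the installer icon.</Step>
  <Step platform="solaris">Run the install script.</Step>
  <Step>Restart the printer.</Step>
</Install>"""


def variant_for(xml_text, platform):
    """Return the document tree with content for other platforms suppressed."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            target = child.get("platform")
            # unmarked content is shared; marked content must match
            if target is not None and target != platform:
                parent.remove(child)
    return root


for name in ("windows", "macintosh", "solaris"):
    steps = [step.text for step in variant_for(SOURCE, name).findall("Step")]
    print(name, steps)
```

Each call reads the same source and suppresses the steps that belong to the other platforms, so shared content such as the final step is written and maintained only once.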

Further Exploration

This article has only scratched the surface of XML and XSLT. Subsequent articles in this series will investigate additional tools for harnessing the power of structured tagged text languages for content management and reuse. Future articles will also explore the XML Schema in greater depth.

About the Author