April 2004

Word as an XML Editor

CIDMIconNewsletter Tina Hedlund, Senior Consultant, Comtech Services, Inc.

It’s always interesting when Microsoft decides to throw its hat into the arena of standards-based technology. Usually its implementation isn’t standard at all and, due to its massive lock on the industry as a whole, tends to impede rather than propagate that standard.

When I saw Microsoft’s implementation of XML, I expected to feel the same sense of horror that I felt when I saw Word’s “Save as HTML” feature. I expected to see superfluous mark-up, which I like to call the Microsoft junk, that never seemed to add value to any of my work.

It turned out that I was wrong, and I was right.

Office 2003 has powerful XML functionality that can be harnessed to build capable solutions, but it’s still burdened by the Microsoft “junk” and still isn’t a full solution for a technical publications organization.

Integrated Solutions

Microsoft is very proud of its integrated solutions in Word, which can, among other things, help writers create documents. In one demo, the Smart Document solution acts as an interactive CMS. The author types the word Microsoft, and every logo available for Microsoft immediately appears in a pane on the right of the screen. The same could occur with document fragments (or modules of information). It was incredible!

I was really impressed until I realized that it was all built on custom programming. Since the Word document was actually XML, programs could be written to recognize content and offer up options that existed in other databases. It wasn’t magic. And Word wasn’t really integral to the process. XML and the custom programming actually made this miracle possible (somewhat facilitated by Smart Document features).

Although impressive, I would never recommend this solution to anyone in need of a CMS. Custom programming is always expensive and can be compromised when Word is upgraded. If you don’t have dedicated programming staff and a lot of money already invested in disparate databases, forget about using Office as a CMS.


In its XML-enabled Word, Microsoft decided to validate XML through schemas rather than through document type definitions (DTDs). Validation of XML content is important because it allows you to design a document structure that must be followed by anyone using that schema or DTD. If a writer decides to put a procedure step inside an introduction, the editing software, such as Word, does not allow the author to save the document.
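To make the idea of structural validation concrete, here is a minimal sketch in Python. It is not a schema or DTD validator, only a hand-rolled check of which children each element may contain. The element names (topic, introduction, procedure, step, para) and the ALLOWED_CHILDREN table are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: the child elements each parent may contain.
# A real schema or DTD would express this declaratively.
ALLOWED_CHILDREN = {
    "topic": {"title", "introduction", "procedure"},
    "introduction": {"para"},
    "procedure": {"step"},
}

def find_violations(xml_text):
    """Return (parent, child) tag pairs that break the content model."""
    root = ET.fromstring(xml_text)
    violations = []
    for parent in root.iter():
        # Elements missing from the table are treated as allowing no children.
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                violations.append((parent.tag, child.tag))
    return violations

VALID_DOC = (
    "<topic><title>Setup</title>"
    "<introduction><para>Overview.</para></introduction>"
    "<procedure><step>Plug it in.</step></procedure></topic>"
)

# A procedure step placed inside an introduction, as in the example above.
INVALID_DOC = "<topic><introduction><step>Plug it in.</step></introduction></topic>"

if __name__ == "__main__":
    print(find_violations(VALID_DOC))
    print(find_violations(INVALID_DOC))
```

An editor that enforces validation would refuse to save the second document until the misplaced step is moved.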

The de facto validation mechanism has until recently been the DTD, but it is slowly being replaced by the schema standard because of the schema’s ability to validate the content type more accurately. For instance, a DTD may specify that you can enter any text, but a schema may specify that you must enter a date in the format: 12/01/05.
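The difference between "any text" and "a date in a given format" can be sketched in a few lines of Python. This is only a stand-in for the two levels of checking, not an implementation of either standard. The function names are hypothetical, and the MM/DD/YY reading of 12/01/05 is an assumption. In a real schema, a format like this would probably be expressed as a pattern restriction, since the built-in XML Schema date type expects the YYYY-MM-DD form.

```python
import re
from datetime import datetime

def dtd_style_accepts(text):
    """Mimic a DTD text declaration: any character data is acceptable."""
    return isinstance(text, str)

def schema_style_accepts(text):
    """Mimic a schema-level type check: require a real date as MM/DD/YY."""
    # Shape check, comparable to a pattern restriction in a schema.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{2}", text):
        return False
    # Value check: reject impossible dates such as month 13.
    try:
        datetime.strptime(text, "%m/%d/%y")
    except ValueError:
        return False
    return True

if __name__ == "__main__":
    for candidate in ("12/01/05", "next Tuesday", "13/45/05"):
        print(candidate, dtd_style_accepts(candidate), schema_style_accepts(candidate))
```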

The choice of schemas is not a huge barrier for anyone currently using DTDs, since you can easily convert a DTD to a schema. It does pose some barriers, however, for groups wanting to single-source modular content. XInclude, a W3C standard used to insert modular content by reference into a document, is not supported in Word.
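For readers who have not seen XInclude, the following sketch shows what inclusion by reference looks like when a tool does support it. It uses Python's standard xml.etree.ElementInclude module. The topic and note elements, the warning.xml name, and the in-memory MODULES store (standing in for files on disk) are all hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical module store standing in for reusable files on disk.
MODULES = {
    "warning.xml": "<note>Unplug the unit before servicing.</note>",
}

def load_module(href, parse, encoding=None):
    """Loader callback: return the parsed module that href points to."""
    if parse != "xml":
        raise ValueError("this sketch only handles parse='xml'")
    return ET.fromstring(MODULES[href])

# The topic refers to the shared warning instead of copying its text.
TOPIC = """<topic xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Replacing the fan</title>
  <xi:include href="warning.xml"/>
</topic>"""

def resolve(xml_text):
    """Parse a document and replace each xi:include with its target."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=load_module)
    return root

if __name__ == "__main__":
    print(ET.tostring(resolve(TOPIC), encoding="unicode"))
```

The source keeps a single copy of the warning, and every topic that references it picks up any later change.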

Word also allows you to turn off validation. With validation on, a user can choose to save an invalid XML document in the Word binary format, but not as XML. With a click of a mouse, however, a user can turn validation off. Word makes nary a beep when a user saves the invalid XML.


WordML (or, as I like to call it, the Microsoft “junk”) can fortunately be turned off in the Professional and Enterprise editions of Office using Windows 2000 and Windows XP. In less expensive versions, WordML is the only form of XML that a user can create.

WordML, or Word (processing) Markup Language, tags content based on the style applied to it. The mark-up merely specifies how content should look. For example, the previous sentence would look like this in WordML:

<w:pStyle w:val="BodyText"/>

<w:proofErr w:type="gramEnd"/>

<w:r><w:t>The mark-up merely specifies how content should look.</w:t></w:r>

Each WordML tag carries the “w:” prefix. The tags record the style I applied to the text, along with markers from the grammar and spelling checker. None of them are tags that I created in a DTD or schema.
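Because WordML is ordinary XML, any XML parser can pull the style name and the text back out. The sketch below does so with Python's standard library. The fragment is hand-written for illustration, with the paragraph wrapper elements (w:p and w:pPr) added so that it parses on its own. The namespace URI is the one associated with Word 2003's WordML as far as I know, and should be treated as an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "w:" prefix in Word 2003 WordML.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Hand-written fragment: one paragraph with a style, a text run,
# and a proofing marker left by the grammar checker.
FRAGMENT = f"""<w:p xmlns:w="{W}">
  <w:pPr><w:pStyle w:val="BodyText"/></w:pPr>
  <w:r><w:t>The mark-up merely specifies how content should look.</w:t></w:r>
  <w:proofErr w:type="gramEnd"/>
</w:p>"""

def style_and_text(paragraph_xml):
    """Return (style name, plain text) for a single WordML paragraph."""
    paragraph = ET.fromstring(paragraph_xml)
    style = paragraph.find("w:pPr/w:pStyle", NS)
    # Attributes are namespaced too, so the key is "{namespace}val".
    style_name = style.get(f"{{{W}}}val") if style is not None else None
    text = "".join(t.text or "" for t in paragraph.iterfind(".//w:t", NS))
    return style_name, text

if __name__ == "__main__":
    print(style_and_text(FRAGMENT))
```

Note that the only thing recovered besides the text is a style name. Nothing in the fragment says what the sentence is, only how it should look.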

WordML is used to recreate the text and style as it looked when it was originally created in Word. If you own the Professional or Enterprise edition of Word, you can also create documents with your XML tagging structure, in addition to the WordML tagging. (Fortunately, you can also choose to preserve your own tagging structure and get rid of WordML entirely.)

The benefit of authoring in WordML is that Microsoft has kindly provided transforms that convert WordML documents into HTML, and it’s safe to bet that other transforms aren’t far behind.

Creating Deliverables

For organizations with Word templates, it is fairly simple to create XML templates (Word styles with XML tags) and styled deliverables from Word without developing XSL style sheets. It won’t be easy or intuitive, however.

An author will have to apply a style and select an element. In most XML editors, you can associate a style sheet with your XML document and the author need only select an element (and the style is applied automatically).

If you associate a style sheet with an XML document, it will format every element the same way. For example, any text tagged as a title will be displayed as bold 24-point Times New Roman font. This is how most editors work (every editor except Word, that is).

Using Word, you associate an XSL style sheet with an XML instance when you save the document as XML, and Word acts as a transformation engine. All of the elements are permanently converted to the elements specified in the style sheet. Any data not styled in the style sheet (or skipped to accommodate output-specific needs) is permanently deleted.
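The consequence of a one-way transform at save time can be imitated in plain Python. This is not XSLT and not what Word runs internally. It is a simplified stand-in in which a hypothetical RULES table plays the part of the style sheet, and the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical "style sheet": source element name -> output element name.
RULES = {"title": "h1", "para": "p"}

def one_way_transform(xml_text):
    """Convert mapped elements and drop every element that has no rule."""
    source = ET.fromstring(xml_text)
    output = ET.Element("body")
    for child in source:
        target = RULES.get(child.tag)
        if target is None:
            continue  # no rule for this element: its content is lost
        ET.SubElement(output, target).text = child.text
    return ET.tostring(output, encoding="unicode")

SOURCE = (
    "<topic><title>Setup</title><para>Plug it in.</para>"
    "<reviewnote>Check with legal.</reviewnote></topic>"
)

if __name__ == "__main__":
    print(one_way_transform(SOURCE))
```

If the transformed result is the only copy saved, the review note cannot be recovered, which is why a transform applied at save time is riskier than a style sheet applied only for display.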

There is no way to associate XSL style sheets with content in an XML document for on-screen formatting. Even more shocking, there isn’t any easy or intuitive way to map the elements in your schema to a style in an existing Word template.


Fortunately, companies like Arbortext have already recognized these deficiencies and have created add-ons to make content creation in Word easier. Arbortext’s Word Companion provides on-screen formatting through XSL style sheets and the ability to create output in additional media, such as PDF and HTML Help, through buttons on a task bar.

Unfortunately, the cost of an add-on may be prohibitive for smaller departments. In addition to the cost of Office 2003 Professional edition (approximately $330 for an upgrade), Companion is an add-on to Arbortext’s product E3, which costs approximately $50,000. Companion costs $5,000 per server, with an additional per-user cost of $40 to $60.

Many organizations wanting to take advantage of the practically “free” nature of Office should reconsider how “free” it actually is and purchase a dedicated XML editor instead.

Other Features

While Word may not be the robust authoring editor that technical publications organizations need, there have been some improvements worth noting.

Word is backward compatible. You can create a Word document in Office 2003, and users of the previous version can open the document without a hitch (as long as you are sharing Word binary files and not XML).

Word 2003 ships with functionality to restrict styles in a document. Anyone familiar with using styles in Word knows how easy it is to corrupt a document simply by copying and pasting from other documents that have different styles. Soon, instead of just one “Body Text” style, you have five or six different body styles with slightly different names available in the document, and you’re not sure what to use anymore.

I expected that Word, after I set the “Protect Document” feature and selected from a long, confusing list of styles that weren’t part of my style sheet, would warn anyone who tried to paste text with a different style that the style was not allowed. It did not. It allowed me to paste the text with the new style and added it to the style list, just like it had in previous versions. When I opened the allowed list of styles, my new style was indeed added as an allowed style. When I unchecked that style, the text in the document reverted to “Normal.”

“Protect Document” is better than nothing, but users are better off using the small pull-down menu that is created when text is pasted and immediately choosing either “Match Destination Formatting” or “Keep Text Only.”


Word 2003 is a good start as an XML editor, but it is far behind the competition and lacks many of the features standard in XML editors today.

Despite the issues for professional technical writers, I predict that developers and programmers will love it. With its easy integration with .NET and other Microsoft networking solutions, it will provide interesting options in organizations with dedicated information technology staff.

About the Author


Tina Hedlund
Senior Consultant
Comtech Services, Inc.

Tina Hedlund is a Senior Consultant with Comtech Services, Inc. Tina has worked with companies in North America and Europe, helping them solve their information-management and content management problems. She has also been heavily involved in many benchmarking projects related to information management and process improvement. Recently, Tina has focused on the technical aspects of content management and single-sourcing processes.