JoAnn T. Hackos, PhD
Director, The Center for Information-Development Management

Lately, I have been puzzled by how slowly information-development groups are adopting XML. XML has been around for nearly three years as a reliable standard from the World Wide Web Consortium (W3C). When desktop publishing was introduced in the mid-80s, it took off much faster. In our recent survey of single-source activities and an earlier survey of authoring tools, we found that 18 percent of respondents were authoring in XML. XML authoring still appears to be an early-adopter market.

We need to move into XML faster, but carefully. XML has tremendous potential for the dissemination of technical information. If we do it right, we’ll come much closer to solving the access problems that plague our field. By using XML tags to identify content, we make that content more easily searchable.

Let’s take an example. You want to find all the information on your company’s Web site about leave to attend a family funeral. In many companies, you would have to know that this information is located in the Human Resources area, under Employee Benefits, under the Bereavement Policy. You would actually have to know the name of the policy to locate the information. If you did a general keyword search based on policy/funerals/leave/grandfather, you would most likely find hundreds of topics in which these words occurred somewhere in the text.

If your company had developed XML-tagged content, you could search on some combination of policy/funerals/leave/grandfather and find exactly the topic you want. The topic would be tagged as a policy and as a policy related to leave granted for funerals, and somewhere within the policy would be a tag that identifies the relationship to the deceased (most companies require that the deceased be a relative for leave to apply). Because each tag sits within an XML hierarchy and is read in the context of the elements that contain it, you would find not every reference to grandfather (if there were more than one) but only the specific reference related to policies about leave taken for funerals.
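To make the contrast concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The element and attribute names (policy, category, reason, relationship) and the sample text are invented for illustration; they are assumptions, not a recommended tag set. The sketch compares a plain keyword search with a search that uses the tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical site content; all tag names and text are illustrative only.
SITE = """
<site>
  <topic id="newsletter-12">
    <para>My grandfather founded the company picnic.</para>
  </topic>
  <policy id="bereavement" category="leave" reason="funeral">
    <title>Bereavement Policy</title>
    <eligibility>
      <relationship>grandfather</relationship>
      <relationship>grandmother</relationship>
    </eligibility>
  </policy>
  <policy id="parental" category="leave" reason="birth">
    <title>Parental Leave</title>
    <para>A grandfather may not take this leave.</para>
  </policy>
</site>
"""

root = ET.fromstring(SITE)


def keyword_search(root, word):
    """Return ids of top-level topics whose text mentions the word anywhere."""
    hits = []
    for topic in root:
        text = " ".join(topic.itertext()).lower()
        if word.lower() in text:
            hits.append(topic.get("id"))
    return hits


def tagged_search(root, category, reason, relationship):
    """Return ids of policies matching the tags, not just the words."""
    path = f".//policy[@category='{category}'][@reason='{reason}']"
    return [
        policy.get("id")
        for policy in root.findall(path)
        if any(r.text == relationship for r in policy.iter("relationship"))
    ]


keyword_hits = keyword_search(root, "grandfather")
tagged_hits = tagged_search(root, "leave", "funeral", "grandfather")
print(keyword_hits)  # every topic that happens to mention the word
print(tagged_hits)   # only the policy tagged for funeral leave
```

In this toy data the keyword search returns all three topics, while the tagged search returns only the bereavement policy, because the word grandfather counts only when it appears in a relationship tag inside a policy tagged for funeral leave.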

XML allows us to create very specific sets of categories and tags to help users pinpoint information topics. However, to create these categories and tags, we need to know the answers to two key questions:

  • What are our customers likely to be searching for?
  • How does the information we write relate to the customers’ information needs?

Without those insights, we won’t know which labels to apply as XML tags and when to apply them.

I’m quite dismayed by some of the implementations of XML I’ve seen. By using conversion tools, many technical publications groups are simply taking their existing format tags and turning them into XML. We see tags for paragraphs, headings, and lists but few or no tags that define the content. The reason: it’s easier and doesn’t require restructuring or rethinking the content. However, those format tags contribute nothing to the users’ search for information. My recommendation: start thinking about restructuring and labeling your information topics with content-oriented XML tags.
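The difference shows up as soon as you try to query the result. The sketch below, again in Python with xml.etree.ElementTree, holds the same made-up policy text in two forms: one converted straight from format tags, one labeled by content. All tag names and the sample wording are hypothetical.

```python
import xml.etree.ElementTree as ET

# Format tags only: the markup says how the text looks, not what it means.
FORMAT_ONLY = """
<doc>
  <heading>Bereavement Policy</heading>
  <para>Employees may take leave to attend the funeral of a relative.</para>
  <list>
    <item>grandfather</item>
    <item>grandmother</item>
  </list>
</doc>
"""

# Content tags: the same text, labeled by what it is (names are illustrative).
CONTENT_TAGGED = """
<policy category="leave" reason="funeral">
  <title>Bereavement Policy</title>
  <summary>Employees may take leave to attend the funeral of a relative.</summary>
  <eligibility>
    <relationship>grandfather</relationship>
    <relationship>grandmother</relationship>
  </eligibility>
</policy>
"""


def relatives_covered(xml_text):
    """List the relationships a document identifies as such."""
    root = ET.fromstring(xml_text)
    return [r.text for r in root.iter("relationship")]


format_result = relatives_covered(FORMAT_ONLY)
content_result = relatives_covered(CONTENT_TAGGED)
print(format_result)   # nothing to find: a list item is just a list item
print(content_result)  # the relationships are identified by their tags
```

Both documents contain the same words, but only the content-tagged version lets a search tool answer the question "which relatives does this policy cover?" without guessing from the layout.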

For more on writing structured content, come to our new seminar Structured Writing for Single Sourcing (dates to be announced).