Forgetting About Documents with Structured Writing

August 2010

Mysti Berry

When a user assistance organization is challenged to convert FrameMaker or other proprietary source into XML for assembly using DITA or similar schemes, the focus is often on the technical: which conversion tools to use, how to store the XML source, and how to manage translation drops. However, writing structured documentation, especially if you have a few thousand pages of documentation to convert, requires a major shift in thinking that can be difficult for experienced writers. They must abandon the concept of a book entirely, think of units of information as topics, and write in small units. Further, they must always be aware of the type of information each topic contains. Most writers believe they do this, but in practice they may discover that the book metaphor interferes with how they write and, therefore, with how well they can reuse information.

If documentation teams convert existing documents into well-structured units of information before converting to XML, they can save themselves a huge amount of pain. This article explains the issues that have confronted some companies and how the book metaphor contributed to the problems. Note: Several companies’ experiences have been combined into one example for clarity.

The Setup

Consider a company’s API Developer’s Guide. It had been written by a series of contractors and maintained by writers who could devote only a small amount of time to it. Conceptual information was tucked into procedural sections, procedures were sometimes hidden in conceptual sections, and a good deal of conceptual information was tucked into reference sections. In addition, every mention of a call, object, or data type was linked to the section of the guide describing it. For example, in all the object topics, each mention of the call “create” was linked to the “create” topic.

Because unstructured FrameMaker makes it so easy, the writers had created nested topics in a single file instead of managing the structure with the FrameMaker book file. Nesting topics makes reuse very difficult—if you want to reuse only the child topic, references to the parent topic may make that impossible. If you want to reuse only the parent topic, it can be difficult to “hide” the child topic in the new document.

What the team did not have was an information strategy, a clear definition of what valid topics looked like: the information type (conceptual, reference, procedural, and others), the elements allowed in topics of each information type, required elements per information type, and any required ordering. At the time, the team didn’t recognize this lack, because they could look at the existing book and see a decent amount of topic separation. The document being converted was divided into what looked like well-structured parts with distinct information types for each part:

  • Getting Started (conceptual and procedural)
  • Reference (reference)
  • Using the API (procedural)

However, each section in each chapter in each part contained different types of information:

  • The “Getting Started” section contained a large amount of core reference material about data types.
  • The “Reference” section contained a large amount of conceptual information.
  • The “Using the API” section contained one chapter that really belonged in the “Getting Started” section.

Why was this a problem? Because the company was expanding rapidly, providing new tools to developers to work on its platform. These new tools sat on top of the API, which meant that a number of tools all needed to reuse about ten topics from the API guide, but reuse was impossible because of the unstructured content in the topics. Long asides about an API-only issue did not belong in the other guides that needed to borrow only the reference information.

In addition, every chapter file contained nested topics. And because nearly every paragraph in the API Reference contained links to some other part of the Reference, un-nesting them would have taken a very long time. Most of the links would have to be re-coded by hand.

Because the team was still thinking of a single book instead of considering the units of information it contained (and because the expansion of the company didn’t happen until after the conversion), the team did the least amount of pre-conversion work possible. Did it really make any difference if the internals of each chapter (which were about to become dozens of individual topic files) were not pristinely organized according to some theory? No, the team decided, much to their continued regret. Where topics were easy to unnest before the conversion, they did so. Where they were difficult or time-intensive, they did not. They did not rethink the structure of the book at all, because they did not realize they were thinking of the book as the unit of information, instead of thinking of the book as a unit of presentation.

The conversion guru worked hard on conversion scripts using Ant, XSLT, and other tools because, at that time, very few software packages supported FrameMaker-to-XML conversion. The team discussed getting rid of all those links and discussed making a rule that no topic could be nested. But because they were still thinking of the information as a book, instead of a storehouse of individual topics that might be reassembled in a different way at a later date, they ignored the structure issues. They simply converted the existing book into XML files and ditamaps that would assemble the book in exactly the same way.

The Problem

At that time, there were only about 40 objects to be documented. Now, there are 200, and different tools in the company’s toolkit use portions of the API. In a best-case scenario, each tool might include a dozen or so topics from the API Reference. Because of all the links and nesting, and because of a lack of explicit and rule-based information types, the other tools’ documentation cannot reuse any of the topics in this guide. Customers are forced to use multiple books at the same time, something they complain about early and often. Documentation for the newer tools can only point to the information contained in what has become the API silo. And, because 200 object topics each contain multiple links to the data-type file, neither the badly nested data-type topic nor any of the object topics can be reused. It’s going to take one of the team’s writers quite some time to strip out the links and nesting, even with smart tools and scripts.

Additionally, topics contain different types of information, making it hard for the reader to guess where a piece of information might be. “Everybody searches now” was the team’s response, until they realized that the multiple hits returned on many search words made search of this long and complex reference document difficult. For example, information about a search query language contains frequent and explicit references to API scenarios in the reference material. However, this reference material can’t be displayed in the new tools because the API references will make no sense. If the reference topics contained just the information about the query language (syntax and query examples and caveats about where they apply to the query language, not the API), they could have been reused.

The Solution

Small solutions have been implemented in the department in an ad-hoc fashion. After several writers noticed how impossible it was to reuse content from nested topics, the team agreed that no topic could contain another topic (even though this is allowed by the DITA specification), thereby increasing the likelihood that any topic can be reused. The department has also launched a project to define information types, such as conceptual, reference, and procedural. While all experienced writers know that a bit of concept might be required in a procedure, each topic must be primarily of one type. The order of elements and the set of allowed elements will be defined and enforced by the writing tools. Some writers are excited by these changes, and some dread them. But they will contribute to an information set that is actually well structured, not one that has simply been converted from proprietary formats to XML.
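The "no topic may contain another topic" rule lends itself to an automated check. The following is a minimal Python sketch, not the team's actual tooling: it assumes topics are stored as DITA XML using the standard `topic`, `concept`, `task`, and `reference` elements, and the function name `find_nested_topics` and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard DITA topic element names (assumed to be the only topic types in use).
TOPIC_TAGS = {"topic", "concept", "task", "reference"}


def find_nested_topics(xml_text):
    """Return (parent_id, child_id) pairs for every topic nested inside another topic."""
    root = ET.fromstring(xml_text)
    nested = []

    def walk(element, enclosing_topic_id):
        for child in element:
            if child.tag in TOPIC_TAGS:
                # A topic found while already inside a topic breaks the rule.
                if enclosing_topic_id is not None:
                    nested.append((enclosing_topic_id, child.get("id")))
                walk(child, child.get("id"))
            else:
                walk(child, enclosing_topic_id)

    root_id = root.get("id") if root.tag in TOPIC_TAGS else None
    walk(root, root_id)
    return nested


# Hypothetical sample: a data-type topic with a child topic nested inside it.
SAMPLE = """
<reference id="data_types">
  <title>Data Types</title>
  <refbody><section><p>Primitive types.</p></section></refbody>
  <reference id="field_types">
    <title>Field Types</title>
  </reference>
</reference>
"""

print(find_nested_topics(SAMPLE))  # [('data_types', 'field_types')]
```

Running a check like this over every source file before conversion would give a writer a worklist of topics to unnest, rather than leaving the nesting to be discovered when reuse fails.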

After working around reuse issues and listening to users complain about needing to have multiple books open at the same time, the team decided to reduce inline links to the fewest number possible. Link reduction was necessary for the following reason: if topic 1 contains a link to topic 2 and both appear in document A, there is no problem. But if document B contains topic 1 but not topic 2, the link is rendered useless, and the text that refers to it is useless as well (see Example 1 for an illustration).


Example 1: Reducing inline links
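The link problem can also be expressed as a small check. This is a hedged Python sketch under simplifying assumptions: each document is modeled as a plain list of topic IDs (standing in for a ditamap), and each topic's inline links are listed in a dictionary. The names `find_broken_links`, `LINKS`, `DOCUMENT_A`, and `DOCUMENT_B` are illustrative only.

```python
def find_broken_links(document_topics, links_by_topic):
    """Return (source, target) pairs whose target topic is not in the document."""
    included = set(document_topics)
    broken = []
    for source in document_topics:
        for target in links_by_topic.get(source, []):
            if target not in included:
                broken.append((source, target))
    return broken


# Topic 1 contains an inline link to topic 2.
LINKS = {"topic_1": ["topic_2"], "topic_2": []}

DOCUMENT_A = ["topic_1", "topic_2"]  # contains both topics
DOCUMENT_B = ["topic_1", "topic_3"]  # reuses topic 1 without topic 2

print(find_broken_links(DOCUMENT_A, LINKS))  # []
print(find_broken_links(DOCUMENT_B, LINKS))  # [('topic_1', 'topic_2')]
```

The fewer inline links a topic carries, the fewer documents it can break when it is reused, which is the reasoning behind the team's decision to keep links to a minimum.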

Problems of this kind arise because writers think about their content as a book, a silo, an inviolate container. In a structured authoring world, writers must strive to make topics as atomic, as small, and as modular as possible, and count on ditamaps (or a similar tool) to assemble different topics for different user needs. It’s a difficult shift to make, but once writers adapt, they can see the advantages. Material that was once locked into a particular book can now be shared among multiple books. This capability is especially important for enterprise software documentation, which adds languages and capabilities that are related to the existing tools. Reuse is critical in an enterprise setting and is a cost-saver for those who translate into multiple languages.

Far and away, the most difficult change will be to strip out the links from over 500 pages of reference documentation and to reorganize topics that contain a mishmash of conceptual, reference, and procedural information. But once the links are gone, it will be easier to unnest the last few remaining topics, such as the data-type topic, which had links to it from nearly every file of the API Developer’s Guide.

Interesting After-effects

One of the best writers on the team complained that with topic-based writing, she couldn’t “see” the book. No amount of cajoling, or of reminding her that we were now to create units of information no larger than a topic at a time, could dissuade her from her need to see the book as a whole. Luckily, we discovered our authoring tool could read a ditamap and display a WYSIWYG version of an entire document. However, the philosophical disagreement remains. One side argues that a writer should be able to compose a topic without any thought to the context of the other topics, since those topics might be assembled with different topics or in a different order. The other side holds that a writer needs to see the whole book. The difference was resolved when the writers acknowledged the distinction between a book that presents a story, such as a how-to or white paper, and documents that do not rely on the reader to consume the material in a particular order, such as application help, developer’s guides, or API reference guides.

Suggested Best Practices

Before you begin to convert existing books into structured files such as XML/DITA, consider the following steps:

  • Design an information model, including information types. Specify the elements that are allowed in each type of information, describe what the type is used for, and if possible, configure your writing tools to enforce the rules. Ask yourself—if the topic were to appear in a different book, would everything still make sense?
  • Analyze your documentation to look for issues such as nested topics, excessive inline links, and wildly different information types mixed together in a single section or paragraph.
  • Consider writing from the bottom up instead of the top down. This practice may be especially useful if you work in an agile environment, with short cycles and more frequent change. Of course, you want a top-level outline to ensure that you, the developers, and QA teams are on the same page, but then consider developing topics, starting with the ones least likely to change, before creating the ditamap.
  • Consider storing as many names and boilerplate text as possible in separate files that contain only reused material (or a similar mechanism if you work with a CMS). This practice helps in situations when marketing changes feature names or with structured topics that use the same heading names or introductory text over and over. Writers can just drag the correct word or phrase into their topic, and if the name changes, they can change it in one place. You do, of course, need to inspect affected topics for issues such as article agreement (for example, if the name changes from vowel-initial to consonant-initial, all your “an” articles before the changed element must become “a”). However, it is some help to have reuse files available to all writers on a team.
  • Of course, learn to think about units of information at the topic level and units of presentation at the book level.
  • Usability test your documents, either informally if no budget exists or formally if your company supports a user experience team. Card sorts, heuristics, and other tools are available. Usability testing can uncover problems like having to use multiple documents to complete a single user task.
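As an illustration of the first practice above, an information model can be written down as data and checked mechanically. The following Python sketch is only one possible shape, under stated assumptions: the `INFORMATION_MODEL` rules are hypothetical and far simpler than a real model, and in practice a DTD or schema in the authoring tool would normally do this enforcement. The element names (`title`, `shortdesc`, `conbody`, `taskbody`, `refbody`) are standard DITA elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical information model: the top-level elements each information
# type allows and requires. A real model would also cover ordering and the
# contents of each body element.
INFORMATION_MODEL = {
    "concept": {
        "allowed": {"title", "shortdesc", "conbody"},
        "required": {"title", "conbody"},
    },
    "task": {
        "allowed": {"title", "shortdesc", "taskbody"},
        "required": {"title", "taskbody"},
    },
    "reference": {
        "allowed": {"title", "shortdesc", "refbody"},
        "required": {"title", "refbody"},
    },
}


def validate_topic(xml_text):
    """Return a list of problems found in one topic; an empty list means valid."""
    root = ET.fromstring(xml_text)
    rules = INFORMATION_MODEL.get(root.tag)
    if rules is None:
        return [f"unknown information type: {root.tag}"]

    children = [child.tag for child in root]
    problems = []
    for tag in children:
        if tag not in rules["allowed"]:
            problems.append(f"element not allowed in {root.tag}: {tag}")
    for tag in sorted(rules["required"]):
        if tag not in children:
            problems.append(f"missing required element in {root.tag}: {tag}")
    return problems


# A reference topic that nests a concept and has no reference body.
BAD_TOPIC = (
    '<reference id="create_call"><title>create</title>'
    '<concept id="aside"><title>Background</title></concept></reference>'
)

for problem in validate_topic(BAD_TOPIC):
    print(problem)
```

For the sample topic, the check reports the nested concept as a disallowed element and the reference body as missing, which are the two kinds of problem the team's information strategy was meant to catch.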

Mysti Berry

Mysti Berry is a Lead Technical Writer for, and is completing her twentieth year as a software technical writer. She has presented at STC meetings and taught UC Berkeley Extension courses, and survived multiple conversions from proprietary formats to XML-based structured authoring.