Eric Severson, Flatirons Solutions


In the prior article of this series, we looked at DITA best practices from the author’s point of view. This included consideration of what it takes to think in terms of topics rather than books, managing content variations and reuse, and using links and cross-references appropriately.

In this article, we’ll turn to the content manager’s point of view – looking at how to optimize the use of DITA content models, migrate legacy content to DITA, and implement topic-based review and publication processes.

DITA Content Models and Specializations

DITA uses a very flexible content model in which any number of standalone, reusable topics can be assembled hierarchically into a map that defines the structure for each output publication or deliverable. In addition to promoting comprehensive information reuse, this approach greatly simplifies the complex document structures that have characterized pre-DITA uses of XML. Inside DITA’s generic topic, the content model is simple. Essentially, a topic must have a title, a body, and an ID by which it can be referenced from a map. Inside the body, the topic may include any number of paragraphs, sections, lists, tables, and figures in whatever order you like.
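As a rough illustration of this content model, the following Python sketch builds a minimal generic topic and a map that references it, using only the standard library. The topic ID, the file name `installing_widget.dita`, and the helper names `build_topic` and `build_map` are hypothetical, chosen for illustration; real DITA files would also carry a DOCTYPE declaration, which is omitted here.

```python
import xml.etree.ElementTree as ET


def build_topic(topic_id: str, title: str, paragraphs: list[str]) -> ET.Element:
    """Build a generic DITA topic: an id, a title, and a body of paragraphs."""
    topic = ET.Element("topic", {"id": topic_id})
    ET.SubElement(topic, "title").text = title
    body = ET.SubElement(topic, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text
    return topic


def build_map(title: str, hrefs: list[str]) -> ET.Element:
    """Build a DITA map whose topicrefs point at standalone topic files."""
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = title
    for href in hrefs:
        ET.SubElement(ditamap, "topicref", {"href": href})
    return ditamap


# Hypothetical topic and map, for illustration only.
topic = build_topic("installing_widget", "Installing the widget",
                    ["Slide the widget into the tray."])
user_guide = build_map("User Guide", ["installing_widget.dita"])

topic_xml = ET.tostring(topic, encoding="unicode")
map_xml = ET.tostring(user_guide, encoding="unicode")
```

Because the map only points at topic files, the same topic can be listed in any number of maps without being copied.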

To be able to more precisely describe topic types – and to map them to your own information structures – DITA also provides a powerful and unique feature known as specialization. Consistent with the information typing aspect of DITA, specialization allows you to create your own variations of the generic topic structure, each of which becomes a different topic type. Specializations are derived directly from the parent topic type and can be hierarchical. Their structure is always consistent with the parent type’s structure but can be increasingly precise and restrictive at lower levels. Three out-of-the-box specializations are included with the DITA standard: the concept, reference, and task topic types.
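One way to see the hierarchy is through the `class` attribute that DITA elements carry, which lists an element's ancestry from most general to most specific; for the standard task topic the value is `- topic/topic task/task `. The sketch below reads that ancestry in Python. The `my_task` class value and the function names are hypothetical, included only to show how a further specialization would extend the chain.

```python
def specialization_chain(class_attr: str) -> list[str]:
    """Return element names from the most general ancestor to the most specific."""
    tokens = class_attr.split()
    # The first token is "-" (structural) or "+" (domain);
    # the remaining tokens are module/element pairs.
    return [token.split("/")[1] for token in tokens[1:]]


def is_kind_of(class_attr: str, ancestor: str) -> bool:
    """True if the element is the named type or a specialization of it."""
    return ancestor in specialization_chain(class_attr)


TASK_CLASS = "- topic/topic task/task "
# Hypothetical specialization derived from the standard task type.
MY_TASK_CLASS = "- topic/topic task/task my_task/my_task "
```

A tool that only understands the generic topic can still process a `my_task` element by falling back to the nearest ancestor it recognizes, which is one reason to specialize from the standard types rather than start from scratch.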

Previous standards have tended to promote relatively complex and restrictive content models. This practice has had the advantage of precisely controlling document structure and format but the disadvantage of making XML difficult to author and maintain. In contrast, DITA gives us a wide spectrum of choices as to how simple or complex our content models need to be. A fully compliant DITA implementation can actually be quite simple – or as sophisticated as you feel it needs to be. The trick is to know where it’s worth making it more sophisticated and where it’s not.

With this in mind, the first question to ask is whether there’s really a compelling reason to use anything other than the standard DITA specializations. In fact, for some applications, the question is whether we need to use anything other than the generic DITA topic itself. In cases where specialization is needed, we recommend specializing directly from the standard concept, reference, and task types. This practice is the best way to ensure future compatibility both with changes to the DITA specification and with the off-the-shelf tools that implement it.

If you can’t use the standard types as the basis for specialization, then we recommend staying as close to the standard types as possible. For example, many organizations have found it difficult to align their content with the standard task specialization, preferring to create their own “my_task” specialization directly from the generic DITA topic. When doing this, however, we recommend starting with the standard task specialization and incrementally adapting it as needed – rather than inventing something completely new. This tactic will give you the greatest chance of staying consistent with vendor tools as the standard evolves and make it much easier to “switch back” if you find that future versions of the standard specializations fit your needs.

Migrating Existing Content

As with authoring new content, the most difficult part of converting legacy content is to make it topic-oriented. Topic orientation includes all of the following considerations:

  • Deciding what level of information should constitute a “topic” in the new system. This decision should keep in mind that a topic should have a specific subject and a specific purpose – for example, describing a single concept or a single, well-defined task.
  • Ensuring that each topic is self-contained. Developing self-contained topics includes removing context-specific assumptions and references (e.g., assuming you’ve just read the previous section of the book, or saying “see below”).
  • Ensuring that topics are reusable across multiple contexts. Reuse includes generalizing context-specific descriptors (e.g., changing “replacement memory card” and “new memory card” to simply “memory card”).
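Parts of the second and third considerations can be checked mechanically before an author reviews the converted topic. The following is a minimal sketch, assuming a small hand-maintained phrase list and rewrite table; both lists and the function names are hypothetical, and a real project would tune them to its own content.

```python
import re

# Hypothetical phrases that signal a dependency on surrounding book context.
CONTEXT_PHRASES = [
    "see below",
    "see above",
    "as described earlier",
    "in the previous section",
]

# Hypothetical rewrites from context-specific to generic descriptors.
DESCRIPTOR_REWRITES = {
    "replacement memory card": "memory card",
    "new memory card": "memory card",
}


def find_context_dependencies(text: str) -> list[str]:
    """Return the context-dependent phrases that appear in the text."""
    lowered = text.lower()
    return [phrase for phrase in CONTEXT_PHRASES if phrase in lowered]


def generalize(text: str) -> str:
    """Replace context-specific descriptors with their generic form."""
    for specific, generic in DESCRIPTOR_REWRITES.items():
        text = re.sub(re.escape(specific), generic, text, flags=re.IGNORECASE)
    return text
```

A check like this only flags or rewrites the obvious cases. Deciding whether a topic truly stands on its own still takes an author's judgment.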

Where there’s opportunity for content reuse, the challenge also becomes making one topic out of many. For example, the following variations might occur across three existing documents:

To install the widget, remove the screw on the right-hand side of the tray, slide the widget into the tray, and replace the screw to secure the widget.

You will need a standard Phillips screwdriver to install the widget. First, locate the tray and remove the screw. Then slide the widget in and replace the screw.

Locate the tray and remove the screw with a Phillips screwdriver. After sliding in the widget, replace the screw.

When legacy content is converted to DITA, all three of these versions will still exist. Ideally, authors will consolidate these into a single topic that can be reused across all three of the original publications. This consolidation can be done by picking the best, most reusable version—or by creating a new version that captures the best of each. In this example, perhaps the following:

Locate the tray and remove the screw with a Phillips screwdriver. Then slide the widget into the tray, and secure the widget by replacing the screw.

Finally, the new single reusable topic must be linked back into a set of DITA maps that allow the output deliverables to be assembled and produced.
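Before authors can consolidate variations like these, someone has to find them. One simple heuristic, which is our own illustration rather than anything DITA prescribes, is to compare passages by word overlap (Jaccard similarity) and list the pairs that score above a threshold. The function names and the default threshold of 0.4 are assumptions to be tuned against real content.

```python
import re
from itertools import combinations


def words(text: str) -> set[str]:
    """Lowercased set of words in the text, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by total distinct words across both texts."""
    words_a, words_b = words(a), words(b)
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def consolidation_candidates(
    passages: dict[str, str], threshold: float = 0.4
) -> list[tuple[str, str, float]]:
    """Return (name, name, score) for passage pairs at or above the threshold."""
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(passages.items(), 2):
        score = jaccard(text_a, text_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Most similar pairs first, so authors see the strongest candidates at the top.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

The output is only a list of candidates for an author to examine. Choosing the best version, or writing a new one, remains an editorial decision.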

Of course, taking all these actions across your entire set of content can be a tremendous amount of work. Luckily, DITA doesn’t have to be an all-or-nothing approach. In practice, there is usually a “sweet spot” of content that’s really worth the effort, while other content can be used as-is until there’s time and motivation to work on it. Content in the sweet spot typically

  • is core material as opposed to introductory or supplementary information
  • has the potential for significant reuse
  • changes frequently
  • has significant cost or risk if it’s inaccurate or inconsistent
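These four criteria can be turned into a rough ranking of which content to migrate first. The sketch below simply counts how many criteria each content set meets. The equal weighting, the class name `ContentProfile`, and its field names are assumptions made for illustration; an organization might reasonably weight risk or reuse more heavily.

```python
from dataclasses import dataclass


@dataclass
class ContentProfile:
    """Hypothetical record of how a content set rates against the four criteria."""
    name: str
    is_core: bool
    high_reuse_potential: bool
    changes_frequently: bool
    costly_if_wrong: bool

    def score(self) -> int:
        """Number of sweet-spot criteria met (0 to 4), weighted equally."""
        return sum([
            self.is_core,
            self.high_reuse_potential,
            self.changes_frequently,
            self.costly_if_wrong,
        ])


def prioritize(profiles: list[ContentProfile]) -> list[ContentProfile]:
    """Order content sets from the most to the least worth migrating first."""
    return sorted(profiles, key=lambda profile: profile.score(), reverse=True)
```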

Other content, even though it may not meet the strict definition for standalone and reusable topics, can still be broken up into “topics” and linked into DITA maps. However, such topics should not yet be marked as reusable. It’s also okay if we continue to have some redundancy across these lower-priority topics. We can keep multiple versions of topics and include them in different maps. Later, we can work to consolidate them and make them reusable as time permits.

Topic-Based Review and Publication Processes

In the classic book-oriented world, each publication is sent out as a whole to reviewers and then published as a whole once it’s been approved. This review process is straightforward, but it typically results in multiple, redundant reviews of the same information.

With DITA, topics are written to be standalone and reusable, and information is authored only once. This means that each topic should need to be reviewed only once, independent of any specific publication or use.

But how does this work in practice? Does a new review cycle begin each time an individual topic is completed?

The answer to this question depends on several factors:

  • How the reviewers or subject-matter experts are organized. In a topic-oriented world, reviewers should focus on the set of topics for which they are experts—regardless of the output deliverables in which the topics appear. Therefore, reviewers should get an extract of the topics in their specific area—not the whole output deliverable—usually once all topics in their area of expertise are complete.
  • How the output deliverables are organized. In some applications, the core set of output deliverables is already arranged according to subject matter, even though there may be reuse beyond this core set. In this case, it would make sense to have the reviewers work directly on the core deliverables.
  • How often changes are made. Normally, it would be inefficient to feed reviewers one topic at a time—and it might be difficult to have sufficient context to review. But for certain information that changes infrequently, such as legal boilerplate, it might make sense to put a single topic through the review cycle.

Typically, we recommend using separate DITA maps for review, organized to fit the needs of reviewers. Review maps can include content for the entire publication (or a portion) if appropriate. Or they can include groupings of similar material for a particular subject-matter expert, independent of a publication.
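As a sketch of the second option, the following Python function groups topics by subject area and builds one review map per area. It assumes each topic is described by a dictionary with `href` and `subject` keys; that record format, the function name, and the map titles are hypothetical, and in practice the subject would more likely come from metadata held in a content management system.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_review_maps(topics: list[dict[str, str]]) -> dict[str, ET.Element]:
    """Build one DITA map per subject area, independent of any publication."""
    hrefs_by_subject: dict[str, list[str]] = defaultdict(list)
    for topic in topics:
        hrefs_by_subject[topic["subject"]].append(topic["href"])

    review_maps = {}
    for subject, hrefs in hrefs_by_subject.items():
        review_map = ET.Element("map")
        ET.SubElement(review_map, "title").text = f"Review: {subject}"
        for href in hrefs:
            ET.SubElement(review_map, "topicref", {"href": href})
        review_maps[subject] = review_map
    return review_maps
```

Since a review map is just another map, it can be published through the same tool chain as a deliverable and discarded once the review cycle is complete.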


In the next and final article of this series, we’ll look at things from the globalization manager’s point of view – exploring how DITA can be used to significantly reduce the cost, complexity, and lag times associated with localizing content.

Flatirons Solutions specializes in content management and XML-based publishing and is located in Boulder, Colorado. Eric is a co-founder and Chief Technology Officer.