Why DITA Alone Just Won’t Do


December 2005

Jon Parsons, XyEnterprise Solutions, Inc.

Recent widespread interest in the Darwin Information Typing Architecture (DITA) has accelerated adoption of this open, standards-based foundation for the creation of topic-oriented content. The DITA information architecture can help documentation groups automate their workflow, reuse content, and deliver that content through multiple channels. When DITA is used in concert with a content management system (CMS), however, its capabilities expand to include automated workflow, search capability, and a scalable solution for multi-channel delivery.

What is DITA?

In May of 2005, the Organization for the Advancement of Structured Information Standards (OASIS) produced a formal standard for marking up content using the eXtensible Markup Language, better known as XML. The new standard, Version 1.0 of the OASIS Darwin Information Typing Architecture, or DITA, was released to the world in the form of two documents: a language reference guide and an architectural specification.

This standard is not just another DTD or schema for use in a fixed vertical application. As the word “architecture” in the name implies, DITA is a broader approach to the use of XML in content mark-up than previous doctypes and schemas have been.

DITA’s broader approach stems from two characteristics. First, it is primarily an information architecture: a way of thinking about content and the way it is, or should be, constructed and manipulated. DITA defines several key ideas about the nature of content and the way information sets can be structured and built.

Second, as the name “Darwin” suggests, DITA is extensible. That is, although DITA is built on a particular set of tags for a particular purpose, it contains within itself the rules for extending the DITA approach to other domains. So, in addition to offering a solution to authors and information developers who are charged with creating and delivering PDF documents, HTML, and Help files, the DITA architecture offers the flexibility to add new tags when encountering new information structures. It opens the possibility of accommodating special cases within the product information domain, as well as applications in other domains where the structure of these same information types might be useful under different names.

The ability to extend the tag set is a powerful one. Those who have used software tools with more static approaches to generic mark-up know that any change to the content model, any additional tag, or even an added attribute to a tag can cause difficulties in the tools that are used to edit, format, transform, and deliver the content. The ripple effect of such a change can be very costly, not to mention a nightmare to manage. A key premise of DITA is support for existing structures: because new elements reuse the same structures under different names, changes to tools are minimized or even eliminated when elements are added to the DTDs or schemas in use. Adding new tags to the open information architecture provided by DITA is called specialization and is a process described in the DITA specification itself.
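
As a rough illustration of why specialization limits tool changes, consider that DITA elements carry a class attribute listing their ancestry, so a tool that understands only the base types can still recognize a specialized element. The Python sketch below is an assumption-laden example, not code from the DITA Open Toolkit: the apiReference element, its apiRef module name, and the helper functions ancestry and base_type are all hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical specialized topic. The class attribute records its ancestry,
# from the most general type (topic/topic) to the most specific.
SAMPLE = """
<apiReference id="getUser"
    class="- topic/topic reference/reference apiRef/apiReference ">
  <title class="- topic/title ">getUser</title>
</apiReference>
"""

# The element types this (hypothetical) tool already knows how to process.
KNOWN_TYPES = ("concept", "task", "reference", "topic")


def ancestry(element):
    """Return element names from the class attribute, general to specific."""
    tokens = element.get("class", "").split()
    # Skip the leading "-" marker; each remaining token is "module/element".
    return [tok.split("/")[1] for tok in tokens if "/" in tok]


def base_type(element):
    """Return the most specific ancestor type that this tool understands."""
    for name in reversed(ancestry(element)):
        if name in KNOWN_TYPES:
            return name
    return None


root = ET.fromstring(SAMPLE)
print(root.tag, "is processed as", base_type(root))
```

In this sketch the tool has never heard of apiReference, yet it can fall back to its existing handling for reference topics without any change to its own code.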

This article focuses on two key questions often asked by content professionals who are considering whether to adopt a DTD or schema, or who are evaluating the DITA specification for their own use:

  • What do I get with an information architecture like DITA?
  • Where does content management fit in?

What Does an Information Architecture Provide Me?

An information architecture is broader than just a DTD or schema designed to capture the structure of a particular type of document or content. It is a fundamental way of thinking about information in the abstract. DITA provides such an approach to content and how it should be organized by referring to content as a collection of topics. Topics are individual units that make sense when they stand alone. DITA defines three types of topics: concept, reference, and task.

In adopting a particular information architecture, you adopt a basic framework that orients you to your content and helps you to think about how to organize it, define it, and ultimately, how to break it up and manipulate it. It also influences the way you deliver it to your end user and the form in which the content will ultimately be used. Because DITA was created within IBM to support content for software and hardware products, its approach fits well for documentation groups whose charter is to produce information in the form of online help. So, an information architecture provides far more than just a way of marking up your content. It gives you an organizing principle and a way of approaching the material you must create and maintain.

Another added benefit of a standards-based information architecture like DITA is that when content is marked up and managed according to these principles, it is readily automated. Content organized into discrete topics is contained in well-defined pieces that are more easily edited, managed, assembled, and delivered. And because the topics are self-contained, they can be reused in more than one publication or information set. This in turn leads to increased efficiencies, resulting in both time and cost savings.
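
The reuse described above can be sketched in a few lines of Python. The example assumes two hypothetical DITA maps, each assembling a publication from topic files through topicref elements, and a hypothetical helper, where_used, that reports which maps reference each topic. The file names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical DITA maps, each assembling a publication from topic files.
MAPS = {
    "user_guide.ditamap": """
        <map>
          <topicref href="overview.dita"/>
          <topicref href="install.dita">
            <topicref href="license.dita"/>
          </topicref>
        </map>""",
    "quick_start.ditamap": """
        <map>
          <topicref href="install.dita"/>
          <topicref href="first_steps.dita"/>
        </map>""",
}


def where_used(maps):
    """Map each topic file to the sorted list of maps that reference it."""
    usage = defaultdict(set)
    for map_name, xml_text in maps.items():
        # iter() walks nested topicref elements as well as top-level ones.
        for ref in ET.fromstring(xml_text).iter("topicref"):
            href = ref.get("href")
            if href:
                usage[href].add(map_name)
    return {topic: sorted(names) for topic, names in usage.items()}


usage = where_used(MAPS)
reused = sorted(t for t, names in usage.items() if len(names) > 1)
print("Topics reused in more than one map:", reused)
```

Because install.dita is a self-contained topic, both publications can reference the same file rather than keeping two copies in step.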

Because DITA is based on standards and is designed to be extensible, adopting DITA is a good investment for the future. It can evolve in a managed way and accommodate new content and requirements without disrupting the existing tools or the established environment.

In addition to the more obvious benefits of DITA, a community has bubbled up around the standard. It is an active community in which expertise and advice are widely available, and its members are making open source tools available. In the case of DITA, an Open Toolkit is available for download. It contains standard transformation tools (XSL style sheets) for delivering DITA content in HTML, Help, and PDF formats.

Why Do I Need a CMS?

A question commonly asked by information professionals evaluating DITA is, “If DITA can provide me with content reuse and a set of open source tools to process my content, why would I need a content management system?”

The answer to that question really depends on the nature of the information that you are creating and delivering. For a small organization, DITA and the open tools may provide what you need. However, several key characteristics are important to consider as you investigate DITA and its use in your environment. They are:

  • Scalability
  • Workflow
  • Content life cycle
  • Other forms of content

Let’s examine each of these in more detail.

Scalability
There are a number of metrics to be assessed when you consider whether or not a content management system would add value to your information environment.

The first place to look is at the content itself and how many topics you will be handling. How complex are the products you are documenting, and how many topics will be required to describe them? While you might be able to develop a good proof-of-concept with a structured editor, the Open Toolkit, and a few topics stored in the file system, will your final deliverable be measured in hundreds of topics, or perhaps thousands? If so, keeping track of these topics by relying upon file-naming conventions will not be easy. As the number of topics increases, the need for creating and tracking metadata about each topic grows. Finding what’s already been written to reuse it in a new project becomes time-consuming without the ability to query by metadata or search by content. A good content management system will enable you to control a large number of topics.

Another place to look at the numbers is how many people will be working on a given project. If the answer is one to five, coordination among them can be handled by conventions and a set of best practices. If the answer is ten or more, chances are you will need some mechanism to ensure that edits are not being done simultaneously by two or more people, resulting in lost work.
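
One common mechanism for preventing simultaneous edits is check-out and check-in. The Python sketch below shows the idea with a hypothetical in-memory lock table; the class names, topic file name, and user names are assumptions, and a real content management system would persist this state and handle many more cases.

```python
class CheckoutError(Exception):
    """Raised when a check-out or check-in request cannot be honored."""


class TopicLocks:
    """Hypothetical in-memory lock table: one editor per topic at a time."""

    def __init__(self):
        self._holders = {}  # topic id -> user who has it checked out

    def check_out(self, topic_id, user):
        holder = self._holders.get(topic_id)
        if holder is not None and holder != user:
            raise CheckoutError(f"{topic_id} is checked out by {holder}")
        self._holders[topic_id] = user

    def check_in(self, topic_id, user):
        if self._holders.get(topic_id) != user:
            raise CheckoutError(f"{user} does not hold {topic_id}")
        del self._holders[topic_id]


locks = TopicLocks()
locks.check_out("install.dita", "ana")
try:
    locks.check_out("install.dita", "ben")  # second editor is refused
except CheckoutError as err:
    print("Refused:", err)
locks.check_in("install.dita", "ana")
locks.check_out("install.dita", "ben")  # succeeds once the topic is free
```

The second writer is refused up front instead of discovering later that one set of edits has overwritten the other.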

Another metric that is crucial is the number of products being documented, or the number of releases for a given product. As the content begins to be used in more than one context or to be applied to more than one version, tracking versions and reuse becomes very important. A good content management system will give you versioning and rollback capability and the ability to create “where used” reports on topics that appear in multiple

Workflow
The importance of controlling workflow grows with the number of people involved. If multiple people are working on the same information set, you will need a way to support their collaboration and to coordinate their activity. A good content management system will provide the ability to track topics through the development and review stages and distinguish between topics that are still under development (“works in progress”) and those that are finished goods. Outsiders coming to the repository will receive the latest published version. Members of the development team will see and share the drafts under development.

Workflow can be automated so that interested parties are notified as topics or information sets move through the review and approval stages. Reports can be generated on current status and problem areas or bottlenecks. In addition, workflow provided by a content management system will ensure business processes are applied consistently throughout the content life cycle.
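
A minimal sketch of such a workflow, written in Python, is shown below. The stage names, the allowed transitions, and the Workflow class are assumptions made for illustration; an actual content management system defines its own stages and notification channels.

```python
# Hypothetical workflow: the stages a topic may move to from each stage.
TRANSITIONS = {
    "draft": {"in_review"},
    "in_review": {"draft", "approved"},
    "approved": {"published"},
    "published": set(),
}


class Workflow:
    def __init__(self, notify):
        self._status = {}      # topic id -> current stage
        self._notify = notify  # callback invoked on every stage change

    def add(self, topic_id):
        self._status[topic_id] = "draft"

    def move(self, topic_id, new_stage):
        current = self._status[topic_id]
        if new_stage not in TRANSITIONS[current]:
            raise ValueError(f"{topic_id}: cannot go from {current} to {new_stage}")
        self._status[topic_id] = new_stage
        self._notify(topic_id, current, new_stage)

    def report(self):
        """Count topics per stage, for example to spot a review bottleneck."""
        counts = {stage: 0 for stage in TRANSITIONS}
        for stage in self._status.values():
            counts[stage] += 1
        return counts


messages = []  # stands in for e-mail or other notifications
flow = Workflow(lambda t, old, new: messages.append(f"{t}: {old} -> {new}"))
for topic in ("install.dita", "overview.dita"):
    flow.add(topic)
flow.move("install.dita", "in_review")
flow.move("install.dita", "approved")
print(messages)
print(flow.report())
```

Because every stage change passes through one method, the same business rule is applied to every topic, and a topic cannot skip review on its way to publication.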

Content life cycle
Another important aspect of your content environment to consider is the life cycle of your content. If it is measured in years, content management can provide real value to your organization. A single content repository that allows you to roll back to previously published versions enables you to support many fielded versions of your product over time. As the size of your repository grows, finding relevant topics for reuse can be challenging. A good content management system will speed that process and enable you to locate previous versions, find the topics you need to reuse, and ensure that what you deliver to the outside world is final, complete, and reliable. A content management system contributes to content integrity.
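
The idea of rolling back to a previously published version can be sketched in Python as follows. The TopicHistory class, the release labels, and the sample text are hypothetical; the sketch keeps everything in memory, whereas a real repository would store versions durably.

```python
class TopicHistory:
    """Hypothetical version store: every save is kept, releases are labeled."""

    def __init__(self):
        self._versions = []   # saved content, oldest first
        self._releases = {}   # release label -> index into _versions

    def save(self, content):
        self._versions.append(content)
        return len(self._versions)  # 1-based version number

    def tag_release(self, label):
        if not self._versions:
            raise ValueError("nothing has been saved yet")
        self._releases[label] = len(self._versions) - 1

    def latest(self):
        return self._versions[-1]

    def as_released(self, label):
        """Return the content exactly as it shipped in the given release."""
        return self._versions[self._releases[label]]


history = TopicHistory()
history.save("Install from the CD.")
history.tag_release("1.0")
history.save("Install from the CD or the download site.")
history.tag_release("2.0")
history.save("Draft: install from the download site only.")

print(history.as_released("1.0"))
print(history.latest())
```

A customer still running release 1.0 can be given the text as it shipped, while the team continues to work on the current draft.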

Other forms of content
Stepping back from the focus on XML and DITA, there are many other forms of content that can benefit from a content management system. Storing PDF versions of published goods with their source files, for example, or managing Word documents in the same system used for XML can be valuable to several groups within an enterprise. Managing Excel spreadsheets or PowerPoint presentations, graphics, or Frame binaries can all be done by a good content management system. When considering whether you need a content management system, think about the kinds of content created in the enterprise around you, as well as in your department.


There are many good reasons to consider DITA as the information architecture of choice for product-related content creation and management. It is powerful, flexible, and standards-based. It has a proven track record within IBM, and it has generated a strong interest in the XML community at large. We can also expect to see DITA adopted and specialized to new content domains over the next few years.

The Open Toolkit available for DITA adds technology to the architecture, and the result is a strong foundation for creating and delivering topic-oriented content. For a few applications, the information architecture alone, coupled with best practices, may be sufficient to manage and deliver the content. A content management system can leverage the capabilities of DITA and add value by letting the content repository scale, automating workflow, providing versioning and rollback capability, and incorporating other data formats into the managed environment.

In addition, a content management system applies consistent business rules, eliminates redundant authoring, accelerates delivery of content to multi-channel output formats, and enables customization based on metadata. DITA provides an excellent foundation for moving to a well-managed information environment within the enterprise. Add a content management system, and you have a truly robust environment for managing global information. DITA alone? It can be done, but DITA and a content management system provide maximum competitive advantage. And isn’t that something we’re all striving for?

About the Author

Jon Parsons

Jon Parsons has over 20 years of experience automating the creation, management, and delivery of content in multiple forms. Currently he works in product marketing at XyEnterprise. Prior to that, he was a writer, editor, tools developer, and publishing consultant for a large computer manufacturer. Long an advocate of generic mark-up and an enthusiast for XML, he has served on the Board of Directors of OASIS, the Organization for the Advancement of Structured Information Standards, and is a frequent speaker at industry events.