DocBook is not dead! Long live DocBook!

Scott Hudson, Comtech Services, Inc.

Contrary to popular belief, DocBook is not a dead standard. In fact, it is very much alive. DocBook v5.1 has passed its first round of public review and is in the final steps of the OASIS standardization process to become an official OASIS Committee Specification. Soon after, it will go through the voting process to become an OASIS Standard.

Why do we need DocBook? Isn’t everyone using DITA?
There is no small list of projects, organizations, and companies using DocBook. Just take a look at for starters!

DocBook provides many benefits, not the least of which is stability. DocBook has been a standard for marking up technical documentation since 1991. With that stability come a robust stylesheet rendering implementation, an active user community, and off-the-shelf tool support from a variety of XML editors and content management systems.

DocBook was originally created for hardware and software documentation and excels at marking up content for those industries. The DocBook Publishers standard (based on full DocBook) specifically addresses the needs of the book publishing industry. DocBook is also used to create:

  • help systems
  • Web sites
  • books
  • reference pages
  • FAQs
  • white papers
  • training courseware
  • articles
  • API documentation
  • reports
  • functional specifications
  • “how to” guides and other procedural documentation
  • presentations

and now, topics! That’s right, DocBook v5.1 introduces several new elements to address the needs of topic-oriented content. <topic> is now available as a component-level element. Content creators can collect these topics into new modules, manuals, books, help systems, web sites, and more using the <assembly> element.
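
For readers who have not seen one yet, a standalone topic might look roughly like the sketch below. The xml:id value, title, and text are hypothetical and chosen only for illustration; check the v5.1 schema for the full content model.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical standalone topic; the id, title, and text are illustrative only -->
<topic xmlns="http://docbook.org/ns/docbook" version="5.1"
       xml:id="install-widget">
  <title>Installing the Widget</title>
  <para>Unpack the archive and run the installer.</para>
</topic>
```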

“We can rebuild him. We have the technology… Better than before”
DocBook has never been the Six Million Dollar standard. It’s been open-source from the beginning. That said, there is always room for improvement, and v5.1 doesn’t disappoint. While DocBook used XInclude to collect modular content into a document before, the new <assembly> provides a number of improvements:

1.    Physical structure – a document <structure> contains one or more <module> elements. Each <module> references a <resource>. Those resources can be managed independently of the desired structure, offering greater flexibility.

2.    Centralized metadata – metadata always occurs in an <info> element in DocBook, but there are times when metadata is needed at the assembly level for a particular output. <info> is a key structure that is allowed at the <structure> and <module> levels.

3.    Relationships – In a topic-oriented system, it is often important to describe how certain content is related. Rather than a strict tabular structure, as seen in DITA, the DocBook <relationship> resembles a Resource Description Framework or Topic Map paradigm that can be expressed as a graph. A <relationship> exists between two or more <instance> resources, and the nature of that relationship is described by an <association>. Using this type of structure enables processors to build “smarter” content systems, since the relationships can be expressed in RDF or other semantic web formats.

4.    Transformations – an <assembly> can identify a collection of <transforms> that can be used during the assembly/publishing process. This process enables an <assembly> to use content from non-DocBook resources via a specified transformation!
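
The four pieces above might fit together roughly as in the following sketch. Every file name, xml:id, title, and the "dita" grammar label is hypothetical. The attribute names (href, resourceref, renderas, grammar, linkend) and the order of the child elements reflect my reading of the v5.1 schema, so validate against the released schema before relying on any of it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: file names, ids, and the "dita" grammar label are hypothetical -->
<assembly xmlns="http://docbook.org/ns/docbook" version="5.1">

  <!-- Resources are declared once, independently of any output structure -->
  <resources>
    <resource xml:id="res.install" href="topics/install-widget.xml"/>
    <resource xml:id="res.config" href="topics/configure-widget.xml"/>
    <!-- A non-DocBook resource, flagged so that a transform can be applied -->
    <resource xml:id="res.legacy" href="legacy/troubleshooting.dita"
              grammar="dita"/>
  </resources>

  <!-- One possible deliverable built from those resources -->
  <structure xml:id="user-guide" renderas="book">
    <!-- Assembly-level metadata for this particular output -->
    <info>
      <title>Widget User Guide</title>
    </info>
    <module resourceref="res.install" renderas="chapter"/>
    <module resourceref="res.config" renderas="chapter"/>
    <module resourceref="res.legacy" renderas="appendix"/>
  </structure>

  <!-- A named association between two resources -->
  <relationships>
    <relationship>
      <association>Setting up the widget</association>
      <instance linkend="res.install"/>
      <instance linkend="res.config"/>
    </relationship>
  </relationships>

  <!-- Stylesheet assumed to convert "dita" resources to DocBook -->
  <transforms>
    <transform grammar="dita" href="xsl/dita-to-docbook.xsl"/>
  </transforms>

</assembly>
```

A second <structure> could reuse the same resources in a different order, for a help system for example, without touching the topic files themselves.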

If You’re not RelaxNG, You’re Working Too Hard
I’m so convinced of the above statement that I have it as a bumper sticker on my truck! RelaxNG has been the canonical format for DocBook since v5.0. This schema language makes it MUCH easier for authors and organizations to extend the content models in DocBook to meet their specific content needs. If you aren’t using it yet, you need to learn it. Even DITA has taken the cue and is including it in v1.3. Tools are starting to support this schema language as well.
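
To give a feel for how little is involved, here is a minimal sketch of a customization layer written in the RelaxNG XML syntax. It assumes the base schema is saved locally as docbook.rng and that the pattern for <msgset> is named db.msgset; both are assumptions to verify against your copy of the schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a customization layer in RelaxNG XML syntax.
     Assumptions: the base schema is saved locally as docbook.rng, and the
     pattern for <msgset> is named db.msgset. Verify both before use. -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="docbook.rng">
    <!-- Redefine the pattern so that <msgset> is no longer allowed -->
    <define name="db.msgset">
      <notAllowed/>
    </define>
  </include>
</grammar>
```

Pointing an editor or validator at this file instead of the stock schema would then reject any document that uses the removed element.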

Better Accessibility
DocBook v5.1 adds a number of accessibility enhancements for output to screen readers. One of the issues in making tabular data accessible to visually impaired readers lies in providing the appropriate markup that will allow a correct correlation of table data with its headings. Where sighted readers can easily associate a column or row header with the correct column or row, screen reader software and devices cannot reliably interpret these visual cues. DocBook v5.1 addresses these needs by adding the @headers and @scope attributes on <entry>, adding the @rowheader attribute on <colspec>, and allowing <table> to contain <caption> for text summaries.
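
A small table using the new attributes might look something like the sketch below. The table content and xml:id values are hypothetical. The @scope values and the position of <caption> after <tgroup> are my assumptions about the v5.1 schema, and @rowheader is left out because I would rather not guess at its allowed values.

```xml
<!-- Hypothetical table; ids and data are illustrative only -->
<table xmlns="http://docbook.org/ns/docbook" xml:id="tbl.ports">
  <title>Default Ports</title>
  <tgroup cols="2">
    <colspec colname="service"/>
    <colspec colname="port"/>
    <thead>
      <row>
        <!-- scope="col" marks each entry as a heading for its column -->
        <entry xml:id="h.service" scope="col">Service</entry>
        <entry xml:id="h.port" scope="col">Port</entry>
      </row>
    </thead>
    <tbody>
      <row>
        <!-- scope="row" marks the first entry as the heading for this row -->
        <entry xml:id="r.http" scope="row" headers="h.service">HTTP</entry>
        <!-- headers lists the ids of the headings that apply to this cell -->
        <entry headers="r.http h.port">80</entry>
      </row>
      <row>
        <entry xml:id="r.https" scope="row" headers="h.service">HTTPS</entry>
        <entry headers="r.https h.port">443</entry>
      </row>
    </tbody>
  </tgroup>
  <!-- Text summary for readers who cannot see the layout -->
  <caption>
    <para>Each row gives a service name and its default port number.</para>
  </caption>
</table>
```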

Other Improvements
DocBook v5.1 contains improved support for multimedia, Schematron assertions, XLink, and XInclude. It is now possible to configure autoplay and other media player settings to deliver multimedia content. Linking is also improved to allow for extended links and other link types. Since Schematron is embedded in the RelaxNG schemas, tighter controls for validating content are now available.

DocBook vs DITA
The debate continues to rage on regarding DocBook and DITA. For organizations that have a large collection of existing DocBook content, does it really make sense to switch to DITA? DITA often requires authors to re-write their content to fit the topic-oriented paradigm and can result in costly migration and re-training. With the new topic-oriented additions to DocBook v5.1, organizations should no longer feel compelled to make such a switch. Topic-oriented content can be used alongside the existing content set, with no additional migration needed.

For those trying to move from unstructured Frame or Word to a structured standard, my advice is this:

  • What kind of content does your organization create? Take a look at the available elements in both DocBook and DITA standards and choose the one that is most appropriate for that content.
  • Are you creating software documentation? If so, DocBook has a much more robust set of elements to describe APIs and SDKs.
  • Are you creating e-Learning or training content? DITA has a very robust set of elements geared specifically to training content.

Another consideration is your output needs. While both DITA and DocBook support PDF, HTML, ePub, and more, the path to get to those outputs can vary widely:

  • Customizing your PDF output can be much less painful in DocBook.
  • Creating SCORM-compliant output, however, would require the DITA-OT or custom stylesheets.

Documentation on both stylesheet engines is readily available, so take a peek and see what you may be in for when the Marketing department needs to change the “look and feel” of your documentation set!