Demystifying DITA Publishing Options
Organizations often use multiple authoring, formatting, and layout applications to produce print and electronic documents. Yet, as content volumes and change requests increase, managing these deliverables without a cohesive approach becomes virtually impossible. Licensing and maintaining multiple applications is not only cost-prohibitive; it also causes enormous inefficiencies. When more than one application is being used, content must be developed separately for each output medium. In addition, these disparate tools leave the responsibility for document formatting and publishing to content authors, taking them away from their core assignment of writing valuable content.
Today’s documentation teams must deliver continuous content updates in an increasing number of delivery formats. End users expect information to be tailored to their unique needs. In addition, electronic publications must be searchable and easy to navigate. Most importantly, the information must be accurate and consistent at all times.
To meet these challenges, many organizations are turning to Darwin Information Typing Architecture (DITA), an XML standard that promises an open and cost-effective alternative to proprietary publishing solutions. DITA frees authors to focus on writing clear and concise documentation instead of struggling with complex publishing tools. In addition, DITA enables organizations to:
- publish from a single source to multiple output formats
- customize deliverables by product variations
- easily reuse content across deliverables
- leverage content management systems and automated publishing to streamline the content lifecycle
- choose from a variety of tools to avoid lock-in to proprietary systems
Publishing with Traditional Tools
Traditional publishing tools such as Microsoft Word, Adobe FrameMaker, and Adobe InDesign, as well as content development tools for online help platforms, attempt to meet the needs of production publishing by embedding presentation styling within the content itself.
While this approach is easy to learn and apply, it combines the tasks of authoring content and preparing it for presentation into a single role. Styles are applied at the time the content is created. Authoring systems that mix presentation with content are helpful in that they allow the writer to correct layout errors and perform complex copyfitting. However, since each medium typically has its own presentation characteristics and conventions, this approach makes it particularly difficult to repurpose content developed in one medium for another.
For example, serif fonts aid reading speed and comprehension on printed pages, but sans serif fonts are easier to read on screen due to the limited resolution of computer displays. Moreover, when authors can manipulate the appearance of content, they are easily distracted from their primary task of developing clear and concise prose.
Benefits of Separating Content from Presentation
Conversely, separating content from presentation offers numerous benefits. First, in this environment, formatting and navigation cues in output are more consistent. Modifying styles throughout an entire document is more efficient, as there is no need to work through existing content and manually reapply styles or monitor which authors are conforming to corporate style guidelines.
In addition, writers do not need to divide their time and attention between writing and the presentation layout. In fact, presentation can be optimized for different tasks in the authoring and reviewing process. For example, advanced authoring environments such as XMetaL can present content in a format that is ideal for writing and editing and present it in a different format for reviewing.
Publishing with DITA
DITA fosters information reuse. Authors write content in small pieces of information known as topics and reuse those topics across different deliverables by assembling them into DITA maps. The DITA specification provides additional content reuse methods such as conrefs (content references), a simple text inclusion mechanism that allows authors to easily reuse content stored elsewhere, and conditional text, which enables publishers to select only the content that relates to a specific audience or product. These features make it easier and faster to write, assemble, and publish customized documentation.
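The two reuse mechanisms described above can be illustrated with a small sketch. It is not how a DITA processor is implemented; it only mimics the visible effect of a conref and of conditional filtering on one topic. The file name shared.dita, the ids, the product values, and both function names are hypothetical, and real processors typically drive filtering from a separate filter file rather than a function argument.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical library topic that holds a reusable warning.
LIBRARY = ET.fromstring(
    '<topic id="shared"><title>Shared content</title><body>'
    '<note id="power-warning">Unplug the device before servicing.</note>'
    '</body></topic>'
)

# Hypothetical topic that pulls in the warning and carries conditional text.
TOPIC = ET.fromstring(
    '<topic id="install"><title>Installing</title><body>'
    '<note conref="shared.dita#shared/power-warning"/>'
    '<p product="pro">Enable clustering.</p>'
    '<p product="basic">Clustering is not available.</p>'
    '<p>Restart the service.</p>'
    '</body></topic>'
)


def resolve_conrefs(topic, library):
    """Replace each element carrying @conref with a copy of its target.

    Simplified: only the element id at the end of the reference is used,
    and nested conrefs inside the pulled-in content are not followed.
    """
    targets = {el.get("id"): el for el in library.iter() if el.get("id")}
    for parent in list(topic.iter()):
        for index, child in enumerate(list(parent)):
            ref = child.get("conref")
            if ref is None:
                continue
            target_id = ref.rsplit("/", 1)[-1]
            replacement = copy.deepcopy(targets[target_id])
            replacement.tail = child.tail  # keep any text that followed
            parent[index] = replacement
    return topic


def filter_by_product(topic, product):
    """Drop elements whose @product is set and does not list `product`."""
    for parent in list(topic.iter()):
        for child in list(parent):
            values = child.get("product")
            if values is not None and product not in values.split():
                parent.remove(child)
    return topic


resolved = resolve_conrefs(copy.deepcopy(TOPIC), LIBRARY)
pro_edition = filter_by_product(resolved, "pro")
body_text = [el.text for el in pro_edition.find("body")]
print(body_text)
```

Working on a deep copy leaves the source topic untouched, which mirrors the single-source idea: the same TOPIC could be filtered again with "basic" to produce a different deliverable.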
Single Source Production
Organizations that produce content based on DITA can generate formatted output for print, online help, or a website from a single source.
This single source production of multiple output types eliminates the need to create and maintain content for each specific output format.
The DITA Open Toolkit—an open source DITA-compliant processor—enables single source production via a library of XSL transforms for creating production-ready documentation deliverables from DITA content. The Toolkit performs preprocessing work, such as resolving links, applying conditional text filters, and arranging topics in the order indicated by the DITA map. Thus, a source XML document can be easily transformed into XHTML for web presentation or XSL Formatting Objects (XSL-FO) elements for printed PDF output. Output formats currently supported by the DITA Open Toolkit include Adobe PDF, Eclipse Help, HTML, Microsoft Compiled HTML Help (chm), RTF, JavaHelp, and other XML formats such as DocBook.
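One of the preprocessing steps mentioned above, arranging topics in the order indicated by the DITA map, can be sketched as a depth-first walk over nested topicref elements. This is an illustration only; the Toolkit does this work in its own preprocessing pipeline. The map content and the function name reading_order are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical map: nesting expresses the hierarchy of the deliverable.
DITAMAP = ET.fromstring(
    '<map><title>User Guide</title>'
    '<topicref href="intro.dita"/>'
    '<topicref href="install.dita">'
    '<topicref href="install_windows.dita"/>'
    '<topicref href="install_linux.dita"/>'
    '</topicref>'
    '<topicref href="troubleshooting.dita"/>'
    '</map>'
)


def reading_order(ditamap):
    """Return (depth, href) pairs in the order the map lists them."""
    ordered = []

    def walk(element, depth):
        for ref in element.findall("topicref"):
            href = ref.get("href")
            if href:
                ordered.append((depth, href))
            walk(ref, depth + 1)  # child topics follow their parent

    walk(ditamap, 1)
    return ordered


order = reading_order(DITAMAP)
for depth, href in order:
    print("  " * (depth - 1) + href)
```

The depth value is what a later formatting step would use to decide, for example, whether a topic title becomes a chapter heading or a subsection heading.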
With its robust support for single-source publishing, the DITA Open Toolkit provides a compelling alternative to proprietary publishing systems. This advantage is aided by the Toolkit’s integration with authoring tools, which allows for an authoring environment that can comfortably replace traditional desktop publishing systems.
For example, XMetaL Author 5.0 integrates with the DITA Open Toolkit to provide a complete, XML-based, end-to-end publishing solution, including a productive graphical user interface as an alternative to the Toolkit’s command line interface. Authors can easily generate and preview supported deliverable types without switching between the authoring application and the command line tools provided by the Toolkit. With XMetaL Author, writers can choose to “Generate Output for DITA Map” or “Generate Output for DITA Topic.” Each of these commands launches a dialog to configure and produce a rendered version of the current DITA topic or map.
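For teams that do script the command line route, the following sketch builds one Toolkit invocation per output format from the same map. It assumes a later DITA Open Toolkit release that ships a `dita` command accepting `--input`, `--format`, and `--output`; Toolkit releases contemporary with this article were driven through Ant instead. The map name userguide.ditamap, the output folder layout, and the function name build_commands are hypothetical.

```python
def build_commands(ditamap, formats, out_root="out"):
    """Build one argument list per requested output format.

    Assumption: a `dita` executable is on the PATH and accepts the
    --input, --format, and --output options.
    """
    return [
        [
            "dita",
            f"--input={ditamap}",
            f"--format={fmt}",
            f"--output={out_root}/{fmt}",
        ]
        for fmt in formats
    ]


commands = build_commands("userguide.ditamap", ["pdf", "xhtml"])
for argv in commands:
    print(" ".join(argv))
    # To actually run the build where the Toolkit is installed:
    # import subprocess; subprocess.run(argv, check=True)
```

Keeping each format in its own output folder makes it obvious that every deliverable came from the same source map.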
Many writers are mystified by the variety of skills and technologies that they need to transition to an XML-based publishing system. In reality, DITA, the Open Toolkit, and integrated authoring tools like XMetaL make the transition simple and straightforward, as described in the following sections.
Producing print or PDF with XSL-FO
A common requirement for organizations moving to XML is to replace their traditional desktop publishing systems with a scalable, XML-based publishing system. Since XML vocabularies have no inherent presentation semantics, users must apply an external mechanism to attach stylistic information to the marked-up content. For printed documentation, the most common option is to transform the content from its source schema to the formatting vocabulary defined by the Extensible Stylesheet Language Formatting Objects (XSL-FO) specification, which consists of formatting objects and properties specifically designed to describe paged media. The XSL-FO specification was designed to provide presentation semantics for print rendering of XML, much the same way that Cascading Style Sheets (CSS) provide for HTML rendering.
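To make the idea of attaching presentation to unstyled content concrete, here is a minimal sketch that turns a tiny topic into an XSL-FO document. The Toolkit performs this step with XSLT stylesheets; Python is used here only as a stand-in to show the shape of the result. The page size, fonts, spacing values, topic content, and the function name topic_to_fo are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical source topic: structure only, no styling.
TOPIC = ET.fromstring(
    '<topic id="intro"><title>Getting Started</title><body>'
    '<p>Install the software.</p>'
    '<p>Launch the application.</p>'
    '</body></topic>'
)


def fo(tag):
    """Return the namespace-qualified name of a formatting object."""
    return f"{{{FO_NS}}}{tag}"


def topic_to_fo(topic):
    """Build a single-flow XSL-FO document from a topic's title and paragraphs."""
    root = ET.Element(fo("root"))

    # Page geometry lives in the layout master set, not in the content.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "letter",
        "page-width": "8.5in",
        "page-height": "11in",
    })
    ET.SubElement(page, fo("region-body"), {"margin": "1in"})

    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "letter"})
    flow = ET.SubElement(sequence, fo("flow"),
                         {"flow-name": "xsl-region-body"})

    title = ET.SubElement(flow, fo("block"), {
        "font-family": "serif", "font-size": "18pt",
        "font-weight": "bold", "space-after": "12pt",
    })
    title.text = topic.findtext("title")

    for para in topic.iterfind("body/p"):
        block = ET.SubElement(flow, fo("block"), {
            "font-family": "serif", "font-size": "11pt",
            "space-after": "6pt",
        })
        block.text = para.text
    return root


fo_document = topic_to_fo(TOPIC)
fo_text = ET.tostring(fo_document, encoding="unicode")
print(fo_text)
```

The resulting document still cannot be printed directly; it would be handed to an XSL-FO rendering engine to produce PDF.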
Many documentation teams will require no special skills to use XSL-FO output, so long as the default rendering capabilities of the DITA Open Toolkit meet their organization’s requirements. Some changes to output, such as replacing a corporate logo file, can be done by changing simple parameter settings. If more extensive changes to the stylesheets are required, the team will need some knowledge of all three parts of the XSL Recommendation: XSL-FO, XSLT, and XPath.
Using XSL-FO Rendering Engines
XSL-FO is suitable for publishing long, text-intensive documents that do not require page-level layout modification, i.e., the document follows regular formatting patterns and does not have to be adjusted on a page-by-page basis. This means that XSL-FO is highly suited to most types of written communication, such as user guides, technical manuals, policies and procedures, datasheets, and white papers.
XSL-FO is an XML language that cannot be directly rendered to print. To produce the formatted rendition, the XSL-FO document must be processed by an XSL-FO rendering engine. The formal XSL-FO Recommendation defines three conformance levels: basic, extended, and complete. Of these, the Basic conformance level supports common page objects and properties, such as fonts, graphics, colors, block and inline formatting, and page regions, while the Extended conformance level supports advanced layout-related options, such as “keep together” properties, hyphenation controls, additional page region definitions, and color profiles.
The DITA Open Toolkit supports several XSL-FO processors. Apache FOP is the default processor available with the Toolkit and supports the Basic conformance level. Extended conformance is available from commercial processors that integrate closely with the Toolkit, including RenderX XEP and Antenna House XSL Formatter. For additional support, XMetaL Author Enterprise Edition pre-integrates RenderX XEP and is also fully compatible with the Antenna House and other standards-based FO engines.
XSL-FO capabilities for publishing and translation
XSL-FO supports formatting semantics to meet a wide array of publishing needs. This capability includes features for defining page-level and line-level characteristics, auto-generation of navigational aids, and formatting for bidirectional text.
The specification offers inherent support for internationalization. It is not biased toward any writing direction or page orientation and is Unicode-compliant. It also provides support for complex glyph layouts as required for languages such as Thai. XSL-FO’s internationalization advantages are a common reason for moving to XML for organizations that are seeking to expand into global markets.
Producing XHTML
The DITA Open Toolkit contains transformational stylesheets that support several electronic delivery formats, including XHTML. The output of this transformation is a set of XHTML files and an automatically generated index that can be directly deployed to a web server. The Toolkit comes with a set of Cascading Style Sheets for formatting the generated XHTML. These stylesheets can be used as-is or customized to meet corporate style guidelines.
Producing Online Help
The DITA Open Toolkit produces popular online help formats. As with print and XHTML output, help output is produced by using XSLT to transform DITA into the target format. When the output format uses XHTML and CSS, the appearance and functionality of the help system are easily configurable without changing the XSL stylesheets included with the Toolkit. The Toolkit also produces project configuration files, which are included in the output, as well as indexes, Related Topics lists, and browse sequences.
Help formats supported by the DITA Open Toolkit include:
- Microsoft Compiled HTML Help (chm): After producing the source files in HTML Help format, the Toolkit invokes Microsoft’s HTML Help Workshop to compile and package the content for delivery. Note that HTML Help Workshop is not included with the DITA Open Toolkit but is available as a free download from
- JavaHelp: JavaHelp is an online help system delivered by Sun Microsystems to support platform-independent help delivery for Java-based applications, including applets, Java components, standalone applications, and Java-enabled devices. Viewing JavaHelp requires the Sun JavaHelp processor (available separately from java.sun.com).
- Eclipse Help: Content for this open source help system consists of a combination of XML files that configure the content and HTML files that contain the content itself. Eclipse Help can be packaged independently of the Eclipse Integrated Development Environment (IDE). Viewing Eclipse Help requires the IBM Eclipse Help processor, available from www.eclipse.org.
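As an illustration of the XML configuration files mentioned for Eclipse Help, the sketch below derives a table of contents file from a DITA map, using the toc and topic elements with label and href attributes that Eclipse Help reads. It approximates the kind of file the Toolkit generates and is not the Toolkit's own code. The map content, the navtitle values, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical map; navtitle supplies the label shown in the help viewer.
DITAMAP = ET.fromstring(
    '<map><title>User Guide</title>'
    '<topicref href="intro.dita" navtitle="Introduction"/>'
    '<topicref href="install.dita" navtitle="Installing">'
    '<topicref href="install_windows.dita" navtitle="Installing on Windows"/>'
    '</topicref>'
    '</map>'
)


def to_html_href(href):
    """Point at the generated HTML file instead of the DITA source."""
    if href.endswith(".dita"):
        return href[: -len(".dita")] + ".html"
    return href


def add_topics(map_parent, toc_parent):
    """Mirror nested topicref elements as nested Eclipse topic entries."""
    for ref in map_parent.findall("topicref"):
        entry = ET.SubElement(toc_parent, "topic", {
            "label": ref.get("navtitle", ""),
            "href": to_html_href(ref.get("href", "")),
        })
        add_topics(ref, entry)


def build_eclipse_toc(ditamap):
    """Create the toc element that a help plug-in would reference."""
    toc = ET.Element("toc", {"label": ditamap.findtext("title", default="")})
    add_topics(ditamap, toc)
    return toc


toc = build_eclipse_toc(DITAMAP)
print(ET.tostring(toc, encoding="unicode"))
```

The content pages themselves remain ordinary HTML files; only this navigation structure is specific to the help system.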
Producing other help formats
Because the DITA Open Toolkit output is fully standards-compliant, content developers can use it to create output in other formats, such as browser-based help. More options are becoming available as tool vendors see the benefit of supporting DITA-based XML content. For example, Quadralay Corporation is currently developing a version of its popular WebWorks suite that will directly transform DITA content into help formats. This solution will enable more interactive features, such as expanding hotspots and popups.
To meet their organizations’ growing needs for dynamic, global information delivery, documentation teams must break free of applications and formats that combine presentation with content. Using a vendor-neutral format such as DITA allows information to be “tagged” to define the structure of the content, rather than specifying its appearance. This process makes it much easier to repurpose data for different product variations, audiences, and formats.
To quickly realize the benefits of DITA and to ease user adoption concerns, consider an end-to-end DITA authoring solution that integrates tightly with the DITA Open Toolkit.
Applications like XMetaL Author DITA Edition that provide out-of-the-box support for the Toolkit will minimize startup and maintenance costs, while providing a previewing and publishing environment that will ease the transition to XML for your desktop publishing users.
About the Author
Jerry Silver has over 20 years of IT experience as a developer, consultant, and product manager, specializing in database and application modeling and design, application architectures, Web technologies, content management, and collaboration. He has been a featured speaker on these topics at numerous industry conferences and a guest lecturer on several Computer Science faculties. Jerry spent 15 years at Oracle in a variety of technical roles, most recently as Principal Product Manager of Oracle Application Server Portal. He also served as Director of Product Strategy with content management vendor NCompass Labs, now part of Microsoft. Currently, Jerry is Director of Product Management with JustSystems, Inc., responsible for content lifecycle solutions and the XMetaL family of structured authoring applications.