Hal Trent, Comtech Services, Inc.

At Comtech, we have been working on integrating two content sources: a set of DITA topics and data structured according to an XML schema. We’ve found that we can apply the fundamental principles of DITA to merge the two sources at publication. The result is a seamless merger of two previously isolated islands of content.

In a typical technical publication setting, DITA topics are stored in a component content management system or a file system. They are kept separate from market trend analysis data, price information, and technical specifications, which usually live in a relational database or an XML-based data schema.

Keeping technical information and data separate lets each group of content stakeholders manage its own information, but it creates a disconnect between the technical documentation, marketing, and web design groups. This separation has long been the norm, but it can now be overcome using accepted standards.

This article briefly outlines the opportunities, challenges, and necessary standards to merge DITA and data content. For a more complete understanding of the processes behind merging DITA and data, please look for the full version of this article in the upcoming August 2009 Best Practices newsletter.

Looking at how DITA is used today, and at the new opportunities for its use, makes it clear that DITA and data can be merged harmoniously at publication time. Most current DITA implementations involve after-market technical content, such as installation and operations manuals, service and maintenance manuals, and parts catalogs. Single-sourced DITA topics enable greater reuse, consistency, usability, and variety of outputs for this after-market content. The same content can also have a major impact on pre-sales literature, e-commerce applications, parts lists, and point-of-sale information.

Before DITA and data can be rolled into production, certain information criteria and standards must be understood and addressed:

  • Standardize the information model.
    1. Design an information architecture that defines information types and content units for technical documentation.
    2. Associate the information types and content units with the appropriate DITA markup.
  • Standardize the data model.
    1. Streamline the process for adding and maintaining data.
    2. Associate the data with information types and content units defined in the information model.
  • Standardize the information flow.
    1. Identify the topic set and various data sources.
    2. Format and normalize the topic set and data stream to remove redundant content.
    3. Determine new processes and workflows for content stakeholders.
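Two of the steps above, associating data with content units and normalizing the data stream to remove redundant content, can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used in the project. It assumes that each data record carries a part number matching the `id` of a DITA topic. The `DataRecord` type and the `normalize` and `associate` functions are hypothetical names.

```python
from dataclasses import dataclass


# Hypothetical data record: one row from a data source such as a pricing
# or specification feed. Field names are illustrative only.
@dataclass(frozen=True)
class DataRecord:
    part_number: str
    field: str
    value: str


def normalize(records):
    """Trim whitespace, standardize case, and drop redundant records."""
    seen = set()
    result = []
    for rec in records:
        clean = DataRecord(rec.part_number.strip().upper(),
                           rec.field.strip().lower(),
                           rec.value.strip())
        key = (clean.part_number, clean.field)
        if key in seen:
            continue  # redundant entry: keep the first occurrence only
        seen.add(key)
        result.append(clean)
    return result


def associate(records, topic_ids):
    """Group records under the topic whose id matches the part number.

    Assumption: topic ids equal part numbers. Records with no matching
    topic are returned separately for review instead of being dropped.
    """
    by_topic = {tid: [] for tid in topic_ids}
    orphans = []
    for rec in records:
        if rec.part_number in by_topic:
            by_topic[rec.part_number].append(rec)
        else:
            orphans.append(rec)
    return by_topic, orphans


raw = [
    DataRecord(" a100 ", "Price", "49.00"),
    DataRecord("A100", "price", "49.00"),   # duplicate once normalized
    DataRecord("A100", "weight", "2.1 kg"),
    DataRecord("B200", "price", "75.00"),   # no matching topic
]
by_topic, orphans = associate(normalize(raw), ["A100"])
print(len(by_topic["A100"]), len(orphans))
```

Returning orphaned records explicitly is one way to surface gaps between the data model and the information model before publication.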

Designing an information model that corresponds with the data model is critical to successfully merging the information stored in the component content management system with the data stored in the relational database. Without going into too much technical detail, the standard used to merge the two forms of content is XSLT (Extensible Stylesheet Language Transformations). XSLT is a W3C language for transforming XML documents. It lets a stylesheet read XML-formatted content from both sources and combine it into a single output at publication.
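The following Python sketch illustrates the lookup-and-fill pattern that such a merge typically performs. It uses the standard `xml.etree.ElementTree` module in place of an XSLT stylesheet. The sketch assumes a hypothetical convention in which each `<ph>` element in a topic carries a `keyref` value of the form `part-field`, naming a value in the XML data source. These keys are resolved directly against the data rather than through key definitions in a DITA map. The sample topic, the sample data, and the `load_data` and `merge` functions are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA topic: each <ph> with a keyref marks a spot where a data
# value should be inserted at publication. The "<part>-<field>" key naming
# convention is an assumption for this sketch.
TOPIC = """<topic id="A100">
  <title>Model A100 Pump</title>
  <body>
    <p>List price: <ph keyref="A100-price"/></p>
    <p>Weight: <ph keyref="A100-weight"/></p>
  </body>
</topic>"""

# Hypothetical XML data source, keyed the same way.
DATA = """<products>
  <product part="A100">
    <price>49.00</price>
    <weight>2.1 kg</weight>
  </product>
</products>"""


def load_data(xml_text):
    """Build a lookup table such as {'A100-price': '49.00', ...}."""
    lookup = {}
    for product in ET.fromstring(xml_text).iter("product"):
        part = product.get("part")
        for field in product:
            lookup[f"{part}-{field.tag}"] = (field.text or "").strip()
    return lookup


def merge(topic_text, data):
    """Fill each keyed <ph> with its data value; unmatched keys stay empty."""
    topic = ET.fromstring(topic_text)
    for ph in topic.iter("ph"):
        key = ph.get("keyref")
        if key in data:
            ph.text = data[key]
    return topic


merged = merge(TOPIC, load_data(DATA))
print(ET.tostring(merged, encoding="unicode"))
```

In XSLT, the same step would usually be a template that matches the placeholder element and looks up its value in a second document loaded with the `document()` function.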

As with most projects, there will be challenges and caveats along the way. As companies begin exploring solutions for merging technical content with data, it is important to keep these ideas in mind:

  • address multiple content sources
  • develop an information model to support content standardization
  • develop a data model to support data standardization
  • design an integration strategy to merge the content sources

The methods described in this article were proven in a client’s production publishing environment. We merged fluctuating pricing information, stored in a relational database, with product-specific marketing information stored in a component content management system. The result was a product catalog with dynamic, up-to-date pricing and content.
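One way to keep relational pricing current in published output is to export the price table to XML at the start of each publishing run, so that the merge always sees the latest values. The sketch below illustrates this approach under assumptions; it is not a description of the client implementation. It uses Python's built-in `sqlite3` module as a stand-in for the production database. The `prices` table, its columns, and the `export_prices` function are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_prices(conn):
    """Serialize the current price table as an XML data stream.

    Intended to run at publication time, so each build reflects whatever
    prices are in the database at that moment. Table and column names are
    hypothetical.
    """
    root = ET.Element("products")
    rows = conn.execute("SELECT part, price FROM prices ORDER BY part")
    for part, price in rows:
        product = ET.SubElement(root, "product", part=part)
        ET.SubElement(product, "price").text = f"{price:.2f}"
    return ET.tostring(root, encoding="unicode")


# In-memory database standing in for the production pricing system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (part TEXT PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)",
                 [("A100", 49.0), ("B200", 75.5)])
print(export_prices(conn))

# A price change in the database shows up in the next export.
conn.execute("UPDATE prices SET price = ? WHERE part = ?", (52.0, "A100"))
print(export_prices(conn))
```

Exporting at build time keeps the database as the single source of truth for prices, while the topics in the content management system remain unchanged between runs.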

On a final note, the best strategy for success is to leverage standards. Through standards, companies can gain improved maintainability, accessibility, interoperability, and extensibility.

To learn more, visit the CIDM featured article page for Hal’s Content Management Strategies/DITA North America 2009 presentation, DITA and Data.