Managing the Move to Authoring in DITA


December 2005

Hadar Hawk and Amber Swope, IBM Corporation

You’ve made the decision to use Darwin Information Typing Architecture (DITA) technology, but now you have to plan the project. This article addresses the twin challenges of migrating existing source to DITA and preparing to author new content in DITA.

To Migrate or Not To Migrate?

When moving to DITA, you must decide whether to migrate your existing content to DITA or to simply author new DITA files. Review the migration process and considerations to determine what best meets your team’s needs.

Migrating an Existing Information Source

Migrating your source to DITA means that you create XML files and populate the new files with your current content. Because DITA is a topic-based architecture, you must organize the source information into units called topics that contain information of specific types. For example, to migrate an installation guide, you save the information into separate topics by type. This means that the background or conceptual information goes into concept topics, the steps for installation go into task topics, and the reference information about supported platforms belongs in reference topics.
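As a rough illustration of sorting content into typed topics, the following Python sketch builds empty skeletons for the three topic types named above. The topic IDs and titles are hypothetical, and a real DITA file would also carry the appropriate document type declaration, which is omitted here.

```python
import xml.etree.ElementTree as ET

# Body element used by each of the three DITA topic types.
BODY_ELEMENT = {"concept": "conbody", "task": "taskbody", "reference": "refbody"}


def make_topic(topic_type, topic_id, title):
    """Build an empty topic skeleton of the given type."""
    topic = ET.Element(topic_type, {"id": topic_id})
    ET.SubElement(topic, "title").text = title
    ET.SubElement(topic, BODY_ELEMENT[topic_type])
    return topic


# Hypothetical installation-guide content, split by information type.
topics = [
    make_topic("concept", "install_overview", "Installation overview"),
    make_topic("task", "install_product", "Installing the product"),
    make_topic("reference", "supported_platforms", "Supported platforms"),
]

for topic in topics:
    print(ET.tostring(topic, encoding="unicode"))
```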


To generate the installation guide again, you create a navigation structure in external files called DITA maps. If the information is delivered as an online deliverable with links between files, you also need to recreate links between topics.
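A DITA map is itself a small XML file of nested topicref elements. The sketch below, which assumes hypothetical file names for the installation guide topics, shows how a navigation hierarchy could be expressed as a map.

```python
import xml.etree.ElementTree as ET


def add_topicrefs(parent, entries):
    """Add one <topicref> per (href, children) pair, recursing into children."""
    for href, children in entries:
        topicref = ET.SubElement(parent, "topicref", {"href": href})
        add_topicrefs(topicref, children)


def build_map(title, hierarchy):
    """Build a <map> element whose nesting mirrors the given hierarchy."""
    ditamap = ET.Element("map", {"title": title})
    add_topicrefs(ditamap, hierarchy)
    return ditamap


# Hypothetical file names; the nesting defines the topic order and hierarchy.
guide = build_map(
    "Installation Guide",
    [
        ("install_overview.dita", [
            ("supported_platforms.dita", []),
            ("install_product.dita", []),
        ]),
    ],
)

print(ET.tostring(guide, encoding="unicode"))
```

Because the hierarchy lives in the map rather than in the topics, the same topics can be arranged differently in another map for another deliverable.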

This sounds like a lot of work, but the benefits of using a typed, topic-based architecture and maintaining the topic links in external files pay off in the long run with cost savings, resource efficiency, and quality improvements. (See “A Business Case for Authoring Content in the Darwin Information Typing Architecture (DITA)” at the beginning of this newsletter for additional information.)

What source to migrate?
Before you can start the project, you need to determine what information you need to migrate. The reality is that you can decide not to migrate some information.

Consider the following questions:

  • Will we ever need to access this information for use in another version of the same deliverable or in another deliverable?
  • Will our layouts or formatting ever change?
  • Do we want to be able to single source information instead of having multiple versions of the same information in different sets of source files?
  • Do we need to add links between online information files to improve reader navigation?
  • Do we reuse information from an original equipment manufacturer (OEM) that authors its content in DITA? Or does an OEM that authors in DITA reuse our information?
  • Do we need to provide customized versions of the same information for multiple customers?

If the answer is “yes” to any of those questions, then you should consider migrating the information to DITA.

In the situation where you have the same or very similar information in multiple source files, migrate only one version of the information and plan to reuse it after the migration.

Basically, the only time it does not make sense to migrate the source files to DITA is when you know that you will never have to update or use the legacy information again.

When to migrate?
Some times are better than others for migrating source to DITA. As with any source change, you need time not only to migrate the source files but also to train your team for the migration and to develop and test the new architecture implementation that recreates your existing deliverables.

The good news with DITA is that you can migrate in stages and still be able to link to your existing, non-DITA deliverables. For example, if you migrate one set of online information to DITA, you can still link to other sets of information in other formats, including HTML and PDF. You can also use this strategy if you decide to keep some content in its original source format.

Use a staged-migration process so that you can continue to deliver your product information throughout the migration process.

How long will it take?
How much time it will take depends on the following factors:

  • Is your current source in small units, such as topics? If so, do the topics contain specific types of information? For example, do you have topics that contain only concept, task, or reference information? If not, then you must plan to reorganize the information to topics of specific types.
  • Did the team implement styles and elements consistently? If not, then you will need additional time to prepare the source files for migration.
  • Do you want to make structure or content changes during the migration? For example, if you want to move information that appears in multiple source files into a single source file and reuse the content in other files, this will take additional time.
  • Do you have a large number of hard-coded links? To take advantage of the automatic linking capabilities, you need to remove the hard-coded links and recreate the links in DITA maps.
  • How many topics do you have? How many types of topics do you have? The more topics and topic types you have, the longer the migration takes.
  • Do you have conditional text or employ filtering to customize deliverables? If so, then you need to address this strategy in the DITA implementation plan.

The answers to these questions can help you estimate the amount of time the migration will take.
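Some of these questions can be answered with a quick inventory of the source. As one possible approach, the following Python sketch counts hard-coded links in HTML source by using the standard library parser. The file names and contents are invented for illustration; in practice you would read the files from disk.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collect the href value of every <a> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def count_links(html_text):
    """Return the number of hard-coded links found in the HTML text."""
    counter = LinkCounter()
    counter.feed(html_text)
    return len(counter.links)


# Hypothetical source files, inlined here so the sketch is self-contained.
sample_files = {
    "install.html": '<p>See <a href="platforms.html">platforms</a> and '
                    '<a href="config.html">configuration</a>.</p>',
    "platforms.html": "<p>No links here.</p>",
}

inventory = {name: count_links(text) for name, text in sample_files.items()}
print(inventory)
```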

As with any project, the more variables and dependencies, the longer the project will take. For successful migration to DITA, address each of the variables in the implementation plan.
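Conditional text is one variable worth prototyping early. DITA processors normally handle filtering for you; the Python sketch below only imitates the basic idea, removing an element when every value in its platform attribute is excluded. The step content is hypothetical, and the function ignores details such as text that follows a removed element.

```python
import xml.etree.ElementTree as ET


def exclude_by_attribute(root, attribute, excluded_values):
    """Remove elements whose values for `attribute` are all excluded."""
    excluded = set(excluded_values)
    for parent in list(root.iter()):
        for child in list(parent):
            values = set(child.get(attribute, "").split())
            # Elements without the attribute are always kept.
            if values and values <= excluded:
                parent.remove(child)
    return root


# Hypothetical steps tagged for one or more platforms.
steps = ET.fromstring(
    "<steps>"
    '<step platform="linux"><cmd>Run install.sh.</cmd></step>'
    '<step platform="windows"><cmd>Run setup.exe.</cmd></step>'
    '<step platform="linux windows"><cmd>Restart the server.</cmd></step>'
    "</steps>"
)

linux_steps = exclude_by_attribute(steps, "platform", {"windows"})
remaining = [cmd.text for cmd in linux_steps.iter("cmd")]
print(remaining)
```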

How to migrate?
The DITA migration support is constantly evolving and improving as more organizations share their migration resources. See the XML Cover Pages web site for a wealth of up-to-date migration resources.

The migration path depends upon your current source file type.

HTML: Use the DITA Open Toolkit support to convert the HTML files to XML files.

FrameMaker: If you have styled Adobe FrameMaker files, export the files to HTML and then convert from HTML, as described above. If you have structured FrameMaker files that are in HTML, follow the HTML conversion guidance. If you have structured FrameMaker files that are in SGML, the content is already marked up with elements, so you can map the existing elements to DITA elements and update the source accordingly.
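Element mapping can be pictured as a lookup table from source element names to DITA element names. The following Python sketch is a simplified illustration, not a conversion tool: the mapping covers only a few XHTML elements, and container elements such as div are left untouched for a person to restructure.

```python
import xml.etree.ElementTree as ET

# Illustrative mapping from XHTML element names to DITA element names.
ELEMENT_MAP = {
    "h1": "title",
    "code": "codeph",
    "pre": "codeblock",
    "strong": "b",
    "em": "i",
}


def rename_elements(root, mapping):
    """Rename every element that has an entry in the mapping; keep the rest."""
    for element in root.iter():
        element.tag = mapping.get(element.tag, element.tag)
    return root


source = ET.fromstring(
    "<div><h1>Installing</h1>"
    "<p>Run <code>setup</code> as <strong>root</strong>.</p></div>"
)
migrated = rename_elements(source, ELEMENT_MAP)
print(ET.tostring(migrated, encoding="unicode"))
```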

For additional source formats, such as MS Word, DocBook, WinHelp, and JavaDoc, see the XML Cover Pages web site for a full list of migration paths.

You may have information in multiple source formats. Take the time to detail the migration process for each source type.

Who does the migration work?
For a successful migration to DITA, you need the following roles on your team:

Tool specialist: Person who knows XML and understands how transforms work. This person probably knows how your team uses its current tools, can test XML editors to determine which one is right for you, and can train the team on the new tool. In addition, the tool specialist will do all your technical preparation and the actual file migration. It helps if this person can write scripts for batch file processing.

Information architect (IA): Person who understands how your information fits together to generate the deliverables your customers need. The IA performs the following tasks:

  • Identifies the appropriate topic types for the information
  • Determines the element mapping from your current source to DITA
  • Creates the files that organize the topics for a deliverable
  • Develops the appropriate reuse and single-sourcing strategy
  • Coaches information developers on how to author in a topic-based environment

File preparer: Person who cleans up source files to prepare them for migration. In some cases, the work will be cutting and pasting text from the current source into new XML files and applying the correct elements. This work can be done by any person on your team who understands what needs to be fixed.

File validator: Person who validates that the migrated information appears correctly in the source files and in the new generated deliverables. This work can be done by any person on your team who has access to both the original source and the new XML source and has access to the original deliverables and the new deliverables generated from the DITA files.
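Part of the validator's check can be automated. As a minimal sketch, the Python code below compares the text of an original file and its migrated counterpart with all markup and whitespace removed, so that only lost or altered wording shows up as a difference. The sample content is hypothetical, and a check like this does not replace reviewing the generated deliverables.

```python
import xml.etree.ElementTree as ET


def text_fingerprint(xml_text):
    """Return the document text with all markup and whitespace removed."""
    root = ET.fromstring(xml_text)
    return "".join("".join(root.itertext()).split())


def same_text(original_xml, migrated_xml):
    """Report whether two documents contain the same words in the same order."""
    return text_fingerprint(original_xml) == text_fingerprint(migrated_xml)


# Hypothetical original source and two migrated versions of it.
original = "<div><h1>Installing</h1>\n<p>Run <code>setup</code>.</p></div>"
migrated = (
    '<task id="install"><title>Installing</title><taskbody><steps>'
    "<step><cmd>Run <codeph>setup</codeph>.</cmd></step>"
    "</steps></taskbody></task>"
)
incomplete = '<task id="install"><title>Installing</title><taskbody/></task>'

print(same_text(original, migrated))
print(same_text(original, incomplete))
```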

Roles do not translate to individual people. The same person can perform multiple roles; conversely, multiple people can perform the same role. Choose the people who are best suited for each role.

Who do we need to involve?
You need to identify all the internal and external stakeholders who could be affected by the source change.

Potential stakeholders include the following:

  • Product release managers
  • Quality engineering or testing teams
  • Development teams
  • Build teams
  • Source control/configuration management support
  • Translation teams
  • OEMs
  • Any teams with which you share deliverables

Be sure to let them know about the migration well in advance of when you need their support and consider their needs in your project planning.

What training does my team need for migration?
Depending upon your decisions in the “Who does the migration work?” section, you must prepare each role to be successful. Here is the baseline of what each role needs to know.


Tool specialists need to know XML and how to migrate the files. In addition, it is beneficial to know how to write scripts for batch file processing.

File preparers need to understand issues that prevent the existing files from meeting the migration criteria. For example, when migrating from source in HTML, what issues are preventing the source from being valid XHTML? In addition, they need to know how to chunk information into topics of the appropriate type.
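A common first hurdle is markup that is not well-formed XML, such as unclosed tags. The Python sketch below uses the standard library XML parser to flag such files. Note the assumption: passing this check shows only that the markup is well-formed, not that it is valid XHTML.

```python
import xml.etree.ElementTree as ET


def well_formedness_problem(markup):
    """Return None if the markup parses as XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return str(error)
    return None


# Hypothetical samples: the second one leaves <br> unclosed, as legacy HTML often does.
clean = "<html><body><p>First step.<br/></p></body></html>"
legacy = "<html><body><p>First step.<br></p></body></html>"

for name, markup in [("clean", clean), ("legacy", legacy)]:
    problem = well_formedness_problem(markup)
    print(name, "OK" if problem is None else problem)
```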

File validators need to know the differences between the elements in the source and the migrated files to verify that the output is correct.

Information architects need to know how to use DITA maps to generate the navigation structures and create links between topics.

After you identify what to migrate and who will perform each role, verify that they have the proper skills required for the task.

Authoring New Content in DITA

Authoring content in DITA means that your team is creating new XML files that meet the DITA validation criteria. To most easily create new content, use an XML editor that supports DITA. Although you can create XML files in any text editor, unless you use an editor that validates the structure, you risk creating noncompliant files.
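To give a feel for what a validating editor catches, the following Python sketch checks a few basic rules for a topic: a known root element, an id attribute on the root, and a title as the first child. It is a deliberately small illustration and does not replace validation against the DITA document type definitions.

```python
import xml.etree.ElementTree as ET

TOPIC_TYPES = {"topic", "concept", "task", "reference"}


def basic_topic_problems(xml_text):
    """Return a list of basic structural problems found in a topic."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag not in TOPIC_TYPES:
        problems.append(f"unexpected root element <{root.tag}>")
    if not root.get("id"):
        problems.append("root element has no id attribute")
    if len(root) == 0 or root[0].tag != "title":
        problems.append("first child element is not <title>")
    return problems


# Hypothetical topics: one that passes the checks and one that does not.
good = '<concept id="overview"><title>Overview</title><conbody/></concept>'
bad = "<concept><conbody><p>Text without a title.</p></conbody></concept>"

print(basic_topic_problems(good))
print(basic_topic_problems(bad))
```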

Who does the work?
Authoring in DITA does not usually change the roles that people assume on your team. Instead, all roles expand their knowledge and skill sets to include DITA and XML support.

For example, if you had a tool specialist who was responsible for understanding how your HTML editor worked and training your team on how to use it, then this person would do the same for the XML editor.

If your information architects were responsible for creating the navigation files for online deliverables, they continue to do this task, but they use DITA map files instead of the previous mechanism to specify topic order and hierarchy.

What training does my team need for authoring?
In most cases, all team members need some training to be successful. Here is the baseline of what each role needs to know.

Tool specialists need to know XML and how to update transforms if you want to generate customized deliverables. In addition, they need to be able to train and support the team to use the XML editor efficiently.

Information developers need to know how to use the XML editor to author and edit topics, create links, and reuse content according to the DITA standard. If the team does not have a dedicated integrator to handle generating and submitting deliverables to the product build, then information developers need this training as well.

Editors need to understand the topic type restrictions and how the new transforms and cascading style sheets support the content and style guidelines.

Integrators who handle generating and submitting deliverables to the product build need to learn how to generate the various deliverables and how the deliverables are managed.

Information architects need to know how to create cross-product navigation structures for documents, set up data reuse strategies, and use the XML editor. In addition, they need to train and support the team in writing topic-based information and specifying appropriate links between topics.

Focus your training on your team’s skill gaps. If your entire team is not going to use the information immediately to produce content, be prepared to offer the training multiple times.

How long will it take the team to be productive?
One major factor in productivity is whether your team is used to writing in a topic-based architecture. This can be a big change, and you need to plan for it to take time. One way to speed up the process is to provide training early in the process. There are many good classes available.

Another major change is the use of DITA maps to automatically generate links instead of hard-coding them. Although information developers and information architects know what links they want to have from one topic to another, it takes time to understand how DITA generates links by default and then to control the linking. The better the support that your XML editor provides for working with DITA maps, the faster your team will “get it.”
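One way to build intuition for map-based linking is to see which parent and child relationships follow from the nesting in a map. The Python sketch below derives them from a small, hypothetical map and assumes every topicref has an href attribute. Actual link generation is performed by your DITA processor and can be tuned with map attributes, which this sketch ignores.

```python
import xml.etree.ElementTree as ET


def hierarchy_links(ditamap):
    """Map each topic href to its parent href and its list of child hrefs."""
    links = {}

    def visit(element, parent_href):
        for topicref in element.findall("topicref"):
            href = topicref.get("href")
            links[href] = {"parent": parent_href, "children": []}
            if parent_href is not None:
                links[parent_href]["children"].append(href)
            visit(topicref, href)

    visit(ditamap, None)
    return links


# Hypothetical map with one parent topic and two child topics.
ditamap = ET.fromstring(
    '<map title="Installation Guide">'
    '<topicref href="install_overview.dita">'
    '<topicref href="supported_platforms.dita"/>'
    '<topicref href="install_product.dita"/>'
    "</topicref></map>"
)

links = hierarchy_links(ditamap)
for href, relations in links.items():
    print(href, relations)
```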

If you are implementing single sourcing for the first time, plan to spend time developing the architecture to support it. For example, if you have some information that appears in multiple places in your documentation set, you can create a separate topic that is the dedicated source topic referenced by all the other topics. The IA must develop a plan for this strategy and work with the team to implement and maintain it.
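In DITA, this kind of reuse is typically expressed with the conref attribute, whose value points to a file, a topic ID, and an element ID. The Python sketch below imitates how such a reference could be resolved. The file name, IDs, and warning text are hypothetical, and a real DITA processor performs this resolution for you with more checks than are shown here.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical dedicated source topic, keyed by file name.
SHARED_SOURCES = {
    "shared_notices.dita": ET.fromstring(
        '<concept id="shared_notices"><title>Shared notices</title><conbody>'
        '<p id="backup_warning">Back up your data before you begin.</p>'
        "</conbody></concept>"
    )
}


def resolve_conrefs(topic, sources):
    """Replace each element that has a conref with a copy of its target."""
    for parent in list(topic.iter()):
        for index, child in enumerate(list(parent)):
            conref = child.get("conref")
            if conref is None:
                continue
            # A conref value has the form file#topic_id/element_id.
            file_name, _, path = conref.partition("#")
            _topic_id, _, element_id = path.partition("/")
            target = next(
                e for e in sources[file_name].iter() if e.get("id") == element_id
            )
            replacement = copy.deepcopy(target)
            replacement.tail = child.tail
            parent[index] = replacement
    return topic


# A task topic that reuses the shared paragraph instead of repeating it.
task = ET.fromstring(
    '<task id="install_product"><title>Installing the product</title>'
    "<taskbody><prereq>"
    '<p conref="shared_notices.dita#shared_notices/backup_warning"/>'
    "</prereq></taskbody></task>"
)

resolved = resolve_conrefs(task, SHARED_SOURCES)
print(ET.tostring(resolved, encoding="unicode"))
```

Updating the paragraph in the source topic then updates every topic that references it.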

Plan for iterative success and prioritize specific areas of work by iteration. Planning for iterations allows your team to learn the most important things first and then add information to a strong base.


As with many decisions, the decision to use DITA includes many options. You can migrate all, some, or none of your source and still decide to create all new content in DITA. If you decide to migrate all of the source files, you can do it in a phased approach over a long period of time. If you decide to migrate some of the source files, you can still link to the existing source from the DITA topics. If you decide to migrate none of the source files, you can mine content from the existing files for new topics authored in DITA.

Regardless of whether you are migrating existing source or simply authoring new content in DITA, take the time to understand what the project entails. There are many good resources available, including the articles listed in the references section.
