Sabine Ocker, Comtech Services
July 1, 2020

When migrating to a structured markup publishing environment such as DITA, most organizations feel they are well-positioned to move forward once the tools and the information model are in place and their content has been converted and migrated into the component content management system (cCMS). But are they really finished?

Newly created content complies with the information model, so going forward it will align with the new standards and guidelines. The existing content corpus will not.

Many organizations decide to wait until after their content has been converted into DITA to do cleanup. Given the pragmatic realities of software release schedules and learning new tools, some find that the momentum for investing in their existing content flags.

“DITA Best Practices” asserts that preparation for and cleanup after a content conversion project accounts for 65% of the total project. The post-conversion effort to align with the new standards and guidelines includes metadata, image sizing, and file naming, as well as changing content structures to those outlined in the information model.

To effectively manage the post-conversion cleanup efforts, I recommend tackling the initiatives in three levels:

  1. Clean up any issues which result from the conversion effort itself. For example, this includes addressing problems tagged with “requires cleanup” markup.
  2. Next, apply metadata or make other changes needed to republish the new version of the content in the new look and feel to the delivery platform. I refer to this as creating a Minimum Viable Product (MVP).
  3. Lastly, align existing content with the new corporate content strategy.
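For level one, a quick inventory of flagged content helps size the work. The following is a minimal sketch, assuming the conversion wrapped problem content in DITA's `<required-cleanup>` element and that the topics are available on disk as `.dita` files. The function names and the use of `-1` to mark files that are not well-formed are illustrative choices, not part of any tool.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_required_cleanup(xml_data):
    """Count <required-cleanup> elements in one DITA document (str or bytes)."""
    root = ET.fromstring(xml_data)
    return sum(1 for _ in root.iter("required-cleanup"))


def cleanup_report(content_root):
    """Map each .dita file that needs attention to its number of flagged spans.

    Files that are not well-formed XML are reported with a count of -1,
    since they also need attention before anything else can be checked.
    """
    report = {}
    for path in sorted(Path(content_root).rglob("*.dita")):
        try:
            count = count_required_cleanup(path.read_bytes())
        except ET.ParseError:
            count = -1
        if count:
            report[str(path)] = count
    return report
```

Calling `cleanup_report("converted/")` on a hypothetical export folder returns a dictionary that can be sorted by count to decide which publications to tackle first.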

There may be problems within the newly converted content flagged by the conversion vendor or script. It is mission-critical to the success of the conversion efforts to resolve these problems as quickly as possible. Resolving them could involve the following:

  • Addressing content structures not covered by the conversion specification, for which the vendor or conversion script does not know which DITA markup to use. Some examples include caution or hazard statements.
  • Analyzing divisions, sub-divisions, or sections not included in the information model to determine how to break them up into separate topics.
  • Deciding what non-semantic (formatting) markup in the source content indicates.
  • Fixing unresolved cross-references or links due to missing target files or incorrect pathing.
  • Ascertaining what to do with topics included in the source content but not in the original table of contents, either because they represent deprecated content or because they document a feature or function that will be part of a future release.
  • Converting .bmp images in the source content to another format, as some tools do not support .bmp.
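Two of the items above, unresolved links and .bmp images, lend themselves to a scripted check. The sketch below assumes the converted files sit in a folder on disk, that links and images use the `href` attribute, and that paths are not URL-encoded. The names `local_targets` and `audit_links` are hypothetical, and the check only confirms that the target file exists, not that an element ID inside it resolves.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

DITA_SUFFIXES = {".dita", ".ditamap", ".xml"}


def local_targets(xml_data):
    """Return the local file paths referenced by href attributes in one document."""
    root = ET.fromstring(xml_data)
    targets = []
    for element in root.iter():
        href = element.get("href")
        if not href or element.get("scope") == "external":
            continue
        if "://" in href or href.startswith("mailto:"):
            continue  # web and mail links are out of scope for a file check
        file_part = href.split("#", 1)[0]
        if file_part:  # an href such as "#topic/p1" points inside the same file
            targets.append(file_part)
    return targets


def audit_links(content_root):
    """Return (missing, bmp_images) as lists of (source file name, href) pairs."""
    missing, bmp_images = [], []
    for path in sorted(Path(content_root).rglob("*")):
        if path.suffix not in DITA_SUFFIXES or not path.is_file():
            continue
        try:
            targets = local_targets(path.read_bytes())
        except ET.ParseError:
            continue  # malformed files need a separate fix first
        for target in targets:
            if target.lower().endswith(".bmp"):
                bmp_images.append((path.name, target))
            if not (path.parent / target).exists():
                missing.append((path.name, target))
    return missing, bmp_images
```

The two lists can be handed to whoever owns link repair and image conversion, respectively.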

Once you have fixed these errors, the next level of cleanup entails readying for publication and creating MVP publications. This phase could involve making changes or additions to these items:

  • Adding new publishing metadata that is not yet present in your converted content. This metadata could be either DITA elements or values added via your cCMS. New metadata may be related to content access/permissions, target audience, or product version data.
  • Modifying existing publishing metadata that has changed in your information model. This commonly includes product name, category, or target audience values.
  • Removing any hard-coded links which are now automatically generated by the transform. Section or sub-section introductions are one place where hard-coded links may be in place.
  • Resolving content issues preventing the successful transformation of DITA XML source to your desired output format. Generating output for each map in your cCMS will help you find the problem publications and fix publishing errors.
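Generating output for every map can be scripted if your maps are also available on disk. The following is a rough sketch, assuming the DITA Open Toolkit `dita` command is installed and on the PATH, and that HTML5 is an acceptable test format. Many cCMS products have their own publishing mechanism, so treat the function names and folder layout here as hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(map_path, output_root, transtype="html5"):
    """Build the DITA-OT command line for one map, with its own output folder."""
    map_path = Path(map_path)
    out_dir = Path(output_root) / map_path.stem
    return [
        "dita",
        f"--input={map_path}",
        f"--format={transtype}",
        f"--output={out_dir}",
    ]


def find_failing_maps(content_root, output_root, run=subprocess.run):
    """Publish every .ditamap under content_root and collect the ones that fail.

    Returns a dict mapping each failing map path to the error text captured
    from the build. The `run` parameter exists so the runner can be swapped out.
    """
    failures = {}
    for map_path in sorted(Path(content_root).rglob("*.ditamap")):
        command = build_command(map_path, output_root)
        result = run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures[str(map_path)] = result.stderr.strip()
    return failures
```

Running `find_failing_maps("converted/", "out/")` against a hypothetical export gives a worklist of problem publications, with the captured error text as a starting point for each fix.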

After the second phase of the post-conversion cleanup is complete, you will have a minimum viable product ready to publish to your delivery platform, replacing the legacy version of the publication. Republishing is a major milestone for any organization because it means the content will have a consistent look and feel, which improves your customer experience.

Level three consists of making larger-scale improvements to the content and content structure. The driver for these improvements could be:

  • Alignment with the documentation or enterprise content strategy
  • An enterprise digital transformation initiative
  • Search experience optimization
  • Content improvements to make it more:
    • Topic-based
    • In keeping with the principles of minimalism
    • Reusable
    • Translatable

Changes at this level are the most labor-intensive for writers and will involve writing, rewriting, or reorganizing the content. Some examples of this type of change:

  • Including short descriptions for all topics and maps.
  • Removing transitional language such as “in the next section” or “as outlined in the previous topic.”
  • Examining generic “topic” topics and separating the conceptual content from the procedural content into concept and task topics.
  • Assessing the DITA block and in-line elements for markup that is either missing (i.e., has not been applied) or misapplied (i.e., has been used not to identify the content within it but to achieve a desired formatting in the output).
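Some of these level-three items can be surfaced automatically so writers know where to start, even though the rewriting itself remains manual. The sketch below is one way to flag topics that lack a short description, still use the generic topic type, or contain transitional phrases. It checks only the root topic of each file (not nested topics or maps), and the phrase pattern is an illustrative starting list rather than a complete one.

```python
import re
import xml.etree.ElementTree as ET

TOPIC_TYPES = {"topic", "concept", "task", "reference"}

# Illustrative pattern for phrases such as "in the next section".
TRANSITIONAL = re.compile(
    r"\b(?:in|from) the (?:next|previous|following|preceding) "
    r"(?:section|topic|chapter)\b",
    re.IGNORECASE,
)


def review_topic(xml_data):
    """Return level-three findings for the root topic of one DITA document."""
    root = ET.fromstring(xml_data)

    # A short description may sit directly in the topic or inside <abstract>.
    shortdesc = root.find("shortdesc")
    if shortdesc is None:
        shortdesc = root.find("abstract/shortdesc")
    has_shortdesc = (
        shortdesc is not None and "".join(shortdesc.itertext()).strip() != ""
    )

    # Collapse whitespace so phrases split across lines are still matched.
    text = " ".join(" ".join(root.itertext()).split())
    phrases = [match.group(0).lower() for match in TRANSITIONAL.finditer(text)]

    return {
        "missing_shortdesc": root.tag in TOPIC_TYPES and not has_shortdesc,
        "generic_topic": root.tag == "topic",
        "transitional": phrases,
    }
```

A topic flagged as `generic_topic` still needs a writer to judge whether it mixes conceptual and procedural content; the script only narrows down where to look.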

If your organization has planned or completed post-conversion cleanup for each of these levels, then congratulations are in order! You are in a great position to maximize the advantages of semantic tagging and a consistently applied architecture to meet your organization’s content strategy requirements for today, tomorrow, and the years to come.