Ensuring Quality Coding in Your Data Source


April 2007

Carolyn Henry, IBM Corporation

Making the transition from your former authoring environment to DITA has been challenging but worthwhile, and you’re starting to see the results of authoring your information in a well-structured, open-standard XML format. Your writers are well on their way to moving their content to DITA, to mastering a new XML standard, and to producing high quality topic-based writing (they’ve even recovered from several rounds of edits and have delivered polished information to customers). You are confident about the quality of your information. However, are you confident about the quality of your DITA coding? To capitalize on filtering enhancements and automation, and to someday produce customized information units for your customers, you must ensure the quality of your DITA coding.

To ensure that the quality of your DITA coding is acceptably high, you can review your DITA source. This type of review is referred to as a DITA code review, and it ideally occurs early in the documentation cycle. DITA code reviews are an excellent way to ensure that writers are tagging their files correctly and consistently. Strong semantic tagging within your files is extremely important to the quality of your DITA documentation and to its future agility. Adopting DITA code reviews will also help facilitate yet another necessary paradigm shift for writers: the shift from tagging words to produce the correct format and output to tagging elements according to their semantic meaning, that is, according to what they actually are. For example, this paradigm shift requires that writers not simply place a bold tag around a user interface control but instead use the DITA <uicontrol> tag. Bold text, in and of itself, has no semantic meaning. It’s merely a way to manipulate the appearance of text. The <uicontrol> tag, on the other hand, defines a word or phrase as a particular type of “thing.” Words and phrases that are tagged with the <uicontrol> tag can appear in bold font, but they will also carry the semantic meaning that is an essential part of authoring in DITA. Many teams at the IBM Silicon Valley Lab have adopted a DITA code review process and have seen improvements in the DITA source and in the coding skills of their writers.
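To make the contrast between format tagging and semantic tagging concrete, here is a minimal Python sketch. It parses two versions of the same sentence, one tagged with <b> and one tagged with <uicontrol>, and tries to list the user interface controls in each. The sample markup and the function name find_ui_controls are illustrative assumptions, not part of any IBM tooling.

```python
import xml.etree.ElementTree as ET

# Two ways to tag the same sentence (sample markup, for illustration only).
FORMAT_TAGGED = "<cmd>Click <b>Save</b> to keep your changes.</cmd>"
SEMANTIC_TAGGED = "<cmd>Click <uicontrol>Save</uicontrol> to keep your changes.</cmd>"


def find_ui_controls(xml_text):
    """Return the text of every <uicontrol> element in the fragment."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("uicontrol")]


# The bold version gives a tool nothing to search for.
print(find_ui_controls(FORMAT_TAGGED))
# The semantic version lets a tool find the control by its element name.
print(find_ui_controls(SEMANTIC_TAGGED))
```

Both versions might look identical in the output, but only the second one tells a program which words name a user interface control.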

Here are some of the compelling reasons to focus on strong semantic tagging:

  • Consistency is enforced across your documentation teams, which consequently results in increased customer confidence in your information and the ability of your documentation teams to share common content.
  • Automated tools can exploit semantic tagging to produce a system action. For example, a system could retrieve a message and activate a particular troubleshooting sequence that is tied to the tagging for that message, or a system could activate its self-repair mode based on particular tags that it reads in a message topic.
  • Dynamic retrieval of information based on tagging and metadata is right around the corner. Documentation teams can do more interesting things with the code, such as filtering and offering customized information sets, based on user roles or other criteria.
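As a rough illustration of the filtering idea in the last point, the following Python sketch keeps only the paragraphs that apply to one user role. It relies on the DITA audience attribute, which holds space-separated values. The sample topic body, the role names, and the function filter_for_audience are assumptions made for this example. In production, filtering is normally handled by your DITA processor rather than by a script like this one.

```python
import xml.etree.ElementTree as ET

# Sample concept body (illustrative content and role names).
SAMPLE_TOPIC = """
<conbody>
  <p>Everyone sees this paragraph.</p>
  <p audience="administrator">Only administrators see this paragraph.</p>
  <p audience="novice">Only novice users see this paragraph.</p>
</conbody>
"""


def filter_for_audience(xml_text, audience):
    """Return the text of the top-level children that apply to the given role.

    Elements without an audience attribute apply to everyone. This sketch
    only looks at direct children of the root element.
    """
    root = ET.fromstring(xml_text)
    for child in list(root):
        roles = child.get("audience", "").split()
        if roles and audience not in roles:
            root.remove(child)
    return [child.text for child in root]


print(filter_for_audience(SAMPLE_TOPIC, "administrator"))
```

The filter works only because the writer recorded who each paragraph is for. Content that carries no such metadata cannot be tailored this way.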

What is a DITA code review?

A DITA code review is a process in which a writer’s DITA source files are examined to ensure that the tags that they have used are correct and appropriate. Ideally, the editor, writer, and a local DITA “expert” from your team participate in a session where they review a sampling of the DITA topics. A DITA expert can be anyone who has shown an interest in DITA and has kept up with DITA best practices, a writer who has a significant amount of experience working in DITA, or someone in your group who offers to dedicate some time to learn and teach strong DITA coding practices.

A DITA code review is not meant to be a comprehensive review of every single topic that has been written, but rather a review of a representative subset of topics. The editor and writer can agree upon a set of five to ten topics to review. The selection should include a mixture of concept, task, and reference topics, as well as any specialized topics that might present some interesting code challenges. You should aim to include more task topics in the sample group because, from a tagging standpoint, they tend to be more challenging.

The editor, writer, and DITA expert then spend 30 minutes to an hour reviewing the code in the topics, making suggestions, and discussing the approaches that were used. Writers might have specific reasons for coding a section a particular way and should be given the opportunity to discuss their approach. It is also helpful if the editor who is involved in the code review has some knowledge of DITA and has spent some time learning the appropriate and available tagging options. The DITA code review is meant to be a learning experience for all and should not be considered an exercise where “Big Brother” is watching over the writer’s shoulder. Improving the quality of the coding and exploiting the capabilities that strong coding provides should be the focus. Writers are then expected to apply what they have learned in the DITA code review to the remaining topics in their information unit and to their future writing.

How to successfully implement a DITA code review

There are certain things that can help you successfully implement a code review process within your teams. Keep in mind that the DITA code review:

  • should be built into the documentation schedule to allow sufficient time for writers to learn and apply the techniques that they are shown.
  • should be completed early in the cycle, after writers have written their first several topics. Having the review early enables you to identify incorrect tagging habits quickly and eliminates time spent rewriting code.
  • should not be intimidating. This process should be helpful and educational for everyone.
  • will get easier as the team learns proper tagging technique. Most tagging mistakes are not intentional. For example, writers might not be aware of all of the situations where they should use the <uicontrol> tag instead of the bold phrase tag.

What to look for during a DITA code review

Writers tend to make certain types of errors when they begin authoring in DITA. Your teams can learn to easily identify those errors and be on the alert for them during DITA code reviews. Typical errors found in DITA code reviews include the following:

  • Placement of index entries
    • Some teams place index entries within the paragraph in which the term appears, instead of within the metadata tags in the prolog of a topic. Depending on your team’s approach, you will want to look out for inconsistencies. There are arguments for either approach.
  • Use of ordered list tags in the context element instead of steps tags
  • Lists of parameters in unordered list tags instead of in a parameter list tag
  • Use of an unordered list with bold headings instead of a definition list
  • Use of the information tag in steps instead of the example tag. Writers might place example information for a step in the <info> tag, not knowing that they should use the more appropriate <stepexample> tag.
  • Omission of the <menucascade> tag. Writers might use a series of <uicontrol> tags with hard-coded arrows (→) instead of the <menucascade> tag.
  • Incorrect highlighting
    • Use of the bold phrase tag instead of the appropriate tag. Writers often either do not tag user interface elements or use the bold phrase tag to flag these elements. For example, writers tend to use bold phrase tags for GUI controls, command names, and parameter names when there are specific tags provided for these elements.
    • Use of the <filepath> tag for variables and terms. Writers should focus not on the output (whether a term is bold or monospaced) but on using the correct tag. Writers should be aware of the different tags available to them, such as the <varname> and <term> tags.
  • Omission of the <wintitle> tag. Some writers are not aware of the tag for window titles or how to use it properly.
  • Use of ordered lists or unordered lists instead of substeps or choice elements. If steps are truly a sequence of actions that must be performed in a specific order, writers should use step tags instead of list tags.
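Several of the errors listed above follow patterns that a script can spot before the review session. The following Python sketch is one possible approach, not an established tool. The sample task, the function name review_topic, and the wording of the messages are all assumptions made for this example. Each finding is only a prompt for the reviewers, who still decide whether the tagging is actually wrong.

```python
import xml.etree.ElementTree as ET

# Sample task with three of the typical errors (illustrative markup).
SAMPLE_TASK = """
<task id="save_file">
  <title>Saving a file</title>
  <taskbody>
    <context>
      <ol>
        <li>Click <uicontrol>File</uicontrol> → <uicontrol>Save As</uicontrol>.</li>
        <li>Type a name and click <b>OK</b>.</li>
      </ol>
    </context>
  </taskbody>
</task>
"""


def review_topic(xml_text):
    """Return (tag, message) findings for reviewers to examine."""
    root = ET.fromstring(xml_text)
    findings = []
    # ElementTree has no parent pointers, so visit each parent and its children.
    for parent in root.iter():
        for child in parent:
            if parent.tag == "context" and child.tag == "ol":
                findings.append(("ol", "ordered list in <context>; consider <steps>"))
            if child.tag == "b":
                findings.append(("b", "bold phrase; is there a semantic tag?"))
            if child.tag == "uicontrol" and "→" in (child.tail or ""):
                findings.append(("uicontrol", "hard-coded arrow; consider <menucascade>"))
            if parent.tag == "step" and child.tag == "info":
                findings.append(("info", "<info> in a step; should it be <stepexample>?"))
    return findings


for tag, message in review_topic(SAMPLE_TASK):
    print(f"{tag}: {message}")
```

For the sample task, the sketch reports the ordered list inside the context element, the hard-coded arrow after the first <uicontrol> tag, and the bold phrase around the button name.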

Using a special .css file to identify possible problems

You can develop specific .css files to aid the code review process. Some teams have developed special .css files that flag certain commonly misused tags with different highlighting. For example, a code review .css file, or a “find bad tags” .css file, can render all bold phrase tags in the DITA source code as bold red font in the output. You can also set up the .css file to highlight italic tags, ordered and unordered list tags, and information tags, just to name a few. You can customize the .css file that your team uses to flag just about any tagging that you want your writers to notice. In the following graphic, you can see how one team has set up their .css file so that the HTML output shows all bold tags flagged as large, red text (Figure 1).


Writers can transform their topics with this special .css file in preparation for a code review. The editor, writer, and DITA expert can use the XHTML output to quickly identify possible trouble areas. Keep in mind that this .css file will flag possible errors, but you must still analyze the tagging to determine if the writer has tagged an element incorrectly. For example, in some cases the bold phrase tag is appropriate.
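A team that wants to maintain its flagging rules in one place could generate the stylesheet from a small table of rules. The following Python sketch shows one way to do that. The selectors and styles in REVIEW_STYLES are assumptions chosen for illustration, and they are not the rules from the team's stylesheet described above. Inspect the XHTML that your own transform produces and adjust the selectors to match the elements and class names that it actually emits.

```python
# Map output selectors to the highlighting that reviewers should see.
# These selectors are assumptions; match them to your transform's real output.
REVIEW_STYLES = {
    "b, strong": {"color": "red", "font-weight": "bold", "font-size": "150%"},
    "i, em": {"background-color": "yellow"},
    "ol, ul": {"border-left": "3px solid orange"},
}


def build_review_css(styles):
    """Build the text of a stylesheet from a selector-to-declarations mapping."""
    rules = []
    for selector, declarations in styles.items():
        body = "\n".join(
            f"  {prop}: {value};" for prop, value in declarations.items()
        )
        rules.append(f"{selector} {{\n{body}\n}}")
    return "\n\n".join(rules) + "\n"


print(build_review_css(REVIEW_STYLES))
```

You can save the printed text as the review .css file and add a rule whenever the team agrees on another tag to watch.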

Table 1 describes some highlighting options that you can set up in a simple “find bad tags” .css file.


Summary of the DITA code review process

  1. You add time for DITA code reviews to the documentation schedule.
  2. Writer and editor work together to identify a representative sample of topics (several of each type up to ten total).
  3. Writer applies the special CSS to check for improper tags.
  4. Editor, writer, and DITA expert review the tagging in the DITA source.
  5. Editor and DITA expert provide input to the writer on DITA coding best practices.
  6. Writer incorporates edits into the topics that were reviewed and applies new tagging techniques to all future topics.
  7. Result: The quality of your DITA tagging improves.

The DITA code review is an excellent process to ensure the quality and consistency of the DITA coding across your documentation group and organization. The many documentation options that are available with strong semantic coding are too valuable to ignore. In order to move your documentation to the next level and to satisfy the sophisticated needs of your customers, you must take advantage of the options that the coding provides.

About the Author

Carolyn Henry
IBM Corporation

Carolyn Henry is an Information Developer in the Information Management division of the IBM Software Group. She is responsible for delivering Linux, UNIX, Windows, and z/OS product information for DB2 and IMS Tools. She has been authoring in DITA since December 2003. She holds a Master’s in Technical and Professional Writing from Northeastern University and a Bachelor of Arts in English from Connecticut College. Carolyn leads a DITA Advocates group within IBM as well as the Silicon Valley DITA Advocates Special Interest Group (DITA SIG).