Automated quality checking of technical information: Beyond spelling and grammar assistants

Dustin Clark, Citrix Systems & Ben Colborn, Nutanix

“[The Machine] is a universal educator, surely raising the level of human intelligence. … Every age has done its work … with the best tools or contrivances it knew, the tools most successful in saving the most precious thing in the world—human effort.”
—Frank Lloyd Wright, “The Art and Craft of the Machine”

Quality assurance (QA), in its common usage, is often treated as a single review or checkpoint reached at the end of a long authoring journey, as if it were a barrier to pass through once a deliverable is complete. By that logic, a QA review is a one-time event at a prescribed point in the document lifecycle.

Truly effective QA, the kind that is both efficient and thorough, is a process. It starts early in the design phase and recurs at regular intervals as the project moves along.

Developing Quality Technical Information (Hargis et al.) provides a framework for the QA of technical information throughout the development process. By delineating quality characteristics, Hargis et al. offer guidelines for matching the appropriate human reviewers to the right set of reviews. Automated review tools play a minor but crucial role in this system as spelling, grammar, and link checkers, tools that perform a focused set of specialized functions. That focus has limits. For example, a grammar or spelling checker will not flag an error with “connecting physical stores to virtual sores” [sic] even though a human would easily identify that “sores” should be “stores.”

The basic principle of automated QA, or more generally of reviewing content without human intervention, is a strict division of labor between people and computers. It is no longer common to have full-time editors in information development departments, so it becomes critical to identify the aspects of the editor’s role that can be expressed precisely and are therefore good candidates for automation.

Technical terminology is one area. For example, the Microsoft Manual of Style specifies using “click” alone rather than “click on” (p. 264). Finding every instance of “click on” in hundreds of pages of content is trivial for a computer, but arduous for a human. Even the best editor is likely to miss occurrences. When there are numerous such terminology rules to be kept in mind at all times, the likelihood of a human detecting all violations is low.
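
As a rough illustration of how little machinery such a check needs, the following Python sketch scans text for the “click on” rule mentioned above. It is a minimal sketch, not a description of any particular tool: the names TERMINOLOGY_RULES and find_violations are hypothetical, and only the “click on” rule itself comes from the style guide cited here.

```python
import re

# Hypothetical rule table: regular expression -> advice shown to the author.
# Only the "click on" rule comes from the text; the structure is illustrative.
TERMINOLOGY_RULES = {
    r"\bclick\s+on\b": 'use "click" alone',
}


def find_violations(text, rules=TERMINOLOGY_RULES):
    """Return (line_number, matched_text, advice) for every rule match."""
    violations = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for pattern, advice in rules.items():
            for match in re.finditer(pattern, line, flags=re.IGNORECASE):
                violations.append((line_number, match.group(0), advice))
    return violations


sample = "Click on Save.\nThen click OK.\nFinally, click  on Close."
for line_number, found, advice in find_violations(sample):
    print(f"line {line_number}: found {found!r}; {advice}")
```

Because the sketch checks one line at a time, it would miss a phrase that is split across a line break. A production check would need to account for that, as well as for markup inside the phrase.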

In environments where the publication requirements are particularly rigorous, automated QA can also flag problems with:

  • Tagging structures
  • Standard phrasing
  • Spelling
  • Formatting (in DTP environments)

The quality checklists outlined by Hargis et al. (pp. 388-395) contain other items that conceptually could be candidates for automation, especially in the “Style” category:

  • Tasks are divided into concrete subtasks
  • The style is active
  • Boilerplate text is implemented appropriately
  • Style guidelines are followed

High-end automated QA systems like Acrolinx can perform sophisticated grammatical and style checking. But even with a home-grown system using free or low-cost tools, certain grammatical structures, such as passive voice, can be detected with reasonable accuracy. Freely available tools, such as the DITA QA Plug-in and After the Deadline, offer a framework to tailor terminology matching and basic language metrics to custom deployments.
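
To give a sense of what “reasonable accuracy” can mean in a home-grown system, here is a minimal Python sketch of a passive voice heuristic: a form of “to be”, an optional adverb, and a word that looks like a past participle. This is an assumption about one simple approach, not the method used by Acrolinx, the DITA QA Plug-in, or After the Deadline. The irregular participle list is a small hypothetical sample, and flag_passive is an illustrative name.

```python
import re

# Forms of "to be" that can introduce a passive construction.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"

# Small, hypothetical sample of irregular past participles; a real list
# would be much longer.
IRREGULAR = r"(?:built|done|found|given|known|made|run|seen|sent|shown|taken|written)"

# "to be" + optional -ly adverb + regular (-ed) or listed irregular participle.
PASSIVE = re.compile(
    rf"\b{BE_FORMS}\s+(?:\w+ly\s+)?(?:\w+ed|{IRREGULAR})\b",
    re.IGNORECASE,
)


def flag_passive(sentence):
    """Return the phrases in a sentence that look like passive voice."""
    return [match.group(0) for match in PASSIVE.finditer(sentence)]


for sentence in (
    "The file is saved by the system.",
    "The report was quickly written.",
    "The system saves the file.",
):
    print(sentence, "->", flag_passive(sentence))
```

A heuristic like this produces false positives, for example on adjectives that end in “-ed” after a form of “to be”, and it misses participles that are not in the list. That trade-off is why the results are best treated as prompts for a human decision rather than as errors.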

While it may be tempting to implement checks for an extensive list of terminology rules, the rules should be limited to those that are actually relevant in your environment and problematic for authors to follow consistently. Implementing specific rules and avoiding bloated terminology lists will help minimize irrelevant feedback and false positives. However many rules there are, each one should be as specific as possible. Specificity requires more complexity and attention at the development stage, but consistently saves rework later during the review process.
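
The cost of an unspecific rule can be shown with the “click on” example. In the Python sketch below, a broad pattern also flags “click on-screen controls”, where “on” belongs to a compound modifier and the rule does not apply. A slightly more specific pattern avoids that false positive. The sample sentences and the names BROAD, SPECIFIC, and matches are hypothetical, chosen only for illustration.

```python
import re

# Broad rule: any occurrence of "click on".
BROAD = re.compile(r"\bclick on\b", re.IGNORECASE)

# More specific rule: skip cases where "on" starts a hyphenated compound
# such as "on-screen". The negative lookahead (?!-) rejects a following hyphen.
SPECIFIC = re.compile(r"\bclick on\b(?!-)", re.IGNORECASE)

sentences = [
    "Click on the Save button.",
    "Click on-screen controls to continue.",
]


def matches(rule, lines):
    """Return the lines that a compiled rule flags."""
    return [line for line in lines if rule.search(line)]


print("broad:   ", matches(BROAD, sentences))
print("specific:", matches(SPECIFIC, sentences))
```

The specific rule takes one more construct to write and test, but every false positive it prevents is a flag that no author has to read and dismiss.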

Automated QA of content is also less sensitive to the schedule of editors, who may be shared resources available one week but not the next. Running automated QA is low cost, so it can be done throughout the development process. It frees writers and editors from the taxing and unreliable task of checking content against scores or hundreds of rules. The goal is not to remove human attention from the information development process, but to enable authors and editors to focus on more important questions: Is this correct? Is this clear? Is this useful?

Works Cited
Gretchen Hargis, Michelle Carey, Ann Kilty Hernandez, Polly Hughes, Deirdre Longo, Shannon Rouiller, Elizabeth Wilde
Developing Quality Technical Information
2004, Upper Saddle River, NJ
Pearson plc as IBM Press
ISBN: 0131477498

Microsoft Corporation
Microsoft Manual of Style, 4th Ed.
2012, Redmond, WA
Microsoft Press
ISBN: 9780735648715