Reducing Context in Modular Documents

CIDM

December 2010




Tony Self, HyperWrite Pty. Ltd.

The advent of XML-based documentation architectures such as DITA has made single sourcing more practical to implement. The concept of the separation of content and form is a fundamental principle of DITA and other XML-based approaches, but to maximise content reuse opportunities, context must also be separated from content. In other words, the topics cannot be hard-coded with publication-specific contextual information if they are to be successfully reused in different publications with different purposes.

However, context serves an important role in communication, and removing context will diminish the value and quality of the topic. This contradiction seems to create a dilemma: how can we remove context from modules to permit greater reuse without losing quality in the process? This article argues that by adopting context-agnostic writing techniques for topic-based modular documentation, technical writers can improve content reuse and achieve greater efficiency throughout the technical documentation life cycle without significantly compromising quality. In some cases, context can simply be removed from topic modules, but when it is critical to meaning, it is possible to move context from the topic to a document specification or map. This separation of context from content mirrors the more common separation of content and form fundamental to many XML applications.

Separation of Content, Form, and Context

The use of semantic mark-up in DITA, where text elements are marked up based on their meaning, allows the content to be completely separated from its rendition and display to the reader. For example, a term is marked up as a <term> and a citation as a <cite>, and no information about how those elements will be displayed is stored in the content. Stylistic (display) rules are applied when the DITA content is transformed into a reading format, such as HTML or PDF. In a DITA workflow, documents are created as collections of modular, reusable topic files, and mechanisms allow not only the format but also the context to be separated from the content. The same topic may be a section in the context of one publication but a subsection in the context of another. The intermingling of content, format, and context in a style-based document workflow essentially eliminates the possibility of reuse. Once a paragraph is styled as having a 13 cm left margin, it cannot be used on paper 12 cm wide. A phrase marked up in italic won’t render as italic on a reading device that doesn’t support italic. But a citation identified as a citation in a DITA topic can be rendered in italic by one transformation process, in bold red by a second, and as synthesised speech by a third.
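
As a minimal sketch of the semantic mark-up described above, the following DITA concept topic identifies a term and a citation purely by meaning. The topic id and title are hypothetical, chosen only for illustration; the point is that nothing in the source says how either element should look, so each transformation is free to decide.

```xml
<!-- Hypothetical concept topic: elements record meaning, not appearance -->
<concept id="context_types">
  <title>Types of context</title>
  <conbody>
    <!-- <cite> and <term> carry no styling; HTML, PDF, or voice
         transformations each decide how to render them -->
    <p>In <cite>The Professional Writing Guide</cite>, Petelin and Durham
      distinguish three types of <term>context</term>.</p>
  </conbody>
</concept>
```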

One of the more difficult changes to writing technique when moving from linear to modular writing is the removal of as much context as possible from the text. For example, phrases such as “as shown above” and “in the following diagram” will not be valid if the referenced content is not included in all output publications. Well-written topics (and smaller blocks of text) with minimal context can be reused in many publishing contexts. This approach to writing can be referred to as context-agnostic or context-neutral writing.

Context-Agnosticism: Utility, not Poetry

Petelin & Durham describe three types of context that need to be considered by a writer: “the context of culture, the context of organisations, and the brief context of particular situations”. These three types of context need to be separated from content in the writing workflow for the content to be suitable for reuse in multiple contexts. Separating cultural context might mean avoiding culture-specific terms, such as “bonnet” and “hood”. Separating organisation context might mean avoiding product names. Separating situational context might mean standardising writing style.

Jennifer O’Neill identified the following types of contextual information that need to be minimised in order to produce truly global, modular information:

  • product and company names
  • organisation-specific terminology
  • different ways of writing
  • inconsistent standards application in authoring tools

Writing content that is abstracted from its delivery, presentation, structure, and context may result in a sacrifice of the look-and-feel of the end product. However, compromise in technical documentation, in web design, and in most other fields of communication is normal. Context-agnostic writing is a typical information architecture decision that sacrifices some aesthetics for reusability and efficiency. As Ellen McDaniel put it succinctly in a conference paper on structured authoring in XML, context-neutral authoring is about “utility, not poetry”.

Techniques for Reducing Context

This article suggests that in order to write in a context-agnostic way, the following techniques can be used:

  • removing or minimising context phrasing (for example, “Have your vehicle inspected…”)
  • avoiding terms specific to organisations, industries, geography, and culture
  • using filtering to selectively remove conditional content from the deliverable document
  • removing branding and other context from graphics
  • removing sequence context and enumeration from topics and automatically applying sequence in the publishing process
  • externalising context by storing context in document maps rather than topics, and using devices such as relationship tables, variables, and indirection

Simplistic Removal of Context

Some contextual phrasing, such as product and company names, can be removed from text without significant compromise to its readability.

For example, in the User Guide for (the fictitious) ProductA, the sentence “Use ProductA to create a project file to manage all the files in your system” can be rewritten as “Use this product to create a project file to manage all the files in your system”. The rewritten sentence can then be reused in the User Guide for the similar ProductB.

Likewise, in a car owner’s manual, the sentence “your Subaru Impreza is fitted with a supplemental restraint system” can be rewritten as “your car is fitted with a supplemental restraint system”. This change removes the context binding the sentence to Subaru Impreza cars only, so that the sentence may be reused elsewhere, such as in the owner’s manual for a Saab 9-2X (a rebadged Subaru Impreza WRX). However, more context can be removed from the same sentence. The rewritten sentence mentions “car”, which restricts its use to documents concerning cars. Using “vehicle” instead would broaden its potential use to documents concerning cars, trucks, and boats.

Writers working in a modular writing environment do not need to know where a topic they are writing is intended to be used or may one day be used. In fact, it may benefit the writer not to know what a topic is intended for, making it easier to write without a context. It is also good practice to be as generic as possible. In the example above, a writer knowing that Subaru doesn’t currently make boats or trucks might be tempted to use “car” rather than the more agnostic “vehicle”. However, using the narrower “car” context might close off some future reuse opportunities.

DITA also provides mechanisms, such as variables, for managing terminology like product and company names. Variables permit document-specific terms to be identified in the topics and then substituted with a contextually applicable alternative when the document is published to a deliverable format.
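
In DITA 1.2, variable-like substitution is commonly achieved with keys. The sketch below assumes that approach; the key name product-name and the map file names are hypothetical. The reusable topic holds only a reference to the key, and each publication's map supplies the text, so the ProductA sentence above no longer needs rewriting for ProductB.

```xml
<!-- In the shared topic: no product name is hard-coded.
     An empty <keyword> with a keyref takes its text from the key definition. -->
<p>Use <keyword keyref="product-name"/> to create a project file to manage
  all the files in your system.</p>

<!-- In the hypothetical productA.ditamap -->
<keydef keys="product-name">
  <topicmeta>
    <keywords>
      <keyword>ProductA</keyword>
    </keywords>
  </topicmeta>
</keydef>

<!-- In the hypothetical productB.ditamap: same key, different text -->
<keydef keys="product-name">
  <topicmeta>
    <keywords>
      <keyword>ProductB</keyword>
    </keywords>
  </topicmeta>
</keydef>
```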

Supra-Organisation Specific Terminology

Although not mentioned by Jennifer O’Neill (STC, 2002) in the list of types of contextual information, supra-organisation specific terminology is also a common type of contextual information that would need to be removed to maximise reuse potential. For example, referring to tax authorities using the US term “IRS” binds the use of the text to a United States context (US Government Internal Revenue Service).

The following text (from a Subaru Impreza MY06 Manual) is largely context-agnostic.

Your vehicle is equipped with a seatbelt warning device at the driver’s seat, as required by current safety standards. There is a seatbelt warning light in the combination meter.

The authors have correctly used “your vehicle”, instead of “your Impreza” or “your Subaru”. Rather than quoting a specific law or safety regulation, the authors have chosen the neutral phrase “current safety standards”. Using more specific (and contextual) language such as “as required by Australian Design Rules” would require the text to be rewritten and replaced for every market other than Australia. It might also lead to unnecessary research, such as determining what safety standards are used in Papua New Guinea for the manuals delivered to that country. The reader of the manual, in this case the driver of a Subaru Impreza, just needs to know that the car is fitted with a seatbelt warning device. It could be argued that the phrase “as required by current safety standards” could be removed entirely.

Compare that context-neutral approach with the following paragraph, which can only be understood in the context of a US-based reader:

Your vehicle is equipped with a Subaru advanced frontal airbag system that complies with the new advanced frontal airbag requirements in the amended Federal Motor Vehicle Safety Standard (FMVSS) No. 208.

To a driver in Malaysia, a reference to a (presumably) US Federal standard is irrelevant at best and confusing at worst. Further, the use of the word “new” restricts the use of the sentence to a time frame; in one year’s time, FMVSS 208 will surely no longer be new!

Use of terms specific to a culture or a geographical audience is another example of context-heavy writing. The term “trunk lid” might be understood by North American English-speaking audiences but is not readily understood by many other English-speaking audiences, who know the component as a “boot lid”. Using a variable for the term, using an alternative neutral term (if possible), or including both terms are some methods to dilute the context.

Filtering Out Context

Conditionalising content helps avoid another obstacle to reuse. Rather than writing complicated text blocks that list the conditions under which the content should be used, writers can apply conditions to the text and then use filtering to selectively remove text from the outputs for a particular market, audience, or publication.

In the following example, the measurements have been provided in two different units, so that the text can be understood both by readers using Imperial units and by readers using the metric system. The text has therefore not been written solely for an audience that uses the metric system.

The extender adds approximately 8 inches (200 mm) of length and it can be used for either the driver or front passenger seating position.

It might appear that the best outcome is always to include both measurements in the text, but that is not necessarily the case. DITA and other semantic XML approaches incorporate conditional filters that can be used to exclude the non-preferred unit from the output for a particular market. However, even in a country using the metric system, there will be drivers who better understand the Imperial system.
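
A sketch of how such a filter might be set up, assuming the DITA otherprops filtering attribute with hypothetical values imperial and metric: the source keeps both measurements, each flagged with a condition, and a DITAVAL file used for one market's build excludes the unwanted unit.

```xml
<!-- Topic source: both units present, each flagged with a condition -->
<p>The extender adds approximately
  <ph otherprops="imperial">8 inches</ph><ph otherprops="metric">200 mm</ph>
  of length and it can be used for either the driver or front passenger
  seating position.</p>

<!-- Hypothetical metric-market.ditaval: removes Imperial units from this build -->
<val>
  <prop att="otherprops" val="imperial" action="exclude"/>
</val>
```

Because the two phrases sit side by side, this design assumes every build excludes exactly one of the two values; an unfiltered build would print both units run together.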

Filtering can also enhance readability by removing clutter.

Removing Context in Graphics

Illustrations, photographs, images, and other graphic devices within documents are themselves innately modular, in that they are stored as separate files and can be used in different contexts in different documents. However, the content of graphics can bind them to context. For example, a photograph with a figure title superimposed on the image can only be reused if the figure title is relevant in the different reuse context. A photograph with English language call-outs can only be used in an English language document.

Care taken in the creation of images can maximise their reuse potential. The omission of branding from graphics will permit greater reuse. A photo identifying the radiator grille of a car is not significantly enhanced by the inclusion of the car maker’s bonnet badge, but including the badge in the photo will render it unsuitable for illustrating any document other than one concerning that car brand.

Removing Sequence

Use of terms that denote sequence is another form of context that must be removed for a text to be context-agnostic. Adjectives such as “previous”, “next”, “earlier”, “aforementioned”, and “later”, as in “more information on the product limitations is discussed in later chapters”, should therefore be avoided. Including such restrictive references introduces the risk that the blocks of content won’t appear in the same sequence in all output publications, making the adjective incorrect and misleading. (It is also possible that the referenced blocks might not be included in all output publications.)

While adjectives such as “following” or “preceding” do apply a context, that context will not usually present a problem if the referenced element is within the same information chunk (a chunk is a block of text describing one idea or step). For example, if a lead-in sentence will only ever be reused together with the larger block that contains it, the context is confined within the chunk (intra-chunk). Context-agnostic writing only needs to minimise extra-chunk context, because that type of context limits reuse.

An example of intra-chunk context is:

Operation is subject to the following two conditions: (1) This device may not cause harmful interference, and (2) this device must accept any interference received, including interference that may cause undesired operation.

In a semantic mark-up language such as DITA, the mark-up itself can be used to bind the lead-in sentence to the items it introduces, as in the example:

<p>Operation is subject to the following two conditions:
  <ol>
    <li>This device may not cause harmful interference, and</li>
    <li>this device must accept any interference received, including
      interference that may cause undesired operation.</li>
  </ol>
</p>

The paragraph element (<p>) contains both the word “following” and the items that follow. Whenever the paragraph is reused, the items will be included in the reuse.

There are, however, some conflicting requirements when the content needs to be localised. Including a list block (<ol>) nested inside a paragraph, such as in the example above, will create some problems for translators. This creates a dilemma that eventually might be solved by advancements in software tools. For the moment, authors should be aware of the issue, and at a minimum, should avoid including text after a nested block inside another block element. The guidelines contained in the Best Practice for Leveraging Legacy Translation Memory when Migrating to DITA whitepaper produced by the OASIS DITA Translation Subcommittee should be followed when writing for translation.
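
One possible way to keep the lead-in bound to its list without nesting the list inside a paragraph is to group both in a <div>, an element available in DITA 1.2 topics. This is a sketch only, not a recommendation taken from the OASIS whitepaper, and the id value is hypothetical; check that guidance and your translation tools before adopting it.

```xml
<!-- The div, not the paragraph, binds the lead-in to its list.
     <p> and <ol> are siblings, so no text follows a nested block. -->
<div id="operating_conditions">
  <p>Operation is subject to the following two conditions:</p>
  <ol>
    <li>This device may not cause harmful interference, and</li>
    <li>this device must accept any interference received, including
      interference that may cause undesired operation.</li>
  </ol>
</div>
```

Reusing the div (for example, by content reference to its id) still carries the lead-in and its items together.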

Cross References

Although cross-references might be technically easy for a writer to create, they are heavily burdened with context.

Cross-references to other parts of a publication are context-specific because to make sense they require a pre-condition that the referenced target (in the example below, the “Activating…” procedure) is included in the output publication.

To exit valet mode, change the setting of your vehicle’s alarm system for activation mode. (Refer to “Activating and deactivating the alarm system” in this section.)

Cross-references are, however, often critical to the navigation logic and the understandability of the text. Rather than remove this context completely, the best approach is to move the context from the topic to the document map.

The document map (in DITA, it is known as the ditamap) is a manifest listing the topic modules that are to be included in a deliverable document and the hierarchy and sequence in which they will appear. A ditamap is therefore specific to a publication, while topics might be reused in many publications.
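
The sketch below shows two hypothetical ditamaps that use the same topic at different levels of the hierarchy. The map titles and file names are assumptions for illustration; the seatbelt topic file itself is identical in both publications, and its position and sequence come entirely from the map.

```xml
<!-- Hypothetical owners_manual.ditamap: the topic is a top-level section -->
<map>
  <title>Owner's Manual</title>
  <topicref href="seatbelt_warning.dita"/>
  <topicref href="airbags.dita"/>
</map>

<!-- Hypothetical safety_quick_guide.ditamap: the same topic is a subsection -->
<map>
  <title>Safety Quick Guide</title>
  <topicref href="safety_overview.dita">
    <topicref href="seatbelt_warning.dita"/>
    <topicref href="airbags.dita"/>
  </topicref>
</map>
```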

To allow link context to be shifted from the context-agnostic topic to the context-specific map, DITA includes a relationship table, defined in the ditamap. The relationship table defines the linking relationship between topics in the publication. Because the ditamap defines the collection of topics to be delivered as a publication, this is a more logical place for links between topics in the collection to be defined. When the ditamap is processed to create the deliverable document, the relationships in the relationship table are translated into cross-references that are added at the end of the topic in the output.
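
A minimal relationship table might look like the following sketch. The file names are hypothetical stand-ins for the valet-mode and alarm-activation topics in the example above. Topics in different cells of the same row are linked to each other when the map is processed, so neither topic needs to contain the “Refer to…” sentence.

```xml
<map>
  <title>Owner's Manual</title>
  <topicref href="valet_mode.dita"/>
  <topicref href="alarm_activation.dita"/>
  <!-- The link is defined here, in the publication-specific map,
       rather than hard-coded in either topic -->
  <reltable>
    <relrow>
      <relcell><topicref href="valet_mode.dita"/></relcell>
      <relcell><topicref href="alarm_activation.dita"/></relcell>
    </relrow>
  </reltable>
</map>
```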

Externalising Other Content

XML-based modular writing systems offer other methods to externalise context or move the context from topics to the map manifest. One such method is the keyref feature offered in DITA 1.2.

The concept of keyref is that references to other resources are made with one level of abstraction; the link in the topic refers to a key, and a matching key in the ditamap refers to the resource. This concept is known as indirection.

For example, rather than a cross-reference in a paragraph directly referring to another topic, such as in <xref href="wrx_specs.dita">, the cross-reference would refer to a key, as in <xref keyref="model_specs">. When a direct reference (in the example, to the Subaru WRX model specifications in wrx_specs.dita) is used, the paragraph can only be reused in a WRX publication. However, when an indirect reference is used, the actual target is defined in the ditamap file that specifies the publication. The code in the WRX ditamap might be <topicref keys="model_specs" href="wrx_specs.dita">, while in the ditamap used for the Saab 9-2X, the code might be <topicref keys="model_specs" href="9_2_specs.dita">. Using this indirection technique, the cross-reference in a single topic can refer to entirely different target topics depending on the ditamap in which the topic is used. Indirection can be used for any addressable document component: links, cross-references, glossary definitions, images, and content snippets.
Indirection effectively moves context from the content topic to the document specification (map) level.
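
Putting the fragments from the keyref example together, a sketch of the whole arrangement might look like this. The key model_specs and the files wrx_specs.dita and 9_2_specs.dita come from the example; the shared topic safety_features.dita, its paragraph, and the map titles are hypothetical.

```xml
<!-- Hypothetical shared topic safety_features.dita: links to a key, not a file.
     An empty xref takes its link text from the target topic's title. -->
<p>For dimensions and weights, see <xref keyref="model_specs"/>.</p>

<!-- WRX ditamap: the key resolves to the WRX specifications -->
<map>
  <title>Subaru WRX Owner's Manual</title>
  <topicref href="safety_features.dita"/>
  <topicref keys="model_specs" href="wrx_specs.dita"/>
</map>

<!-- Saab ditamap: the same key resolves to a different topic -->
<map>
  <title>Saab 9-2X Owner's Manual</title>
  <topicref href="safety_features.dita"/>
  <topicref keys="model_specs" href="9_2_specs.dita"/>
</map>
```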

Conclusion

The change in the technical communication field from document-centric writing to topic-based writing, made practical by the development of XML-based documentation architectures such as DITA, will necessitate a change in writing approach to separate content, form, and context. The modular DITA architecture in particular provides for the creation of independent content topics that are assembled into deliverable publications through document maps. While semantic markup allows content to be separated from form, features in the document map allow some types of context to be moved from the topic level to the map level.

Adopting context-agnostic writing techniques for topic-based modular documentation will result in greater content reuse opportunities, leading to improved efficiency throughout the technical documentation life cycle. Those writing techniques include:

  • removing or minimising context phrasing
  • avoiding supra-organisational specific terms
  • using filtering to selectively remove conditional content
  • removing branding and other context from graphics
  • removing sequence context from topics and automatically applying sequence in the publishing process
  • externalising cross-references through relationship tables, variables, and indirection

Context-agnostic writing does not mean the removal of context, but the separation of context from content and form. While it is not known whether context-agnostic writing will affect the quality of communication by leading to an inferior, superior, or equivalent experience for the reader, the concept of reintroducing context customised to the requirements of the individual reader, at the point of document delivery, holds much promise.

Tony Self

HyperWrite Pty. Ltd.

tony.self@hyperwrite.com

Based in Australia, Tony Self has been involved in documentation for 30 years. In 1993, Tony founded HyperWrite, a consultancy company specialising in hypertext and emerging documentation technologies. Tony also lectures in technical communication and journalism at Swinburne University. He is chair of the OASIS DITA Help Subcommittee and developer of the open source WinANT Echidna DITA publishing utility.

REFERENCES

JoAnn T. Hackos. Managing Your Documentation Projects. New York, NY: John Wiley & Sons, 1994. ISBN: 0471590991.

Ellen McDaniel. “Consider the Source: Structured authoring for XML-based documentation”. Raleigh, NC: University of North Carolina, 2005. <http://www.unc.edu/cause05/presentations/mcdaniel/mcdaniel.pdf>

Jennifer O’Neill. “A Global Style Guide: Working together around the world”. STC Conference Proceedings, 2002. <http://www.stc.org/confproceed/2002/PDFs/STC49-00024.pdf>

Roslyn Petelin and Marsha Durham. The Professional Writing Guide. Warriewood, Australia: Business and Professional Publishing, 1992. ISBN: 0582871816.

Gershon Joseph and Rodolfo Raya. “Best Practice for Leveraging Legacy Translation Memory when Migrating to DITA”. OASIS, 2007. <http://www.oasis-open.org/home/index.php>