Communicating Revision Changes in a High Reuse DITA Environment
Despite all the progress we’ve made solving key business workflow issues by implementing a DITA-based documentation system, communicating and indicating changes between two releases of a document rendition is still a challenge that is largely unaddressed by the open DITA community.
The Technical Communication Subcommittee of the OASIS DITA Technical Committee has taken the lead in crafting technical solutions to address these needs, even though this issue impacts nearly all DITA users. The result of this work may be implemented in the DITA 1.3 standard.
What Do Our Readers Need?
A reader who is new to a document does not typically need (or want) any information regarding what has changed since the last release of a document. Change information is primarily aimed at:
- Readers familiar with the previous release
- Internal reviewers who want to focus on iterative changes
- Industries or customers with strict change control requirements for their suppliers
Readers (“customers”) in my experience tend to ask for detailed revision histories, while internal reviewers are more inclined to ask for “change bars.” I suspect customers want to understand how changes may impact their understanding, while reviewers want to make sure errors were not introduced.
With change bars, the reader needs to traverse the entire rendition, looking for change bars. Then, she must orient herself with the context, determine what she might care about, possibly look back at the previous version (assuming she has it squirreled away), and determine whether the change impacts her in some way. In my opinion, most readers will be better served by a reliable revision history that can provide all of the information they need in one section. If a reader must consult the changed section for details, the revision history can link to it. There is also no reason the two approaches can’t be used in tandem.
Changes in Files vs. Changes in Content
The added complexity of communicating change in a topic-based architecture tends to lead us toward systems that rely on difference reporting (or “diffs”). In a text-based file format, diff reporting is relatively simple, and XML-aware diff algorithms can ignore potential noise (whitespace, attributes, and so on). The problem with diffs is that they fail to answer the question, “What is the impact of this change from my reader’s perspective?” Editing a paragraph for punctuation is flagged the same way as a change that has significant implications. Consequently, the move toward file diffs discourages content quality improvement, since even cosmetic edits are reported as changes.
There is no shortage of ideas for how to embed markup (be it namespaced elements or XML processing instructions) to collect all kinds of metadata that make diffs more meaningful. Some tools have developed extremely robust methods for showing changes, such as the formatting and redlining that we expect from Microsoft Word. All of these efforts move the focus from file diff reporting to content diff reporting, but they face some inherent obstacles:
- How do you indicate a change in a non-DITA asset (image, video, or an asset not available during build time)?
- How do you communicate when a topic is removed or added?
- How do you handle applicability?
- How do translation services handle the influx of metadata?
- In a high-reuse environment, how do you output rendition-appropriate change information?
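The earlier point that a diff flags a punctuation edit the same way as a significant change can be seen in a minimal Python sketch that uses the standard `difflib` module. The before and after sentences are invented for illustration, and `changed_lines` is a hypothetical helper, not part of any DITA tool.

```python
import difflib

# Hypothetical before/after text for one topic; both sentences are invented.
old = [
    "Tighten the bolt to 10 Nm",
    "Press the reset button for 5 seconds.",
]
new = [
    "Tighten the bolt to 10 Nm.",              # punctuation-only edit
    "Press the reset button for 15 seconds.",  # meaning-changing edit
]

def changed_lines(before, after):
    """Return (old_line, new_line) pairs that a line diff reports as replaced."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.extend(zip(before[i1:i2], after[j1:j2]))
    return pairs

for old_line, new_line in changed_lines(old, new):
    # Both edits come out as the same kind of event: a replaced line.
    print(f"CHANGED: {old_line!r} -> {new_line!r}")
```

Nothing in the diff output distinguishes the added period from the changed duration; that judgment still has to come from the writer.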
Flagging Changed Regions (Change Bars)
Flagging content is necessary, if for no other reason than that your customer simply demands it. There is a real advantage to standardizing the way changes are tracked in topics, and the DITA Technical Communication Subcommittee is looking for help in this area. To get involved, consider joining OASIS and the DITA Technical Committee first. Then, join the subcommittee <http://oasis-open.org/join>.
Focus on Change Histories
In this article, I turn our attention to the other part of the change management requirement: building revision histories in a semi-automated way. This area is underserved and potentially much more valuable to readers and implementers of DITA. In particular, I explore the challenges that writers have always had plus those that are exacerbated in a high-reuse, topic-based architecture.
Traditional Revision History Challenges
“Direct reuse” refers to reusing content which doesn’t change based on the rendition in which it appears. However, even if the content is not subject to applicability rules, the relevance of the change may not be the same in all usages.
For example, say you have two documents that cover two similar products that will be published at the same time. One product (“product A”) has been on the market for a while, and the other (“product B”) is being launched for the first time. During the course of updating documentation for product B, you notice a problem with some unit of shared information, so you fix it. This change should trigger an entry for the product A revision history, but product B documentation should exclude this item from the revision history.
The different outcomes present a challenge to both flagging changes and creating revision histories. The creator of the revision history must be aware of what has been shown in previous releases of each rendition. In my experience, desktop publishing tools do not have robust support for conditional flagging of content, which would lead to manually adding or removing change bars. Manually overriding publishing directives in XML is rarely feasible.
“Conditional reuse” refers to reusing content that varies based on rendition-specific conditions supplied at a higher level or by filtering/linking resolution; it can also refer to applying conditionality on the entire unit itself. In addition to the challenges of “direct reuse,” having conditional content in an information unit requires an understanding of each usage.
To use the same example, let’s say that a sentence in a reused unit refers to a deprecated feature available on only product A. You have a means to filter the feature out for product B documentation. Changing this sentence may be noteworthy for readers of product A documents but would be confusing to product B owners. In fact, you may not wish to call attention to a feature that is not available on newer products.
Knowing What to Report
Even after you have a good handle on how reused content affects every rendition that uses it, you still face the classic challenges that would exist even if there were no reuse at all:
- I moved all of the “troubleshooting” procedures to a dedicated section.
- I did a major cleanup to the language, but no meaning was changed.
- I made an important change to a graphic.
- I made the graphic look nicer.
- I changed this and then changed it back.
DITA cannot solve these problems, but any solution designed to semi-automate flagging and/or generating a revision history must take these into account, in addition to the reuse implications described earlier.
Requirements of a Semi-automated Revision History
Conditional attributes in DITA are good at providing filtering based on a variety of dimensions: audience, product, platform, and (through specialization) any other meaningful aspect, such as “feature.” However, the design for the “rev” attribute is virtually useless in a high reuse environment, and it cannot be used to filter. The “rev” attribute attempts to communicate “at what point a change occurred” by taking on a space-delimited list of release tags. The problem with this approach is that the person making the change must not only be aware of all usages of the content but must also know the current and next release label for each use. If release plans change or new usages are created, the “rev” attribute may need to be re-evaluated.
The Time Factor
Each change made to an asset need not be tracked. The real question is, “What is the state of the overall rendition, relative to the previous release?” If you keep track of when releases were built, then you can determine which changes occurred since that day. Looking at the difference between two discrete points (rather than logging each change made between those points) results in a concise record of changes between the two renditions in question. Approaches that obsessively chronicle each and every character change will introduce extraneous noise that distracts readers.
Though we talk about differences between “rev 1” and “rev 2”, we’re primarily referring to all changes accrued between the day rev 1 was built and the day rev 2 was built. It’s easy to conflate the purposes of source control management, configuration management, and release management; you can argue that time is not as important as maturity, and I would agree. However, when it comes to release management, it does not matter how the asset is tagged or labeled; all that matters is what is different between the release on day N and the release on day M.
Accounting for time adds a new dimension of applicability that was not imagined in the DITA language specification. Adding this dimension is critical for meeting customer requirements for providing meaningful change histories. In a high reuse environment, time is the only way to know whether a change was previously reported or whether the current release is the first to use the change.
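The idea of comparing two discrete points in time, rather than logging every edit, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edit log, its dates, and the helper names `content_as_of` and `changed_between` are all hypothetical.

```python
from datetime import date

# Hypothetical edit log for one topic: (date saved, content after the save).
history = [
    (date(2011, 1, 10), "Check the oil level weekly."),
    (date(2011, 6, 2), "Check the oil level daily."),    # changed...
    (date(2011, 9, 15), "Check the oil level weekly."),  # ...and changed back
]

def content_as_of(log, day):
    """Return the most recent content saved on or before `day`, or None."""
    current = None
    for saved_on, content in sorted(log):
        if saved_on <= day:
            current = content
    return current

def changed_between(log, previous_release, current_release):
    """Compare only the two release states, ignoring intermediate edits."""
    return content_as_of(log, previous_release) != content_as_of(log, current_release)

# Release built in January 2011 vs. release built in January 2012.
print(changed_between(history, date(2011, 1, 31), date(2012, 1, 31)))  # False
# The same starting point vs. a release built in July 2011.
print(changed_between(history, date(2011, 1, 31), date(2011, 7, 1)))   # True
```

Because only the two release states are compared, the "changed this and then changed it back" case produces no revision history entry for the 2012 release.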
The Topic is the Versionable Unit
If we agree to indicate change time on regions of content, the next question is, “At what granularity should changes be tracked?” My suggestion is that we view the topic as the target unit for change consideration. A topic should be context independent and express a complete thought. I will argue that if a topic is changed in any way, a reasonable reader will want to consider the entire topic, if not more, to determine the impact on any prior understanding of the subject matter.
I think we can take some cues from the translation experts here. If a change occurs within a paragraph of one language variant, the translation expert will likely consider the surrounding content to make sure the change is meaningful within a context. The same is true for a reader analyzing a change between two releases of content in the same language. Context is important to understanding, so changes to content must be reported within an appropriate context. Well-crafted topics reflect this ideal context.
Use Case for Time Applicability
To demonstrate the need for time-awareness during revision history compilation, we’ll walk through a simple, yet common, example of how a topic changes over time to be applicable in more contexts.
Let’s start in January 2011. Our first deliverables will be the first edition of Safe Cars of 2011 and the first edition of Safe Cars of All Time. We are assigned to the topic, “Subaru,” which lists all Subaru vehicles that make the “safe” list. Other topics are included in both books, but we’re looking at only one topic.
In 2012, we’ll create a third book called Safe Cars of 2012 and update Safe Cars of All Time.
As the writer of “Subaru,” we are aware of two contexts. It just so happens that the topic will be delivered identically in both renditions. Therefore, no special work is required on our part (see Table 1).
Table 1: Identical Topics
Here is the markup for the topic released in January 2011:
Skip ahead one year. New model years are out and our topic must be updated. In addition to releasing a new 2012 book and a new edition of the “all time safest” book, we decide to update the 2011 book at the same time because over the year, we learned new things about the 2011 models.
As we add the new models, we’re aware that we need to be able to filter and flag, so we add “otherprops” attributes. We decide not to use “rev” attributes because we know “rev” cannot be used to filter, and we want to make sure all attributes can be used for both flagging and filtering.
The same topic needs to be shown in three different ways: only 2011 models, only 2012 models, and both year models.
Changes to Topic:
- Add 2012 list items
- Add “otherprops” for all list items
When both year models are shown together, we must be able to flag certain years as “new,” as shown in Table 2.
Table 2: Added Flagging Directives
Action: Add flagging directives to filterable attributes in basic DITA processing.
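The three renditions of the topic (2011 only, 2012 only, and both with the 2012 items flagged) can be imitated with a short Python sketch. In practice a DITA processor would apply a DITAVAL file; the `render` function here is a hypothetical stand-in that only mimics the effect, and it assumes each “otherprops” attribute holds a single value.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the "Subaru" topic body; only the list is modeled.
TOPIC = """
<ul>
  <li otherprops="2012models">2012 Impreza</li>
  <li otherprops="2012models">2012 Forester</li>
  <li otherprops="2012models">2012 Outback</li>
  <li otherprops="2011models">2011 Impreza</li>
  <li otherprops="2011models">2011 Forester</li>
  <li otherprops="2011models">2011 Outback</li>
</ul>
"""

def render(topic_xml, include, flag_as_new=()):
    """Keep items whose otherprops value is in `include`; mark flagged ones."""
    items = []
    for li in ET.fromstring(topic_xml).iter("li"):
        value = li.get("otherprops")  # assumed to be a single value
        if value not in include:
            continue  # filtered out of this rendition
        prefix = "[new] " if value in flag_as_new else ""
        items.append(prefix + li.text)
    return items

safe_cars_2011 = render(TOPIC, include={"2011models"})
safe_cars_2012 = render(TOPIC, include={"2012models"})
all_time = render(TOPIC, include={"2011models", "2012models"},
                  flag_as_new={"2012models"})

for line in all_time:
    print(line)
```

The same source list yields all three deliverables; only the include and flag sets differ per rendition.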
Generating Revision History
We’ve now exhausted DITA’s current features for indicating and describing changes. For someone new to Safe Cars of All Time, the “new” icons are not terribly meaningful (though they seem to justify the new edition). I can imagine someone who owns the 2011 edition of the book flipping through the 2012 edition and using the “new” icons to influence whether or not he buys the new edition. If your goal is to sell books, that may be the reader experience you want. If you are a product supplier, a better reader experience would be a simple line in the change history saying, “Added 2012 models.” This simple statement could save your reader the effort of flipping through every page looking for flags.
Change History Domain for DITA
The “bookmap” content model in the DITA specification provides a basis for marking change descriptions in topics so that we can generate a revision history. The following markup could be provided as a standard DITA element domain:
<summary>Added 2012 Models</summary>
<li otherprops="2012models">2012 Impreza</li>
<li otherprops="2012models">2012 Forester</li>
<li otherprops="2012models">2012 Outback</li>
<li otherprops="2011models">2011 Impreza</li>
<li otherprops="2011models">2011 Forester</li>
<li otherprops="2011models">2011 Outback</li>
If this new domain were provided to DITA processors, a new routine could be added to generate a list of all change items and output them to a revision history table. Processing change history information would be subject to the same filtering and flagging rules as traditional “props” attribution, plus date information.
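A routine of this kind might look roughly like the following Python sketch. The markup is hypothetical: the `change-item` element, its `date` attribute, the file names, and the date value are placeholders invented for illustration, and only `summary` and “otherprops” come from the example above. The topics are also simplified and omit required elements such as titles.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <change-item> and its date attribute are placeholders
# invented for this sketch; they are not the proposed domain's real names.
TOPICS = {
    "subaru.dita": """
<topic id="subaru">
  <prolog>
    <change-item date="2011-11-15" otherprops="2012models">
      <summary>Added 2012 Models</summary>
    </change-item>
  </prolog>
</topic>
""",
    "toyota.dita": """<topic id="toyota"><prolog/></topic>""",
}

def collect_change_items(topics, exclude=()):
    """Return (topic, date, summary) rows, skipping filtered-out props."""
    rows = []
    for name, xml_text in topics.items():
        for item in ET.fromstring(xml_text).iter("change-item"):
            if item.get("otherprops") in exclude:
                continue  # same filtering rule as the content itself
            rows.append((name, item.get("date"), item.findtext("summary")))
    # ISO-formatted date strings sort chronologically.
    return sorted(rows, key=lambda row: row[1])

for topic, changed_on, summary in collect_change_items(TOPICS):
    print(f"{changed_on}  {topic}  {summary}")
```

Passing `exclude={"2012models"}` drops the entry, which is the behavior you would want for a rendition that filters out the 2012 items.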
Filter by Date
In this article, I describe filtering based on time criteria. The following assumptions apply:
- The effective unit of time is the “date” of a change. In other applications, it may be important to establish time boundaries at a different level.
- I do not suggest whether time filtering should occur before or after other props filtering, but I assume that all filtering would occur before revision histories are generated.
- Revision histories would be generated “on the fly” during processing, as are tables of contents and indexes. In some applications, it may be necessary to perform this task by generating a topic, which can be edited by hand and added to the topic collection as a normally referenced asset.
- The revision history shows only the changes between two specific releases (by date). This concept could be applied to a “running” revision history as a natural and incremental extension.
Use Case for Filter by Date
Table 3 describes the applicability of change information based on the time at which a release is made. The guiding principle is that the revision history should show only changes that occurred since the last release. The goal of the filtering routine is to exclude changes that occurred on a date prior to the previous release and to exclude changes that occurred subsequent to the current release.
Table 3: Applicability of change information based on the time at which a release occurs
Let’s return to the “safe car” example to explore:
Without a “filter by date” approach, the revision history of Safe Cars of 2012 would include a line stating, “Added 2012 models.” This would be a mistake because the first release of a document should not include information about changes to reused topics which occurred before the initial publication. Available filtering features in DITA cannot address this use case without editing each reused topic to add conditional attributions each time a new usage definition is created.
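The safe car scenario can be worked through with a small Python sketch of a date-window filter. It rests on several assumptions that are not in the proposal itself: the change dates are invented, the window excludes the previous release date and includes the current one, and a rendition with no previous release gets an empty history. The `revision_history` function is a hypothetical helper.

```python
from datetime import date

# Hypothetical change items gathered from reused topics: (date, summary).
# The dates and the first summary are invented for illustration.
CHANGE_ITEMS = [
    (date(2010, 12, 1), "Corrected model names"),
    (date(2011, 11, 15), "Added 2012 models"),
]

def revision_history(items, previous_release, current_release):
    """Keep changes dated after the previous release, up to the current one.

    A rendition with no previous release (None) gets an empty history,
    which is an assumption of this sketch.
    """
    if previous_release is None:
        return []
    return [summary for changed_on, summary in items
            if previous_release < changed_on <= current_release]

# Safe Cars of All Time: first edition January 2011, second edition 2012.
all_time = revision_history(CHANGE_ITEMS, date(2011, 1, 31), date(2012, 1, 31))
# Safe Cars of 2012: no earlier release exists.
cars_2012 = revision_history(CHANGE_ITEMS, None, date(2012, 1, 31))

print(all_time)   # ['Added 2012 models']
print(cars_2012)  # []
```

The reused topic carries the same change item in both books, yet only the book with an earlier release reports it, and no per-usage attribute had to be added to the topic.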
How Filter by Date Can Work
At publishing time, you supply the label and date of the starting point, which will typically be the last release, and the label and date of the current release. During a pre-process step, the pipeline can build a table of all change items with dates between the start date and the current date. This table will be subject to filtering. An example output might be:
The following changes were made between