Introduction to DITA Conditional Processing
One of DITA’s primary strengths is combining discrete data chunks into cohesive documents. But it also excels at the other end of the spectrum—separating data chunks when necessary. This feature, called conditional processing, allows you to produce separate documents for different products, platforms, audiences, and more, all from the same input. This article introduces you to conditional processing and its control mechanism, metadata.
Just kidding! Every DITA-related article in the world seems to start with this section, whether it’s needed or not. I’m pretty sure that if you don’t know what DITA is, you aren’t even reading this article. Movin’ on.
Try to say that five times fast.
Let’s first consider a basic DITA Open Toolkit build process. A build file collects information from a ditamap file, which in turn references a group of topic files. The build file also locates a set of XSL transforms appropriate to the requested output type and sends all this along to the DITA Open Toolkit, which collects the topics, applies the transforms, and produces the output.
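To make that flow a little more concrete, here is a minimal sketch of what such a build file might look like, assuming an Ant-based DITA Open Toolkit installation. The project name, the paths, and the userguide.ditamap file are placeholders invented for illustration, and the parameter names (args.input, transtype, output.dir) should be checked against the documentation for your toolkit version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical Ant build file; every path and name here is a placeholder. -->
<project name="userguide-build" default="xhtml" basedir=".">

  <!-- Assumed location of the DITA Open Toolkit installation -->
  <property name="dita.dir" location="/path/to/dita-ot"/>

  <target name="xhtml">
    <!-- Hand the map, the output type, and the output folder to the toolkit -->
    <ant antfile="${dita.dir}/build.xml">
      <property name="args.input" location="userguide.ditamap"/>
      <property name="transtype" value="xhtml"/>
      <property name="output.dir" location="out/xhtml"/>
    </ant>
  </target>

</project>
```

The map named in args.input pulls in the topics, and transtype selects the set of transforms to apply.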
That’s fine when we want all the content in all the referenced topics to be included in the output, but what if we don’t want all of the content? That’s where conditional processing comes in, the goal being to intelligently control which topics or parts thereof end up in the output. This control is achieved using metadata.
Metadata, often called “data about data,” is a characteristic or trait that helps identify, clarify, or classify an informational element. For example, an HTML paragraph tag might read <p class="dropcap">…</p>. Here, the content of the <p> element is the data and the attribute, the class="dropcap" name/value pair, is the metadata; it classifies the type of paragraph (a CSS class in this case) so it can be processed correctly. Or, in an XML document, a tag might read <cost currency="aud">…</cost>. Again, the content of the <cost> element is the data, and the attribute, the currency="aud" name/value pair, is the metadata; it specifies that the cost element should be taken as Australian dollars. Metadata is often coded as attributes, as in these examples, but not always.
Metadata has various uses, such as workflow support, searching assistance, and index preparation, but it is really good at one thing: conditional processing. The primary function of conditional processing is omitting undesired content, or “filtering.” DITA provides four standard attributes to control filtering: audience, product, platform, and rev. It also provides a fifth attribute you can use to specify other properties, reasonably (if uncreatively) called otherprops. Using these attributes, you can classify everything from individual elements to entire topic groups, applying appropriate metadata to the objects to drive the filtering process.
The big benefit in terms of editing and maintenance is that mutually exclusive content elements don’t have to be stored separately; you can put them all together in a single topic or map and leave out the pieces you don’t need at build time. This technique prepares the content so it can be conditionally processed, while simplifying maintenance by keeping logically related items physically together in a single source location. It’s a great way to cram a lot of stuff into a small space—sort of like the Kardashian sisters.
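As a small illustration of keeping mutually exclusive content together, here is a sketch of a single concept topic that carries both a Windows and a Mac variant of the same paragraph. The topic id, title, and wording are made up for this example, and the DOCTYPE declaration is omitted for brevity.

```xml
<concept id="saving">
  <title>Saving your work</title>
  <conbody>
    <!-- Only one of these two paragraphs should survive a platform-specific build -->
    <p platform="windows">Press Ctrl+S to save.</p>
    <p platform="mac">Press Command+S to save.</p>
    <!-- No filtering attribute, so this paragraph appears in every build -->
    <p>Your changes are written to disk immediately.</p>
  </conbody>
</concept>
```

Both variants live side by side in one file, so a wording change to the shared paragraph only has to be made once.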
There are three standard places where you can put metadata: on individual elements, on topics, and on map references.
Element metadata is used at the tag level to apply properties by which the elements can be identified and filtered during the build. Let’s say we want to customize the first step in a task by user experience level. We could use the audience attribute to attach the appropriate metadata to three versions of the same step, like this:
<step audience="novice"><cmd>Plug in your PC.</cmd></step>
<step audience="intermediate"><cmd>Turn on your PC.</cmd></step>
<step audience="advanced"><cmd>Boot up your PC.</cmd></step>
Using this markup, we can easily produce a task topic with steps tailored to the specific audience we’re trying to reach, regardless of PC expertise. (An additional version, <step audience="doofus"><cmd>Box up your PC and take it back to the store.</cmd></step>, may be included if necessary.)
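The markup alone doesn’t filter anything; the build needs to be told which values to drop. In the DITA Open Toolkit that job is typically done by a DITAVAL filter file. The following is a minimal sketch, assuming we want the novice version of the task; the file name novice.ditaval is hypothetical, and how you pass the file to the build depends on your toolkit version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- novice.ditaval (hypothetical file name): keep novice steps, drop the rest -->
<val>
  <prop att="audience" val="intermediate" action="exclude"/>
  <prop att="audience" val="advanced" action="exclude"/>
</val>
```

With this filter applied, only the “Plug in your PC.” step should remain. Swapping which values are excluded produces the intermediate or advanced version from the same source.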
Topic metadata is used at the topic level to specify characteristics with which the topic can be filtered. If we wanted to produce a review document containing all topics written by a given content provider, we could use the otherprops attribute to identify each topic’s author, like this:
<task id="remove" otherprops="AnnaGraham">
<task id="repair" otherprops="OttoPalindrome">
While the use of otherprops to indicate author name is entirely arbitrary, it demonstrates the power and flexibility of having a generic, user-defined attribute. The topics can now be identified by author and filtered appropriately during the build.
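For the review-document scenario, one possible filter is sketched below. It assumes your processor honors a prop element with no val attribute as the default action for that attribute, as the DITAVAL format describes; verify this against your toolkit before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a filter that keeps only Anna Graham's topics -->
<val>
  <!-- Default: exclude any otherprops value not listed explicitly below -->
  <prop att="otherprops" action="exclude"/>
  <!-- Exception: keep topics tagged with this author -->
  <prop att="otherprops" val="AnnaGraham" action="include"/>
</val>
```

Note that topics with no otherprops attribute at all are not touched by this filter, so every topic would need an author value for the review build to contain one author’s work only.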
Map metadata is used at the top of the metadata food chain to apply filtering characteristics to whole topics or topic groups within maps. We could, for example, construct a single map that allows us to produce a user guide for any of several product releases by adding rev metadata attributes to the topic references, like this:
<map title="User Guide" id="userguide">
  <topicref href="inst-demo.dita" rev="demo"/>
  <topicref href="inst-std.dita" rev="1.x"/>
  <topicref href="inst-upd.dita" rev="2.x"/>
</map>
We’re now able to select the correct installation topic (or a set of correct topics, regardless of number or hierarchical placement) for any current product release, from the demo version to 1.x to 2.x, without creating—and maintaining—separate map files. Also, recall that in a map, child topics (topicrefs inside topicrefs) inherit their parents’ attributes, so conditional processing metadata “flows down” just like other attributes. This characteristic allows you to affect whole groups of topics by placing just one filtering attribute on the parent.
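Here is a short sketch of that “flow down” behavior, using the audience attribute and some made-up file names. The attribute appears only on the parent topicref, yet the nested topics are treated as if they carried it too.

```xml
<map title="User Guide" id="userguide">
  <topicref href="overview.dita"/>
  <!-- audience is set once on the parent... -->
  <topicref href="admin.dita" audience="advanced">
    <!-- ...and these hypothetical child topics inherit it -->
    <topicref href="admin-users.dita"/>
    <topicref href="admin-backup.dita"/>
  </topicref>
</map>
```

Excluding the advanced audience at build time should therefore drop all three admin topics, leaving only the overview.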
How do you know in which layers to put your metadata? Well, it depends (I know, right?) on several factors: content complexity, number of authors, the variety of attributes you use, and so on. In general, assign metadata to the highest level of specificity that makes sense. For example, if you need to easily swap out entire blocks of content, use map metadata to control topics by groups. If you have topics that are similarly structured but different in content, use topic metadata to differentiate them. If you have broad, generic content with many small, specific differences, use element metadata to keep the content together but allow it to be easily filtered.
Here’s a great joke: “What do you call a musician with no girlfriend?”