Conditionals vs. Composition

CIDM

December 2008



Eric Armstrong, Sun Microsystems, Inc.

Summary

This article compares and contrasts two different mechanisms for adding context variations to a DITA topic. It’s part of a series that describes the 20 plus decisions that face every DITA project. The goal is to identify the pros and cons for each decision and, where warranted, record the known “best practices” around each decision point. Most of them can be covered in a single article. But a couple, like this one, are intricate enough to require an article of their own.

Note that strategy choices are typically part of the start-up cost of every project, not just documentation projects. It is therefore worth considering whether a “Decision Guide” specialization would be useful for DITA. That thought is partially considered here. I hope to treat the idea more fully in a subsequent article.

DITA Project Decisions

Here is a list of choices that a DITA project needs to consider:

Conversion Choices

  • Convert existing information to specific topic types or to generic topics?
  • If generic topics: Nest the topics themselves (like the original), or make separate topics and nest topicrefs in a map?

Structuring Choices

  • File naming and directory structures
  • Allow sections in topics, or not
  • Put index terms in the body of the document or in the prolog
  • Use conditionals or composition (the subject of this article)
  • Use word-level phrases for variables or sentence-level phrases?

Metadata Choices

  • Metadata—Define all expected metadata at the outset, make a minimal set and plan to evolve it, or define the metadata you see yourself needing in the next couple of years and ignore everything else
  • Make “hierarchical” metadata (Product A_1…B_2) or use separate dimensions (Product=A, B. Version=1,2)
  • Are metadata specializations needed?
  • If so, does it make more sense to do domain attribute specialization, or use the <otherprops> element?

Linking Choices

  • Keep links inline, in topics, or restrict links to reltables
  • Include links to external documents or not?
  • If including links to external documents, use absolute links with a version# entity ref in the link (http://location/product_&version;/file) or use a “doc root” entity ref in the link (&docroot;/path/to/file)

Differencing Choices

  • Use status elements, manually marking items as new and changed
  • Use diffs maintained by the CMS or version-control system (VCS)
  • Use processing instructions to identify changes

Processing Choices

  • Formatting Strategy: Break formatting-template into sections and encode in XSLT for use as DITA OT arguments, or extract content from generated pages and insert into copy of template
  • Processing Mechanism: Use the Open Toolkit, use CMS production tools (if available), or use editor production tools
  • Process Control: Always generate all docs, implement dependency-driven builds, or deliver documents dynamically, upon request

Branding and Styling Choices

  • Use CMS templating mechanisms, if available
  • Add XSLT transforms to Open Toolkit processing
  • Feed Open Toolkit output into downstream processes
  • Find or build a template system where you can modify a WYSIWYG template (for example) and auto-convert the template into XSLT transforms.

Specialization Choices

  • Create specialized topics or use generic topics?

Introduction

The goal of most DITA authoring is to reuse topics in different settings, where the topics have only minor differences in each setting. There are two strategies you can use to achieve that goal:

  • Conditional text
  • Composition

With conditional text:

  • You put multiple variants of the text into your topic, tagging each of them with different conditional metadata.
  • At production time, you filter the topics to get the variants you want by specifying which metadata-tagged elements to include or exclude.

With composition:

  • You embed a reference in the topic using the DITA “content reference” (conref) mechanism. The reference points to an element stored in a separate “definition” file (where the definition file may contain several definitions or hundreds of them).
  • During production, you substitute a different definition file to change the text inserted by the embedded reference.

I first became aware of the potential of composition when I worked with the XDocs CMS, where it can be employed as an integral part of the production process. I became further intrigued after hearing from people on the DITA user discussion forum. People who used it swore by it, considering conditionals “old school” and “archaic”. Those messages raised the question, “What’s so good about composition?”

In this article, I describe the two approaches, contrast them, and attempt to answer that question.

Note:

This article provides a basic template for a decision guide. Such a guide includes an introduction that lists the choices, explanations that describe the options, and a section that compares the advantages and disadvantages of each choice, perhaps followed by an explanation of “hybrid” alternatives that provide some combination of the advantages and disadvantages of the individual options.

In a future article, I hope to explore the possibility of a decision-guide specialization for DITA. Of the remaining sections of this article, some should be subheads of the composition section and others should be independent topics that expand on entries in the comparison table. But the luxury of the article format is that I’m free to take liberties with the template, both for readability and for improved authoring speed—especially since that template hasn’t been fully defined, as yet.

Using Conditionals

Here’s an example of a task step with conditionals that provide three alternatives for the directory address depending on the operating system: Solaris, Linux, or Windows.

<step>
  <cmd>Go to the installation directory at
    <ph platform="solaris">/opt/product</ph>
    <ph platform="linux">/usr/product</ph>
    <ph platform="windows">C:\Program Files\product</ph>.
  </cmd>
</step>

At production, you must define which platform you want to include in the output.
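
As a rough illustration of what that filtering step does, here is a minimal Python sketch. It is not the DITA Open Toolkit's implementation; it simply drops every element whose platform attribute does not include the requested value. The names filter_platform and rendered_text are invented for this example.

```python
import xml.etree.ElementTree as ET

# The task step from the example above.
STEP = r"""<step><cmd>Go to the installation directory at
<ph platform="solaris">/opt/product</ph>
<ph platform="linux">/usr/product</ph>
<ph platform="windows">C:\Program Files\product</ph>.
</cmd></step>"""


def filter_platform(xml_text, platform):
    """Remove elements tagged for a platform other than `platform`."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            tagged = child.get("platform")
            if tagged is not None and platform not in tagged.split():
                # ElementTree stores the text that follows an element in
                # its "tail", so hand that text to a neighbor before
                # removing the element.
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return root


def rendered_text(root):
    """Return the text content with runs of whitespace collapsed."""
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    for name in ("solaris", "linux", "windows"):
        print(name, "->", rendered_text(filter_platform(STEP, name)))
```

In this sketch only the Windows variant ends with the period directly after the path. The Solaris and Linux variants pick up a space before the period, because the period follows the Windows phrase in the source.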

The good news when you’re reviewing this topic is that everything is right there in front of you. The bad news is that when you need to make a change, you need to find all occurrences of the conditional statements, scattered throughout your topic set. And while you may know the location for one of the entries (say, the Solaris location), you may not know values for the other two, which leaves unknowns scattered throughout your topics.

Using Composition

Here is an example of the same task step, using composition. In this case, phrases are inserted using the DITA conref mechanism:

<step>
  <cmd>Go to the installation directory at
    <ph conref="metadata_platform.dita#install_dir"/>.
  </cmd>
</step>

<step>
  <cmd>Install the program:
    <ph conref="metadata_platform.dita#installation_command"/>
  </cmd>
</step>

Note that it is easier to see the punctuation and spacing here. With conditionals, it’s harder to make sure you got those details right.

But the important thing to note is that the conref’d phrases are, in essence, variables. We can give those variables new values by swapping out the referenced file and replacing it with one that has the same element IDs on the phrase elements. When we do that, conrefs will “just work”—even though the file used to provide conref’d values during production may be very different from the one that was used while authoring.

That observation is the essence of the composition. With that strategy, “metadata_platform.dita” doesn’t even need to exist at all. Instead, that file name is a pseudonym that refers to different conref files (aka definition files) at different times. As long as each definition file has elements with the appropriate IDs, the conrefs always resolve properly, no matter which file is active.

In this case, one of four different definition files can be active—one that defines placeholders and one for each platform that contains platform-specific substitutions:

  • metadata_platform_placeholders.dita
  • metadata_platform_solaris.dita
  • metadata_platform_linux.dita
  • metadata_platform_windows.dita

At authoring time, for example, the placeholders file might be activated using a command like the following:

% cp metadata_platform_placeholders.dita metadata_platform.dita

Note:

With this command, the metadata_platform file actually does exist, although its contents can be changed at any time by activating a different file. But on Solaris and Linux systems, a symlink can be used instead. Symlinks are faster and more convenient. And since you don’t have two copies of the file, there is no possibility of editing the wrong one. And perhaps most importantly, it is always clear which file is in active use.
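
The activation step can also be scripted. The following Python sketch assumes the file-naming pattern used in this article (metadata_<dimension>_<value>.dita, activated as metadata_<dimension>.dita). The activate function and its parameters are hypothetical names chosen for illustration.

```python
import shutil
from pathlib import Path


def activate(directory, dimension, value, use_symlink=False):
    """Make metadata_<dimension>_<value>.dita the active definition file.

    The active file is metadata_<dimension>.dita in the same directory.
    With use_symlink=True a relative symlink is created instead of a copy
    (suitable for Solaris and Linux systems).
    """
    directory = Path(directory)
    source = directory / f"metadata_{dimension}_{value}.dita"
    active = directory / f"metadata_{dimension}.dita"
    if not source.is_file():
        raise FileNotFoundError(source)
    # Remove whatever was active before, whether a copy or a symlink.
    if active.exists() or active.is_symlink():
        active.unlink()
    if use_symlink:
        active.symlink_to(source.name)
    else:
        shutil.copyfile(source, active)
    return active
```

For example, activate(".", "platform", "placeholders") performs the same copy as the cp command above, and activate(".", "platform", "solaris", use_symlink=True) points the active name at the Solaris values file.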

The placeholders file may contain variable definitions that look like this:

metadata_platform_placeholders.dita

<ph id="install_dir">Installation Directory</ph>

As long as the placeholders file is active, writers see “Installation Directory” in the text, highlighted as a conref. Writers don’t need to know the platform-specific values for different platforms, or even how many platforms there are. They just need to know to reference the install_dir variable.

At production time, one of the values files is substituted to provide the appropriate text:

metadata_platform_solaris.dita

<ph id="install_dir">/opt/product</ph>

metadata_platform_linux.dita

<ph id="install_dir">/usr/product</ph>

metadata_platform_windows.dita

<ph id="install_dir">C:\Program Files\product</ph>

To clarify things, here’s a table that shows how some of the references resolve (Table 1):

[Table 1]

Table 1: Conref’d Values Depend on which Definition File is Active
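
To make the resolution behavior concrete, here is a simplified Python sketch of conref substitution. It is an illustration only, not how a DITA processor is implemented. It follows this article's abbreviated file#id reference form, ignores the file name (on the assumption that the active definition file has already been chosen), and uses a bare <body> wrapper for the definitions. The function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def load_definitions(definitions_xml):
    """Index every element that carries an id attribute."""
    root = ET.fromstring(definitions_xml)
    return {el.get("id"): el for el in root.iter() if el.get("id")}


def resolve_conrefs(topic_xml, definitions):
    """Replace each conref'd element's content with the referenced content."""
    root = ET.fromstring(topic_xml)
    for element in list(root.iter()):
        reference = element.attrib.pop("conref", None)
        if reference is None:
            continue
        # "metadata_platform.dita#install_dir" -> "install_dir"
        target_id = reference.split("#", 1)[1]
        source = definitions[target_id]
        element.text = source.text
        element.extend([copy.deepcopy(child) for child in source])
    return ET.tostring(root, encoding="unicode")


TOPIC = ('<step><cmd>Go to the installation directory at '
         '<ph conref="metadata_platform.dita#install_dir"/>.</cmd></step>')
PLACEHOLDERS = '<body><ph id="install_dir">Installation Directory</ph></body>'
SOLARIS = '<body><ph id="install_dir">/opt/product</ph></body>'

if __name__ == "__main__":
    print(resolve_conrefs(TOPIC, load_definitions(PLACEHOLDERS)))
    print(resolve_conrefs(TOPIC, load_definitions(SOLARIS)))
```

The topic text never changes; only the definitions passed to resolve_conrefs differ between the authoring view and the Solaris production run.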

Notes:

A definition file is typically a generic DITA topic, so you can put anything in it. Anything with an ID is a “variable” that can be referenced.

Anything that doesn’t have grammatical variations, like a file pathname, can be in a <ph> element by itself. But anything that has grammatical variations (like product name) should be a complete sentence in a <ph> element, because only complete sentences translate well. Even in English, when the number of the noun (singular or plural) or part of speech changes, the change wreaks havoc on articles (a, an, and the), verb forms, possessives, and other sentence elements.

When the specifics aren’t yet known, values files can be populated with something like the following:

<ph id="install_dir">__NOT YET KNOWN__</ph>

The values file then provides a checklist of important information that needs to be determined.

In general, you will have one file set per metadata dimension—one for the variables and one for each of the possible values. You can then create combinations of metadata by varying the set of values files you supply.

For example, these two files would be used to create the installation instructions for the JDK product on the Solaris platform:

metadata_platform_solaris.dita

metadata_product_jdk.dita

To be clear about the terminology I’m using, a definition file is either a variable file or a values file. In this article, I use the term values file to mean, “a definition file that substitutes a set of values for a set of variables.” But that choice overloads the term “values file,” which also refers to a .ditaval file, in DITA. To disambiguate the term, I propose to call a .ditaval file what it really is: a control file.

That nomenclature has the advantage of maximum accuracy because the prototypical definition of a “process” is something that has inputs, outputs, and controls—and the .ditaval file certainly acts as a process control. In addition, in a suitably clever implementation of the production system, the control file can be used to automatically select the values files to use during the production run.

Comparing the Two Strategies

Originally, the idea of using conditionals seemed like a “no brainer” decision. But after a closer inspection, composition seemed to have a lot going for it. With composition, there is no need to extend the metadata attribute set, no need to worry about metadata hierarchies, and no need to worry about boolean combinations of metadata. Table 2 summarizes the differences I have so far been able to discern between the two approaches.

[Table 2]

In general, composition would seem to be the preferred way of creating contextual variation of topics, with one major exception: version-specific information. Version-number metadata has the significant advantage that you can use mathematical expressions to specify inclusion criteria. For that purpose, conditionals are invaluable. In most other respects, however, composition appears to be the superior mechanism.

A Hybrid Alternative

Upon reviewing the original version of the comparison table (Table 2), IBM’s Megan Bock offered this alternative:

“Use conditional metadata for topicrefs in maps but use composition in topics, so there are no embedded conditionals in your topics.”

With that strategy, you would still need to create metadata, which negates some of composition’s advantages. But you are left with maps that are more readable, since they don’t contain references to “a topic or sub-map to be named at a later date.” The advantage is that your maps are less abstract and less complex. The disadvantage is that you still have to define the metadata. You just use it less often.

There is one more significant advantage for this approach, though:

Conditionals let you solve the one kind of problem that composition can’t touch: The problem of multiple locations.

Consider this real-world example. The outline below comes from a map for an installation guide that covers both the Java Development Kit (JDK) and Java Runtime Environment (JRE). There is only one way to install the 64-bit supplement for the Java Runtime Environment (with an executable), but there are two ways to install it for the JDK, depending on how the JDK was installed. The same topic is therefore referenced in two different locations. In one location, it is nested. In the other, it isn’t:

JRE: Installing the 64-bit Supplement (executable)

JDK: Installation Options

    Installing the 64-bit Supplement (executable)

    Installing the 64-bit Supplement (packages)

Here, the same topic is used in two different locations. Conditionals solve that problem handily.

Of course, it may be that you don’t really need conditionalized maps. It’s quite possible to create a different map for each deliverable you intend to produce. You give up some of the advantages of single sourcing, but you do so only at a very high level, where it doesn’t hurt very much.

This distinction is an area that needs a razor: a way to decide between the two approaches. Conditionals in topics certainly appear to be more trouble than they are worth, but when does it make sense to use them in maps and when does it make more sense to create duplicate maps? (In the example above, there were about a dozen topics in the map, so it didn’t seem to make sense to have two copies, but things would have been simpler if I had.)

Compound Conditions

Most of the time, one file per metadata dimension will be sufficient. You might have sets of files like this:

metadata_platform_variables.dita

metadata_platform_solaris.dita

metadata_platform_linux.dita

metadata_browser_variables.dita

metadata_browser_firefox.dita

metadata_browser_opera.dita

Then, at production time, you do the appropriate substitutions to create the document appropriate for Firefox users on Linux, for example.

But every once in a while, you may find that you need some boolean combination of metadata—a compound condition—where, for example, the substitution value for Solaris and Firefox differs from the value for Solaris and Opera.

Conditionals don’t allow for that kind of capability, but composition does. To get that behavior, you would create a set of dual-dimension files like these:

metadata_platform_browser_variables.dita

metadata_platform_browser_windows_firefox.dita

…etc…

Authors would then need to know to look for some variables in the dual-dimension file, rather than in one of the single-dimension files.

Simulating Metadata Hierarchy

Ideally, it would be nice to have hierarchical metadata that looks like this: solaris, solaris:32, and solaris:64 where content elements are tagged with one of the three.

  • When producing a document for 32-bit Solaris, solaris:32 is specified, but all items tagged solaris are automatically included.
  • When producing a generic Solaris document, solaris is specified, and all items tagged solaris:32 and solaris:64 are included.

With conditional metadata, the closest we can come is to create metadata that looks like this: solaris, solaris_32, and solaris_64. It looks similar, but it’s not actually a hierarchy. Then:

  • To keep authoring simple, content elements are tagged with one of the three, as before.
  • When producing a document for 32-bit Solaris, both solaris and solaris_32 are specified for inclusion.
  • When producing a generic Solaris document, solaris, solaris_32, and solaris_64 are all specified.

It’s not exactly the same as a true hierarchy, but it works pretty much the same. The authoring task is no more difficult than it was before and, as long as you get the production scripts right, you get the expected results.
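
The production-script logic described above can be sketched in a few lines of Python. This is only an illustration, and it assumes that an underscore in a metadata value always separates hierarchy levels. The function name include_values is hypothetical.

```python
def include_values(target, known_values):
    """Return the metadata values to include for a given target.

    A specific target (solaris_32) pulls in its ancestors (solaris).
    A generic target (solaris) pulls in all of its descendants.
    """
    parts = target.split("_")
    ancestors = {"_".join(parts[:i]) for i in range(1, len(parts) + 1)}
    descendants = {v for v in known_values if v.startswith(target + "_")}
    return sorted((ancestors | descendants) & set(known_values))


if __name__ == "__main__":
    values = ["solaris", "solaris_32", "solaris_64"]
    print(include_values("solaris_32", values))
    print(include_values("solaris", values))
```

The returned list is what the production script would mark for inclusion; everything else in that metadata dimension would be excluded.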

With composition, that effect is somewhat harder to achieve—and some things can’t be done at all. Imagine a topic that looks like this:

<step>
  <cmd>Go to the installation directory at
    <ph conref="metadata_platform.dita#install_dir"/>.
  </cmd>
</step>
<step conref="metadata_platform.dita#install_additional_64_bit_package"/>
<step>
  <cmd conref="metadata_platform.dita#platform_specific_install"/>
</step>

The idea is to include that middle step for a 64-bit install but to leave it out for a 32-bit install. That’s easily done with conditionals. But it won’t work with composition. To resolve the reference, you’ll have to include an empty step, which won’t read very well. So the best you could do here would be to create a step that says “Install additional packages, if any”, and then include the phrase “No additional packages.” It’s ugly, but it would solve the problem, more or less. The alternatives are to restructure things so that the additional installation step is in a topic of its own (so the conditionals are confined to the map, as in the hybrid alternative), or else consider the situation an exception in which in-topic conditionals are required.

The situation represented by the third step can be handled with greater accuracy, but it still takes a bit of work. The nature of the third step depends on whether you are doing a 32-bit or a 64-bit install. So your file substitution set could look like this:

metadata_platform_variables.dita

metadata_platform_solaris_32.dita

metadata_platform_solaris_64.dita

With that implementation, common solaris values would be duplicated in the two values files. Of course, if you had your heart set on avoiding duplication, it would be possible to do so. You would have the same variables file and then divide the values into files for solaris, solaris_32, and solaris_64. You would then manufacture the substitution-file at production time by combining the values in the solaris file with the values from one of the other two files.

In practical terms, that’s a lot of additional effort for results that are more difficult to predict. In most every case, it will make more sense to live with the duplication in return for a single, easily-reviewed values file. But it’s an interesting thought experiment to imagine how the problem could be solved, if you needed to.
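
For the thought experiment, manufacturing the substitution file could look something like the Python sketch below. It assumes that each values file holds its definitions as direct children of a single wrapper element, and the sample IDs and text are placeholders rather than real product data. The function name merge_values is hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_values(common_xml, specific_xml):
    """Combine common values with variant-specific values.

    Definitions in the specific file replace definitions with the same
    id in the common file; new ids are appended.
    """
    merged = ET.fromstring(common_xml)
    positions = {child.get("id"): i for i, child in enumerate(merged)}
    for child in ET.fromstring(specific_xml):
        key = child.get("id")
        if key in positions:
            merged[positions[key]] = child
        else:
            merged.append(child)
            positions[key] = len(merged) - 1
    return merged


# Placeholder content, for illustration only.
SOLARIS = ('<body><ph id="install_dir">/opt/product</ph>'
           '<ph id="platform_specific_install">common value</ph></body>')
SOLARIS_64 = '<body><ph id="platform_specific_install">64-bit value</ph></body>'

if __name__ == "__main__":
    merged = merge_values(SOLARIS, SOLARIS_64)
    print(ET.tostring(merged, encoding="unicode"))
```

The merged result would then be written out under the active file name before the production run.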

Improving the Review Process

The goal is to display all possible values of a conref so that you can provide the same kind of review capability for topic composition that you get with conditional metadata (the ability to see all possible values in one place). To do that, we can use the DITA-OT to generate an output that shows all possible values for each conref, tagged with the metadata that produces it.

In the three-platform example discussed above, we want the copy published for review to look like this:

1. Go to the installation directory at

solaris: /opt/product

linux: /usr/product

windows: C:\Program Files\product

Output of that kind not only nullifies the advantage of conditional text for review, it goes conditional text one better, because you see the metadata highlighted, in addition to the values. (Without the metadata, a review copy that includes all possible values can be difficult to read—and while the displayed values may be correct, it’s impossible to know which metadata they’ve been tagged with.)

To produce that kind of output, we need to combine all elements with the same ID from multiple values files into a single <ph> element, coupled with an identifier (solaris, linux, etc.) for each value. Substituting that file at production time displays all possible values, tagged with the appropriate metadata…

Here is an outline of the procedure:

Given a file named metadata_platform_variables.dita:

  • Start a new file called metadata_platform_tagged_values.dita.
  • For each <ph> element in the variables file with an ID, add a new <ph> element to the target file that looks like this:

        <ph id="xyz">
        </ph>

    where the ID is the same as the ID in the variables file.

For all other files named metadata_platform_*.dita:

  • Extract the METADATA_VALUE from the name of the file (the part represented by the wildcard, “*”).
  • For each <ph> element with an ID in the file:
      • Extract the CONTENT from the element.
      • Add the tagged value to the matching <ph> element in the tagged_values file:

        <ph id="xyz">
          <?break?> <b>#{METADATA_VALUE}:</b> #{CONTENT}
        </ph>

    where:
      • The processing instruction marks where to insert a line break.
      • “#{X}” is the Ruby syntax for string interpolation. It says to insert the value of variable X into the string.

To produce a copy for review:

  • Generate HTML, substituting metadata_platform_tagged_values.dita in the processing stream.
  • Convert the processing instructions to <br/> tags in post-processing.
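
The outline above can be sketched in Python (rather than the Ruby notation used in the outline) roughly as follows. This is a minimal illustration under simplifying assumptions: the file contents are passed in as strings keyed by metadata value instead of being read from disk, and a bare <body> wrapper stands in for a full DITA topic. The function name build_tagged_values is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_tagged_values(variables_xml, values_by_name):
    """Build a tagged-values document from a variables file and values files.

    values_by_name maps a metadata value (for example "solaris") to the
    XML text of the corresponding values file.
    """
    variables = ET.fromstring(variables_xml)
    target = ET.Element(variables.tag, dict(variables.attrib))

    # One empty <ph> per variable, carrying the same id.
    slots = {}
    for ph in variables.iter("ph"):
        if ph.get("id"):
            slots[ph.get("id")] = ET.SubElement(target, "ph", {"id": ph.get("id")})

    # Append "<?break?><b>value:</b> content" for every matching definition.
    for name, xml_text in values_by_name.items():
        for ph in ET.fromstring(xml_text).iter("ph"):
            slot = slots.get(ph.get("id"))
            if slot is None:
                continue
            slot.append(ET.ProcessingInstruction("break"))
            label = ET.SubElement(slot, "b")
            label.text = f"{name}:"
            label.tail = " " + "".join(ph.itertext()).strip()

    return ET.tostring(target, encoding="unicode")


VARIABLES = '<body><ph id="install_dir">Installation Directory</ph></body>'
VALUES = {
    "solaris": '<body><ph id="install_dir">/opt/product</ph></body>',
    "linux": '<body><ph id="install_dir">/usr/product</ph></body>',
}

if __name__ == "__main__":
    print(build_tagged_values(VARIABLES, VALUES))
```

A wrapper script would derive each dictionary key from the file name (the part matched by the wildcard) and write the result to metadata_platform_tagged_values.dita.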

Automating the Composition Process

If multiple values files need to be substituted at production time, it makes sense to automate the process. That way, you can ensure that the correct files are substituted every time you produce a given deliverable.

In an ANT script, the substitutions could be made at the start of a task. When invoking the OT from the command line, a wrapper script can be created that does the substitutions before invoking the OT.

In either case, the substitutions need to match the data specified in the .ditaval file that drives production (if there is one). It would even be possible to create the substitution set by examining the .ditaval file.

To do that, it’s necessary to have a naming convention for the values files. Using the naming conventions described in this article, the process must do the following:

  • Examine the .ditaval file, extracting the metadata property name and associated value for all entries that specify “include”.
  • Look for the corresponding metadata files.
  • Do the appropriate renaming.

Given this .ditaval specification:

<val>
  <prop att="platform" val="opensolaris" action="include"/>
  <prop att="browser" val="firefox" action="include"/>
</val>

The script would substitute files of the form metadata_<property>_<value>.dita for each entry. It would look for metadata_platform_opensolaris.dita and metadata_browser_firefox.dita.

Although it is somewhat more difficult to do so, the script also needs to look for compound metadata files of the form metadata_<property1>_<property2>_<value1>_<value2>.dita. In this case, it might need to substitute a file named metadata_platform_browser_opensolaris_firefox.dita.
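
As a sketch of how a script might derive the substitution set from a .ditaval file, consider the Python below. It relies on the naming conventions described in this article, plus one assumption of my own: that the active name for a compound file is metadata_<property1>_<property2>.dita. The function name substitution_candidates is hypothetical, and a real script would still need to check which candidate files actually exist.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def substitution_candidates(ditaval_xml):
    """Map active file names to the values files that should replace them.

    Returns two dictionaries: one for single-dimension files and one for
    compound (two-dimension) files.
    """
    root = ET.fromstring(ditaval_xml)
    included = [
        (prop.get("att"), prop.get("val"))
        for prop in root.iter("prop")
        if prop.get("action") == "include"
    ]
    singles = {
        f"metadata_{att}.dita": f"metadata_{att}_{val}.dita"
        for att, val in included
    }
    compounds = {
        f"metadata_{a1}_{a2}.dita": f"metadata_{a1}_{a2}_{v1}_{v2}.dita"
        for (a1, v1), (a2, v2) in combinations(included, 2)
    }
    return singles, compounds


DITAVAL = """<val>
  <prop att="platform" val="opensolaris" action="include"/>
  <prop att="browser" val="firefox" action="include"/>
</val>"""

if __name__ == "__main__":
    singles, compounds = substitution_candidates(DITAVAL)
    for active, source in {**singles, **compounds}.items():
        print(f"{source} -> {active}")
```

Compound candidates are generated in the order the properties appear in the .ditaval file, so the file names must follow the same property order.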

Of course, while that level of automation is interesting to contemplate, it is probably overkill in nearly every case. A few lines in a script or ANT task that does the substitutions are all that is really necessary, most of the time.

Transcluding the Invariants

Thanks to Ted Nelson, we have the word “transclude” in our vocabulary, which means “take something from somewhere else and include it in-line here.” So a DITA conref is a way of transcluding material into a topic from somewhere else.

It’s second nature, I guess, to consider the small differences in things as part of the “foreground,” and assume that the larger, invariant pieces form the “background” (backdrop, context) against which the actors move in the foreground. In this case, the foreground is a conref, while the background is the context into which the conref fits.

So when there is a small bit of boilerplate that needs to be included in multiple files, it’s easy to consider that text the foreground, make a conref out of it, and transclude it wherever it’s needed.

So far in this article, the bits and pieces that change (the variant data) have been very small compared to the topic in which they fit—a single path, for example, or a command. And it has been somewhat natural to look at those small variants as the “foreground,” conrefing them into the larger, “background” context.

But there is a good case to be made for the opposite approach: encoding the variable information in a topic of its own and transcluding chunks of background material around it to create context.

Note:

My thanks to Deborah Pickett for pointing out the value of this approach and to Sowmya Kannan for recognizing it as a potential solution for a key design problem we’re facing (a recognition that was aided by the multiple alternatives she explored.)

So rather than having a single topic with contents that look like this:

Install.dita

…lots of text…

<ph conref="metadata_platform.dita#install_path"/>

…lots more text…

We can turn things around 180 degrees and have multiple topics that look like this:

Solaris_Install.dita

<ph conref="base_topic.dita#lots_of_text"/>

<ph>…Solaris installation path…</ph>

<ph conref="base_topic.dita#lots_more_text"/>

Linux_Install.dita

<ph conref="base_topic.dita#lots_of_text"/>

<ph>…Linux installation path…</ph>

<ph conref="base_topic.dita#lots_more_text"/>

Note:

The key enabler that makes such an arrangement possible is an authoring tool that shows transclusions in context. As long as your editor does that, it’s a pretty viable solution that writers can deal with. Without that behavior, the solution would be pretty darn ugly and unlikely to get much traction in the real world.

To make such topics work, we need a base topic that has the parts we need to transclude. Then we need to make a topic for each variant we want to produce, transclude all of the parts, and add the variant information.

One downside to this approach is that it’s clearly a lot of work. Another is that, like conditionals, the information needed to create a complete document is once again scattered throughout the topic set. That makes it harder, for example, to construct a complete list of everything you need to know to produce documents for a new platform.

But the approach does have a significant advantage—it lets you create links to the topic variants, give them titles of their own, and include them in a DITA map—all without having to do any file swapping.

In our case, that advantage may well turn out to be the one that decides our final design. It will let us create a DITA map that produces a document like this:

Applet Developer’s Guide

Developing an Applet

Deploying an Applet

Making a JAR File (Solaris)

Making a JAR File (Windows)

That sort of thing is pretty darn hard to do if there is only one topic called “Making a JAR File.” It could conceivably be done, but it would take a lot of work (solution below). But even when it works, the resulting DITA map can only be used for generating HTML pages. It wouldn’t be useful for generating PDFs, DocBook files, help files, or anything else.

Alternative Solution Using a Shared File

  • Generate those pages separately from the main DITA map. Since they’ll have the same name, they’ll need to be in separate directories. Say:

solaris/making_jar.html

windows/making_jar.html

  • Set the scope to “peer” in the topicrefs in the DITA map. That causes the links to be generated in the TOC, but doesn’t cause the topics to be generated.
  • Since the generated links would be invalid the way they are, add an outputclass attribute in the DITA map to guide downstream processing and then add a plugin to the DITA OT to modify the links appropriately.

Note:

I’m not taking credit for this solution. I’m only reporting it.

Conclusion

There is a lot to be said for composition as an architectural strategy. It’s worth exploring. Having said that, the key decision is whether to transclude the variable information or the invariants. Whichever way you decide to go, you have a lot of power at your disposal.

Eric Armstrong

Sun Microsystems, Inc.

eric.armstrong@sun.com

Eric Armstrong has been programming since before there were personal computers. His programs have included real time programs, business applications, and AI programs. When he became a writer, he noticed a distinct lack of automation for common writing tasks. Since then, he has found his greatest joy in creating tools that make a writer’s life easier. That focus caused him to enthusiastically embrace DITA—a topic he often blogs about at blogs.sun.com/coolstuff.

This series of articles was motivated, in part, by a roundtable discussion at the July 2008 meeting of the Silicon Valley DITA Interest Group (SVDIG). It records many of the thoughts that surfaced during the meeting about the kinds of decisions you need to make when starting a DITA project. Those thoughts combined nicely with information gathered in conjunction with the lead DITA architect for JavaSE docs, Sowmya Kannan. In the process, we leaned heavily on information gleaned from Alfresco’s documentation manager, Briana Wherry, and her lead architect, Janys Kobernick.
