A Preview of Coming Attractions: DITA Constraints
The DITA 1.2 standard is getting ready to depart the station at OASIS with a number of new tools for DITA adopters. These tools include constraints, a refinement of the DITA architecture that makes it easier for you to create consistent, tailored content with minimal investment.
On your first hearing, the name might sound mysterious or difficult, even like a contortionist technique. What are constraints? How can something that limits my options be good? What exotic authoring tools will I need? This article explains how, with constraints, less can be more. We’ll start by taking a step back and considering the role of XML in producing high-quality content with many uses.
From Guidelines and Templates to Specialized Information Types
Content creators have long recognized the value of consistency. Readers get more from content with less work when they can, for instance, reliably find an introduction at the start or a list of related resources at the end of each article. Content creators historically have codified such practices in style guides, often accompanied by templates that writers copy and edit to start new documents.
The guideline-and-template approach puts a heavy burden on writers to internalize the guidelines. Editors have to focus on narrow questions of article structure, cracking the bullwhip, rather than looking at larger questions such as the coverage of the subject matter. If the writer or editor has a heavy lunch or a frenzied deadline, inconsistencies slip through. As a result, your document processing can’t take advantage of the consistency that you’ve worked so hard to cultivate. If you try to depend on content consistency, a deviation is bound to break your web site someday (usually announced by a 3:00 am phone call).
XML vocabularies help by checking structure proactively. As the writer works, a good XML authoring tool guides the writer through filling in the document structure. When the writer finishes, the tagging shows the document structure for anyone to see. You can write reliable processes that pull out introductions to create lists or that filter related resources for different audiences. You can also publish the content in many different formats or change the styling at any time.
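As a small illustration of such processes, the Python sketch below uses only the standard library's `xml.etree.ElementTree` to pull the introduction (the DITA `shortdesc` element) out of a topic and to filter its related links by the `audience` attribute. The sample topic, file names, and audience values are invented, and this is not a DITA toolkit; it only shows how dependable tagging turns these jobs into a few lines of code.

```python
from typing import List, Optional
import xml.etree.ElementTree as ET

# Hypothetical concept topic: ids, file names, and audience values are illustrative.
TOPIC = """<concept id="install-overview">
  <title>Installation overview</title>
  <shortdesc>Install the server before you configure any clients.</shortdesc>
  <conbody><p>The server hosts the shared repository.</p></conbody>
  <related-links>
    <link href="admin-guide.dita" audience="administrator"/>
    <link href="quick-start.dita" audience="novice"/>
    <link href="faq.dita"/>
  </related-links>
</concept>"""


def introduction(topic: ET.Element) -> Optional[str]:
    """Return the topic's short description (its introduction), if it has one."""
    elem = topic.find("shortdesc")
    if elem is None:
        return None
    # Collapse whitespace and include text from any inline elements.
    return " ".join("".join(elem.itertext()).split())


def related_for(topic: ET.Element, audience: str) -> List[str]:
    """Keep links that name no audience or whose audience list includes `audience`."""
    hrefs = []
    for link in topic.iterfind("related-links/link"):
        audiences = link.get("audience", "").split()  # space-separated values
        if not audiences or audience in audiences:
            hrefs.append(link.get("href"))
    return hrefs


topic = ET.fromstring(TOPIC)
print(introduction(topic))
print(related_for(topic, "administrator"))  # ['admin-guide.dita', 'faq.dita']
```

Because the structure is guaranteed by the vocabulary, neither function needs to guess where the introduction or the related resources live.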
DITA specialization applies a key principle to simplify the creation of XML vocabularies. Instead of treating each vocabulary as a blank-slate design challenge, DITA specialization treats special vocabularies as a consistent pattern imposed on a more general kind of document. For instance, factual reference topics can be viewed as general topics that conform to a pattern of lists, tables, examples, and so on. That makes it possible to specialize the DITA reference vocabulary from the DITA general-purpose topic vocabulary. Similarly, API reference topics can be seen as reference topics that conform to a pattern of listing the fields, methods, parameters, and return values of an API. That makes it possible to specialize the DITA API reference vocabulary from the DITA reference vocabulary.
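Specialization stays compatible with general-purpose processing because every DITA element carries a `class` attribute that lists its ancestry, for example `- topic/section reference/refsyn ` for the reference topic's syntax section. The minimal Python sketch below, using an invented sample document and a hypothetical helper name `is_a`, shows how a process written for the general `topic/section` type also picks up the specialized element.

```python
import xml.etree.ElementTree as ET

# Each element's class attribute lists its ancestry, from the most general type to
# the most specific. A DTD or Schema normally supplies these values as defaults;
# they are spelled out here so the sketch runs without one.
DOC = """<reference id="config-file" class="- topic/topic reference/reference ">
  <title class="- topic/title ">Configuration file</title>
  <refbody class="- topic/body reference/refbody ">
    <refsyn class="- topic/section reference/refsyn ">server.conf [options]</refsyn>
    <section class="- topic/section ">Changes take effect on restart.</section>
  </refbody>
</reference>"""


def is_a(elem: ET.Element, base: str) -> bool:
    """True if elem is `base` itself or any specialization of it (e.g. 'topic/section')."""
    # Tokens in the class value are space-delimited, so pad the search string
    # to avoid partial matches such as 'topic/sectiondiv'.
    return f" {base} " in elem.get("class", "")


root = ET.fromstring(DOC)
# Generic processing written against topic/section also handles the specialized refsyn.
print([e.tag for e in root.iter() if is_a(e, "topic/section")])  # ['refsyn', 'section']
```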
If some markup is good, is more markup better? Should you tag every last participle and adverb? That might be a great approach for annotating a document to produce an automated linguistic analysis. Even the most adaptable writer, however, would soon start cursing when crafting a thin ribbon of text within a thicket of XML angle brackets. For most writers, the best markup captures what the writer already knows and has a visible effect on the use of the document.
While there’s no absolute formula for good XML vocabularies, consensus within communities provides a practical recipe for good vocabulary design. Good report documents for one financial company are likely to resemble good documents for other financial companies, as shown in Figure 1. That recognition of useful document consistency across organizations is the fundamental basis for information typing, whether you implement information types with guidelines and templates or as XML vocabularies.
Figure 1: Financial Organizations with Consensus on the Structure of Report Documents
The goal of consistency means you should never undertake a new information type lightly. While DITA specialization simplifies the definition of XML vocabularies, the implementation has costs, particularly for the design. Getting the design right means working with others to arrive at consensus on the definition of the information type.
Simplifying Vocabularies or Enforcing Best Practices
While working with the community is the best approach for a complete and stable information type, the resulting XML vocabulary can err on the side of completeness. What if the vocabulary has more than you want? Should you start creating templates and guidelines in XML?
Here are some common examples of this predicament:
Extra alternatives
The DITA software domain module includes seven elements that become available as alternatives to the generic phrase and keyword elements. That gives adopters the vocabulary to handle a variety of situations for software documentation.
What if your writers only need the command name and system output elements? Do your writers have to learn to shun the extra elements?
Flexible content
The section element allows any mix of text, phrase elements, the title element, and block elements like paragraphs and lists. That flexibility allows for a wide variety of grouped content.
What if your sections are always titled groups of paragraphs and lists? Do your editors have to watch out for untitled sections?
Optional best practices
In DITA topics, the introductory short description is optional. That way, documents can be migrated to valid DITA topics while deferring manual editing to turn parts of a chapter into standalone topics.
What if you want a short description available when hovering over any link to a topic? To make sure every topic has a short description, do you need to write your own validation scripts that check for topics without short descriptions?
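Without constraints, the answer is often yes. The following Python sketch (standard-library ElementTree, invented sample topics, hypothetical function name) is roughly what such a hand-rolled check looks like: it flags topics without a short description and sections without a title, both of which the general DITA topic vocabulary accepts.

```python
import xml.etree.ElementTree as ET
from typing import List

# Two hypothetical topics, both valid against the general topic vocabulary.
GOOD = """<topic id="tuning"><title>Tuning</title>
  <shortdesc>Adjust cache sizes for large repositories.</shortdesc>
  <body><section><title>Cache size</title><p>See the defaults table.</p></section></body>
</topic>"""
LOOSE = """<topic id="legacy"><title>Legacy notes</title>
  <body><section><p>Untitled material migrated from a chapter.</p></section></body>
</topic>"""


def best_practice_problems(xml_text: str) -> List[str]:
    """Flag content the general vocabulary accepts but the local style guide forbids."""
    topic = ET.fromstring(xml_text)
    topic_id = topic.get("id")
    problems = []
    # For brevity this ignores a shortdesc nested inside <abstract>.
    if topic.find("shortdesc") is None:
        problems.append(f"{topic_id}: missing shortdesc")
    for n, section in enumerate(topic.iter("section"), start=1):
        if section.find("title") is None:
            problems.append(f"{topic_id}: section {n} has no title")
    return problems


for doc in (GOOD, LOOSE):
    print(best_practice_problems(doc))
```

Every team maintaining its own variant of a script like this, and remembering to run it, is the burden that a constraint moves into the document type itself.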
You wouldn’t want to specialize to meet these requirements. The vocabulary already assigns the right names to the pieces of content. Specialization would hide the consistency between your documents and similar documents by using different element names for the same kind of content.
These scenarios are a job for constraints. Instead of creating new XML elements, you can, with less effort, tailor the existing XML vocabulary to fit. That means simpler and more productive authoring for your writers while they still create documents that are immediately understandable to anyone or any formatting engine that’s familiar with the information type. In fact, the only way to detect a constraint on a task topic is to put your magnifying glass over the architectural XML attributes on the topic, which exist for the rare cases where a process might care about constraints.
Implementing Constraints on a Vocabulary
From the beginning, DITA has required vocabulary designers to conform to a design pattern for a DTD or XML Schema. The design pattern is key to DITA because it supports modularity and extensibility while still validating DITA documents with vanilla XML tools.
DITA 1.2 refines the DITA design pattern so designs can be constrained. Let’s leave the gritty details of the revised design pattern for another day. For now, just know that, for validation of documents, a constrained document type is just another DTD or Schema. Thus, if you’ve got authoring and processing tools that support DTDs or Schemas, you already have an XML production environment with basic support for constraints.
You can constrain an existing information type in two ways:
Selected domain elements
A DITA domain module adds alternatives for more general elements. For instance, the software domain module adds elements for tagging commands, user input, system output, and so on. These alternatives can appear anywhere that the base phrase or keyword elements can appear.
A constraint can drop any elements out of the list of alternatives, including the base element. So, you can treat domain modules as a buffet for sampling what you want instead of as a delivery truck that drops off an entire pallet.
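The buffet idea can be modeled as plain set arithmetic. In real document types, a constraint module does this work by redefining DTD or XML Schema declarations; the Python below is only a hypothetical model of the logic, and the software domain names listed are a subset chosen for illustration. It also encodes the rule that a constraint can narrow the choices but never introduce a new element name.

```python
from typing import Set

# Hypothetical model of the phrase-level choices in a document type. The real
# mechanism is a set of redefined DTD or Schema declarations, not Python sets,
# and the software-domain names below are a subset chosen for illustration.
BASE_ELEMENTS = {"ph", "keyword"}
SOFTWARE_DOMAIN = {"cmdname", "systemoutput", "userinput", "filepath", "varname"}


def constrain(available: Set[str], keep: Set[str]) -> Set[str]:
    """Return the narrowed set of alternatives; refuse to invent new element names."""
    added = keep - available
    if added:
        raise ValueError(f"a constraint cannot add elements: {sorted(added)}")
    return set(keep)


available = BASE_ELEMENTS | SOFTWARE_DOMAIN
# Writers see only these three choices; even the base keyword element is dropped.
allowed = constrain(available, {"ph", "cmdname", "systemoutput"})
print(sorted(allowed))  # ['cmdname', 'ph', 'systemoutput']

try:
    constrain(available, {"cmdname", "timestamp"})  # 'timestamp' is not in the vocabulary
except ValueError as err:
    print(err)
```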
Restricted content or attributes
A specialized DITA element can restrict the content or attributes of its base element. If you recall that a specialized reference, for instance, just tags the subset of topics that provide reference information, this restriction rule makes sense. A specialized element can change the short description element from optional to required or drop the optional platform attribute. A specialized element can’t, however, add a new timestamp attribute.
A constraint can restrict an element in the same way as a specialization. A constraint just can’t add any new names to the XML vocabulary. For that, specialization remains the right tool for the job.
The implementation of a constraint has one final piece. As with specialization, the document type must have architectural attributes with a default value that names the constraint. Writers never see this value, but processors have the option to check for constraints.
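In DITA 1.2 that architectural attribute is `domains`: the constraint contributes a parenthesized group such as `(topic task strictTaskbody-c)`, where the final token names the constraint and ends in `-c`, and a group written as `s(...)` declares a strong constraint. The Python sketch below assumes exactly that shape and reads the declarations with a regular expression; it is a simplification under that assumption, not a full parser for the attribute's grammar.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# The domains value is normally defaulted by the DTD or Schema, so writers never
# type it; it is spelled out here so the sketch runs without a DTD.
DOC = """<task id="backup" domains="(topic hi-d) (topic task strictTaskbody-c)">
  <title>Back up the database</title>
  <taskbody/>
</task>"""

# One parenthesized group, optionally prefixed with "s" for a strong declaration.
GROUP = re.compile(r"(s?)\(([^)]*)\)")


def declared_constraints(root: ET.Element) -> List[Tuple[str, bool]]:
    """Return (constraint name, is_strong) for each group whose last token ends in '-c'."""
    found = []
    for prefix, body in GROUP.findall(root.get("domains", "")):
        tokens = body.split()
        if tokens and tokens[-1].endswith("-c"):
            found.append((tokens[-1], prefix == "s"))
    return found


print(declared_constraints(ET.fromstring(DOC)))  # [('strictTaskbody-c', False)]
```

Domain groups such as `(topic hi-d)` are skipped because their last token ends in `-d`, so only constraint declarations come back.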
Interoperability with Other DITA Adopters
A task topic that you’ve constrained to make the short description required looks just like a regular task. You can turn a constrained task into a regular task just by switching the DTD or XML Schema. After that, no one can tell the task was originally created with a constrained document type. The only difference is that, after the switch to the regular task, a writer could delete the short description.
You can’t, however, switch a regular task to the constrained task without first checking whether the regular task has a short description. That’s the same issue you run into when converting a topic from a general-purpose topic to a specialized reference topic. You have to confirm that the topic has good reference content.
The catch is in the expectations of the writers. A specialized topic type has different element names. Thus, writers can easily spot the difference in documents with different information types. By contrast, a constrained task has the same element names as a regular task. Who could blame a writer for being frustrated if two seemingly identical documents have an invisible incompatibility?
Fortunately, you can avoid team mobs with torches and pitchforks and ensure interoperability with other DITA adopters by following a few simple rules:
- Author using consistent constraints. If you constrain the task to require a short description, always use your constrained task instead of the general task. That way, writers will never get baffled trying to copy and paste from a task without a short description into a task that requires a short description.
- Process the unconstrained information type. While you might devote special attention to tasks with a short description, make sure your processing can provide some kind of fallback for tasks without a short description. You’ll sleep soundly at night, knowing that if a merger or partnership suddenly forces you to work with documents that don’t follow your best practices, you can still publish. To put it another way, use constraints to control authoring within your organization but don’t depend on constraints for processing.
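A minimal sketch of such a fallback, in Python with the standard library: when a topic has a short description, its text becomes the link preview; otherwise the process falls back to the first paragraph and then to the title. The function names, the fallback order, and the sample tasks (including the `dbdump` command) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def text_of(elem: ET.Element) -> str:
    """All text inside elem, including inline markup, with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def link_preview(topic: ET.Element) -> str:
    """Hover text for links: prefer the short description, but never fail without one."""
    shortdesc = topic.find("shortdesc")
    if shortdesc is not None and text_of(shortdesc):
        return text_of(shortdesc)
    # Fallback 1: the first paragraph anywhere in the topic.
    para = topic.find(".//p")
    if para is not None and text_of(para):
        return text_of(para)
    # Fallback 2: the title, which every DITA topic has.
    title = topic.find("title")
    return text_of(title) if title is not None else ""


ours = ET.fromstring("<task id='a'><title>Back up</title>"
                     "<shortdesc>Copy the <cmdname>dbdump</cmdname> output offsite.</shortdesc>"
                     "<taskbody/></task>")
theirs = ET.fromstring("<task id='b'><title>Restore</title>"
                       "<taskbody><context><p>Run this after an outage.</p></context></taskbody></task>")
print(link_preview(ours))    # Copy the dbdump output offsite.
print(link_preview(theirs))  # Run this after an outage.
```

Your own constrained tasks always take the first branch; the fallbacks exist only for content authored elsewhere.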
Note: If content deviating from your best practices is simply not acceptable for publication, you can choose to check the constraint not only during authoring but also during processing. Constraints preserved during processing are known as strong constraints. With strong constraints, documents can include content fragments only from other documents that have the same constraint. For instance, a topic that requires a short description can only include nested topics that also require a short description. In short, strong constraints are available when you want to limit the input for your processes to a subset of an XML vocabulary.
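A process that enforces strong constraints has to compare declarations before it resolves a reuse reference. The Python sketch below models the simplified rule from the note as a subset test; the specification's compatibility rules cover more cases, and both the function name and the constraint name `requiredShortdesc-c` are hypothetical.

```python
from typing import Set

REQUIRED = "requiredShortdesc-c"  # hypothetical constraint name


def may_reuse(host_strong: Set[str], source_declared: Set[str]) -> bool:
    """Simplified rule: a document may pull in a fragment only when the source
    declares every strong constraint that the host declares."""
    return host_strong <= source_declared


print(may_reuse({REQUIRED}, {REQUIRED}))  # True
print(may_reuse({REQUIRED}, set()))       # False: the source may lack a short description
print(may_reuse(set(), set()))            # True: no strong constraints, no restriction
```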
Benefits for Design
Currently, a designer has to walk a tightrope when specializing an XML vocabulary. The designer tries to find the optimal blend between handling the full set of requirements for the community and conforming to best practices. The original DITA task offers an example of this tension. Because task content is so crucial for goal-oriented guidance, the original DITA task erred on the side of best practices. Handling a wider variety of task content was raised as an important requirement for DITA 1.2.
The OASIS DITA Technical Committee plans to address this requirement by relaxing the design of the task vocabulary to allow more varied task content. The Committee proposes to continue to support the best-practice task for current DITA adopters by introducing a constraint that restores the restrictions relaxed in the general task.
This approach offers a lesson for designers. A designer can specialize for broad requirements but still support best practices through an accompanying constraint.
With DITA 1.2, you will have two options for adapting DITA to meet your specific requirements for consistent, processable documents:
- Specialize to name a distinct kind of content with a new DITA vocabulary
- Constrain to simplify or require best practices for an existing DITA vocabulary
Far from tying you up in knots, constraints let you streamline the XML authoring experience and thus maximize the value of your XML investment. With DITA 1.2 on the way, you may want to start thinking about how to simplify or support the best practices for your organization.
Erik Hennum is an Information Model Architect with IBM. He has originated proposals for several DITA features including domain specialization, design constraints, map references, data extensibility, and metadata schemes. He has presented on DITA or subject classification at Semantic Technologies, Extreme XML, OASIS Symposium, Content Management Strategies, and other conferences and has worked on XML publishing pipelines, dynamic websites, and metadata search indexing.