Dawn Stevens, Comtech Services, Inc.

The DITA architecture strives to accommodate the requirements of many different information-development communities spanning a wide variety of content domains. With each update, as DITA expands to serve even more communities, it frequently comes under fire as too complex, offering an unwieldy number of information types, elements, and attributes, typically more than any one organization requires or its authors are prepared to handle.

To address criticisms of complexity, DITA 1.2 introduced a variety of mechanisms that allow organizations to more easily modify the DITA architecture to best meet the requirements of their content and their information developers. But if the DITA architecture is already perceived as complex, the decisions about what to keep and what to hide are equally perplexing to organizations new to structured authoring and semantic coding. Without experienced guidance, organizations tend toward the extremes (leave it all in or take it all out), and as is the case with most extreme solutions, neither approach hits the mark. Each leaves a poor first impression of DITA’s usability or reusability within the organization.

Constraining DITA first requires a well-defined information architecture based on a thorough content analysis, an understanding of reader needs and expectations for content accessibility, and an appreciation of the author’s role in and impact on a successful implementation of that architecture. It is a balancing act to find the right mix of elements that provides a meaningful classification of content, supports user search and scanning strategies, and minimizes the choices authors must make while writing. Information architects and other decision makers need to understand the implications of their decisions on both author and user. This article addresses the considerations to take into account while making decisions about constraining DITA elements and the potential implications of those decisions.

Content considerations

Contrary to what might be inferred from the number of DITA elements available, the designers of DITA did not set out to create a long list of elements just to confuse authors. They take no pleasure in the idea of authors spending hours debating whether content should be coded as a <msgblock> or a <codeblock>, or whether they really need the <properties> element when the <table> element can handle the same content. Instead, each element was identified as a meaningful classification of content for specific types of information. Thus, your first step in determining which elements to use should be a detailed content analysis: What kind of content do you have in your library? How would you classify it?

Consider these types of questions as you analyze your content:

  • What type of data do you make into tables? How complex are your tables? Can you get by with only a simple table format that limits your ability to span rows and columns?
  • Why do you use fonts other than the standard body font? Is it simply for emphasis or is it calling attention to a specific type of information—a file path, menu selection, or command name, for example?
  • When do you use lists? Do you need to use ordered lists outside of the context of numbered steps?
  • Is there ever a situation in which you provide instructions where you do not number steps?
  • Do you provide a glossary? What information is provided in those definitions?
  • What type of information do you include on your covers and in your legal information?

The answers to these and many more questions like them help you to identify the elements that have meaning within the context of your content. If an element has no meaning or application in your context, clearly you don’t need it cluttering your authoring environment. However, just because there is meaning doesn’t guarantee a place either. You still need to consider the expectations of your users and authors.
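
One way to ground the content analysis is to count which elements your existing topics actually use. The following is a minimal Python sketch, assuming your topics are well-formed XML. The function name count_elements and the sample topic are hypothetical, and a real inventory would read files from your own library.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(topic_sources):
    """Count how often each element name appears across a set of topics."""
    counts = Counter()
    for source in topic_sources:
        root = ET.fromstring(source)
        # iter() walks the root element and every descendant element.
        for element in root.iter():
            counts[element.tag] += 1
    return counts


# Hypothetical sample topic, used only for illustration.
SAMPLE_TOPIC = """<task id="install">
  <title>Install the agent</title>
  <taskbody>
    <steps>
      <step><cmd>Run <cmdname>setup</cmdname>.</cmd></step>
      <step><cmd>Open <filepath>agent.conf</filepath>.</cmd></step>
    </steps>
  </taskbody>
</task>"""

inventory = count_elements([SAMPLE_TOPIC])
for name, total in inventory.most_common():
    print(name, total)
```

Elements that never appear in such an inventory are candidates for removal, although a count alone cannot tell you whether an element should have been used and was not.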

Reader considerations

Let’s be honest: readers don’t care what tools or standards you use to create their documentation, and they will never know how many DITA elements you choose to keep in your environment. Nevertheless, their needs must be considered as you develop your constraints strategy. Ultimately, the success of your documentation depends on whether users can find the information they need, and DITA elements can play a role in that.

One of the tenets of semantic coding, such as that used in DITA, is to aid in information gathering and retrieval. Rather than relying on a simple character-for-character match, a search can also take context into account. For example, suppose a user wants to find rules about Boolean operators and types “AND” into the search field. The user doesn’t want a list of every topic that contains the word “and,” which would likely be the entire library. Instead, he or she expects that the writers of the content have included tags in relevant topics that will raise these topics to the top of the search results list. You must understand the needs of your users to develop an effective metadata taxonomy, which might extend deeper than the topic level and touch the individual elements within a topic.
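
The difference between a plain text match and a context-aware match can be sketched in a few lines of Python. This is an illustration only, not how any particular search engine works. The set of elements treated as meaningful, the weights, and the two sample topics are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a match inside one of these elements means the topic is about the term.
SEMANTIC_TAGS = {"keyword", "indexterm", "codeph"}


def score_topic(source, term):
    """Score 10 per exact match inside a semantic element, plus 1 for any plain-text match."""
    root = ET.fromstring(source)
    semantic_hits = sum(
        1
        for element in root.iter()
        if element.tag in SEMANTIC_TAGS and (element.text or "").strip() == term
    )
    # Plain-text match: case-insensitive whole-word comparison over all text.
    words = re.findall(r"\w+", " ".join(root.itertext()).lower())
    plain_hit = 1 if term.lower() in words else 0
    return 10 * semantic_hits + plain_hit


# Hypothetical topics, used only for illustration.
BOOLEAN_TOPIC = (
    '<concept id="bool"><title>Boolean operators</title>'
    "<prolog><metadata><keywords><keyword>AND</keyword><keyword>OR</keyword>"
    "</keywords></metadata></prolog>"
    "<conbody><p>Use <codeph>AND</codeph> to require both terms.</p></conbody></concept>"
)
INTRO_TOPIC = (
    '<concept id="intro"><title>Getting started</title>'
    "<conbody><p>Install the product and restart.</p></conbody></concept>"
)

topics = {"bool": BOOLEAN_TOPIC, "intro": INTRO_TOPIC}
ranked = sorted(topics, key=lambda name: score_topic(topics[name], "AND"), reverse=True)
print(ranked)
```

Both topics contain the word “and,” but only the topic whose authors tagged the operator rises to the top of the ranking.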

In addition, although we make every effort to separate form and content, one key reason for distinguishing among elements, particularly inline elements, is formatting. Users of our documentation will scan a page or topic for the information they need, and if we can anticipate that information, we can set it apart visually in the text to draw the eye and aid in that search. Conversely, too many visually distinct elements get in the way of this strategy and, in the extreme, can even create a “ransom note” effect.

Consider these types of questions as you analyze your users’ needs:

  • What information will your user be looking for?
  • How do your users classify and categorize the information they need? What keywords will they use?
  • What search strategies will your users employ?

If your users won’t or can’t make a distinction between categories of information that you have identified in your content analysis, there may be no reason to keep certain elements in your environment. However, before you eliminate them, make sure you consider the needs of your authors as well.

Author considerations

Obviously, even if you have selected the optimum set of elements for your content and reader concerns, your efforts are pointless if your authors don’t, won’t, or can’t use them. Successful search strategies rely on consistent coding of the elements. Many argue that a simplified set of elements results in better author acceptance of the standard. With too many elements to choose from, authors get confused about which to use in what situation and either make incorrect selections or give up and don’t use them at all. In addition, if it isn’t clear to the authors when content is considered <systemoutput>, <codeblock>, or <msgph>, you can be fairly sure that your users won’t be making the distinction either.

In addition, without simplifying the authoring environment, more “creative” authors will likely find ways to force what they want by using obscure tags that you’ve left in the environment, but did not intend for them to use, or by using tags where you never intended them to be used. By applying constraints, you not only eliminate tags that are unnecessary, but also define when and where the tags you keep can be used, limiting the options that are valid, but not considered best practices in your own authoring environment. For example, you might want to prohibit writers from using highlighting elements such as <b> and <i> to enforce the separation of format from content.
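
In DITA itself, constraints are implemented in the document type shells, so that prohibited elements are simply not valid. As a complementary illustration, the Python sketch below reports prohibited elements in an existing topic, which can help when assessing legacy content before a constraint is introduced. The rule set (no <b> or <i>), the function name find_prohibited, and the sample topic are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical house rule: highlighting elements are not allowed.
PROHIBITED = {"b", "i"}


def find_prohibited(source, prohibited=PROHIBITED):
    """Return (element name, text) pairs for each prohibited element, in document order."""
    root = ET.fromstring(source)
    return [
        (element.tag, "".join(element.itertext()))
        for element in root.iter()
        if element.tag in prohibited
    ]


# Hypothetical sample topic, used only for illustration.
SAMPLE_TOPIC = (
    '<topic id="saving"><title>Saving</title><body>'
    "<p>Click <b>Save</b>, then open <filepath>out.log</filepath>. "
    "Do <i>not</i> close the window.</p></body></topic>"
)

violations = find_prohibited(SAMPLE_TOPIC)
for name, text in violations:
    print(f"<{name}> is not allowed here: {text!r}")
```

A report like this shows how much existing content would need rework before the constraint takes effect.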

However, simply eliminating all tags that aren’t justified by your content and user analysis potentially does a disservice to your authoring community. Consider not only how your end users search for information, but also how your authors will. A common reason for using a DITA topic-based architecture is reusability of content. Content can’t be reused, however, if the author isn’t aware it exists or can’t find it. DITA elements that might not be relevant to your users’ search strategies could be crucial to your authors. You may find that the extra effort spent tagging content when it is written is more than repaid by the time saved finding that content later.

In addition, tagging certain elements can help speed up other parts of your process, even if it slows down the original writing. For example, many editors allow you to exclude content within specific tags from operations such as spell checking, and you can instruct your translators not to translate specifically tagged elements. In some types of documents, this can save hours of effort.
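
As an illustration of the translation case, the Python sketch below lists the text that authors have marked as not to be translated. It assumes the team marks such content with the DITA translate attribute set to "no". The function name untranslatable_text and the sample topic are hypothetical.

```python
import xml.etree.ElementTree as ET


def untranslatable_text(source):
    """Return the text of every element explicitly marked translate="no"."""
    root = ET.fromstring(source)
    # Note: an element nested inside another marked element is listed twice.
    return [
        "".join(element.itertext())
        for element in root.iter()
        if element.get("translate") == "no"
    ]


# Hypothetical sample topic, used only for illustration.
SAMPLE_TOPIC = (
    '<topic id="logging"><title>Logging</title><body>'
    '<p>Set <parmname translate="no">log_level</parmname> in '
    '<filepath translate="no">agent.conf</filepath>.</p></body></topic>'
)

protected = untranslatable_text(SAMPLE_TOPIC)
print(protected)
```

A list like this can be handed to a translation vendor or used to check that protected strings came back unchanged.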

Finally, consider whom you might share your content with outside your own organization. If your tech pubs environment is constrained to not include the learning and training specialization, does that eliminate a potential source of content because learning and training content will not be valid or recognized in your environment? Do you absolutely have to have <apiname> when you know that the marketing department has constrained the entire programming elements domain and therefore cannot use your content without some kind of adjustment?

Ask yourself these questions when making that final cut of elements to keep in your authoring environment:

  • Can authors easily distinguish between the elements I am keeping? Is it clear when to use each?
  • How will authors need to access information in my content repository? How will they classify and categorize the information they need? What keywords will they use? What search strategies will they employ?
  • Are any constraints implemented across other organizations I share content with? Can the groups meet to determine a common element set?
  • Does the time it will take to code these additional elements save time later in the process?
  • How will I verify and enforce the proper use of the elements I keep?

Clearly, elements that are of no use to either reader or author can be removed from your environment without any further analysis. But you’ll likely find a handful of elements where the decision is not black and white, where the elements might be of use to your authors but probably not to your users. Here you need to weigh which group has the greater impact on your measure of success. In general, it is best to err on the more restrictive side. Remember, you can always go back and add elements as coding their content becomes second nature to your authors and they begin to see the value that the additional tags could offer.