Paul Wlodarczyk, easyDITA
Reprinted with permission from

Many organizations using DITA are in the early stages of developing taxonomies and metadata strategies for improving findability of their DITA content.

There are three big benefits to developing basic taxonomies and a metadata scheme, if you haven’t done so already:

1. Help your authors locate DITA content for reuse.
2. Help your project managers and content managers wrangle your content.
3. Help your end users find what they’re looking for, whether that’s on a web site, in a portal, or via some other dynamic content delivery system.

Taxonomy and metadata can seem like scary or complex turf to the uninitiated—but they don’t have to be. I like the idea of “search-first” thinking as a way of guiding basic but effective content strategies. The first person I heard apply the “search-first” concept to content strategy was James Matthewson, who leads Enterprise Search Strategy for IBM. As James defines it, “search-first” is

“… a strategy that structures the whole content enterprise around search experiences, from messaging to writing, to coding, to production, to architecture and UX design. If companies structure their content strategy around search, they maximize their investment in content for both search crawlers and their target audiences.”

Creating content in DITA is already a great first step for search-first, since your content is granular and can be easily published to a dynamic delivery system.

Developing a metadata strategy for a search-first approach can actually be fun! The approach I like is a lot like playing Mad Libs—it involves writing search use cases or user stories with blanks to fill in. You can play this Mad Libs game with your team, your SMEs, and end users. Play one or two rounds thinking about how authors and managers search for content, then repeat the process thinking about end users. Your Mad Lib use cases should read something like this:

  • Author: “I am trying to find content to reuse. List every [content type] that is about [this product], that describes [type of task], that uses [tool or part reference], that is newer than [date].” A completed example might read: “List every [Service Procedure] that is about [Acme Jetpack XL7], that describes [troubleshooting], that uses [fuel pump], that is newer than [1/1/2011].”
  • Project Manager: “I am trying to find content that needs attention. List everything that is [status] for [project] that has [annotation type] from users in [group name] that were created since [date].” A real example might be: “List everything that is [In Review] for [Acme Jetpack XL7 User Guide] that has [unresolved proposed changes] from users in [Environmental Health and Safety] that were created since [6/1/2012].”
  • Content Manager: “I am trying to make our content secure. Show me everything that is classified as [security classification] that mentions [customer] that is older than [date],” where the security classification could be “Top Secret” and the customer could be “Wile E. Coyote.”
  • End user: “I am trying to troubleshoot something. Show me everything about [product] with [part] that describes [symptom] or [fault code].” An example of this could read: “Show me everything about [Acme Jetpack XL7] with [turbocharger] that describes [blue smoke from top of unit].”
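One way to see why these blanks matter: once they exist as metadata, a completed use case is simply a set of filters. The Python sketch below is illustrative only. The `Topic` record, its field names, the `find_reusable` helper, and the sample topics are hypothetical and are not tied to any particular component content management system. It answers the completed Author example by turning each filled-in blank into one filter condition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Topic:
    # Hypothetical metadata attached to one DITA topic.
    title: str
    content_type: str
    product: str
    task_type: str
    parts: list[str]
    modified: date


def find_reusable(topics, content_type, product, task_type, part, newer_than):
    """Each blank in the Author use case becomes one filter condition."""
    return [
        t for t in topics
        if t.content_type == content_type
        and t.product == product
        and t.task_type == task_type
        and part in t.parts
        and t.modified > newer_than
    ]


# Invented sample data, for illustration only.
topics = [
    Topic("Replace the fuel pump", "Service Procedure", "Acme Jetpack XL7",
          "troubleshooting", ["fuel pump"], date(2012, 3, 14)),
    Topic("Inspect the turbocharger", "Service Procedure", "Acme Jetpack XL7",
          "troubleshooting", ["turbocharger"], date(2012, 5, 2)),
    Topic("Old fuel pump notes", "Service Procedure", "Acme Jetpack XL7",
          "troubleshooting", ["fuel pump"], date(2010, 8, 9)),
]

# "List every [Service Procedure] that is about [Acme Jetpack XL7], that describes
# [troubleshooting], that uses [fuel pump], that is newer than [1/1/2011]."
for t in find_reusable(topics, "Service Procedure", "Acme Jetpack XL7",
                       "troubleshooting", "fuel pump", date(2011, 1, 1)):
    print(t.title)  # prints "Replace the fuel pump"
```

Only the first topic matches. The second uses a different part, and the third is older than the cutoff date. A real CMS would run an equivalent query against its own index rather than a Python list.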

When you’re done playing this game of Mad Libs, you should have a list of “blanks” that your users will typically want to fill in when searching for content. These blanks are your essential metadata fields.
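You can collect the blanks by hand. If the use cases are kept as text, though, a short script can pull out every bracketed blank and record which roles need it. This Python sketch assumes the convention used in the use cases, where each blank is written in square brackets. The `collect_blanks` helper is hypothetical, and the templates are the Author, Project Manager, and Content Manager use cases from the list.

```python
import re

# Mad Libs templates from the use-case round; every [bracketed] phrase is a blank.
TEMPLATES = {
    "Author": "List every [content type] that is about [this product], that describes "
              "[type of task], that uses [tool or part reference], that is newer than [date].",
    "Project Manager": "List everything that is [status] for [project] that has "
                       "[annotation type] from users in [group name] that were created since [date].",
    "Content Manager": "Show me everything that is classified as [security classification] "
                       "that mentions [customer] that is older than [date].",
}

# Matches the text between one "[" and the next "]".
BLANK = re.compile(r"\[([^\]]+)\]")


def collect_blanks(templates):
    """Map each distinct blank to the roles whose use cases contain it.

    Blanks keep the order in which they first appear (dicts preserve insertion order).
    """
    blanks = {}
    for role, text in templates.items():
        for blank in BLANK.findall(text):
            roles = blanks.setdefault(blank.strip().lower(), [])
            if role not in roles:
                roles.append(role)
    return blanks


candidate_fields = collect_blanks(TEMPLATES)
for name, roles in candidate_fields.items():
    print(f"{name}: {', '.join(roles)}")
```

Here `date` is listed once even though all three roles use it. That suggests it could be a single shared field in the schema. Blanks that mean the same thing but are worded differently still need a person to merge them.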

Now let’s turn this list into a metadata schema that you can use for configuring your component content management system. For each metadata field you listed, ask yourself:

  • What do we call this?
  • How do we describe it in 50 words or less so that everyone will know what we mean?
  • What type of information is it? (For example: a single line of text, multiple lines of text, yes/no, choice, checkbox, or an entry from a controlled vocabulary.)
  • Is it auto-populated (like a date or a user name) or does it require manual entry?
  • Can it have multiple values (for instance, fault codes in a service procedure)?
  • Where does the value come from (for instance, is it made up by the author, chosen from a specific controlled vocabulary, or supplied by a system)?
  • Is it required to have a value or can it be blank?
  • Is there a default value?
  • Do people need to see it? If so, who and where?
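The answers to these questions map naturally onto one record per field. The Python sketch below shows one hypothetical way to capture them. The `MetadataField` class, the `FieldType` values, and the example fields and vocabulary terms are invented for illustration. A real CMS will have its own configuration format. The `problems` method checks a few rules implied by the questions, such as the 50-word limit on descriptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FieldType(Enum):
    SINGLE_LINE = "single line of text"
    MULTI_LINE = "multiple lines of text"
    YES_NO = "yes/no"
    CHOICE = "choice"
    CHECKBOX = "checkbox"
    CONTROLLED_VOCABULARY = "controlled vocabulary"


@dataclass
class MetadataField:
    name: str                           # What do we call this?
    description: str                    # Described in 50 words or less
    field_type: FieldType               # What type of information is it?
    auto_populated: bool = False        # Supplied by the system, or entered manually?
    multi_valued: bool = False          # Can it have multiple values?
    source: str = "author"              # Where does the value come from?
    required: bool = False              # Must it have a value?
    default: Optional[str] = None       # Is there a default value?
    visible_to: list[str] = field(default_factory=list)  # Who needs to see it, and where?
    vocabulary: list[str] = field(default_factory=list)  # Allowed terms, if any

    def problems(self):
        """Return a list of inconsistencies in this field definition."""
        issues = []
        if len(self.description.split()) > 50:
            issues.append("description is longer than 50 words")
        if self.field_type is FieldType.CONTROLLED_VOCABULARY and not self.vocabulary:
            issues.append("controlled vocabulary field has no terms")
        if self.default is not None and self.vocabulary and self.default not in self.vocabulary:
            issues.append("default value is not in the vocabulary")
        return issues


# Hypothetical field definitions, for illustration only.
fault_code = MetadataField(
    name="Fault code",
    description="Diagnostic codes that a service procedure helps resolve.",
    field_type=FieldType.CONTROLLED_VOCABULARY,
    multi_valued=True,
    source="fault code list maintained by engineering",
    visible_to=["authors in the CMS", "end users on the support portal"],
    vocabulary=["E101", "E102", "E205"],
)

status = MetadataField(
    name="Status",
    description="Where the topic is in the review workflow.",
    field_type=FieldType.CHOICE,
    required=True,
    default="Published",
    vocabulary=["Draft", "In Review", "Approved"],
)

print(fault_code.problems())  # []
print(status.problems())      # ['default value is not in the vocabulary']
```

In this example, `status` is flagged because its default value, “Published,” is not one of its allowed terms.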

Once you’ve finished this process, you have a metadata schema! Not as hard as you thought, was it? You have also made sure that the metadata you manage actually matters, because it will help your authors, managers, and end users find what they need, whether in your CMS, on the web, or in a portal.