Leveraging Conditions as Metadata
In the Information Development (Info Dev) team at Symantec, we have been working in an XML-based content management system (CMS) for several years. While we use some CMS-provided metadata (such as date created and author), we have not yet implemented a rich, content-based metadata strategy. For Info Dev, we wanted to develop metadata for internal use first, so that writers could easily locate content in the CMS to reuse in their documentation. Working with internal metadata would also be a learning process that would help us succeed when we implement an external metadata strategy to help customers find content based on their needs (for example, by product or by the type of information sought). In this article, I discuss how we are beginning to implement metadata using information that writers already apply: conditions.
As the person tasked with supplying a metadata strategy, I found the thought of developing one from scratch across our immense product line daunting. Info Dev writes all of the content that appears in end-user manuals and Help systems for all of our products, and we currently have hundreds of thousands of topics. As I began contemplating this task, I also began normalizing the condition values that writers were applying in the CMS. That work proved serendipitous: it revealed a clear place to begin.
In technical documentation, FrameMaker has traditionally been the editor of choice. FrameMaker lets writers select and tag text in a document with one or more values called conditions. Using conditions, you can show or hide text or images based on the values that you assign. When you are ready to create output (such as a PDF) from FrameMaker, you specify the conditions that you want to show in the output; any content that has no conditions applied is always included. Example condition values include the output type (such as Help for a Help system), the operating system (such as Windows or Mac), and the audience (such as administrator or end user).
I immediately discovered that many of the values used as conditions would also be useful as metadata. The required operating system or the intended audience, for example, would help both internal and external users discover content. In addition, writers were already applying conditions in the CMS, so conditions are not a new requirement. Conditions and metadata could also both be based on an internal company terminology list. And because our documentation is in XML, we have the potential for automation, which I discuss in more detail later.
Two XML DTDs are used primarily for technical documentation: DocBook and DITA. In DocBook, the term for conditional publishing is profiling; in DITA, it is conditional processing. Both terms refer to applying attribute values (like conditions) to filter content into or out of a specific output.
At Symantec, we use DocBook. We initially implemented profiling with a single attribute (condition) to emulate our FrameMaker working environment. In recent months, we have recognized the need for a more complex profiling strategy. In DocBook profiling, multiple values applied to a single attribute are evaluated with a Boolean ‘OR.’ This behavior leads to a proliferation of conditions to cover all of the variations that an ‘AND’ output would require (for example, solaris, solaris_help, and solaris_print).
The following XML markup illustrates the issue that we were experiencing with our initial profiling strategy:
<para condition="solaris;admin">Text for Solaris administrator output.</para>
<para condition="solaris;user">Text for Solaris user output.</para>
<para condition="linux;user">Text for Linux user output.</para>
In DocBook, the Edition element contains the conditional values that you want to appear in your output. In this example, the values “solaris;user” are specified, with the intent of including only the second paragraph (tagged “solaris;user”). However, because the semicolon separator acts as a Boolean ‘OR,’ and either “solaris” or “user” appears in every paragraph, all three paragraphs would be included in the output.
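To make the OR behavior concrete, the following minimal Python sketch models single-attribute filtering. It is not the DocBook XSL stylesheets or our CMS publishing code; the function name included_single_attribute is hypothetical, and the sketch assumes only the rule described above: within one attribute, a match on any value includes the element, and untagged content is always kept.

```python
import xml.etree.ElementTree as ET

# The three paragraphs from the example, with straight quotes so they parse.
SOURCE = """<chapter>
<para condition="solaris;admin">Text for Solaris administrator output.</para>
<para condition="solaris;user">Text for Solaris user output.</para>
<para condition="linux;user">Text for Linux user output.</para>
</chapter>"""


def included_single_attribute(para, wanted):
    """Keep a para if ANY of its condition values matches ANY wanted value.

    Models the semicolon acting as a Boolean OR within one attribute.
    Content with no condition attribute is always kept.
    """
    values = para.get("condition")
    if values is None:
        return True
    return bool(set(values.split(";")) & wanted)


root = ET.fromstring(SOURCE)
wanted = {"solaris", "user"}  # the intended "solaris;user" output
kept = [p.text for p in root.iter("para") if included_single_attribute(p, wanted)]
print(len(kept))  # 3: every paragraph matches either "solaris" or "user"
```

The first paragraph slips in on "solaris" and the third on "user", which is exactly the leak that pushed writers toward combined values.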
To solve this issue, writers began combining multiple values into a single condition (for example, solaris_print, solaris_help, linux_print). This proliferation of values is not a good long-term solution because any number of combinations would be needed across products, versions, output types, and so on.
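The arithmetic behind that proliferation is easy to sketch. The value lists below are hypothetical examples, not our actual normalized conditions; the Python code simply builds one combined condition value for each combination of operating system, output type, and audience, as writers were doing by hand.

```python
from itertools import product

# Hypothetical value lists; real lists would come from the normalized conditions.
operating_systems = ["solaris", "linux", "windows"]
outputs = ["print", "help"]
audiences = ["admin", "user"]

# Every combination a writer might need becomes one more combined condition value.
combined = ["_".join(parts) for parts in product(operating_systems, outputs, audiences)]

print(len(combined))  # 12, that is, 3 * 2 * 2
print(combined[:3])   # ['solaris_print_admin', 'solaris_print_user', 'solaris_help_admin']
```

With three operating systems, two output types, and two audiences, writers would need 3 × 2 × 2 = 12 combined values, and each new product, version, or output type multiplies the list again. Keeping each category in its own attribute would need only 3 + 2 + 2 = 7 values.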
There are other attributes in DocBook besides condition that can be used for profiling. At Symantec, we refer to the use of multiple attributes for profiling as enhanced profiling to differentiate it from our original profiling strategy. Profiling is enhanced because
- a Boolean ‘AND’ is used among attributes in DocBook, which solves the Boolean ‘OR’ issue that was illustrated previously, and
- rather than storing all values in the non-intuitively named condition attribute, more intuitive attributes are used to hold values.
Using multiple attributes also helps in implementing and managing constrained lists, because values are divided among several intuitive categories (attributes) rather than kept in a single, lengthy list of condition attribute values. This strategy also makes metadata automation possible.
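A constrained list per attribute also makes validation straightforward. The following Python sketch assumes hypothetical controlled vocabularies for the os and userlevel attributes and reports any value that falls outside its list; the vocabulary contents and the invalid_values name are illustrative only.

```python
# Hypothetical controlled vocabularies: one constrained list per profiling attribute.
VOCABULARIES = {
    "os": {"solaris", "linux", "windows", "mac"},
    "userlevel": {"admin", "user"},
}


def invalid_values(attributes):
    """Return (attribute, value) pairs that are not in that attribute's list.

    Values are split on semicolons, the multi-value separator used in the
    condition example. Attributes with no vocabulary defined are skipped.
    """
    problems = []
    for name, raw in attributes.items():
        allowed = VOCABULARIES.get(name)
        if allowed is None:
            continue
        for value in raw.split(";"):
            if value not in allowed:
                problems.append((name, value))
    return problems


print(invalid_values({"os": "solaris;linux", "userlevel": "user"}))  # []
print(invalid_values({"os": "Solaris", "userlevel": "power_user"}))
# [('os', 'Solaris'), ('userlevel', 'power_user')]
```

Because each attribute has its own short list, a misspelled or unapproved value (such as Solaris with a capital S) is checked against the right category instead of against one long list of condition values.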
One important issue to keep in mind is that attributes are available at different element levels. For example, some attributes can be applied only at the section, chapter, or book level. To be used for profiling, an attribute must be available on every element (as condition is), and it must be predefined as a profiling attribute. The current DocBook attributes that meet these needs are
- Arch: The computer or chip architecture to which the content applies
- Condition: A value (not covered by other attributes listed here) that is used to include or exclude content from certain outputs
- Conformance: Standards conformance characteristics of the content
- OS: The operating system to which the content applies
- Remap: An element name or similar semantic identifier assigned to the content in a previous markup scheme
- Security: The security level associated with the content
- UserLevel: The level of user experience to which the content applies
- Vendor: The computer vendor to which the content applies
The current DITA attributes that meet these needs are
- Audience: The intended audience for the content
- Platform: The platform to which the content applies
- Product: The product to which the content applies
- Rev: The revision or draft to which the content applies
- Otherprops: A value (not covered by other attributes listed here) that is used to include or exclude content from certain outputs
Based on our conditional text normalization, we needed some attributes that are not available in either DocBook or DITA; Output (print and Help, for example) and Feature (for Symantec product features) were two of them. We also encountered other limitations: some of the attributes that we needed were already in use for other purposes (localization and Help requirements). Customization is an option, but adding new attributes or renaming existing ones means that you are no longer DocBook or DITA compliant, so you will also have to customize the other tools that you use (XSLTs, for example).
Following is the earlier example, marked up using our newly implemented enhanced profiling strategy:
<para os="solaris" userlevel="admin">Text for Solaris administrator output.</para>
<para os="solaris" userlevel="user">Text for Solaris user output.</para>
<para os="linux" userlevel="user">Text for Linux user output.</para>
In the Edition element, the attribute values os="solaris" and userlevel="user" are specified. To be included, the text must match every listed attribute (Boolean ‘AND’): os must equal solaris AND userlevel must equal user. Only the second paragraph is included in the output.
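The following Python sketch models the combined rule, assuming DocBook-style semantics: every profiling attribute that is set must match (Boolean AND across attributes), a semicolon-separated attribute matches if any of its values matches (Boolean OR within an attribute), and an element that lacks a given attribute is not excluded by it. The function name included is hypothetical; in practice, the DocBook XSL profiling stylesheets or the CMS perform this filtering.

```python
import xml.etree.ElementTree as ET

SOURCE = """<chapter>
<para os="solaris" userlevel="admin">Text for Solaris administrator output.</para>
<para os="solaris" userlevel="user">Text for Solaris user output.</para>
<para os="linux" userlevel="user">Text for Linux user output.</para>
</chapter>"""


def included(element, profile):
    """AND across attributes; OR within one attribute's semicolon-separated values."""
    for attribute, wanted in profile.items():
        values = element.get(attribute)
        if values is None:
            continue  # an attribute the element doesn't carry cannot exclude it
        if not set(values.split(";")) & wanted:
            return False
    return True


root = ET.fromstring(SOURCE)
profile = {"os": {"solaris"}, "userlevel": {"user"}}
kept = [p.text for p in root.iter("para") if included(p, profile)]
print(kept)  # ['Text for Solaris user output.']
```

The first paragraph fails on userlevel and the third fails on os, so only the Solaris user paragraph survives, without any combined values.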
Now that we’ve established attributes (OS and UserLevel, for example) and controlled vocabularies for those attributes, their values can propagate up to the topic level and be stored as metadata. In DocBook, for example, metadata values are stored at the section level as keywords within the sectioninfo element. If a paragraph is tagged with os="solaris" and userlevel="user", the values “solaris” and “user” could become keywords for the section. For more refined results, you could also assign a role attribute to each keyword and populate it automatically with the name of the attribute that supplied the value; for example, the keyword “solaris” would receive a role of “os.” You could then reuse the controlled vocabularies that you implemented for each attribute as the vocabularies for your metadata keywords.
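The propagation step might look like the following Python sketch. It assumes DocBook 4.x markup, where sectioninfo is the first child of section and can hold a keywordset of keyword elements; the add_keywords name and the choice of os and userlevel as the attributes to harvest are hypothetical. Each keyword's role records the attribute that supplied the value.

```python
import xml.etree.ElementTree as ET

PROFILING_ATTRIBUTES = ("os", "userlevel")  # assumed subset for this sketch

SOURCE = """<section>
<title>Installing the agent</title>
<para os="solaris" userlevel="user">Text for Solaris user output.</para>
<para os="linux" userlevel="user">Text for Linux user output.</para>
</section>"""


def add_keywords(section):
    """Collect profiling values from the section and write them as role-tagged keywords."""
    found = set()
    for element in section.iter():
        for attribute in PROFILING_ATTRIBUTES:
            for value in element.get(attribute, "").split(";"):
                if value:  # skip the empty string produced by a missing attribute
                    found.add((attribute, value))
    info = section.find("sectioninfo")
    if info is None:
        info = ET.Element("sectioninfo")
        section.insert(0, info)  # sectioninfo precedes title in a DocBook 4.x section
    keywordset = ET.SubElement(info, "keywordset")
    for attribute, value in sorted(found):
        keyword = ET.SubElement(keywordset, "keyword", role=attribute)
        keyword.text = value
    return keywordset


section = ET.fromstring(SOURCE)
keywords = add_keywords(section)
print([(k.get("role"), k.text) for k in keywords])
# [('os', 'linux'), ('os', 'solaris'), ('userlevel', 'user')]
```

Sorting the collected pairs keeps the generated keywordset stable from one run to the next, which makes changes easier to review in the CMS.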
Even content that isn’t profiled can benefit from metadata automation. Because many attribute values are true of an entire body of content (the required operating system, for example), you could apply them at a higher level (such as the book or chapter level) and use a script to push them down to each child topic.
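A push-down script could be as simple as the following Python sketch. It assumes, hypothetically, that the shared value is recorded in the book element's os attribute, and it copies that value to every descendant section that does not already carry its own, so that the value is available at the topic level for keyword generation. The push_down name is illustrative.

```python
import xml.etree.ElementTree as ET

SOURCE = """<book os="solaris">
<chapter>
<section><title>Overview</title></section>
<section><title>Installing</title></section>
</chapter>
</book>"""


def push_down(root, attribute):
    """Copy an attribute from the root to each descendant section that lacks it.

    Returns the number of sections that were updated.
    """
    value = root.get(attribute)
    if value is None:
        return 0
    count = 0
    for section in root.iter("section"):
        if section.get(attribute) is None:
            section.set(attribute, value)
            count += 1
    return count


book = ET.fromstring(SOURCE)
print(push_down(book, "os"))  # 2
print([s.get("os") for s in book.iter("section")])  # ['solaris', 'solaris']
```

Sections that already specify their own value are left untouched, so a writer can still override the book-level value on an individual topic.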
Beginning a metadata strategy with conditional values is beneficial in several ways. First, because the process is automated, writers aren’t given another writing assignment; they already capture condition values as they develop content. Second, writers experience the value of metadata and become more open to applying additional metadata in the future.
About the Author
Monti Lawrence is an Information Architect at Symantec. Since joining Symantec in 2000, Monti has contributed to the selection and implementation of an XML-based content management system and has developed and maintained content models. She is developing a metadata strategy and best practices in the areas of information design and reuse. Monti has worked in editorial and writing positions with McGraw-Hill and several small-business websites. She is currently pursuing a master’s degree in Library and Information Science at the University of California, Los Angeles.