Information Standards: A survival guide for considering, adopting, and living with them


CIDM

April 2009


Barry Schaeffer, Content Life-cycle Consulting, Inc.

It’s difficult to have a conversation about information management these days without finding yourself in a discussion of where you are on the road to information standards adoption and which standard you are going to adopt. Often, these discussions give the impression that adopting an information standard is roughly equivalent to buying a new server or upgrading the speed of your workstations. While some standards may be this straightforward, most are not, and wrong impressions, if acted upon, can do real damage to an organization’s ability to meet its information management and delivery objectives.

In this article, we will examine information standards in the hope of equipping readers with the perspective they need to evaluate, enter, and survive an increasingly standards-focused world. The intent is not to discourage the adoption of information standards but instead to help potential adopters live through the process.

What is an information standard (and what is it not)?

Despite what standards purists may believe and claim, an information standard is a set of rules and specifications, often designed by committee and aimed at unification of the interaction among participants in a particular industry or government segment, application environment, or interest community… nothing more. In fact, an information standard has only three legitimate goals (the fourth described below is noteworthy but unfortunate):

It is an attempt to define a coherent interchange of data among many disparate but logically related users, thereby enhancing the entire community’s ability to communicate effectively. Indeed, at its core, an information standard is all about interchange. It cannot—and should not—seek to prescribe how adopters conduct their internal affairs, and it does not itself provide solutions to the challenges of information creation or management, pre- or post-adoption.

It can be an attempt to help encourage best practices by co-opting already successful conventions and making them into standards. Standards of this type normally begin as efforts by one or more organizations to develop, test, and profit from better ways of doing things. PostScript, for example, was designed by Adobe and published openly for use and comment, deliberately avoiding the standards process. Only after several years of growing acceptance did PostScript enter the standards world. Likewise, DITA, now a standard, was designed by IBM for its own library of documents. DITA entered internal use in 2000, was donated to OASIS in 2004, and finally became an OASIS standard in 2005. In both of these cases, the primary motivation of the original developers was better functionality and use, not standardization.

It is nearly always an attempt to create a software market of sufficient size and uniformity to attract the attention of the software industry, in the hope that software firms will develop and market off-the-shelf tools for the standard’s users. The software industry typically ignores standards that fail to reach this critical mass, forcing adopters to develop their own supporting software tools or to attempt compliance without tools, neither a particularly attractive prospect. Indeed, the information graveyard is littered with “standards” that failed to develop a sufficient following to attract industry participation. Even government-mandated standards (the DOD’s long-suffering CALS standards of the mid-1980s come to mind) will languish unless they can create sufficient magnetism to attract a critical mass of adherents.

It can be a cynical attempt to gain competitive advantage by perverting the standards process. Sadly, software giants have at times attempted to gain market share and freeze out competition by influencing, bullying, or cajoling their communities (read “markets”) to standardize on their strong suit. While largely unsuccessful, these efforts often leave a path of confusion and delay in their wake.

Are there different types of information standards … and should I care?

There are indeed several types of information standards, each aimed at different levels of the information life cycle and each bringing with it a unique set of characteristics and impacts. You should care because each type will have different but real impacts on your organization and must be approached in different ways. They may be grouped as follows:

Recording standards
(How should we record the data we seek to process and exchange?) SGML, XML, CGM, SVG, ASCII, EBCDIC, and so on. While normally a given in any specific instance, those who think this level of standardization is non-controversial need only read the history of the bitter battle between IBM’s EBCDIC (Extended Binary Coded Decimal Interchange Code) and ASCII (American Standard Code for Information Interchange).

Communication packaging standards
(How can an entire sector exchange data at levels supported by operating systems and communication infrastructure?) These standards, such as the TCP/IP stack, Ethernet, SMTP, SOAP, and some uses of XML, generally fall between the lower-level recording standards and the higher-level intellectual content standards described below. They provide common conduits through which that intellectual content can be described, queried, and exchanged. In essence, they become the package labels used to route data packages to their intended destinations.
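To make the “package label” idea concrete, the sketch below shows a bare SOAP envelope: the envelope and header carry routing information while leaving the payload untouched. The payload element name is invented for this illustration and comes from no particular standard.

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Header>
        <!-- addressing and routing information: the package label -->
      </soap:Header>
      <soap:Body>
        <!-- the content being exchanged; element name invented for illustration -->
        <orderStatus ref="example-only"/>
      </soap:Body>
    </soap:Envelope>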

Notational standards
The best known of these is XML (and SGML before it). These two standards are uniquely capable of describing any desired organization of data and intellectual property in a highly independent and portable way. A notational standard provides the syntax within which more complex intellectual content forms may be developed but avoids direct influence on the actual content for which it is used. For example, the SGML standard’s preamble describes SGML and its DTD syntax as addressing the “structure of content” but makes clear that the standard does not seek to influence the content itself. The XML family, in response to requests for an increased facility to “type” the content of XML elements, added XML Schema (a separate W3C recommendation) and with it the ability to define some of the characteristics of content held inside XML elements.
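As a minimal sketch of the distinction, consider the fragments below. The element names are invented for this illustration; the point is that the XML notation itself defines only structure, while an XML Schema declaration can additionally type an element’s content (here, requiring a date).

    <!-- An XML instance: the notation defines structure, not meaning -->
    <maintenanceStep id="step-042">
      <title>Check hydraulic pressure</title>
      <dueDate>2009-04-15</dueDate>
    </maintenanceStep>

    <!-- An XML Schema fragment typing the content of one element -->
    <xs:element name="dueDate" type="xs:date"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"/>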

It is worth mentioning that although relational databases are also an important part of the content world, they do not provide portable content notations capable of use outside the database software. SQL, normally used to access data from databases, is a powerful query language but does not itself provide a portable data notation.

Intellectual content standards
(How should information creators generate a usable flow of intellectual property among themselves, their consumers, and others?) Here again, SGML and XML serve as the foundation for the design, definition, and control of intellectual property through its life cycle. This level of information standards is often at once the most powerful and the most intrusive, carrying the greatest direct impacts on adopters’ operations. Intellectual content standards contain three broad facilities for definition and control:

  • Content structures. Generally built using a general syntax such as XML but focused on a particular type of content through the definition of specific tag sets captured in DTDs or schemas: S1000D Data Modules, DITA Topics, DocBook PARA0s, ATA 2200 aircraft manual structures, news articles, law enforcement data in GJXDM (Global Justice XML Data Model), intelligence assessments in ICML (Intelligence Community Metadata Language), and so on. (A minimal sketch follows this list.)
  • Content delivery control structures. Including S1000D Data Module Requirements List (DMRL), Data Dispatch Notes (DDN), DITA Maps, and so on, these content standards, of a higher order than “delivery formats” such as PDF, act as “build lists” for inclusion of desired content components to be processed by output rendering software (DMRLs) or forwarded (DDNs) as-is to external users.
  • Procedural control structures. Designed to record and exchange rules for how content should be evaluated and processed. If the DTD or schema acts as a map for content structures, the rules exchange might be seen as the equivalent map for processes.
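As a minimal sketch of the first two facilities, the fragments below show a DITA topic (a content structure) and a DITA map (a delivery control structure) that pulls it into a deliverable. The ids, titles, and file name are invented for this illustration, and DOCTYPE declarations are omitted.

    <!-- A content structure: a minimal DITA topic -->
    <topic id="hyd-check">
      <title>Checking hydraulic pressure</title>
      <body>
        <p>Verify that the pressure is within the placarded range.</p>
      </body>
    </topic>

    <!-- A delivery control structure: a DITA map acting as a build list -->
    <map>
      <title>Line maintenance manual</title>
      <topicref href="hyd-check.dita"/>
    </map>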

Procedural standards
(How can the internal processes of information life cycle participants be rendered consistent and predictable?)

  • Pre-assigned function/task/chapter IDs in technical standards such as ATA and S1000D. These pre-assignments, normally designed by industry committees, determine in advance the order and nature of certain content created under the standard.
  • Predefined componentization and reuse procedures. For example, the ATA 2200 maintenance DTD, although possessing a rich hierarchy of individually accessible components, from “document” to “sub-para-5” and below, restricts componentization and access to designated “anchor elements” no more than a few levels from the top. This characteristic is often used to define and enforce “minimum revisable units,” the smallest portions of content to which individual access and revision will be allowed. (A sketch of such a hierarchy follows this list.)
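As a sketch of how such a hierarchy might look, the hypothetical DTD fragment below (element names invented, not drawn from ATA 2200) designates one level as the anchor element: a content management system would then permit check-out and revision at the task level only, making task the minimum revisable unit.

    <!-- Hypothetical DTD fragment; element names are invented -->
    <!ELEMENT chapter (title, task+)>
    <!ELEMENT task    (title, para+)> <!-- anchor element: the minimum revisable unit -->
    <!ELEMENT para    (#PCDATA)>
    <!ELEMENT title   (#PCDATA)>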

Managing the Impact of Information Standards

While a standards committee often has broad strategic visions for how its standard will affect its target communities, adopters and managers who must decide whether and how to adopt the standard should take a somewhat more restricted view. In a word, that view can be summed up as “impact.” The following sections describe the major impacts of adopting an information standard and offer some thoughts on how to deal with them.

Am I under a formal mandate to comply with a particular information standard?

While information standards are often presented as official mandates, with failure to comply punishable by some form of sanction or penalty, they usually fall far short of this. Indeed, information standards mandates have proven to be complex, sometimes more “squishy” than they are presented to be. Among the questions to ask are:

  • If I must comply, when and to whom must I demonstrate that compliance and what, if any, will the penalties for non-compliance be?
  • What demonstrated level must my compliance reach in order to be judged acceptable? For example, in a complex technical data standard, compliance may be as simple as proving the ability to export data in compliant formats or as expansive as a requirement that all structures included in the standard be supported by the user’s internal processing. The more extensive the compliance requirement, the more complex and costly the effort to adopt and use it.
  • If I cannot meet the level and deadline for compliance, under what circumstances and for how long may I request and be granted a waiver or extension?
  • If I comply with a particular version of a standard, must my capability evolve to fully support subsequent versions and revisions? If there is one inescapable characteristic of information standards, it is that they will evolve, sometimes undergoing massive change. The adopter must know whether compliance means “at the time of adoption” or requires evolution as new releases of the standard are issued.

What standards characteristics should I consider most carefully?

  • Their primary motivation is broad adoption. With exceptions like PostScript and DITA, most standards you will confront are not aimed at making your life easier but instead seek to make their entire community work better. While the two goals may be broadly compatible, the difference between them can be significant to an organization trying to understand what adoption will mean for its future and its ability to function properly.
  • They aim at community-wide consensus. In every community, whether industry, educational, or government, there are significant differences even among apparently similar organizations. Standards intended for use by an entire community necessarily contain many compromises designed to make them at least palatable to a wide range of individual players. In such a situation, no one gets everything they want, and virtually everyone must accept things they don’t need or want. The maxim about the devil being in the details applies nowhere more than in the world of information standards and their management. This matters most when the standard affects an organization with individual needs that the standard may not fully meet, and whose budget may not provide for dealing with the accompanying baggage.
  • They are verbose and complex to support wide use. Perhaps the most visible characteristic of community-wide standards is their tendency toward verbosity and, with it, increased complexity in the authoring, management, and delivery processes. The S1000D technical data standard, for example, beyond its sheer 2,500-page heft, uses a complex multi-component ID scheme that will likely be overkill for organizations outside the aircraft and aerospace industries. Because such complexity has a direct effect on the cost and productivity of the entire flow, dealing with it should be an important aspect of every organization’s planning for standards adoption and support.
  • Hidden assumptions. Information standards are usually based on a series of implicit assumptions about how their adopters will operate. For example, a number of the most popular current standards (DITA and S1000D come to mind) envision output products created from collections of individually authored and managed data fragments (“Topics” in DITA, “Data Modules” in S1000D). Implicit in this vision is the assumption that adopters’ content lends itself to easy modularization and that their staff and management are prepared to conceive, author, acquire, and maintain their content in these modular forms. Because few organizations already operate this way, the resulting changes can have significant impacts on every step of the content life cycle and on everyone involved in making it flow, especially authors. Moreover, those responsible for maintaining the organization’s existing content must figure out how and when to convert that “legacy” content to the new component-based formats. Finally, with the generation of often thousands of individual yet interdependent components, managers are often faced with the need for a radically higher level of automated support lest the entire library spin out of control. These impacts often require significant training of editorial staff; affect external content providers; require major software acquisitions and upgrades; and impose a draconian conversion schedule before the organization can even regain its pre-adoption productivity levels. While all of this is manageable if carefully planned and conducted, the organization that blunders ahead without knowing what it is taking on will likely find itself in serious difficulty.
  • Arbitrary change process based on political consensus. As with any set of rules and procedures, most organizations will find that their unique needs dictate differences in their use of an information standard, creating the need for changes to some portion of the standard itself. Unfortunately, while most standards undergo change over time, the process is usually ponderous, time-consuming, and difficult. Organizations needing a change often face a choice between waiting months or years for the standard to change and departing from the standard to meet their current challenges. This choice can be excruciating, especially for an organization that has developed or purchased application software to aid in its compliance with the standard’s mandates and assumptions.
  • Software support. Most information standards are developed with the implicit assumption that their adopters will develop or purchase the tools necessary to support productivity and accuracy post-adoption. In a few cases, DITA for instance, the developers actually make a portion of the necessary tools available at no cost, hence the DITA Open Toolkit. Most adopters, of DITA and other information standards alike, nevertheless find themselves needing additional support. For popular standards there is usually an array of available tools and systems, some offered by general software firms and some by targeted application providers. While there is not room here to fully describe the variables to consider in evaluating and purchasing such tools, two core concepts are worth mentioning. First, any system, especially one that claims to support the standard’s entire processing requirements, should be carefully evaluated to make sure that it rests on solid software foundations, is designed to easily accept extensions that may be required to customize its behavior for a particular user’s unique needs, and is backed by a full range of training, documentation, and vendor support. Anything less, regardless of how elegant the system appears in the sales demo, opens the user to trouble and expense down the line. Second, because organizations often face a hybrid environment, with some of their operation adhering to the standard and some still processed in a legacy format, any system purchased to support the standard should be capable either of communicating with the user’s legacy systems as they are moved toward compliance or of handling the legacy data if systems are not already in place.

What specific impacts should I anticipate and plan for?

  • XML Requirements. You aren’t likely to find an information standard these days that does not assume your ability to create and manage XML as your content notation. If you already use XML, you are ahead of the game, but even then you may find yourself in need of an upgrade to your XML support.

    Editor software. All of the current XML-based standards assume a high level of structural control in the authoring process, some of it provided by the XML editing software itself using the standard’s DTD or schema. But standard DTDs and schemas, representing broad compromise, do not always impose sufficient structural control to fully support the needed authoring processes. In such cases, you may need enhanced training and external authoring aids, or investment in extensions to the editor itself, to meet your needs.

    Transform capability. As stated above, if you are already working with XML, you are ahead of the game when it comes to supporting the needed transforms to and from the XML mandated by the standard. If you are new to XML, however, you should be prepared to develop an enhanced understanding of protocols and software tools like XSLT, XQuery, XPath, AJAX, and so on. While you may not need or use them all, you must be in a position to make informed decisions about which ones, and at what level, to make part of your environment. If you are considering DITA, IBM’s development and contribution of the DITA Open Toolkit will make your job somewhat easier. (A minimal transform sketch appears after this list.)

    Publication (rendering) engine. Whatever processes and software you have been using to create formal output, printable pages, or web pages, your adoption of an information standard is likely to force changes. You may find that you need a new rendering program, one capable of handling the XML mandated under your new standard. Should that be the case, you may also face redevelopment of the style sheets or other control mechanisms that format your output. Fortunately, in an XML environment there is a range of rendering packages, from free tools like FOP to high-end composition systems like XyEnterprise’s XPP. The important point here is that your ability to make the standard work for you requires that you neither ignore this final step toward usable output nor leave it until last.
  • Management requirements. Perhaps the most notable impact on management is the presence of multiple challenges, all of which must be considered as plans are developed and played out. While you may end up completing the steps toward full support over time, you should (perhaps must) consider all of those steps in your initial planning, especially given the potential staffing and cost impacts that may come with each. The following items describe the most important.

    Migration strategy. Unless you are starting from scratch, getting the content you already have into the new standardized forms is a major consideration. If your authoring and provider communities resist or reject the call to use the new forms, it can turn out to be the most serious challenge you face. Your options, in general, are:

    All at once. Most challenging, but likely required if content is to support web requests. Because conversion to more complex forms always takes time, this approach will extend the time it takes to get your new environment up and running.

    Parallel tracks. Begin creating new data in the standard XML form, converting existing content as it requires republication or revision. Most appropriate and workable in page-based environments.

    Author in legacy format and transform on the fly. If you can’t create or acquire new content in the standard XML forms, you must develop a transformation plan that generates it from what you can get. No matter how you approach this, it involves convincing your authors and providers to accept and adhere to a higher level of discipline in what they do. This will be a negotiation and can become a battle if you don’t approach it carefully.

    Target environment. What will your new environment look like under the standard? While the standard may assume you are ready to convert your processes immediately, you will likely need to plan for significant change across your organization, including sufficient time to agree, design, train, try, monitor, and finalize the new processes before you are fully operable. Each of the following broad situations brings its own challenges:

    New data structures, same procedures/products

    Upgraded procedures/products

    Hybrid data/procedures/products

  • Development impacts. If you use automated processes in support of your content and delivery environment, you will face impacts arising from the standard, its required forms, and the software required to support them:

    Automation choices limited to vendors who support the standard. While a large segment of the software industry supports the use of XML at various levels, only a subset of that industry makes the commitment to supporting the details of any particular standard. This limitation can present some serious challenges, especially if the vendors you are comfortable with and committed to aren’t among that subset.

    Localization for site-specific requirements is more complex and expensive while being less flexible. Once you have decided on the software environment you will use to support the standard, you must confront the need for enhancements to meet your unique demands. If you have selected a vendor’s packaged application for the standard, you may find it more expensive and difficult to add to or change its behavior to meet your needs.

    “Defensive” automation costs (costs incurred to minimize negative productivity impacts) are increased, reducing positive ROI. You may find yourself maintaining multiple software environments, increased database administration, parallel process flows for standard versus legacy content, and staff and user-interface upgrades to maintain productivity.

    Staffing impacts. You may find yourself faced with the need to upgrade your staff’s capability through training and software tuning. This necessity is most likely, and most potentially damaging, among your authoring staff. If you are typical, you will probably also face some staff attrition as older workers elect to leave or retire rather than face the changes. This attrition is often rooted in a panic reaction among staff members and can be reduced by careful planning before confronting them with the coming changes.
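To make the transform discussion above concrete, here is a minimal XSLT sketch that maps a hypothetical legacy structure onto a DITA-style topic. The legacy element names (section, heading, para) are invented for this illustration and belong to no standard mentioned here.

    <?xml version="1.0"?>
    <!-- Minimal sketch: transform a hypothetical legacy <section> into a
         DITA-style topic. Legacy element names are invented for illustration. -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/section">
        <topic id="{@id}">
          <title><xsl:value-of select="heading"/></title>
          <body>
            <xsl:for-each select="para">
              <p><xsl:value-of select="."/></p>
            </xsl:for-each>
          </body>
        </topic>
      </xsl:template>
    </xsl:stylesheet>

Run with any XSLT 1.0 processor (xsltproc or Saxon, for example), this produces one topic per legacy section file; a real migration plan would extend the stylesheet element by element.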

Conclusion

Adopting an information standard, if handled correctly and carefully, can be a positive move toward enhanced productivity and integration into the overall community. Handled incorrectly, however, it can lead to serious problems and unexpected costs. Any organization facing the prospect of adoption should start early, give itself time to get its arms fully around the coming challenges, get help if needed, and plan carefully to meet them.

About the Author

Barry Schaeffer
Content Life-cycle Consulting, Inc.
bschaeff@xsystems.com

Barry is Principal Consultant with Content Life-cycle Consulting, Inc. Prior to its acquisition by XyEnterprise in 2008, he was founder and president of X.Systems, Inc., a system development and consulting firm specializing in the conception and design of text-based information systems, with industrial, legal/judicial, and publishing clients among the Fortune 500, non-profit organizations, and government agencies. During his more than forty-year career, Mr. Schaeffer has held management and technical positions with The Bell System, Xerox, Planning Research Corporation, U.S. News and World Report, Grumman Data Systems, and XyEnterprise. As a consultant and systems architect, he has supported more than 50 clients, including major industrial organizations, Federal civilian and defense agencies, and state governments. Mr. Schaeffer is a frequent speaker and contributor on subjects related to information and content management. His work with structured information standards began in 1979 with SGML and continued with XML from its first draft publication in 1996.