Successful Taxonomy Development: Three Important Questions


October 2006

Seth Earley, Earley & Associates

Systems for managing unstructured information are proliferating throughout organizations, not only in variety (document management systems, content management systems, portals, search engines) but also in volume, as each department or division deploys multiple instances of these systems. All of these systems aim to improve ways of connecting people with the information on which they depend to do their jobs. However, the nature of this evolution of business and technology has led to the point where the solution becomes the problem: new systems designed to make information more accessible become yet more systems to search for information.

It’s Still English, But…

Key elements in the success or failure of these systems are issues of language and meaning. Language is used to create, use, search, navigate, and retrieve information. However, words can be ambiguous and have different meanings for users depending on their context and experience. Consider what happens if each constituency does its job, but accounting people speak British English, IT speaks a Cajun dialect, legal speaks in inner-city slang, and researchers speak in scientific terminology. How can they share information? For all practical purposes, the languages they use in communicating with their professional peers are as different as these corners of the English language. It’s still English, but using these various dialects impedes effective communication.

I know I am preaching to the choir with the readers of this publication, but for content to be understood, reused, and exchanged across these different contexts, by different users, and by different systems, there must be a foundation of common terminology and understanding.

In this article, I talk about three important issues to consider when creating the underlying organizing principle of information systems: the taxonomy, a common language for sharing concepts and allowing efficient organization of documents and content across systems and processes.

Even though developing a taxonomy is an essential step in any information management initiative, whether implementing a new solution or optimizing an existing one, many organizations make some fundamental mistakes. These projects suffer from a poor understanding of the bigger picture. Taxonomy development is not just about terms, definitions, and hierarchy; there are larger issues to consider to ensure that your taxonomy can be successfully leveraged.

Here are three important questions to ask before starting any taxonomy project.

  • How does the taxonomy fit into the larger content strategy?
  • How will the taxonomy be applied?
  • Do we understand the context?

How Does the Taxonomy Fit into the Larger Content Strategy?

Content strategy can be defined in multiple contexts:

  • Business context: How does the content meet business needs? What are the business objectives of users and what capabilities are we providing?
  • Technical context: How will capabilities be supported with new and existing technology?
  • Deployment context: How will the new system be populated with meaningful, high-value material, and how will users be trained or enabled?
  • Management and governance context: How will the system be maintained and how will change requests be collected, vetted, and approved?

Taxonomy fits in each of these contexts. Just as one could not create a content management system without a clear business objective, a defined audience, and an understanding of technical capabilities, a taxonomy project requires the same consideration of these issues. Many people think that an enterprise taxonomy, by its very nature, is context- and application-independent. In other words, the purpose is to define a structure and organizing principle to provide consistency to all applications and processes. This is true; however, if we start out trying to be all things to all people, we end up not meeting any specific needs. The correct approach is to begin with a single perspective and expand. A good place to start is with someone who either sells to the customer or serves the customer. If your taxonomy project is too far removed from a specific business problem or too many levels away from the customer, it will be perceived as academic and irrelevant.

How Will the Taxonomy Be Applied?

Technology-driven approaches to information and knowledge management often get a lot of flak. We’ve all heard of knowledge-management projects failing because too much focus was placed on the IT and not enough on people, processes, and other enablers. While this is a valid criticism, developing a technophobia in taxonomy development can cause a project to become disastrously disconnected from the reality of the application environment of the final product.

In its purest form, the taxonomy is a list of terms, arranged in a hierarchy, describing relationships and order. However, in order for these terms to be of value, they have to be applied. Taxonomies are part of an information-management context and therefore live in and are applied by technological applications and systems. Disregarding this fact affects how much value you will ultimately gain from developing a taxonomy.
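
To make the idea of "a list of terms, arranged in a hierarchy" concrete, here is a minimal Python sketch. The `Term` class, its methods, and the sample terms are assumptions invented for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Term:
    """One taxonomy term plus its narrower (child) terms. Hypothetical structure."""
    label: str
    children: List["Term"] = field(default_factory=list)

    def add(self, label: str) -> "Term":
        """Create a narrower term under this one and return it."""
        child = Term(label)
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (depth, label) pairs in hierarchy order, parents first."""
        yield depth, self.label
        for child in self.children:
            yield from child.walk(depth + 1)

    def path_to(self, label: str) -> Optional[List[str]]:
        """Return the labels from this term down to `label`, or None if absent."""
        if self.label == label:
            return [self.label]
        for child in self.children:
            sub_path = child.path_to(label)
            if sub_path is not None:
                return [self.label] + sub_path
        return None


# Sample terms are invented for illustration.
root = Term("Products")
software = root.add("Software")
software.add("Installation Guides")
software.add("Technical Bulletins")
root.add("Hardware")

# Print the hierarchy as an indented outline.
for depth, label in root.walk():
    print("  " * depth + label)
```

A structure like this captures only order and parent-child relationships. How the terms are applied to content depends on the systems that consume them.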

There are many questions arising from the technical application of taxonomies that affect how they are built. For example, think about the various systems in your organization. How does each one use the terms in the taxonomy? Does your content-management system apply terms differently than your portal? How will this affect your development process?

Even more fundamental, does your system even have the capability to apply a taxonomy? You may come up with a great set of terms, but if your technology has no capability to support use of the taxonomy, then little value will be gained.

The first step is to identify all of the applications that will leverage the taxonomy, whether through search, navigation, content management, etc. You then need to understand in what capacity these tools use and apply the taxonomic terms. Are terms applied as metadata to document containers? As facets in a search interface which map to specific locations of information? How will users attach terms to content?
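
As one possible illustration of terms applied as metadata and then used as search facets, consider the following Python sketch. The facet names, vocabulary values, document titles, and helper functions are all hypothetical; a real content-management system or search engine would expose its own mechanisms for this.

```python
from collections import Counter
from typing import Dict, List, Set

# Controlled vocabulary per facet; values are invented for illustration.
VOCABULARY: Dict[str, Set[str]] = {
    "content_type": {"Technical Bulletin", "Configuration Document"},
    "product": {"Product A", "Product B"},
}


def tag(document: dict, facet: str, term: str) -> None:
    """Attach a term to a document's metadata, rejecting uncontrolled terms."""
    if term not in VOCABULARY.get(facet, set()):
        raise ValueError(f"{term!r} is not a controlled term for facet {facet!r}")
    document.setdefault("metadata", {}).setdefault(facet, set()).add(term)


def filter_by(documents: List[dict], facet: str, term: str) -> List[dict]:
    """Return the documents tagged with `term` under `facet` (a facet selection)."""
    return [d for d in documents if term in d.get("metadata", {}).get(facet, set())]


def facet_counts(documents: List[dict], facet: str) -> Counter:
    """Count documents per term, as a search interface might show beside a facet."""
    counts: Counter = Counter()
    for d in documents:
        counts.update(d.get("metadata", {}).get(facet, set()))
    return counts


docs = [{"title": "Update 2.1 known issues"}, {"title": "Server setup"}]
tag(docs[0], "content_type", "Technical Bulletin")
tag(docs[0], "product", "Product A")
tag(docs[1], "content_type", "Configuration Document")
tag(docs[1], "product", "Product A")

print([d["title"] for d in filter_by(docs, "content_type", "Technical Bulletin")])
print(facet_counts(docs, "product"))
```

Rejecting terms that are not in the vocabulary is a design choice in this sketch. Some systems instead let users suggest new terms and route them through a review process.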

Integration between applications is another key technical consideration: How will the terms be passed between the various systems? How will terms be updated? Which system will be the “source of truth”? Understanding application dependencies begins to get into the realm of metadata management and data architectures and standards, but this is core to effective taxonomy deployment and integration. There will be multiple consuming systems for your terms. A system that consumes terms from another context may be a source of terms for yet another application.

An example of this is usually found in the financial systems of your organization. In most cases, it is not possible to go in and alter “reference data”—the controlled vocabulary lists that are used to organize and categorize customers, products, accounts, and so on. These values may be the “source of truth” for a customer relationship-management (CRM) system. But the CRM application may be the source of truth for distribution-channel information. Both of these systems might feed the content-management system. Understanding these dependencies and mapping out the relationships is a key part of the taxonomy-development process.
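
Mapping these dependencies can start very simply. The Python sketch below records which system owns each vocabulary and which systems consume it, then derives an order in which term updates could flow. The system names (`finance`, `crm`, `cms`) and vocabulary names are assumptions based loosely on the example above; `graphlib` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Which system is the "source of truth" for each vocabulary.
# System and vocabulary names are assumptions for illustration.
OWNER: Dict[str, str] = {
    "customer_categories": "finance",
    "distribution_channels": "crm",
}

# Which systems consume each vocabulary.
CONSUMERS: Dict[str, List[str]] = {
    "customer_categories": ["crm", "cms"],
    "distribution_channels": ["cms"],
}


def upstream_map(owner: Dict[str, str],
                 consumers: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each system to the set of systems it receives terms from."""
    upstream: Dict[str, Set[str]] = {system: set() for system in owner.values()}
    for vocabulary, systems in consumers.items():
        for system in systems:
            upstream.setdefault(system, set()).add(owner[vocabulary])
    return upstream


def update_order(owner: Dict[str, str],
                 consumers: Dict[str, List[str]]) -> List[str]:
    """Order systems so that each is updated after the systems that feed it."""
    return list(TopologicalSorter(upstream_map(owner, consumers)).static_order())


print(upstream_map(OWNER, CONSUMERS))
print(update_order(OWNER, CONSUMERS))
```

If two systems were each recorded as a source for the other, `static_order()` would raise `graphlib.CycleError`, which is a useful signal that the "source of truth" question has not really been settled.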

Do We Understand the Context?

There is a great deal of confusion with regard to the value of developing well-thought-out taxonomies in a world dominated by Google-like interfaces. In many organizations, there is a line of thought that “if we just get a really good search engine,” the problem of people not being able to locate information to do their work will go away. The answer to this hope is that, although search algorithms are constantly getting better, they still cannot infer context. They cannot tell what you want to do; they cannot tell what is important to you; they don’t know the context of your work.

Taxonomic terms derive and describe the perspective, meaning, application, and function of content within a context, whether a business context, systems context, process context, etc. The question of context is so important in taxonomy development that I’ve split it into two more specific questions: Do we understand the knowledge processes and flows in the organization? Do we understand the audiences?

Do we understand the knowledge processes?
Understanding knowledge processes really boils down to whether you understand how information is used in your organization. If you’ve decided to build a taxonomy, you’re likely trying to solve a particular problem or improve a particular process. To do so, you need to have a good understanding of these processes, how people are doing their jobs and using and exchanging information.

Most taxonomy development projects begin with content audits, which involve surveying and identifying information sources (web sites, file shares, hard drives, etc.) and describing the conditions of the content (current, out of date, to be migrated, ownership, etc.).

What is equally important, however, and often ignored, is a knowledge audit. A knowledge audit looks at problems or knowledge processes and puts information and resources in the context of these processes. For example, if I am a technical support representative trying to help a customer install a software update, I may be interested in technical bulletins, bug fixes, common installation problems, specific configuration documents, and so on. Configuration documents might also be useful in a number of other work tasks.
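
The results of a knowledge audit can be recorded as a simple mapping from work tasks to the content types used in them. The Python sketch below is one hypothetical way to do that; the second task and the function names are invented for illustration. Inverting the mapping shows which content types serve more than one task.

```python
from collections import defaultdict
from typing import Dict, List

# Work tasks and the content types used in each.
TASK_RESOURCES: Dict[str, List[str]] = {
    "Help a customer install a software update": [
        "Technical bulletins",
        "Bug fixes",
        "Common installation problems",
        "Configuration documents",
    ],
    # Hypothetical second task, added only to show reuse across tasks.
    "Plan a customer deployment": [
        "Configuration documents",
    ],
}


def tasks_by_resource(task_resources: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the audit: for each content type, list the tasks that use it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for task, resources in task_resources.items():
        for resource in resources:
            index[resource].append(task)
    return dict(index)


def shared_resources(task_resources: Dict[str, List[str]]) -> List[str]:
    """Return content types used in more than one task, sorted by name."""
    index = tasks_by_resource(task_resources)
    return sorted(resource for resource, tasks in index.items() if len(tasks) > 1)


print(shared_resources(TASK_RESOURCES))
```

Content types that appear under several tasks are candidates for terms that need to make sense to more than one audience.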

Ask users to identify the locations of documents and applications that they typically access during their specific work processes. Also ask them to submit samples of “high-value” content—the types of documents that they feel are representative of useful information for their particular job function.

Understanding knowledge processes and flows helps describe artefacts in the context in which they are used, and this context description becomes the raw material for the taxonomy. The more you know about context, the better you can develop an organization scheme that fits the information need.

Do we understand the audiences?
Understanding your audiences, what they need, and how they think is as important as understanding how they do their jobs. It goes beyond daily work tasks, getting to the very nature of people’s understanding of their problems, their line of thought, and their language.

A key step in the taxonomy development process should therefore be user interviews. Basically, you are trying to characterize audiences by deconstructing their thinking process, their patterns of information use and organization, and their terminology.

Some example questions you can ask in interviews include:

  • What are some of the information sources that you use on a day-to-day basis?
  • Who are your audiences for the content that you create?
  • Describe the first thing that you do when you come in the office. Where do you go first? What do you do?
  • Where do you put documents from email messages on your hard drive?
  • What kinds of questions do people ask you over and over?
  • Give me some examples of the specific terms you might use.

What is the significance of knowing all of this? The more we know about our users’ world, the more precise our understanding of the types of information they look for and how they go about finding it. An accurate set of taxonomic terms can then be developed that describes information in the context that is most applicable to the users and their goal or work task.

More Taxonomy Development Tips

Many of you are probably saying to yourself, “I already go through these steps in developing a content-management strategy.” The point here is to realize that the same considerations that we apply to content strategy also need to be applied to taxonomy development. Here are some other tips:

  • Start with a focused perspective.
  • Apply the taxonomy to high-value content during development through tagging and/or navigational exercises.
  • Determine how you will apply terms to production content in advance.
  • Do a “sanity check” with users—ask if the topics make sense and are clear, and if there are any large gaps.
  • Avoid politics by testing “straw man” taxonomies and tabling contentious issues.
  • Develop governance processes during the taxonomy-derivation process.
  • Get representation from owners of “downstream” systems that are affected by taxonomic terms.
  • Leverage existing term sources—don’t be afraid to reuse what is in existence.
  • Map out application dependencies.
  • Create strategies for ongoing updates of multiple systems.

Taxonomy projects can seem overwhelming at first. But by starting with a focus on a problem and perspective, the pieces begin to fall into place. Try to take a long-term view of the process and put energy into development and maintenance on an ongoing basis rather than letting the taxonomy lose relevance and effectiveness.

About the Author


Seth Earley
Founder and Senior Consultant
Earley & Associates

Seth Earley has been implementing content management and knowledge management projects for over 12 years and has been in the technology field for 20+ years. He is founder of the Boston Knowledge Management Forum (Boston KM Forum) and co-author of Practical Knowledge Management from the IBM Press. He is a former adjunct professor at Northeastern University where he taught graduate courses in Knowledge Management Infrastructure and e-Business Strategy.

Seth has developed search, content, and knowledge strategies for global organizations and has developed underlying taxonomies for a diverse roster of Fortune 1000 companies. He is a popular speaker and workshop leader at conferences throughout North America, speaking on intranet design, knowledge management, content management systems and strategy, taxonomy development, and other related topics.