Single Sourcing—Notes from the Trenches


February 2002



Two years ago, the Education Services department at J D Edwards began implementing the Enterprise Content Manager tool that was developed by J D Edwards. Currently, we have a staff of 75 people producing all the documentation and training materials for J D Edwards in a single-sourced, content-management environment. In our production repository, we have 150,000 individual content units, 780 collections (aggregates, books, and so on), and 600,000 metadata assignments. We’ve successfully delivered content from this repository for the last two years, and in that time, I’ve written thousands of lines of code and dozens of white papers, design documents, requirements documents, and project plans. In this article, I offer some lessons learned, arranged in a somewhat logical order, for anyone beginning to think about single sourcing and content management.

From our experience and research, 90 percent of all single-source projects fall into one of two categories: proofs of concept that don’t use real information and failed full-scale implementations. In practice, building a prototype without real information means that the company skipped the hard part, which is analyzing and planning the structure of its content, and implemented “faked” content instead. No kidding. We’ve had people say to us, “We didn’t actually put the content in a database, because we know it will work,” or “We put one book in our repository, and it required a lot of unexpected work, but we know the rest will be easier.” Without a well-thought-out information architecture plan, it will not get easier, and the project will fail.

The other category is failed implementations. One company I recently visited had published a glowing report of their single-sourcing project over a year and a half ago. Now they are barely getting their deliverables out the door. They are waiting for the money to convert back to a non-single-sourced SGML/XML environment.

Do such problems imply that single sourcing won’t work? Absolutely not! But you’ll need help getting your single-source solution into full production mode. In this article, I’ve detailed a few lessons we’ve learned at J D Edwards that may help you avoid some of the most common mistakes.


Reuse

The justification for single sourcing, its value proposition, always centers around reuse. It’s fascinating that once you get your single-source system up and running, it turns out to be fairly difficult to figure out how to measure reuse. Do you claim that every instance of a content unit in a collection or aggregate is reuse? What about your “working” collections, collections that are never published (and believe me, these will immediately begin evolving in your single-source environment)? Here’s the key. In single sourcing, “reuse” means reusing work. You have to find the place where you are effectively reusing the work of authoring, editing, subject-matter expert review, QA, and production. To really understand a single-source system, you must abandon the idea that you’re reusing content and focus on how you are reusing work.
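One way to make “reusing work” measurable is to count only content units that ship in more than one published collection, ignoring working collections. The sketch below is purely illustrative; the collection names, the published flag, and the threshold of “more than one published appearance” are my assumptions, not an actual J D Edwards metric.

```python
from collections import Counter

def reuse_report(collections):
    """Count how many *published* collections each content unit appears in.

    `collections` maps a collection name to (is_published, [unit_ids]).
    Working (unpublished) collections are excluded, so a unit only counts
    as reused when the work of producing it actually shipped twice.
    """
    counts = Counter()
    for name, (published, units) in collections.items():
        if not published:
            continue
        for unit in set(units):  # a unit repeated within one book is not reuse
            counts[unit] += 1
    return {unit: n for unit, n in counts.items() if n > 1}

report = reuse_report({
    "install_guide": (True,  ["intro", "db-setup", "licensing"]),
    "upgrade_guide": (True,  ["intro", "db-setup", "migration"]),
    "draft_book":    (False, ["intro", "migration"]),  # working collection
})
# 'intro' and 'db-setup' each ship in two published books
```

A metric like this also makes the “working collection” problem explicit: drafts never inflate the reuse numbers.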

Legacy Content

When someone comes to sell you a single-source system, they sell you their system as if you were starting from scratch. They have to start this way because as makers of software or as consultants they don’t have any legacy content. You do have legacy content. Almost nobody decides to implement single sourcing unless they already have a pretty big content problem. Start your project by figuring out what you’ll do with legacy content. You may decide to develop new content for a new product, as a prototype, in your new content-management system so that you are not bound by the restrictions inherent in your legacy content. Every answer to the question of what to do with legacy content is unique and difficult. We ended up inventing our own system for parsing, chunking, and importing our legacy documents. Even so, cleaning up the imported content still took months.
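As an illustration of the parse-and-chunk step, the sketch below splits a legacy document into one large chunk per top-level heading. The `== Title ==` heading syntax is a stand-in assumption; a real legacy import would parse whatever structure your source format actually has.

```python
import re

def chunk_legacy(text):
    """Split a legacy document into the largest chunks that still stand
    on their own: here, one chunk per top-level heading.

    The '== Title ==' syntax is a placeholder for your legacy format.
    """
    chunks, title, body = [], None, []
    for line in text.splitlines():
        m = re.match(r"==\s*(.+?)\s*==$", line)
        if m:
            if title is not None:
                chunks.append({"title": title, "body": "\n".join(body).strip()})
            title, body = m.group(1), []
        elif title is not None:
            body.append(line)
    if title is not None:
        chunks.append({"title": title, "body": "\n".join(body).strip()})
    return chunks

doc = "== Installing ==\nRun setup.\n== Upgrading ==\nBack up first.\n"
chunks = chunk_legacy(doc)
# two large chunks, one per heading
```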

Chunk Size

We decided to parse, chunk, and import all of our legacy documentation. If I could offer only one piece of advice to someone beginning a single-source implementation by importing all of their legacy documentation, it would be this: chunk your legacy content as large as you possibly can. Then, as you go along and discover opportunities for reuse, break out smaller chunks but only when you know reuse is possible. As you think through your chunk size, you’ll feel an overwhelming temptation to chunk your content very small because it will seem like you create more opportunities for reuse. Resist this temptation. It’s much easier to downsize your chunks than to upsize them.

Alternatively, you can avoid this problem altogether by slowly phasing in the documentation by rewriting it. Chunk size will evolve at the prototype stage and will be worked out by the time the legacy documentation is rewritten and phased in.


Workflow

The content and the repository seem the obvious places to begin planning a single-source implementation, but starting there would be a mistake. Start with your workflow, and make sure it includes each and every deliverable as an endpoint.

There are several reasons to start planning with workflow. First, an accurate workflow will immediately identify about 50 percent of the metadata you need, and it will help you identify obvious things you’ve missed. We started with the content, the repository, and the authoring tools and tried to incorporate the workflow later. We were a year and a half into the implementation before we realized that we had no effective way to eliminate unused or outdated content in our repository. We would have seen this problem immediately if we had built a workflow that embodied the content life cycle. Editing, QA, and production will be different in each single-source implementation. A good workflow lets you anticipate the differences rather than face a moment of panic the evening before a deliverable is due.
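To make the idea concrete, here is a minimal sketch of a workflow in which every deliverable is an explicit end state, so content that never reaches one can be flagged for retirement. The state names and unit IDs are invented for illustration, not taken from any real system.

```python
# Hypothetical workflow: each state lists its possible next states.
# Deliverables are explicit end states (no outgoing transitions).
WORKFLOW = {
    "draft":          ["edit"],
    "edit":           ["sme_review", "draft"],
    "sme_review":     ["qa", "edit"],
    "qa":             ["published_help", "published_pdf", "edit"],
    "published_help": [],  # deliverable endpoint
    "published_pdf":  [],  # deliverable endpoint
    "retired":        [],  # explicit exit for outdated content
}

def endpoints(workflow):
    """End states are the states with no outgoing transitions."""
    return {state for state, nxt in workflow.items() if not nxt}

def stale_content(units):
    """Flag units stuck outside an endpoint, i.e. the unused or outdated
    content we had no way to find before modeling the full life cycle."""
    ends = endpoints(WORKFLOW)
    return [u for u, state in units.items() if state not in ends]

flagged = stale_content({"u1": "published_pdf", "u2": "edit", "u3": "retired"})
# only u2 is flagged: it never reached a deliverable or retirement
```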


Metadata

Metadata is crucial to the success of a single-source implementation. Metadata does two things, and nearly every single-source implementation overlooks at least one of them.

Metadata’s first job is to allow you to find content in your repository. Single sourcing means that you de-contextualize your content to some extent. Context, in a traditional content-development and -delivery environment, allows you to find specific content units. In a single-source environment, the metadata has to do that work.

Metadata’s second job is to impose order on the presentation of your content within a deliverable. In a full-blown, single-sourced, dynamic deliverable, one of the most difficult things to figure out is the order of presentation. Imagine that you’re single sourcing a Help system and the HelpID says that five content units should be returned for a given Help call. What order should they be presented in, and how do you encode that in your repository? Suppose the same five pieces are returned for a different Help call, in a different order. It’s not practical to associate a sequence with every HelpID. What do you do? Your metadata must establish relationships between content units so that relative order can be derived on the fly.

Finally, remember that although metadata will start to make perfect sense to you after you’ve worked in a single-source environment for a while, it is mysterious and difficult to understand for most people new to it. Plan to thoroughly educate the staff who will be working with metadata.
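Deriving relative order on the fly can be done by storing pairwise “precedes” relationships once, repository-wide, and topologically sorting whatever subset a Help call returns. This is a hedged sketch using Python’s standard-library `graphlib.TopologicalSorter`; the unit names and the `PRECEDES` table are invented for illustration.

```python
from graphlib import TopologicalSorter

# "Precedes" metadata is stored once per pair of related units,
# not once per HelpID, so any subset can be ordered on the fly.
PRECEDES = {
    "overview":  {"setup", "troubleshoot"},
    "setup":     {"troubleshoot"},
    "reference": set(),  # unrelated unit, no ordering constraints
}

def order_units(units):
    """Derive a presentation order for an arbitrary subset of units
    from the repository-wide 'precedes' relationships."""
    # TopologicalSorter expects predecessor sets, so invert the edges,
    # keeping only relationships between units in this subset.
    preds = {u: set() for u in units}
    for u in units:
        for v in PRECEDES.get(u, ()):
            if v in units:
                preds[v].add(u)
    return list(TopologicalSorter(preds).static_order())

# Two different Help calls, two different subsets, one metadata table:
help_call_a = order_units({"troubleshoot", "setup", "overview"})
help_call_b = order_units({"setup", "overview"})
```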


Ownership

There doesn’t seem to be a lot of discussion about ownership in the standard literature on single sourcing. Ownership is composed of authority and accountability. Once you begin an implementation of single sourcing, the puzzle of ownership can drive you crazy. Consider this example: you create a collection composed entirely of content units owned (or authored) by other people. Now, what authority do you have over your collection? Can you advance it through workflow to production? What if the content is sent back for another edit? Does all of the authority rest with the owners of the content? In that case, what exactly can you be held accountable for? If I’m the owner of a piece of content, am I accountable for notifying each of the reusers of changes to that content? If I’m a reuser of content, am I accountable to the owner of the content for how I use it? We don’t have all the answers to these questions yet. What we do know is that many of these ownership questions can be answered with workflow, where the relationship between authority and accountability is established through workflow roles, rather than by relying on the communication skills of individual owners.
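One way workflow can encode the authority half of ownership is a simple role-to-state permission table: the collection owner gets authority over collection-level states, while unit-level states stay with the unit’s author. The roles and state names below are hypothetical, sketched only to show the shape of the idea.

```python
# Hypothetical role-to-state table: authority comes from a workflow
# role, not from owning every content unit in a collection.
ROLE_CAN_ADVANCE = {
    "author":           {"draft"},
    "editor":           {"edit"},
    "qa":               {"qa"},
    "collection_owner": {"assembled", "production"},
}

def can_advance(role, state):
    """True if this role has the authority to advance content out of
    this workflow state. Accountability for unit-level edits stays
    with the unit's author, even inside someone else's collection."""
    return state in ROLE_CAN_ADVANCE.get(role, set())

# The collection owner can push an assembled collection to production,
# but cannot advance a unit that was sent back for another edit:
owner_production = can_advance("collection_owner", "production")
owner_edit = can_advance("collection_owner", "edit")
```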


Change Notification

Not long after you begin your single-source implementation, the necessity of notifying authors and reusers about changes to content will become obvious. Reusers of content need to know when content has changed so they can determine whether the reuse is still appropriate. The difficulty is distinguishing a significant change from an insignificant one, and without workflow you can’t. Simply monitoring check-in and check-out from the repository means that everyone will be swamped with useless messages. On the other hand, if you don’t send messages about significant changes, people will stop reusing content because they won’t be able to count on it remaining appropriate. The answer, again, is workflow. When a piece of content moves from one workflow state to the next, it has undergone a significant change, and messages should be triggered.
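The trigger-on-transition idea can be sketched as follows: routine check-ins stay silent, and a workflow-state change sends one message per reuser. The class, state names, and message format are invented for illustration.

```python
class ContentUnit:
    """Notify reusers on workflow transitions only, never on check-ins."""

    def __init__(self, uid, state="draft"):
        self.uid, self.state = uid, state
        self.reusers = set()  # collections or people reusing this unit
        self.log = []         # stand-in for a real messaging system

    def check_in(self):
        pass  # routine check-ins are deliberately silent

    def advance(self, new_state):
        # A state transition is, by definition, a significant change,
        # so every reuser gets exactly one message.
        old, self.state = self.state, new_state
        for r in sorted(self.reusers):
            self.log.append(f"to {r}: {self.uid} moved {old} -> {new_state}")

unit = ContentUnit("db-setup")
unit.reusers = {"install_guide", "upgrade_guide"}
unit.check_in()         # no messages: not a significant change
unit.advance("edited")  # one message per reuser
```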

Repository Management

The necessity of managing your single-source repository is easy to overlook. In a non-single-source environment, the equivalent of repository management happens incrementally inside the documents; it’s almost completely hidden. New duties arise in a single-source environment that simply don’t arise in a traditional content creation and delivery environment. For instance, to make single sourcing work, you will have to define content types whose structure and purpose are very clear. Then, invariably, someone will find it necessary to invent a “generic” content type because they’ll discover content that doesn’t fit into any of your defined types and doesn’t warrant a new one. As soon as there’s a generic content type, it will be misused. It’s a sort of “Gresham’s Law”1 of content: bad content types tend to drive out good. So you need a repository manager (or managers) to monitor the growth of the different content types. The repository management position requires a skill set that’s difficult to find: this person must understand content creation and structured documents in addition to being somewhat skilled in database administration.
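A repository manager’s monitoring of content-type growth can start as small as this sketch: count units per content type and warn when the catch-all type grows past a share of the repository. The 10 percent threshold and the type names are assumptions chosen for illustration.

```python
from collections import Counter

def type_report(units, generic="generic", threshold=0.10):
    """Count units per content type and warn when the catch-all
    'generic' type exceeds a share of the repository, since bad
    content types tend to drive out good.

    `units` maps a unit ID to its content type.
    """
    counts = Counter(units.values())
    share = counts[generic] / len(units) if units else 0.0
    return counts, share > threshold

counts, warn = type_report({
    "u1": "procedure", "u2": "procedure", "u3": "concept",
    "u4": "generic",   "u5": "reference",
})
# 'generic' is 20% of this tiny repository, over the 10% threshold
```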

Information Design

As you roll out single sourcing, you’ll encounter many occasions when you need to communicate what you’re doing. You might need to communicate upward with management to ask for additional funding. You might need to communicate with the development organization when you discover you need some custom code. You might need to communicate with the customers of the content you produce because they want to customize it. You might need to communicate with the consulting firm you have to hire. To communicate what you’re doing, you need an information design. This is another topic that has limited coverage in the standard literature of single sourcing.

You may be reading this, thinking to yourself, “Yeah, an information design. That would be good.” Let me ask a question. If you were holding a single-source information design in your hands right now, what exactly would it look like? Would it contain graphs? How about system diagrams? Could an information design be composed of nothing but XML schemas? Would you have completely different information designs for what happens “inside” the content and for what happens “outside” the content but “inside” the repository? Should the information design include the structure of your deliverables? I don’t imagine you can answer any of those questions. Hardly anyone can.

What is the answer? An information design maps a general design strategy onto a particular design problem. XML schemas are a good place to start, but you’ll also want to produce some straightforward graphics that map metadata through all of its transformations.


Conclusion

In this article, I may have made single sourcing seem like an impossible task. It’s not, but you absolutely need someone to help you, someone who really knows what he or she is doing. The intellectual challenges of single sourcing are exhilarating. There are huge payoffs when it works. For example, not only do we single source our content development; we also single source our deliverables. We ship HTML in XML wrappers. We can now use XSLT to transform our XML wrappers into pre-compiled HTMLHelp. What used to take five days now takes fifteen minutes.

The three most important things you need to do if you’re planning to implement single sourcing are

  • Do your due diligence: make sure you have a big enough problem to justify single sourcing.
  • Get competent help.
  • Prepare for an adventure with all the standard misery and excitement.


1 Gresham’s Law: Bad coinage drives out good.