Single Sourcing across Product Versions


June 2002

Mark Baker, Director, Communications, OmniMark Technologies

Version control is one of the most important functions of a content-management system (CMS). However, there is an aspect of version control that is often neglected: product version control. One of the most common applications of single sourcing is to allow the reuse of information components across multiple versions of the same product. If a particular feature remains unchanged across three versions of the product, why not use the same component in building each version of the documentation?

The difficulty arises when a new version of the product does require a change to the information component. Now, you have a new version of the component that belongs to a new version of the product. However, this is not merely a new version of the same component. It has different subject matter (the new version of the product) and is therefore a different piece of information. On the other hand, it is not an entirely independent component. The component that describes the feature for version 2 of the product is closely related to the component that describes the feature for version 3. If you fix an error in the version 3 component, that same error may well exist, and need fixing, in the version 2 component and vice versa. The two components are neither entirely separate nor entirely the same. They are cousins.

The Component Family Tree

The family tree of cousin components can get complex. Let’s work through a scenario. Suppose you’re documenting an office phone system. Version 1 has a Transfer feature, and you create a component to describe it.

Version 2 has exactly the same Transfer feature. You use the same component, which now applies to versions 1 and 2.

Version 3 changes how the Transfer feature works. You now need one component for versions 1 and 2 and a different component for version 3. You create a version 3-specific copy of the Transfer component. It is a cousin of the component for versions 1 and 2. Any change to one may also apply to the other.

Version 4 splits the Transfer feature into separate Transfer and Conference features. The total functionality available is the same, but the packaging is different and requires a separate topic for each feature. You create new version 4-specific Transfer and Conference components. Of course, the version 4 Transfer component is a cousin of both earlier Transfer components (the one for versions 1 and 2 and the one for version 3). But the new version 4 Conference component is also a cousin of those earlier Transfer components because Transfer was used to create a conference in earlier releases.

Version 5 combines the Transfer and Conference features back into a single feature, but now it is called “Conf/Transfer.” You create a version 5 “Conf/Transfer” component to describe this feature. This component has a new name that is not shared with any components belonging to any previous version. Nonetheless, it is a cousin of all the components mentioned above because it documents common functionality.

You can see that the family tree can get quite complex, but if you want to maintain the change management advantages of your single-sourcing system, you have to be able to manage these relationships.
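One way to picture the relationships in this scenario is as a graph in which each component is a node and each cousin relationship is an edge. The sketch below is only an illustration: the component identifiers are invented, and it assumes that the cousin relationship is transitive, so that any component reachable through a chain of cousins is itself treated as a cousin.

```python
from collections import defaultdict

# Hypothetical identifiers for the components in the phone-system scenario.
# Each pair records that one component was derived from, or shares
# subject matter with, the other.
COUSIN_LINKS = [
    ("transfer@v1-2", "transfer@v3"),
    ("transfer@v3", "transfer@v4"),
    ("transfer@v3", "conference@v4"),
    ("transfer@v4", "conf-transfer@v5"),
    ("conference@v4", "conf-transfer@v5"),
]


def build_graph(links):
    """Build an undirected graph: cousin links apply in both directions."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def cousins_of(component, graph):
    """Return every component reachable from `component`, excluding itself."""
    seen = {component}
    stack = [component]
    while stack:
        current = stack.pop()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    seen.discard(component)
    return seen


graph = build_graph(COUSIN_LINKS)
# Components to review when the version 1 and 2 Transfer component is corrected.
print(sorted(cousins_of("transfer@v1-2", graph)))
```

With links like these, a fix to any one component yields a list of every other component that may need the same fix, including the version 5 component that shares no name with its ancestors.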

The difficulty with managing topics that potentially refer to more than one product version is that product versioning is part of the semantics of your information set, not part of the semantics of a typical content-management system. That is, the concept of how versions of a document relate to one another is well understood and easy to standardize, but the way in which versions of a product relate to each other can vary widely. In addition, the way that the information about a product is organized can vary from one product version to another, driven by a number of factors, including the changed structure of the product, a desire to improve the organization of the documentation, or the necessity to take a new approach to documenting a greatly expanded or more complex product. While a CMS can support document versioning well out of the box, the issues relating to product versioning can be more complex and particular and may require a custom data model to manage efficiently.

How do you handle the product versioning problem? There are several possible approaches, each with its own advantages and disadvantages.

The Snapshot Approach

The first approach is to take a snapshot of your system immediately after you build the documents for a particular product release. A snapshot is essentially a complete backup of your data. If you need to edit a previous release, you load the snapshot, edit it, create the output, and save it again.

The advantage of this approach is that it is relatively easy to implement. It does not require any changes to your system, your data model, or the way you work.

You do need to examine the capacity of your CMS to cleanly switch snapshots. Ideally, you would like to load the snapshot into a separate copy of your CMS so that your work on the current release is not interrupted during the edit to the older release. You need to check if the system supports this and if it is consistent with your software license.

The disadvantage of the snapshot approach is that your content is no longer single sourced across product versions. The snapshot versions do not inherit changes made in the main version or other snapshots. Current versions do not inherit changes to the snapshots.

The Effectivity Approach

The second approach is to use effectivity within a component to separate information specific to particular versions. In effect, the multiple logical components referring to different versions of the product are combined into one physical component.

The advantage of this approach is that it keeps things single sourced. You are maintaining a single component for all product versions, and the effectivity clearly shows what material applies only to one product version or another.

To implement this approach, you need your information components to be stored in a form that supports effectivity. This includes proprietary formats like FrameMaker and open standards like SGML and XML. If you use SGML or XML, of course, you will need to make sure that your DTD includes the necessary structures to express the effectivity you need.

Implementing this approach also requires a way of applying the effectivity rules whenever you do a build of the documentation. You need to make sure that your build process provides support for applying effectivity rules; otherwise, you will need to create additional build processing logic to apply the effectivity.
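As a rough illustration of what such build logic might look like for XML components, the following sketch filters a component for one product version. The `versions` attribute and the sample topic markup are assumptions made for this example, not part of any particular DTD or product; the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical component: the `versions` attribute is an invented effectivity
# marker listing the product versions an element applies to. Elements with
# no marker apply to every version.
SOURCE = """<topic id="transfer">
  <title>Transfer</title>
  <p>Press Transfer, then dial the extension.</p>
  <p versions="1 2">Wait for the tone before hanging up.</p>
  <p versions="3">Press Transfer again to complete the call.</p>
</topic>"""


def apply_effectivity(element, version):
    """Remove, in place, descendants whose `versions` list excludes `version`."""
    for child in list(element):  # copy the list: we remove while iterating
        allowed = child.get("versions")
        if allowed is not None and version not in allowed.split():
            element.remove(child)
        else:
            # Strip the marker from kept elements so it does not reach the output.
            child.attrib.pop("versions", None)
            apply_effectivity(child, version)
    return element


def build(source, version):
    """Parse one component and return the tree filtered for one product version."""
    return apply_effectivity(ET.fromstring(source), version)


topic_v3 = build(SOURCE, "3")
print([p.text for p in topic_v3.findall("p")])
```

The same source yields a different set of paragraphs for each version, which is the appeal of the approach while the number of versions is small.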

The problem with effectivity is that it gets increasingly complex as time goes on. The distinctions between two versions are simple and easy to handle. Effectivity is usually applied to as small a section of text as possible, to maximize reuse. But by the time you reach the fourth or fifth version, you can end up with a maze of intersecting effectivities that are difficult to manage or even to follow.

The greatest difficulty with the effectivity approach, however, is that it does not handle the branching and joining of components, such as the splitting of the Transfer and Conference features described above. (You could use effectivity to cut the conference material out of the version 4 Transfer component and then create a stand-alone Conference component, but then you would have duplicated the conference information and the Conference component would be an orphan.)

There is also the question of ownership. One writer does not necessarily own the same component across multiple product versions. If ownership is split along version lines, it will be very difficult, if not impossible, to get your access or workflow systems to support varying ownership of different effectivities of a single component. And who owns the common parts of the component?

The Metadata Approach

The third approach is to rely on component metadata. If your problem is too complex to handle with the effectivity approach, you will need to create separate components wherever the information differs between product versions. If your CMS has no explicit support for product versioning, you can add product versioning to the metadata of each component. Most CMSs will allow you to specify your own metadata fields, so this should be easy enough to do.

The advantage of this approach is that it is relatively easy to do using the features found in most CMSs.

The problem with this approach is that, without specific product version support at the CMS level, you have to rely on whatever features the system provides for working with metadata to build version-specific documentation or to identify, track, and retrieve cousin components.

For instance, you can probably add a “Product Version” metadata field to each component stored in your CMS. With some systems, you may be able to define this field to require a value from a specified list. This is helpful because it ensures that correct values are used and the metadata is consistent. Because a component can belong to more than one product version, the product version metadata field must sometimes record more than one value.

To track cousin components created by branches and joins, you will also need to add metadata fields to describe relationships such as the one between the Transfer and Conference features in our example. This might be done by having cousins simply point at each other or by assigning them to family trees. (Again, one component may belong to more than one family tree.)

Once you have this metadata, however, you need a way to use the metadata to select and assemble the appropriate components to build an information product. There are several challenges here. For instance, can your system reliably select components based on one of several values in a metadata field (to find a component that is part of more than one product version)? Can your system enforce that when a new copy of a component is created, the appropriate family-tree metadata is also created? If you don’t have robust support for tracking cousin components, you can easily overlook an affected component when making changes.
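The sketch below illustrates, under simplifying assumptions, the three capabilities these questions point to: selecting on a multi-valued product version field, finding cousins through shared family-tree metadata, and carrying family-tree metadata over automatically when a component is branched. The `Component` class, its field names, and the `branch` helper are hypothetical and stand in for whatever metadata facilities a given CMS provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    product_versions: frozenset  # multi-valued "Product Version" field
    families: frozenset          # family trees this component belongs to


def select_for_version(repository, version):
    """Select components whose product version field includes `version`."""
    return [c for c in repository if version in c.product_versions]


def cousins_of(repository, component):
    """Find other components that share at least one family tree."""
    return [c for c in repository
            if c != component and c.families & component.families]


def branch(component, new_name, versions):
    """Create a version-specific copy that inherits the original's family trees."""
    return Component(new_name, frozenset(versions), component.families)


transfer_12 = Component("transfer-v1-2", frozenset({"1", "2"}), frozenset({"transfer"}))
hold = Component("hold", frozenset({"1", "2", "3"}), frozenset({"hold"}))
transfer_3 = branch(transfer_12, "transfer-v3", {"3"})
repository = [transfer_12, hold, transfer_3]

print([c.name for c in select_for_version(repository, "2")])
print([c.name for c in cousins_of(repository, transfer_3)])
```

Because `branch` is the only way a copy is made in this sketch, the new component cannot lose its family-tree metadata; a CMS that lets writers copy components freely offers no such guarantee.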

The Data Model Approach

The fourth approach is to build support for product versioning into the data model of your content-management system. If the data model of the system supports product versions and the management of cousin components, then you will get full support for the product version problem.

The data model of a content-management system is the way in which pieces of information are recorded and the way they are related to one another. The repository of a content-management system is a database, and like any other database, that repository is organized in a particular way to meet particular business needs. A data model designed to handle product versioning, for instance, will probably have a table containing information on each version of the product. A new record will be added to this table each time a new version is created. Individual components will be linked to this “product version” table to show that they belong to a particular version. This table will be referenced whenever the system builds a version-specific edition of the documentation.
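A minimal sketch of such a data model follows, using an in-memory SQLite database through Python's standard sqlite3 module. The table names, column names, and sample rows are illustrative assumptions; a real repository would also hold the component content and a good deal more metadata.

```python
import sqlite3

# Illustrative schema: one row per product version, one row per component,
# and a link table recording which versions each component belongs to.
SCHEMA = """
CREATE TABLE product_version (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE component (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE component_version (
    component_id INTEGER NOT NULL REFERENCES component(id),
    version_id   INTEGER NOT NULL REFERENCES product_version(id),
    PRIMARY KEY (component_id, version_id)
);
"""


def components_for(conn, version_name):
    """Return the titles of all components linked to one product version."""
    rows = conn.execute(
        """
        SELECT c.title
        FROM component AS c
        JOIN component_version AS cv ON cv.component_id = c.id
        JOIN product_version AS pv ON pv.id = cv.version_id
        WHERE pv.name = ?
        ORDER BY c.title
        """,
        (version_name,),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO product_version (id, name) VALUES (?, ?)",
                 [(1, "1"), (2, "2"), (3, "3")])
conn.executemany("INSERT INTO component (id, title) VALUES (?, ?)",
                 [(1, "Transfer (versions 1 and 2)"),
                  (2, "Transfer (version 3)"),
                  (3, "Hold")])
conn.executemany("INSERT INTO component_version VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3), (3, 1), (3, 2), (3, 3)])

# The component list for a version 3 build.
print(components_for(conn, "3"))
```

Adding a new product version is then a matter of inserting one row into the version table and linking the unchanged components to it.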

One of the most important ways in which the model of a content-management system can be designed to support multiple product versions is in the area of references. In our example, the component for the “Hold” feature might cross-reference the Transfer component. Because of changes between versions, we may have more than one Transfer component. However, the Hold component never changes in any of the new product versions. Ideally, the reference from Hold to Transfer should still build correctly even though it references a different Transfer component depending on the product version being built. One way to model this is to create a single abstract “Transfer” component of which each version-specific cousin is an instance. References are then made to the abstract component and resolved to the version-specific component when the documentation is built.
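The abstract-component idea can be sketched as a two-level lookup performed at build time. Everything named here (the registry layout, the component identifiers, and the `resolve` function) is hypothetical and chosen only to illustrate the principle.

```python
# Hypothetical registry: abstract component id -> {product version: concrete id}.
INSTANCES = {
    "transfer": {"1": "transfer-v1-2", "2": "transfer-v1-2", "3": "transfer-v3"},
    "hold": {"1": "hold", "2": "hold", "3": "hold"},
}

# Cross-references are stored against abstract ids, never concrete ones.
CROSS_REFERENCES = {"hold": ["transfer"]}


def resolve(abstract_id, version):
    """Map an abstract component to its instance for one product version."""
    try:
        return INSTANCES[abstract_id][version]
    except KeyError:
        raise LookupError(
            f"no instance of {abstract_id!r} for version {version!r}"
        ) from None


def resolved_references(abstract_id, version):
    """Resolve every cross-reference made by a component for one build."""
    return [resolve(target, version)
            for target in CROSS_REFERENCES.get(abstract_id, [])]


# The Hold component is never edited, yet its reference follows the build.
print(resolved_references("hold", "2"))
print(resolved_references("hold", "3"))
```

Raising an error when no instance exists for a version means a broken reference surfaces during the build rather than in the published documentation.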

Creating a custom data model to support your specific content-management needs is the most robust solution. It ensures effective change management while maintaining the virtues of single sourcing. It also provides support for an effective workflow solution if different writers own different cousins.

The main disadvantage of this approach is the difficulty of implementing it. Your CMS may not provide explicit support for product versioning and cousin components. If you build your own database and data model, you can implement product versioning and cousin support but you will also be responsible for developing the application logic to support it.

If you can’t find a CMS that supports these features, it may be possible to implement them using the extension features of your current CMS. Most CMSs provide for extensions either through a built-in scripting language or through an API used to hook in functionality written in a programming language. You may also be able to use the workflow features of your CMS to implement some of the cousin logic.

On the other hand, it is problems of this kind that often lead companies to develop their own content-management systems in order to fully support their own business rules and the unique features of their content.


The biggest pitfall in handling the product version problem in a single-sourcing system is that the problem gets worse over time. The first time you encounter it, it is likely that the number and extent of the deltas between versions will be slight. The effectivity approach is very appealing when you first encounter the problem because the simple effectivity between two versions is easy to handle and the solution is usually easy to implement using the existing features of your content-management system.

As time goes by, however, more version changes accumulate. As these changes occur, effectivity-based solutions become more and more difficult to handle. Effectivity does not provide good support for splitting and joining components. Other kinds of version problems can occur as well. For instance, a single product may be split into a range of products, such as Standard, Professional, and Enterprise versions of software. Now, you have to handle the level of a product that a component belongs to. Potentially, you may have a feature that works differently in different product models, or you may find that conceptual components have to be versioned because they include references to features not found in a lower level of the product.

After three or four rounds of product updates, documentation repackaging, and product line splitting, you will find you have a complex relationship of cousin components. The complexity of their differences may be too great to combine them all into one component with effectivity. Without cousin logic at the CMS level, it may be too hard to keep track of all the cousins.

Many content-management systems fail over time due to accumulated complexity. Make sure you plan your system now to accommodate the full range of version management problems you are likely to face in the future.

About the Author