Refactoring Your Legacy Content


August 2005

CIDM Newsletter
JoAnn Hackos, CIDM Director

It’s tempting to believe that we can move our information quickly and easily into the new topic-based architectures. If we can just turn our massive collections of technical manuals into small chunks, we can reap the business benefits of reuse. Reuse reduces development and maintenance time and promises significant control over production and translation costs.

So tempting are the rewards of managing smaller chunks of information that some content management vendors, along with their counterparts selling XML-based editing software, promise us miracles. All we need to do is take all that old book-based information and convert it into modules using “smart tools” that appear to read between the lines, or at least between the heading levels. Voilà! There you have it: instant topics that can be reused everywhere.

The promise of easy gains in turning tired old manuals into vibrant, chunky topics is false. The difficulty, as I see it after reviewing hundreds of thousands of pages of technical content over the past 30 years, is that we’re stuck with legacy content that was never designed to be easy to change. We’ve created a craft that encourages writers to produce their own books, typically idiosyncratic structures so intertwined with transitional elements and forward and backward references that the cost of maintenance is astronomical. We employ many writers who, under impossibly tight schedules, can often do little more than add further layers of complexity to these tomes as the products they support gain features and functions.

You may object, “My books are well written and have not been corrupted by years of maintenance. I can do a simple conversion to topics for my books.” Unfortunately, simple conversion does not work even for well-written books. Books are written to be used in a linear fashion. As a reader of a book, how often do you go back to a preceding paragraph or section to clarify something you have just read? Chunking a book by heading levels to create topics fails because it severs the essential paths back to earlier material, paths that the author assumed and the reader requires. In user studies, I have watched users of help systems converted directly from books search desperately, and without success, for that earlier information.

To think that we can salvage these legacies of generations of writers by simply dividing them into bright little chunks is pure fiction. If writers are telling managers that they don’t get much reuse out of the results, they’re absolutely right. What we must do is accept that we cannot create a useful, reusable architecture by harvesting the refuse of poor architectures. It’s like trying to create modern buildings by salvaging stones and mud from primitive huts. We need to take responsibility for creating new structures that will support future innovations in information design and dissemination. We need to discover new structures that are flexible and transformable and that help us reduce the cost of change.

As many information-development managers have quickly discovered, moving to topic-based authoring and reaping the benefits of component-managed systems requires a commitment of resources. Without that commitment, nothing much changes. To borrow a term from agile development methodologies, we need to consider the “cost of change” in our underlying architecture. Have we created an architecture for our information that is transformable? Can we deliver information easily and in new ways without having to start over?

The core product-development concept that we need to take into account is called “refactoring.” A term introduced in object-oriented programming, it refers to the process of improving a computer program by reorganizing its internal structure without altering its external behavior. For information development, refactoring has come to mean the process of restructuring and rewriting a library or a book from its original linear book architecture into a modular topic-based architecture. Unlike the sections of a book, a refactored document consists of a series of logically self-contained topics that can be reused to produce a number of different documents. Each topic is fully understandable in itself: the user should not have to look elsewhere for content needed to understand the topic. If the source book or books are of high quality, refactoring can be accomplished from the content found in the books alone. For poorly written books, refactoring may require gathering additional content from users and developers. Because of the architectural changes required to convert from books to topics, refactoring must always be done by people, not by dumb “smart tools.”

In refactoring our content, we must first design a coherent, simplified information architecture. A topic-based architecture, such as the DITA standard, provides a sensible starting point. Then we need to research our existing content, looking for opportunities to extract content that is better maintained as separate components. Finally, we must remove complexity from our original content by applying the new structural model. As a result of this new, simplified, minimalist design, future content changes, whether driven by product changes, a changing user community, or new design thinking, will be easier and less costly to implement. Even routine release-to-release maintenance costs will be significantly reduced.

As you prepare for topic-based authoring and content management, decide which legacy content is likely to move forward with new development, and leave behind in the old world the content least likely to change. By analyzing and refactoring your most valuable content, you are most likely to benefit from your investment in new technology.