What Makes an Authoring System Tip?


February 2004


Mark Baker, Analecta Communications Inc.

This article is based on a paper I gave at the 2003 Best Practices Conference. The theme of the conference was “Innovation: Making it Happen,” and the theme book was Malcolm Gladwell’s The Tipping Point. In Gladwell’s book, the tipping point is the point at which a trend, technology, or disease suddenly breaks out and becomes ubiquitous. To foster a change, therefore, one must find or create the conditions under which a product or idea will reach its tipping point.

One of the biggest changes that documentation managers are attempting, or at least contemplating, is the shift from a desktop publishing (DTP) environment to a collaborative single-sourcing system based on content management and XML. This led me to pose the question of why our present desktop publishing tools tipped as the tool of choice for technical communications. If we want to find a way to make a new set of tools and techniques tip in our organizations, it is probably a good idea to understand why our present tools tipped. If we don’t understand what makes DTP tools “sticky” (to use Gladwell’s term), we may have a lot of difficulty getting ourselves unstuck from them.

Sometime in the mid-80s, at about the time that personal computers were starting to find their way onto people’s desks in large numbers, two rival approaches emerged for the creation of technical documentation: desktop publishing (DTP) and generalized markup in the form of SGML. On the face of it, SGML should have won hands down.

  • The computers of the day were too slow to run DTP applications effectively. Font and graphics handling was weak and desktop printing was painfully slow.
  • Authors of technical documents were already using markup languages extensively in terminal and mainframe applications.
  • The most popular word processors were markup based. They didn’t use SGML, but they did use visible inline markup to define structure and formatting. Just about everybody using a personal computer had learned to use markup.
  • Page design, page layout, and typesetting were separate skills practiced by skilled professionals. There was no obvious reason why technical writers should want to learn or use those skills in addition to the writing and technical skills they were already required to possess.
  • In every other sphere of endeavor, structured approaches to data handling were taking off like wildfire. Database applications were proliferating through every aspect of the business. Software development groups were adopting configuration management systems that allowed for collaborative coding and conditional product builds. It seemed only natural that technical documentation would follow the same path, and SGML was the obvious technology to use. SGML seemed primed to tip.

But that is not what happened. It was DTP that tipped, not SGML. People looked at SGML. But the almost universal response was “Sounds Good, Maybe Later.” Then they bought PageMaker.

Why did it happen that way? In this article I will suggest some reasons why DTP tipped and SGML did not. Along the way, I will examine some of the myths about SGML that might otherwise distract us from the most likely answers. Finally, I will suggest some of the things that need to be done if we are going to make the current generation of markup technology tip in the documentation world.

What Is Markup?

What is generalized markup? Despite, or perhaps because of, the years of rabid enthusiasm for XML, and all the promises that are made on its behalf, markup is not always well understood. Markup, quite simply, is a set of marks inserted into a document or a piece of content to identify certain properties of that content. Technically, it is distinguished from binary file formats, which use a variety of techniques to associate properties with content but which cannot safely be edited without the original application that created them. Markup can be edited and manipulated by many different applications or even by hand. But more important than that, different systems of markup (different markup languages) can be created to describe content in different ways for different purposes. This is where the generalized part of “generalized markup” comes in. A generalized markup system is not one specific markup language. It is a set of rules that allows you to create a markup language that describes your content in a way that is uniquely useful to you and to your organization. The real key to efficiency is not just choosing markup over binary file formats. It is creating a markup language for your content that produces the greatest efficiencies for your organization. The two generalized markup language technologies of interest to technical communicators are SGML and XML.
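As a present-day sketch of this idea, consider a fragment of custom markup and a few lines of Python. The element names (procedure, step, command) are invented for illustration; the point is that once properties are marked up, any application can recover them without knowing anything about formatting:

```python
import xml.etree.ElementTree as ET

# A fragment of content with hypothetical custom markup.
fragment = """
<procedure product="Widget 2000">
  <title>Restarting the server</title>
  <step>Log in as administrator.</step>
  <step>Run <command>restart-server</command>.</step>
</procedure>
"""

root = ET.fromstring(fragment)

# The marked-up properties are directly machine-readable.
print(root.get("product"))        # the product this procedure applies to
print(root.find("title").text)    # the procedure's title
print(len(root.findall("step")))  # the number of steps
```

The same fragment could just as easily be counted, indexed, reassembled, or published to several formats, because the markup records what each piece of content *is* rather than how it should look.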

What Is SGML?

Describing SGML has always been difficult because SGML involves a level of abstraction that people do not expect. As a generalized markup language, SGML is not a single tagging language but a language for describing languages or, if you like, a grammar for describing grammars. A language for describing languages is often called a meta-language. Meta-languages are not directly useful to someone who wants to write a document. You can’t just sit down and write a document in SGML. You have to first design a tagging language (or choose an existing one) and then write your document using that tagging language. When you design your tagging language, you are using SGML. When you write your document, you are using the tagging language that you designed using SGML.
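The two-step process of designing a tagging language and then writing in it can be sketched concretely. The DTD declaration syntax below is shared by SGML and XML; the language being defined (a toy "recipe" language) is invented for illustration:

```python
from xml.dom.minidom import parseString

# Step 1: a tiny tagging language, defined as a DTD.
# Step 2: a document written in that language.
document = """<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (title, ingredient+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
]>
<recipe>
  <title>Toast</title>
  <ingredient>Bread</ingredient>
  <ingredient>Butter</ingredient>
</recipe>
"""

dom = parseString(document)
# The author writes in the recipe language, not in the meta-language.
print(dom.documentElement.tagName)
print(dom.getElementsByTagName("title")[0].firstChild.data)
```

The meta-language (the `<!ELEMENT ...>` declarations) appears only when the language is designed; the writer's document uses only the tags that design produced.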

What Is XML?

XML is a meta-language, just like SGML. The popular press often describes XML as being somewhere between HTML and SGML. This is nonsense. HTML is a tagging language. XML and SGML are both meta-languages, and you can choose either one of them to create a tagging language for your documents. In fact, many of the tagging languages first created using SGML are now being redesigned using XML. The most notable example of this is XHTML, which is the XML version of HTML. HTML itself was originally (if somewhat loosely) based on SGML.

So if they are both meta-languages, what is the difference between SGML and XML? Both consist of a set of statements that you can use to describe a markup language. However, SGML has a lot more such statements and can therefore be used to describe a much wider variety of markup language syntax. For instance, in SGML you can change the markup start character, the character that says, “Now we are looking at markup, not data.” In XML the markup start character is permanently fixed as “<”. In SGML, you can decide to omit start or end tags in situations where the parser can figure out for itself where things begin and end based on context. For example, you can omit the closing tag on each item in a list because when the parser sees the start of a new item or the end of the list, it knows that the previous item must have ended. This is how lists work in HTML, which is based on SGML. Omitting tags is not permitted in XML. In XHTML, which is based on XML, you have to include all the end tags on lists and on everything else.
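The difference is easy to demonstrate with Python's standard library. The HTML parser, following SGML-style rules, infers the implied end tags on list items; an XML parser rejects the identical markup as not well-formed:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# An HTML-style list with the </li> end tags omitted, as SGML permits.
html_list = "<ul><li>First item<li>Second item</ul>"

class ItemCounter(HTMLParser):
    """Counts list items; the HTML parser supplies the implied end tags."""
    def __init__(self):
        super().__init__()
        self.items = 0
    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.items += 1

counter = ItemCounter()
counter.feed(html_list)
print(counter.items)  # both items are recognized despite the omitted tags

# An XML parser, by contrast, rejects the same markup outright.
try:
    ET.fromstring(html_list)
except ET.ParseError as err:
    print("XML parser rejected it:", err)
```

This is exactly the trade-off described above: SGML's flexibility is convenient for the writer but makes the parser's job, and the parser writer's job, much harder.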

Can XML Succeed Where SGML Failed?

If XML and SGML are so much alike, why should we expect XML to succeed where SGML failed? XML is commonly promoted as a big step forward from SGML, so one possible answer to this question is that XML will succeed because it is better than SGML. If DTP tipped in the 80s because there was something specifically wrong with SGML that prevented people from adopting markup instead of WYSIWYG approaches to authoring, then the answer to why DTP tipped would be a pretty simple one. Before we dig deeper into the question, therefore, let’s look at some of the common objections to SGML.

SGML is too hard to use
We are commonly told that XML is easier to use than SGML. Is that true? It depends on who you are.

1. If you write parsers, XML is much easier than SGML. The parser is the piece of software that goes through the document and determines which characters are data and which ones are markup. Because of all the extra possibilities it allows, it is much harder to write a parser for SGML than for XML.

2. If you write processing applications to publish or manipulate documents, XML and SGML are equally easy to use, because the parser does all the hard work of detecting the structure of the document.

3. If you design tagging languages, the parts of SGML that are equivalent to XML are just as easy to use. If you use SGML’s additional features, which can allow you to write tagging languages that are easier for writers to use, you will have to learn some extra stuff.

4. If you are a writer using a tagging language, one created with SGML may have more sophisticated features that, if well designed, make it easier to use than one designed with XML.

If you adopt a generalized markup strategy, members of your documentation team will have to write processing applications and design and use tagging languages, but they will not have to write parsers. From the point of view of a documentation team, therefore, XML is not any easier to use than SGML. In fact, SGML may be easier to use. Ease-of-use differences between SGML and XML, therefore, do not explain why generalized markup failed to tip in the 80s.

SGML was not ready for prime time
Some argue that SGML was not ready for prime time in the 80s and that XML introduces important advances that will make it suitable as a documentation technology at last. Unfortunately, XML contains no significant technical innovations over SGML in the area of document markup. XML is, in fact, a simplification of SGML, not an advance. Most of the significant XML standards related to documents are rehashes of equivalent SGML standards. The major XML product lines are reflections of similar SGML product lines, often from the same vendors. There have been some significant changes made between XML and SGML, but these are changes that make XML more suitable for conventional data and actually less suitable for documents.

And any suggestion that SGML was not ready for prime time is disproved by the fact that some of the largest and most challenging documentation projects of the last 20 years have been performed, very successfully, using SGML.

(None of this, by the way, is to suggest that you should be using SGML rather than XML today. SGML remains a viable technology, but XML has far more support. And it is the general virtues of custom markup that are really important. The differences between XML and SGML, though real, are of secondary importance.)

SGML systems were too expensive
Some people complain that SGML simply priced itself out of the market. There are certainly some expensive high-end SGML systems out there. But these systems are content management systems (CMSs), and non-SGML-based CMSs are similarly expensive. You do not need a CMS to do SGML. Indeed, if your documentation is created using a generalized markup strategy, you will be able to let it grow a lot bigger and more complex before you have to resort to a CMS to manage it. CMS is expensive, SGML is not. XML certainly is not.

Does the Problem Lie with Markup?

If SGML provided a viable generalized markup technology in the 80s and still lost to DTP, maybe the problem lies with markup itself. There are a number of common objections to markup that need to be answered.

Authors can’t learn to type tags
Some people argue that authors can’t learn to type tags. This argument may make some sense if the author in question is a marketing writer or someone in the organization for whom writing is not a full time job. But for professional technical writers, the suggestion is frankly insulting. Technical writers, indeed, often seem to take a particular pride in the complexity of their authoring and publishing tools and the tricks they can make them do. Certainly, our current tools are not all that easy to use. We send authors for extensive training in DTP tools. Not only do they have to learn the tools, they have to learn a complex and detailed set of skills and information connected to page design and page layout. At any gathering of technical writers, whether in person or online, the conversation tends to be dominated by questions about tools. Learning to use FrameMaker effectively takes far longer than learning to type angle brackets. Besides, the success of HTML shows that pretty much anybody can learn to type tags if they have a reason to do so. Typing tags is far easier than learning to do DTP and learning to use DTP tools. If you doubt it, look at the number of job ads that rank DTP skills before writing skill or technical knowledge when hiring technical writers. DTP is hard. Markup is easy.

Documentation is different
Some would argue that documentation is different and you just can’t put a document into a database.

Clearly this is true of some documents. Strongly narrative material cannot easily be refactored into topics or abstract information objects. But creating a markup language for strongly narrative documents is relatively easy and can bring many benefits, including ease of linking and cross-referencing. Using a markup language for narrative documents can facilitate shared authoring and relieve authors of formatting chores, both big wins for efficiency.
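The cross-referencing benefit can be sketched in a few lines. The tag names (chapter, xref) are hypothetical; the point is that a reference is recorded as an identifier, so a processing step can always resolve it to the target's current title rather than relying on hand-typed text that goes stale:

```python
import xml.etree.ElementTree as ET

# A narrative document with semantic cross-references (invented markup).
doc = ET.fromstring("""
<book>
  <chapter id="install"><title>Installing</title></chapter>
  <chapter id="config">
    <title>Configuring</title>
    <p>Before you begin, see <xref target="install"/>.</p>
  </chapter>
</book>
""")

# Index the chapter titles, then resolve every xref to the current
# title of its target. Retitling a chapter never breaks a reference.
titles = {c.get("id"): c.find("title").text for c in doc.findall("chapter")}
for xref in doc.iter("xref"):
    xref.text = titles[xref.get("target")]

print(ET.tostring(doc.find(".//p"), encoding="unicode"))
```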

But most of the output of technical documentation groups is reference material, and there are a number of highly successful systems out there that show that constructing documents from semantically tagged, independently authored information components works very well for a wide class of technical documents. It has been demonstrated time and again that working with markup is not only possible, it is more efficient and more reliable than working in a document-centric DTP environment. For one good example of this, see the article on the ANZAC ship project in Technical Communication May 2001.

Documentation isn’t on the radar
Some might argue that it is hard to move to markup simply because documentation isn’t on the radar and cannot find the necessary investment and support to make such a major change. Moving to a markup system requires an investment, and documentation always seems to be the last place that a company is willing to invest. However, many documentation groups are now finding the funding to invest in content management systems: a much steeper investment than is involved in moving to markup from DTP. So documentation groups can find the investment and support to change how they work. They just don’t choose to use it to move to markup.

I don’t want to roll my own
Many documentation managers are reluctant to embark on the creation of custom tagging languages for their content.

Actually, there may be some advantages to be gained by moving to markup, even if you don’t make up your own tagging languages. You can adopt a standard tagging language like DocBook. Doing so will at least give you the advantage of freeing your writers from formatting chores. But while DocBook is used in some quarters, it has never won mainstream acceptance. Part of the problem is that generic markup languages are actually harder to use than specific ones because they are bigger and more complicated. But the biggest reason that generic markup languages like DocBook don’t deliver the productivity gain we are looking for is that the real productivity win comes from adopting a markup that allows you to manage your content in ways that are specific to your content, your subject matter, and your business processes. Generic languages created by other people cannot do that. To get the real benefit of markup, you do need to roll your own tagging languages.

Should the requirement to create a custom tagging language be enough to make documentation managers hold back? The fact is that every other function in the enterprise that handles large volumes of complex data does roll its own data models. They employ staffs of database administrators to design and maintain database models that are specific to their business and to design and maintain data entry and reporting systems to collect and publish data. They do it because it is much more efficient to work this way than it is to work without specific data models and applications that support the unique characteristics of their data. Why should technical documentation be different?
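To illustrate what "rolling your own" buys you, here is a sketch using an invented content model. Because the markup records distinctions specific to this (hypothetical) product, reports specific to the business fall out of the content automatically, just as they do from a custom database model:

```python
import xml.etree.ElementTree as ET

# Content tagged with a model specific to one product's subject
# matter. A generic language could not know these distinctions.
source = ET.fromstring("""
<messages>
  <message code="E100" severity="fatal">Disk not found.</message>
  <message code="E200" severity="warning">Low memory.</message>
</messages>
""")

# Because the model matches the content, a business-specific
# deliverable (a table of fatal errors, say) is a one-line query.
fatal = [(m.get("code"), m.text) for m in source.findall("message")
         if m.get("severity") == "fatal"]
print(fatal)
```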

Why Did DTP Tip?

If all the factors seemed to favor SGML, and if none of the common objections to SGML in particular, and to markup in general, really hold any water when you compare documentation to other parts of the organization, why did DTP tip?

We should be clear that there is absolutely no question that it did tip and tip in a major way. The technical communication community rushed to adopt desktop publishing, rapidly overthrowing all the old ways of doing things. Writers were required to learn a raft of non-trivial new skills, but they apparently did so without complaining, and even with glee.

Perhaps it was simply that documentation was such a neglected part of many organizations that writers who wanted to see the entire documentation production cycle handled in a professional manner jumped at the chance to do the whole thing themselves. I think that is at least part of the answer.

However, I think that the main answer lies in the issue of adaptability. Documentation is developed in highly dynamic, even chaotic conditions. Changes in subject matter are constant and often poorly communicated. Changes in output requirements are capricious and sudden. Changes in display and delivery technology are unrelenting. In this environment, adaptability is key. If you cannot adapt what you are doing quickly and easily to meet new conditions, your work becomes impossible.

However, this answer is somewhat paradoxical. If the issue is adaptability, it should be markup that wins. How do you make a significant change in your documentation set in a DTP environment? You throw lots of writers at lots of books with lots of overtime, and you get it done as best you can. How do you make a significant change in your documentation set in a well-designed markup environment? You make the necessary changes in the fully normalized information collection (in which only one instance of each content topic exists), rewrite your synthesis routines, and run your build process. There is no panic and there is no overtime, or at least a lot less. This works brilliantly, provided that your system has been properly designed and properly maintained, and that everybody has followed the rules.
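The normalized-collection-plus-build idea can be reduced to a toy sketch. All the names and structure here are illustrative; real systems store topics in files or a repository and synthesize real output formats, but the shape is the same:

```python
# A normalized topic collection: each topic exists exactly once.
topics = {
    "install": "Insert the disk and run setup.",
    "upgrade": "Back up your data, then run setup.",
    "uninstall": "Run setup and choose Remove.",
}

# Each deliverable is a synthesis recipe: an ordered list of topic ids.
deliverables = {
    "quick-start": ["install"],
    "admin-guide": ["install", "upgrade", "uninstall"],
}

def build(name):
    """Synthesize one deliverable from the single-sourced topics."""
    return "\n\n".join(topics[t] for t in deliverables[name])

# A change made once flows into every deliverable on the next build.
topics["install"] = "Download the installer and run setup."
print(build("quick-start"))
```

No overtime is involved: the change is made in one place, and the build process propagates it everywhere it is used.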

However, these are a lot of conditions. What if your system was not well designed? What if it was not properly maintained? What if people didn’t follow the rules? What if the original design did not anticipate the changes you have now been asked to make? Do you have access to the people with the skills to change the design? Do you have people with the skills to change your synthesis and publishing routines? The markup solution works wonderfully when everything goes right. But what happens when it goes wrong? Several of the managers who attended my presentation at the conference had worked with markup, and some had experienced what happens when it goes wrong. The result can be a crisis and the complete inability to get any documentation out at all.

DTP avoids the risk of catastrophic failure
Edward Tenner’s book Why Things Bite Back provides an interesting counterpoint to Gladwell’s The Tipping Point. If The Tipping Point is about how trends start, Why Things Bite Back is about how they stop, or at least about how they reach their limits, and the unintended consequences that accompany change. One of the central tenets of Tenner’s book is that technology tends to replace acute problems with chronic ones. The chief medical concerns of a century ago, for instance, were infectious diseases: acute conditions that killed millions. Today, we worry more about chronic conditions, and we are more likely to die of diseases that cause slow deterioration than sudden death. Our enormous medical advances have not saved us from death or discomfort, but they have replaced acute threats to our health with chronic ones.

Human beings, Tenner contends, feel more at ease with chronic problems than with the risk of catastrophe. Even though chronic problems can, through cumulative effect, be even more deadly than occasional catastrophe, we prefer to accept the chronic problem rather than run the risk of catastrophic failure.

Desktop publishing is a chronically inefficient approach to creating technical documentation. It is woefully inefficient in terms of managing duplication of effort or ensuring consistency and completeness of information. It distracts writers from the job of research and writing and demands that they develop layout and publishing skills. If the job ads are any indication, with their emphasis on tool skills, using DTP can actually change the profile of the technical communicator to someone whose skills are focused on publishing rather than communication.

But for all its chronic problems, DTP is largely free of the risk of catastrophic failure. With ordinary diligence to guard against lost or corrupted files, DTP solutions can respond to almost any new demand. All that is needed is elbow grease. By putting the entire writing and publishing process in the hands of a single writer, DTP ensures that documents do get delivered. It may be slow, it may take a lot of overtime, it may even delay product release, but it will get done.

While markup technology is a demonstrably more efficient approach to creating and managing content, that efficiency comes with what Tenner calls a “revenge effect.” It makes the author, and thus the whole project, depend on custom data models and the custom software development that is required to support those models. Mistakes in the design of such a system, or the loss of the resources to maintain it, can have catastrophic consequences.

To use Gladwell’s term again, the “stickiness factor” that made DTP so attractive to documentation groups in the 80s was that it gave them control of the entire publishing process and thus freed them from dependence on other systems and other departments. It was an advantage that markup, for all its merits, could not match.

Can Markup Tip Today?

All epidemics take place in a context. Our renewed interest in markup today arises in part because the chronic problems associated with DTP are becoming harder to live with. In fact, as budgets tighten and timelines get shorter, DTP’s inefficiency is increasingly changing from a chronic problem into an acute one. Documentation managers increasingly do not have the extra writers to throw at the problem, and the time frames in which they are asked to respond keep getting shorter. The combination of these factors means that today a dependence on DTP can potentially lead to catastrophic failure.

On the other hand, it may be easier today to assemble the kind of team that is required to implement a robust markup strategy. To build an effective markup-based system, you need a text programmer/database administrator as part of your team. In fact, you need a multidisciplinary team, with each person doing what they do well and everyone working together to achieve the common goals of the team. This is not such an extravagant requirement. In fact, most of the departments of an organization work exactly this way, and they work much more efficiently because of it. But the fact of the matter is that today most technical communication groups are a homogeneous collection of writers who do their own desktop publishing. Even if single-sourcing systems are in place, they seem mostly to be run by individual writers working with desktop tools.

It was very difficult to assemble such a team in the 80s and 90s. After all, one of the big reasons there were so many technical writing jobs opening up was to free scarce engineers for engineering work. I remember that at the company I worked for in the mid-90s, we were trying to set up a single-sourcing system based on SGML, and we simply could not recruit the programming talent we would have needed or get our project even remotely on the IT department’s radar. Things are different today. There is no reason why a documentation department today should not be able to recruit the programming talent required to implement and maintain a markup solution.

If conditions have changed, what is needed to tip markup solutions in the documentation space?

The answer does not lie in the area of tools. There may be more tools available today, and they may be cheaper; but there is nothing fundamentally different in the tools available today for managing documentation markup than there was 10 years ago. In fact, most of the tool development in the documentation space over the last 10 years has been focused on propping up DTP-based solutions. Markup solutions, in fact, are fundamentally less tool-intensive than DTP solutions. What they do require is custom development, and that is not a tools issue but a people issue.

The key, then, lies with people. Markup-based solutions for reuse and single sourcing will tip if documentation managers are willing to invest in building the kind of multidisciplinary teams that can put together a robust markup strategy and can implement and maintain the necessary content models and processing solutions. If not, we are probably going to go on living with DTP systems that may make token use of XML, but will not deliver the true benefits of a personalized markup strategy.

About the Author


Mark Baker
Analecta Communications Inc.

Mark Baker is the owner of Analecta Communications Inc., a communication consulting firm in Ottawa, Canada. His former positions include Manager of Information Engineering Methods at Nortel and Director of Communications for OmniMark Technologies. Mark has written and spoken extensively on single sourcing and markup. He is co-author of HTML Unleashed, 2nd Edition, and author of Internet Programming with OmniMark. He is currently writing a book on Refactoring Content. Material for this article was originally created while Mark was employed by Stilo Corporation.