On the Road to Reusability: Update about a Single-Source Publishing System
More than six years ago, Andre Purnot, the manager of our Netherlands-based translation function, and I were thinking about a system that would ease the pain of getting our user documentation written, translated, and produced. Now, we are three years into a $3 million project to update our processes and tools, and we are seeing promising metrics about the returns on our investment. We call our system MAPS, for Medtronic Advanced Publishing System.
Andre and I saw a huge amount of reuse within our documentation; we estimated anywhere from 60 percent to 90 percent of common material. My division of Medtronic makes pacemakers, defibrillators, and other medical devices. These medical devices are generally released in families, with two to as many as twelve models per family. Because of the large number of submissions to regulatory agencies around the world, as well as the stringent testing requirements, medical device documentation has more milestones (and therefore versions) than is typical in other industries. On top of that, we work on as many as four generations at a time. In that environment, trying to make a change everywhere it applies is a daunting task. A project to change warranties cost almost $500,000.
FDA regulations require us to track the history of text and illustrations: who made what change when and why? But many of the other drivers leading Medtronic toward single sourcing are common in many industries:
- movement toward simultaneous worldwide release
- escalating costs
- expanding language requirements
- increasing product complexity
- demand for electronic information delivery
Andre and I drove the project from the bottom up. We started within one division, Medtronic’s oldest and largest. Upper management in that division recognized the problem and the potential gains. We began with technical documents only, no training or marketing materials, although we see training materials as a logical extension. We also limited the initial scope to English for authoring and to nine European languages for translation: French, Italian, German, Spanish, Dutch, Danish, Portuguese, Swedish, and Greek. These languages are for the European Union countries that have native language laws.
Status of the process
Before upper management would approve funding, they insisted we re-examine our process. Spending a lot of money automating a poor process is a big mistake. We carefully analyzed and documented our current processes and then re-invented them with a vision of how new tools would improve and automate the work. That re-analysis alone brought us to a greater understanding and standardization.
About two years ago, the translation group stopped using outside agencies. They developed a network of freelance translators and brought Trados translation memory in-house. They also divided their translation personnel into two groups: linguistic specialists and project coordinators. They have two linguistic specialists for each language into which we translate. All are trained as translators, and many are native speakers of their assigned language. The project coordinators are the liaison to the writing and software groups. Now, the translation group is also hiring native speakers of our first-tier languages as full-time translators.
On the writing side, we tried to standardize on the best practices of the different writing groups. We also started mixing personnel from different groups onto projects. Writers are consciously trying to write in a way that facilitates reuse. For example, we use the word “device” rather than “pacemaker” or “defibrillator” so that text modules can be used more widely. We encode company name, model number, and similar information as variables rather than typing them as text. We also have a vision of writing by “domain” or topic rather than writing by project. The goal would be that each writer has a vested interest in optimizing the number and wording of modules to encourage reuse.
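The variable idea can be illustrated with a minimal sketch. It is written in Python rather than SGML, and the placeholder syntax, the sentence, and the model numbers are all invented for illustration; it does not show the actual MAPS markup.

```python
from string import Template

# Hypothetical reusable module: generic wording ("device") plus variables
# for the details that differ from product to product.
MODULE_TEXT = Template(
    "Read this manual before using the $company Model $model device."
)

def render(module: Template, variables: dict[str, str]) -> str:
    """Fill in a module's variables for one product."""
    # substitute() raises KeyError when a variable is missing, so an
    # incomplete product record fails loudly instead of printing "$model".
    return module.substitute(variables)

# The model numbers below are made up for the example.
pacemaker = render(MODULE_TEXT, {"company": "Medtronic", "model": "A100"})
defibrillator = render(MODULE_TEXT, {"company": "Medtronic", "model": "B200"})
```

Because the sentence never names the product type, the same module serves both manuals; only the variable values change.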
Status of the tools
On the tools side, we divided our initial development phase into three releases. We are in full production for the first release. That release supports one of three types of manuals, a short and simple document type. These documents are 10-20 pages long, with about 150 words per page. Our second release adds a more complex document type, with screen snaps, graphics with translatable text, and index. That release is validated and is in pilot use. Most of the major functions for the first phase are finished. Our third release will add some enhancements and an additional simple type of document with a different appearance.
The MAPS system architecture, as shown in Figure 1, includes a repository of parallel components or modules of text and graphics. Every module can have a parallel module in each of the supported languages. The text modules are tagged in SGML. We started with SGML rather than XML because SGML is a mature ISO standard and the XML tools were in their infancy when we began. Based on advice from one of the authors of XML, we stayed within the subset of tags common to both so that we can convert to XML in the future if we choose. Modules are assembled into documents for different models through a “Document Build List.” Based on what type of document is needed, the MAPS FOSI (Formatting Output Specification Instance) determines the appearance of documents at the time of output.
Figure 1. System Architecture
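As a rough sketch of the idea behind a Document Build List, the Python below treats a document as a list of pointers that is resolved against a repository of parallel modules. The module identifiers, the sample text, and the function name are assumptions made for this example, not the MAPS data model.

```python
# Repository of parallel modules: module id -> language code -> text.
# All ids and text are invented for the example.
REPOSITORY = {
    "intro-title": {"EN": "Introduction", "FR": "Introduction"},
    "read-first": {
        "EN": "Read this manual before use.",
        "FR": "Lire ce manuel avant utilisation.",
    },
}

# A Document Build List holds pointers to modules, never the text itself.
DOCUMENT_BUILD_LIST = ["intro-title", "read-first"]

def build_document(build_list, repository, language):
    """Resolve each pointer to the module text for one language."""
    parts = []
    for module_id in build_list:
        translations = repository[module_id]
        if language not in translations:
            # A missing parallel module means the document is not yet
            # fully translated into this language.
            raise LookupError(f"{module_id} has no {language} version")
        parts.append(translations[language])
    return "\n".join(parts)

english_manual = build_document(DOCUMENT_BUILD_LIST, REPOSITORY, "EN")
french_manual = build_document(DOCUMENT_BUILD_LIST, REPOSITORY, "FR")
```

The same build list yields every language edition, and editing one repository entry changes every document that points to it.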
Figure 2 shows the MAPS UNIX server file system, which is the mechanism through which MAPS controls versions. The system works through a set of parallel directories and file names. For example, look at the first English module, under directory EN, subdirectory sgm. The module names are automatically assigned by MAPS, which assures uniqueness by using the UNIX system time, appended with the author’s initials and a three-digit revision number. In the subdirectory SGM-meta below, there is a file with the same name that contains the metadata about the module. In the French directory on the right, there is another parallel file with the same module in French. Back in the English directory, there is a revision of the first English module, labeled with an incremented revision number, 001. However, the French parallel module is missing, which is how the system knows it needs to include that module in a translation packet. The non-lingual directory on the left contains a list of Document Build Lists (DBLs), along with a file of metadata about each Document Build List. The EPS and EPSmeta files contain graphics.
Figure 2. Server File System
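The naming scheme and the missing-file check can be sketched in a few lines of Python. The exact layout of a MAPS file name is not given here, so the format below (timestamp, then initials, then revision, with no separators) is an assumption, as are the function names; plain sets stand in for directory listings.

```python
import time
from typing import Optional

def module_name(initials: str, revision: int,
                timestamp: Optional[int] = None) -> str:
    """Build a name from UNIX time, author initials, and a 3-digit revision.

    The concatenation order and lack of separators are assumed.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return f"{timestamp}{initials}{revision:03d}"

def needs_translation(source_names: set[str],
                      target_names: set[str]) -> list[str]:
    """Return source modules that have no parallel file in the target language."""
    return sorted(source_names - target_names)

# Original module and its first revision (fixed timestamp for repeatability).
original = module_name("ab", 0, timestamp=1000000000)
revised = module_name("ab", 1, timestamp=1000000000)

english_dir = {original, revised}   # stands in for the EN/sgm listing
french_dir = {original}             # the revision is not yet translated

packet = needs_translation(english_dir, french_dir)
```

Here `packet` contains only the revised module, which mirrors how a missing parallel file signals that a module belongs in a translation packet.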
Goals for Improvement
We had three goals for improving the process and tools, one for each segment of the process:
- Authoring: Optimize the reuse of text
- Translating: Eliminate retranslation and handling of unchanged text
- Publishing: Automate layout and composition
In the old authoring process, we started by making a copy of a document from a similar past or current project. Right away, the two sources started to diverge, despite well-intentioned and conscientious writers. Even when the writers did everything perfectly, we ended up with hundreds of copies of common text, such as the corporate address. To change the common text, we had to find, open, change, and quality check every document. In the MAPS world, every document is really a list of pointers to modules. The address or any other module exists and is maintained in a single place. (See Figure 3 and Figure 4.)
Figure 3. Authoring Before
Figure 4. Authoring with MAPS
In translation, we used to send a monolithic document to the translators. A human translator might have found and used text that had been translated in a previous document, but basically we got back a whole document. The translation agencies charged us for handling already-translated text, sometimes at full new-translation prices.
In MAPS, the repository knows whether it already has a translation for every module. When we initiate translation of a Document Build List, the MAPS system automatically filters out modules that have not changed. It bundles only those modules that are new or changed. Then, translation project coordinators run those new or changed modules against translation memory and usually find many more sentences with 100 percent matches, or at least “fuzzy” matches. (A fuzzy match is a nearly identical sentence with one or two different words.)
For the purpose of context, the system does send a copy of the whole document, but translators are focused only on brand new sentences or fuzzy matches. (See Figure 5 and Figure 6.)
Figure 5. Translation Before
Figure 6. Translation with MAPS
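To make the distinction between exact, fuzzy, and new sentences concrete, here is a small Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in; it is not how Trados computes matches, and the 0.75 threshold and the sample sentences are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

def classify(sentence: str, memory: list[str],
             fuzzy_threshold: float = 0.75) -> str:
    """Label a sentence as 'exact', 'fuzzy', or 'new' against a memory."""
    if sentence in memory:
        return "exact"
    words = sentence.split()
    # Compare word sequences so that a one-word change scores as a near match.
    best = max(
        (SequenceMatcher(None, words, known.split()).ratio()
         for known in memory),
        default=0.0,
    )
    return "fuzzy" if best >= fuzzy_threshold else "new"

# Invented memory content for the example.
memory = ["Store the device in a dry place."]

same = classify("Store the device in a dry place.", memory)
near = classify("Store the device in a cool place.", memory)
fresh = classify("Press the button twice.", memory)
```

In this sketch the second sentence differs from the stored one by a single word, so it is labeled fuzzy, while the third shares almost nothing with the memory and is labeled new.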
Before MAPS, we used Adobe FrameMaker with custom templates optimized to aid the translation process. However, FrameMaker does not prevent writers from overriding the templates to create a design effect. Sometimes those overrides have an unintended effect on publishing after translation. So, we instituted a quality check to weed out overrides.
The FrameMaker process also included a step to generate a book that included generating the table of contents, cross-references, index, and page numbering. We then saved the book to PostScript and later distilled it to PDF. Covers were created separately because of the need for color.
A frequent source of human errors was the complex business rules about addresses and notes that must appear on covers. The rules differ depending upon whether a document is for clinical trials or market release. The US rules also differ from those rules for countries outside the US. Furthermore, electronic publishers might need to adjust some callouts or other items to accommodate language expansion if the writers had not been very careful in tagging them correctly. Finally, the composition process had to be repeated for each language. (See Figure 7.)
Figure 7. Publishing Before
In MAPS, the number of steps for composing is much reduced. Furthermore, its page composition eliminates much of the potential for human error. The FOSI, the MAPS pre-processor, and the Arbortext Epic composition engine interact automatically with the Document Type Definition (DTD) and the Document Build List to resolve the SGML, generate the book, do the page composition, and distill to PDF. Based on metadata associated with the Document Build List, the system knows the business rules and automatically composes the cover, including the correct warnings. For example, the system includes a special notice on the cover of documents to be used only in clinical trials or notes that a device can be used only by prescription from a physician. (See Figure 8.)
Figure 8. Publishing with MAPS
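The idea of driving cover content from metadata can be sketched as a small rule function. The metadata keys, their values, the notice identifiers, and the rules themselves are placeholders invented for this Python example; the actual MAPS business rules are more complex and are not reproduced here.

```python
def cover_notices(metadata: dict[str, str]) -> list[str]:
    """Choose cover notices from Document Build List metadata.

    Keys ("release", "region") and notice ids are hypothetical.
    """
    notices = []
    # Placeholder rule: clinical-trial documents carry a special notice.
    if metadata["release"] == "clinical":
        notices.append("CLINICAL_TRIAL_NOTICE")
    # Placeholder rule: US documents carry a prescription-only note.
    if metadata["region"] == "US":
        notices.append("PRESCRIPTION_NOTICE")
    return notices

us_trial = cover_notices({"release": "clinical", "region": "US"})
intl_market = cover_notices({"release": "market", "region": "non-US"})
```

Encoding the rules once, next to the metadata, removes the step in which a person has to remember which notices apply to which kind of cover.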
In the MAPS world, writers use a slightly customized Arbortext Epic Editor. Normally, they work in a view of a document that includes the SGML tagging, as well as a visual representation of the structure of the document. (For a comparative example, see Figure 9 and Figure 10. The headings appear bold and in a larger font than text.) However, the text is really coming from the modules that the Document Build List points to. One can see that fact in Figure 10, in which the viewing mode has been switched so that only the document structure and list of pointers appear on the screen.
Figure 9. Document Build List, Text View
Figure 10. Document Build List, Entity View
Figure 11 shows how the document is divided into modules. Some modules are extremely short: the word “Introduction” is a module in and of itself. A more typical module length in MAPS is a paragraph or a set of a few bullets. This very small granularity increases reuse, and so far it has not proved unmanageable for writers. The writers usually start working by cloning an existing Document Build List and then determining what modules need to be changed or added.
Figure 11. Document Build List, Module View
Our first release with a simple document type has been in production for several months; we now have some good data about the amount of reuse that we are achieving. Our repository includes 35 manuals of the simple type, 21 of which are translated into 9 languages. (Normally, we produce two sets of English manuals, one for inside the US and the other for outside the US.)
Each week, the system generates reports showing current reuse both by module and by words. These statistics vary depending upon where we are in the project cycle; when a new product line is introduced into MAPS, the reuse decreases until more projects in that line have been started. The module reuse has been running generally above our expectations, most recently at 82 percent, which means that 82 percent of the modules in a Document Build List come from reuse. If you look at the total number of words within all modules, reuse is about 70 percent, which is still high. (See Figure 12.)
Figure 12. Reuse Results: Expected Versus Actual
A factor that drives down reuse is tables, each of which is currently stored as a module. We have not yet found an optimal way to break tables into smaller elements so that components, such as a cell, column, or row, could be reused.
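The difference between reuse by module and reuse by words can be illustrated with a short Python sketch. The definition used here (a module counts as reused when more than one build list entry points to it) is an assumption, and the build lists and word counts are invented; the actual MAPS reports may be calculated differently.

```python
from collections import Counter

def reuse_rates(build_lists: dict[str, list[str]],
                word_counts: dict[str, int]) -> tuple[float, float]:
    """Return (reuse by module, reuse by words) across all build lists."""
    # Count how many build list entries point at each module.
    usage = Counter(m for modules in build_lists.values() for m in modules)
    total_refs = sum(usage.values())
    reused_refs = sum(n for n in usage.values() if n > 1)
    total_words = sum(word_counts[m] * n for m, n in usage.items())
    reused_words = sum(word_counts[m] * n for m, n in usage.items() if n > 1)
    return reused_refs / total_refs, reused_words / total_words

# Invented data: two manuals share a title and an address,
# but each has its own large table stored as a single module.
build_lists = {
    "manual-a": ["title", "address", "table-a"],
    "manual-b": ["title", "address", "table-b"],
}
word_counts = {"title": 1, "address": 9, "table-a": 40, "table-b": 50}

module_rate, word_rate = reuse_rates(build_lists, word_counts)
```

In this made-up case four of six pointers are reused, but the two single-use tables hold most of the words, so reuse by words is far lower than reuse by module.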
Authoring time and costs
We had expected that writing English source manuals using MAPS might take longer than with FrameMaker. We thought that great savings on the translation side would offset the extra time in English. Certainly, we expected to see more time spent on the first few projects done in MAPS, due to time spent working out the bugs, as well as the learning curve. Instead, compared to a baseline project where the per-page cost was about $108, the first MAPS project showed a slight reduction in per-page cost, down to about $94. The second MAPS project showed very dramatic savings, but the measurements on that project may have been taken too early, thus not reflecting total costs. A third project showed a more realistic $61 per-page cost, which is still a respectable 40 percent reduction in the number of authoring hours and costs per page. (See Figure 13.)
Figure 13. Profiting From Reuse: Authoring
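For readers who want to check the arithmetic, the following Python lines compute the percentage reductions from the approximate per-page costs quoted above. The function name is ours, and the dollar figures are the rounded values from the text.

```python
def percent_reduction(baseline: float, current: float) -> float:
    """Percentage drop from a baseline value to a current value."""
    return (baseline - current) / baseline * 100

BASELINE = 108  # approximate per-page cost of the baseline project, in dollars

first_project = percent_reduction(BASELINE, 94)  # roughly 13 percent
third_project = percent_reduction(BASELINE, 61)  # roughly 43.5 percent
```

On these rounded figures, the third project works out to slightly more than the 40 percent cited.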
Translation reuse is quite dramatic. Approximately 70 percent of the translated words come from module reuse, so 70 percent do not need to be handled by translators at all. Another 21 percent of words come from 100 percent matches in the Trados translation memory. A 100 percent match means that the software found the same sentence in its memory. An additional 6 percent of translated words are found in fuzzy matches, where a word or maybe just a tag differs. That leaves only about 3 percent of words in new sentences that must be translated for the first time. (See Figure 14.)
Figure 14. Reuse Results: Translated Words
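To put those percentages in terms of actual words, the Python sketch below applies them to one manual. The manual size is an assumed example (20 pages at about 150 words per page, the upper end of the simple document type); the shares are the approximate figures reported above.

```python
# Approximate shares of translated words, in percent.
SHARES = {
    "module reuse": 70,
    "100 percent match": 21,
    "fuzzy match": 6,
    "new sentence": 3,
}

def split_words(total_words: int) -> dict[str, int]:
    """Apportion a word count across the four categories."""
    return {name: total_words * pct // 100 for name, pct in SHARES.items()}

# Assumed example size: a 20-page manual at about 150 words per page.
breakdown = split_words(20 * 150)
```

For this 3,000-word example, about 2,100 words never reach a translator, and only about 90 words are in sentences that need a first-time translation.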
As one would expect given this level of reuse, the cost for translations done through MAPS is also coming down dramatically. The cost of new translated pages has decreased by about 15 percent; the decrease for all pages processed is about 40 percent.
The metrics on publishing (automated page composition and distilling) have also been exciting. On the translation side in the FrameMaker world, after the translations were done, the electronic publishing group took up to five calendar work days to schedule, compose, and quality check a manual in ten languages. With MAPS, the covers, front matter, and back matter are all automatically composed from metadata associated with the Document Build List and all content is automatically composed in batch mode. In MAPS, translated pages do not need to be processed by the electronic publishing group at all. After you press a button on a dialogue box, the system automatically resolves the SGML, lays out the pages in all languages, and distills the document into PDF. A 250-page manual that contains ten languages, which would have taken three to five days with FrameMaker, is done in three minutes with MAPS. A 20-page English-only document is done in less than one minute. Longer, more complex documents auto-compose in about five minutes. (See Figure 15.)
Figure 15. Automatic Publishing
System performance, in general, has been good. Execution is fast, even from Europe. Originally, we planned two servers that would mirror each other, one in Europe and one in the US. However, with a dedicated T1 line already part of the company infrastructure, we found we did not need the other server, thereby simplifying operations and maintenance.
The Human Side
We expected and have experienced some resistance from writers about moving to this new tool. The development team succeeded in keeping the interface for all users very simple. They strove to eliminate the potential for errors in use. This usability, combined with extensive communication and carefully planned author training, has helped to turn their resistance into healthy skepticism at worst and enthusiasm at best. One of the first writers on the system is not a tools jockey by her own admission. She actually prefers a system that limits writer functionality to what is absolutely required. She said, “MAPS is easy! I’m more comfortable after using MAPS for three months than I was after using FrameMaker for a year.”
The translators adapted quickly to using an SGML translation-memory editor, Trados’ TagEditor. They use Trados not only for MAPS-authored material but also for documents from the divisions that do not use MAPS. The MAPS segment of their workload is taking a smaller percentage of their time. They are especially excited about one-touch, automated, multilingual composition. It composes in three to five minutes and eliminates three to five days of QA for each change cycle.
As we peel the layers of this onion, we predictably find new things that are required or at least desirable. The translation memory tool is the only component of the system that is not based on a client-server architecture. Translators working on PCs must carefully check that their parameters are set appropriately for MAPS; MAPS has no way to set them automatically. Also, the Trados software does not prevent a translator from changing settings, which can result in translators inadvertently changing tags as well as text. Something that goes wrong in the non-client-server part of the translation process is hard to troubleshoot, especially from the US.
The compare features in Epic Editor are not entirely satisfactory. (Neither are those features in FrameMaker, for that matter.) It is important for us to be able to readily show FDA and other reviewers the substantive differences between source documents and a new document, and doing so remains a time-consuming chore. Similarly, the freelance translators, in-country proofreaders, and linguistic specialists (none of whom work directly in MAPS) would like better visual marking to show what text is old and what is new.
We would like to find a strategy for improving reuse within tables, which is currently not possible.
Finally, the writing staff is still grappling with exactly how to implement our vision of writing by domains instead of by projects. Currently, we use a new tool but work in much the same way as before. We hope to plunge into that next phase in the next few months.
Also, still ahead are the challenges of integrating documents from other divisions and other functions. But our current and planned metrics tell a story that will attract additional Medtronic groups to this system. Their participation should provide additional returns on that $3 million investment.
About the Author