Single Sourcing HTML and PDF with FrameMaker and WebWorks
Interested in publishing to the Web but not ready to install a full XML content-management system? That was our situation. Facing complaints and usability issues with our library of legacy PDFs, we decided to add HTML support to our existing publishing environment, extending rather than replacing our current tools. The approach has let us achieve the benefits of Web-based publishing for end-users. In the process, we’ve also laid the groundwork for further development of our content-management architecture and started to think and write in content-management terms.
We now use an automated system based on Automap, the server-based version of Quadralay WebWorks, to translate Adobe FrameMaker files to HTML. The system supports HTML versions of about 50 manuals, representing more than 10,000 pages of documentation for our core products, available to clients on the Web. WebWorks automatically updates the HTML library from FrameMaker books as writers make changes to books.
New product information for our company’s software products is developed from the ground up to be online, either built into the products or on our support Web site. But we also have a sizable library of information that was published originally as books before the days of online documentation. The bulk of this information (any document longer than several pages) is maintained in FrameMaker and published on our Web site in PDF format. People rely heavily on this information; the online library of PDF manuals is the most visited destination area on our Web site.
Problems with PDF
Though PDF documents provide several major benefits (such as central access to current documents, strong support for printing, and reduced printing cost), they have several drawbacks for online use. Our users (especially experts who use documents frequently) complain that PDF access is cumbersome and inconvenient. Some specific complaints include the following:
- Finding information is hard if you don’t know which document you need. The Web site allows searching for PDF documents that contain a phrase, for example, but after you find a list of documents containing your phrase, you still have to download and search each document individually to find specific references.
- To look at a single topic, you have to download the entire document. Because manuals are generally several hundred pages long, this is cumbersome, even with a fast Web connection.
- Pages are laid out based on the pre-existing “printed page” format. Text is often either too small to read or (if you enlarge it) runs off the edge of the screen.
- PDF requires readers to switch back and forth between the standard browser-based interface and the Acrobat Reader interface. One sample complaint: page numbers in the original document don’t match the PDF-assigned page numbers, which creates confusion about page references.
Overall, navigating among PDF documents slows people down, results in unsuccessful searches for information, and reduces performance and satisfaction.
The HTML alternative
The main issue in developing a large HTML-based Web site is content management. Each page has to be created, managed, and presented to the user as a separate file. For us, this requires storing and managing thousands of HTML files, as well as a framework for navigation.
A high-end approach would be to convert all text to an XML-based content-management and publishing system. But that’s a major undertaking, difficult to justify for an existing library. Another, more accessible approach for FrameMaker users is to continue to maintain manuals in FrameMaker and convert them to HTML for the Web, which is what WebWorks software is designed to do. This approach provides the main advantages of HTML publishing for the Web but is easier than switching to native HTML or XML; for one thing, writers can continue to use their existing tools.
Several years ago, our publishing group decided to explore the use of WebWorks to convert FrameMaker manuals into HTML, allowing us to produce both output formats (HTML and PDF) from the same source documents. See Figure 1. We already used WebWorks to produce HTML Help files, which were written from scratch for online delivery. Converting the existing books to HTML was simple in concept: we would use a different WebWorks template to produce HTML rather than Help projects. But implementation raised a number of practical issues. We would be converting a large amount of text that had not been written for online delivery. We had to adapt the supplied WebWorks template to work with the existing books. And we needed an infrastructure that would allow multiple teams to publish books to a single centralized library on the Web and update the HTML as FrameMaker books were updated, all quite different from producing separate Help projects.
Why users like it
The initial release of HTML-based versions received an extremely positive response from some of the heaviest users of existing PDF manuals in our client support groups. The general reaction was that the HTML versions are much easier to use online than PDF. See Figure 2. Features cited include the following:
- cross-document search across all documents in a library or set
- combined table of contents for multiple related books in the same library
- responsiveness, with quick jumps to any page
- screen-based formatting, which adjusts lines to the size of the current window without changing font size
A pilot project with a small set of manuals established agreement among managers that documentation in HTML format would be beneficial, but before proceeding, we had to assess whether we could support this on a production basis.
To be worth doing, conversion would have to support HTML access to a broad range of our legacy documentation (approximately 50 main manuals in the US and Canada). It would also have to be accessible through standard browsers supported by our company’s Web site. The HTML documents would have to be integrated with the existing Web site so users could still go to one place for product information. (HTML would not replace PDFs, which are still useful for printing manuals.)
From a production perspective, the HTML conversion would have to work in our decentralized writing environment, giving each writer independent control over conversion of individual books. It would also have to be easy to use, allowing writers to work essentially as they did before, without becoming HTML or XML experts or spending time managing document conversion.
We wanted conversion to be unattended, which meant we also needed a quality assurance (QA) strategy to monitor the process, spotting errors in conversion caused by errors in books.
We also needed an audit trail so that writers could easily tell which books had been updated to the Web, with the ability to track version status and history for a book, a set, and the product library as a whole. Finally, we needed centralized QA and a way to monitor use.
Based on a sampling of our books, we concluded that most books could be converted to HTML with an acceptable level of effort. We also concluded that, once each book had been set up for conversion, updating the HTML in the background as the book was updated was feasible using WebWorks’ server-based version, Automap, which runs as a batch process on a server.
The first step was to develop a WebWorks template to support our company’s FrameMaker document templates. We developed this by modifying a template provided by WebWorks, adding HTML formatting for styles in the FrameMaker templates, including elements such as paragraphs and character formats, cross references, and graphics. Development of the template was an iterative process, based on testing with multiple manuals.
Based on the tests, we set up a QA guide that writers can use to prepare and test a manual for conversion. (In practice, our production editor became an expert and a central resource for setting up books and working with writers needing assistance.)
We added several new elements to the FrameMaker template to support HTML, mainly custom markers to support navigation links from within the HTML pages to the relevant pages on our Web site. Adding these to each manual became part of the set-up process.
We moved all documentation files from a Novell server to an NT-based server that could run Automap. Quadralay provided sample batch files for controlling unattended operation. We set up batch files that writers can use to mark books that require updating and extended these to provide logging and posting of updated files to the public Web server outside the firewall.
The HTML library provides the features described below for end-users, writers, and production managers.
HTML books are accessible to users who have passwords to our Web site. The output format supports Netscape 4+ and IE 4+ browsers on Windows and the Mac, which covers most users. Users can jump to their choice of HTML or PDF versions for each book. Once in the HTML library, they can search and navigate quickly across the entire HTML library. (In tests we successfully set up 50 books with a single navigation window but found this unwieldy. We ended up with multiple book sets, each containing 6 to 10 related books with common TOC, search, and index. We didn’t use an available feature that lets users mark favorite pages because of concern that bookmarks would become obsolete as libraries are updated.)
Setup for each manual requires a general scan for formatting issues, an initial test conversion, and a process of cleaning up formatting problems. Non-standard FrameMaker formatting that might work with a “printed page” format in PDF often does not convert cleanly to HTML. Most problems can be fixed by following published tips, but some require painful repairs. For example, complex graphics with multiple layers and callouts convert flawlessly if set up correctly, but if individual graphic elements are not placed in an anchored frame in FrameMaker, they end up garbled or missing in HTML even though they may look fine on the page in FrameMaker.
To update a book once it’s been set up, the writer simply clicks on a batch file in the book’s directory on the LAN. This sets a flag marking the book for update. A status history file and a copy of the conversion log in each book’s directory track when HTML updates were requested and when they were actually run and let a writer check directly for conversion errors detected by Automap.
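The flag-and-history mechanism described above can be illustrated with a short sketch. Our actual implementation is a set of batch files; the Python below is only an approximation of the idea, and the file names `update.flag` and `status_history.txt` are hypothetical placeholders rather than the names we use.

```python
from datetime import datetime
from pathlib import Path

FLAG_NAME = "update.flag"            # hypothetical flag file name
HISTORY_NAME = "status_history.txt"  # hypothetical history file name


def request_update(book_dir, now=None):
    """Mark a book for conversion and log the request in its history file."""
    book_dir = Path(book_dir)
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M")

    # An empty flag file is enough: its presence means "update requested".
    flag_path = book_dir / FLAG_NAME
    flag_path.touch()

    # Append rather than overwrite so the file keeps a running history.
    with open(book_dir / HISTORY_NAME, "a", encoding="utf-8") as history:
        history.write(f"{stamp} update requested\n")
    return flag_path
```

Keeping the flag and the history in the book’s own directory means a writer can see the state of a book without access to the server.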
Behind the scenes
A series of batch files developed in consultation with Quadralay automates the conversion process. A scheduled task runs on the server every 10 minutes, checking for books that need an update and launching batch jobs to convert individual books as needed. HTML files are posted to an internal Web server so writers can check their HTML output within 15 minutes after requesting a conversion. The updated HTML is copied through the firewall to the main Web server overnight using a scheduled task.
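As a rough sketch of what the scheduled task does on each run, the following Python scans a library root for flagged books and hands each one to a conversion job. It is an illustration under assumptions, not our batch code: the flag file name is hypothetical, and `convert` is a caller-supplied stand-in for whatever launches the real conversion, since the Automap invocation itself is not shown here.

```python
from pathlib import Path

FLAG_NAME = "update.flag"  # hypothetical flag file name


def books_needing_update(library_root):
    """Return the book directories (one level down) that contain a flag file."""
    root = Path(library_root)
    return sorted(flag.parent for flag in root.glob(f"*/{FLAG_NAME}"))


def run_pending(library_root, convert):
    """Convert every flagged book; `convert` stands in for the real job."""
    results = {}
    for book_dir in books_needing_update(library_root):
        # Clear the flag first, so a request made while the conversion
        # is running sets a fresh flag and is picked up on the next pass.
        (book_dir / FLAG_NAME).unlink()
        try:
            convert(book_dir)
            results[book_dir.name] = "ok"
        except Exception as exc:  # record the failure and keep going
            results[book_dir.name] = f"failed: {exc}"
    return results
```

A scheduler would call `run_pending` every 10 minutes; books without a flag cost only a directory scan.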
Central log files allow monitoring and debugging of problems with individual book conversions (though these have been rare). A batch command allows an administrator to scan conversion logs to check for errors in all current books in a single action. Existing Web tools allow us to monitor usage levels of the HTML books on the server.
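The single-action log scan might look something like the sketch below. The log file name and the idea that problem lines contain the word "error" are assumptions made for illustration; the real conversion log format is not described in this article.

```python
from pathlib import Path

LOG_NAME = "conversion.log"  # hypothetical log file name
ERROR_MARKER = "error"       # assumed marker; the real log format may differ


def scan_logs(library_root):
    """Map each book name to the lines in its log that mention an error."""
    findings = {}
    for log_path in sorted(Path(library_root).glob(f"*/{LOG_NAME}")):
        with open(log_path, encoding="utf-8", errors="replace") as log_file:
            lines = [line.rstrip("\n") for line in log_file]
        hits = [line for line in lines if ERROR_MARKER in line.lower()]
        if hits:
            # Books with clean logs are left out of the report.
            findings[log_path.parent.name] = hits
    return findings
```

Because clean books are omitted, an empty result means every current book converted without a logged error.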
What have we accomplished? First of all, a significant number of users have made it clear in initial reactions that they now have faster and easier access to information, which was the explicit purpose of the project.
In the process, we’ve also cleaned up documentation files because product teams worked together to identify the core set of information that would be in the HTML library.
By extending FrameMaker with WebWorks’ technology, we’ve leveraged existing resources to produce an HTML-based library more quickly and at lower cost than would otherwise have been possible. Because we had to learn about the technology to apply it to our environment, we now have an infrastructure that’s suited to our requirements and the understanding we need to support and extend it.
In the process, we also learned about some of the limitations inherent in our current tools and documents. It’s easier to see benefits of XML-based documents and of separating content from publishing, which our current system does only in limited ways.
By adding a publishing format that depends on automated translation, we’ve also added a set of QA requirements. Now when we write a manual, it has to work as HTML. As a result, we know that our documents conform to a common structure. This consistency can serve us in various ways. For example, presentation of information is more uniform. We can update document formats centrally. If we do decide to move to an XML-based system in the future, we have a common starting point, which would make that step much easier.
The jury is still out on how many people will actually use HTML in place of PDF. We’re monitoring its use, and have added HTML manuals as a topic for discussion at informal roundtable assessments of product information to learn which aspects users like and what their remaining issues are. We already have a wish list of features for the next release.
About the Author