Peter Dykstra, MetaphorX LLC
Moving to a topic-based information architecture? An open source CMS can provide a more accessible alternative for organizations that want to manage and publish structured topic-based information sets but don’t need all the features and complexity of DITA.
The rise of Wiki-based Content Management Systems provides new possibilities for technical communication groups. Organizations are discovering that they can use open source Wiki-based CMSs to manage large, structured documentation sets with requirements for content re-use, multi-channel publishing, and translation management.
A Wiki-based Content Management System can simplify the traditional desktop publishing process, especially for web-oriented publishing, and produce XML-based assets that can be reused in multiple contexts and repurposed for multiple publishing channels.
A Wiki-based system, used with an information architecture driven by specific business requirements, can provide a platform for managing the topic sets and the relationships among topics necessary to implement structured documentation. System features such as browser-based WYSIWYG editors, role-based architecture, document versioning, and XML processing can be used to achieve benefits such as faster review and approval cycles, reduced production time, reduced duplication of source content, and lower translation costs. The net result: more timely publishing of information with fewer errors and reduced cost.
This article gives an example of how to set up a structured publishing environment using one open source system, the Daisy CMS. It describes Daisy’s architecture and features, shows how Daisy can be used for topic-based publishing, and tells how to get started.
Introducing the Daisy CMS
Daisy is produced by Outerthought, a Belgian software company, and is available for use under the commerce-friendly Apache open source license. It’s a mature system, currently at release 2.3, with a history of regular releases since 2001.
Out of the box, the system supports editing and publishing of web-based ‘sites’ consisting of Daisy documents. Sites can include Daisy HTML documents or ‘attachment’ documents containing text, PDF, Word, or Excel documents (or virtually any file, including multimedia).
Daisy consists of two main components—the Wiki (the front end that users interact with) and the Repository (the back end where documents are stored). The Wiki talks to the Repository via a documented Application Program Interface (API), which other programs can use to act on documents in the Repository.
Highlights of Daisy’s architecture:
- A clear, well-defined document model. Daisy documents use a subset of HTML (which is also well-formed XML). Editors can assign metadata to documents and use the metadata to manage sets of documents. Each document actually contains all previous versions of the document and can also contain multiple document variants for branches and languages.
- “Big bag” repository. Documents in the repository are all in one “big bag”: you add documents and organize them for viewing by assigning them to Daisy sites. A document can appear on one or multiple sites; you can restructure a site without changing the documents themselves. (Each site has a Navigation document used to assign documents to the site, similar to a DITA map.)
- A query language, which can be used to list documents that match specified criteria. Queries can be set up within documents to include lists of documents (or their full text) within other documents.
- A Books module that allows assembling documents into books with contents, index, and page formatting, in either HTML or PDF format.
- A full text search engine. You can search HTML documents and attachments, including PDF, Word, Excel, and text.
- Role-based access control. Administrators can assign roles and use the roles to control rights to view, edit, publish, archive, or delete documents.
- Document subscriptions. Users can receive an email whenever a flagged document changes.
- Document tasks. Authorized users can perform batch operations such as copying or deleting sets of documents in one step.
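The “big bag” repository, versioning, and metadata queries listed above can be pictured with a small sketch. This is not Daisy's API or its query syntax; the `Document` and `Site` classes, the `query` helper, the document IDs, and the field names are all hypothetical, invented here only to show how one stored document can appear on several sites without being copied.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    name: str
    fields: dict = field(default_factory=dict)    # custom metadata, e.g. TopicType
    versions: list = field(default_factory=list)  # every saved version is kept

    def save(self, html: str) -> int:
        """Store a new version and return its 1-based version number."""
        self.versions.append(html)
        return len(self.versions)


@dataclass
class Site:
    name: str
    doc_ids: list = field(default_factory=list)   # references to documents, not copies


def query(repository: dict, **criteria) -> list:
    """Return the IDs of documents whose metadata matches every criterion."""
    return sorted(
        doc_id
        for doc_id, doc in repository.items()
        if all(doc.fields.get(key) == value for key, value in criteria.items())
    )


# One flat repository (the "big bag"), keyed by a hypothetical document ID.
repository = {
    "1": Document("1", "Planning your deck", {"TopicType": "Overview"}),
    "2": Document("2", "Setting the posts", {"TopicType": "Procedure"}),
    "3": Document("3", "Attaching the ledger", {"TopicType": "Procedure"}),
}
repository["2"].save("<p>Dig the holes.</p>")
repository["2"].save("<p>Dig the holes below the frost line.</p>")

# Two sites point at the same stored document.
customer_site = Site("Customers", ["1", "2"])
dealer_site = Site("Dealers", ["2", "3"])

procedures = query(repository, TopicType="Procedure")
shared = set(customer_site.doc_ids) & set(dealer_site.doc_ids)
```

Because sites hold only references, editing document "2" once updates it on both sites, and reorganizing a site means changing its list of references rather than the documents.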
Daisy feature highlights
Users with read-only access can
- View public pages without logging on
- Log on and see additional documents based on access rights
- Collect documents in a document basket and view or print them as a set
- Convert any Daisy HTML page (or the contents of a document basket) to PDF on-the-fly
- Add public or private document comments
- Search on the full text of documents they are authorized to see
- Use faceted browsing views to drill down to find documents by filtering out unwanted categories
- Subscribe to a document to receive notifications when the document is updated
- See all previous versions of a document and compare any two versions
- Download books that have been created with the books feature
Users with write access can also
- Create or edit Daisy HTML documents using a browser-based WYSIWYG editor (switchable to HTML view)
- Upload and store virtually any file as a Daisy Attachment document
- Include documents (or queries) in other documents
- Publish books as PDF or HTML
- Save documents in either ‘Staged’ or ‘Published’ versions
- Set up role-based rights to view, edit, or publish documents
- Define document types, each containing one or more standard document ‘parts’ plus custom metadata fields
- Manage translations. Generate reports to monitor synchronization of translated documents, export out-of-sync documents for translation, and import the updated translations
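The translation report mentioned in the last item can be thought of as a version comparison. The sketch below is an illustration only, under the assumption that each translation records which version of the reference-language document it was made from; the `Translation` class, the `out_of_sync` function, and the sample data are hypothetical and do not reflect how Daisy stores this information.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    doc_id: str
    language: str
    synced_with_version: int  # reference-language version this translation was made from


def out_of_sync(reference_versions: dict, translations: list) -> list:
    """Return (doc_id, language) pairs whose translation lags the reference version."""
    return [
        (t.doc_id, t.language)
        for t in translations
        if t.synced_with_version < reference_versions[t.doc_id]
    ]


# Current version number of each reference-language document (hypothetical data).
reference_versions = {"1": 3, "2": 5}

translations = [
    Translation("1", "fr", 3),  # up to date
    Translation("1", "de", 2),  # one version behind
    Translation("2", "fr", 4),  # one version behind
]

stale = out_of_sync(reference_versions, translations)
```

Only the pairs in `stale` would need to be exported for translation, which is where the cost saving comes from: unchanged topics are never sent out again.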
Sample Daisy site
A sample Daisy site, below, contains documents about adding a deck to your house. Documents have been written as modular topics, using a Document Type of “Topic”. The writer has assigned a topic type to each topic, from a list of available types (Overview, Procedure, Concept, and Reference).
A navigation pane on the left lists the topics in a site-specific hierarchy. Daisy can support multiple sites, each of which may be visible to (and editable by) different sets of users; a topic can be maintained once and re-used on multiple sites. This site uses a custom skin with a custom logo and font settings.
Figure 1: Sample Daisy Site
The WYSIWYG editor
Daisy HTML documents are created in a WYSIWYG editor (Figure 2). Paragraph styles are selected from a drop-down selection list. The editor also includes controls for creating and formatting HTML tables. Writers can enter a Table Class to support predefined table formats.
Figure 2: Daisy’s WYSIWYG editor
Writers can switch to an HTML edit view (Figure 3) to see the HTML tags in a document (these are also well-formed XML). Most editing can be done in the WYSIWYG view. (Familiarity with HTML is helpful, since the WYSIWYG editor can get confused by things like multi-level lists; the occasional problem can be fixed in the HTML view.)
Figure 3: The editor’s HTML view
Each document type can be set up to include an optional ‘Fields’ tab containing fields set up for that document type. In this case, the ‘Topic’ document type has custom fields for Product, Status, Topic Type, and Audience.
Figure 4: The Fields Tab
The Navigation document
Each site has a Navigation document (Figure 5), used to define the sequence and indentation levels for the documents listed in the Navigation pane. (A Navigation document is similar in concept and function to a DITA map.) The Navigation document can refer to individual documents by ID, or it can use queries based on built-in document properties (such as name, owner, or document ID) or customized metadata (such as ProductName, DocumentType or TopicType) to assemble a set of topics for the site.
Figure 5: Daisy Navigation document
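The way a Navigation document mixes direct references with queries can be sketched as a small tree that is flattened into indented rows. This is a conceptual model rather than Daisy's navigation format: the `NavNode` class, the `resolve` function, and the sample topics are hypothetical, and a real Navigation document is edited within Daisy rather than written in Python.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NavNode:
    label: str
    doc_id: Optional[str] = None   # reference to one document by ID
    where: Optional[dict] = None   # or metadata criteria acting as a query
    children: list = field(default_factory=list)


def resolve(node: NavNode, topics: dict, level: int = 0) -> list:
    """Flatten a navigation tree into (indent level, title) rows."""
    rows = []
    if node.doc_id is not None:
        # Direct reference: show the document's own name.
        rows.append((level, topics[node.doc_id]["name"]))
    else:
        # Group or query node: show its label as a heading.
        rows.append((level, node.label))
    if node.where is not None:
        # Query node: list every matching topic one level below the heading.
        for doc_id in sorted(topics):
            topic = topics[doc_id]
            if all(topic.get(k) == v for k, v in node.where.items()):
                rows.append((level + 1, topic["name"]))
    for child in node.children:
        rows.extend(resolve(child, topics, level + 1))
    return rows


topics = {
    "1": {"name": "Planning your deck", "TopicType": "Overview"},
    "2": {"name": "Setting the posts", "TopicType": "Procedure"},
    "3": {"name": "Attaching the ledger", "TopicType": "Procedure"},
}

navigation = NavNode("Deck guide", children=[
    NavNode("Overview", doc_id="1"),
    NavNode("Procedures", where={"TopicType": "Procedure"}),
])

rows = resolve(navigation, topics)
for level, title in rows:
    print("  " * level + title)
```

The query node is the useful part: a new topic tagged as a Procedure would appear under "Procedures" without anyone editing the navigation tree.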
Daisy’s Books feature provides the ability to assemble PDF books or sets of HTML pages from the Daisy repository.
An authorized user (usually an administrator or editor) can define and publish PDF books offline (Figure 6). Daisy assembles specified documents into a formatted book, which can then be downloaded by users from within Daisy or distributed separately.
A book’s table of contents can show topics at multiple levels, as specified in a book definition.
PDF books include standard features such as formatted pages and chapter and section numbering. PDF formatting can be modified by editing XSL-FO files. (Daisy uses the open source Apache FOP XSL-FO processor.)
Figure 6: Table of Contents and sample page from a PDF Book
Publishers can customize books for specific audiences, such as users of different product versions.
In the Deck example, the Audience field on each topic could be used to create separate versions for pre-sale, customers, or dealers.
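As a rough illustration of that idea, the sketch below selects the topics for each audience-specific book. It assumes, for the sake of the example, that the Audience field can hold more than one value per topic; the topic names and the `book_contents` function are hypothetical, and in Daisy the selection would be expressed in a book definition rather than in Python.

```python
# Hypothetical topics from the Deck example; each lists the audiences it is written for.
topics = [
    {"name": "Why build a deck?", "Audience": {"pre-sale"}},
    {"name": "Setting the posts", "Audience": {"customers", "dealers"}},
    {"name": "Ordering replacement parts", "Audience": {"dealers"}},
]


def book_contents(topics: list, audience: str) -> list:
    """Return topic names, in their original order, that target the given audience."""
    return [t["name"] for t in topics if audience in t["Audience"]]


# One table of contents per audience, all drawn from the same topic set.
books = {
    audience: book_contents(topics, audience)
    for audience in ("pre-sale", "customers", "dealers")
}
```

Each topic is still maintained once; only the selection differs from book to book.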
Daisy’s flexibility makes it easy to extend the above sample to as many sites as necessary—for multiple products, for example, with a separate site for each product release (branch), language, and audience level.
You can customize Daisy as much or as little as you need to meet your requirements. Since it’s an open source application, the Java source code is freely available and can be modified. However, much of Daisy’s behavior can be configured and customized without the need for Java programming.
Basic configuration (setting up document types, field definitions, access rights, roles, etc.) is done using Daisy’s GUI-based Administration page. Setting up Daisy sites requires editing a set of XML configuration files.
Resources / Getting started
Interested in trying Daisy? You can download and install it by following instructions on the Daisy Web site, listed below. It can run on just about any PC, Mac, or Linux computer (and can automatically be shared by other computers on the network, if your computer is on a network).
The Daisy site provides a complete description of what you need to download and install Daisy. In addition to a computer and an Internet connection, you’ll need three main things:
- A Java environment, which can be downloaded from the Sun Microsystems Java site (www.java.com/en/download/manual.jsp)
- An installed copy of the MySQL database Community edition, which can be downloaded from Sun’s MySQL site (dev.mysql.com/downloads/)
- Daisy, which can be downloaded from SourceForge (you can link to SourceForge from the Daisy site, cocoondev.org/daisy)
Daisy Getting Started Guide
MetaphorX also provides a step-by-step Daisy Getting Started Guide (produced using Daisy!), which covers installation and basic site setup on a Windows PC. For a copy, send a request to email@example.com.
Daisy vs. DITA: a quick comparison
Daisy documents do not follow the DITA standard, so Daisy is not an option if you need DITA. On the other hand, used with an appropriate set of document types and writing standards, Daisy provides many of the benefits of DITA-based systems with less complexity, allowing easier up-front implementation and requiring less specialized knowledge to maintain documents.
Both Daisy and DITA allow for setting up topics, combining the topics into books or information sets, and using XSL to produce HTML, PDF, or other output formats.
A key difference is that, unlike DITA, Daisy uses the same tag set for all Daisy documents (a familiar subset of HTML tags that specify “structural” document elements such as paragraphs, headings, lists and tables).
The DITA approach provides more control of individual topic structures but requires use of a validating XML editor and a level of technical knowledge for anyone who edits content. The Daisy approach allows use of Daisy’s browser-based WYSIWYG editor, which is easier to use and accessible to anyone with edit rights. Rules for content and formatting of individual topic types are enforced externally through the use of authoring guidelines.
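Because the rules live in guidelines rather than in a schema, a team that wants an automated check has to supply one itself. The sketch below shows one possible shape for such a check, using Python's standard `html.parser` module; the `REQUIRED_TAGS` rules and the `missing_tags` function are hypothetical house rules invented for this example and are not part of Daisy.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag found in a topic body."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Hypothetical authoring guidelines: tags each topic type is expected to contain.
REQUIRED_TAGS = {
    "Procedure": {"ol"},     # procedures should contain numbered steps
    "Reference": {"table"},  # reference topics should contain a table
}


def missing_tags(topic_type: str, html: str) -> list:
    """Return the required tags that are absent from the topic's HTML."""
    collector = TagCollector()
    collector.feed(html)
    required = REQUIRED_TAGS.get(topic_type, set())
    return sorted(required - set(collector.tags))


problems = missing_tags("Procedure", "<h1>Setting the posts</h1><p>Dig the holes.</p>")
clean = missing_tags("Procedure", "<ol><li>Dig the holes.</li></ol>")
```

A check like this reports problems after the fact; unlike a validating DITA editor, it does not prevent a writer from saving a topic that breaks the guidelines.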
About the author
Peter Dykstra is founder and primary consultant for MetaphorX LLC. He specializes in helping organizations plan and implement open source content management solutions for technical publishing. He’s also worked as a tech pubs manager and software product director, with responsibility for managing writing groups, implementing enterprise technical publishing systems, and leading user-centered design for software products.