Case Study: An Open Source CMS Experience
Content management is a hot topic right now in the world of publishing and information management. Traditionally, companies considering Content Management Systems (CMSs) have had no choice but to work with a commercial vendor. The growth of the open source software movement provides another approach—using open source software.
This article discusses reasons you might consider an open source CMS approach and a case study based on one company’s experience.
Why Companies Use Open Source Software
Open source is not right for every situation, but it can offer significant benefits in some cases. The costs are mainly in time and effort rather than money, which may be appealing if budgets are an issue. It can provide a way to put your toe in the water before diving into a large vendor-driven project and reduce risk by allowing iterative development. It can increase flexibility by allowing for an open, standards-based system, and serve as a learning tool.
On the downside, using open source software can be daunting. It usually involves downloading an application or component and then figuring out how to make it work, often without much support. For complex applications, this process can require a real investment of time and effort. Business solutions are usually not pre-packaged, and there can be both technical and cultural learning curves in adapting features for any particular organization.
Even so, many open source projects offer robust (and well-documented) enterprise-level software. Licenses are free (though not all licenses are equal), and you can download the software for the price of a web connection. If you have the knowledge, you can customize code freely to meet your requirements. If you don’t want to go it alone, you can buy support as you need it, directly from the developers or from a growing number of experienced open source software consultancies that can help get you up and running.
Some big companies have decided that open source software is an attractive option for ‘commodity’ functions—standard capabilities that every organization needs, like server operating systems, web servers, and database applications. It can reduce the cost of operations, help ensure that systems are standards-based, and keep an organization in sync with the outside world. Is Content Management becoming one of those functions?
Why an Open Source CMS?
As CMSs become more common and standards emerge, open source projects have developed high-quality components that perform core CMS functions. These now offer some impressive capabilities.
At its most basic, an XML-based content management and publishing system has to perform several fairly standard functions. It needs an editor or other tools to create content, a content repository to store the content, a full-text search engine to find content when it’s needed, and an XML processing engine to transform content and display it. To tie this all together, it needs a content model that describes what the chunks of content look like and a workflow model describing how the chunks move through the system.
There are literally thousands of ways these functions can be implemented and configured, and vendors often offer specific features to differentiate their products. If you need these specialized features or have to get a mission-critical system up and running on a tight schedule, you will probably want to use a vendor with a proven track record.
But if you can’t justify a large budget or prefer a hands-on approach, open source offers another option. Mature components to build an infrastructure are available for free. You don’t even have to assemble them yourself—other open source projects have already done that for you.
You can download and set up a working CMS with the core functions in a few hours, ready to configure for your own use. A system you develop and test on a single PC can be scaled up to manage content for a large organization.
This approach is not for everyone—you’ll still have to define and manage your own requirements and build your own business solutions (non-trivial tasks that a vendor could otherwise provide). But if you are careful to choose a standards-based system, your content will live in standard formats, not captive to a proprietary system—you can get it in or out if you change systems.
If you have the knowledge (or hire it), you can extend or modify the systems at will. There’s a marketplace of consultants who can help get you up and running. Unlike commercial vendors, these consultants don’t own the software. You hire them as you need to do specific things.
Even if you don’t modify the code, you can have complete access to it just to understand what you’re using and how it works.
Open source CMSs can be stable, mature, and robust, with proven track records. Their standards-based components, many of which have been tested and proven in live production environments, are continuing to evolve. These are being shared, re-used, and developed collaboratively in open source communities, which you’re welcome to join.
Donovan Data Systems (DDS) is a software services company serving the advertising and broadcast industries in North America and Europe. A few years ago, DDS decided to set up a centralized internal library of software architecture documentation for use by developers in the US and UK. The company had already used open source components in other areas, so we decided to look for an open source solution.
The goal was to provide a central library of software architecture documents which would be created and owned by 10-12 development teams. In addition to capturing each team’s knowledge, we wanted a better way for each group to learn about other groups’ projects. Our goal was to achieve broad participation by content owners and clear benefits to content users. To improve shared accessibility, we wanted a central repository with a common access method and wanted to present information in a consistent format.
We were a Java shop, so we started by looking at Java-based open source projects. This search led to the Apache Software Foundation, which has support from a broad range of major technology companies and offers a commercially friendly license (the best-known Apache project is the Apache web server, used by most web sites in the world). Looking for products, we discovered Daisy, developed by a software company called Outerthought, located in Belgium. Though not an Apache project, Daisy uses Apache components (as well as others) and is available under the Apache license. A further point in its favor: Daisy is also used to manage documentation for some of the major Apache projects.
Daisy is a self-contained CMS that you can download and install on a PC or Mac. It is Java-based and will run under Windows, Linux, or Mac OS X. It consists of two separate components—the Daisy repository, which stores content on the back end, and the Daisy Wiki, a browser-based application that you use to display and edit documents.
The repository stores documents in MySQL, a well-known (and highly scalable) open source database.
Rather than storing documents in a hierarchy, the repository stores them in a single ‘big bag’ using a simple sequential numbering scheme. Each document can have one or more fields (think: metadata) attached to it (such as author, department, document type, status, and so on); these can be customized and used as a basis for filing and retrieving documents.
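The flat storage model described above can be illustrated with a minimal sketch. This is not Daisy's code or API; the class names, method names, and field names (`Repository`, `add`, `find`, `doc_type`, and so on) are hypothetical, chosen only to show how sequential IDs plus metadata fields can replace a folder hierarchy.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Document:
    doc_id: int                     # simple sequential number, not a path
    name: str
    fields: dict[str, str] = field(default_factory=dict)  # metadata


class Repository:
    """Hypothetical flat 'big bag' of documents: no folders, only IDs and fields."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self._docs: dict[int, Document] = {}

    def add(self, name: str, **fields: str) -> Document:
        """Store a document under the next sequential ID."""
        doc = Document(next(self._next_id), name, dict(fields))
        self._docs[doc.doc_id] = doc
        return doc

    def find(self, **criteria: str) -> list[Document]:
        """Return documents whose fields match every given criterion."""
        return [
            d for d in self._docs.values()
            if all(d.fields.get(k) == v for k, v in criteria.items())
        ]


repo = Repository()
repo.add("Billing overview", author="pat", doc_type="Overview", status="live")
repo.add("Billing API notes", author="lee", doc_type="Reference", status="draft")
repo.add("Reporting overview", author="pat", doc_type="Overview", status="live")

# Retrieval is by field values, not by location in a hierarchy.
overviews = repo.find(doc_type="Overview", status="live")
print([d.doc_id for d in overviews])  # [1, 3]
```

Because a document's "location" is just a set of field values, the same document can appear in several views without being copied or moved.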
The repository can contain two basic flavors of documents—‘Daisy html’ documents, which are actually well-formed XML and appear as web pages, and attachments, which can be virtually any file. Daisy provides full-text search capability for Daisy HTML and common attachment types (including Microsoft Office, PDF, and plain text attachments). Users can query the repository to find documents based on either fields or full-text search terms.
The repository keeps all versions of every document, allowing comparison of versions and roll back to previous versions.
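As a rough illustration of what keeping every version makes possible, the sketch below stores each saved text, compares two versions, and rolls back. It is not how Daisy implements versioning; the `VersionedDocument` class is hypothetical, and recording a rollback as a new version (so that history is never lost) is an assumption made for this example.

```python
import difflib


class VersionedDocument:
    """Hypothetical document that keeps every saved version."""

    def __init__(self, text: str) -> None:
        self._versions = [text]          # version 1 lives at index 0

    @property
    def current(self) -> str:
        return self._versions[-1]

    def save(self, text: str) -> int:
        """Append a new version and return its version number."""
        self._versions.append(text)
        return len(self._versions)

    def diff(self, old: int, new: int) -> list[str]:
        """Line-by-line comparison of two versions in unified diff format."""
        a = self._versions[old - 1].splitlines()
        b = self._versions[new - 1].splitlines()
        return list(difflib.unified_diff(a, b, f"v{old}", f"v{new}", lineterm=""))

    def roll_back(self, version: int) -> int:
        """Restore an earlier version by saving it again as the newest one."""
        return self.save(self._versions[version - 1])


doc = VersionedDocument("Service A calls Service B.")
doc.save("Service A calls Service C.")

changes = doc.diff(1, 2)      # what changed between version 1 and version 2
latest = doc.roll_back(1)     # version 3 now has the text of version 1
```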
Users can be assigned to roles, which in turn can have specific levels of authorization, so users can only see (or edit) documents they’re authorized for. Users can also subscribe to documents, so they can be notified if a document changes.
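The role-based scheme can be pictured with a small sketch. The role names, user names, and the choice to grant access per collection are all assumptions for illustration; Daisy's actual access control configuration is not shown here.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2


# Hypothetical grants: role -> collection -> access level
ROLE_GRANTS = {
    "reader": {"billing": Access.READ, "reporting": Access.READ},
    "billing-author": {"billing": Access.WRITE},
}

# Hypothetical role assignments: user -> roles
USER_ROLES = {
    "pat": ["reader"],
    "lee": ["reader", "billing-author"],
}


def access_for(user: str, collection: str) -> Access:
    """Highest access level that any of the user's roles grants."""
    levels = [
        ROLE_GRANTS[role].get(collection, Access.NONE)
        for role in USER_ROLES.get(user, [])
    ]
    return max(levels, default=Access.NONE)


def can_edit(user: str, collection: str) -> bool:
    return access_for(user, collection) >= Access.WRITE
```

In this sketch, `lee` can edit billing documents but only read reporting documents, while an unknown user gets no access at all.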
Administration of the repository is controlled by an administrator role. This role can be used to define other roles, users, and access control levels, as well as the repository structure — collections, document and field types, and Daisy sites.
The Daisy Wiki is a standalone web application which provides the user interface to the repository. The Wiki talks to the repository in the background; it provides an interface for finding and viewing documents, editing documents (it includes a browser-based WYSIWYG HTML editor), and (if you are an administrator) administering the repository.
You can set up multiple sites for each repository. A Daisy site can automatically display documents that meet certain conditions. For example, a site showing Call Center Procedures might include all live documents where Department equals “Call Center” and Document Type equals “Procedure”; these could then be sorted by type of procedure or product. Documents on a site can be listed in the left-side navigation window (that is, the table of contents) in a virtual hierarchy based on document fields.
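The Call Center example can be sketched in a few lines: a site is a set of field conditions, and the navigation tree is built by grouping the matching documents on another field. This is not Daisy's query language or API; the sample data, field names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample documents, each described only by its fields.
documents = [
    {"name": "Reset a password", "department": "Call Center",
     "doc_type": "Procedure", "status": "live", "product": "Accounts"},
    {"name": "Issue a refund", "department": "Call Center",
     "doc_type": "Procedure", "status": "live", "product": "Billing"},
    {"name": "Escalation draft", "department": "Call Center",
     "doc_type": "Procedure", "status": "draft", "product": "Billing"},
    {"name": "Close the books", "department": "Finance",
     "doc_type": "Procedure", "status": "live", "product": "Billing"},
]

# The site definition: every condition must hold for a document to appear.
SITE_CONDITIONS = {
    "department": "Call Center",
    "doc_type": "Procedure",
    "status": "live",
}


def site_documents(docs, conditions):
    """Documents whose fields satisfy all of the site's conditions."""
    return [d for d in docs if all(d.get(k) == v for k, v in conditions.items())]


def navigation_tree(docs, group_by):
    """Virtual hierarchy: group document names under one field's values."""
    tree = defaultdict(list)
    for d in docs:
        tree[d.get(group_by, "Unfiled")].append(d["name"])
    return {key: sorted(names) for key, names in sorted(tree.items())}


nav = navigation_tree(site_documents(documents, SITE_CONDITIONS), "product")
print(nav)  # {'Accounts': ['Reset a password'], 'Billing': ['Issue a refund']}
```

Changing a document's status from draft to live is enough to make it appear on the site; nothing has to be moved or re-filed.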
In addition to displaying Daisy HTML documents as HTML pages, you can also convert them to formatted PDF on the fly. Or you can use Daisy’s book publishing feature. This feature lets you use queries to pull together dozens (or hundreds) of topics into a formatted PDF book, complete with cover, page numbers, and a table of contents.
The Wiki communicates with the repository using a documented API. You can write your own applications to perform all repository functions via the API, independent of the Wiki. If you don’t like the Wiki for any reason, you can use your own application with the repository to store and retrieve documents without using the Wiki at all.
Our experience is described below in the main implementation stages—setup, content implementation, and rollout, followed by a summary of user response and lessons learned.
Daisy’s documentation includes a step-by-step setup guide, which was clear and complete but assumed a fairly technical orientation. Setup requires installing Java and MySQL and then using MySQL to set up several databases using MySQL’s command-line interface prior to installing Daisy. The Daisy installation itself consists of setting several system parameters on your PC or Mac, executing a series of batch files, and following the prompts. The documentation suggests that this work can be done in about an hour. If you’re not familiar with the environment, allow two or three (a Windows installer may make this easier for Windows users).
To configure the system we set up document types along with metadata fields and entries (such as a ‘status’ field allowing selection of ‘draft’, ‘live’, or ‘archived’). We then set up a document collection and a Daisy site for each work group, as well as a “Home” site containing overview material and links to the separate workgroup sites. We also made some changes to the general look and feel by modifying the Wiki skin—adding our own logo, changing the font and the color scheme, and adjusting the page formatting.
The company’s infrastructure group installed Daisy on a production server, providing central technical support and daily backups. That group also set up an LDAP password scheme so that Daisy users could use their existing user IDs and passwords rather than needing new ones.
To get started on content implementation, we developed sample documents for key topics. We set up a simple set of categories for architecture documents with a drop-down list. Each document included fields for project area, project, subject matter expert(s), review status, and date. One of the biggest demands from developers, for example, was for project overviews that would explain business context and system design rationale, so we set up “Overview” as a document category and developed sample Overview documents.
We appointed an editor whose main job was to pull content together from other sources to start building the initial Developer’s Library. He wrote a series of how-to documents for authors, describing how to create and edit Daisy documents in our environment. He also offered one-on-one training sessions for anyone who had to author Daisy documents and served as a troubleshooter to resolve problems. We encouraged developers to author their own documents but also provided editorial assistance.
It was easy to copy and paste the text of Word documents into the Daisy editor. Cleanup required only relatively minor formatting changes, and the pasted text was then saved as Daisy HTML documents. If a Word document had a lot of custom formatting or graphics, we added it as an attachment with a brief description in HTML. The library also included Excel and PDF attachments.
The project was sponsored by technical managers, who actively promoted it. With this sponsorship, all content owners, without exception, participated by supplying existing documents, writing new documents, and making themselves available for interviews.
Within about six months, we built an initial library of 1500+ documents across more than 10 teams. Because we focused on topics that dealt with problem areas, this document set provided a critical mass with enough topics to be useful. This critical mass, in turn, resulted in a high level of awareness and a consistent level of use by document consumers, who recommended the library to each other. A key feature was the easy access to documents “in one place” through both browsing and full-text search.
Editorial support was a big factor in our ability to build a critical mass of content. Contributors were glad to spend time working with an editor who would set up and edit the Daisy documents. Relatively few developers became hands-on Daisy authors (though a few did).
In follow-up evaluations, many developers credited the library with giving them access to information they had not been able to reach before. That access improved their understanding of other teams’ projects. The library was rated as successful at raising the general level of common technical understanding.
The site was a clear success for document readers. Users appreciated a single point of access for all teams’ documents and found the multi-site structure intuitive. The metadata was simple to manage, based on authoring context (author plus current project) plus a few simple drop-down fields. Authors found the authoring tools accessible and easy to learn and use. The browser-based architecture made it a cinch to roll out and support since there was no software to install on user desktops.
On the other hand, document creation/editing was still done by a limited group. This limitation was partly because the environment was not familiar and partly because some HTML knowledge was required to use the editor. It was easy to train motivated authors who had basic Web/HTML understanding, but it was hard to engage people who had no interest in authoring to begin with.
We could not have achieved the level of participation or produced the scope of content necessary to reach critical mass if we had not provided editorial support for subject matter experts. The editor also did substantive edits on many documents, which raised editorial quality and made presentation more consistent across the library. This editorial support made the library more usable.
Our experience with Daisy demonstrated that open source software can be powerful and robust, and it provided a hands-on way to learn about implementing a CMS. It also provided significant business benefits.
The steep learning curve for the technology was spread around—a few key people had to learn a lot. Most participants had to learn a little but did not have to become experts.
The truism proved true: software is not the solution. The system had to be integrated into the workflow of the organization. Our application was fairly simple, but changing people’s behavior required a combination of technology, managers’ pressure, and agreement about perceived benefits, as well as help and continuing external reinforcement.
It also became clear that one size CMS doesn’t fit all business cases. In addition to the Developer’s Library we also began considering two other CMS applications—Web content management and general business document management. Even though the core CMS functions were the same, it soon became clear that the specific requirements for each of these business cases had significant differences, requiring different solutions.
Open source provided us with a solid, reliable system. It had some rough edges, such as occasional problems with document formatting and issues with infrastructure configuration related to our security environment, but these were minor and possible to work around. Setting the system up required external knowledge about Linux and LAN infrastructures and information architecture.
We identified features that would have required additional development. It would have been convenient to have a vendor take care of those; we lived without them for the initial rollout. A vendor might also have provided up-front advice and guidance about strategy and implementation.
The Daisy product offered several strengths. Daisy was designed to manage software documentation, so it was a natural fit for our project. The integration of components, including a database, search engine, web front end, WYSIWYG editor, and messaging system, was seamless. The repository architecture and the document model provided a flexible, structured platform for establishing our own document categories.
Our application, besides being directly useful, was a good learning project; it familiarized our writing and technology groups with setting up and managing a CMS infrastructure. Overall, the experience convinced us that learning about open source was a worthwhile investment.
About the Author
Peter Dykstra is founder and principal of MetaphorX LLC, a consultancy specializing in technical communications strategy and implementation.
He has more than 25 years of experience in the field of user information design and information development, with experience as a teacher, journalist, writing manager, information architect, product information director, and software product director. He is available to Best Practices readers to talk about open source solutions for content management.