Radu Coravu, Syncro Soft (Oxygen XML Editor)

My first contact with XML was about 12 years ago when, as a software engineering student, I spent a couple of days of practice at a company that had just started work on an XML editor called Oxygen. I asked what XML was and what it was good for, but most software engineers are quite bad at explaining things, so I was passed around until one of the partners in the company explained it to me. I was happy that XML looked similar to HTML, because HTML was a very useful format for building web sites, but then I became confused when I found out that XML was not used directly for building web sites. XML was also described to me as a very good open, standards-based way to store data in a presentation-independent format, but this could also be done using open-source relational database implementations.

So there are, and always have been, alternatives to XML, and to DITA in particular.

During the last few years, we’ve all heard about the advantages of DITA for creating technical content. But most (or even all) of the outputs we obtain from DITA are XHTML-based or HTML-based: WebHelp, EPUB, Kindle, Windows CHM, and so on.

So have you asked yourself:

  • Since our company publishes to XHTML or HTML-based formats and will continue to do so, why bother to create content in an intermediary XML format and then generate HTML?
  • Why not preserve and edit the content directly in XHTML?
  • Why go through the extra steps involved in producing and managing DITA XML content when it seems so much easier to simply write and produce the XHTML directly?

Topic-based editing is great because it allows you to reuse small topics in various places, but let’s face it, you do not need DITA for content reuse when you can write small topics directly in XHTML. The DITA Map is also a very good way to aggregate topics into a larger publication, but instead of a DITA Map you could simply use an XHTML file that contains references to each of the XHTML topics.
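To make the idea of an XHTML file standing in for a DITA Map concrete, here is a minimal Python sketch. It assumes the topics are plain XHTML files; the function name `build_xhtml_map` and the file names are invented for illustration and do not come from any particular tool.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_xhtml_map(title, topics):
    """Return an XHTML document (as a string) that links to each topic file.

    `topics` is a list of (link text, href) pairs. The names used in the
    example below are hypothetical.
    """
    # Serialize XHTML elements with a default namespace instead of a prefix.
    ET.register_namespace("", XHTML_NS)

    def q(tag):
        # Qualify a tag name with the XHTML namespace.
        return f"{{{XHTML_NS}}}{tag}"

    html = ET.Element(q("html"))
    head = ET.SubElement(html, q("head"))
    ET.SubElement(head, q("title")).text = title
    body = ET.SubElement(html, q("body"))
    topic_list = ET.SubElement(body, q("ol"))
    for text, href in topics:
        item = ET.SubElement(topic_list, q("li"))
        link = ET.SubElement(item, q("a"), {"href": href})
        link.text = text
    return ET.tostring(html, encoding="unicode")


# Hypothetical publication made of two topic files.
map_document = build_xhtml_map(
    "User Guide",
    [("Installing", "install.html"), ("Configuring", "config.html")],
)
print(map_document)
```

The ordered list plays the role of the map: changing the order or the set of links changes the publication without touching the topics themselves.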

Taking only this into account, there are quite a few advantages to editing content directly in XHTML:

  • Direct control over the styling and tagging of elements. You know more precisely what the published output will look like; you can even open the edited HTML document in a web browser and refresh it from time to time to see how your work is progressing.
  • More people are familiar with XHTML. XHTML has been around for a long time; there are a lot more people available for creating a web page than for editing DITA.

But if you take the time to go into how each of the XHTML-based outputs is composed, you will soon discover there are problems with this approach:

  • The XHTML output formats might not all require the same XHTML version. For example, EPUB 3 requires XHTML 5 while EPUB 2 requires XHTML 1.1. This would mean that it is better to have a common editing vocabulary from which such slightly different formats would be published.
  • There are problems representing the table of contents information in a common manner. For example, the Windows CHM output has its own table of contents structure, while the EPUB 2 output relies on the table of contents format defined by the OPF standard (the NCX file). The WebHelp-based output, on the other hand, would work with a table of contents created with plain XHTML links.
  • There would be problems with metadata handling. For example, how would we go about marking index terms, which need to be treated in a special way in each output format?
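The table of contents problem can be sketched in a few lines of Python: one format-neutral list of entries is rendered once as plain XHTML links (as a WebHelp-style output might use) and once as NCX-style navigation points (as EPUB 2 uses). This is only an illustration under stated assumptions: the `TOC` data, the function names, and the file names are hypothetical, and both outputs are simplified fragments rather than complete, valid files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, format-neutral table of contents: (title, target file) pairs.
TOC = [("Installing", "install.html"), ("Configuring", "config.html")]


def toc_as_xhtml_links(toc):
    """Render the entries as a plain XHTML list of links (fragment only)."""
    ul = ET.Element("ul")
    for title, href in toc:
        item = ET.SubElement(ul, "li")
        link = ET.SubElement(item, "a", {"href": href})
        link.text = title
    return ET.tostring(ul, encoding="unicode")


def toc_as_ncx_navmap(toc):
    """Render the same entries as a simplified NCX-style navMap fragment.

    A real NCX file also needs a namespace, a head section and a docTitle;
    those are left out to keep the sketch short.
    """
    nav_map = ET.Element("navMap")
    for order, (title, href) in enumerate(toc, start=1):
        point = ET.SubElement(
            nav_map, "navPoint", {"id": f"nav{order}", "playOrder": str(order)}
        )
        label = ET.SubElement(point, "navLabel")
        ET.SubElement(label, "text").text = title
        ET.SubElement(point, "content", {"src": href})
    return ET.tostring(nav_map, encoding="unicode")


print(toc_as_xhtml_links(TOC))
print(toc_as_ncx_navmap(TOC))
```

Both functions read the same neutral data, which is the point of the argument above: the source vocabulary stays fixed while each output format gets its own representation.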

Even with a pure XHTML-based approach to editing and publishing, you still need some kind of post-processing and some specific ways to mark metadata in the content. There are editing tools that offer such complete XHTML-based editing and publishing approaches, but once you start using them you are locked in. You cannot change the tool, and you cannot give the content to somebody who uses another tool. Although the content is mostly XHTML, the entire editing and publishing workflow is not based on any standard; it is proprietary.

Coming back to my original contact with XML, I was expecting an answer like this: “XML is the only way in which you can perform task X”. But this answer never came, because XML was never the only way to do a certain task. It was just a very good alternative.

I consider the same to be true of the DITA standard: it is not the only way to do topic-based editing, it is just a very good way to do it.

About the Author
Radu Coravu started working more than 10 years ago as a Software Architect for Syncro Soft Ltd, the manufacturer of the popular Oxygen XML Editor. During the last couple of years, his main focus has been on the development of the visual XML Author editing environment and of the specific DITA support provided by Oxygen. He provides tech support for complex integrations and helps steer the product in the right direction, all this with some Java development on the side.