Selecting a Translation Tool
Tips for choosing a tool for translating DITA files
DITA is an XML vocabulary, but not just any XML. It has certain peculiarities that an ordinary XML editor or translation tool cannot easily handle.
Like an XML editor that is good for authoring in DITA, a translation tool capable of properly handling DITA files should
- be able to resolve DITA content references, like the conref attribute or the
- be able to support DITA specializations, allowing the customization of translatable elements and attributes
- understand the
The Content Referencing Problem
The DITA file shown in Figure 1 has conref attributes that reference elements from the file shown in Figure 2.
An XML editor able to resolve the conref attributes would display that file in WYSIWYG mode as shown in Figure 3.
For a technical writer working with DITA, it is important that the chosen XML editor resolves conref attributes and displays the referenced content.
For a translator, it is equally essential to see the text being translated in its complete form. If conref content is not resolved when translatable text is extracted from the DITA file, the translator will lack the context needed to translate it properly.
In Figure 4 you can see translatable text from Figure 1 extracted by a Computer Aided Translation (CAT) tool that supports DITA content referencing. In Figure 5 and Figure 6 you see the same text extracted by two tools that treat DITA documents as regular XML.
The figures include markers that represent the original DITA markup. In Figure 4 you can see the actual text referenced by the conref attributes; in the other figures, you see only markers.
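The difference between the two kinds of extraction can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CAT tool works: it assumes conref values of the form file.dita#topicid/elementid, and the two small topics are hypothetical stand-ins for Figures 1 and 2, which are not reproduced here. With resolve=False each conref becomes a numbered marker, as in the tools that treat DITA as regular XML; with resolve=True the referenced text is pulled into the segment. Keys, chained conrefs and error handling are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical reuse library (plays the role of the file in Figure 2).
LIBRARY = """<topic id="lib"><title>Library</title><body>
<p id="warn">Back up your data before upgrading.</p>
<p>Click <ph id="save">Save</ph> to keep your changes.</p>
</body></topic>"""

# Hypothetical topic that reuses it (plays the role of Figure 1).
TOPIC = """<topic id="upgrade"><title>Upgrading</title><body>
<p conref="lib.dita#lib/warn"/>
<p>Open the file and choose <ph conref="lib.dita#lib/save"/> from the menu.</p>
</body></topic>"""


def find_target(docs, conref):
    """Resolve a conref of the form 'file.dita#topicid/elementid'."""
    path, _, fragment = conref.partition("#")
    topic_id, _, element_id = fragment.partition("/")
    topic = next(t for t in docs[path].iter("topic") if t.get("id") == topic_id)
    return next(e for e in topic.iter() if e.get("id") == element_id)


def flatten(el, docs, resolve, counter):
    """Turn one element into segment text; unresolved conrefs become {n} markers."""
    if el.get("conref"):
        if not resolve:
            counter[0] += 1
            return "{%d}" % counter[0]
        el = find_target(docs, el.get("conref"))
    parts = [el.text or ""]
    for child in el:
        parts.append(flatten(child, docs, resolve, counter))
        parts.append(child.tail or "")  # text after the child belongs to this element
    return "".join(parts)


def extract(topic_xml, docs, resolve):
    """Return one whitespace-normalized segment per <p> in the topic."""
    segments = []
    for p in ET.fromstring(topic_xml).iter("p"):
        text = flatten(p, docs, resolve, counter=[0])
        segments.append(" ".join(text.split()))
    return segments


docs = {"lib.dita": ET.fromstring(LIBRARY)}
print(extract(TOPIC, docs, resolve=False))
# ['{1}', 'Open the file and choose {1} from the menu.']
print(extract(TOPIC, docs, resolve=True))
# ['Back up your data before upgrading.', 'Open the file and choose Save from the menu.']
```

The first segment shows the extreme case: a paragraph that is entirely a conref gives the translator nothing but a marker unless the reference is resolved.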
By using tools that extract complete sentences from your DITA sources, you give translators the context they need. This adds to the price you pay if your Localization Service Provider (LSP) charges by the word, since the referenced text is counted as well, but the cost increase should be offset by better translation quality that requires less review work.
The Customization Problem
DITA includes a set of DTDs and XML Schemas that contain almost all elements and attributes needed in a standard documentation project. Nevertheless, sometimes the standard set of elements and attributes is not enough and custom extensions are needed.
DITA has a standard extension mechanism known as “specialization.” DITA architects may modify the default set of DTDs and XML Schemas, following certain rules, to incorporate the pieces they need.
As DITA becomes more popular, many translation tool vendors ship configuration files for their tools' XML filters that facilitate text extraction from standard DITA documents. Unfortunately, not all tools support DITA specializations.
If you use specialization in your DITA projects, the translation tool used to process your files should
- allow you to customize the list of translatable elements and attributes
- allow you to incorporate your custom DTDs and XML Schemas in the tool’s XML catalog (if it uses one)
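One way a tool can recognize specialized elements without a hand-maintained list is the class attribute that every DITA element carries, which records its ancestry (a specialized paragraph still lists topic/p). The Python sketch below assumes that approach plus a project-specific override list; the element warnpara, the acme-d domain and the set of translatable base types are hypothetical. Because class attributes are normally supplied as defaults by the DTDs rather than written in the files, a parser that cannot load your custom DTDs (for example, through the tool's XML catalog) sees no class attribute at all, which the last two calls illustrate.

```python
import xml.etree.ElementTree as ET

# Base DITA types treated as translatable in this sketch (illustrative, not complete).
TRANSLATABLE_BASE = {"topic/title", "topic/p", "topic/li", "topic/note"}
# Element names to fall back on when no @class attribute is available.
BASE_NAMES = {token.split("/")[1] for token in TRANSLATABLE_BASE}


def is_translatable(el, extra_elements=frozenset()):
    """Decide whether an element's text should be extracted."""
    if el.tag in extra_elements:  # the project-specific list wins
        return True
    cls = el.get("class")
    if cls:
        # @class lists the ancestry, e.g. "+ topic/p acme-d/warnpara ";
        # the first token is the "-" or "+" prefix, the rest are module/element pairs.
        return any(token in TRANSLATABLE_BASE for token in cls.split()[1:])
    # Without @class (DTD not loaded), only standard element names are recognized.
    return el.tag in BASE_NAMES


def extract(xml_text, extra_elements=frozenset()):
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter()
            if is_translatable(el, extra_elements) and el.text]


# Class attributes written out here as a DTD-aware parser would supply them.
WITH_CLASS = """<concept id="c1" class="- topic/topic concept/concept ">
<title class="- topic/title ">Cooling</title>
<conbody class="- topic/body concept/conbody ">
<p class="- topic/p ">Check the fan.</p>
<warnpara class="+ topic/p acme-d/warnpara ">Unplug the unit first.</warnpara>
<codeblock class="+ topic/pre pr-d/codeblock ">fan --speed 3</codeblock>
</conbody></concept>"""

# The same kind of content as a parser without the DTD sees it: no @class at all.
NO_CLASS = """<concept id="c1"><title>Cooling</title><conbody>
<warnpara>Unplug the unit first.</warnpara></conbody></concept>"""

print(extract(WITH_CLASS))  # ['Cooling', 'Check the fan.', 'Unplug the unit first.']
print(extract(NO_CLASS))    # ['Cooling']
print(extract(NO_CLASS, extra_elements={"warnpara"}))  # ['Cooling', 'Unplug the unit first.']
```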
Even if you don’t use specializations, you may still require customized translations. For example, the standard <draft-comment> element is normally used for internal purposes, and readers of the published documentation almost never see its content. Therefore, <draft-comment> content is usually excluded from translation, yet you may want it translated for your content reviewers. Only if you or your LSP use customizable CAT tools will you be able to get the desired translations.
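A customizable filter can be as simple as a per-project list of elements to skip. In this hypothetical Python sketch, FilterConfig stands in for whatever settings a real CAT tool exposes; draft-comment is skipped by default and re-enabled for a translation aimed at reviewers. Mixed content (text following inline elements) is ignored to keep the example short.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class FilterConfig:
    """Hypothetical per-project settings for a CAT tool's XML filter."""
    # Elements skipped by default; draft-comment holds internal notes.
    skip: set = field(default_factory=lambda: {"draft-comment"})


def extract(xml_text, config):
    """Collect element text, skipping configured elements and everything inside them."""
    segments = []

    def walk(el):
        if el.tag in config.skip:
            return
        if el.text and el.text.strip():
            segments.append(el.text.strip())
        for child in el:
            walk(child)

    walk(ET.fromstring(xml_text))
    return segments


DOC = """<topic id="setup"><title>Setup</title><body>
<p>Install the driver.</p>
<draft-comment>Reviewers: confirm the driver version.</draft-comment>
</body></topic>"""

default = FilterConfig()
print(extract(DOC, default))        # the draft comment is left out

for_reviewers = FilterConfig()
for_reviewers.skip.discard("draft-comment")
print(extract(DOC, for_reviewers))  # the draft comment is extracted too
```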
Dealing with the Translate Attribute
Sometimes you will include portions of text in your DITA files that should not be translated. To mark those pieces as untranslatable, you simply set the value of the translate attribute to no, as shown in Figure 7.
Some translation tools simply ignore the translate attribute and extract the text for translation anyway.
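A filter that honors the attribute has to track it down the tree, because a value set on an element is generally taken to apply to its descendants as well. The Python sketch below assumes that inheritance rule, with a descendant allowed to override it (the same convention W3C ITS uses); BLOCKS is an illustrative subset of block-level elements.

```python
import xml.etree.ElementTree as ET

BLOCKS = {"title", "p", "li"}  # illustrative subset of block-level elements


def extract(xml_text):
    """Collect block text, honoring translate="no" and its inheritance."""
    segments = []

    def walk(el, inherited):
        # A local @translate overrides the value inherited from ancestors.
        value = el.get("translate", inherited)
        if el.tag in BLOCKS and value != "no" and el.text:
            segments.append(el.text.strip())
        for child in el:
            walk(child, value)

    walk(ET.fromstring(xml_text), "yes")
    return segments


DOC = """<topic id="ports"><title>Ports</title><body>
<p>Connect the cable to port A.</p>
<p translate="no">ACME-PORT-MAP v2</p>
<ul translate="no"><li>COM1</li><li translate="yes">Serial port</li></ul>
</body></topic>"""

print(extract(DOC))  # ['Ports', 'Connect the cable to port A.', 'Serial port']
```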
Note that the translate attribute should be used with block-level elements (those that contain full paragraphs or sentences), such as <p>. Setting the translate attribute to no on an element that appears in the middle of a sentence is a bad idea, because the translator working with the surrounding text still needs to see the element content for context. Figure 8 shows how you can safely protect untranslatable text that appears in the middle of a sentence by referencing a copy stored in an untranslatable element.
A translation tool parsing Figure 8 should be able to
- ignore the
- include the word “untranslatable” when extracting the
- ignore the
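The technique can be sketched in Python with hypothetical markup, since Figure 8 itself is not reproduced here: the protected word is stored in a <ph> inside a <draft-comment> marked translate="no", and the sentence reuses it through a same-file conref of the form #topicid/elementid. The extractor skips the untranslatable element entirely and shows the referenced word in braces as locked context inside the sentence. The approach is safe because the translated file keeps the conref, so the published word is always taken from the untranslated copy.

```python
import xml.etree.ElementTree as ET

# Illustrative topic (not the article's Figure 8): the protected word is stored in a
# draft-comment marked translate="no" and pulled into the sentence with a conref.
DOC = """<topic id="t"><title>Naming</title><body>
<draft-comment translate="no"><ph id="brand">untranslatable</ph></draft-comment>
<p>This word is <ph conref="#t/brand"/> in every language.</p>
</body></topic>"""

BLOCKS = {"title", "p"}


def resolve(root, conref):
    """Resolve a same-file conref of the form '#topicid/elementid'."""
    topic_id, _, element_id = conref.lstrip("#").partition("/")
    topic = next(t for t in root.iter("topic") if t.get("id") == topic_id)
    return next(e for e in topic.iter() if e.get("id") == element_id)


def segment(el, root):
    """Flatten a block; referenced text appears in braces as locked context."""
    parts = [el.text or ""]
    for child in el:
        if child.get("conref"):
            target = resolve(root, child.get("conref"))
            parts.append("{" + "".join(target.itertext()) + "}")
        else:
            parts.append(segment(child, root))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


def extract(xml_text):
    root = ET.fromstring(xml_text)
    segments = []

    def walk(el):
        if el.get("translate") == "no":
            return  # skipped entirely: no empty segment is produced for it
        if el.tag in BLOCKS:
            segments.append(segment(el, root))
            return
        for child in el:
            walk(child)

    walk(root)
    return segments


print(extract(DOC))  # ['Naming', 'This word is {untranslatable} in every language.']
```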
In Figures 9, 10, and 11, you can see how three translation tools interpreted the content of Figure 8:
- All respected the translate attribute in
- Only one was able to include the referenced text in
- One of them presents the <draft-comment> element with nothing to translate in it.
Make sure your translation tool can ignore block elements that have the translate attribute set to no.
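One way to check this before committing to a tool is a probe file: put a unique string inside a translate="no" block, run the file through the tool, and look for that string in what the tool exports. The Python sketch below assumes the tool can export XLIFF 1.2; the sentinel value is arbitrary and the sample export is made up to show a failing result.

```python
import xml.etree.ElementTree as ET

XLIFF = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
SENTINEL = "ZZ-DO-NOT-TRANSLATE-7731"  # hypothetical marker that must never reach a translator


def probe_topic():
    """A small DITA topic whose only untranslatable block carries the sentinel."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="probe"><title>Probe</title><body>
<p>This sentence should be translated.</p>
<p translate="no">{SENTINEL}</p>
</body></topic>
"""


def leaked_units(xliff_text):
    """Return ids of XLIFF 1.2 trans-units whose source text contains the sentinel."""
    root = ET.fromstring(xliff_text)
    leaked = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF):
        source = unit.find("x:source", XLIFF)
        if source is not None and SENTINEL in "".join(source.itertext()):
            leaked.append(unit.get("id"))
    return leaked


# Made-up export from a tool that ignored translate="no", for illustration only.
BAD_EXPORT = f"""<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file original="probe.dita" source-language="en" datatype="xml"><body>
<trans-unit id="1"><source>Probe</source></trans-unit>
<trans-unit id="2"><source>This sentence should be translated.</source></trans-unit>
<trans-unit id="3"><source>{SENTINEL}</source></trans-unit>
</body></file></xliff>"""

print(leaked_units(BAD_EXPORT))  # ['3']
```

To use it, save the output of probe_topic() as a .dita file, process it with the tool, and pass the exported XLIFF to leaked_units(); an empty list means the untranslatable block stayed out of the translation package.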
The File Handling Problem
A DITA project may contain hundreds of small files. That is not unusual, but it usually makes file handling tedious.
When working with a large number of files, DITA teams may opt for a Content Management System (CMS) or a version control system such as CVS or SVN. A CMS is not required for working with DITA, but it may simplify project management.
A CMS may help you separate the files referenced by a DITA map and prepare a package for translation. If you don’t have a CMS, you may use a DITA-enabled translation tool for separating the files that need translation from those that don’t.
A DITA-enabled translation tool should be able to parse a DITA map and resolve the references to all topics and subtopics, preparing a unified package that you can send to your LSP.
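The packaging step can be sketched in Python as a walk over the references that start at the map. This is a minimal illustration under several assumptions: references are plain relative paths in href and conref attributes, every referenced file sits under the map's folder, references with scope="external" or scope="peer" are skipped, and key-based references (keyref, conkeyref) are not followed. The demo() function builds a hypothetical project in a temporary folder; its unreferenced topic is left out of the package.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

PARSED = {".dita", ".ditamap", ".xml"}  # files scanned for further references


def collect(path, seen):
    """Add `path` and everything it references (href/conref) to `seen`."""
    path = Path(path).resolve()
    if path in seen:
        return
    seen.add(path)
    if path.suffix not in PARSED:
        return  # images and other assets are packaged but not parsed
    for el in ET.parse(path).getroot().iter():
        if el.get("scope") in ("external", "peer"):
            continue  # web links and content outside this publication
        for attr in ("href", "conref"):
            value = el.get(attr)
            base = value.split("#")[0] if value else ""
            if base:  # an empty base means a reference inside the same file
                collect(path.parent / base, seen)


def package(map_file, out_dir):
    """Copy a map and all files it reaches into out_dir, keeping relative paths."""
    seen = set()
    collect(map_file, seen)
    root_dir = Path(map_file).resolve().parent
    for src in seen:
        dest = Path(out_dir) / src.relative_to(root_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    return sorted(p.relative_to(root_dir).as_posix() for p in seen)


def demo():
    """Build a tiny hypothetical project in a temp folder and package it."""
    files = {
        "guide.ditamap": '<map><title>Guide</title>'
                         '<topicref href="intro.dita"><topicref href="tasks/install.dita"/></topicref>'
                         '<topicref href="https://example.com" scope="external"/></map>',
        "intro.dita": '<topic id="intro"><title>Intro</title>'
                      '<body><p conref="lib.dita#lib/warn"/></body></topic>',
        "lib.dita": '<topic id="lib"><title>Lib</title><body><p id="warn">Back up first.</p></body></topic>',
        "tasks/install.dita": '<topic id="install"><title>Install</title>'
                              '<body><image href="../img/setup.png"/></body></topic>',
        "unused.dita": '<topic id="unused"><title>Old</title></topic>',
        "img/setup.png": "",
    }
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        for name, text in files.items():
            (src / name).parent.mkdir(parents=True, exist_ok=True)
            (src / name).write_text(text, encoding="utf-8")
        return package(src / "guide.ditamap", Path(tmp, "out"))


print(demo())
# ['guide.ditamap', 'img/setup.png', 'intro.dita', 'lib.dita', 'tasks/install.dita']
```

Following conref as well as href matters here: lib.dita is not listed in the map, but the translator still needs it for context, as discussed in the content referencing section.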
About the Author:
Rodolfo Raya is Maxprograms’ CTO (Chief Technical Officer), where he develops multi-platform translation/localization and content publishing tools using XML and Java technology. He can be reached at email@example.com.