CIDM Best Practices Newsletter, June 2013

Selecting a Translation Tool
Tips for choosing a tool for translating DITA files


Rodolfo M. Raya, Maxprograms

Introduction

DITA is an XML vocabulary, but not just any XML. It has certain peculiarities that an ordinary XML editor or translation tool cannot easily handle.

Like an XML editor that is good for authoring in DITA, a translation tool capable of properly handling DITA files should

  • be able to resolve DITA content references, like the conref attribute or the keyref mechanism
  • be able to support DITA specializations, allowing the customization of translatable elements and attributes
  • understand the translate attribute

The Content Referencing Problem

The DITA file shown in Figure 1 has conref attributes that reference elements from the file shown in Figure 2.

Raya_Figure1

Raya_Figure2

An XML editor able to resolve the conref attributes would display that file in WYSIWYG mode as shown in Figure 3.

Raya_Figure3

For a technical writer working with DITA, it is important that the chosen XML editor resolves conref attributes and displays the referenced content.

For a translator, it is also essential to see the text being translated in a complete representation. If conref content is not resolved when translatable text is extracted from the DITA file, the translator will lack the necessary context to perform the translation task.

In Figure 4 you can see translatable text from Figure 1 extracted by a Computer Aided Translation (CAT) tool that supports DITA content referencing. In Figure 5 and Figure 6 you see the same text extracted by two tools that treat DITA documents as regular XML.

Raya_Figure4

Raya_Figure5

Raya_Figure6

The figures include markers that represent the original DITA markup. In one case (Figure 4) you can see the actual text referenced by conref attributes; in the other two (Figures 5 and 6), you see only markers.

By using tools that extract complete sentences from your DITA sources, you give translators the context they need. Although this adds to the price you pay if your Localization Service Provider (LSP) charges by the word, the cost increase should be offset by better translation quality that requires less review work.
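To make the idea concrete, here is a minimal sketch of conref resolution before text extraction, using only Python's standard library. The two sample documents, the file name library.dita, and the function names are illustrative assumptions, not the content of the article's figures or the behavior of any particular CAT tool. Only the file#topicid/elementid form of conref is handled.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical library topic holding reusable content.
LIBRARY = """<topic id="library">
  <title>Reusable text</title>
  <body>
    <p><ph id="product">Acme Editor</ph></p>
  </body>
</topic>"""

# Hypothetical topic that pulls in the product name through conref.
SOURCE = """<topic id="intro">
  <title>Introduction</title>
  <body>
    <p>Install <ph conref="library.dita#library/product"/> before you continue.</p>
  </body>
</topic>"""


def find_by_id(root, wanted):
    """Return the first element under root (root included) whose id matches."""
    return next((e for e in root.iter() if e.get("id") == wanted), None)


def resolve_conrefs(root, documents):
    """Copy the content of each conref target into the referencing element.

    documents maps a file name to its parsed root element. References that
    cannot be resolved (unknown file, same-file references, missing ids) are
    left untouched in this sketch.
    """
    for elem in list(root.iter()):  # snapshot: the tree is modified below
        conref = elem.get("conref")
        if not conref:
            continue
        file_name, _, fragment = conref.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        target_root = documents.get(file_name)
        topic = find_by_id(target_root, topic_id) if target_root is not None else None
        target = find_by_id(topic, element_id) if topic is not None else None
        if target is None:
            continue
        elem.text = target.text
        elem.extend([copy.deepcopy(child) for child in target])
        del elem.attrib["conref"]
    return root


documents = {"library.dita": ET.fromstring(LIBRARY)}
resolved = resolve_conrefs(ET.fromstring(SOURCE), documents)
sentence = "".join(resolved.find("body/p").itertext())
print(sentence)
```

Without the resolution step, the extracted paragraph would contain a gap where the product name belongs, which is the situation the translator faces in tools that treat DITA as regular XML.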

The Customization Problem

DITA includes a set of DTDs and XML Schemas that contain almost all elements and attributes needed in a standard documentation project. Nevertheless, sometimes the standard set of elements and attributes is not enough and custom extensions are needed.

DITA has a standard extension mechanism known as “specialization.” DITA architects may modify the default set of DTDs and XML Schemas, following certain rules, to incorporate the pieces they need.

As DITA becomes more popular, many translation tool vendors include configuration files for the XML filters of their tools that facilitate text extraction from standard DITA documents. Unfortunately, not all tools support DITA specializations.

If you use specialization in your DITA projects, the translation tool used to process your files should

  • allow you to customize the list of translatable elements and attributes
  • allow you to incorporate your custom DTDs and XML Schemas in the tool’s XML catalog (if it uses one)

Even if you don’t use specializations, you may still need to customize what gets translated. For example, the standard <draft-comment> element is normally used for internal consumption, and readers of the published documentation almost never see its content. Therefore, translation tools usually skip <draft-comment>, yet you may want it translated for your content reviewers. Only if you or your LSP use customizable CAT tools will you be able to get the desired translations.
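The kind of customization described above can be sketched as an extraction filter driven by two configurable lists. This is a simplified illustration, not the configuration format of any real tool: the default lists, the function name, and the specialized element safety-note are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults; real tools ship their own filter configuration.
DEFAULT_ELEMENTS = frozenset({"title", "shortdesc", "p"})
DEFAULT_ATTRIBUTES = frozenset({"navtitle"})

# Hypothetical topic; <safety-note> stands in for a specialized element.
SAMPLE = """<topic id="setup">
  <title>Setup</title>
  <body>
    <p>Connect the <ph>power cable</ph>.</p>
    <draft-comment>Check the cable name with engineering.</draft-comment>
    <safety-note>Unplug the device first.</safety-note>
  </body>
</topic>"""


def extract_segments(root, elements=DEFAULT_ELEMENTS, attributes=DEFAULT_ATTRIBUTES):
    """Return translatable strings in document order.

    Each configured element is extracted as one segment including the text
    of its inline children. Nested configured elements would be extracted
    twice; a real filter needs to handle that case.
    """
    segments = []
    for elem in root.iter():
        for name in sorted(attributes):
            value = elem.get(name)
            if value:
                segments.append(value)
        if elem.tag in elements:
            text = " ".join("".join(elem.itertext()).split())
            if text:
                segments.append(text)
    return segments


root = ET.fromstring(SAMPLE)
default_segments = extract_segments(root)
custom_segments = extract_segments(
    root, elements=DEFAULT_ELEMENTS | {"draft-comment", "safety-note"}
)
print(default_segments)
print(custom_segments)
```

With the default lists, the reviewer comment and the specialized element are silently skipped; adding them to the configuration is enough to send them for translation.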

Dealing with the Translate Attribute

Sometimes you will include portions of text in your DITA files that should not be translated. To mark those pieces as untranslatable, you simply set the value of the translate attribute to no, as shown in Figure 7.

Raya_Figure7

Some translation tools simply ignore the translate attribute and extract the text for translation anyway.

Note that the translate attribute should be used with block-level elements (those that contain full paragraphs or sentences), like <p>. Setting the translate attribute to no in an element that appears in the middle of a sentence is a bad idea, because the translator working with the surrounding text still needs to see the element content for context. Figure 8 shows how you can safely protect untranslatable text that appears in the middle of a sentence by referencing a copy stored in an untranslatable element.

Raya_Figure8

A translation tool parsing Figure 8 should be able to

  • ignore the <title> element
  • include the word “untranslatable” when extracting the <p> element
  • ignore the <draft-comment> element

In Figures 9, 10, and 11, you can see how three translation tools interpreted the content of Figure 8:

  • All respected the translate attribute in <title>
  • Only one was able to include the referenced text in <p> for context
  • One presented the <draft-comment> element with nothing to translate in it

Make sure your translation tool can ignore block elements that have the translate attribute set to no.
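As a rough illustration of what honoring the translate attribute involves, the sketch below walks a topic and skips block elements whose effective translate value is no. It assumes that an element without a translate attribute inherits the value from its nearest ancestor that sets one. The sample topic, the list of block elements, and the function name are hypothetical, and this is not the content of Figure 7 or Figure 8.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of block-level elements that form translation segments.
BLOCK_ELEMENTS = frozenset({"title", "p", "draft-comment"})

# Hypothetical topic mixing translatable and protected blocks.
SAMPLE = """<topic id="sample">
  <title translate="no">DITA</title>
  <body>
    <p>This paragraph needs translation.</p>
    <p translate="no">ACME-1234-X</p>
    <section translate="no">
      <p>Inherited: not translated.</p>
      <p translate="yes">Overridden: translated.</p>
    </section>
  </body>
</topic>"""


def translatable_blocks(root, blocks=BLOCK_ELEMENTS):
    """Return the text of block elements whose effective translate value is yes."""
    result = []

    def visit(elem, inherited):
        flag = elem.get("translate")
        # Assumption: an unset attribute inherits the ancestor's value.
        translate = inherited if flag is None else flag == "yes"
        if elem.tag in blocks:
            if translate:
                text = " ".join("".join(elem.itertext()).split())
                if text:
                    result.append(text)
            return  # a block is one segment; inline children are not visited
        for child in elem:
            visit(child, translate)

    visit(root, True)
    return result


blocks = translatable_blocks(ET.fromstring(SAMPLE))
print(blocks)
```

Because each block is extracted whole, inline content inside a translatable paragraph stays visible to the translator, which matches the advice to apply translate="no" only at block level.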

The File Handling Problem

A DITA project may contain hundreds of small files. That is not unusual, but it makes file handling tedious.

When working with a large number of files, DITA teams may opt for using a Content Management System (CMS) or a version management system like CVS or SVN. A CMS is not really required for working with DITA but it may simplify project management.

A CMS may help you separate the files referenced by a DITA map and prepare a package for translation. If you don’t have a CMS, you may use a DITA-enabled translation tool for separating the files that need translation from those that don’t.

A DITA-enabled translation tool should be able to parse a DITA map and resolve the references to all topics and subtopics, preparing a unified package that you can send to your LSP.
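A first step toward such a package is listing the files a map references. The sketch below is an assumption-laden simplification: the sample map, the directory name docs, and the function name are hypothetical, nested maps are listed but not followed, and the scope attribute is checked only on the element itself, not on its ancestors.

```python
import posixpath
import xml.etree.ElementTree as ET

# Hypothetical DITA map with nested topic references and one external link.
MAP = """<map>
  <title>User Guide</title>
  <topicref href="topics/intro.dita">
    <topicref href="topics/install.dita"/>
    <topicref href="topics/install.dita#install/linux"/>
  </topicref>
  <topicref href="http://example.com/faq" scope="external" format="html"/>
</map>"""


def collect_topic_files(map_root, map_dir=""):
    """Return the local files referenced by a map, once each, in map order."""
    files = []
    for elem in map_root.iter():
        href = elem.get("href")
        if not href or elem.get("scope") in ("external", "peer"):
            continue
        path = href.split("#", 1)[0]  # drop the topic or element fragment
        if not path:
            continue
        resolved = posixpath.normpath(posixpath.join(map_dir, path))
        if resolved not in files:
            files.append(resolved)
    return files


package_files = collect_topic_files(ET.fromstring(MAP), "docs")
print(package_files)
```

The resulting list could then be copied into a single folder or archive for the LSP, together with any files those topics reference through conref.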

If your LSP charges you for file management, you can reduce cost by preparing a consolidated translation package in-house.

About the Author:

Rodolfo Raya
Maxprograms
rmraya@maxprograms.com

Rodolfo Raya is Maxprograms’ CTO (Chief Technical Officer), where he develops multi-platform translation/localization and content publishing tools using XML and Java technology. He can be reached at rmraya@maxprograms.com.
