xml:tm – XML Document Text Memory

Andrzej Zydroń, xml-Intl Ltd.

xml:tm (XML-based Text Memory) is a vendor-neutral open XML standard for embedding text memory within an XML document. It uses XML namespace syntax to embed text memory information in the document itself, which offers a new approach to authoring and translating XML documents. To learn more about xml:tm, see “Using XML technology to reduce the cost of authoring and translation” and “How to Leverage the Maximum Potential of XML for Localization” by Andrzej Zydroń.

At the core of xml:tm is the concept of “text memory”. Text memory comprises two components:

1. Author Memory
2. Translation Memory

Author Memory

An XML namespace is used to map a text memory view onto a document. This process is called segmentation. Text memory works at the sentence level of granularity: the text unit. Each xml:tm text unit is allocated a unique identifier, which is immutable for the life of the document. As the document goes through its life cycle, existing identifiers are maintained and new ones are allocated as required. This aspect of text memory is called author memory. It can be used to build author memory systems that simplify authoring and improve its consistency.
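The identifier behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the xml:tm algorithm: the regular-expression splitter stands in for proper SRX segmentation, the `u1`, `u2` identifier format is invented for the example, and keying the memory on the sentence text is a simplification (the function and variable names are hypothetical).

```python
import re
from itertools import count

# Naive sentence splitter used only for illustration; xml:tm itself
# relies on SRX rules for segmentation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(paragraph):
    """Split a paragraph into sentence-level text units."""
    return [s for s in _SENTENCE_END.split(paragraph.strip()) if s]


def update_author_memory(memory, paragraph, next_id):
    """Return (unit_id, text) pairs for one revision of a paragraph.

    `memory` maps sentence text -> unit id from earlier revisions.
    Unchanged sentences keep their id; new or edited sentences get a
    fresh one. (Assumption: ids look like "u1", "u2", ...)
    """
    units = []
    for text in segment(paragraph):
        if text not in memory:
            memory[text] = f"u{next(next_id)}"
        units.append((memory[text], text))
    return units


ids = count(1)
memory = {}
v1 = update_author_memory(memory, "Open the lid. Press the red button.", ids)
v2 = update_author_memory(memory, "Open the lid. Press the green button.", ids)
```

In the second revision the unchanged sentence keeps its identifier, while the edited sentence is treated as a new text unit and receives a new one.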

Translation Memory

When an xml:tm namespace document is ready for translation, the namespace itself specifies the text that is to be translated. The tm namespace can be used to create an OASIS XLIFF document for translation. xml:tm allows for much more focused and better-defined translation memory matching:

Exact Matching

Author memory provides exact details of any changes to a document. Where text units have not changed since the document was last translated, xml:tm provides the basis for declaring an “Exact match” with the previously translated target language document.
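As a rough sketch of how exact matching could work, assume each revision is available as a mapping from text-unit identifier to text. The function name, the dictionary layout and the sample German translations are assumptions made for this example.

```python
def exact_matches(previous_source, current_source, previous_target):
    """Carry translations over for text units that did not change.

    All three arguments map text-unit id -> text. A unit is an exact
    match when its id exists in the previous source revision with
    identical text and a translation is on file for that id.
    """
    reused, to_translate = {}, []
    for unit_id, text in current_source.items():
        if previous_source.get(unit_id) == text and unit_id in previous_target:
            reused[unit_id] = previous_target[unit_id]
        else:
            to_translate.append(unit_id)
    return reused, to_translate


# Hypothetical data: the previous revision and its German translation.
previous_source = {"u1": "Open the lid.", "u2": "Press the red button."}
previous_target = {"u1": "Öffnen Sie den Deckel.", "u2": "Drücken Sie die rote Taste."}
current_source = {"u1": "Open the lid.", "u3": "Press the green button."}

reused, to_translate = exact_matches(previous_source, current_source, previous_target)
```

Only the units listed in `to_translate` would need to be sent for translation; the rest can be reused as they stand.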

In-document leveraged matching

xml:tm can also be used to find in-document leveraged matches.

Database leveraged matching

When an xml:tm document is translated, the translation process provides perfectly aligned source and target language text units. These can be used to create traditional translation memories.

In-document fuzzy matching

The text units contained in the leveraged memory database can also be used to provide fuzzy matches of similar previously translated text from within the same document.

Fuzzy matching

The text units contained in the leveraged memory database can also be used to provide fuzzy matches of similar previously translated text.
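The text does not prescribe a similarity measure for fuzzy matching. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in, with an arbitrary threshold of 0.75; commercial translation memory tools typically use their own word-based scoring. The function name and the sample memory are assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_matches(text, memory, threshold=0.75):
    """Return (score, source, target) tuples at or above the threshold.

    `memory` is a list of (source, target) pairs. Results are sorted
    with the best match first. The character-based ratio is only a
    stand-in for a real fuzzy-match score.
    """
    scored = []
    for source, target in memory:
        score = SequenceMatcher(None, text.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical leveraged memory of aligned text units.
leveraged = [
    ("Press the red button.", "Drücken Sie die rote Taste."),
    ("Close the lid.", "Schließen Sie den Deckel."),
]
matches = fuzzy_matches("Press the green button.", leveraged)
```

Here the new sentence differs from a stored one by a single word, so the stored translation is offered to the translator as a starting point rather than reused automatically.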

Non-translatable text

Text units that are made up solely of numeric, alphanumeric, punctuation or measurement items can be identified during authoring and flagged as non-translatable, thus reducing the translation count metrics.
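One way to approximate this check is to test every whitespace-separated token of a text unit. The sketch below is an assumption-laden illustration: the unit list is a small hypothetical sample, "alphanumeric" is taken to mean a code containing at least one digit (such as a part number), and the rules are not taken from the xml:tm or GMX-V specifications.

```python
import re

# Illustrative unit list only; a real system would use a fuller set.
UNITS = {"mm", "cm", "m", "km", "g", "kg", "ml", "l", "V", "W", "Hz", "%"}
_NUMBER = re.compile(r"[+-]?\d+(?:[.,]\d+)*")
_PUNCTUATION = re.compile(r"[^\w\s]+")
_ALPHANUMERIC = re.compile(r"[A-Za-z0-9-]+")


def _is_non_linguistic(token):
    """True for punctuation, numbers, measurements and codes with digits."""
    if _PUNCTUATION.fullmatch(token):
        return True
    core = token.strip(".,;:!?()[]")
    if _NUMBER.fullmatch(core) or core in UNITS:
        return True
    # A number immediately followed by a unit, e.g. "230V" or "50%".
    number = _NUMBER.match(core)
    if number and core[number.end():] in UNITS:
        return True
    # An alphanumeric code must contain at least one digit, e.g. "AB-1234".
    return bool(_ALPHANUMERIC.fullmatch(core)) and any(c.isdigit() for c in core)


def is_non_translatable(text):
    """True when every token of the text unit is non-linguistic."""
    tokens = text.split()
    return bool(tokens) and all(_is_non_linguistic(t) for t in tokens)
```

With these rules, `is_non_translatable("12.5 mm")` and `is_non_translatable("AB-1234")` are true, while `is_non_translatable("Press 2 buttons.")` is false because it contains ordinary words.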

Interoperability with other Localization Industry Standards

xml:tm was designed from the outset to integrate closely with other XML-based Localization Industry Standards and to leverage their potential, as well as that of XML syntax itself. In particular:

SRX (Segmentation Rules eXchange)
xml:tm mandates the use of SRX for text segmentation of paragraphs into text units.

Unicode Standard Annex #29-9
xml:tm mandates the use of Unicode Standard Annex #29 for tokenization of text into words.

XLIFF 1.2
xml:tm mandates the use of XLIFF for the actual translation process. xml:tm is designed to facilitate the automated creation of XLIFF files from xml:tm enabled documents, and after translation to easily create the target versions of the documents.
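To give a feel for the automated XLIFF creation mentioned above, here is a minimal sketch that writes text units into an XLIFF 1.2 skeleton using Python's standard library. The element and attribute names follow XLIFF 1.2; the function name, the file name `manual.xml` and the reuse of text-unit identifiers as `trans-unit` ids are assumptions for the example, and inline markup handling is omitted.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def build_xliff(units, original, source_lang, target_lang):
    """Build a minimal XLIFF 1.2 tree from (unit_id, source_text) pairs."""
    ET.register_namespace("", XLIFF_NS)  # serialize with a default namespace
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "datatype": "xml",
        "source-language": source_lang,
        "target-language": target_lang,
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for unit_id, text in units:
        # Assumption: the text-unit id doubles as the trans-unit id.
        tu = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": unit_id})
        ET.SubElement(tu, f"{{{XLIFF_NS}}}source").text = text
    return xliff


xliff = build_xliff([("u3", "Press the green button.")], "manual.xml", "en", "de")
xliff_text = ET.tostring(xliff, encoding="unicode")
```

After translation, each `trans-unit` would carry a `target` element, and the identifiers make it straightforward to merge the translated text back into the document structure.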

GMX-V (Global Information Management Metrics eXchange – Volume)
xml:tm mandates the use of GMX-V for all metrics concerning authoring and translation.

TMX (Translation Memory eXchange)
xml:tm facilitates the easy creation of TMX documents, aligned at the sentence level.
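Because translated text units are already aligned, exporting them as TMX is mostly a matter of serialization. The sketch below builds a minimal TMX 1.4 document with Python's standard library. The header attributes are those required by TMX; their values, in particular `creationtool` and `o-tmf`, and the function name are placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, source_lang, target_lang):
    """Build a minimal TMX 1.4 tree from aligned (source, target) pairs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # placeholder value
        "creationtoolversion": "0.1",      # placeholder value
        "segtype": "sentence",
        "o-tmf": "xml:tm",                 # placeholder value
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


tmx = build_tmx([("Open the lid.", "Öffnen Sie den Deckel.")], "en", "de")
tmx_text = ET.tostring(tmx, encoding="unicode")
```

Each `tu` element holds one aligned sentence pair, which matches the sentence-level alignment described above.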

DITA (Darwin Information Typing Architecture)
xml:tm complements the DITA standard by allowing text reuse at the sentence level within DITA documents.

W3C ITS
xml:tm mandates the use of W3C ITS Document Rules for identifying translatable text within an XML document as well as W3C ITS Best Practices with regard to XML document localization.

Implementation

The effective implementation of xml:tm benefits greatly from the existence of an environment which provides the ability to store and retrieve previous source and target language versions for a given XML document. Such an environment is usually provided by a Content Management System (CMS).

Download xml:tm

xml:tm was approved on 21 July 2006 by the OSCAR Steering Committee for public comment prior to final ratification as a standard. Its contents and format may change prior to official adoption. The current version (21 July 2006) can be downloaded (ZIP file) or viewed online here.

This article has been reprinted from http://www.lisa.org/standards/xmltm/

© 2005 LISA All Rights Reserved

 
