Terminology Management in a DITA Environment

CIDM eNews 05.17

Stefan Eike, DAKOSY AG

Terminology management means “to control the words that are used in written or spoken language”. Consider a fictitious global medical engineering company. Medical devices are subject to strict legal controls, directives, and norms, which force the company to translate its information.

The company has several teams that produce textual information. In the Research and Development team, the product is designed and all parts of the product are named. The Technical Service team writes service manuals that explain how to install and service the medical device. The Software Development team develops the user interfaces of the product in several languages. The Technical Writing team writes manuals. The Sales team maintains an enterprise resource planning system that contains all the product components and spare parts. The Marketing team designs web sites and marketing material that explain the product, its features, and its characteristics, and also maintains a web shop where the product and its spare parts are sold.

All textual information is translated into 30 languages by translators who are not fully familiar with the product itself, the subject domain, or the terms of the company. So there are many teams involved in creating different types of text for several target groups in several languages. Because the teams don’t harmonize and share terminology data, the company produces numerous multilingual synonyms that can only be corrected at high cost.

The Technical Writing team in our fictitious company creates DITA-XML-based documentation using the <oXygen/> XML editor. The team is extremely dedicated to reusing their DITA-XML modules as much as possible. Unfortunately, the team has problems with reuse because the terminology is inconsistent. The team decides to implement terminology management for the whole company and starts to investigate possible solutions.

In the planning phase, the team identifies the following requirements that the terminology solution must comply with:

  • As an author, I need a terminology checker that detects inadvisable terms in my documents when writing a text, because I cannot manually look up every single word.
  • As an occasional content author, I need a solution to manually look up terms, and I need to know whether a word may not or must not be used and which word has to be used instead.
  • As a terminology manager, I need an easy-to-use solution that helps me record and publish terminology data, preferred or inadvisable terms, metadata, definitions, contextual information, terminology decisions, and semantic relations.
  • As a translator, I need to know how terms have to be translated.
  • As a manager, I need an inexpensive or free solution, because I only have a very small budget for terminology management.

The team determines that these requirements cannot be met without software. One team member considers using Microsoft Excel® or a MediaWiki, but the others disagree because these tools cannot meet all the requirements. Another team member suggests a tool she has heard of that was developed to work best in a DITA-XML environment with the <oXygen/> XML editor. She claims that org.doctales.terminology could meet the requirements. The team agrees to take a closer look at this piece of software.


The aforementioned tool with the unpronounceable name is an open source solution that is packaged as a plugin for the DITA Open Toolkit (DITA-OT). It contains the specialized DITA topic types that are needed to create a terminology database. The DITA Termmap topic keeps the DITA Termentry topics together: it is a container that references all Termentry topics in alphabetical order and assigns keys to them.

A Termentry topic represents a term and contains terminological metadata. It may contain a definition, a list of the term committee members who agreed on the term, and term variants in different languages, which can be flagged as preferred or inadvisable term notations. The topic also contains a relations element that can be used to model semantic relations to other terms, for instance hyponyms (words whose semantic field is included within that of another word) or hypernyms (superordinate words with a broad meaning that constitute a category into which words with more specific meanings fall).
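The information carried by such a term entry can be pictured as a small data model. The following Python sketch is only an illustration of the concepts described above: the class names, field names, status values, and sample terms are all invented for this example and are not the plugin's actual DITA element or attribute names.

```python
from dataclasses import dataclass, field


@dataclass
class TermVariant:
    """One notation of a term in one language (hypothetical model)."""
    text: str
    language: str  # language code such as "en" or "de"
    status: str    # assumed values: "preferred" or "inadvisable"


@dataclass
class TermEntry:
    """Hypothetical in-memory counterpart of a term entry."""
    key: str
    definition: str = ""
    variants: list = field(default_factory=list)
    hypernyms: list = field(default_factory=list)  # keys of broader terms
    hyponyms: list = field(default_factory=list)   # keys of narrower terms

    def variants_with(self, language, status):
        """Return the notations that match a language and a status."""
        return [
            v.text
            for v in self.variants
            if v.language == language and v.status == status
        ]


# Invented sample data, for illustration only.
pump = TermEntry(
    key="infusion_pump",
    definition="Device that delivers fluids to a patient in controlled amounts.",
    variants=[
        TermVariant("infusion pump", "en", "preferred"),
        TermVariant("drip pump", "en", "inadvisable"),
        TermVariant("Infusionspumpe", "de", "preferred"),
    ],
    hypernyms=["medical_device"],
)

print(pump.variants_with("en", "preferred"))
print(pump.variants_with("en", "inadvisable"))
```

Keeping preferred and inadvisable notations in the same entry is what allows a lookup to answer both questions at once: which word must not be used, and which word has to be used instead.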

org.doctales.terminology ships with an <oXygen/> XML framework. This framework renders the Termentry topics as a form so that they can be edited easily without the need to be familiar with the XML structure (see Figure 1).

Figure 1: Example of a Termentry topic rendered as a form

The plugin contains DITA-OT transformations that convert the terminology database into other data formats:

  • A termchecker (a Schematron stylesheet) can be used to find inadvisable terms in DITA topics or in XLIFF-XML documents (see Figure 2). XLIFF is an XML-based exchange format for translation data that can be used for translating DITA content.

Figure 2: Example of a termchecker

  • A TBX-Basic or TBX-Min file can be generated. TermBase eXchange (TBX) is a file format for exchanging terminology data, for instance between the company and a language service provider (LSP).
  • A termbrowser can be generated. A termbrowser is a special web site for navigating through the terminology database. A demo with sample data can be found at https://doctales.github.io/samples/termbrowser-responsive/index.html. The termbrowser is generated using the standard <oXygen/> XML classic or responsive webhelp transformation or using the standard HTML5 DITA-OT transformation (see Figure 3).

Figure 3: Example of a termbrowser

  • A semantic net that visualizes term relations can be generated automatically (see Figure 4).

Figure 4: Example of a semantic net

  • Terminology statistics can be calculated automatically and visualized as terminology figures (see Figure 5).

Figure 5: Example of terminology statistics
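The idea behind the termchecker can be illustrated with a few lines of code. The plugin itself uses a Schematron stylesheet; the Python sketch below is not that implementation, only a minimal illustration of the same principle under stated assumptions: the function name `check_terms`, the mapping `INADVISABLE`, and the sample terms are hypothetical.

```python
import re

# Hypothetical mapping from an inadvisable term to its preferred replacement.
INADVISABLE = {
    "drip pump": "infusion pump",
    "power cord": "power cable",
}


def check_terms(text, inadvisable=INADVISABLE):
    """Find inadvisable terms in text.

    Returns a list of (position, found_term, preferred_term) tuples,
    sorted by position. Matching is case-insensitive and restricted
    to whole words.
    """
    findings = []
    for bad, good in inadvisable.items():
        pattern = re.compile(r"\b" + re.escape(bad) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            findings.append((match.start(), match.group(0), good))
    return sorted(findings)


sample = "Connect the Drip Pump with the power cord."
for position, found, preferred in check_terms(sample):
    print(f"{position}: use '{preferred}' instead of '{found}'")
```

A simple whole-word match like this does not handle inflected forms or compound words, which matters for many of the 30 target languages; a production checker would need language-specific handling.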

The team is very happy because the plugin meets all the requirements. Fortunately, they use the latest versions of the DITA-OT and the <oXygen/> XML editor and can therefore easily start using the plugin. They download the plugin from GitHub (https://github.com/doctales/org.doctales.terminology), install it following the documentation at https://doctales.atlassian.net/wiki, and start building their terminology database. To support the development of the plugin and to get started even faster, they contact the DOCTALES team and book a personal webinar.