A Gentle Introduction to Topic Maps

CIDM

June 2003


H. Holger Rath, Head of Consulting, empolis GmbH

Challenges of the Information Age

We are living in the information age, and we are part of the information society. Millions of people all over the world surf the Internet every day, for fun or as part of their job. A large part of the economic system of the industrialized world relies on electronic information stored and managed in computers and interchanged through networks. The amount of electronically available information doubles every two years. The overabundance of rich information, not the lack of information, causes discomfort. The info glut is becoming a daily challenge.

The promise of the information society

Deliver the right information to the right person at the right time.

has turned into a threat under the avalanche of information that crowds our lives every day.

But what is giving rise to this avalanche? The simple fact that we have access to not just a single information resource on any given topic but to many. Information on a given topic could reside in company database records, company documents, parts of documents, Web pages, images, videos, and so on, coming from different sources and different repositories. We need to know the context in which the information resource is embedded to validate its relevance to our enquiry. We need pointers to related information as well. We want to know how the found resource fits in the larger picture. We want to extract knowledge from information in context.

These desires lead to a more refined objective for the information society:

Deliver the right information in the right context to the right person at the right time.

Historical Approaches to Manage Info Glut

Let’s investigate how humans tried to manage the “info glut” challenge in the past, before computers were available. Information was collected in physical containers called books, and books were collected in even larger physical containers called libraries. To locate specific information on a specific topic, we needed to access a special kind of information that told us about the books and the kind of information they contained. A library catalogue helped us to find (locate) the books related to our topic of interest on the shelves. To actually retrieve (locate) the needed text portion we used one, or often a combination, of four different paradigms: following the hierarchical structure of the table of contents, reading the text from beginning to end, browsing through the pages, and looking up the back-of-book index.

The back-of-book index is probably the most powerful paradigm to find certain information in a book for several reasons:

  • The index is a collection of terms representing the relevant subjects explicitly or implicitly covered by the text.
  • The index is a surrogate, a kind of semantic fingerprint of the book’s content.
  • The index is a result of an intellectual process in which a human selects only those subjects from the text that are relevant for the target audience.

Whatever is important in a book should be found in its index. All unimportant “noise” existing in the text has been filtered away. Searching for something in a book can be reduced to searching for it in the index. If it is not in the index, you can be quite sure that the book does not cover it. And if it is in the index, precise pointers to the pages (page numbers) or sections (section numbers) guide you to the information you are looking for.

Why Topic Maps?

If we were looking for a paradigm on which to build an information-locating system, then the purpose-built index would be our model of choice. A purpose-built indexing approach, such as we would find in the back of a book, is an information-locating paradigm that brings information into a meaningful context and supports various ways to organize information resources. As you have probably guessed, topic maps are such a paradigm.

Topic maps were developed by a committee of the International Organization for Standardization (ISO) and are published as the International Standard ISO/IEC 13250. They are designed to manage the info glut, build valuable information networks over any kind of information resources, and enable the structuring of unstructured information. A topic map can be seen as an electronic super index, implementing the back-of-book index paradigm and much more.

Charles F. Goldfarb, the inventor of SGML and father of mark-up languages, branded topic maps the “GPS of the information universe.” As the Global Positioning System helps avoid getting lost in physical space and is able to guide you to a target point, a topic map says where you are in your information space and where to go to find what you are looking for (see Figure 1).

Figure 1. “GPS of the Information Universe”

A Simple Example

A simple example is a (fictitious) back-of-book index of the travel guide Guide to the British Virgin Islands (see Figure 2). A back-of-book index was selected as an example because the topic maps paradigm was invented to model back-of-book indices electronically.

Figure 2. Back-of-Book Index Example Introducing the Basic Topic Maps Concepts

The index entries are topics representing the concepts (subjects) covered by the guide that are relevant to the target audience and selected by a human in an intellectual process. For the reader’s convenience, the topics carry a human readable name.

The page numbers point to the pages (resources) containing relevant information about the topic. These are the occurrences of the topic.

Different formatting of topic names (for example, regular font and colored italic font) helps to differentiate the various kinds of topics (topic classes). Topic classes provide a classification to simplify the finding of topics. Because our example is about the British Virgin Islands, highlighting the islands in the index makes sense.

Different formatting of page numbers (for example, regular font and colored bold font) helps to distinguish between various kinds of occurrences (occurrence classes). Occurrence classes provide some hints about the kinds of resources the reader can expect when following the occurrence link. The page numbers 71 and 82 point to city maps of Road Town and Spanish Town; all other page numbers might point to arbitrary text-based kinds of resources.

The “see” relationship illustrates multiple topic names and indicates that these are just synonyms for the same subject, which is represented by the topic. Gorda Sound and North Sound are two names for the same bay (the physical thing) in the north of Virgin Gorda.

The “see also” relationship is different, even if it looks quite similar. A “see also” relationship expresses an association between topics. We do not know the kind of association (if you know the British Virgin Islands, you would know that Road Harbour is the harbor of Road Town); we just know that the topics are somehow related and that we might also take a look at the topic Road Town when we are interested in Road Harbour. Associations found in a topic map can be instances of a certain declared class (for example, the association class “is harbor of city”).

Good printing practice prohibits the use of many different formatting styles in a printed index. But a topic map could have as many topic classes, occurrence classes, and association classes as required by the application.
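
To make the mapping between the printed index and the topic map concepts concrete, here is a minimal Python sketch. It is not taken from the standard or from any topic maps product: the `Topic` class and the `render_index` function are hypothetical, the assignment of page 71 to Road Town is an assumption (the text above does not say which of the two map pages belongs to which town), and all other page numbers are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    names: list[str]                       # first name is the preferred one
    topic_class: str = "place"             # topic class, e.g. "island", "town", "bay"
    pages: dict[int, str] = field(default_factory=dict)  # page -> occurrence class
    see_also: list[str] = field(default_factory=list)    # untyped associations


def render_index(topics: list[Topic]) -> list[str]:
    """Turn topics back into printed index lines, sorted alphabetically."""
    lines = []
    for t in topics:
        preferred, *synonyms = t.names
        # Occurrence classes become formatting: map pages are shown in brackets.
        refs = ", ".join(f"[{p}]" if kind == "city map" else str(p)
                         for p, kind in sorted(t.pages.items()))
        # Topic classes become formatting too: islands are set in capitals.
        label = preferred.upper() if t.topic_class == "island" else preferred
        entry = f"{label} {refs}".rstrip()
        if t.see_also:
            entry += "; see also " + ", ".join(t.see_also)
        lines.append(entry)
        # Every further name yields a "see" entry pointing at the preferred name.
        lines.extend(f"{s}, see {preferred}" for s in synonyms)
    return sorted(lines, key=str.lower)


# Illustrative data only; page numbers other than 71 are made up.
topics = [
    Topic(["Gorda Sound", "North Sound"], "bay", {90: "text"}),
    Topic(["Road Harbour"], "harbour", {70: "text"}, see_also=["Road Town"]),
    Topic(["Road Town"], "town", {68: "text", 71: "city map"}),
    Topic(["Virgin Gorda"], "island", {80: "text"}),
]

for line in render_index(topics):
    print(line)
```

The point of the sketch is that the printed index is only one rendering of the underlying structure: names, classes, occurrences, and associations are stored as data, and the formatting is derived from them.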

The Topic Maps Paradigm

The simple example showed that a subject is a concept, notion, idea, and so on that is worthy of becoming a topic in a topic map. The topic maps standard does not predefine candidate subjects. Quite the contrary, the definition of subject in the standard is as general as it could be:

In the most generic sense, a “subject” is any thing whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever.

– ISO/IEC 13250

The generality of the definition enables topic maps to be applied to any application domain.

Subjects: The starting point
Subjects are a result of an intellectual process performed by topic map authors to model information from resources and knowledge from people. It is essential to understand that subjects are the “things” we, the humans, have in mind when we start building a topic map. And these things are outside the computer with all its software, files, databases, and networks; the computer does not “know” the subjects. For example, all the islands, bays, towns, shipwrecks, and reefs of the British Virgin Islands are physically located somewhere in the Caribbean, but they are not inside someone’s computer. They are not part of the topic map; they are concrete parts or abstract concepts of our world.

Topics: Computerized subjects
A topic in a topic map represents a subject inside the computer. Whenever a topic is created representing a subject, the subject becomes a machine interpretable “object.” Once in this form, further assertions can be made about it in an explicitly coded electronic form, the topic map. This was not possible for the subject, the thing, because it was outside the computer (see the sidebar, The Distinction Between Subjects and Topics).

After the topic map author has created a topic, he can assign characteristics to it: names, occurrences, and associations. Furthermore, a topic can be an instance of one class or of multiple classes (see Figure 3).

Figure 3. Topic Examples

Names: Talking to topics
Names let humans talk about topics. Because a topic can have multiple names, we can use them for various purposes, mainly to list synonyms and translations of the subject’s name in different languages or dialects.

Some base name examples:

  • “Gorda Sound” and “North Sound” (synonyms)
  • “Virgin Islands,” “Jungferninseln,” and “Iles Vierges” (translations)
  • “Taxi” and “cab” (dialects)
  • “Radio set,” “walkie-talkie,” and “VHF” (technical terms)
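
One practical consequence of multiple names is that any of them can lead to the same topic. The following Python sketch shows one possible way to build such a name lookup. The topic identifiers and the `build_name_index` function are hypothetical and exist only for this illustration.

```python
# Each (hypothetical) topic id maps to all names of that topic.
topic_names = {
    "gorda-sound": ["Gorda Sound", "North Sound"],                            # synonyms
    "virgin-islands": ["Virgin Islands", "Jungferninseln", "Iles Vierges"],   # translations
    "taxi": ["Taxi", "cab"],                                                  # dialects
}


def build_name_index(names_by_topic):
    """Map every name (case-insensitively) to the ids of the topics carrying it."""
    index = {}
    for topic_id, names in names_by_topic.items():
        for name in names:
            # A set is used because one name could, in principle, belong to several topics.
            index.setdefault(name.casefold(), set()).add(topic_id)
    return index


name_index = build_name_index(topic_names)
print(name_index["north sound"] == name_index["gorda sound"])
```

Whichever name the reader knows, the lookup ends at the same topic and, from there, at all of its occurrences and associations.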

Occurrences: Pointing to relevant information resources
Occurrences bind relevant information resources to topics. Whenever a resource provides information about a topic, it should be considered to become an occurrence of the topic.

An occurrence can be an instance of one class. Figure 4 shows some occurrences and their classes.

Figure 4. Occurrence Examples

Topic maps use a notation called XLink/XPointer URI (Uniform Resource Identifier) to address a resource. This works similarly to HTML hyperlinks. Consequently, everything you can address on the Internet or in your corporate intranet can become a topic’s occurrence.
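
As a rough sketch of the idea, an occurrence can be thought of as a typed link from a topic to a URI. The Python below is an assumption-laden illustration, not the XLink/XPointer syntax itself: the `Occurrence` class, the helper functions, and the `example.org` addresses are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Occurrence:
    topic: str       # the topic the resource is about
    occ_class: str   # occurrence class, e.g. "city map"
    uri: str         # address of the resource (hypothetical URIs below)


occurrences = [
    Occurrence("Road Town", "city map", "https://example.org/guide/maps.xml#road-town"),
    Occurrence("Road Town", "description", "https://example.org/guide/tortola.html#road-town"),
    Occurrence("Spanish Town", "city map", "https://example.org/guide/maps.xml#spanish-town"),
]


def by_class(occs, topic):
    """Group the resource addresses of one topic by occurrence class."""
    grouped = defaultdict(list)
    for o in occs:
        if o.topic == topic:
            grouped[o.occ_class].append(o.uri)
    return dict(grouped)


def fragment(uri):
    """Return the part after '#', which points into the addressed resource."""
    return urlsplit(uri).fragment


print(by_class(occurrences, "Road Town"))
```

Because the resources are only addressed and never copied into the topic map, they can stay in whatever repository they already live in.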

Associations: As we may think
Associations provide the context information necessary to better understand a topic. Associations simulate the way humans think and as such are essential for knowledge modeling. They establish relationships between topics.

The number of topics related by one association is not limited: it could be only one topic (rather academic), two topics (so-called binary associations, which are the most common), three topics, or as many as the application requires. An association can be an instance of one class.

Associations do not imply a direction of the relationship. Associations are assertions, statements, which are valid independently from the direction you traverse them. But how do we know what roles the topics play in the relationship?

The concept of association roles provides the missing piece. Figure 5 illustrates that the association roles (“Ferry Line” and “Harbour Stop”) placed between the topics (“Speedy’s,” “Spanish Town,” and “Road Harbour”) and the association (which is of the class “Ferry Connection”) explain what the topics “are doing” in the association. The topics play certain roles in the association.

Figure 5. Example Showing a Ternary Association and Its Roles
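
The ferry example can be sketched in a few lines of Python. The `Association` class and its methods are hypothetical names chosen for this illustration; only the topics, roles, and association class come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    assoc_class: str
    # (role, topic) pairs; their order carries no meaning, so no direction is implied.
    members: tuple[tuple[str, str], ...]

    def players(self, role):
        """All topics playing the given role in this association."""
        return sorted(topic for r, topic in self.members if r == role)

    def roles_of(self, topic):
        """All roles the given topic plays in this association."""
        return sorted(r for r, t in self.members if t == topic)


ferry = Association("Ferry Connection", (
    ("Ferry Line", "Speedy's"),
    ("Harbour Stop", "Spanish Town"),
    ("Harbour Stop", "Road Harbour"),
))

print(ferry.players("Harbour Stop"))
print(ferry.roles_of("Speedy's"))
```

The association can be entered from any of its three topics, and the roles tell us what each of the other topics contributes to the relationship.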

Classes: Organizing principles
We learned that topics, occurrences, and associations might be instances of classes. All these classes are represented by topics. These “classing” topics might, again, be instances of other classes, which means that a topic could be a class and an instance at the same time.

But an application needs more than just the two layers of classes and their instances. If you want to model a taxonomy or classification schema, you need a class hierarchy consisting of classes organized as superclasses and subclasses. For example, the “Bay” class has subclasses “Bay for swimming” and “Anchor bay,” and the “Island” class has the superclass “Land.”

Class hierarchies can be built for topic classes (for example, “Eating place-Restaurant-Gourmet temple”), association classes (for example, “Eating place provides Food-Restaurant serves Meal-Gourmet temple celebrates Five course menu”), and occurrence classes (for example, “Image-Photograph-Food photography”).
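
A class hierarchy lets an application conclude that an instance of a subclass is also an instance of every superclass. The Python sketch below illustrates this under simplifying assumptions: each class has at most one superclass and each topic one class (a topic map allows several), and the classification of Gorda Sound as an “Anchor bay” is invented for the example.

```python
# Subclass -> superclass, using the class names from the examples above.
superclass_of = {
    "Bay for swimming": "Bay",
    "Anchor bay": "Bay",
    "Island": "Land",
    "Gourmet temple": "Restaurant",
    "Restaurant": "Eating place",
}

# Topic -> class; the class of Gorda Sound is an assumption for illustration.
instance_of = {"Virgin Gorda": "Island", "Gorda Sound": "Anchor bay"}


def ancestors(cls):
    """Walk up the hierarchy and collect all superclasses, nearest first."""
    chain = []
    while cls in superclass_of:
        cls = superclass_of[cls]
        chain.append(cls)
    return chain


def is_a(topic, cls):
    """True if the topic is an instance of cls, directly or through a superclass."""
    direct = instance_of.get(topic)
    return direct is not None and (direct == cls or cls in ancestors(direct))


print(ancestors("Gourmet temple"))
print(is_a("Gorda Sound", "Bay"))
```

A query for all bays would then also return topics that were only classified with one of the more specific bay classes.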

Modeling viewpoints with scope
The concept of scope was added to the topic maps paradigm to deal with the fact that there is rarely one view of the “world,” that is, the application domain. Different people expect different assertions about different subjects. Typically, scopes are used to model:

  • Languages: Names and occurrences are scoped by the language they are in.
  • Access rights: Occurrences and associations provide access to further information, to resources or other topics. Scopes on such occurrences or associations can declare the level of confidentiality or which user groups have the right to access this information.
  • Views: Associations, as well as occurrences, provide context information about a given topic. Scopes can declare in which contexts this further information is meaningful. The context could be the skill levels of the user, the various interests of the users, or the effectivity/validity of an assertion.

Scopes help to filter the “noise” in large topic maps and allow us to concentrate on the interesting parts. They help to build semantic slices through the topic map.

Figure 6 shows various scope sets assigned to the names, occurrences, and associations of an example topic map. Figure 7 shows the same topic map with the applied scope setting “English or Public or Politics.” All topic characteristics that are not in one of these scopes are hidden from the user. Nevertheless, they are still part of the topic map.

Figure 6. Scope Examples

Figure 7. Example of Topic Map with Applied Scope Setting “English or Public or Politics”
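
Applying a scope setting such as “English or Public or Politics” amounts to a simple filter. The Python sketch below is one possible reading: a characteristic is shown if at least one of its scoping themes is selected, and characteristics without any scope are treated as always visible (an assumption made here). The `Characteristic` class, the sample data, and the `example.org` addresses are hypothetical and do not reproduce the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristic:
    kind: str                        # "name", "occurrence", or "association"
    value: str
    scope: frozenset = frozenset()   # empty scope = valid in every context (assumption)


def visible(items, selected):
    """Keep characteristics that are unscoped or share a theme with the selection."""
    selected = set(selected)
    return [c for c in items if not c.scope or c.scope & selected]


# Illustrative data only.
items = [
    Characteristic("name", "Virgin Islands", frozenset({"English"})),
    Characteristic("name", "Jungferninseln", frozenset({"German"})),
    Characteristic("occurrence", "https://example.org/internal/report",
                   frozenset({"Confidential"})),
    Characteristic("occurrence", "https://example.org/guide",
                   frozenset({"Public", "English"})),
]

shown = visible(items, {"English", "Public", "Politics"})
print([c.value for c in shown])
```

Note that `items` is left untouched: the filter hides characteristics from the user, but they remain part of the topic map.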

Merging topic maps
Merging is the process of joining two topic maps or joining two topics. It is built into the topic maps standard because merging of indices was the requirement initiating the development of the topic maps paradigm.

When two topic maps are merged, all topics with the same subject identity or those complying with the topic naming constraint (same name in same scope) are merged. When two topics are merged, the characteristics of the resulting topic are the union (set) of the characteristics of the original topics, which implies that all redundant names, occurrences, and associations will be removed. Figures 8 and 9 show two topic maps before merging and the merged topic map as the result of merging.

Figure 8. Merging Example: Two Topic Maps before Merging

Figure 9. Merging Example: One Topic Map after Merging
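
The two merging rules can be sketched in Python as follows. This is a simplified illustration, not the procedure defined in the standard: the `Topic` class and both functions are hypothetical, scopes are reduced to a single label per name, associations are left out, and the subject identifier and resource addresses are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    identities: set = field(default_factory=set)    # subject identifiers (URIs)
    names: set = field(default_factory=set)         # (name, scope) pairs
    occurrences: set = field(default_factory=set)   # resource addresses


def same_subject(a, b):
    """Shared subject identity, or same name in the same scope."""
    return bool(a.identities & b.identities) or bool(a.names & b.names)


def merge_maps(map_a, map_b):
    """Join two topic maps; sets make the union drop redundant characteristics."""
    merged = []
    for topic in map_a + map_b:
        candidate = Topic(set(topic.identities), set(topic.names), set(topic.occurrences))
        rest = []
        for other in merged:
            if same_subject(other, candidate):
                # Absorb the matching topic into the candidate.
                candidate.identities |= other.identities
                candidate.names |= other.names
                candidate.occurrences |= other.occurrences
            else:
                rest.append(other)
        merged = rest + [candidate]
    return merged


ROAD_TOWN = "https://example.org/psi/road-town"   # hypothetical subject identifier

map_a = [
    Topic({ROAD_TOWN}, {("Road Town", "en")}, {"guide.xml#p71"}),
    Topic(set(), {("Gorda Sound", "en")}, {"guide.xml#p90"}),
]
map_b = [
    Topic({ROAD_TOWN}, {("Roadtown", "en")}, {"guide.xml#p71", "charts.xml#c3"}),
    Topic(set(), {("Gorda Sound", "en"), ("North Sound", "en")}, set()),
    Topic(set(), {("Spanish Town", "en")}, set()),
]

merged = merge_maps(map_a, map_b)
print(len(merged))
```

Here the two Road Town topics merge because they share a subject identity, the two Gorda Sound topics merge because they share a name in the same scope, and Spanish Town stays as it is.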

Family of Topic Maps Standards

The ISO committee JTC1 SC34 started two further standard initiatives in 2001: Topic Maps Query Language (TMQL, ISO/IEC 18048) and Topic Maps Constraint Language (TMCL, ISO/IEC 19756). Both are still under development, and first drafts will probably not be published before the end of 2003.

  • TMQL aims to get the same status for topic maps management systems as SQL has for RDBMS: a standardized interface to query, create, and update a topic map.
  • TMCL will define a framework for the definition of topic maps schemas for vertical or domain-specific applications. It will enable semantic validation and guided editing of topic maps.

The committee also decided to define two data models for the topic maps paradigm. The models will provide precise data helping software developers to better understand the topic maps paradigm and to avoid misinterpretation of the standard’s text:

  • Reference Model (RM) defining the underlying foundation for general assertion structures
  • Standard Application Model (SAM) defining the topic maps data model in terms of an information set.

Typical Topic Maps Applications

I’ve shown why topic maps are useful and explained their concepts, but I haven’t talked a lot about possible topic map application scenarios.

Subject classification
The most obvious applications are subject classifications or classification schemas. Classifications are a major approach to organize resources and to simplify access to them. They are a key feature of knowledge management.

The topic map fulfils two functions at the same time: it represents the classification schema with its classes, class hierarchy, class codes, and cross relationships, and it assigns the resources to the schema. Because these occurrences can point to various repositories, a topic map-based classification can easily span multiple systems, providing one classified view on all resources.

Knowledge representation and ontologies
The British Virgin Islands examples represented the knowledge about that specific application domain. It is a simple ontology, an explicit model of the domain knowledge.

A more typical application of knowledge representation is the corporate memory used in enterprise knowledge management. It models the knowledge about products, projects, people, policies, processes, and practices and provides it to employees.

Document management and content management
Topic maps embedded into document-management, content-management, and Web content-management systems help to organize and manage the information objects and to simplify the publication process. Multiple classifications (instead of the ordinary one-dimensional hierarchical directory tree) model the various business views on the information objects. Every information object carries a set of metadata values, which define all business information about the object in a declarative way. Templates use this metadata to define how publications are assembled from information objects, ensuring the highest degree of content reuse and flexible publication of many document variants, as content-management systems have promised for a long time.

Search engines
Complete Web portals could be described as topic maps, simplifying their creation and maintenance and improving the consistency and quality of the site.

“Searching powered by topic maps” could be the slogan of search engines already using the paradigm to improve their query results. Intelligent “find” technologies are the result. But not every search technology is prepared to benefit from topic maps. Only those technologies that make use of explicit knowledge models can easily migrate to topic maps. Others based on, for example, statistical algorithms do not have a knowledge model and, consequently, cannot apply the paradigm.

Typical knowledge models of intelligent search engines are based on concept hierarchies with synonyms and, possibly, weighted similarities between concepts. With topic maps, the knowledge models could be represented in a standardized notation instead of in a proprietary format.
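
How such a knowledge model can improve query results is easiest to see in a small query-expansion sketch. The Python below is a toy illustration and not how any particular search engine works: the concept hierarchy reuses class names from earlier examples, the documents are invented, matching is naive substring search, and weighted similarities are left out.

```python
# Synonyms (kept symmetric) and narrower concepts; all data is illustrative.
synonyms = {"taxi": ["cab"], "cab": ["taxi"]}
narrower = {"eating place": ["restaurant"], "restaurant": ["gourmet temple"]}

docs = {
    "d1": "Call a cab at the ferry dock.",
    "d2": "The gourmet temple on the hill needs a reservation.",
    "d3": "Snorkelling at the reef.",
}


def expand(term):
    """Return the term plus all synonyms and narrower concepts reachable from it."""
    seen, todo = set(), [term.lower()]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        todo.extend(synonyms.get(current, ()))
        todo.extend(narrower.get(current, ()))
    return seen


def search(query, documents):
    """Return ids of documents containing any expanded term."""
    terms = expand(query)
    return [doc_id for doc_id, text in documents.items()
            if any(t in text.lower() for t in terms)]


print(search("taxi", docs))
print(search("eating place", docs))
```

A plain keyword search for “taxi” or “eating place” would find nothing in these documents; the explicit model is what connects the query to the texts.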

Who Is Using Topic Maps?

Topic maps are quite a new phenomenon, but several industries already apply them or will soon make use of them. The flexibility and expressiveness of topic maps, as well as the fact that they are an ISO standard, make them very attractive.

Commercial publishers are very interested, because topic maps give them a standard at hand to add value to their content.

Web portal providers use topic maps to organize their Web site and to provide clear and consistent navigation patterns.

Industry applies topic maps in call centers to support the call center agent or the customer. A corporate memory is a further application in the industrial sector. And very innovative companies already use topic maps as a next-generation content-management paradigm, or they are at least investigating the approach.

Conclusions

Topic maps are the “GPS of the Information Universe.” Coming from back-of-book indices, the paradigm defines the necessary concepts to model explicit knowledge structures over resources. Topic maps are simple but not too simple. They are lightweight but have the potential to grow together with the demands of the information age. They are an ISO International Standard, which ensures stability, reliability, and openness, all of which are important for a secure information technology investment.

Possible topic maps applications range from a simple electronic index or thesaurus to subject classification and Web site organization to knowledge representation and ontologies. Topic maps can be the “brains” of intelligent search engines. Corporate memory and other key aspects of a knowledge-management solution can also benefit from topic maps. Although quite a new standard, topic maps have left the “ivory tower” of theoretical thinking and have become a “real world” phenomenon, with real products, real projects, real solutions, and a growing body of user experience.

In this article, I explained the topic maps paradigm. But the paradigm is a technology, not a solution. The design, creation, and maintenance of topic maps, their implementation in topic maps software, and their integration with other software components are all necessary tasks in building a complete solution.
