The Information Developer’s Ontology
As information developers we face challenges in organizing information effectively and making it accessible to customers. We need a standard set of domain-specific terminology to work with so that readers are not confused by inconsistently used terms. We need to produce good indexes for PDF and print documents that reveal the structure of the documents and help readers find what they need. We need to design online navigational structures that allow users to browse to the content they want to read.
To satisfy our users’ need to access information efficiently, we are told to create metadata, taxonomies, indexes, keywords, topic maps, tables, graphics, and more. We are confused by the myriad ways to facilitate access. But to represent our information so that it is as accessible as possible to users, we need to understand that each information set is part of some domain of knowledge that has a unique natural structure. Neither information specialists nor software can dictate how best to classify information; instead, each of us as information developers must understand the information domain in which we work and discover its natural structure. We can represent this natural structure as an ontology.
Recently, I have attended some lectures and read a few articles about ontologies. All of the speakers and authors stressed the importance of ontologies in the organization and retrieval of information. All of them emphasized how profound the concept of ontology was. I could not understand any of it. I began to suspect that the speakers and authors didn’t understand the concept of ontology very well either!
What Is an Ontology?
Following is a selection of attempted definitions. Wikipedia gives separate philosophical and computer science definitions. From philosophy, “Ontology is the study of being or existence.”
The computer science definition given by Wikipedia is “a description of the concepts and relationships that can exist for an agent or a community of agents” (attributed to T. R. Gruber). Outside of Wikipedia, Natalya Noy and Deborah McGuinness provide a somewhat more understandable definition (also attributed to T. R. Gruber): “An ontology is an explicit formal specification of the terms in a domain and relations between them.” Note that in philosophy, ontology is a discipline (the study of something), while in computer science it is a specification. I hope this article will show you how the philosophical and computer science uses of the term relate, as I finally came to understand them while preparing it. None of these definitions is very action oriented. I’ve done some research on this scary-looking term, and I hope I can explain, in plain English, what we as information developers need to know.
Type “Ontology” into Google, and you get 48 million hits. (“Apple Pie” gets only 20 million hits.) Obviously, lots of people are using the word. I’ve waded through a number of rather academic and abstruse articles about ontology so that you don’t have to. I think the best way to understand the concept of ontology is through examples. It turns out that many of us are creating ontologies without knowing it.
Empedocles
[Figure: the elements Fire, Water, Earth, Air, and Aether]
An early example of an ontology is the work of the Greek philosopher, Empedocles, about 450 BC. Empedocles classified everything in the universe into four categories: earth, water, air, and fire. It’s obvious how to classify things like solids, liquids, or gases in his scheme. However, things like the human body are more complex but can still be classified. The human body is a combination. Our flesh is earth, our blood is water, our breath is air, and our body warmth is fire. Of course, Empedocles knew that blood differed from water, but even blood might be a combination of earth and water. Aristotle added a fifth classification, Aether, to describe starlight and the celestial sphere which were forever out of reach. These concepts may all seem old fashioned and quaint by our standards, but if we put them into modern language, we get solids, liquids, gases, and energy, still convenient classifications for chemists and physicists.
Why is the Empedocles scheme an ontology? A requirement for an ontology is that it is able to classify everything in the universe or at least the universe of a single domain. Empedocles intended his scheme to include everything in the universe. His ontology is not just a sorting of items but represents a statement about the structure of the universe. That’s how his ontology relates to the philosophical meaning of ontology: it is a study of what exists.
The Taxonomy of Carl Linnaeus
In much more modern times, the 18th-century Swedish botanist Carl Linnaeus developed an ontology for all living things. Linnaeus had no clue about evolution, since Charles Darwin lived a century later, but he noticed that all plants and animals had close similarities with other plants and animals. He believed that God had achieved diversity by making small changes to the basic structure of organisms. Linnaeus chose the similarities in plant and animal reproduction as well as other similarities to develop his classification. The Linnaeus scheme is a taxonomy. I use the narrow definition of taxonomy as a tree structure in this article rather than its broader use as a collection of metadata. His most general classification divided all of life into two kingdoms, animal and plant (he grouped fungi with the plants; a separate fungus kingdom came much later). All organisms fall under one of these kingdoms. Because his taxonomy had a place for all known organisms as well as organisms not yet discovered, he had created a true ontology. The structure of his classification differed from that of Empedocles because he continued to make finer and finer divisions. Linnaeus divided the kingdoms into classes, orders, genera, and species; in the modern form of his scheme, kingdoms are divided into phyla, which are divided into classes, and then into orders, families, genera, and finally species. Figure 1 is an example of the classification for humans.
The American crow and humans are classified in the same phylum, Chordata. Remember that this classification in no way implies any evolutionary development; Linnaeus was unaware of any of the many species of prehistoric humans!
Figure 1. Relationship of American Crows and Humans
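For readers who think in data structures, the tree behind Figure 1 can be sketched in a few lines of Python. This is only an illustration, not anything Linnaeus or a biological database prescribes: the names RANKS, LINEAGES, shared_ranks, and lowest_common_taxon are my own, and the ranks are those of the modern form of the hierarchy. Each organism is stored as its path from the root of the tree, and the point where two paths part shows how closely the organisms are classified.

```python
# Ranks in the modern form of the Linnaean hierarchy, most general first.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Each organism is stored as its path from the root of the tree (illustrative data).
LINEAGES = {
    "human": ["Animalia", "Chordata", "Mammalia", "Primates",
              "Hominidae", "Homo", "Homo sapiens"],
    "American crow": ["Animalia", "Chordata", "Aves", "Passeriformes",
                      "Corvidae", "Corvus", "Corvus brachyrhynchos"],
}


def shared_ranks(a, b):
    """Return the (rank, taxon) pairs two organisms share, walking down from the root."""
    shared = []
    for rank, taxon_a, taxon_b in zip(RANKS, LINEAGES[a], LINEAGES[b]):
        if taxon_a != taxon_b:
            break  # the two paths diverge here
        shared.append((rank, taxon_a))
    return shared


def lowest_common_taxon(a, b):
    """Return the most specific (rank, taxon) shared by both organisms, or None."""
    shared = shared_ranks(a, b)
    return shared[-1] if shared else None


print(lowest_common_taxon("human", "American crow"))  # ('phylum', 'Chordata')
```

Storing whole paths rather than parent pointers keeps the sketch short; a larger taxonomy would more likely store each taxon once with a link to its parent.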
Figure 2. First four rows of the Periodic Table
Dmitri Mendeleev’s Table
A third example of an ontology is the Periodic Table of the Elements, the first four rows of which are included in Figure 2. In 1869, Dmitri Mendeleev created a table of all of the known elements. He ordered the elements in the periodic table by their atomic weight, that is, the relative weight of a single atom. Hydrogen has an atomic weight of about 1, helium 4, lithium 7, and so on. He found that if he properly organized the rows and columns in his table, he could create columns of elements with similar chemical properties. If he repeated his rows properly, hydrogen, lithium, sodium, and potassium would all be in the same column. He arranged the table this way to demonstrate that all elements in the same column have similar chemical properties.
Mendeleev’s table is an ontology. All known elements were included. Many new elements have been added since, and they all fit perfectly in the table. Mendeleev’s table was created at a time when nothing was known about the internal structure of atoms. The electron was discovered in 1897, and the proton was identified about two decades later. The periodic table became instrumental to our understanding of the internal structure of the atom after the discovery of the electron, proton, and neutron.
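The row-and-column structure of the table can be sketched as a simple lookup. The sketch below is an illustration under stated assumptions: it uses the modern group numbers (1 to 18) rather than Mendeleev’s original layout, it covers only a handful of elements from the first four rows, and the names POSITIONS and column are mine. Reading down a column returns elements with similar chemical properties.

```python
# (period, group) positions for a few elements from the first four rows.
# period = row, group = column, using modern group numbering (an assumption;
# Mendeleev's own table was laid out differently).
POSITIONS = {
    "H": (1, 1), "He": (1, 18),
    "Li": (2, 1), "Be": (2, 2), "F": (2, 17), "Ne": (2, 18),
    "Na": (3, 1), "Mg": (3, 2), "Cl": (3, 17), "Ar": (3, 18),
    "K": (4, 1), "Ca": (4, 2), "Br": (4, 17), "Kr": (4, 18),
}


def column(group):
    """Return the symbols in one column of the table, from the top row down."""
    in_order = sorted(POSITIONS.items(), key=lambda item: item[1])
    return [symbol for symbol, (_period, g) in in_order if g == group]


print(column(1))   # ['H', 'Li', 'Na', 'K']
print(column(17))  # ['F', 'Cl', 'Br']
```

The position of an element carries the information here: nothing in the data says that sodium behaves like potassium, yet the shared column does.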
The Nortel Information Model
A corporate example of an ontology comes from Nortel in the 1990s. To promote the use of its products in the telecommunications industry, Nortel wanted all of its information organized in a way that would give its diverse user community the greatest possible access. The Nortel ontology is in the form of a three-dimensional table.
Each column of the table in Figure 3 represents a kind of activity related to Nortel’s telecom hardware and software. These activities are job related. A telecom user will probably be interested in only a few of the columns, but all possible telecom activities are listed. Some column names are Customer Support, Technology Fundamentals, Plan & Engineer, Install Hardware, and Install Software.
Figure 3. The Nortel Networks Information Model
The table rows are organized by category of use. Some row headings are Fundamentals (concepts), Tasks, Troubleshoot, Tools and Utilities, Verify, etc.
Each of Nortel’s products is represented by a complete table. The structure of activities for all of Nortel’s products is therefore three-dimensional. The model is a model of activities structured in the way its product users perform their jobs. Nortel was able to use the model to develop and structure its information sets to match the natural information needs of its customers.
As new products are developed, activities related to those new products will fit into this ontology as another two-dimensional sheet.
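The three-dimensional structure described above can be sketched as one two-dimensional sheet per product. This is a minimal illustration, not Nortel’s actual implementation: the class InformationModel, its methods, the product names, and the topic title are all hypothetical, and only the column and row headings named in this article are used.

```python
# Column and row headings taken from the article; the full model had more.
ACTIVITIES = ["Customer Support", "Technology Fundamentals", "Plan & Engineer",
              "Install Hardware", "Install Software"]
CATEGORIES = ["Fundamentals", "Tasks", "Troubleshoot", "Tools and Utilities", "Verify"]


class InformationModel:
    """A hypothetical activity-by-category grid, repeated once per product."""

    def __init__(self, activities, categories):
        self.activities = list(activities)
        self.categories = list(categories)
        self.sheets = {}  # product -> {(activity, category): [topic titles]}

    def add_product(self, product):
        """Add a new product as another empty two-dimensional sheet."""
        self.sheets[product] = {
            (a, c): [] for a in self.activities for c in self.categories
        }

    def file_topic(self, product, activity, category, title):
        """File a topic in an existing cell; an unknown cell raises KeyError."""
        self.sheets[product][(activity, category)].append(title)

    def topics(self, product, activity, category):
        """Return a copy of the topic titles filed in one cell."""
        return list(self.sheets[product][(activity, category)])


model = InformationModel(ACTIVITIES, CATEGORIES)
model.add_product("Product A")  # hypothetical product name
model.file_topic("Product A", "Install Hardware", "Tasks", "Mount the shelf")
model.add_product("Product B")  # a new product reuses the same grid
print(model.topics("Product A", "Install Hardware", "Tasks"))
```

Letting file_topic fail on an unknown activity or category is a deliberate choice in this sketch: the grid, not the individual writer, decides where a topic can live.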
What Do Ontologies Have in Common?
What can we say about ontologies from these examples?
- Ontologies are more than collections of entities. They can contain a place for all of the entities known in a domain as well as entities not yet existing or discovered.
- The structure of an ontology contains more information than just the sum of its entities. In the case of the Linnaeus taxonomy, Darwin was able to use this ontology along with his own observations to understand that the taxonomy was a map of the evolution of organisms. The parent-child relationship exhibited by the Linnaeus taxonomy was not only a convenient way to classify similar organisms but actually a description of the descent of each organism from its ancestors. The structure of the periodic table provides valuable clues to an understanding of the internal structure of the atom even though Mendeleev had no idea that atoms even had an internal structure. Not all ontologies will lead to scientific revolutions, but all have a natural structure.
- The content of a domain drives the structure of the ontology. A taxonomy would not have worked for the periodic table, and a table would be a poor choice for the classification of organisms. Of course, we can always display a taxonomy graphically as a table or a table as a taxonomy. The display format, however, doesn’t change the structure of the ontology.
- The terminology used in an ontology is common to all of its users. In the case of my scientific examples, international scientific organizations manage the terminology. When a new organism is discovered, all biologists agree on its Latin name. That doesn’t mean that the terminology is static. When the panda was determined to be genetically a bear, its place in the taxonomy had to be changed. Changes are continuously being made to the placement of organisms as scientists refine their genetic relationships.
How Can Information Developers Use the Concept of an Ontology?
So what does all of this have to do with the information-development field? We are classifying and structuring information all the time. We develop navigation schemes for our data, whether on paper or online. Many of the structures we use to create our navigation schemes might be used to create an ontology, if appropriate to the domain structure. Both the Dewey Decimal Classification and the Library of Congress Classification are ontologies. Any book ever written can fit into their classifications.
Most of us have created indexes. These are valuable navigational tools but usually are not ontologies because they apply only to one document. But anyone who has created an index or used an index knows that they have a structure dictated by the content of the document. There are good and bad indexes depending upon how well the index structure matches the natural structure of the content. A good index contains a wealth of information in its structure. In fact, with a good index, it is possible to learn a lot about the content of a book just by studying the index.
Topic maps are like indexes except they are normally thought of as online navigational tools. Unlike indexes, topic maps are not tied to particular documents but instead apply to a domain of information. Therefore, topic maps are ontologies. Like indexes, they can have a wide variety of structures.
We have all been frustrated both as users and information developers by the difficulty of using navigation schemes. In fact, some of us even favor a full-text search over navigation because of this frustration. At least with a full text search we feel in control!
Recently, I visited a music store just before Christmas because I wanted to buy a CD of George Frideric Handel’s Messiah for a gift. As I browsed the store’s collections, I found that all the CDs were organized by performer. This organization was used for classical as well as popular music. Since Handel no longer performs his own works, I was frustrated. Finally I left to find a store where music was organized by composer. The manager of the store must have been puzzled about why his classical music sales were so low. By not understanding the differences in structure between classical and popular music information, the store management made half of its offerings inaccessible. I think we can learn some lessons from the concept of ontology to improve the navigation to content as well as the structure of the writing we create for ourselves and our customers.
Know the information you will be writing about. Too often, information developers try to paraphrase information about products from product developers rather than gain an understanding of the information themselves. They let the structure of the information default to the product developers, who may have no idea that the notes they are providing will define the design as well as the content that will reach the users of the information. Product developers rarely view information in the same way as users, and they often make poor information designers.
Create a terminology base for your information. Product developers may use a variety of terminology but they have no responsibility to develop domain-oriented terminology that will be natural for users of the domain information. Unlike the sciences, no governing body exists that defines terminology for most domains, and therefore for most companies’ information. Usually, that’s the job of the information developers. We see information from the same company and even for the same product with multiple terms for the same part or process. Just because two product developers use different terminologies does not mean that users should have to be exposed to the same terminology confusion. Uncontrolled terminology will result in greater translation expense and translation errors as well. Information developers should not begin to design their documents and navigation until a terminology base for all languages along with a corresponding glossary is in place.
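A terminology base does not need to start out as anything elaborate. The sketch below is a minimal illustration, assuming a single language and a hand-built list: the terms in TERM_BASE are invented examples rather than terminology from any real product, and the function names are mine. It maps each preferred term to the variants writers should avoid and then flags those variants in a draft.

```python
import re

# Hypothetical term base: preferred term -> variants to avoid.
TERM_BASE = {
    "line card": ["linecard", "circuit pack", "line module"],
    "power supply": ["power unit", "PSU"],
}


def build_variant_index(term_base):
    """Map each lowercased variant to its preferred term."""
    index = {}
    for preferred, variants in term_base.items():
        for variant in variants:
            index[variant.lower()] = preferred
    return index


def find_nonpreferred(text, term_base):
    """Return (offset, term as written, preferred term) for each variant found."""
    findings = []
    for variant, preferred in build_variant_index(term_base).items():
        # Whole-word, case-insensitive match so "PSU" does not match inside other words.
        pattern = r"\b" + re.escape(variant) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append((match.start(), match.group(0), preferred))
    return sorted(findings)


draft = "Insert the circuit pack, then seat the Linecard."
for offset, written, preferred in find_nonpreferred(draft, TERM_BASE):
    print(f"{offset}: replace '{written}' with '{preferred}'")
```

A real terminology base would also carry definitions and the approved equivalents in each target language; this sketch only shows the consistency check.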
Understand the natural structure of your information. You will need to understand the information in the domain before you start designing. If you lack extensive experience in the domain information, you will need to gain insight from professionals who may also be users of your company’s information. Every information domain has a structure or structures. It is a rewarding challenge to discover them.
Design your documents and navigation based on the natural structure of your information. With an understanding of the natural structure, you will find that your information will be better designed and more accessible to your users. Though we are tempted to look for software solutions to design indexes and topic maps, a human professional who understands the domain information is still the best designer of documents and navigation.
In today’s hectic world of information development, we are pressured to meet budgetary and staffing restrictions. We have product deadlines. After all, we are here to help our companies make profits. But with a little consideration of some of the issues of ontologies and the natural structures of information, we can make our jobs more rewarding and make our information products more accessible to our users.