The Advent of the Semantic Web, Part One: Simulacrum Symbiosis
In 1980, I moved to San Francisco, where I lived for two years before returning to Los Angeles. I rented a third-story walk-up apartment and, shortly after moving in, noticed an odd pair of cabinet doors set low in the wall beneath the windowsill in the small dining room that adjoined the kitchen. When I opened the doors, I was surprised to find a small metal enclosure, dotted with half-dollar size holes, cantilevered outside the building. Later, I learned that this cabinet was a kind of urban root cellar in common use at the time the building was erected in the early 1900s. Housewives would store potatoes and other tuberous vegetables in these cabinets. The cool, moist air rolling in off the bay flowed through the ventilation holes in this otherwise dark enclosure, simulating the environment of a basement storage bin.
My grandmother kept food cold by storing it in a tightly sealed cabinet, which contained a large block of ice: an icebox. Regular deliveries of freshly cut blocks replenished the melting ice, maintaining a safe storage temperature for perishable foods. The household delivery of ice was backbreaking, repetitive work for the “icemen” who undertook it. And, it was also a lot of work for Grandma, who had to manage the whole process, keeping a constant vigil to see that the ice did not melt more quickly than anticipated and ordering extra ice if necessary.
I’ve never met an iceman, in the literal sense of the word. I grew up in a house with a refrigerator, my family’s mechanical iceman. When the temperature inside the refrigerator rose above a preset value, its thermostat activated a system of compressed Freon to replenish the cold that helped to preserve our perishable food. Most of us enjoy the same convenience today, but we probably don’t think of our refrigerators as servants or smart machines. Yet they are, to some degree, both of these things. Grandma was no dummy, but the icebox required her absolute and constant attention. One slip-up and the week’s groceries would be ruined. If she were alive today, she might not be able to break the habit of continuously checking the icebox; she’d likely never get used to calling it a refrigerator, either. But if only she could be convinced to give up her assiduous practice, she could do so without putting the food supply in jeopardy, secure in the knowledge that the refrigerator is “smart” enough and servile enough to take on the job. My grandmother did actually live out the end of her days in a house with electricity, but she never really got the hang of it. She kept the holes of the electrical outlets covered with tape. When asked why, she replied, “So the electricity won’t leak out.”
The body electric
Since long before the digital and industrial revolutions, we have sought to create thinking machines, some of them in our own image. In her book Edison’s Eve (Knopf 2002), Gaby Wood writes of the twin automata of Neuchâtel, Switzerland. Since they were first exhibited in 1774, these extremely lifelike automated effigies of two young boys have faithfully performed for audiences traveling from afar to see them; one writing several lines of text using a quill pen and the other drawing flawless portraits of kings Louis XV and George III. Wood states, “…the writer… communicates to its audience an eerie philosophical joke: ‘I think,’ it writes, ‘therefore I am…’ these artificial beings have enchanted, frightened, and perplexed their viewers.”1
In spite of their amazing performances, the twin mechanical boys are not really human; in fact, they’re not even thinking machines. The little boy may appear to taunt us with his prose, but in fact, it is its maker, Pierre Jaquet-Droz, who stings us with such deliciously ironic humor. In the time of Jaquet-Droz, information was a rare commodity. Today, information is everywhere, but we are, at times, adrift in it, bereft of understanding. In his book Information Anxiety 2, Richard Saul Wurman notes that, “A weekday edition of The New York Times contains more information than the average person was likely to come across in a lifetime in seventeenth-century England.”2
The Internet big bang of the information age occurred less than a decade ago. We’re still struggling to discern meaning from the bombardment of stimuli coming at us seemingly from all directions at once. We develop methodologies of information architecture, structured languages, such as XML, and metadata attributes to help us coalesce understanding from the chaotic data stream. We’ve even become pretty good at manipulating content for reuse and repurposing.3 But, we still grapple with the immensity of our data stores and the difficulty inherent in representing the connotative meaning of their constituent content chunks in a way that can be transmitted fluidly, lucidly, and reliably from machine to machine and person to person. And, all this we do in pursuit of the apotheosis that is wisdom (see Pursuing Wisdom).
The Semantic Web
Your refrigerator, the twin automata of Neuchâtel, and your Web browser all have one thing in common. They process containers without understanding the meaning of the content within. The refrigerator lowers the temperature of its contents, perhaps containers of sauerkraut and milk. But don’t ask it to distinguish between the two, or you’re in for an unpleasant surprise (or the next big food fad). And just as the twins process and display graphical and textual content on demand in print format, your Web browser does the same online and on paper at the click of a mouse.
When the World Wide Web came into being, with its free-form distributed and highly scalable infrastructure, critics argued that the lack of a central standardized classification scheme was its fatal flaw. In libraries, people typically look to the venerable card catalog as a context-rich resource locator to find specific information. The library card catalog is the archetypal service discovery mechanism. It provides a consistent and dependable view of and a navigational system for all of the content in the library. But card catalogs, or for that matter relational database indexes, have at their core a predefined static representation of the information environment they encompass, which makes scalability difficult. Have you ever carefully organized your music CD collection using, perhaps, an incremental numeric system only to discover later that it was nearly impossible to add new titles to the rigid organizational structure that you had defined?
When he invented the World Wide Web, Tim Berners-Lee envisioned a scalable, interconnected, and ubiquitously accessible collection of information woven together to form unfettered streams of knowledge. Achieving this vision required some compromise. At the expense of consistency, the Web enjoys untrammeled scalability. Anne Jordan-Baker of the A. C. Buehler Library reports, “According to the Chicago Tribune, as of mid-1999, there were approximately 800 million publicly indexable web pages (about 15 terabytes).”5 In Internet time, a four-year-old statistic is ancient history; the size of the Web has grown significantly since 1999, keeping pace with the exponential growth in the total number of Web users. “At the end of 1995… there were about 16 million users… In early 2001 there were over 400 million; reliable forecasts point to a billion users in 2005.”6 The lack of consistency in the information environment of the Web is amply compensated for by its inherent organic scalability.
But in spite of the unfathomably deep and broad collection of information, the infrastructure of the Web is a “dumb” system, little more than a refrigerator cooling its containers of sauerkraut and milk. HTML containers hold the content of the Web in place without understanding the meaning of or relationship between individual blocks of information. XML adds a degree of understanding with its extensible tags and attributes that support the use of customized mnemonic naming conventions. But XML alone does not go far enough. It doesn’t support the wrapping of data in envelopes made of relational metadata. This limitation flattens the rich three-dimensional information construct into a mono-planar representation.
Environments are not passive wrappings, but are rather, active processes which are invisible…. The main obstacle to a clear understanding of the effects of the new media is our deeply embedded habit of regarding all phenomena from a fixed point of view.7
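To see the flattening concretely, consider a sketch in Python (the markup below is hypothetical, invented for illustration): XML’s mnemonic tag names let a program retrieve labeled chunks of content, but nothing in the markup itself says how those chunks relate to each other, or to anything else on the Web.

```python
# Hypothetical XML markup: the tag names are meaningful to a human reader,
# but they carry no machine-readable relationships.
import xml.etree.ElementTree as ET

doc = """
<page>
  <title>Sir Francis Bacon (1561-1626)</title>
  <subject>Bacon</subject>
  <category>Renaissance author</category>
</page>
"""

root = ET.fromstring(doc)

# A program can retrieve the labeled content chunks...
for child in root:
    print(child.tag, "->", child.text)

# ...but whether <subject>Bacon</subject> names a philosopher, an actor,
# or a breakfast food lives only in the human reader's head. The markup
# encodes containment, not meaning.
```

The program walks the tree and prints each tag with its text, which is the full extent of the “understanding” the markup alone can support.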
Search engines continue to evolve and improve but still often serve up an unpredictable menu of selections. As a service discovery mechanism, search engines still have a long way to go in terms of providing relevant and highly useful access points to information pathways.
For example, my interest in Sir Francis Bacon might lead me to enter the word “bacon” into a Web search engine. Even using the best available search engine, such a query is likely to produce something similar to the following:
Search for Bacon returned 2,130,000 results.
Results 1 – 10:
The Oracle of Bacon at University of Virginia…. If you like the Kevin Bacon game…
Allyn & Bacon/Longman: Publishers of College Textbooks &…
ilovebacon.com-dumb but fun
Sir Francis Bacon (1561-1626) Sir Francis Bacon, Renaissance author…
Francis Bacon Image Gallery… images biography… themes… articles… newfound… photo…
Allyn & Bacon’s Sociology Links Home Page
Allyn & Bacon Public Speaking Website
The Official Bacon Brothers Fansite with the latest info on Kevin…
Davis-Bacon Wage Determinations
The Davis-Bacon Wage Determinations contained on this Web site are wage determinations…
This query returned over two million hits. Fortunately, from the first ten of them we can draw inferences as to the meaning of the referenced Web pages and deduce that the fourth item in the list is most likely the correct choice. We know this because of our ability to infer context from a variety of loosely implied clues embedded in a list of Web links otherwise denuded of meaning. But what would happen if we asked a machine, a computer program, to make this kind of differentiated choice or, worse, a series of interrelated decisions based on this kind of information?
Imagine the following scenario. David and JoAnn decide to meet for lunch to discuss the upcoming Content Management Strategies conference. Both of them maintain very busy schedules. JoAnn lives in Denver but travels on business quite often. David lives in Los Angeles but spends a lot of time in Sedona, Arizona. David also travels often, for both business and pleasure, but less frequently than JoAnn. Both David and JoAnn have specific dietary requirements and culinary preferences. David steadfastly avoids eating pork products but has decided to splurge and give in to a craving for a Bacon, Lettuce, and Tomato sandwich. His better judgment prevails in the end, though, and he opts for soy-bacon. Their travel schedules have them both crisscrossing the globe, luckily crossing paths in New York, London, and Charlottesville, Virginia. JoAnn teaches an all-day distance learning class every Tuesday, and David leads a conference call every Wednesday at noon. Now, imagine the spaghetti string of emails, voicemails, Web searches, and crosschecking it would take to arrange such a lunch meeting.
Let’s postulate what might happen if we employed an automated service discovery agent program to arrange this meeting using the unannotated, naked content currently available on the Web.
Bacon, Lettuce, Tomato, New York, London, Charlottesville, NOT Tuesday, NOT Wednesday.
David and JoAnn just might find themselves at a breakfast meeting of the “Six Degrees of Kevin Bacon Club” at the University of Virginia. They’d likely enjoy themselves at such an event but would probably not accomplish the goals they had in mind for their lunch meeting.
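A minimal Python sketch (with invented page text, not real search results) shows why. Reduced to keyword matching over unannotated content, an agent sees only which query terms literally appear on a page; the sense of “bacon,” the pork-free requirement, and the scheduling constraints are all invisible to it.

```python
# Naive keyword matching: the only "understanding" available to an agent
# working from unannotated Web content.
def matching_terms(query_terms, page_text):
    """Return the query terms that literally appear in the page text."""
    return query_terms & set(page_text.lower().split())

query = {"bacon", "lettuce", "tomato", "charlottesville"}

# Invented snippets standing in for two very different Web pages.
fan_club = "six degrees of kevin bacon club breakfast charlottesville virginia"
deli = "blt special bacon lettuce and tomato sandwich on rye"

print(sorted(matching_terms(query, fan_club)))  # ['bacon', 'charlottesville']
print(sorted(matching_terms(query, deli)))      # ['bacon', 'lettuce', 'tomato']

# Both pages "match." Nothing in a bag of words tells the agent that one
# bacon is a surname and the other a sandwich ingredient.
```

To a keyword matcher, both hits are plausible answers; only a human reader, or a semantically annotated Web, can tell them apart.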
The Six Degrees of Kevin Bacon game, whether it is played on the Web or as an unplugged brain teaser, works by drawing inferences from well-defined data sets processed using a set of rules about the information. The Web version, while not specifically an example of Semantic Web technology, does use an algorithm of inference logic in parsing data collected from multiple film databases to connect the dots between Kevin Bacon and any other actor.
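The kind of inference the game relies on can be sketched as a breadth-first search over an actor-film graph. The data below is a toy stand-in (invented films and actors, not a real film database), and the algorithm is a generic shortest-path search, not the Web version’s actual implementation.

```python
from collections import deque

# Toy data set: each film maps to the actors appearing in it.
films = {
    "Film A": {"Kevin Bacon", "Actor X"},
    "Film B": {"Actor X", "Actor Y"},
    "Film C": {"Actor Y", "Actor Z"},
}

def bacon_number(actor, target="Kevin Bacon"):
    """Shortest chain of co-star links from `actor` to `target`, or None."""
    if actor == target:
        return 0
    # Build a co-star adjacency map from the film data.
    costars = {}
    for cast in films.values():
        for a in cast:
            costars.setdefault(a, set()).update(cast - {a})
    # Breadth-first search guarantees the shortest chain is found first.
    seen, queue = {actor}, deque([(actor, 0)])
    while queue:
        current, dist = queue.popleft()
        for nxt in costars.get(current, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(bacon_number("Actor Z"))  # 3: Z -> Y (Film C) -> X (Film B) -> Kevin Bacon (Film A)
```

The “inference” here is entirely mechanical: given well-defined data (casts) and one rule (co-appearance links two actors), connecting the dots is a solved problem. It is the well-defined data set that makes it work.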
The Semantic Web, proposed by Tim Berners-Lee and others leading this initiative, posits that the Web must be understandable to both humans and machines. If David and JoAnn had been able to schedule their lunch meeting using a Semantic Web service discovery agent, it would have used the knowledge represented in a specific ontology to produce the desired result of a lunch meeting that fulfilled all of the specified requirements. A Web page from a restaurant serving Bacon, Lettuce, and Tomato sandwiches would not be confused with pages about Kevin Bacon or Francis Bacon because the Semantic Web agent would understand the purpose and meaning of each of these pages, their sub-elements, and how they correlate to other pages on the Web and their sub-elements.
In the context of the Semantic Web, an ontology is a specification that defines the meanings of a particular set of content objects on the Web and their relationships to each other. An ontology goes far beyond XML attributes and tag naming conventions to encode complex meaning that can be retrieved and used by Semantic Web agents that browse the Web to perform complicated tasks that would otherwise burden humans whose time and attention are better spent elsewhere.
A Semantic Web ontology consists of two components: a taxonomy, which defines classes of objects, and a set of inference rules, which defines the relationships between these objects and classes of objects using a syntax called triples. Each triple expresses a specific relationship between two objects using a coded representation of that knowledge. These triples mirror English language syntax using noun and verb phrases:
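As an illustration, here is a Python sketch of subject-predicate-object triples with one toy inference rule layered on top. The vocabulary (“avoids,” “isMadeOf”) is hypothetical, invented for this example rather than drawn from any real ontology; real Semantic Web triples are expressed in RDF, not Python tuples.

```python
# Each triple reads like a short English sentence: subject, verb, object.
triples = [
    ("David",     "avoids",   "pork"),
    ("bacon",     "isMadeOf", "pork"),
    ("soy-bacon", "isMadeOf", "soy"),
    ("BLT",       "contains", "bacon"),
]

def holds(s, p, o):
    """True if the triple (s, p, o) is asserted in the knowledge base."""
    return (s, p, o) in triples

# Toy inference rule: a person avoids any food made of a substance they avoid.
def avoids(person, food):
    if holds(person, "avoids", food):
        return True
    substances = {o for (_, p, o) in triples if p == "isMadeOf"}
    return any(holds(food, "isMadeOf", sub) and holds(person, "avoids", sub)
               for sub in substances)

print(avoids("David", "bacon"))      # True: bacon isMadeOf pork; David avoids pork
print(avoids("David", "soy-bacon"))  # False: soy-bacon isMadeOf soy
```

No single triple states that David avoids bacon; the agent derives that fact by chaining two asserted triples through the rule. This is the kind of differentiated decision the keyword-matching agent in the lunch scenario could not make.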