[about] access service… samples of what was worth getting and information where to get it. A catalog… continuously updated, in part by the users.
– Stewart Brand2
While this edition of The Last Whole Earth Catalog was not truly the last (in the years and decades since its publication it has been updated, epilogued, and sequeled), today the World Wide Web is our Whole Earth Catalog. The Web is a vast and ever-expanding source of information as well as an access point for the purchase of products. The downside of this limitless digital treasure trove is information overload and anxiety, exacerbated by wide fluctuations in content quality.
Every day, more people connect to the Web. Every day, new Web sites and Web logs (also known as blogs) are posted by anyone with a computer and an Internet connection, many using free or inexpensive software to do so. And every day, the size of the global vox populi information base increases. To make sense of it, when we surf the Web, each of us must use our own judgment to determine what information is credible and personally relevant. It is our ability to discern meaning from textual and graphical objects that allows us to make these differentiations, weaving ad-hoc webs of knowledge from the many undifferentiated Web pages of the Internet. With each surge of data, however, keeping up with the volume of information becomes more difficult.
The Semantic Web initiative proposes to make Web content as understandable to machines as it is to humans. (See part one of this series, “The Advent of the Semantic Web, Part One: Simulacrum Symbiosis,” in the April 2003 issue of Best Practices and online at <www.walske.com>.) Through the use of semi-intelligent agents called service discovery programs, the Semantic Web promises to free us from much of the time-consuming labor of content evaluation that lies between us and access to services, products, and information.
Compare the stated goals of The Last Whole Earth Catalog to those of the Semantic Web, which proposes to provide us with information that, by each individual’s own personal criteria for evaluation, is deemed:
- high quality
- easily available
In his book, The Innovator’s Dilemma (Harper Business 2000), Clayton M. Christensen draws a distinction between sustaining and disruptive technologies. “Most new technologies foster improved product performance. I call these sustaining technologies. Some sustaining technologies can be discontinuous… while others are incremental…. Occasionally, however, disruptive technologies emerge: innovations that result in worse product performance, at least in the near-term.3” Many current mainstream technologies began as disruptive technologies exerting a chaotic effect upon the markets into which they emerged. Digital cameras and cell phones are two notable examples.
The Semantic Web is indeed in the earliest phases of development and may not see broad application for several years. This is not to say, however, that the Semantic Web will remain cloistered for long. The proponents of the Semantic Web envision it as a sustaining technology with revolutionary impact. Its implementation is to be incremental and gradual, a process of intertwining with and strengthening existing Web infrastructure. Compare this to the symbiotic relationship between rainforest lianas (vine-like plants that lack a stabilizing root structure) and the trees upon which they rely for support to reach sunlight far above the jungle floor. These lianas form part of the rainforest’s upper canopy, a web of growth that forms a stabilizing cross grid, bolstering the underlying infrastructure of trees against the shear force of prevailing winds and tropical storms.
XML (eXtensible Markup Language), now an omnipresent buzzword echoing through the corridors of business, is a disruptive technology. The software industry has been scrambling to update existing tools and techniques to embrace XML, sometimes with less than perfect results. XML, as a practical technology, is currently in a precarious state of transition, a state described by Geoffrey A. Moore in Crossing the Chasm (Harper Business 1991) and Inside the Tornado (Harper Business 1995). Moore references the well-established model of the Technology Adoption Lifecycle, originally developed in 1957 at Iowa State College, which divides the technology market into a continuum of five distinct segments: innovators (enthusiasts), early adopters (visionaries), early majority (pragmatists), late majority (conservatives), and laggards (skeptics).
Modifying this model somewhat, Moore identifies a separation in the bell curve of this continuum, which he describes as “the deep and dividing chasm that separates the early adopters from the early majority4” over which new technologies must negotiate safe crossing to survive and prosper. (See Figure 1.) XML is rapidly advancing toward early majority group status; the pragmatists have heard the buzz and have been swept up into the tornado. The Semantic Web currently rests somewhere between innovators and early adopters. And that is exactly where it should be right now. The ultimate success of the Semantic Web depends upon a standardized and stable, widely installed base of XML and related technologies, which have yet to fully emerge.
Figure 1. The Technology Adoption Lifecycle
The construction of the intertwining Semantic Web, already well underway, makes use of established but still developing technologies such as XML, the Resource Description Framework (RDF), and one or more sets of ontological vocabularies, such as those of the Dublin Core Metadata Initiative and the World Wide Web Consortium (W3C) Web Ontology Language (OWL).
“Hey, wait a minute. Wasn’t XML supposed to be the answer to all the ills of the HTML-based Web?” Yes and no. More directly, it’s a good start. XML exemplifies a return from the gold-rush chaos of HTML to the order of a structured, standardized markup language. XML is a simplified subset of SGML (Standard Generalized Markup Language), while HTML is an application of it. HTML went a bit astray and, in fact, was never originally intended to serve some of the transactional business functions that are possible with XML. (For more information on markup languages, such as XML, HTML, and SGML, see the article, “Tools of the Trade: Part Three, XML Transformation,” in the April 2002 issue of Best Practices and online at <www.walske.com>.)
The strictness of XML syntax provides the predictability required for dependable machine processing of Web-based information. At the same time, the flexibility of XML’s extensible tag set makes it a much more suitable language than HTML for parsing information by type; HTML’s limited tag set is hierarchical but otherwise largely non-descriptive. Unlike HTML, XML allows you to adopt a schema for your documents that defines whatever set of tags you find most appropriate for your data and for the way in which you interact with it, both directly and programmatically. XHTML is a sort of hybrid language that applies XML’s strict rules of syntax to HTML’s pre-defined tag set.
XML allows the use of tags and metadata attributes as necessary to better describe the content of individual data elements. In Figure 2, below, the tag names in the XML code fragment sample clearly indicate the nature of the content within. The tag pair <introduction> </introduction> tells us a lot more than the standard HTML tag pair <p> </p>. Of course, if I did choose to retain the standard HTML tag pair, I could always add an attribute to describe the content: <p content="introduction"></p>.
In either case, as a human reader of this code, I understand the meaning of the tags and attributes and can quickly determine which paragraph contains introductory information. For that matter, viewing this information as a rendered Web page, without even inspecting the code, I can surmise the nature of the content. I do this by interpreting the meaning of the English language sentences and making inferential judgments as to the placement of this paragraph in relation to the other objects that precede or follow it. When I use my computer to process this information in some way, the program that reads and processes the code follows the set of rules predefined in my schema.
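This kind of rule-driven processing can be sketched in a few lines of Python, using the standard library’s xml.etree.ElementTree module. The <introduction> tag and the content attribute follow the hypothetical examples above; the document text is invented for illustration:

```python
import xml.etree.ElementTree as ET

# A hypothetical article fragment using a descriptive XML tag set.
doc = """<article>
  <introduction>This article surveys the Semantic Web.</introduction>
  <section>RDF builds on XML.</section>
</article>"""

root = ET.fromstring(doc)
# The descriptive tag name lets a program find the introduction directly.
intro = root.find("introduction").text
print(intro)

# The same content marked up HTML-style, with a describing attribute.
html_style = """<body>
  <p content="introduction">This article surveys the Semantic Web.</p>
  <p>RDF builds on XML.</p>
</body>"""

body = ET.fromstring(html_style)
# Without descriptive tags, the program must inspect attributes instead.
intro2 = next(p.text for p in body.findall("p")
              if p.get("content") == "introduction")
print(intro2)
```

Either route works, but only because the program was told in advance which tag or attribute to look for; that foreknowledge is exactly what breaks down when documents cross schema boundaries.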
But what happens when I need to transfer information from my computer to yours? Because my schema may not be suitable for your data, it is entirely possible that you have defined your own. This would make it difficult or even impossible for me to transfer data from my program to yours. Maybe instead of the tag pair <introduction> </introduction> you use the tag pair <intro> </intro>. How is your computer to know which of the chunks of content sent from my computer is introductory information? Maybe instead of the tag pair <author> </author> you use the tag pair <creator> </creator>. How is your computer to know which of the chunks of content sent from my computer contains the author’s name? At the very least, a transfer of information would require some kind of translation layer to reconcile the differences between our two unique schemas. Multiply this complexity by every possible schema combination in the world and the impracticality is obvious.
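A sketch of what such a translation layer might look like, again in Python with the standard library; the tag names and the mapping between them are hypothetical, agreed upon in advance by the two parties:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from your schema's tag names to mine.
TAG_MAP = {"intro": "introduction", "creator": "author"}

def translate(xml_text, tag_map):
    """Rename elements according to tag_map; leave all content intact."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in tag_map:
            elem.tag = tag_map[elem.tag]
    return ET.tostring(root, encoding="unicode")

yours = "<article><intro>Welcome.</intro><creator>D. Walske</creator></article>"
print(translate(yours, TAG_MAP))
```

The catch is scale: every pair of schemas needs its own hand-negotiated mapping, so with n schemas in the world you face on the order of n(n-1)/2 such translation layers.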
RDF provides a code-based linguistic framework for describing metadata. RDF does not specify a specific vocabulary of metadata but rather a syntax for expressing and interchanging it. In part one of this series (see “The Advent of the Semantic Web, Part One: Simulacrum Symbiosis” in the April 2003 issue of Best Practices), we posed the hypothetical task of arranging a business lunch between two very busy and mobile business travelers. To make the necessary arrangements in person was clearly a waste of time and human potential. But relying on some kind of hypothetical service discovery agent operating on the Web as it exists today appeared likely to be futile. No matter how intelligent the agent program, given the current state of the Web, it would be impossible-or nearly so-for a programmatic process to understand the meaning of many of the textual and graphical elements on the Web that would be required to complete the task successfully.
In the code sample presented in Figure 2, simply determining who wrote the article, as described in the previous section, might prove to be a difficult task. In contrast, RDF syntax provides a reliable and scalable interchange framework for transmitting this information within a heterogeneous network environment. This syntax is based on three core components:
Figure 2. XML Code Fragment Sample
- Resource-Any object identified with a URI (Uniform Resource Identifier). URI examples include a Web page address such as http://www.walske.com as well as sub-page elements, such as a graphic or an individual XML element.
- Property-Based on a specified ontological vocabulary, a property is a description of the purpose or meaning of a resource.
- Statement-A construction that makes a declaration about a resource.
The sample RDF/XML code fragment in Figure 3 uses an RDF triple to declare the author of the article in machine-readable code using a standardized syntax and vocabulary. It states that “David Walske” is the “Creator” of the document article.xml located at “http://www.walske.com.” This can be transmitted unambiguously from one computer system to another because both the sending and receiving computer programs are able to use the same syntax and a specified standardized vocabulary set to understand the statement.
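A minimal sketch of reading such a statement with Python’s standard library. The RDF/XML here follows the W3C RDF syntax and the Dublin Core “creator” element; the exact markup in Figure 3 may differ, and the document URL is illustrative:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# An RDF statement: the resource article.xml has the property
# dc:creator with the value "David Walske".
rdf_xml = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://www.walske.com/article.xml">
    <dc:creator>David Walske</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(rdf_xml)
desc = root.find(f"{{{RDF}}}Description")
subject = desc.get(f"{{{RDF}}}about")          # the resource
creator = desc.find(f"{{{DC}}}creator").text   # the property's value
print(subject, "was created by", creator)
```

Because the namespaces pin down both the syntax (RDF) and the vocabulary (Dublin Core), any receiving program that recognizes those two standards can extract the same triple without prior negotiation.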
Figure 3. RDF/XML Code Fragment Sample
As stated previously, RDF provides only the framework for describing resources. Just as human language requires a specific vocabulary of words to communicate within a defined syntactical structure, RDF statements require a specific ontological vocabulary for machines to understand and communicate meaning. There are any number of possible vocabularies that can be used with RDF. Several initiatives have been underway for some time to develop relevant collections.
The Dublin Core Metadata Initiative (DCMI) began in 1995 with a workshop held in Dublin, Ohio. Since that time, the DCMI has continued as an organization dedicated to developing an interoperable core of metadata standards and specialized metadata vocabularies. The DCMI works in cooperation with many standards groups including the W3C’s RDF working group.
The W3C Web Ontology Working Group is responsible for the development of OWL (Web Ontology Language). The OWL vocabulary supports the requirements of RDF and the Semantic Web. OWL includes three sub-languages: OWL Lite, OWL DL, and OWL Full. The OWL Lite subset provides a minimal classification hierarchy and simple constraints. OWL DL provides deeper expressiveness while maintaining its usefulness as a programmatically resolvable vocabulary. The full OWL vocabulary provides maximum expressiveness but at the loss of computational guarantees.
Keys to Successful Content Management
By embedding intelligence into the data rather than building it into the agent program, the Semantic Web initiative seeks to distribute the complexity of the agent task. And through the use of a common syntax that references multiple standardized vocabularies, the Semantic Web seeks to guarantee interoperability among highly disparate data sources. Imagine building software-user assistance documentation that is dynamically customized to suit the specific needs of each user scenario by drawing reliably upon the full resources of the Web. The Semantic Web, while at present still in the early stages of development, promises to become an important part of information delivery and a key to successful content management in the near future.
About the Author
1 The Last Whole Earth Catalog, Stewart Brand editor. (Random House 1971), ISBN: 0394704592 p.1
2 The Last Whole Earth Catalog, Stewart Brand editor. (Random House 1971), ISBN: 0394704592 p.439
3 The Innovator’s Dilemma, Clayton M. Christensen. (Harper Business 2000), ISBN: 0066620694 p. xv
4 Crossing the Chasm, Geoffrey A. Moore. (Harper Business 1991), p. 20