Hide and Go Seek
I spend a lot of time looking for things. I swear my car keys have a mind of their own and wander off on a regular basis. I never seem to be within 20 feet of a working ballpoint pen. I go days without my electronics because I can’t find the right charger. I spent years of my life searching for Waldo with my children.
Unfortunately, my searching isn’t limited to these little things that might take an extra five or ten minutes out of my day. Instead, I interact on a daily basis with this nebulous entity called The Internet, gamely trying my luck at finding some small tidbit of information that I need. From directions to a friend’s house, to recipes for my evening meal, to quotes and images I might use in a presentation I’m creating, I spend an inordinate amount of time each day trying to make my trusty research assistant, Google, reveal the secrets of the World Wide Web. Sometimes Google spills those secrets like a busybody neighbor running from door to door, giving more information than I would ever want to know; other times it holds those secrets tight to its chest, refusing to give even a hint about what I want to know.
I’m not alone in my endless and often frustrating quest for information.
According to <www.internetlivestats.com>, “Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide.” I can’t help but wonder how many of these are successful. Frequently quoted studies by IDC report that approximately 50% of all Internet searches are abandoned [1] and that only 21% of knowledge workers can find the information they need 85% to 100% of the time [2]. These studies are actually 15 years old, but with an estimated 10 exabytes (1 exabyte = 10^18 bytes) of information being added to the internet each year [3], I can’t imagine the numbers have gotten any better.
All this leads me to the million-dollar question: what can we as information developers do to help our users sort through the plethora of information available to them and find that small nugget of truth that will help them complete the task at hand, solve a nagging problem, or make a purchasing decision? How can we create an uncluttered path through the forest that takes them directly to our content, eliminating distractions and unnecessary forks in the road?
Comtech has conducted several user studies in the last six months exploring this very issue, trying to help our clients understand the search strategies of their users and how to better support these strategies. Regardless of the client, the industry, the type of product, or the experience of the users, the problem boils down to one of two issues:
- Users don’t know the correct words to use and receive no relevant search results.
- Search terms used are so common that users receive massive numbers of results.
In the first case, we have a vocabulary disconnect. We don’t use the same words as our users; our content doesn’t contain the terminology they expect. We’ve found that the most persistent users follow a very methodical search strategy. They intentionally enter a very broad term, not expecting to find a topic that actually applies to their specific problem. Instead, they read the top five to ten results to learn vocabulary that will bring their next search closer to the mark. After four or five of these searches, each of which narrows the results a bit more, they have collected a set of terms that actually lead to relevant topics.
In the second case, we have a differentiation problem. All results are considered equal and relevant, frequently with no way for the user to group, prioritize, or distinguish among them. In this case, users rely entirely on the relevance algorithms of the search engine to float the best results to the top; if what they are looking for isn’t in the top three or four, they abandon the search and try again.
For years we’ve been told the solution to this problem is good metadata. If we tag our information with the right keywords and metadata fields, our search engine’s relevance algorithm will move the tagged content to the top of any search results list. Furthermore, with a good search engine that supports faceted searches, users can narrow their results based on a specific metadata value. For example, an automobile manufacturer might tag its content according to the relevant model of the car. A user searching for the word “car” might then narrow the results to the model that applies.
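To make the faceted-search idea concrete, here is a minimal Python sketch of the automobile example. Everything in it is an assumption made for illustration: the topic titles, the model names, and the helper functions `search`, `facet_counts`, and `narrow` are hypothetical, and a real search engine would rank results rather than simply match substrings.

```python
from collections import Counter

# Hypothetical topics; titles, text, and "model" values are invented for illustration.
TOPICS = [
    {"title": "Changing a car tire", "text": "How to change a flat tire.", "model": "Sedan A"},
    {"title": "Car battery care", "text": "Keep the battery charged.", "model": "Truck B"},
    {"title": "Pairing a phone", "text": "Connect a phone to the car stereo.", "model": "Sedan A"},
    {"title": "Warranty terms", "text": "What the warranty includes.", "model": "Truck B"},
]


def search(topics, term):
    """Return topics whose title or text contains the term (case-insensitive)."""
    term = term.lower()
    return [t for t in topics
            if term in t["title"].lower() or term in t["text"].lower()]


def facet_counts(results, facet):
    """Count how many results carry each value of the given metadata field."""
    return Counter(t[facet] for t in results)


def narrow(results, facet, value):
    """Keep only the results tagged with the chosen facet value."""
    return [t for t in results if t[facet] == value]


results = search(TOPICS, "car")              # broad search on a common word
counts = facet_counts(results, "model")      # what the user could narrow by
sedan_only = narrow(results, "model", "Sedan A")

print(dict(counts))
print([t["title"] for t in sedan_only])
```

The facet counts are what a search page would typically show beside the results, so the user can see how many topics remain under each model before choosing one.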
Nevertheless, the problem persists even though we’ve been using metadata for years. To a certain extent, the root problem remains the same: vocabulary. We are still dependent on using terms in our metadata that match what the users are entering. Many corporations are trying to solve this problem with web analytics, tracking the terms that are used in searches with the goal of adding these terms to keywords or even into the text itself. Although this approach can certainly improve the search experience, frankly it does nothing for the authoring experience. Instead, we have a lot more work to rewrite content and add keywords to our files, which may or may not actually help the next user, who may start with a completely new term.
As I’ve explored the issue, I’ve come to believe that metadata is just a small part of the necessary larger solution. What we need to develop is a full-blown taxonomy or ontology of terms and their relationships within the context of our business. A corporate taxonomy must include:
- Key facets or categories of information, each consisting of a controlled vocabulary of values that are clearly unique. Information must be classified by one and only one value, and those values must be meaningful to our end users. Our traditional metadata typically fills this role, but often the values overlap so that it is difficult to determine which category is appropriate.
- Synonym rings, or groups of terms that are considered equivalent when it comes to locating information. For example, if a user enters “car” and the author used the word “auto”, a synonym ring ensures that the auto topic appears in search results without the need for the author to tag the content with “car” as a keyword.
- Relationships between terms that enable users to broaden or narrow search results based on specific associations with the search word. For example, if a user enters a specific type of car, they may be able to follow a “part” association to determine what size tires their vehicle has, or a “related terms” path to find a mechanic who specializes in repairing that type of car, or a “parent” path to determine what manufacturer makes that type of car.
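A small Python sketch may help show how synonym rings and term relationships could work together at search time. It is only an illustration under stated assumptions: the rings, the “Hatchback X” entry, the manufacturer name, the topic text, and the functions `expand`, `search`, and `follow` are all hypothetical, and a real taxonomy would live in a taxonomy management tool rather than in code.

```python
import re

# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"car", "auto", "automobile"},
    {"tire", "tyre"},
]

# Hypothetical relationships for one term, keyed by association type.
RELATIONSHIPS = {
    "hatchback x": {
        "parent": ["Example Motors"],
        "part": ["15-inch tires"],
        "related": ["Mechanics certified for Hatchback X"],
    },
}

# Hypothetical topics (title -> body text). None of them is tagged with keywords.
TOPICS = {
    "Scheduling auto maintenance": "Bring your auto in for service every six months.",
    "Choosing a tyre": "Pick a tyre rated for your climate.",
    "Resetting the clock": "Hold the clock button for three seconds.",
}


def expand(term):
    """Return the synonym ring containing the term, or just the term itself."""
    term = term.lower()
    for ring in SYNONYM_RINGS:
        if term in ring:
            return ring
    return {term}


def search(term):
    """Return titles of topics containing the term or any of its synonyms."""
    terms = expand(term)
    hits = []
    for title, text in TOPICS.items():
        words = set(re.findall(r"[a-z0-9]+", (title + " " + text).lower()))
        if words & terms:
            hits.append(title)
    return hits


def follow(term, relation):
    """Follow a named association (parent, part, related) from a term."""
    return RELATIONSHIPS.get(term.lower(), {}).get(relation, [])


print(search("car"))                  # finds the "auto" topic without a "car" keyword
print(follow("Hatchback X", "part"))  # follows the "part" association
```

The point of the sketch is that the author of the “auto” topic never added “car” as a keyword; the match comes from the ring, based on words the content already contains.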
With a complete taxonomy underpinning the content, users enter what (little) information they know and find guidance in many forms to get them where they need to be, without the need to read lots of extraneous information to find relevant terms and without the need for the author to tag the information with lots of extra keywords. The synonyms and relationships are automatically associated with the content based on the words it already contains.
Of course, nothing is ever easy and I’ve oversimplified. It takes a lot of effort to define a complete taxonomy and tools are critical, including a search engine and likely a taxonomy management tool. However, I strongly believe that a little time and investment up front will save a lot of time and energy later. We’re finding that more and more of our clients are exploring what taxonomies can do for them. I encourage you to do the same.
Now, if I could only apply a taxonomy to help me find my keys…
1. Quantifying Enterprise Search, IDC, May 2002
2. The High Cost of Not Finding Information, IDC, July 2001
3. 2011 IBM Global Chief Marketing Officer Study