Linking in Reusable Content with Soft Linking
If you create content that is intended for reuse, you are familiar with the problem posed by links.
Let’s say you have a topic, Widget Overview. In this topic, you make reference to cranking widgets. Naturally, you have a topic on how to crank widgets, so you would like there to be a link from the reference to cranking widgets in the Widget Overview to the topic Cranking Widgets.
So far, so good. Except that you also want to use the Widget Overview topic in your marketing material, and the marketing literature, naturally, does not include detailed task information about cranking widgets. The Cranking Widgets topic is not part of that content. If you reuse the Widget Overview topic in the marketing literature, that link to Cranking Widgets will be broken.
There can be other problems as well. Let’s suppose you have two different tools that can be used to crank widgets, a GUI tool and a command line tool. You want to include the Widget Overview topic in both the Widget Workbench User Guide and the Widget Command Line User Guide. Each of those guides has a topic on cranking widgets. One covers cranking widgets from the GUI and the other covers cranking widgets from the command line. When Widget Overview is included in the Widget Workbench User Guide, you want it to link to Cranking Widgets in Workbench. When it is included in the Widget Command Line User Guide, you want it to link to Cranking Widgets on the Command Line.
Now you have one topic, but three different linking strategies depending on where the topic is to be used. What to do?
There are a number of possible solutions to this problem. One that is frequently recommended is simply to remove all links from content that may be reused. Removing the links would allow you to use Widget Overview unmodified in all three locations, but it would make it harder for the reader to find information on cranking widgets. As I have argued on my blog, <http://everypageispageone.com/2011/06/25/findability-the-last-mile/>, links are the last mile of findability, and without them, readers may have difficulty finding the information they need.
Another potential solution is to use conditions:
<p if="marketing">Cranking widgets correctly is important to ensure safe operation.</p>
<p if="gui"><xref target="crank-widgets-gui">Cranking widgets</xref> correctly is important to ensure safe operation.</p>
<p if="cli"><xref target="crank-widgets-cli">Cranking widgets</xref> correctly is important to ensure safe operation.</p>
(By the way, none of the examples in this article is based on a particular markup language. The techniques discussed can be implemented in a variety of ways. The examples I use here are simplified to illustrate the principles.)
There are a couple of problems with using conditions. First, conditions can be hard to manage, and they always create overhead for the writer. Second, the content is not reusable unmodified in a new situation. If you want to reuse it in a situation that is not covered by the current conditions, you will need to edit the content to add the appropriate conditions. Add up all those edits over a documentation set, and you have a lot of overhead to deal with.
A third possibility is to externalize the links. Rather than creating the links in the document itself, you create a separate document to express the links:
<link from="widget-overview" to="widget-cranking-gui"/>
… <!-- other links --> …
This approach is supported in the little-used XLink standard and by DITA via maps and reltables. DITA 1.2 also supports indirect linking, which lets you place the links where they belong in the text, but it still requires an external map to resolve the link destinations. This approach also creates overhead. A separate map must be created for the links for each topic, each time it is reused. Again, this means that the topic is not really reusable unmodified, though here the edits required are to an external mapping file, not the topic itself. This approach also means you now have more objects to manage: not only the topic files, but the map files as well. The problem is compounded when you try to manage content from one product release to the next.
None of these solutions is particularly satisfactory. The first removes links altogether, while the other two create additional overhead to manage links, which in practice means that you will be forced to create fewer links—there are only so many hours in the day.
What you need is a method that lets you maintain all of the useful links (and ideally create more) without creating additional overhead. Fortunately, there is such an alternative. It is not new; in fact, it has been in use for over 20 years in any number of SGML and XML projects. As far as I know, in all those years, nobody stopped to give the technique a name. I’m going to give it one: soft linking.
Soft linking is very simple. Instead of writers creating a link to a topic, they create a reference to a subject:
<p><task>Cranking widgets</task> correctly is important to ensure safe operation.</p>
This markup simply records that the words “Cranking widgets” are a reference to a task. It is not a link to another topic, either directly or indirectly. Marking up a reference to a subject does not require the writer to locate a target topic or even to know whether one exists. They know they are making a reference to a particular subject—in this case, the task of cranking widgets—and that is all they need to know.
Locating a topic with information on cranking widgets is no longer the responsibility of the writer of Widget Overview. It is now the responsibility of the publishing system, which will look at the set of topics in a particular document and find the right topic to link to. I’ll talk about how the publishing system does this in a moment.
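To make the writer's side of this concrete, here is a minimal sketch in Python of how a build script might collect the references marked up in a paragraph. It is an illustration under stated assumptions, not a description of any particular system: the set of reference element names (REFERENCE_TYPES) and the function name extract_references are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of element names that mark a reference to a subject.
REFERENCE_TYPES = {"task", "function", "setting"}


def extract_references(paragraph_xml):
    """Return a (type, text) pair for each subject reference in a paragraph."""
    root = ET.fromstring(paragraph_xml)
    return [
        (element.tag, "".join(element.itertext()).strip())
        for element in root.iter()
        if element.tag in REFERENCE_TYPES
    ]


paragraph = (
    "<p><task>Cranking widgets</task> correctly is important "
    "to ensure safe operation.</p>"
)
print(extract_references(paragraph))  # [('task', 'Cranking widgets')]
```

Note that nothing in the extracted data names a target topic. The reference records only the type of the subject and the words used to refer to it.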
How Soft Linking Solves the Reuse Problem
Postponing the creation of links to build time neatly solves the reuse problem:
- When the Widget Overview topic is used in the marketing material, there is no topic that covers widget cranking and so no link is made. The text of the reference is simply printed as plain text.
- When the Widget Overview topic is used in the Widget Workbench User Guide, the system finds a topic on cranking widgets—the topic Cranking Widgets in Workbench—and creates a link to that topic. The other widget cranking topic, Cranking Widgets on the Command Line, is not in scope and so no link is made to it.
- When the Widget Overview topic is used in the Widget Command Line User Guide, the topic Cranking Widgets on the Command Line is in scope, and the link is made appropriately.
In each case, a link is formed to the correct topic because only the correct topic is in scope and available to be linked to. There is no need for conditions in the source topic or for external maps to create the links.
Making It Work
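So, how does the system figure out which topic to link to? Simple: we index the topics. For instance, the topic Cranking Widgets on the Command Line would have an index something like this:
<topic name="crank-widgets-cli">
<index><reference type="task" key="cranking widgets"/></index>
…
</topic>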
When the build scripts find the reference <task>cranking widgets</task> in the Widget Overview topic, they look in the indexes of all the topics in the current document and find a reference of type “task” with the key “cranking widgets” in the index of the topic named “crank-widgets-cli”. They then create a link to that topic.
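The lookup described above can be sketched in a few lines of Python. This is an illustration only. The data structures, the document names, and the resolve function are assumptions made for the example; the topic names follow the ones used in this article. Each topic's index is modelled as a set of (type, key) pairs, and each document is simply a list of the topics it contains.

```python
# Each topic carries its own index: the (type, key) pairs it covers.
# The "concept" entry is an invented example of another index entry.
TOPIC_INDEXES = {
    "widget-overview": {("concept", "widgets")},
    "crank-widgets-gui": {("task", "cranking widgets")},
    "crank-widgets-cli": {("task", "cranking widgets")},
}

# Each document defines the scope in which references are resolved.
DOCUMENTS = {
    "marketing": ["widget-overview"],
    "workbench-guide": ["widget-overview", "crank-widgets-gui"],
    "command-line-guide": ["widget-overview", "crank-widgets-cli"],
}


def resolve(ref_type, ref_text, topics_in_scope):
    """Return the in-scope topics whose index covers the referenced subject."""
    key = (ref_type, ref_text.lower())
    return [topic for topic in topics_in_scope if key in TOPIC_INDEXES[topic]]


for name, topics in DOCUMENTS.items():
    print(name, resolve("task", "Cranking widgets", topics))
# marketing []
# workbench-guide ['crank-widgets-gui']
# command-line-guide ['crank-widgets-cli']
```

The same reference produces a different result in each document because only the scope changes. The source topic is never edited.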
What If There Are No Matching Topics?
It is possible that the build scripts will not find any topics to link to for a particular reference. When the Widget Overview topic is used in the marketing material, we don’t expect a reference to cranking widgets to link to anything. This means that soft linking references must be created in such a way that the text still makes sense if there is no link. You can’t say things like “click here” or “for more information, see”. The topic still has to read correctly even if no link is formed.
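The fallback to plain text can be sketched as follows. This Python fragment assumes that references are marked with a task element, that the output is HTML, and that each topic is published as a file named after the topic. All three are assumptions made for the example, as is the render function itself.

```python
import re
from html import escape

# Matches the hypothetical <task>...</task> reference markup.
REFERENCE = re.compile(r"<task>(.*?)</task>")


def render(paragraph, index):
    """Turn each reference into a link if the index has a target, else plain text.

    `index` maps (type, key) pairs to the name of an in-scope topic.
    """
    def replace(match):
        text = match.group(1)
        target = index.get(("task", text.lower()))
        if target is None:
            return text  # no topic in scope: keep the words, drop the markup
        return f'<a href="{escape(target)}.html">{text}</a>'

    return REFERENCE.sub(replace, paragraph)


source = "<task>Cranking widgets</task> correctly is important to ensure safe operation."

print(render(source, {}))
print(render(source, {("task", "cranking widgets"): "crank-widgets-cli"}))
```

The first call prints the sentence with no link and no leftover markup, which is why the sentence has to read correctly on its own.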
There are a number of reasons why a reference may fail to resolve to a link. Some of these failures can tell us valuable things about our content.
- A reference may not resolve because there is no topic covering that subject. This can tell us that we are missing a topic we should have. It could also tell us that the topic is mentioning a subject that is out of scope and should not be mentioned. Either way, we have learned something that we can use to improve the docset.
- A reference may not resolve because the author has used an incorrect term. This alerts us to a terminology issue. It could also be that an incorrect term was used in a topic index. Either way, we have discovered the terminology issue and can now fix it.
- A reference may not resolve because the author of the topic on the subject in question has not indexed it correctly. This situation tells us that the topic indexing needs to be fixed.
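A build can surface these failures by reporting every reference that did not resolve. The Python sketch below is one possible shape for such a report, not a prescribed one. The function name, the tuple layout, and the sample subject "Polishing widgets" are hypothetical. The script cannot tell which of the three causes applies; it only lists what failed and where, so that a person can decide.

```python
from collections import defaultdict


def unresolved_report(references, index_keys):
    """Map each unresolved (type, key) to the sorted topics that reference it."""
    report = defaultdict(set)
    for source_topic, ref_type, text in references:
        key = (ref_type, text.lower())
        if key not in index_keys:
            report[key].add(source_topic)
    return {key: sorted(sources) for key, sources in report.items()}


# (source topic, reference type, reference text); sample data only.
references = [
    ("widget-overview", "task", "Cranking widgets"),
    ("widget-overview", "task", "Polishing widgets"),
    ("widget-safety", "task", "Polishing widgets"),
]
# All (type, key) pairs found in the indexes of the topics in scope.
index_keys = {("task", "cranking widgets")}

print(unresolved_report(references, index_keys))
# {('task', 'polishing widgets'): ['widget-overview', 'widget-safety']}
```

A subject that several topics reference but no topic covers is a likely candidate for a missing topic.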
What If There Is More Than One Link?
What if there is more than one topic in scope that is indexed for the task of cranking widgets? This condition could be an error. It could mean that there is a topic in the document that shouldn’t be there. Perhaps both the command line and workbench versions of the widget cranking topic have found their way into the Widget Command Line Guide. In this case, soft linking has once again helped us to discover a problem in the document.
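A check for this condition is easy to add to the build. The following Python sketch assumes the same kind of per-topic index of (type, key) pairs described earlier in the article. The function name and the wording of the warning are hypothetical.

```python
def check_ambiguity(ref_type, ref_text, topics_in_scope, topic_indexes):
    """Return a warning string if more than one in-scope topic matches, else None."""
    key = (ref_type, ref_text.lower())
    matches = [
        topic for topic in topics_in_scope
        if key in topic_indexes.get(topic, set())
    ]
    if len(matches) > 1:
        return (
            f"warning: {ref_type} '{ref_text}' matches "
            f"{len(matches)} topics: {', '.join(matches)}"
        )
    return None


topic_indexes = {
    "crank-widgets-gui": {("task", "cranking widgets")},
    "crank-widgets-cli": {("task", "cranking widgets")},
}
# Both versions of the topic have found their way into one guide.
command_line_guide = ["crank-widgets-cli", "crank-widgets-gui"]

print(check_ambiguity("task", "Cranking widgets", command_line_guide, topic_indexes))
```

Whether the build should stop on such a warning or simply log it is a policy choice for each project.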
What If the Reference Term and the Index Terms Don’t Match?
In some cases, there will be more than one form of a word or more than one word that an author could use to make a reference to or to index a topic. This is not a problem for most references, because most of the time you will be making references to things that have proper names, like wizards, or functions, or configuration settings. But for a task, such as cranking widgets, a reference could be phrased “cranked widgets” or “crank widgets” or “cranking widgets”. You will need a way of telling the system that these phrases are equivalent. Many approaches to this problem are possible, including the use of taxonomies and topic maps, but for many systems a simple list of synonyms will suffice.
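For the simple case, a synonym list can be little more than a lookup table applied to both reference terms and index terms before they are compared. This is a minimal Python sketch; the table contents and the normalize function are assumptions made for the example.

```python
# Hypothetical synonym list: each variant maps to one preferred form.
SYNONYMS = {
    "crank widgets": "cranking widgets",
    "cranked widgets": "cranking widgets",
}


def normalize(term):
    """Lowercase, collapse whitespace, then map known variants to the preferred form."""
    cleaned = " ".join(term.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


print(normalize("Cranked  Widgets"))  # cranking widgets
print(normalize("cranking widgets"))  # cranking widgets
```

Because the same function is applied on both sides of the comparison, a variant used in a topic index is handled in the same way as a variant used in a reference.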
The Benefits of Soft Linking
Soft linking has a number of benefits besides making it easier to reuse content.
- Soft linking is less expensive than conventional hard linking even in content that is not reused. To create a hard link, the writers have to search the repository for the best topic to link to each time they want to create a link. The cost of this search increases with the size of the repository, and its success also depends on how well the writers know the repository and the material it contains. But with soft linking, the writers only need to know what subject they are writing about.
- You can make a reference to a subject before a topic describing that subject exists. Not only does this simplify the authoring task, it can also be a mechanism for discovering missing topics in your topic set.
- Because soft linking does not require the writers to stop what they are doing and look for topics to link to, it helps the writers to remain in a state of flow, which means the writers will be both happier and more productive.
- Because topics to link to are identified by their indexes, it is the people who write topics on a subject who decide which are the best topics on a particular subject. In other words, the links to a subject are chosen by experts on that subject, and that leads to higher quality links.
- Because both reference markup and topic indexes express inherent properties of the topics they belong to, they are stable: as long as the subject matter itself does not change, there is no need to change the markup. This stability can mean that topics have to be edited less often than they would be if they contained hard links or conditions. This improves authoring efficiency and reduces content management overhead.
Analecta Communications Inc.
Mark Baker is the owner of Analecta Communications Inc., a communications consulting firm in Ottawa, Canada. His former positions include Manager of Information Engineering methods at Nortel and Director of Communications for OmniMark Technologies. Mark has written and spoken extensively on structured writing and markup solutions. He is co-author of HTML Unleashed, 2nd Edition and author of Internet Programming with OmniMark. He blogs at everypageispageone.com. He can be reached at email@example.com.
Can you do soft linking with DITA?
Many organizations are doing structured writing with DITA at the moment, so it is natural to ask whether you can do soft linking with DITA. The answer appears to be no, yes, and maybe.
No: you can’t do soft linking with DITA, because DITA does not include support for soft linking. As far as I know, you won’t find soft linking support in the DITA Open Toolkit or any of the commercial DITA packages.
Yes: you can do soft linking with DITA, because DITA is an open system and you can specialize your DTDs to support reference markup and topic indexing, and you can write script specializations to include soft linking support in your docs build.
Maybe: you may be able to add reference markup and index markup to your DITA system to enable soft linking, but there are issues to consider. Soft linking works best with what I call Every Page is Page One topics: topics that are designed to be complete and self-contained, that can be the first and perhaps only topic a reader needs on a particular subject, and that can be reliably indexed to indicate which subjects they cover. But in many DITA implementations, topics are much smaller than this. If every such topic were separately indexed for the subject it covered, a reference might resolve to a dozen or more topics, which is clearly too many to be useful. If your topics are like this, you may need to figure out another strategy for choosing the units to be indexed for soft linking.