CIDM Matters

CIDM Matters is an electronic newsletter published on the 1st and 15th of every month. Browse these articles published in CIDM Matters or subscribe to the newsletter by selecting the “Join Us” option on the navigation bar.

A Homebrew Reuse Analyzer

Larry Koller
August 1, 2020

Computers are good at brute-force tasks. For example, they can compare thousands of paragraphs with each other, looking for matches or near-matches without getting tired or bored.

A content developer can use the results to:

  • find and correct inconsistencies
  • create collection files for reusable text

I built a simple reuse analyzer from existing open-source tools and code, using a script to loop through any number of topics (after stripping markup).

Behind the scenes

Reuse analysis uses a technique called fuzzy matching. In a traditional comparison, the result is always a Boolean: true or false. Fuzzy matching gives a floating-point result between 0 and 1, where 1 is a perfect match, 0 is no match at all, and 0.95 might be “close enough.”

For example, the following two strings are not identical, but in a technical document they probably should be:

Click OK to close the dialog.
Click OK to close the window.

Comparing these strings returns a score of 0.93; in other words, they are 93% similar.

Fuzzy matching, at least in this implementation, uses an algorithm called the Levenshtein distance. This is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other. The algorithm looks complex but can be expressed in fewer than 30 lines of code. A Wikibooks page provides implementations in many different programming languages.
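For illustration, here is one way to express the algorithm as a shell function that hands the work to awk. It is a sketch, not the author's code; the function name levenshtein and the environment variables LEV_S and LEV_T are hypothetical. It fills in the standard dynamic-programming table one row at a time, keeping only the previous row in memory.

```shell
#!/bin/sh
# Levenshtein distance: the minimum number of single-character insertions,
# deletions, and substitutions that turn one string into another.
# Sketch only; "levenshtein", LEV_S, and LEV_T are hypothetical names.
levenshtein() {
    # Pass the strings through the environment, because awk -v would
    # interpret backslash escapes inside them.
    LEV_S="$1" LEV_T="$2" awk 'BEGIN {
        s = ENVIRON["LEV_S"]; t = ENVIRON["LEV_T"]
        n = length(s); m = length(t)
        # Row 0: building the first j characters of t from nothing takes j insertions.
        for (j = 0; j <= m; j++) prev[j] = j
        for (i = 1; i <= n; i++) {
            cur[0] = i    # deleting the first i characters of s
            for (j = 1; j <= m; j++) {
                cost = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
                best = prev[j] + 1                                    # deletion
                if (cur[j-1] + 1 < best) best = cur[j-1] + 1          # insertion
                if (prev[j-1] + cost < best) best = prev[j-1] + cost  # substitution or match
                cur[j] = best
            }
            for (j = 0; j <= m; j++) prev[j] = cur[j]
        }
        print prev[m]
    }'
}
```

For example, levenshtein "Click OK to close the dialog." "Click OK to close the window." prints 4: the shared prefix costs nothing, and turning "dialog" into "window" takes four substitutions.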

Calculating the score is equally simple: if $l_1$ and $l_2$ are the lengths of the two strings, and $d$ is their Levenshtein distance, the score is $\frac{l_1 + l_2 - d}{l_1 + l_2}$.
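Plugging the earlier example into the formula reproduces the 0.93 score. Both strings are 29 characters long, and their Levenshtein distance is 4 (four single-character substitutions turn "dialog" into "window"):

$$\text{score} = \frac{l_1 + l_2 - d}{l_1 + l_2} = \frac{29 + 29 - 4}{29 + 29} = \frac{54}{58} \approx 0.93$$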

There are other fuzzy matching techniques, but I used this one as a starting point.

Preparing the content for analysis

Ideally, the content should be stripped of all markup, with the text of each block element on a single line. My original thought was to write (or find) a DITA-OT plugin that would publish a bookmap to CSV, where each record would contain the file name and one block (or paragraph, if you prefer) of text.

This took more effort than the analysis script, believe it or not. After a brief experiment with a “plain text” plugin, I decided to try exporting to Markdown, a transform built into DITA-OT 3.1 and newer. From there, a utility called pandoc stripped the remaining markup and eliminated line wrapping. The commands can be placed in a shell script:

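#!/bin/sh
# Publish the bookmap to GitHub-flavored Markdown, without related links
dita --input=book.ditamap --format=markdown_github --args.rellinks=none
cd out || exit 1
# Convert each Markdown file to plain text with no hard line wrapping
for i in *.md; do
    f=$(basename "$i" .md)
    pandoc --wrap=none -t plain -o "$f.txt" "$i"
done
# Drop index.txt, which is not wanted in the analysis
rm -f index.txt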
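The text files can then be flattened into the records described earlier: one block per line, tagged with its file name. This sketch assumes that the pandoc output puts each block on its own line, with blocks separated by blank lines, which is what --wrap=none produces for paragraphs. The function name blocks_to_tsv is hypothetical, and it writes tab-separated rather than comma-separated values so that commas in the text need no quoting.

```shell
#!/bin/sh
# Sketch: turn pandoc plain-text files into tab-separated records,
# one per block: file name, a tab, then the block text.
# "blocks_to_tsv" is a hypothetical name.
blocks_to_tsv() {
    # NF > 0 skips the blank lines that separate blocks;
    # FILENAME is the name of the file awk is currently reading.
    awk 'NF > 0 { print FILENAME "\t" $0 }' "$@"
}
```

Run from the out directory, blocks_to_tsv *.txt > blocks.tsv collects every block of the book into a single file.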

A medium-sized book contains perhaps 2000 to 3000 block elements. To look for reuse opportunities across multiple books, it would be useful to create a book-of-books.
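Because every block is compared with every other block once, the number of comparisons grows with the square of the block count. For $n$ blocks:

$$\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{2000}{2} = 1\,999\,000, \qquad \binom{3000}{2} = 4\,498\,500$$

A book-of-books with 6000 blocks would need 17,997,000 comparisons, roughly four times as many as a single 3000-block book.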

The analysis script

The script is written in awk for rapid development, ease of maintenance, and maximum portability. (You can even install awk on a smartphone, but I would not recommend doing reuse analysis on one.) Although awk was first released in the 1970s, the language has found a modern niche in “big data” processing applications. The entire script, including […]
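The following is not the author's script, only a hypothetical sketch of the kind of loop the analysis needs. It reads tab-separated records (file name, a tab, then the block text), computes the Levenshtein distance and the score for every pair, and prints the pairs that meet a threshold. The function name find_near_matches and the default threshold of 0.95 (the “close enough” value mentioned earlier) are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch (not the author's script): report pairs of blocks
# whose fuzzy-match score is at or above a threshold.
# Input: one record per line, "file name<TAB>block text".
# Usage: find_near_matches RECORDS_FILE [THRESHOLD]
find_near_matches() {
    awk -F '\t' -v min="${2:-0.95}" '
    # Levenshtein distance, two-row dynamic programming.
    # prev and cur are extra parameters, so awk treats them as local arrays.
    function lev(s, t,    n, m, i, j, c, best, prev, cur) {
        n = length(s); m = length(t)
        for (j = 0; j <= m; j++) prev[j] = j
        for (i = 1; i <= n; i++) {
            cur[0] = i
            for (j = 1; j <= m; j++) {
                c = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
                best = prev[j] + 1
                if (cur[j-1] + 1 < best) best = cur[j-1] + 1
                if (prev[j-1] + c < best) best = prev[j-1] + c
                cur[j] = best
            }
            for (j = 0; j <= m; j++) prev[j] = cur[j]
        }
        return prev[m]
    }
    # Keep the file name and the full block text (which may itself contain tabs).
    { file[NR] = $1; text[NR] = substr($0, length($1) + 2) }
    END {
        # Compare each pair once: n(n-1)/2 comparisons for n blocks.
        for (a = 1; a < NR; a++)
            for (b = a + 1; b <= NR; b++) {
                total = length(text[a]) + length(text[b])
                if (total == 0) continue
                # score = (l1 + l2 - d) / (l1 + l2)
                score = (total - lev(text[a], text[b])) / total
                if (score >= min + 0)
                    printf "%.2f\t%s\t%s\t%s\n", score, file[a], file[b], text[a]
            }
    }' "$1"
}
```

For example, find_near_matches blocks.tsv 0.9 prints one line per matching pair: the score, both file names, and the first block's text. Each comparison costs time proportional to the product of the two block lengths, so on a full book this brute-force loop is exactly the kind of tedious work best left to the computer.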

Understanding Maslow’s Hierarchy of Needs for an Effective Storytelling

Anu Singh, Fiserv
July 15, 2020

Ever wondered why we always write “Click OK”?

This question can be answered from three different perspectives.

First, we mostly want to reinforce that it is safe to click the OK button to accomplish a task. Second, we do not build our content with an individual’s psychological or self-actualization needs in mind to connect with an audience. Third, we often do not realize that not all content is created equal, nor should it be, for the targeted audience. […]

Unpacking Authoring Practices in DITA (Part 3)

Nolwenn Kerzreho, IXIASOFT
July 15, 2020

Some writers and specialists claim it’s too difficult to write in DITA and that the learning curve is too steep. Some other specialists and documentation managers claim that they can start new hires writing in DITA in a couple of hours. This article is the final part of a series of posts about writing practices that authors must adapt, adopt, or shed. […]

Unearthing Precious Lessons from the Hidden Gift Vase of Telecommuting

Sairam Venugopalan, QUALCOMM
July 1, 2020

Regardless of the nature of work that we might be performing in our professional lives, be it of the content-curation ilk or of the more fanciful engineering role, the vast volumes of time that we expend on the readiness, ride, and return of the home-to-office trajectory have always been a sore point for several of us in the corporate countryside. While we might have yearned for jobs that would have given us the flexibility and cushion of not having to trek to our workplaces, be they far away or a stone’s throw from our abodes, many of our day jobs have often required physical presence at our desks and office cubicles. In certain cases, this lack of freedom might have been for logistical reasons within the organization for which we were working, while in other cases, the general mandate might have been for remote working to be availed judiciously, only for exigencies. […]

Investing in Your Content’s Future

Sabine Ocker, Comtech Services
July 1, 2020

When migrating to a structured markup publishing environment such as DITA, most organizations feel they are well-positioned to move forward once tools and the information model are in place, and their content converted and migrated into the cCMS… but are they really finished?

Newly created content complies with the information model and so, going forward, will be well aligned with the new standards and guidelines; the existing content corpus, however, will not.

Many organizations decide to wait until after their content has been converted into DITA to do cleanup. Given the pragmatic realities of software release schedules and learning new tools, some find that their momentum for investing in existing content flags. […]

Working on Far-Flung Teams (How I learned to stop worrying and love the Zoom)

Larry Kunz, Extreme Networks
June 15, 2020

Since the beginning of 2020, many of us by necessity have grown accustomed to working away from the office. Whether you use the seemingly ubiquitous Zoom or another tool, you’ve no doubt become familiar with videoconferencing and experienced both its strengths and weaknesses.

You’ve probably noticed that it’s harder to collaborate with remote teams than with teams that are co-located. It’s harder to keep everyone in sync, all rowing together to move the boat forward. Team members might struggle to share timely, accurate information. They might experience confusion and misunderstand things without even realizing it. […]

Create Exceptional Documentation using Enterprise Grade Reviews

Amanda Barfield, VMware
June 15, 2020

Before joining VMware as a Senior Technical Writer, I worked mostly with start-up companies, where I created a solid foundation for their documentation to grow and scale with them. One of the many things I’ve learned at VMware is how essential it is to incorporate Enterprise-Grade documentation reviews in each release cycle, regardless of company size or geographic differences. Working at a start-up is exciting and thrilling; there’s constant action, with the opportunity to build something from the ground up. The high agility allows engineering to release new features, updates, and fixes at a moment’s notice. As a Technical Writer, my biggest struggle was figuring out how to keep the documentation accurate and helpful to our customers while still flowing with the agility of the start-up environment; I believe VMware Workspace ONE UEM has found a solution. […]

Unpacking Authoring Practices in DITA (Part 2)

Nolwenn Kerzreho, IXIASOFT
June 1, 2020

Some writers and specialists claim it’s too difficult to write in DITA and that the learning curve is too steep. Some other specialists and documentation managers claim that they can start new hires writing in DITA in a couple of hours.

This article continues a series of articles about writing practices that authors must adapt, adopt, or shed.

We will focus here on the new practices that authors must adopt when writing using DITA. Of course, the trajectory and requirements for each company are different. Still, the most important hurdles for authors usually relate to these three aspects: accessibility, navigation alternatives, and writing for reuse. […]

The Road Not Taken

Pam Sheridan, anonymous company
May 15, 2020

Five years ago, we began the momentous transformation of our technical content from a monolithic, document-dominated proprietary process to a DITA-standard, topic-oriented content management paradigm. The scale of the migration was not only unprecedented in our organization but also not wholly supported within an industry that rarely prioritizes investment in tech pubs over software development.

[…]

Content Strategy and Self-service Support Portals — A Match Made in Heaven

Sabine Ocker, Comtech Services
May 15, 2020

Enabling user self-service is at the center of many organizations’ content strategies. We know that users who are used to finding virtually any piece of information they require through the internet expect the same for any product they use. They expect to have ready answers to their questions, step-by-step instructions to complete their tasks, and specifics for troubleshooting and resolving any issues they encounter. In fact, some research studies indicate that 90% of today’s consumers expect an organization to offer a self-service customer support portal, and that number will likely increase to 100%. Self-service support portals set up an organization’s user-centric content strategy in three essential ways.

[…]