A Baseline Reuse Productivity Metric for DITA Writers and Managers
We all need a firm baseline for technical documentation metrics among writers, writing groups, business units, and companies, especially when dealing with reuse in DITA.
Reuse is the primary early and continuing benefit of DITA because DITA makes reuse easy to do. The DITA Maturity Model1, even at level two of six, assumes writers have the unrestricted capability to reuse information.
Reuse has a profound effect on technical writing productivity. However, any metric of productivity can make writers uncomfortable. The difficulty of authoring and research varies so much from task to task that technical writers may see any metric as capricious. If my target is to maintain 1,000 topics and write 200 new ones every year, but I only manage 500 and 100, does that make me a better or worse writer? What do topics have to do with my job? Is it OK to miss my target through increasing reuse? For the manager of technical writers, or for independent contractors, a more serious problem is senior management; the customer and the budget prime often have the same concern. Is the metric presented sufficient to cost, track, and evaluate the performance benefits or losses due to DITA over a period of years?
Recently, our company, Samalander-OS Ltd., dealt with exactly these issues. We have made significant progress in solving the metrics problem in our customer, management, budgeting, and writer context. This article describes how DITA and the ideas of “Words-Counted-Once” and “Published-Words” allow us to produce numbers that satisfy the writer, our manager, senior management, development groups, and related business silos simultaneously.
Our methodology is substantially the same as others, especially the third methodology, Percent Repository Words Reused in Context or PRWRC, presented in “What is the Best Metric to Measure the Success of Your Reuse of DITA Topics?”2 by Bill Hackos. But, we believe, ours has wider applicability for four reasons.
- First, it is easily applied and understood by small groups and contractors as well as much larger organizations.
- Second, it is applicable at all levels of the DITA Maturity Model.
- Third, it includes a well-defined concept of “repository” that is easily tied to the budgetary process and project management.
- And fourth, we have developed software that supports the methodology for groups still operating in the file system rather than a CMS. For more information see the Samalander DITA Metrics module, part of the Samalander Software Center at <www.samalander.com>. CMSs may also be supported if they can easily export all the content for a specific project or deliverable.
Why Focus on Words?
On the face of it, “What do Technical Writers do?” seems a silly question. But a brief review of the literature related to DITA and technical writing metrics on the web shows that there are few references to “writing” or “words” when it comes to DITA metrics.3,4 It seems that most approaches to creating metrics for technical writing ignore “writing” and “words” and instead tend to focus on metadata such as “topics”, “pages”, or “procedures”, data that are at least one level of abstraction away from our actual activity.
The historical and technical reason for this abstraction is easy to understand. Take any large book written in DITA and save it to PDF, Microsoft Word, and text. Save the PDF and MS Word to text as well. Now do a word count of all these outputs in FrameMaker, MS Word, and your favorite text editor. You will find that these word counts can vary by 20 percent or more. Differences may include, for example, refusing to count words in tables or counting words in headers.
Because of the difficulty in counting words5, we—managers in the technical writing industry—have been forced to count abstract characteristics. With DITA, some simple tools, and a solid methodology, metrics based on abstract categories are no longer required.
DITA, Reuse, and Words-Counted-Once
To understand the concept of Words-Counted-Once, let’s consider a concrete example: a five-page, three-topic, 1,000-word “chapter” telling the user how to contact technical support.
If we have a suite in which the support “chapter” appears in each of 50 titles (also known as books, maps, Web threads, help files, collections, and so on), then reusing that one “chapter” will reduce the maintained page, topic, and word counts from 250 pages (five pages x 50 titles), 150 topics (three topics x 50 titles), and 50,000 words (1,000 words x 50 titles) to five pages, three topics, and 1,000 words.
This reduction, no matter how it is measured, is an obvious benefit to the company! It’s not every day the technical writing team can reduce any portion of their workload by 98 percent (1,000/50,000 = 2 percent).
Detailed Look at the Reuse Example
In order to make our calculations easier, let’s assume that every “chapter”, just like our Technical Support example, is 5 pages, 3 topics, and 1,000 words long. In addition, let’s say there are 10 chapters in every “book”. This gives us the numbers in Table 1.
These baseline numbers are not very informative. Let’s take reuse into account by counting everything only once. If we do this, the table changes as shown in Table 2.
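The arithmetic behind these two views of the suite can be sketched in a few lines of Python. The sketch assumes the 50-title suite from the technical support example, 10 identical chapters per title, and exactly one chapter (technical support) shared by every title; the names used are illustrative only and are not part of any Samalander tool.

```python
# Assumed suite, taken from the running example in the text.
TITLES = 50
CHAPTERS_PER_TITLE = 10
PER_CHAPTER = {"pages": 5, "topics": 3, "words": 1_000}


def baseline_totals():
    """Totals when every chapter in every title is counted."""
    chapters = TITLES * CHAPTERS_PER_TITLE
    return {unit: size * chapters for unit, size in PER_CHAPTER.items()}


def counted_once_totals(shared_chapters=1):
    """Totals when each shared chapter is counted once for the whole suite."""
    # Every shared chapter removes one duplicate from each of the other titles.
    duplicates = shared_chapters * (TITLES - 1)
    chapters = TITLES * CHAPTERS_PER_TITLE - duplicates
    return {unit: size * chapters for unit, size in PER_CHAPTER.items()}


print(baseline_totals())      # {'pages': 2500, 'topics': 1500, 'words': 500000}
print(counted_once_totals())  # {'pages': 2255, 'topics': 1353, 'words': 451000}
```

The word totals, 500,000 published and 451,000 counted once, are the figures used in the percentage reuse calculation.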
Percentage Reuse Defined
The definition of reuse we are going to use is:
(1 - “Words-Counted-Once” / “Published-Words”) x 100 = Percentage Reuse
This equation can also be written as:
((“Published-Words” - “Words-Counted-Once”) / “Published-Words”) x 100 = Percentage Reuse
Substituting the numbers from Table 2 (Taking Reuse into Account), we get:
(1 - 451,000 / 500,000) x 100 = 9.8% or
((500,000 - 451,000) / 500,000) x 100 = 9.8%
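As a minimal sketch, the percentage reuse calculation can be written as one small Python function. The function name and the input checks are assumptions added for illustration; only the formula itself comes from the definition above.

```python
def percent_reuse(words_counted_once, published_words):
    """Percentage reuse = (1 - Words-Counted-Once / Published-Words) x 100."""
    if published_words <= 0:
        raise ValueError("Published-Words must be greater than zero")
    if not 0 <= words_counted_once <= published_words:
        raise ValueError("Words-Counted-Once must lie between 0 and Published-Words")
    return (1 - words_counted_once / published_words) * 100


# The suite-wide example: 451,000 words counted once, 500,000 published.
print(round(percent_reuse(451_000, 500_000), 1))  # 9.8
```

Rejecting a Words-Counted-Once value larger than Published-Words is a design choice in this sketch: under the definitions used in this article such a value should not occur, so it most likely signals a counting error.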
Before going further let’s be very specific about what the terms in this equation mean.
“Words-Counted-Once”: This is the number of words in Published-Words minus reused words.
In practical DITA terms, these are all the words in elements that are CONREF’d (or otherwise imported) into Published-Words or referenced in DITAMAPs but counted only once. So, if we have a warning of 25 words in an element, and that element is CONREF’d in Published-Words 10 times, we count 25 words, not 250 words.
“Published-Words”: These are all the words that the customer will see in the project deliverables.
These words are not the same as all the words in, say, a PDF or online help file generated from the source content. Why? Because these words are only counted in the source DITA files. They do not include headers and footers, tables of contents, tables of figures, indexes, or other words generated during production. This definition guarantees that the number is not changed by the output. If this year we are producing PDFs and next year we are producing eBooks, what is counted will not change.
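To make the two definitions concrete, here is a hedged Python sketch that counts both numbers directly from DITA source. It is not the Samalander DITA Metrics module. It assumes a tiny, hypothetical set of in-memory topics, treats every element carrying a conref attribute as an empty placeholder, resolves only conrefs of the form file#topicid/elementid, and ignores nested conrefs, maps, and attribute filtering.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content: two published topics pull in one shared warning.
SOURCES = {
    "warnings.dita": (
        '<topic id="warnings"><title>Warnings</title><body>'
        '<note id="hot">Surface may be hot. Do not touch.</note>'
        '</body></topic>'
    ),
    "install.dita": (
        '<topic id="install"><title>Install</title><body>'
        '<p>Mount the unit.</p>'
        '<note conref="warnings.dita#warnings/hot"/>'
        '</body></topic>'
    ),
    "service.dita": (
        '<topic id="service"><title>Service</title><body>'
        '<p>Open the cover.</p>'
        '<note conref="warnings.dita#warnings/hot"/>'
        '</body></topic>'
    ),
}


def own_words(elem):
    """Count whitespace-separated words in the text under one element."""
    return sum(len(text.split()) for text in elem.itertext())


def resolve(conref, trees):
    """Find the element a conref points at (file#topicid/elementid only)."""
    filename, _, fragment = conref.partition("#")
    element_id = fragment.split("/")[-1]
    for elem in trees[filename].iter():
        if elem.get("id") == element_id:
            return elem
    raise KeyError(conref)


def count_words(published, sources):
    """Return (Words-Counted-Once, Published-Words) for the published topics."""
    trees = {name: ET.fromstring(xml) for name, xml in sources.items()}
    counted_once = published_words = 0
    seen = set()
    for name in published:
        root = trees[name]
        local = own_words(root)  # text authored directly in this topic
        counted_once += local
        published_words += local
        for elem in root.iter():
            conref = elem.get("conref")
            if conref is None:
                continue
            reused = own_words(resolve(conref, trees))
            published_words += reused  # the customer sees it every time
            if conref not in seen:     # the writer maintains it only once
                seen.add(conref)
                counted_once += reused
    return counted_once, published_words


print(count_words(["install.dita", "service.dita"], SOURCES))  # (15, 22)
```

Because the count is taken from the source files, it contains no generated text such as tables of contents or indexes, which is what keeps the numbers stable when the output format changes.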
Published-Words and Double-Counting
It is very important to note that Published-Words are themselves only counted once. Because DITA is so well adapted to producing multiple outputs, Published-Words must be related to only one deliverable. The case of the OASIS DITA standard makes the reason clear. There is a certain (enormous!) effort devoted to authoring the standard. However, once authored, the content is directed to several publishing pipelines including, at the least: raw DITA files, HTML, and PDF. Counting the published words in all three formats may be relevant to a production department, but is misleading from an authoring point of view. Authoring effort generally does not go up if we decide, for example, to add Eclipse-Help as one of the output formats.
So, Published-Words can also be expressed, more verbosely, as Published-Words-Counted-In-One-Deliverable.
The clear exception is of course the minimal amount of work that may be required to produce a particular output. Samalander has handled this problem by assigning an arbitrary level of effort to a particular output that duplicates content found in another output, for example, counting all 100,000 words in a PDF book, but assigning a word count of 1,000 to the same content also published in JavaHelp.
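One way to express this allowance is sketched below in Python. The text does not say how the arbitrary figure is combined with the primary count, so simply adding it is an assumption, as are the function and parameter names.

```python
def published_words_with_allowances(primary_words, duplicate_output_allowances):
    """Published-Words for one deliverable plus a flat allowance per extra output.

    primary_words: words counted in the single primary deliverable.
    duplicate_output_allowances: mapping of output name to the arbitrary
        word count assigned for producing that duplicate output.
    """
    # Assumption: allowances are added to the primary count, not weighted.
    return primary_words + sum(duplicate_output_allowances.values())


# The example from the text: a 100,000-word PDF book also published as JavaHelp.
print(published_words_with_allowances(100_000, {"JavaHelp": 1_000}))  # 101000
```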
Words-Counted-Once, Published Words, and Repositories
A repository is a place or a system where all your source files for a project are stored. This could be a CMS, a CCMS, or the file system.
Words-Counted-Once may be identical to the total number of words in the repository but usually is not. Words-Counted-Once is a subset of all the words in a repository (in a CMS or file system), specifically the subset related to Published-Words for a specific project or deliverable.
Another way of looking at the issue is that we are using Published-Words as the definition of the relevant-to-project-budget repository words.
“1-” and “x 100”
Because of the way Published-Words and Words-Counted-Once are defined, the ratio between the two can never be higher than 100 percent (1) or less than 0 percent (0). In our technical support chapter example, if all 50 titles consisted of just the technical support chapter, all Published-Words would be 50 titles x 1,000 words, or 50,000 words. But Words-Counted-Once would be the original chapter, or 1,000 words. So, our formula would work like this:
(1-1,000 / 50,000) x 100 = (1-0.02) x 100 = 0.98 x 100 = 98% reuse
This intuitively makes sense. As reuse goes up, workload goes down, all other factors remaining equal.
Words-Counted-Once, Published-Words, and Budgets
Words-Counted-Once is directly related to Published-Words because we only count words that are published. Reuse is defined as reuse within the complete set of Published-Words.
This is easily tied into normal budgeting processes. Funding is almost always provided according to deliverables tied to products. If the writing team supports more than one product or product line, or for contractors, supports more than one customer, the writing team budget is actually composed of an estimated cost of delivering (or publishing) words the customers of each product or product line will see. This is also how engineering and manufacturing budgets are calculated. Having the writing group follow the same convention facilitates discussion.
But I Like My Metrics!
Does Words-Counted-Once mean you have to redo all your metrics? Not at all. Whatever the productivity metrics you are currently using, even topic-based metrics, valuable work and creativity has been poured into the numbers you generate. The long-term benefit of Words-Counted-Once is that it can be integrated into other measures of productivity to increase their robustness or extend them by including an accurate baseline-reuse view of productivity.
For example, in the varying number of topics example explored in the box “Reuse Measured in Topics or Other Abstract Categories”, we could use Words-Counted-Once to take into account that increased reuse tracks with increased chunking of information into topics—a necessary step in moving up in the DITA Maturity Model.
The key point here is that Words-Counted-Once is a robust measurement of reuse activity in a suite that can then be refined or retrofitted to existing metrics. Unlike many other metrics, Words-Counted-Once provides an invariable basis for measuring productivity with no cultural or business context assumptions and no abstraction from the actual day-to-day activity of technical writing. Because of this, it provides a solid basis for refinement and customization.
Many of us have gone through the pain and suffering of developing a business case for introducing DITA to a business unit or a whole corporation. Many of us have won that battle, introduced DITA, and demonstrated the benefits. But what of the future?
Figures 1 and 2 show how Words-Counted-Once can be used to help management understand the benefits achieved and desirability of future investment in writing technologies.
As reuse increases, the number of Published-Words increases dramatically. Perhaps we could achieve even better results by moving up in the DITA Maturity Model?
Having well-defined and very restrictive baseline metrics such as “Words-Counted-Once,” “Published-Words,” and “Percent Reuse” establishes a firm baseline for technical documentation metrics between groups, business units, and companies. Furthermore, the approach is easily applied by any size of group or independent contractor, at every level of the DITA Maturity Model, and is independent of the technology used for managing DITA content.
Most importantly, it is easily understood both by writers and senior managers over time or even across group and company boundaries, and so provides a basis for discussing reuse and productivity with all stakeholders.
About the Author:
Peter Fournier is the ex-manager of technical documentation tools, web publishing and online help for the Data Division at Nortel. Currently he is an active technical writing freelancer and the president of Samalander-OS Ltd. Samalander publishes information and software tools that enable least-cost implementations of DITA for small groups, small to medium size businesses, and freelance technical writers.
1. “DITA Maturity Model” by Michael Priestley, IBM and Amber Swope, JustSystems, 2008 <http://na.justsystems.com/files/Whitepaper-DITA_MM.pdf>. Also here as a Wiki page submitted by Bob Doyle, 2008 <http://dita.xml.org/wiki/introduction-to-the-maturity-model>.
2. “What is the Best Metric to Measure the Success of Your Reuse of DITA Topics?” by Bill Hackos, CIDM Information Management News June 2008, as of June 8, 2012 <https://www.infomanagementcenter.com/publications/e-newsletter/june-2008/reuse-of-dita-topics-what-is-the-best-metric-to-measure-the-success-of-your-reuse-of-dita-topics/>.
3. For examples see the following: “The Illusive, Writing Productivity Metric: Making unit cost a competitive advantage” by Mike Eleder, CIDM Information Management News February 2011, as of June 8, 2012, <https://www.infomanagementcenter.com/members/pdfs/reprints/BP2011-02Eleder.pdf>; “DITA Metrics: Similarities and Savings for Conrefs and Translation” by Mark Lewis, <http://dita.xml.org/resource/dita-metrics-similarities-and-savings-for-conrefs-and-translation>; “DITA Metrics: Reuse Strategy and Savings Trend With Warehouse Topics” by Mark Lewis, <http://dita.xml.org/resource/dita-metrics-reuse-strategy-and-savings-trend-with-warehouse-topics>; “Measuring Productivity” by Pam Swanwick and Juliet Wells Lecken 2010, as of June 8, 2012, <http://intdev.stc.org/2010/09/measuring-productivity/>.
4. An exception is “DITA 101: Fundamentals of DITA for Authors and Managers, Second Edition” by Ann Rockley, Steve Manning, and Charles Cooper with Mark Lewis. Although it does mention reuse metrics, it does not go into detail about how to calculate reuse. However, overall it seems a good match for small groups in large corporations, SMEs, and independent contractors. <http://www.amazon.com/DITA-101-Ann-Rockley/dp/0557072913>
5. For more information on this issue see professional technical translation sites. They are all very specific on word counts and, from my brief survey of sites, rely on their own toolsets for counting words.