Edwina Lui, Kaplan Publishing

As Information Architect for Kaplan Publishing, I have led two separate efforts to specialize a DITA-based content model for Kaplan test preparation content. Attempts to use the first iteration of our customized model surfaced requirements and challenges we hadn't considered in our initial specialization effort and led us, in November 2011, down a second specialization path combining greater granularity with greater flexibility, a pairing that at first seemed paradoxical.

Our Goal for DITA Adoption

In late 2009, like many companies in the non-technical publishing space, Kaplan Publishing found itself at the crossroads of older, delivery-specific content development processes and evolving customer expectations, particularly in regard to more interactive digital offerings. The message we'd heard from both competitors and partners in the publishing and test prep industries was standardization, standardization, standardization. To that end, we began, just two years ago, to look for ways to streamline and consolidate our content and product development processes, including standardizing content authoring, authoring for multichannel delivery, and structuring products around repeatable patterns.

In particular, we looked at tech docs for inspiration, eventually deciding on DITA as the base architecture around which we would standardize our content. Structuring content and deliverables as modular, reusable chunks of information seemed ideal for Kaplan’s content; if not tailor-made, then ready-to-wear with just a few adjustments. By January 2011, after analysis of a subset of our existing unstructured content, we had a Kaplan-specific DITA model with some specializations around the assessment model, but without any major deviations from DITA 1.2. Sticking with more general structures, largely as defined out of the box, would, we felt, give us the flexibility to support new content not surfaced in our initial analysis.

The Problem of Standardization

As we began converting legacy content and experimenting with authoring new content, we quickly found that our flexible, adaptable content model was, in fact, neither. In hindsight, it is clear that in our eagerness to adopt an XML architecture to assist in our standardization efforts, we had not adequately defined the levels at which we would attempt to standardize, and as a result we attempted to standardize where it wasn't possible (see Figure 1).

Figure 1: Opportunities for Standardization


Generally speaking, the opportunities for standardizing content are many. For example, you might choose to standardize the individual components of a document, ensuring that content creators author semantically unique and precisely defined types of content within a specific document. Then, there is standardization of the document structure itself. Moving further up the content ladder, you may standardize the way in which those documents are aggregated, at both sub-product and product levels.

In our implementation at Kaplan, we assumed that we could achieve a measure of standardization at each of these levels, normalizing currently variable structures and semantic definitions of content to a small number of generalized structures. Our greatest perceived advantage in this effort, as opposed to publishers of long-form trade fiction and nonfiction, was Kaplan’s near-complete ownership of content published in our products. However, while Kaplan owns the majority of the content authored for its products, we do not own the way in which that content is structured.

As a company dedicated to providing comprehensive and effective test preparation materials to students, one of our mandates is to simulate, as closely as possible, a test-like experience, in order to better prepare students for test day. In other words, not only do we strive to simulate the look and feel of a test, as defined by the test makers, but we also must support and author content within the structures the test makers define. Kaplan Test Prep and Kaplan Publishing provide study materials for approximately 90 different standardized exams, designed and maintained by a multitude of educational testing and assessment organizations, each of which may structure tests, test sections, and even individual question items differently from one another. In a content model that assumes the ability to normalize like content structures, mandatory support for structural variation quickly became a difficult challenge.

Rethinking Specialization Strategy

In response to this requirement, we reconsidered several pieces of our specialization strategy:

  • Levels at which Kaplan could normalize content structures
  • Identification of truly common semantic units
  • Ideal granularity of content items

Because assessment structures are defined by bodies outside Kaplan, it became clear that our opportunities for content standardization were largely limited to normalizing our product structures and, to some extent, high-level test structures. Deeper analysis of the components of different tests surfaced multiple variations on what we had assumed were static question types, such as the single-select multiple choice question. Different tests presented single-select questions in highly modular ways, invalidating the out-of-the-box DITA definition of a multiple choice question as question text, answer option group, and explanation text. Normalizing question content to a shortlist of structured interaction types was therefore impossible.
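To make the fixed shape concrete, here is a sketch of a single-select question in the out-of-the-box DITA 1.2 learning assessment vocabulary (the question content itself is invented for illustration; this is the monolithic structure that the test makers' variations broke):

```xml
<!-- OOTB DITA 1.2 single-select interaction: one fixed assembly of
     question text, answer option group, and feedback/explanation -->
<lcSingleSelect id="sample-question">
  <lcQuestion>Which value of x satisfies 2x + 3 = 9?</lcQuestion>
  <lcAnswerOptionGroup>
    <lcAnswerOption>
      <lcAnswerContent>2</lcAnswerContent>
    </lcAnswerOption>
    <lcAnswerOption>
      <lcAnswerContent>3</lcAnswerContent>
      <lcCorrectResponse/>
    </lcAnswerOption>
  </lcAnswerOptionGroup>
  <lcFeedback>Subtract 3 from both sides, then divide by 2.</lcFeedback>
</lcSingleSelect>
```

A test maker that, say, places one answer option group outside a set of shared question stems cannot be represented with this single fixed hierarchy.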

Instead, we were able to identify additional semantic granularity within these base components, and we found examples of those semantically unique components both within standalone question items and as shared components in question sets. For example, though an answer option group might appear within a question item or outside one as shared content, it is still semantically an answer option group, with similar presentation and interaction requirements.

Given these findings, we refocused our efforts, not on defining common structures at the sub-test level, but on defining common semantic units that could be assembled in multiple ways, as required by the different test makers. We determined that the truly portable units of content were not necessarily whole questions, but the individual pieces that make up a question. So, instead of coding the structure of a multiple choice question into a DITA topic, we defined a question—any question—as a flexible collection of common components, captured as specialized question component topics organized by a generic question map.
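A hypothetical sketch of this approach might look like the following; the specialized map element, component topic types, and file names here are invented for illustration, not the actual Kaplan markup:

```xml
<!-- Hypothetical specialized question map: each question component
     lives in its own topic, and the map assembles them in whatever
     order a given test maker's structure requires -->
<kapQuestionMap id="reading-q17">
  <topicref href="shared/passage_04.dita"/>        <!-- shared stimulus passage -->
  <topicref href="q17/stem.dita"/>                 <!-- question text -->
  <topicref href="shared/answer_options_ab.dita"/> <!-- answer option group, reusable -->
  <topicref href="q17/explanation.dita"/>          <!-- explanation text -->
</kapQuestionMap>
```

Because the map, not the topic, defines the assembly, a different exam can reorder, omit, or share components without any change to the component topics themselves.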

Flexibility through Granularity

In the second iteration of the specialized Kaplan DITA model (completed just this month), we created these specialized maps and topics to chunk our assessment content at a much more granular level. Essentially, we based our model, not on building blocks, but on minute grains of sand. Doing so has afforded us much greater flexibility in supporting new and existing structures whose definition is owned by other organizations (see Figure 2).

Figure 2: A More Granular Assessment Model


Of course, more granularity can—and, in this case, does—lead to greater complexity in content management and content authoring. It should be noted that this is just one strategy for supporting flexibility. For many organizations, the costs of managing and authoring a highly granular XML architecture may outweigh the benefits of using a highly flexible model. Clearing these potential hurdles is Kaplan’s next challenge.

Author Bio
Edwina Lui (edwina.lui@kaplan.com) is the Information Architect at Kaplan Publishing, the trade publishing group within Kaplan Test Prep, the world leader in test preparation. Edwina has led the ongoing implementation of DITA XML at Kaplan, and plays a key role in Kaplan’s evolving content management and digital product innovation initiatives. She will be delivering a presentation on Kaplan’s DITA implementation efforts at CMS/DITA NA 2012 in La Jolla, CA.