Designing Adaptive Documentation with XML: From Formal to Rhetorical Markup
One of the goals of single sourcing is to design and develop what we may call adaptive documentation, technical product information that may be rendered in different modalities (text, interactive graphics, and sound) and in different media (print, computer screens, and small displays on handheld devices) and may be assembled like Lego blocks in virtual networks according to users’ needs and wishes.
The question, of course, is how do we design technical content that is malleable enough, as it were, to serve all these purposes?
Part of the answer, no doubt, lies in adopting more modular, or object-oriented, approaches to information design as well as introducing media-neutral formats, such as XML, into the information-development process. But object-oriented approaches and new standards like XML are not enough.
We also need new ways of enriching content with information about its underlying communicative, or rhetorical, strategies and priorities. If, for instance, a document is to be automatically reduced so that it may be shown on a mobile phone, it is crucial that the content is condensed in such a way that the central message of the text is rendered while less significant details are suppressed. But this kind of condensing is possible only if the key contents of the document have been identified and encoded in advance.
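As a rough illustration of what such advance encoding might look like, the fragment below marks each unit of a short procedure with a `salience` attribute. The element names, the attribute, and its values are assumptions made for this sketch and do not belong to any existing standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical vocabulary: the element names and the "salience"
     attribute are invented for illustration only. -->
<procedure>
  <step salience="core">Connect the printer to the computer with the USB cable.</step>
  <note salience="detail">The cable is in the accessory box.</note>
  <step salience="core">Switch on the printer.</step>
  <note salience="detail">The status light blinks for a few seconds during start-up.</note>
</procedure>
```

Under these assumptions, a condensing tool for small displays could keep the units marked `core` and suppress those marked `detail`, because the decision about what counts as the central message has already been recorded in the markup.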
Rhetorical signalling also seems to be needed in cases where users are expected to navigate effectively in vast hypermedia spaces. To create coherence in such environments, users need access to information about how individual content units are related to each other conceptually and rhetorically.
What I would like to do in this article is to discuss rhetorical markup as one possible strategy for enriching technical content. More specifically, I would like to define rhetorical markup in text-theoretical terms and to briefly examine how communicative intentions and structures might be modelled and represented using existing XML standards.
Document markup may be defined as inserting code into a document to give relevant information about that document. In XML, markup comes in two flavours: elements and attributes. Elements constitute the most fundamental way of identifying and labelling text segments, while attributes are additional categories that may be attached to elements. Attributes are filled with specific values.
<para author="Smith" color="red">This is a red paragraph written by Smith.</para>
Normally, markup in technical documentation has one of the following functions:
- structural markup (identifying sections in the document and possibly their relations)
- presentational markup (indicating how the document is going to be rendered)
- semantic markup (encoding important domain objects and concepts)
- metatextual markup (giving information about the document as an information resource, for instance, author, date of publication, key words, and so on)
These functions are broad categories and the boundaries between them are fluid. Accordingly, it is not always possible to unambiguously categorize markup in a given document. For instance, is the element <contact.person> in a <press.release> a form of structural, semantic, or even metatextual markup? And is a <list> a structural element or in fact a presentational one?
As a first attempt to define rhetorical markup, we may say that it is a kind of structural markup whose function is to represent a document’s communicative structure, that is, to identify explicitly the author’s intentions as they are realized in the various parts of the document. But what, then, is communicative structure?
Formal and Schematic Structure
The idea of communicative structure is an important one in text theory (genre theory, discourse analysis, and so on). In systemic functional linguistics, for example, a distinction is made between two types of text structure: formal and schematic (see Eggins 1994). The formal structure of a document is the way in which it is divided into units, such as chapters, sections, and paragraphs, while its schematic structure reflects its functional organization, that is, the configuration of semantic units through which the author seeks to achieve his or her communicative goals (for example, introduction, background, argumentation, and conclusion). The schematic structure is realized through language and is, in principle, independent of the formal structure, although there will be a one-to-one mapping of the two in well-designed documents.
Schematic structure is closely bound up with the concept of genre or text type. Genres or subgenres may be described as sets of documents sharing similar schematic text structures. A genre is said to have a generic structure potential (GSP) that defines the range of valid schematic elements and the patterns in which these elements are permitted to occur.
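One way to picture a GSP in XML terms is as a content model. The sketch below is not drawn from the text-theoretical literature. It assumes a hypothetical "proposal" genre built from the schematic elements named earlier (introduction, background, argumentation, conclusion) and approximates its GSP with a DTD declared in the document's internal subset.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a hypothetical "proposal" genre whose GSP is
     approximated by a DTD content model. -->
<!DOCTYPE proposal [
  <!-- Introduction first, then an optional background, then one or
       more argumentation units, and a conclusion last. -->
  <!ELEMENT proposal (introduction, background?, argumentation+, conclusion)>
  <!ELEMENT introduction  (#PCDATA)>
  <!ELEMENT background    (#PCDATA)>
  <!ELEMENT argumentation (#PCDATA)>
  <!ELEMENT conclusion    (#PCDATA)>
]>
<proposal>
  <introduction>We propose to replace the current help system.</introduction>
  <argumentation>A modular help system can be reused across products.</argumentation>
  <conclusion>A modular help system should be adopted.</conclusion>
</proposal>
```

Here the declared element names stand for the range of valid schematic elements, and the content model of <proposal> stands for the patterns in which they may occur. A DTD content model is a fairly rigid approximation, since a genre may tolerate more variation in ordering than such a model expresses.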
On the basis of these two concepts, we may initially define two types of structural markup:
- schematic markup (the representation of a document’s semantic components as defined by its genre)
- formal markup (the representation of a document’s concrete building materials: text, graphics, video, and so on)
Thus, a screen dump in a procedure in an installation guide, say, may either be marked up as a <figure> (formal markup) or as a <result-of-user-action> (schematic markup) or both. A text segment in a business proposal may be identified as a <paragraph> or a <conclusion> or both.
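The "or both" case might be sketched as follows, with the schematic element wrapping the formal one. Only <figure> and <result-of-user-action> come from the example above. The <step> and <action> elements, the src attribute, and the file name are assumptions introduced for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: <step>, <action>, the src attribute and the
     file name are assumptions made for this sketch. -->
<step>
  <action>Click Install.</action>
  <!-- Schematic markup: the role the content plays in the procedure. -->
  <result-of-user-action>
    <!-- Formal markup: what the content is made of. -->
    <figure src="install-complete.png"/>
  </result-of-user-action>
</step>
```

Nesting is only one option. The same double description could also be expressed by attaching the schematic role to the formal element as an attribute.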
Formal and Schematic Markup
Typical XML document representation languages like XHTML and DocBook are, not surprisingly, well suited to describing formal document structures.
DocBook is a book-oriented markup language and contains a multitude of elements to describe the bricks and mortar of (technical) publications: <book>, <chapter>, <section>, <para>, <title>, <footnote>, <figure>, <mediaobject>, and so on.
In terms of schematic structure, DocBook offers some elements for marking up genre-typical meaning components. An author may insert a <dedication> in a book, an <abstract> in an article and a <warning> and a <step> in a procedure. DocBook’s syntax is relatively loose, though. The language also allows markup patterns that seem to violate genre conventions. For example, an author is free to insert an <abstract> in a <procedure>.
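The looseness described above can be pictured with a fragment like the following. It uses only DocBook element names mentioned in this section, and the text content is invented. Whether the fragment validates depends on the DocBook version and schema in use, which this sketch does not check.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: an <abstract> inside a <procedure>, the pattern the text
     describes as allowed by DocBook but at odds with genre conventions. -->
<procedure>
  <title>Installing the printer driver</title>
  <abstract>
    <para>This procedure explains how to install the printer driver.</para>
  </abstract>
  <step>
    <para>Insert the installation CD.</para>
  </step>
  <step>
    <para>Follow the instructions on the screen.</para>
  </step>
</procedure>
```

The point of the example is that the schema alone does not enforce the schematic structure of the genre. That constraint has to come from authoring guidelines or from a stricter, genre-specific schema.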
XHTML, the successor to HTML, is primarily a Web language and, therefore, does not have book-oriented formal elements but a host of others: <head>, <body>, <div>, <p>, and six levels of headings, <h1> to <h6>.
In XHTML, elements for schematic markup are almost non-existent, which does not mean, however, that text segments cannot be assigned schematic roles. In XHTML, a