Helen St. Denis, Stilo
December 1, 2022


Migrating unstructured content to structure

There are several ways to convert your unstructured content to a structure such as DITA XML. You can do it in-house, of course, if you have the resources available. Doing so usually involves writing, adjusting, and maintaining scripts to do the heavy lifting, and the more formats your content is in, the more scripts you will need.

Alternatively, you can gather up all your content and send it to a vendor to convert it for you. Letting a specialist do the work is often the better choice, but it may mean a content freeze for some time while the conversion is under way.

In either case, there will inevitably be some pre- and post-conversion work required, and it can be a substantial commitment of time. For example, if your unstructured content is in Word, you will almost always need to go through the documents and check that paragraph styles have been applied correctly. If it’s InDesign, you may need to go through and make sure all your content is properly anchored in place.
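As an illustration of the kind of pre-conversion check described above, here is a minimal sketch that counts the paragraph styles used in a Word file, using only the Python standard library. It assumes a standard .docx package containing word/document.xml; the function names and the "(default)" label are invented for this example, and the values reported are style IDs as stored in the document, which can differ from the names shown in Word's style gallery.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_paragraph_styles(docx_path):
    """Return a Counter of paragraph style IDs used in a .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    counts = Counter()
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        # Paragraphs with no explicit style use the document default.
        if style is not None:
            style_id = style.get(f"{{{W_NS}}}val")
        else:
            style_id = "(default)"
        counts[style_id] += 1
    return counts


def unexpected_styles(counts, expected):
    """List style IDs that are in use but not in the expected set."""
    return sorted(s for s in counts if s not in expected)
```

Running `unexpected_styles(count_paragraph_styles("manual.docx"), {"Heading1", "BodyText"})` on a hypothetical file would list the style IDs that fall outside your template, which is usually where manual review is needed first.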

After conversion, there will be cleanup to do as well. Often, all your content will have been converted to generic topics rather than to specialized types such as concept, task, and reference. That may not matter, depending on your needs, but it might matter a lot. XML editors can help with retyping topics, but the work is still time-consuming.

You may also need to update semantic markup that the conversion did not pick up accurately. There may be spots where a note or a list item includes text that should not have been part of it. These are just a few examples.

A third alternative is a conversion driven by rules that you can refine yourself. A user can modify the rules that govern the conversion at any point, using an interface that does not require a developer to operate. This can help reduce the amount of cleanup at either end of the conversion.

If there are errors in the conversion, you will likely not need to revise the input or fix the resulting DITA, either of which can be a significant amount of work. Adjusting a rule in a user-friendly interface and then rerunning the conversion can often spare you huge structural changes or repetitive small ones.
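To make the idea of adjusting a rule and rerunning concrete, here is a minimal sketch in Python. It is not how any particular product stores or applies its rules; the rule table, the style names, and the `convert` function are all hypothetical, and the output is simplified DITA-like markup rather than a valid topic.

```python
from xml.sax.saxutils import escape

# Hypothetical rules: source paragraph style -> DITA element name.
RULES = {
    "Heading1": "title",
    "BodyText": "p",
    "Warning": "p",  # first attempt: warnings come out as plain paragraphs
}


def convert(paragraphs, rules):
    """Turn (style, text) pairs into DITA-like markup using the rule table."""
    lines = []
    for style, text in paragraphs:
        element = rules.get(style, "p")  # unknown styles fall back to <p>
        lines.append(f"<{element}>{escape(text)}</{element}>")
    return "\n".join(lines)


source = [
    ("Heading1", "Replacing the filter"),
    ("Warning", "Disconnect power first."),
    ("BodyText", "Open the side panel."),
]

first_pass = convert(source, RULES)

# Adjust one rule and rerun; neither the source nor the output is hand-edited.
RULES["Warning"] = "note"
second_pass = convert(source, RULES)
```

In the first pass the warning is emitted as a `<p>`; after the single rule change it is emitted as a `<note>` everywhere it occurs, which is the kind of repetitive fix that would otherwise be made by hand in the output.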

Our products

Stilo’s Migrate provides this functionality. The built-in rules editor is prepopulated with rules configured specifically for your content and your specifications. You can easily improve the quality of the conversion, and because Migrate is cloud-based software, you can convert on a just-in-time basis, so there’s no need to freeze content before a release.

Stilo has been involved in migrating unstructured content to structured content for almost 40 years. Our first product was OmniMark, still the language of choice in the SGML world as well as for large XML projects.

Analyzer is a recent addition to our product list. It allows users to analyze their unstructured content for redundancies, and to calculate potential cost savings from content reuse.

Optimizer runs on DITA files. It allows the user not only to identify identical or nearly identical elements, but also to create the conrefs or update ditamaps so the deduplicated content is ready to go.
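For readers who want a feel for what identifying identical or nearly identical elements involves, here is a minimal sketch using Python's standard difflib module. It is only an illustration of the general idea, not Optimizer's method; the function names, the 0.9 threshold, and the input format (a mapping of element IDs to text) are assumptions made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Collapse whitespace and case so trivial differences do not matter."""
    return " ".join(text.lower().split())


def find_near_duplicates(elements, threshold=0.9):
    """Return (id_a, id_b, ratio) for pairs at or above the threshold.

    `elements` maps an element ID to its text content.
    """
    matches = []
    for (id_a, text_a), (id_b, text_b) in combinations(elements.items(), 2):
        ratio = SequenceMatcher(None, normalize(text_a), normalize(text_b)).ratio()
        if ratio >= threshold:
            matches.append((id_a, id_b, round(ratio, 2)))
    return matches
```

Comparing every pair is fine for a few hundred elements but grows quadratically, so a real content set would need some way of narrowing the candidates first. Pairs that score at or near 1.0 are the natural candidates for reuse through a single shared element.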

If any of this is of interest to you, please contact us at [email protected]