CIDM Matters

CIDM Matters is an electronic newsletter published on the 1st and 15th of every month. Browse these articles published in CIDM Matters or subscribe to the newsletter by selecting the “Join Us” option on the navigation bar.

Building Intelligent AI Systems with Structured Semantic Data

Amit Siddhartha
May 15, 2025

A Path to Accuracy and Efficiency

As AI adoption accelerates across industries, the challenge shifts from model size to data quality. Unstructured and inconsistently formatted content introduces ambiguity, leading to hallucinations and inefficiencies in AI systems.

This article explores a structured approach to content curation and data enrichment using standards like DITA-XML, RDF, and OWL ontologies to build semantically rich datasets that are optimized for training and dynamic retrieval. By converting raw content into modular, typed DITA topics and enriching them with domain-specific metadata and relationships, enterprises can construct knowledge graphs that support accurate, explainable, and context-aware AI outputs.

We present real-world implementations across information technology, software, banking, financial compliance, MedTech research and compliance, and legal contract automation, where semantically enriched, intent-aligned content has improved retrieval accuracy, reduced content duplication, and enabled intelligent search using a combination of SPARQL queries, knowledge graphs, and vector-based retrieval models.

Technical implementation details include the use of Protégé for ontology design, Neo4j for knowledge graph storage, and the integration of structured content with Retrieval-Augmented Generation (RAG) pipelines for dynamic AI performance.

The paper demonstrates how semantic enrichment not only enhances content usability but also reduces computational cost and training time. This approach provides a scalable path to building trustworthy, domain-aligned AI systems driven by meaningfully structured knowledge.

Introduction

As the adoption of AI accelerates across industries, the focus has shifted from building large-scale models to curating high-quality training data. Structured semantic data—rich with contextual meaning, metadata, and defined relationships—enhances AI’s reasoning, retrieval accuracy, and explainability.

This article presents a pragmatic approach to transforming unstructured information into semantically enriched data, using the RDF and DITA-XML standards to support ontology-driven AI systems and reduce computational overhead.
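As a rough illustration of the DITA-to-RDF step, the sketch below reads one small DITA concept topic and derives subject-predicate-object triples from its type, title, keywords, and related links. It is a minimal sketch under stated assumptions, not the article's implementation: the sample topic, the `http://example.org/kb/` namespace, and the predicate names (`topicType`, `keyword`, `relatedTo`) are all hypothetical, and the output is only N-Triples-style text with simplified literal escaping.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA topic: the id, keywords, and link target are invented.
DITA_TOPIC = """\
<concept id="termination-clause">
  <title>Termination Clause</title>
  <prolog>
    <metadata>
      <keywords>
        <keyword>contract</keyword>
        <keyword>termination</keyword>
      </keywords>
    </metadata>
  </prolog>
  <conbody>
    <p>Either party may terminate with 30 days of written notice.</p>
  </conbody>
  <related-links>
    <link href="notice-period.dita"/>
  </related-links>
</concept>
"""

# Assumed namespace; a real project would use the IRIs of its own ontology.
BASE = "http://example.org/kb/"


def topic_to_triples(xml_text):
    """Turn one DITA topic into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    subject = BASE + root.get("id")
    triples = [
        # The root element name carries the topic type (concept, task, ...).
        (subject, BASE + "topicType", root.tag),
        (subject, BASE + "title", root.findtext("title", default="").strip()),
    ]
    # Keywords in the prolog become literal-valued metadata.
    for keyword in root.iter("keyword"):
        triples.append((subject, BASE + "keyword", (keyword.text or "").strip()))
    # Related links become relationships to other topics.
    for link in root.iter("link"):
        target = link.get("href", "")
        if target.endswith(".dita"):
            target = target[: -len(".dita")]
        triples.append((subject, BASE + "relatedTo", BASE + target))
    return triples


def to_ntriples(triples):
    """Serialize as N-Triples-style lines (escaping is simplified)."""
    lines = []
    for s, p, o in triples:
        # Heuristic for this sketch: objects inside BASE are IRIs, the rest literals.
        if o.startswith(BASE):
            obj = "<" + o + ">"
        else:
            obj = '"' + o.replace('"', '\\"') + '"'
        lines.append("<" + s + "> <" + p + "> " + obj + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(topic_to_triples(DITA_TOPIC)))
```

Triples of this shape could then be loaded into a graph store or queried with SPARQL; which store and which ontology terms to use would depend on the project.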

Computational Impacts of Unstructured Input

Organizations store vast amounts of unstructured content in formats like DOCX, PDFs, emails, or HTML. However, such data lacks clarity, consistency, and machine-understandable semantics. Training AI models directly on this content often leads to:

  • Hallucinations and incorrect outputs
  • Redundant or conflicting knowledge
  • High compute costs in retrieval and indexing
  • Inability to trace or explain AI decisions

Large Language Models (LLMs) perform better with structured input—especially content annotated with taxonomy and domain semantics. Without structuring and enrichment, these models operate in an ambiguous knowledge space.

Beyond accuracy concerns, unstructured content imposes a computational and financial burden.

  • Increased Processing Time
  • Higher GPU/CPU Utilization
  • Memory Overhead
  • Complexity in Querying

Example

In one contract management project, the organization was using an LLM-based assistant to extract clause-specific insights from thousands of agreements. Without content structuring, the model had to process entire documents each time, resulting in query response times of 30–40 seconds per file and GPU consumption spikes of over 60%. After the contracts were converted to modular DITA topics and mapped to an ontology, response time dropped to under 5 seconds, with a 40% reduction in compute resource usage.
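The mechanism behind a saving like this can be sketched in a few lines: once each clause is a separate, typed topic, a query about one clause type only needs the matching topics rather than the whole agreement. The example below is a hypothetical illustration, not the project's code. The `ClauseTopic` class, the clause type names, and the sample texts are invented, and it compares context size only; it does not reproduce the timings reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseTopic:
    contract_id: str
    clause_type: str  # hypothetical ontology class, e.g. "Termination"
    text: str


# Invented sample data standing in for contracts split into typed topics.
TOPICS = [
    ClauseTopic("C-001", "Termination", "Either party may terminate with 30 days of written notice."),
    ClauseTopic("C-001", "Indemnity", "The supplier indemnifies the customer against third-party claims."),
    ClauseTopic("C-001", "Payment", "Invoices are payable within 45 days of receipt."),
    ClauseTopic("C-002", "Termination", "The customer may terminate for convenience at any time."),
    ClauseTopic("C-002", "Confidentiality", "Each party keeps the other's information confidential."),
]


def whole_document_context(topics, contract_id):
    """Unstructured baseline: every clause of the contract goes to the model."""
    return " ".join(t.text for t in topics if t.contract_id == contract_id)


def typed_context(topics, contract_id, clause_type):
    """Structured case: only topics of the requested clause type are selected."""
    return " ".join(
        t.text
        for t in topics
        if t.contract_id == contract_id and t.clause_type == clause_type
    )


if __name__ == "__main__":
    full = whole_document_context(TOPICS, "C-001")
    scoped = typed_context(TOPICS, "C-001", "Termination")
    print(len(full), "characters without structuring")
    print(len(scoped), "characters with typed topics")
```

In a real pipeline the type filter would more likely be a SPARQL or graph query over the ontology, with the selected topics passed on to the retrieval and generation steps.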

This clearly illustrates how structured, semantically enriched data not only improves AI accuracy but also reduces the total cost of ownership […]

ConVEx 2025 Convenes in Heart of Silicon Valley

Chuck Martin
May 1, 2025

More than 200 content developers and strategists answered “Yes” to the title of the famous Dionne Warwick song and indeed found their way to San Jose in early April for the 27th ConVEx conference, hosted by the Center for Information-Development Management (CIDM). […]

Believe in Knowledge Graphs: An Introduction, Ted Lasso Style

Sweta Bhagat, ServiceNow
April 15, 2025

Howdy, folks! Ever heard of Coach Ted Lasso? The coach who knew nothing but taught us everything. When I first watched Ted Lasso, I wasn’t just entertained; I was inspired. Ted didn’t need to be a football expert to build a winning team. He focused on structure, relationships, and trust. […]

CIDM Sponsor Profile – Fluid Topics

Kelly Dell, Fluid Topics
April 1, 2025 

Why Your Content Workflow Broke (and How to Fix It)

Life was simpler for technical writers when they only had one content authoring tool and one endpoint to which they delivered content. Then, as tools and systems multiplied at both ends of the workflow, the complexity of the documentation tool stack also intensified, and it quickly became clear that there was a missing piece as the publishing process became unsustainable. […]

The Art of Managing Up: A Strategic Advantage for Technical Communicators

Dr Amanda Patterson, Comtech Services
March 15, 2025

Managing up is an essential skill that technical communicators must master to ensure their contributions are recognized, their work is supported, and their teams are set up for success. In a recent CIDM roundtable, members shared their experiences, challenges, and strategies for managing up effectively within their organizations. […]

Boosting DITA XML Workflows with Artificial Intelligence

Alex Jitianu, Syncro Soft/Oxygen XML Editor
March 15, 2025

Artificial Intelligence (AI) has become a transformative force across industries, and the field of technical documentation is no exception. However, while AI offers immense potential, it’s important to approach its integration thoughtfully. […]

CIDM Sponsor Profile – DeltaXignia

DeltaXignia
February 15, 2025

DeltaXignia: Redefining Content and Data Change Management with a New Identity

MALVERN, February 03, 2025 – DeltaXignia, formerly known as DeltaXML, proudly unveils its new brand, signalling a transformative shift towards enterprise-grade solutions for managing content, document, and data change. […]

CIDM Sponsor Profile – Bluestream

Andrew Douglas, Bluestream
February 1, 2025

 

AI Chatbots & Portals Coexistence

Here at Bluestream, we recently conducted a survey on the future of portals and whether AI chatbots will eventually make their way into this space.
The survey findings indicated that 70% of respondents believe portals and AI chatbots will coexist.
[…]
