How an AI-supported approach can help build new terminology - a blc success story!
Initial situation
A termbase with potential
Many companies know this situation: there is a terminology database available, but it is not complete. It may contain a few key terms – often without context, clear definition, or metadata. This was the case for our customer – a medium-sized Swiss company with an international focus, ambitious goals and a clear plan.
The aim was to future-proof its terminology work. Specifically, an AI-supported terminology process needed to be set up and tested on a small scale – as a prototype for a later large-scale application in the company.
The starting point: a small termbase with no structured definitions or context – but with the clear idea that this must change. That’s exactly why blc was brought on board.
Approach
From raw text to professionally approved terminology
Term extraction: The basis
Using the blc Data Toolkit, we extracted new term candidates from a collection of specialised texts and structured them for subsequent review. For us, term extraction follows a hybrid approach, i.e. a combination of AI and statistical methods.
The focus was not only on creating a pure term list, but on preparing all the relevant information that would later be required for the creation of definitions. This particularly involved collecting context sentences from the original material.
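To make the statistical half of this step concrete, here is a minimal sketch of frequency-based candidate extraction that keeps the context sentences for each candidate. It is not the blc Data Toolkit: the function names, the stopword list and the thresholds are assumptions chosen for illustration, and the AI component is left out entirely.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a production resource).
STOPWORDS = {"the", "a", "an", "of", "is", "and", "to", "in", "for", "on", "with", "by", "at", "it", "be", "are"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_candidates(text, max_len=3, min_freq=2):
    """Count word n-grams and remember the sentences they occur in."""
    counts = Counter()
    contexts = defaultdict(list)
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z][a-z\-]*", sentence.lower())
        seen_in_sentence = set()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Candidates must not start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                candidate = " ".join(gram)
                counts[candidate] += 1
                if candidate not in seen_in_sentence:
                    contexts[candidate].append(sentence)
                    seen_in_sentence.add(candidate)
    # Keep only candidates that reach the minimum frequency.
    return {
        cand: {"frequency": freq, "contexts": contexts[cand]}
        for cand, freq in counts.items()
        if freq >= min_freq
    }


sample = "The pressure valve controls flow. Check the pressure valve daily. A filter is optional."
candidates = extract_candidates(sample)
```

Storing the context sentences alongside each candidate at extraction time means the later definition step does not need to search the source texts again.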
Term validation: The finishing touch
Depending on the settings, automatic term extraction can produce a long list of candidates, some of which are not term-worthy. That’s where our Data Toolkit came in again: the extracted terms were validated using an AI model trained specifically for term validation. Of course, we did not rely solely on the machine: at blc, automatically extracted and pre-validated terminology always goes through human quality assurance by our experts. The preparatory work done by the AI speeds up this review significantly, though!
Of course, we also compared the new terms to existing terminology – after all, we do not want to create duplicates.
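A duplicate check of this kind can be sketched as a comparison of normalised term strings. The normalisation rules below (case folding, treating hyphens and slashes as spaces, collapsing whitespace) and the function names are assumptions for illustration. A real comparison would typically also consider inflection and synonyms.

```python
import re
import unicodedata


def normalize(term):
    """Reduce a term to a comparison key that ignores case and spelling variants."""
    text = unicodedata.normalize("NFKC", term).casefold()
    # Treat hyphens, en dashes and slashes like spaces.
    text = re.sub(r"[\-\u2010\u2011\u2013/]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_new_and_known(candidates, existing_terms):
    """Return (new candidates, {duplicate candidate: term it collides with})."""
    known = {normalize(t): t for t in existing_terms}
    new, duplicates = [], {}
    for candidate in candidates:
        key = normalize(candidate)
        if key in known:
            duplicates[candidate] = known[key]
        else:
            new.append(candidate)
            # Register the candidate so later variants in the same list are caught too.
            known[key] = candidate
    return new, duplicates


new_terms, duplicate_terms = split_new_and_known(
    ["Pressure-Valve", "flow sensor", "Flow  Sensor"],
    ["pressure valve"],
)
```

Here "Pressure-Valve" is reported as a duplicate of the existing entry "pressure valve", and the second spelling of "flow sensor" is caught as a variant of the first.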
Creating definitions with AI: The actual game changer
The real highlight of the project was the automated generation of definitions. For this, we developed a pipeline that fed context sentences from customer documents into AI-supported definition creation, combining a large language model (LLM) with retrieval-augmented generation (RAG).
This was how it worked:
- Relevant contexts were automatically extracted from the original texts for each term.
- These contexts were used as a basis for generating technical definitions with an LLM.
- Prompt engineering ensured that the definitions met the formal and technical requirements.
- Each definition contained an indication of the underlying source so that it was always clear what the definition was based on.
The result: technically correct, well-formulated definitions with traceable origin – quickly created for the entire batch.
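The four steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the project: retrieval is reduced to plain keyword overlap, the prompt wording is invented, and the LLM is passed in as a hypothetical `generate` callable so that no specific provider or API is implied.

```python
from dataclasses import dataclass


@dataclass
class Context:
    sentence: str
    source: str  # e.g. the document the sentence was taken from


@dataclass
class Definition:
    term: str
    text: str
    sources: list


def retrieve_contexts(term, corpus, top_k=3):
    """Stand-in for the retrieval step: rank sentences by word overlap with the term."""
    term_words = set(term.lower().split())
    scored = []
    for ctx in corpus:
        words = set(ctx.sentence.lower().replace(".", " ").replace(",", " ").split())
        overlap = len(term_words & words)
        if overlap:
            scored.append((overlap, ctx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ctx for _, ctx in scored[:top_k]]


def build_prompt(term, contexts):
    """Assemble a prompt that restricts the model to the retrieved contexts."""
    numbered = "\n".join(
        f"[{i}] {c.sentence} (source: {c.source})" for i, c in enumerate(contexts, 1)
    )
    return (
        f"Define the term '{term}' in one sentence.\n"
        "Use only the numbered contexts below and do not add outside knowledge.\n"
        f"{numbered}"
    )


def define_term(term, corpus, generate):
    """Generate a definition and record which sources it was based on."""
    contexts = retrieve_contexts(term, corpus)
    if not contexts:
        return None  # no evidence in the corpus, so nothing is generated
    text = generate(build_prompt(term, contexts)).strip()
    sources = sorted({c.source for c in contexts})
    return Definition(term, text, sources)


# Usage with a stub in place of a real model call (hypothetical data).
corpus = [
    Context("The pressure valve regulates line pressure.", "manual.pdf"),
    Context("Replace the filter yearly.", "service.pdf"),
]
stub_llm = lambda prompt: " A valve that regulates pressure. "
definition = define_term("pressure valve", corpus, stub_llm)
```

Returning `None` when no context is found keeps the model from defining a term without evidence, which is what makes the source indication meaningful.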
Terminology work: Still necessary, but focused, easy and fast!
In the next step, each generated definition underwent a terminology check by our experts. They checked each definition for formal and technical correctness, consistency and relevance. Some definitions were slightly revised; others were sharpened further with feedback from external technical experts. Our conclusion: most definitions were usable straight out of the box – only a few had to be adapted!
Once the definitions were in place, the concept entries could be prepared: we merged synonyms and spelling variants, defined the usage status and added further metadata (e.g. context, source, grammatical information).
Import: Into the target system without a care
All checked terms and metadata were then converted into the data model of the customer’s termbase, again using the Data Toolkit. The new terminology was simply imported into the existing termbase and could be used directly for all editorial processes.
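As an illustration of such a conversion, the sketch below serialises concept entries into a simple XML structure. The customer's actual data model is not described here, so the element and attribute names (`termbase`, `conceptEntry`, `definition`, `term`, `status`) and the sample data are purely hypothetical and do not claim to follow any particular exchange standard.

```python
import xml.etree.ElementTree as ET


def concept_to_xml(concept):
    """Map one reviewed concept (a plain dict) to a hypothetical XML entry."""
    entry = ET.Element("conceptEntry", id=concept["id"])
    definition = ET.SubElement(entry, "definition", source=concept["source"])
    definition.text = concept["definition"]
    for term in concept["terms"]:
        # Synonyms and variants live in the same concept, each with a usage status.
        node = ET.SubElement(entry, "term", lang=term["lang"], status=term["status"])
        node.text = term["text"]
    return entry


def export_termbase(concepts):
    """Serialise all concepts into one XML string for import."""
    root = ET.Element("termbase")
    for concept in concepts:
        root.append(concept_to_xml(concept))
    return ET.tostring(root, encoding="unicode")


xml_output = export_termbase([
    {
        "id": "c1",
        "definition": "A valve that regulates pressure.",
        "source": "manual.pdf",
        "terms": [
            {"text": "pressure valve", "lang": "en", "status": "preferred"},
            {"text": "pressure-valve", "lang": "en", "status": "deprecated"},
        ],
    }
])
```

Keeping the internal representation as plain dictionaries and isolating the mapping in one function means only `concept_to_xml` has to change when the target data model does.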
Result
A termbase that matters
By the end of the project, we had successfully extended the termbase with a first batch of new terms – including complete, tidy metadata and, above all, high-quality definitions.
The advantages for the customer:
- Improved termbase quality through reliable, machine-generated, professionally reviewed terminology including definitions.
- Significant time savings thanks to AI-assisted extraction, validation and definition creation, which provide sound suggestions in a short time.
- Traceability through reference to the source, increasing confidence in the content.
- Data delivery directly into the system – ready for productive use.
Conclusion – AI understands terminology (and so do we!)
This project has demonstrated impressively that an AI-based pipeline for building terminology works – in practice, under real conditions, and with convincing results. The mix of intelligent automation and expert human validation makes the difference and leads to a termbase that can be used reliably.
Outlook – What's next?
The project has laid the foundation, but there is more to be done:
- Additional terminology will be extracted and enriched with metadata.
- Existing terms will be supplemented with definitions and other metadata to take the entire termbase to a new level.
- And in the long term, our AI-assisted process is ready to be scaled – for continuous, large-scale terminology maintenance.
We look forward to taking the next steps – and would be happy to take your terminology projects to a new level too!
Curious about what such a process might look like in your company? Get in touch with us!