Processing language data
Data are valuable, and your company's language data even more so!
Good language data are essential for getting high-quality results from the latest AI technologies, such as machine translation, semantic search engines, or chatbots. But that is not all! A solid base of language data is also essential for terminology work, ontologies, and high-quality translation memories.
But what are language data?
Language data are all written or spoken language content your company stores, whether on cloud or on-premises servers, in content management systems, in website backends, etc.
The very best language data is well structured, often bilingual, groomed, and can be processed automatically.
No such data in your company yet? No worries: with our help, even unstructured, monolingual, low-quality data can become a valuable asset!
With us, your language data finally pays off.
Whatever the format, quality level, or language of your data, whether it is monolingual or bilingual, structured or unstructured, XML, TXT, PDF, CSV, DOC, HTML…, we analyze, extract, enrich, and modify it according to your individual requirements. This is how we do it:
Analyze language data to understand the data quality
With our data analysis module, we create detailed reports as well as graphical overviews to give you, us, and your management a clear picture of the state of your language data. That way you know what the quality of your terminology database, translation memories, and other language data really is, whether they are useful to you, and whether there is room for optimization.
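To make the idea of a quality report more tangible, here is a minimal sketch of the kind of counts such an analysis might start with for a translation memory. It is an illustration only, not our analysis module: the function name `analyze_segments`, the chosen checks, and the sample segments are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def analyze_segments(segments):
    """Return simple quality counts for a list of (source, target) pairs.

    Hypothetical checks chosen for illustration: empty targets,
    exact duplicate pairs, and sources with more than one translation.
    """
    pair_counts = Counter(segments)
    targets_by_source = defaultdict(set)
    empty_targets = 0

    for source, target in segments:
        if not target.strip():
            empty_targets += 1
        else:
            targets_by_source[source].add(target)

    return {
        "total": len(segments),
        "empty_targets": empty_targets,
        # Every repetition beyond the first copy counts as a duplicate.
        "exact_duplicates": sum(n - 1 for n in pair_counts.values() if n > 1),
        # Sources that were translated in more than one way.
        "inconsistent_sources": sorted(
            source
            for source, targets in targets_by_source.items()
            if len(targets) > 1
        ),
    }


# Made-up sample data, not taken from any customer project.
sample = [
    ("Press the button.", "Drücken Sie die Taste."),
    ("Press the button.", "Drücken Sie den Knopf."),
    ("Open the valve.", "Öffnen Sie das Ventil."),
    ("Open the valve.", "Öffnen Sie das Ventil."),
    ("Close the cover.", ""),
]
report = analyze_segments(sample)
print(report)
```

Counts like these are only a starting point; which checks matter depends on the data and on how it is meant to be used.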
Extract language data to expand your database
If needed, we extract additional language data from your company’s content bases, for example to complete the existing database or to add further languages.
Enrich and modify your data to use it more efficiently
We use customizable scripts and automation techniques to fulfill your individual data requirements, and we process your data so that it can be integrated into your company’s processes in the desired format and quality.
Whether for AI or for a classical language application: we process your language data properly and securely on-premises and ensure that it can be used efficiently, no matter in which application!
Use Case “Terminology Migration and Cleanup”
Challenge: A medium-sized mechanical engineering company wanted to switch to a Translation Management System (TMS) that was a better fit. Extensive language data was available in the form of multilingual translation memories and terminology tables in Excel, which needed to be imported into the new TMS. During the import, it turned out that custom fields from the Excel spreadsheet could not be transferred to the terminology entries of the new system.
Solution: Since the entry structures of the Excel spreadsheet could not be mapped to the target system with the built-in tools of the TMS, berns language consulting (blc) converted the initial data into a valid import format (e.g. TBX) to ensure a clean and automated import of all terminological information. In addition, during this migration, translation units containing unwanted terminology were flagged in the translation memory, allowing targeted segment cleanup.
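A conversion of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not the script used in the project: it takes spreadsheet rows as dictionaries (for instance after exporting the Excel table to CSV) and builds a TBX-like structure with `termEntry`, `langSet`, `tig`, and `term` elements, carrying custom columns over as `descrip` elements. The function name `rows_to_tbx`, the column name `Department`, and the sample rows are hypothetical. The sketch omits the TBX header, and a real import would have to use data categories that the target TMS accepts.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def rows_to_tbx(rows, languages, custom_fields):
    """Build a TBX-like element tree from spreadsheet rows (list of dicts).

    Simplified sketch: no martifHeader, and custom field names are
    passed through unchanged as descrip types.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: "en"})
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")

    for index, row in enumerate(rows, start=1):
        entry = ET.SubElement(body, "termEntry", {"id": f"c{index}"})

        # Custom spreadsheet columns become entry-level descriptions.
        for field in custom_fields:
            value = row.get(field, "").strip()
            if value:
                ET.SubElement(entry, "descrip", {"type": field}).text = value

        # One language section per non-empty term column.
        for lang in languages:
            term = row.get(lang, "").strip()
            if not term:
                continue
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term

    return martif


# Made-up rows; "Department" stands in for a custom spreadsheet field.
rows = [
    {"en": "torque wrench", "de": "Drehmomentschlüssel", "Department": "Assembly"},
    {"en": "gasket", "de": "", "Department": ""},
]
tbx = rows_to_tbx(rows, languages=["en", "de"], custom_fields=["Department"])
xml_text = ET.tostring(tbx, encoding="unicode")
print(xml_text)
```

Empty cells are skipped rather than written as empty elements, so the second row produces an entry with a single language section and no description.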
Use Case “Finding synonyms using vector space models”
Challenge: An automotive manufacturer needed to expand the database of an application for guided troubleshooting. The reason: Many users used a variety of names and abbreviations for specific components, error patterns, and error locations during troubleshooting and often received no results.
Solution: berns language consulting created a new, expanded database with as many variants as possible. Language data from databases, translation memories, after-sales literature, and other sources were extracted and processed. This data was used to create a vector space model of all the terms. With this data model, synonyms for existing term lists could be identified during troubleshooting, making this process highly efficient even for a wide variety of input variants.
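The principle behind a vector space model can be shown with a toy example. The sketch below is not the model built in this project, whose type is not specified here. As an assumption, it uses simple co-occurrence counts: each term is represented by the words that appear near it, and terms with similar contexts receive a high cosine similarity. The function names, the window size, and the sample sentences are hypothetical; a production system would more likely rely on learned embeddings and far more data.

```python
import math
from collections import Counter, defaultdict


def build_vectors(sentences, window=2):
    """Represent each token by counts of the tokens that occur near it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_terms(term, vectors, top_n=3):
    """Return the top_n most similar terms as (term, score) pairs."""
    target = vectors.get(term)
    if not target:
        return []
    scores = [
        (other, cosine(target, vector))
        for other, vector in vectors.items()
        if other != term
    ]
    # Highest score first; ties are broken alphabetically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Made-up sentences in which two names are used for the same component.
sentences = [
    "replace the alternator belt",
    "replace the generator belt",
    "check the alternator voltage",
    "check the generator voltage",
    "tighten the wheel bolts",
]
vectors = build_vectors(sentences)
print(nearest_terms("alternator", vectors))
```

In this toy data set, "generator" appears in exactly the same contexts as "alternator" and is therefore ranked first, which is the effect that lets a troubleshooting application suggest known terms for unfamiliar input variants.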
We make your valuable language data even more valuable.
Analyze language data
- Analyze structure & content of language data
- Create detailed reports about problematic data areas
- Create graphical reports
Extract language data
- Extract language data from company content base
- Extract language data from external content bases
- Align language data
Migrate language data
- Modify & enrich language data
- Standardize variants
- Delete unwanted data
- Migrate language data