Term extraction is a manual, semi-automated or fully automated procedure for identifying relevant terms in order to build a terminology and populate a term database. To this end, potential terms are extracted from existing corpora, such as website texts or documentation, then validated and entered into the database.

[…] Having explained in our last blog post what term extraction actually is, we now turn to the methods of term extraction. Before a company starts extracting terms to build a termbase, it should assess its needs and available resources in order to settle on a sensible approach. Since we at blc are committed to designing efficient processes for optimal output, we start today with the prerequisites: what methods and tools are available for extracting terminology from source texts?

Manual vs. automatic term extraction

In manual term extraction, a terminologist searches the source text for term candidates by visual inspection. The advantage is that the terminologist examines the technical terms in their immediate context and can use their terminology expertise to assess whether the candidates are actual terms. The drawbacks are that manual term extraction is very time-consuming, depending on the volume of documents, and that the results depend on the individual’s assessment.

The alternative to manual term extraction is automatic term extraction, in which a list of term candidates is generated from selected source documents with machine support. A terminologist must still check the resulting candidate list manually, because a machine cannot assess whether the extracted words or phrases are actually terminology. Nevertheless, a major advantage of automatic term extraction is the considerable time saving compared with manual term extraction: instead of the complete source documents, only the automatically generated term candidate lists have to be checked.
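To make the idea of a machine-generated candidate list more concrete, here is a minimal sketch in Python. It is not how any particular extraction tool works: it simply counts words and short word sequences that recur in the text and do not start or end with a stopword. The function name `extract_candidates`, the tiny stopword list and the thresholds are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "with", "be", "must"}


def extract_candidates(text, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs, most frequent first."""
    counts = Counter()
    # Split into sentences so that n-grams never cross sentence boundaries.
    for sentence in re.split(r"[.!?;:]+", text.lower()):
        tokens = re.findall(r"[a-z][a-z-]*", sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip n-grams that start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Keep only candidates that occur at least min_freq times.
    return sorted(
        ((cand, freq) for cand, freq in counts.items() if freq >= min_freq),
        key=lambda pair: (-pair[1], pair[0]),
    )


sample = ("The torque wrench must be calibrated. "
          "Store the torque wrench in a dry place.")
for candidate, frequency in extract_candidates(sample):
    print(frequency, candidate)
```

Even on this toy input the list contains "torque" and "wrench" alongside "torque wrench", which illustrates why the candidate list still has to be reviewed by a terminologist.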

Monolingual vs. multilingual term extraction

In monolingual term extraction, only source-language designations are extracted. They can be transferred to the other corporate languages downstream, after they have been included in the termbase.

An alternative is bilingual or multilingual term extraction. Here, the target-language equivalents are assigned directly to the source-language designations. Translation memories or aligned source and target documents serve as the starting point. […]
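As a rough illustration of how aligned segments can yield equivalent candidates, the following Python sketch ranks target-language words by how consistently they co-occur with a source-language term across segment pairs, using the Dice coefficient. This is a simplified stand-in rather than the method of any specific tool: the function `suggest_equivalents`, the sample segments and the restriction to single-word terms are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(segment):
    """Return the set of lowercased word tokens in a segment."""
    return set(re.findall(r"\w+", segment.lower()))


def suggest_equivalents(source_term, aligned_segments, top_n=3):
    """Rank target words by Dice co-occurrence with a single-word source term."""
    term = source_term.lower()
    term_count = 0             # segments whose source side contains the term
    target_counts = Counter()  # segments in which each target word occurs
    joint_counts = Counter()   # segments in which term and target word co-occur
    for source, target in aligned_segments:
        target_tokens = tokenize(target)
        target_counts.update(target_tokens)
        if term in tokenize(source):
            term_count += 1
            joint_counts.update(target_tokens)
    # Dice coefficient: 2 * joint / (source occurrences + target occurrences).
    scores = {
        word: 2 * joint / (term_count + target_counts[word])
        for word, joint in joint_counts.items()
    }
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Hypothetical English-German segment pairs, as they might come from a translation memory.
segments = [
    ("Tighten the valve", "Ventil festziehen"),
    ("Replace the valve", "Ventil ersetzen"),
    ("Replace the filter", "Filter ersetzen"),
]
print(suggest_equivalents("valve", segments))
```

With these three segment pairs, "ventil" is ranked first for "valve" because it appears in exactly the segments where "valve" does. As with monolingual extraction, the suggestions are only candidates and still need to be confirmed by a terminologist.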

