SMT, NMT, adaptive MT, PEMT… what does that mean exactly?

Anyone who has ever dealt with machine translation (MT) will probably have bumped into at least one of the following abbreviations: SMT, NMT and PEMT. But what do these abbreviations actually stand for?

This question will probably only put a weary smile on the faces of die-hard translation scholars. For laypeople, on the other hand, the meaning of these abbreviations may not be immediately apparent. Yet they are playing an increasingly important role in the translation industry. This blog post is therefore intended as a short guide through the jungle of abbreviations in the world of translation.

SMT – the statistician

SMT stands for Statistical Machine Translation. The translations provided by an SMT system are based on probabilities of word translations and word order, which are captured in a translation model and a language model. These probabilities are calculated from large monolingual and bilingual text corpora that are representative of the use case of the SMT engine. Because SMT systems translate sequences of words (phrases), the approach is also known as phrase-based statistical machine translation. This focus on phrases is also the greatest weakness of the SMT approach: syntactic dependencies that span a whole sentence may not be detected correctly. Consequently, the translations are often hard to read, especially where longer sentences are concerned.
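
To make the interplay of translation model and language model more tangible, here is a minimal Python sketch. It is a toy under stated assumptions, not how a production SMT engine is built: all phrases and probabilities are invented for illustration, the names (TRANSLATION_MODEL, LANGUAGE_MODEL, translate) are hypothetical, and phrases are kept in their original order, whereas real systems also model reordering.

```python
import math
from itertools import product

# Toy translation model: P(target phrase | source phrase).
# All phrases and numbers are invented for illustration.
TRANSLATION_MODEL = {
    "das haus": {"the house": 0.8, "the home": 0.2},
    "ist klein": {"is small": 0.7, "is little": 0.3},
}

# Toy bigram language model: P(word | previous word), also invented.
LANGUAGE_MODEL = {
    ("<s>", "the"): 0.5,
    ("the", "house"): 0.3,
    ("the", "home"): 0.05,
    ("house", "is"): 0.4,
    ("home", "is"): 0.2,
    ("is", "small"): 0.2,
    ("is", "little"): 0.05,
}
UNSEEN_PROBABILITY = 1e-4  # fallback for word pairs the toy model has never seen


def language_model_log_prob(sentence):
    """Sum the log-probabilities of all adjacent word pairs in the sentence."""
    words = ["<s>"] + sentence.split()
    return sum(
        math.log(LANGUAGE_MODEL.get(pair, UNSEEN_PROBABILITY))
        for pair in zip(words, words[1:])
    )


def translate(source_phrases):
    """Return the (sentence, score) pair with the highest combined score.

    Every combination of phrase translations is scored as
    log P(translation model) + log P(language model).
    """
    options = [TRANSLATION_MODEL[phrase].items() for phrase in source_phrases]
    best = None
    for combination in product(*options):
        sentence = " ".join(target for target, _ in combination)
        score = sum(math.log(prob) for _, prob in combination)
        score += language_model_log_prob(sentence)
        if best is None or score > best[1]:
            best = (sentence, score)
    return best


best_sentence, best_score = translate(["das haus", "ist klein"])
print(best_sentence)
```

Because each phrase is translated on its own and only adjacent words are checked by the language model, a dependency between distant words never enters the score. That is the phrase-centred weakness described above.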

NMT – the networked

The second, and current, MT approach is Neural Machine Translation, NMT for short. Since 2015, it has been the dominant topic within the translation industry. In contrast to SMT, NMT systems are based on neural networks. Several types of network model are in use, and the same kinds of models are also used in other areas of machine learning, for example to classify data. Furthermore, unlike SMT, NMT makes use of abstract representations of words, known as “word embeddings”. This way, more context information is included in the calculations behind a translation. In addition, NMT takes the overall context of individual words into account more thoroughly. This allows it to generate translations that are often well-formed both grammatically and stylistically. Nonetheless, you need to be cautious: even highly readable sentences can differ in content from the original sentence, for example through omissions or additions.
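
The idea behind word embeddings can be illustrated with a few lines of Python. This is only a sketch: the three-dimensional vectors below are invented by hand, whereas real embeddings are learned during training and usually have hundreds of dimensions. The names EMBEDDINGS and cosine_similarity are hypothetical.

```python
import math

# Hand-made toy embeddings; real ones are learned from data, not written by hand.
EMBEDDINGS = {
    "house": [0.9, 0.1, 0.0],
    "home": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean 'similar direction'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity(EMBEDDINGS["house"], EMBEDDINGS["home"]))
print(cosine_similarity(EMBEDDINGS["house"], EMBEDDINGS["banana"]))
```

In this toy example, "house" and "home" point in almost the same direction, while "house" and "banana" do not. Representing words as vectors in this way lets a network treat related words similarly instead of handling each word as an isolated symbol.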

Adaptive MT – the learner

What exactly does “adaptive” mean in the context of MT? It refers to the adaptation of an engine to specific user input. More precisely, the MT engine adapts to the corrected version of a raw machine translation that originates from the same engine. Essentially, any engine that can be trained can also be optimised by adding new or corrected translations. This usually happens once a critical amount of new training data has been accumulated: the engine is then trained with the new data and is thus adapted to the user input. In the context of translation tools, however, adaptive MT is usually understood more narrowly: the engine adapts to user input continuously, i.e. during the translation process, without the user having to explicitly initiate training. This enables the system to “learn” in real time during operation and to adopt the translator’s writing style and choice of terminology step by step.
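
The difference between retraining in batches and adapting continuously can be sketched in Python as follows. This is a deliberately simplified toy: the "engine" is just a word-by-word lookup table standing in for a real MT system, and the class name ToyEngine and the retrain_threshold parameter are assumptions made for this illustration.

```python
class ToyEngine:
    """Word-by-word lookup 'engine'; a stand-in for a real, trainable MT system."""

    def __init__(self, lexicon, retrain_threshold=None):
        self.lexicon = dict(lexicon)
        self.pending = []  # corrections not yet used for training
        # None means: adapt immediately after every single correction.
        self.retrain_threshold = retrain_threshold

    def translate(self, source):
        # Unknown words are passed through unchanged.
        return " ".join(self.lexicon.get(word, word) for word in source.split())

    def add_correction(self, source_word, corrected_target):
        self.pending.append((source_word, corrected_target))
        if self.retrain_threshold is None or len(self.pending) >= self.retrain_threshold:
            self._train()

    def _train(self):
        # In this toy, 'training' just overwrites lexicon entries with the corrections.
        self.lexicon.update(self.pending)
        self.pending.clear()


batch_engine = ToyEngine({"haus": "house"}, retrain_threshold=2)
adaptive_engine = ToyEngine({"haus": "house"})

# The translator corrects "house" to "building" in both engines.
batch_engine.add_correction("haus", "building")
adaptive_engine.add_correction("haus", "building")

print(batch_engine.translate("haus"))     # still the old translation
print(adaptive_engine.translate("haus"))  # already uses the correction
```

The batch engine only picks up the correction once enough new data has accumulated, while the adaptive engine uses it in the very next segment.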

And what does PEMT stand for?

As previously mentioned, MT systems do not work completely error-free. Consequently, the output of both SMT and NMT systems has to be post-edited. This process is referred to as post-editing of machine translation, PEMT for short. Its role in the translation industry is growing constantly. There are various levels of PEMT, which are applied depending on the final translation quality required. Here, ISO 18587:2017 distinguishes between Light Post-Editing (LPE) and Full Post-Editing (FPE). Nowadays, PEMT is usually carried out in an interactive context. This means that an MT system is embedded directly into a translation system. In this way, pre-translations can be generated segment by segment, and the translator can then edit each segment as they see fit.
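
The segment-by-segment workflow can be pictured roughly like this in Python. It is a sketch under assumptions, not the interface of any particular translation tool: the function names (post_edit_document, toy_machine_translate, toy_post_edit) are hypothetical, the "engine" is a small lookup table, and the "translator" is a fixed rule standing in for a human.

```python
def post_edit_document(segments, machine_translate, post_edit):
    """Pre-translate each segment with MT, then hand it to the post-editor."""
    results = []
    for source in segments:
        raw = machine_translate(source)   # pre-translation from the engine
        final = post_edit(source, raw)    # the translator's edited version
        results.append(
            {"source": source, "raw_mt": raw, "final": final, "edited": final != raw}
        )
    return results


# Hypothetical stand-ins for a real engine and a human translator.
TOY_LEXICON = {"Guten Morgen.": "Good morning.", "Wie geht es?": "How goes it?"}


def toy_machine_translate(source):
    return TOY_LEXICON.get(source, source)


def toy_post_edit(source, raw):
    # The 'translator' fixes one awkward rendering and accepts everything else.
    return "How are you?" if raw == "How goes it?" else raw


for row in post_edit_document(
    ["Guten Morgen.", "Wie geht es?"], toy_machine_translate, toy_post_edit
):
    print(row)
```

Keeping the raw MT output next to the final version for each segment makes it easy to see which segments the translator accepted unchanged and which ones needed editing.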

All those abbreviations and their exciting meanings in the world of (machine) translation are a topic to be reckoned with. But still, it remains to be seen what they’ll have in store for us in the future.

Picture: Lysander Yuen on unsplash.com
