Today’s Translation Management Systems (TMS) are usually equipped with a variety of connectors for the integration of Machine Translation Systems (MT systems). More and more systems, like Google AutoML, are now enabling users to train engines with their own language data.
With an increasing number of user-specific engines in various languages and specialist areas (domains), the TMS, connector and machine translation system need to work together smoothly to enable flexible integration of the engines.
Today’s installment of our blog series showcases this integration process using the example of the cloud-based MT system KantanMT and the TMS Across. First, we will discuss the MT system KantanMT and then go on to explain how it is integrated into the translation process within Across.
How-To: Machine Translation
Put simply, machine translation (MT) is the automated transfer of a source text into a target language without human intervention. MT engines are trained with specialist or customer-specific translation units and terminological data. If the desired engine is later integrated into a translation project, you can choose between automatic pre-translation and interactive use of MT suggestions during translation.
Nevertheless, keep in mind that machine translation alone usually does not achieve the desired translation quality, even when the MT engine has been trained with high-quality data. If you want to obtain a target text that is adapted to individual or cultural circumstances, post-editing (i.e., editing by a human) is always a necessary step.
KantanMT: Swiss MT knife from Ireland
KantanMT, an MT service provider based in Dublin, Ireland, started offering customized MT training early on and today provides a diverse catalogue of functions. These functions enable both MT professionals and beginners to train engines for a wide range of applications. Here, we are going to introduce the main KantanMT functions.
Training Data: Kantan Fleet and Kantan Library
There’s too little language data available to train an engine from scratch. Sound familiar? The Kantan Fleet is a “fleet” of pre-trained engines for various domains, designed to solve this problem. A sorting function lists the engines in the fleet according to domains such as generic, automotive, finance, legal, medical, and technical. If an engine is available in the desired domain and language pair, it can be copied to the training area of the system with one click. From there, it can be further specialized with your own language data. If a domain is not available in the required language combination, you can easily submit a help request.
Full Control: KantanMT Dashboard and Alias System
On the Dashboard, the KantanMT threads come together: organization, training, and engine activation for use in TMS, CAT tools, and API interfaces.
One especially important feature for KantanMT integration in TMS and CAT tools is the alias function: each engine can be given a label (alias) that assigns it to a group or domain. This alias enables a connector in the TMS or CAT tool, such as Across, to automatically select the right engine for the domain in question. In Figure 1, the aliases GENERIC and LEGAL were assigned. These will be associated with the corresponding subject areas in the TMS.
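The idea behind aliases can be sketched in a few lines of Python. This is only an illustration of the concept, not KantanMT's or Across's actual API: the `Engine` class, the `group_by_alias` function, and the engine names are all hypothetical, and the case-insensitive handling of aliases is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """Hypothetical stand-in for a trained MT engine."""
    name: str          # made-up engine name
    alias: str         # label assigned to the engine, e.g. "LEGAL"
    source_lang: str
    target_lang: str


def group_by_alias(engines):
    """Group engines by alias; aliases are compared case-insensitively (assumption)."""
    groups = defaultdict(list)
    for engine in engines:
        groups[engine.alias.upper()].append(engine)
    return dict(groups)


engines = [
    Engine("generic-en-de", "GENERIC", "en", "de"),
    Engine("legal-en-de", "LEGAL", "en", "de"),
    Engine("legal-en-fr", "legal", "en", "fr"),
]

by_alias = group_by_alias(engines)
legal_names = [engine.name for engine in by_alias["LEGAL"]]
print(sorted(by_alias))
print(legal_names)
```

A connector that knows only the alias can then address every engine in that group without having to know the individual engine names.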
Configuring Machine Translation in Across
Across offers a large pool of options for the integration of machine translation. This means that many MT systems can be selected in the system settings and be configured according to special requirements.
Besides KantanMT, interfaces to DeepL, Moses, SYSTRAN and Google Translate are currently available. As a user of Across, you are not forced to decide on a specific MT system. You can work with the system that suits you best. The use of several systems is also possible, as is the connection of further MT engines by the manufacturer.
In our example, we have linked KantanMT to the Across Language Server via an API and assigned an optional alias. The alias acts as a specific label and enables both clear identification and automatic selection of the correct engine. Aliases are later linked to the corresponding subject areas.
But does the engine work perfectly? This can be tested in Across before you even have to create a translation project. You can use the “Check service” button to make an initial translation request that initializes the MT system. If the connection is running, a positive response is then issued.
For each engine configured this way, additional language pairs, system attributes (e.g. subject area “Legal”) or extended settings with regard to match values and paragraph lengths can then be defined. Users are free to determine the individual conditions for the use of their connected MT engines.
KantanMT in the Across translation project
Whether an MT engine is finally triggered or not is decided when a translation project is created.
In addition to standard settings, the Project Wizard of Across is used to define the project type, e.g. standard or MT, and project attributes. The latter are essential for the use of the MT engine within the project.
But why is that? The answer is simple: By adding the subject area, such as “Legal” as a project attribute, the link to the configured MT engine with the corresponding alias is established. Across recognizes the conformity of the subject area and knows which MT engines are suitable for the project and which ones can be excluded.
Next, the language pair must be defined. It plays a decisive role in the assignment of the MT engine, since it must match the configuration of the MT engine just like the project attributes.
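The matching rule described above (subject area plus language pair) can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the logic Across actually implements: `EngineConfig`, `Project`, and `matching_engines` are hypothetical names, and the real system may take further settings into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical record of one configured MT engine."""
    alias: str                 # e.g. "LEGAL"
    subject_area: str          # subject area linked to the alias
    language_pairs: frozenset  # set of (source, target) tuples


@dataclass(frozen=True)
class Project:
    """Hypothetical subset of the attributes set when a project is created."""
    subject_area: str
    source_lang: str
    target_lang: str


def matching_engines(project, configs):
    """Return the configs whose subject area and language pair both match."""
    pair = (project.source_lang, project.target_lang)
    return [
        config for config in configs
        if config.subject_area.lower() == project.subject_area.lower()
        and pair in config.language_pairs
    ]


configs = [
    EngineConfig("GENERIC", "General", frozenset({("en", "de"), ("en", "fr")})),
    EngineConfig("LEGAL", "Legal", frozenset({("en", "de")})),
]

legal_en_de = matching_engines(Project("Legal", "en", "de"), configs)
legal_en_fr = matching_engines(Project("Legal", "en", "fr"), configs)
print([config.alias for config in legal_en_de])
print([config.alias for config in legal_en_fr])
```

In this model, the second project finds no engine: the subject area matches, but no engine with that subject area is configured for the language pair, so both conditions are needed for an engine to be triggered.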
So far, so good. But how can I make sure that an MT engine is really triggered for my project? Very easy: If the project settings match the configuration of an MT engine, the MT symbol of the manufacturer appears on the right. This confirms that the corresponding MT engine will be used.
The exciting thing about this is that you can see exactly which engine is triggered for the project at hand and for what reason. To do this, simply move the mouse over the MT symbol of the manufacturer.
After the project creation is completed and the translation job is opened in crossDesk, the translation editor of Across, the individual segments are already visibly pre-translated by machine. The processing status of each segment is marked accordingly, so clients can also see exactly whether segments have been translated by machine, with TM matches or manually. This provides the transparency that many clients want.
crossView: Segment status
crossView, a special toolbar on the left of the translation editor, provides even more transparency and flexibility. Here, you can see the processing status of all segments in the document at a glance.
What is the ratio of segments translated with MT to those translated with TM matches? Was the MT suggestion adopted without revision, or were segments post-edited? Has a segment been translated or revised manually? And by whom? crossView answers these questions.
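To make the first question concrete, here is a minimal Python sketch of how such shares could be computed from a list of segment statuses. The `Origin` values and the `origin_shares` function are hypothetical and chosen for illustration only; they are not the status names used in Across.

```python
from collections import Counter
from enum import Enum


class Origin(Enum):
    """Hypothetical segment origins, loosely following the statuses named above."""
    MT = "mt"                          # MT suggestion adopted without revision
    MT_POST_EDITED = "mt_post_edited"  # MT suggestion revised by a human
    TM_MATCH = "tm_match"              # translated with a TM match
    MANUAL = "manual"                  # translated manually


def origin_shares(origins):
    """Return the share of each origin as a fraction of all segments."""
    counts = Counter(origins)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {origin: count / total for origin, count in counts.items()}


document = [Origin.MT, Origin.MT, Origin.MT_POST_EDITED, Origin.TM_MATCH]
shares = origin_shares(document)
for origin, share in shares.items():
    print(f"{origin.value}: {share:.0%}")
```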
You can also use the function bar to define precisely which segments (MT and/or post-edited) are to be transferred to the TM later. Users are free to decide as they see fit and can keep benefiting from machine translations stored in the TM while paying for them only once.
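Conceptually, this selection is a filter over segment statuses. The following Python sketch illustrates the idea under assumptions: the `Segment` class, the status strings, and `select_for_tm` are hypothetical and do not reflect how Across stores or transfers segments internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical translated segment with a processing status."""
    source: str
    target: str
    status: str  # assumed values: "mt", "post_edited", "tm", "manual"


def select_for_tm(segments, allowed_statuses):
    """Keep only the segments whose status is in the allowed set."""
    allowed = set(allowed_statuses)
    return [segment for segment in segments if segment.status in allowed]


segments = [
    Segment("Hello", "Hallo", "mt"),
    Segment("Terms of use", "Nutzungsbedingungen", "post_edited"),
    Segment("Contact", "Kontakt", "tm"),
]

# Transfer only post-edited segments to the TM and leave raw MT output out.
to_store = select_for_tm(segments, {"post_edited"})
print([segment.target for segment in to_store])
```

Passing `{"mt", "post_edited"}` instead would also keep unrevised MT output, which corresponds to the "MT and/or post-edited" choice described above.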
Would you like to integrate machine translation into your Across workflow? Or would you like to know how you can make the best use of MT? Contact us (firstname.lastname@example.org). Together, we will find the best solution for your individual requirements, processes, and systems.
Image: Photo by Josh Redd on Unsplash