The introduction of machine translation in a company is a serious step that raises many questions regarding its technical implementation, its incorporation into the existing translation workflow, and its impact on data security and cost efficiency. Because NMT (Neural Machine Translation) has rekindled interest in the general applicability of machine translation, we are dedicating a longer article to this subject, aiming to answer some fundamental questions and to prepare the ground for an initial assessment.
These questions and the associated uncertainties are just one reason why MT is only now – with some hesitation – making its way into German companies. Another reason is that customized MT services accessed via programming interfaces (APIs) have only spread gradually over the last few years; they simplify integration into existing infrastructures and make costlier on-premise solutions superfluous in many application scenarios.
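To make the integration idea concrete, here is a minimal sketch of calling a cloud MT service over a REST API. The endpoint URL, request fields, and response format are hypothetical placeholders, not any specific provider's interface – substitute the interface documented by your actual vendor.

```python
# Minimal sketch of calling a cloud MT service over a REST API.
# The endpoint URL, request fields, and response key below are
# hypothetical placeholders -- consult your provider's API reference
# for the real interface.
import json
import urllib.request

ENDPOINT = "https://mt.example.com/v1/translate"  # hypothetical endpoint

def build_payload(text: str, source_lang: str, target_lang: str) -> bytes:
    """Serialize a translation request as UTF-8 JSON."""
    return json.dumps({
        "text": text,
        "source": source_lang,
        "target": target_lang,
    }).encode("utf-8")

def machine_translate(text: str, source_lang: str, target_lang: str,
                      api_key: str) -> str:
    """Send one translation request and return the translated text."""
    request = urllib.request.Request(
        ENDPOINT,
        data=build_payload(text, source_lang, target_lang),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["translation"]
```

Because the service is just an HTTP endpoint, it can be called from build scripts, test harnesses, or content pipelines without any dedicated translation infrastructure on-premise.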
Despite better integration options, many decision-makers still face the questions: Why should we use MT? Will it pay off? Is it even worth trying?
Clear expectations – what should MT do?
It’s easier to answer these questions once it is clear what the MT output will be used for, because MT comes in different flavors. Although the ultimate goal is, naturally, a translation that cannot be distinguished from a human translation in terms of content and style, there are MT applications that require relatively little integration and preparation effort and offer added value outside the classic translation workflow. For example, MT can be used in application development where localized content does not have to be flawless, e.g. for testing purposes at an early stage. The savings potential here is considerable: the application talks directly to the MT server, eliminating human translation and the associated communication channels. If you want to use MT for internal communication, more extensive engine optimization is needed. Internal communication must be understandable, but it tolerates individual errors as long as the information content is preserved for all participants (‘gisting’). The optimization effort can include compiling and correcting the training data, enriching the data with further language material, and analyzing errors. After the training phase, you get productive engines for the required language pairs which do not have to be retrained unless major changes are made to the source material.
MT output that must meet very high quality requirements calls for further optimization and analysis effort, usually supplemented by a correction process such as pre- and post-editing, in order to produce publication-ready texts (e.g. brochures and websites). Terminology can play a decisive role in the entire MT process: the more terminology is available for the languages in the MT workflow, the more reliably central corporate terms receive the correct translation in the target text. Whether terminology can be used for optimization depends on both the method and the consistency of the terminology in the source texts. The use of authoring tools during content creation is therefore a great bonus for the MT process, because it makes source texts much more consistent.
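One simple way to exploit terminology in such a workflow is an automated check that flags MT output in which a required corporate term is missing. The sketch below is illustrative only: the term base entries are invented examples, and real workflows typically use structured terminology formats such as TBX rather than a hard-coded dictionary.

```python
# Illustrative sketch: checking MT output against a corporate term base.
# The entries below are invented examples; production setups would load
# terminology from a managed term base (e.g. in TBX format).
TERM_BASE = {
    # source term -> required target term (hypothetical example entries)
    "Drehmomentschlüssel": "torque wrench",
    "Sicherheitshinweis": "safety notice",
}

def terminology_violations(source: str, target: str) -> list:
    """Return source terms whose required translation is absent from the target."""
    violations = []
    for source_term, target_term in TERM_BASE.items():
        if (source_term.lower() in source.lower()
                and target_term.lower() not in target.lower()):
            violations.append(source_term)
    return violations
```

A check like this can run automatically after each MT request and route flagged segments to post-editing instead of letting terminology errors reach the published text.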
In short: High MT quality can be achieved by carefully preparing and compiling the training texts and using well-maintained terminology – especially when translating in specific subject areas (domains).
SMT vs. NMT – robustness or elegance?
Although neural MT will undoubtedly lead the MT bandwagon in the future, it is worth taking a closer look at the individual strengths of the different methods. An important criterion in favor of SMT (Statistical Machine Translation), despite convincing NMT results, is control over the output: the relationships between source and target phrases are easier to analyze with SMT, so individual translations can be controlled more precisely than with NMT. The use of terminology is also still an obstacle for NMT, although systems are making steady progress here. NMT’s clear advantage is that it processes the sentence as a whole, which yields a correct word order and better readability in the translation. However, because processing happens inside the hidden layers of the neural network, specific translations cannot be controlled and errors are more difficult to analyze. As far as training is concerned, SMT systems are more robust against erroneous training data, but they also require more data than NMT.
In short: The decision whether to use SMT or NMT should be made after checking how these methods suit the specific use case. Despite generally good readability, NMT cannot be fully recommended in some scenarios!
What about the costs?
As in the human translation workflow, the number of required language pairs and the planned translation volume determine how efficient the deployment will be. The free tier of up to 2 million characters per month offered by many providers is far from sufficient for larger volumes, but it still amounts to over 1,300 standard pages of 1,500 characters each (excluding spaces). In addition to the volume, the language combinations determine how many engines have to be purchased; if you want to translate in different domains, the number of engines is multiplied by the number of domains. Monthly costs vary with the pricing model, which either includes a fixed number of engines or is determined solely by translation volume. The choice between SMT and NMT is currently also reflected in pricing. The cost of re-training depends on how often and how extensively the content of the texts to be translated changes. A clearly defined catalogue of requirements, covering quality requirements and the available data, serves as the basis for a solid proof of concept; with clear evaluation conditions and a transparent statement of feasibility, it protects you from unpleasant surprises and false management expectations.
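The free-tier figure quoted above is easy to verify with a back-of-the-envelope calculation:

```python
# Back-of-the-envelope check: 2 million characters per month expressed
# in standard pages of 1,500 characters (excluding spaces).
FREE_TIER_CHARS = 2_000_000
CHARS_PER_STANDARD_PAGE = 1_500

pages = FREE_TIER_CHARS / CHARS_PER_STANDARD_PAGE
print(f"{pages:.0f} standard pages per month")  # → 1333 standard pages per month
```

The same arithmetic, scaled to your actual monthly volume and multiplied by the number of language pairs and domains, gives a first rough estimate of how many paid engine subscriptions a rollout would require.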
In short: With the increasing number of language pairs and the decreasing quality or availability of training data, the preparation effort and thus the costs for introducing an MT system increase.
Which MT system is really safe?
Once the decision to deploy MT has been made, data security concerns arise. Cloud service providers address these with military-grade encryption standards and selectable server locations. If cloud-based MT is out of the question for compliance reasons, some vendors also offer an on-premise solution. Note, however, that the hardware and integration effort for a high-performance local solution should not be underestimated, and regular updates are required to keep up with vendor standards. When assessing data security during transfer, it is advisable to consult existing corporate security standards, which usually identify humans, rather than the encryption technology used, as the greater risk of data leaks. If there are additional concerns about transferring sensitive data (personal data, proprietary terminology) in the MT workflow, anonymization can reduce the exposure to targeted attacks.
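The anonymization idea can be sketched as a simple placeholder substitution before the text leaves the company network, with the original values restored after translation. The patterns below are deliberately simplistic examples, not a production-grade solution:

```python
# Illustrative sketch of anonymizing sensitive tokens before sending text
# to an MT service, and restoring them afterwards. The patterns are
# simplistic examples; a production workflow would use a vetted
# pseudonymization component.
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\+?\d[\d /-]{7,}\d"),
}

def anonymize(text: str):
    """Replace sensitive matches with numbered placeholders; return text and mapping."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            placeholder = f"<{label}_{i}>"
            mapping[placeholder] = match
            text = text.replace(match, placeholder, 1)
    return text, mapping

def deanonymize(text: str, mapping: dict) -> str:
    """Restore the original values in the translated text."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

Since the placeholders never leave the local system with their original values, the MT provider only ever sees neutral tokens in place of the sensitive data.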
In short: The security standards of MT services in the cloud are very high and the requirements of the corporate landscape are taken into account in increasingly customized solutions.
I want to give MT a try – where do I start?
Given the growing number of providers, each boasting their own variants of MT methods, it is not easy to find the right candidate. Companies often lack the specialized personnel to make an informed, well-founded choice based on extensive testing of the services. In that case, it is advisable to have independent experts accompany an objective decision-making process. It is also vital to outline the use case very clearly so that the best-suited application can be selected from the great variety of available systems; knowledgeable experts can be of great help here, too. Last but not least: keep your expectations in check! MT is not a cure-all – but it can work wonders if used properly.