In the last article of the GDPR series, Kerstin Berns connected the most important principles of the GDPR with translation processes. Now we turn to the practical side: how can these principles be implemented in current translation systems, and which solutions do the manufacturers already offer?
Where is personal data (PBD) hidden?
PBD can be saved in a CAT tool in the following locations:
- Original text and the translated text (visible)
- Metadata (hidden)
- In workflow systems, in the master data
Metadata is information about processing that the system automatically saves. This includes, for example, the name or ID of the creator or processor (which can be project managers, translators, reviewers, etc.).
Metadata is stored in:
- Translation Units
- Translation Memory (TM)
- Terminology
- Log files, reports, and invoice data in workflow systems
- Comments, tracked changes, and dictionaries, depending on the tool
How can you find PBD reliably?
Let’s say that we have been working with our CAT tool for several years and have accumulated a considerable amount of data, all of which contains non-anonymized PBD. According to the GDPR, we need to be able to identify the PBD, export it, and erase it. But that’s not an easy task.
PBD in metadata
Deleting PBD in metadata may seem comparatively easy. Most tools let you filter translation memory (TM) records by a specific editor and apply this filter when exporting. Depending on how the TMs are managed, this can still turn out to be a considerable manual effort. Often, there isn’t even such a filter for terminology data. Furthermore, once the editor has been anonymized, it is no longer possible to trace who made a translation error in the event of a complaint. Anonymization would therefore have to use IDs that can be encrypted and decrypted again, or descriptive ("speaking") IDs. But is that feasible at all?
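To make the idea of traceable aliases more concrete, here is a minimal sketch in Python. It assumes the TM has been exported as TMX, where user IDs typically sit in the `creationid` and `changeid` attributes of `<tu>` and `<tuv>` elements. Instead of true encryption, it swaps in a keyed hash (HMAC) plus a lookup table that is stored separately, so a complaint could still be traced back by someone who holds the table. The key, the function names, and the sample data are all hypothetical and do not describe how any particular CAT tool works.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical secret; in practice it would be kept outside the TM system.
SECRET_KEY = b"replace-with-a-real-secret"

# TMX attributes that carry user IDs on <tu> and <tuv> elements.
USER_ATTRIBUTES = ("creationid", "changeid")


def pseudonym(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable alias from a user ID with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user-" + digest[:12]


def pseudonymize_tmx(tmx_text: str, lookup: Dict[str, str]) -> str:
    """Replace user IDs in TMX metadata and record alias -> real ID in lookup."""
    root = ET.fromstring(tmx_text)
    for element in root.iter():
        if element.tag not in ("tu", "tuv"):
            continue
        for attribute in USER_ATTRIBUTES:
            real_id = element.get(attribute)
            if real_id is not None:
                alias = pseudonym(real_id)
                lookup[alias] = real_id  # stored separately from the TM
                element.set(attribute, alias)
    return ET.tostring(root, encoding="unicode")


# Made-up sample export with two editors.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu creationid="jane.doe" changeid="john.smith">
      <tuv xml:lang="en" creationid="jane.doe"><seg>Hello</seg></tuv>
      <tuv xml:lang="de" changeid="john.smith"><seg>Hallo</seg></tuv>
    </tu>
  </body>
</tmx>"""

alias_to_user = {}
anonymized = pseudonymize_tmx(SAMPLE_TMX, alias_to_user)
```

After this runs, `anonymized` no longer contains the real user names, while `alias_to_user` allows an authorized person to resolve an alias. Deleting an entry from that table would make the corresponding alias effectively untraceable, which is one way to honor an erasure request without touching every translation unit again.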
PBD in the text
Things seem a lot harder with PBD in the texts themselves, because we want the PBD to be translated without being saved in the system. It must be anonymized inside the system but not outside it. So the PBD would be replaced by an alias, a placeholder, or similar when the text is imported into the tool, and would reappear in its original form on export. Why does this seem so difficult?
- PBD often carries meaning. If you remove it from the text, the translator may no longer understand the content.
- PBD has a grammatical form that cannot always be carried over into the translation without the word it refers to.
- Errors can occur when replacing the PBD, so that too little or too much is replaced. This, of course, lowers the quality of the translation.
- You cannot reliably find the PBD in the text. Regular expressions may work for e-mail addresses, but things get difficult with telephone numbers or location information. For proper names, you would additionally need a sophisticated process based on Named Entity Recognition (NER) or part-of-speech (POS) tagging, a specially developed script, or a list of all names that appear in the text. And who even has that?
All in all, the determination and anonymization of PBD in text seems risky and uncertain.
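The placeholder round trip described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it handles e-mail addresses with a deliberately simple regular expression, the placeholder format `{{PBD_n}}` and the function names are invented for this example, and the "translation" is a hand-written string rather than the output of a real tool.

```python
import re
from typing import Dict, Tuple

# Deliberately simple pattern: it catches common addresses, not every valid one.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap each e-mail address for a numbered placeholder before import."""
    mapping: Dict[str, str] = {}  # placeholder -> original address
    seen: Dict[str, str] = {}     # original address -> placeholder

    def replace(match):
        address = match.group(0)
        if address not in seen:
            placeholder = "{{PBD_%d}}" % (len(seen) + 1)
            seen[address] = placeholder
            mapping[placeholder] = address
        return seen[address]

    return EMAIL_PATTERN.sub(replace, text), mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back after export from the tool."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


source = "Please contact anna.schmidt@example.com or call Anna Schmidt."
masked, mapping = mask_emails(source)

# Hypothetical translation in which the translator kept the placeholder.
translated = "Bitte wenden Sie sich an {{PBD_1}} oder rufen Sie Anna Schmidt an."
restored = unmask(translated, mapping)
```

The example also shows the limits listed above: the e-mail address is masked, but the name "Anna Schmidt" passes straight through, because a regular expression has no way of knowing that it is a personal name. The round trip also fails as soon as a translator alters or drops a placeholder.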
So what are the CAT tool manufacturers doing?
PBD in texts is a particular source of frustration for CAT tool manufacturers. Some argue that the text that goes into the tool is the responsibility of whoever creates it. Put simply, this means that whoever sends PBD into the tool is to blame. Others suggest that texts should already be anonymized before they are sent into the tool. Yet another point of view is that such texts should be translated without a translation memory (even though this would raise translation costs, because less can be pre-translated). A few provide the option to replace PBD in the text, but with several disadvantages. How complicated!
However, there are some good approaches to the treatment of PBD when it comes to metadata. Here are a few examples:
- In Plunet, when a deletion request concerns a contact in ongoing projects, the visibility of that contact is automatically restricted for users with certain rights.
- XTM works only with aliases instead of real usernames, so that no real names are saved in the metadata. The alias is resolved via the user account.
- SDL MultiTrans offers a comprehensive role and rights concept that controls the visibility of user IDs. In addition, temporary, generic users can be set up, which are saved in the system only for the duration of a project and are automatically deleted once the project is complete.
- With SDL MultiTerm workflow, group names or roles can likewise be used from the start instead of usernames. Upon request, usernames are also deleted from or anonymized in all log files, voting histories, etc.
A Case Study
At this point, one could say that implementing the GDPR in translation systems is fundamentally difficult and anything but helpful to text quality. Nonetheless, there are some solutions, especially for handling personal data in system metadata. PBD in the text itself, on the other hand, poses larger problems.
In the next and last installment of this series, we will present a case study that is worked through with the help of SDL Trados Studio and its GDPR solutions. Stay tuned!