Summer Placement Diaries
Elena's Summer Placement at XTM
My first encounter with the technical side of machine translation (MT) and natural language processing (NLP) came during the EM TTI programme. Having worked in the language services sector for about ten years, I was familiar with the user side of various computer-assisted translation (CAT) tools and with the growing importance of MT output; however, I had only just started discovering the inner workings of the technologies that underpin NLP and neural MT. I must say I was immediately hooked and became eager to learn more about these technologies and how I could contribute to their development.
To pursue this interest, I chose XTM International for my first-year placement, a company that combines linguistic expertise with the latest developments in translation technology. It has been working in this domain for 18 years and has offices in the UK, the USA, Japan, Poland, Ireland and Argentina. Its mission, embodied in XTM Cloud, is to help clients manage large volumes of content: XTM Cloud is a cloud-based translation management solution that allows users to automate localization processes and perform in-context translation and review. Its latest addition is a bilingual terminology extraction feature, which automatically detects terms and their corresponding translations in an aligned parallel corpus. The process is supported by XTM’s Inter-language Vector Space, a newly developed term extraction approach based on neural networks and big data.
At the introductory meeting with Dr Rafał Jaworski, XTM’s linguistic artificial intelligence (AI) expert who heads the AI and NLP team, and Mr Andrzej Zydroń, CTO of XTM International, we agreed that I would join the AI and NLP department as an intern. We then drew up an internship roadmap, which was reviewed and expanded at weekly meetings with Dr Jaworski.
I began by getting acquainted with the functionalities of XTM Cloud in general and the automatic bilingual terminology extractor in particular. Throughout the internship, my main task was to test the terminology extractor’s performance on low-resource languages. The ultimate goal was to develop a step-by-step evaluation scheme for assessing the extractor’s ability to recognize true terms and their correct translations.
To accomplish this goal, I drew on both my experience in translation and my newly acquired programming skills. The termbase that the terminology extractor produces contains extensive metadata to help a linguistic evaluator make an informed decision on the relevance of each extracted term and the correctness of its translation. I wrote small Python scripts to gather data, extract pieces of information from unstructured text and present them in a user-friendly way. Throughout this process, I was supported by the XTM team, who were always willing to answer my questions and help me overcome obstacles.
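As a rough illustration of how an evaluator's judgements could be turned into scores, here is a minimal Python sketch. It is not XTM's evaluation scheme or API: the `Judgement` record, the `score` function and the two metrics are hypothetical, assuming only that for each extracted pair the evaluator records whether the source is a true term and whether its translation is correct (the two questions the evaluation scheme addresses). The English–Polish pairs are made-up sample judgements, not real extractor output.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """Hypothetical record of one evaluator decision on an extracted term pair."""
    source_term: str
    target_term: str
    is_true_term: bool         # is the source candidate a genuine term?
    translation_correct: bool  # is the proposed target a correct translation?


def score(judgements: list[Judgement]) -> dict[str, float]:
    """Compute two illustrative precision figures from evaluator judgements."""
    if not judgements:
        return {"term_precision": 0.0, "translation_precision": 0.0}
    true_terms = [j for j in judgements if j.is_true_term]
    correct = [j for j in true_terms if j.translation_correct]
    return {
        # share of extracted candidates that are genuine terms
        "term_precision": len(true_terms) / len(judgements),
        # share of genuine terms whose translation is also correct
        "translation_precision": len(correct) / len(true_terms) if true_terms else 0.0,
    }


# Made-up sample judgements, for illustration only.
sample = [
    Judgement("translation memory", "pamięć tłumaczeniowa", True, True),
    Judgement("source text", "tekst źródłowy", True, True),
    Judgement("of the", "tego", False, False),
    Judgement("target language", "język źródłowy", True, False),
]
result = score(sample)
print(result)
```

Here translation precision is measured only over genuine terms, so a non-term candidate does not also count as a translation error; an evaluation scheme could just as reasonably score every pair against both criteria.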
The internship was an excellent opportunity to put into practice, in a real-world business environment, the combination of programming and linguistic analysis skills I gained in the first year of my studies.