
Summer Placement Diaries


Daniya's Summer Placement at the Open University of Catalonia


This autumn, I undertook my EM TTI placement at the Open University of Catalonia (UOC) under the supervision of Prof. Antoni Oliver Gonzalez.

At the very beginning, Prof. Oliver Gonzalez drew up a detailed plan of weekly activities that matched my academic and professional interests and that, I hope, has benefitted the MTUOC machine translation project, which was at the core of my placement. This project, run by UOC, offers a turnkey solution for translators who want to integrate neural and statistical machine translation (MT) systems into their workflow. While a lot of software for training and using MT systems is now available to language professionals, it is not always easy to put into practice, as it often requires advanced IT skills. The MTUOC project seeks to offer a simplified solution that anyone can adopt, including scripts for corpus pre-processing and MT system training.

As my dissertation focuses on the use of translation technologies in multilingual media, and specifically in interlingual subtitling, my supervisor and I decided to try to train an MT system that could translate subtitles for news reports and short films from English into Russian. My placement plan was built around this goal.

During the first weeks, I explored a range of freely available corpora, using the OPUS corpus collection as my first port of call, and then looked into task-specific corpora such as the WMT 2019 news translation task dataset. At the same time, I read up on applications of MT to subtitling to find out which types of corpora tend to yield the best results when used to train MT systems for interlingual subtitling.

With this in mind, I selected candidate corpora to train our system. Meanwhile, I began getting acquainted with the scripts and programs in the MTUOC toolkit and tested some of them. The next step was to train the system, and as this process requires considerable computing power, my supervisor offered to carry out the training himself on suitable hardware. In the meantime, I focused on a slightly different assignment: an online course on MTUOC organised by UOC. My role here was twofold: I took part in the course as a student, and I also translated its study materials from Spanish, the course's original language, into English.

When the systems tailored for subtitle translation were ready, we ran a range of tests on subtitles for different types of content, including news reports, explainers and short fiction films. It was exciting to see a tangible solution take shape for a practical task I used to face daily as a media translator. I then analysed the output of our task-specific MTUOC engine and compared it with that of other systems. We found that in some instances (specifically, in the translation of subtitles for a short film) the MTUOC engine appeared to outperform some publicly available MT systems in terms of fluency and accuracy.

Undertaking this placement with UOC has been a very enriching experience, and I have no doubt that the skills I have acquired will help me in my second year of studies, in which greater emphasis will be placed on machine translation.

Post written by Daniya Khamidullina