Annual Event – Year 2: Student Vivas, Cohort 1

Karina Arzumanyan

Technologies in Interpreting: the Impact of Glossary Management Tools on the Preparation Stage of Interpreting

Abstract: This pilot study aims to evaluate the impact of glossary management tools on the preparation stage of interpreting. After discussing the theoretical background on interpreting technologies, we provide a comparative analysis of three terminology management tools and the results of an experiment conducted with nine students of the MA programme in Technology for Translation and Interpreting (EM TTI). The experiment was conducted to investigate whether the use of terminology management tools would have a positive impact on the preparation for an interpreting assignment. The participants were asked to prepare for interpreting at a conference devoted to Climate Change using four different methods: traditional preparation without any tools and preparation done with the help of InterpretBank, VIP and SketchEngine. For each method, the participants had to prepare three files (all extracted terms, relevant terms, and a bilingual glossary) within a limit of 15 minutes. Time spent on preparation and the number of extracted terms were counted. After the experiment, students completed a survey to evaluate their experience. Our hypothesis was that the use of glossary management tools would improve interpreters’ productivity during the preparation stage. The findings of the study suggest that using the tools for preparation saves time and allows a larger number of relevant terms to be extracted. In addition, respondents expressed a preference for the use of CAI tools. According to the results of the comparative analysis and the survey, InterpretBank was the tool that received the highest score.

Keywords: interpreting, interpreting technologies, terminology extraction, interpreter preparation, terminology management systems

Bio: My name is Karina Arzumanyan and I am from Russia, from the city of Stavropol. I graduated from North-Caucasus Federal University, where I studied Theory and Practice of Translation and Interpreting. During the EM TTI programme, I spent the first year at New Bulgarian University and am now finishing my second year at the University of Malaga. For my internships I chose Mitra Translations and KUDO. The topic of my dissertation is Technologies in Interpreting: the Impact of Glossary Management Tools on the Preparation Stage of Interpreting.

Anna Iankovskaia

The Sources of Text Complexity for Machine Translation

Abstract: Neural Machine Translation (NMT) has recently come to be regarded as the state-of-the-art solution in the industry. However, despite the progress achieved in automatic translation in the era of neural networks, NMT is still prone to errors. This study aims to investigate the sources of text complexity, i.e. textual components in the source that are likely to be mistranslated by NMT engines, for the domain of news commentary in the English-Russian language pair. This work includes two parts: (1) an empirical analysis of complexity sources, in which we analyse the output from two NMT systems, namely DeepL and ModernMT, in order to trace the output errors back to textual components in the source and understand the problem behind them (e.g. polysemy, anaphora); (2) the development of a tool of hybrid architecture that is capable of detecting some of these sources in a text and assessing its partial translation complexity before the text is translated. The main result of this study is a classification of complexity sources that combines 33 complexity types at four levels (lexical, syntactic, grammatical and discourse), with polysemy and noun phrases as the most frequent ones at the lexical and syntactic levels, respectively. In addition, considering the F1 score of 0.67 achieved on the task of phrasal verb detection, we conclude that architectures with BERT (Devlin et al., 2019) as a core component have the potential to detect complexity sources. At the same time, the F1 score of 0.18 for the tool suggests that more than two complexity sources should be included in the implementation.
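
For readers unfamiliar with the metric, the F1 scores quoted in the abstract above are the harmonic mean of precision and recall. The following is a minimal sketch of that calculation; the function name `f1_score` and the counts used are hypothetical illustrations and are not taken from the study.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall)."""
    predicted = true_positives + false_positives  # items the detector flagged
    actual = true_positives + false_negatives     # items that should have been flagged
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0  # convention: F1 is 0 when nothing is correctly detected
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only, NOT results from the dissertation.
example = f1_score(true_positives=2, false_positives=1, false_negatives=1)
```

With these illustrative counts, precision and recall are both 2/3, so the F1 score is also 2/3.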

Bio: Anna Iankovskaia is a linguist and translator with five years of professional experience. Her working languages are Russian, English and French. Anna obtained her first degree in Translation and Translation Studies from Smolensk State University (Russia) in 2014. As a first-cohort student of the EM TTI programme, Anna spent the first year of her master's studies at the University of Wolverhampton and the second at the University of Malaga, conducting her research under the supervision of Prof Dr Ruslan Mitkov and Dr Cristina Toledo Báez. Anna’s research interests are in the areas of Neural Machine Translation and Deep Learning.

Marine Ovesyan

Comparative Evaluation of Generic NMT Systems: Human Judgment and Automatic Metrics

Abstract: A fine-grained error analysis of machine translation output is an essential step in identifying the issues that remain unsolved. This kind of analysis informs the research directions to be taken to offer further solutions. In this work, we focus on error analysis of translations generated by four generic neural machine translation (NMT) systems for the English-Spanish and English-Russian language pairs. We present a linguistically motivated error taxonomy that extends the existing well-established taxonomies to accommodate the specificity of both target languages. Additionally, the four levels of granularity constituting the taxonomy provide the exact specification of the information necessary to identify machine translation errors. Given that NMT systems depend on training data, which can control their behaviour with regard to specific domains to varying extents, we have chosen creative and expository text types. This enables us to scrutinise the errors deriving from literary, marketing and informative-analytical texts. The performance of the systems on these text types is evaluated automatically and manually by horizontal comparison of the outputs of the four systems and vertical comparison of the outputs of the two target languages. The research comprises three case studies. The first case study explores the change of NMT capacity over a one-year period by comparing the outputs automatically produced in 2020 and 2021. The second case study focuses on the evaluation of NMT performance in the translation of creative texts. The third case study follows the research design adopted for the second case study to investigate NMT performance in the translation of expository texts.

Bio: Marine Ovesyan is an EM TTI cohort 1 student. In the 2019-2020 academic year, she studied at the University of Wolverhampton. In 2020-2021, she continued her studies at the University of Malaga. Her research focuses on machine translation quality evaluation. She is currently working on two projects at Unbabel and Machine Translate.

Rea Bartlett Tandon

The Future of NMT

Abstract: With the latest advances of deep learning (neural machine translation), there are claims that the quality of machine translation is getting closer to human quality. This dissertation aims to explore these trends and establish the facts by conducting evaluation experiments on three of the machine translation market leaders. The dissertation will also perform an analysis of the most frequently occurring error types in order to identify individual and universal weaknesses in current machine translation technology.

Bio: Rea Bartlett Tandon is a final year student in the EM TTI programme. Coming from a languages/translation educational background and with experience as a professional translator, her research remains focussed on how translators as well as non-professional users interact with translation technologies, and what the future might hold for the translation industry.

Ali Hatami

Domain-Specific Adaptation in Neural Machine Translation

Abstract: The development of deep learning techniques has enabled Neural Machine Translation (NMT) models to become extremely powerful where large parallel corpora are available. However, for many domains and language pairs, scarce or unavailable high-quality corpora lead to poor model performance. A domain may consist of text on a well-defined topic, or text of unknown origin with an identifiable vocabulary distribution, or a language with some other stylistic feature. Domain Adaptation approaches use out-of-domain data, in-domain monolingual data and bilingual data to transfer knowledge from one domain to another. In this work, we explore data-centric and model-centric techniques such as Incremental Training (Re-training), Ensemble Decoding (of two models), Combining Training Data and Data Weighting. While NMT models can achieve good translation performance on domain-specific data by using domain adaptation techniques on a representative training corpus, these techniques face some challenges. These include over-fitting, catastrophic forgetting, data selection and unknown words. In this thesis, we explore the challenges of domain adaptation techniques in NMT and investigate theoretical and practical solutions for each. We analyze the performance of these techniques by considering perplexity and accuracy metrics on the validation dataset and BLEU score on the test dataset. Based on our experiments, Data Weighting improved the performance of the baseline NMT model and outperformed the other approaches by a small margin (about 7 in BLEU score) for the English-Farsi language pair.

Short Bio: Ali Hatami is an Erasmus Mundus student in "Technology for Translation and Interpreting" at the University of Wolverhampton. He also holds a master's degree in Artificial Intelligence. His research interests lie in the areas of Machine Translation and Deep Learning. For his master's thesis, he is working on Domain Adaptation in Neural Machine Translation.

Daniya Khamidullina

Corpus Linguistics Applications to Interlingual Subtitling: A Study of Language Technology Uptake and Needs of Audiovisual Translators

Abstract: This study is an exploration of the habits, attitudes and needs of interlingual subtitlers with regard to existing and prospective language technology. To investigate these topics, we conducted a questionnaire-based survey, the focus of which was on three groups of solutions: corpus compilation and management methods and tools, terminology extraction and management methods and tools, and machine translation (MT). Survey participants were also asked to evaluate a range of additional CAT features in terms of their potential usefulness in a subtitling workbench. In total, 162 responses from practising subtitlers representing over 30 countries were gathered. The overall levels of translation technology uptake within our sample turned out to be low, with only a minority of respondents reporting using the aforementioned solutions in their day-to-day work. Just 19.8% of participants stated that they employ corpus compilation and management methods and tools, 21% identified themselves as users of terminology extraction and management solutions, and 34.6% indicated that they use MT. For non-users of these technologies, the most frequently mentioned barriers to solution adoption were lack of awareness of the methods in question (with 47.7% of those who do not employ corpus compilation and management tools mentioning this factor) or perceived lack of usefulness of these solutions in the context of subtitling (reported by 43.8% of non-users of terminology extraction and management solutions and 80.2% of non-users of MT). These findings partially reinforce the research claim that the levels of translation technology uptake among interlingual subtitlers are at present quite low due to the limited applicability of existing tools to subtitling. At the same time, the results of the evaluation of potential CAT feature usefulness in AVT practice suggest that certain technologies could facilitate the audiovisual translation process if they were integrated into subtitling software.

Short Bio: Daniya Khamidullina received her undergraduate degree in Linguistics, Translation and Interpreting (Russian, English, Spanish) from Lomonosov Moscow State University. After that, she worked as a translator, interpreter and localisation supervisor in mass media outlets for several years before joining the EM TTI programme. Daniya’s MA dissertation focuses on language technology for interlingual subtitling.

Halyna Maslak

Automatic generation of multiple-choice test items with deep learning

Abstract: Multiple-choice questions are widely used in knowledge assessment, not only in formal education settings but also when testing a candidate’s knowledge during job applications, as well as in various entertainment quizzes and games. Although research on the automatic or semi-automatic generation of multiple-choice test items has been conducted since the beginning of this millennium, most approaches focus on generating questions from a single sentence. In this research, a state-of-the-art method of creating questions based on multiple sentences is introduced. It was inspired by semantic similarity matches used in the translation memory component of translation management systems. The performance of two deep learning algorithms, doc2vec and SBERT, is compared for the paragraph similarity task. The experiments are performed on an ad hoc corpus within the EU domain. For the automatic evaluation, a smaller corpus of manually selected matching paragraphs has been compiled. The results demonstrate good performance of Sentence Embeddings for the given task. Although the doc2vec method failed to find the exact matches from the evaluation corpus, human evaluation of the retrieved paragraphs is needed to fully assess its performance. The two algorithms showed a significant difference in their efficiency, particularly in the time required to complete this task. The method presented in this research can also be used for improving translation memory matches in computer-assisted translation tools and the performance of plagiarism detection tasks.

Keywords: multiple-choice questions; deep learning; doc2vec; SBERT; NLP for educational purposes; paragraph similarity
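
The paragraph similarity task mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each paragraph has already been turned into an embedding vector (for example by doc2vec or SBERT) and that cosine similarity is the comparison measure; the abstract does not specify the measure, and the function names and toy vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return (index, score) of the candidate vector closest to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy 3-dimensional "embeddings" for illustration only; real paragraph
# embeddings would have hundreds of dimensions and come from a trained model.
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [1.0, 0.0, 0.9],    # nearly parallel to the query
    [-1.0, 0.0, -1.0],  # opposite direction
]
best_index, best_score = most_similar(query, candidates)
```

Here the second candidate is retrieved as the best match because its vector points in almost the same direction as the query.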

Bio: Halyna Maslak is an EM TTI cohort one student. She obtained her BA and MA in English Language and Literature at Chernivtsi National University in Ukraine. Within EM TTI, Halyna spent the first year at New Bulgarian University and the second year at the University of Wolverhampton. She is an experienced translator and teacher interested in computational linguistics and programming. Halyna is currently a first-cohort student in a Nanodegree summer programme in software development sponsored by world-known companies.

Elena Volkanovska

Recognition, Translation and Disambiguation of Multiword Expressions with Literal and Idiomatic Meaning in Parallel Corpora

Abstract: This dissertation explores semantically ambiguous nominal and verbal multiword expressions (MWEs). The aim of the study is to present an end-to-end pipeline of automatic disambiguation, in the scope of which the MWEs are examined from three aspects: their frequency and distribution across seven English-German parallel corpora of various domains, the translation strategies that have been adopted for a sample of expressions, and the possibility of resolving ambiguity with dynamic word embeddings and lexical cohesion scores. First, a set of MWEs was selected for a dictionary-based MWE identification task, which resulted in the creation of a new corpus with MWE instances only. Then, a sample of in-context MWE examples was selected, on which I performed a qualitative linguistic analysis of translation strategies and a disambiguation experiment. The results of the translation analysis and the experiment performed on the sample corpus were compared to the output for the same source text of Google Translate (GT), a neural machine translation engine. The analysis shows that ambiguous MWEs are infrequent in parallel corpora, with most of the unique MWE types occurring fewer than 50 times. The most frequent translation strategies in the sample corpus were paraphrase and idioms that were either dissimilar or partially similar to the source-language expression. The GT output contained mostly idioms similar in form to the source-language expression, or literal translations. Finally, it was found that the disambiguation approach based on dynamic word embeddings and lexical cohesion scores performed well for low-frequency idioms, while GT did well in instances in which a literal translation of the source-language idiom had established itself as an idiomatic expression in the target language.

Keywords: Ambiguous MWEs, lexical cohesion scores, dynamic word embeddings, translation strategies.

Bio: Elena Volkanovska is a Cohort 1 EM TTI student who spent the first year of studies at the University of Wolverhampton and the second year at the University of Málaga. She holds a BA in English and German Language and Literature and an MA in Conference Interpreting. Her research interests include corpus building and management, and identification and translation of figurative language. The EM TTI dissertation research was conducted under the guidance of Dr Emad Mohamed and Dr María Rosario Bautista Zambrana.

Olha Makukh

State of the Art: Quality of Video Remote Interpreting

Abstract: Remote interpreting (RI) is becoming increasingly popular due to the rapid development of communication technology. This popularity was further boosted by the outbreak of COVID-19 and the social restrictions associated with it. The question of quality in RI is therefore one of the most widely discussed topics today. This paper aims to analyse the quality of remote interpreting compared to on-site interpreting. It attempts to discover how the use of technology and the lack of social presence influence the quality of interpretation. To answer this question, a survey aimed at interpreters experienced in both modes is conducted. The purpose of the survey is to determine the quality that can be achieved remotely by assessing the perceptions of professional interpreters regarding remote interpreting compared to on-site interpreting. It also helps to determine the main challenges that interpreters face when interpreting remotely.

Bio: Olha Makukh is a second-year student in the EM TTI program at the University of Malaga. Olha obtained her degree in Translation Studies at the Ivan Franko National University of Lviv in Ukraine (2013). Having experience as an interpreter and being interested in the way AI systems can aid interpreters in the process of interpretation, her research focuses on remote interpreting as a mode of delivering interpreting services.

Nadia Basciu

HealthCOR: Implementing a Template for Discharge Documents (English-Italian-Spanish). A corpus-based study.

Abstract: Corpus Linguistics is now an established discipline which has found wide application as a methodology in other branches of Language Sciences, including Translation Studies. This Dissertation focuses in particular on the translation of medical texts, specifically discharge documents, in three different languages: English, Italian and Spanish, by exploiting the advantages of Corpus Linguistics and corpora. The aims are therefore to analyse discharge documents from a linguistic point of view by considering the text type, genre and register, in order to study their main characteristics in each language of this study; then, to compile an ad hoc comparable trilingual corpus (HealthCOR) which is representative in the abovementioned language combination; and finally, to exploit the HealthCOR corpus to implement three templates of discharge documents based on the study of the macro and microstructures of the three subcorpora by means of specific corpus management programmes. The methodology used to build the corpus involves a clear design criterion and a specific compilation protocol consisting of four steps: locating and accessing resources, downloading data, text formatting, and data storage, in order to verify a posteriori the representativeness of the HealthCOR corpus and to finally draft the three templates of discharge documents in English, Italian and Spanish. The data employed for compiling the HealthCOR corpus consists of discharge documents that were collected over a year through manual research by directly contacting healthcare institutions located in the United Kingdom, Spain and Italy, and that were also retrieved online by setting specific parameters in the search engine. In conclusion, the draft of the three templates can be found at the end of this Dissertation, as well as the demonstration that even corpora which are small in size can be representative and be used for achieving specific objectives.

Keywords: Corpus Linguistics, comparable corpora, ad hoc corpus, trilingual corpus, Medical Translation, discharge summary, discharge letter, templates.

Short Bio: My name is Nadia Basciu and I am from Italy. I hold a Bachelor’s Degree in Languages and Cultures for Linguistic Mediation from the University of Cagliari (Italy), where I studied English and Spanish as second languages. I developed an interest in Translation when I undertook a few placements during my previous studies, and that is also the reason why I applied for the EM TTI Master.