TC42, Student Presentations

24/11/2020

Translating and the Computer: TC42 Online

18, 19, 20 November 2020

This year the long-standing and world-renowned conference, Translating and the Computer, has moved to fully online delivery. As it has very generously been made free to access for all, our students were encouraged to sign up and attend. Expect a number of conference reports over the coming week.

Three of our students have had papers accepted to the conference. Congratulations to Anna, Daniya and Marina.

Their papers are detailed below.

Screenshot of Anna Iankovskaia's presentation

Screenshot of Daniya Khamidullina's presentation

Screenshot of Marina Tonkopeeva's presentation

Anna Iankovskaia

The sources of text complexity for NMT

Abstract

The present project attempts (1) to analyse the sources of text complexity for NMT on three levels (conceptual, lexical and syntactic), and (2) to build a transformer-based algorithm able to process a text and predict the places in it where NMT is most likely to fail.

WordNet is used as the main tool for analysing conceptual complexity, whereas the sources of lexical and syntactic complexity are identified through manual text analysis. For the neural implementation of the algorithm, BERT is used as the pre-trained model for fine-tuning. The expected outcome of the project is a program that can estimate the complexity of a source text for NMT and, before the text is translated, give a preliminary idea of the quality of the MT output and the level of post-editing required.

The project considers the translation of news commentary from English to Russian with two NMT engines freely available online. Preliminary results, namely the conclusions from the three-level analysis and the general methodology, are expected to be presented at the conference.

Daniya Khamidullina

How can terminology extraction and management technology help language professionals in broadcast media?

Abstract

Modern broadcast media is characterised by a high degree of internationalisation. As a result, interpreters have become an integral part of multilingual news production teams. Although media interpreting is a highly technology-reliant activity, digital solutions aimed at facilitating this process for interpreters still seem to be relatively scarce. This paper proposes a prototype of a digital tool to assist the interpreter at different stages of the process. The tool may also be applicable beyond the realm of interpreting by providing terminology-related solutions to other language specialists on the news production team (translators, news writers, output editors, etc.).

The prototype covers the interpreter’s needs during assignment preparation, in the booth and for post-assignment debriefings. It includes components for terminology extraction, terminology management and, provisionally, a speech recognition solution to be used in the booth. Special focus is placed on the terminology extraction component of the tool; results of several approaches tested using available solutions (by Sketch Engine and Terminotix) in the Russian-English language pair are presented. Pilot terminology extraction tests were run on a 268,000-word ad-hoc bilingual corpus, and the results indicate that pre-processed, thematically arranged subcorpora appear to yield the most usable output that requires the least amount of processing.

Marina Tonkopeeva

Investigating interpreting and translation strategies: A corpus-based approach

Abstract

The present paper examines strategies used by interpreters and translators to communicate verbal and deverbal nouns in the process of simultaneous interpreting and translation from English into Russian. To meet the objectives of the study, it is necessary to compile a parallel intermodal corpus comprising the English source speeches, transcriptions of the corresponding simultaneous interpreting, and the written translations.

UN Web TV is used as the source of the speeches and their interpretations. The oral data for the corpus were transcribed using speech recognition software (YouTube automatic captioning and the iOS speech recognition tool) and then manually post-edited. The corpus is offered as part of the Sketch Engine platform. The analysis of the corpus was based on tracking the differences in the linguistic strategies used in the simultaneous and written modes of translation: lexicosemantic, syntactic and lexicosyntactic transformations. This is a study in progress: the plan is to enlarge the corpus and the scope of the material analysed, and to employ Natural Language Processing techniques.

EM TTI
