Summer Placement Diaries

Darya's Summer Placement at the Open University of Catalonia

In June 2021, I completed my first placement as part of the EM TTI Programme at the Open University of Catalonia, supervised by Prof. Antoni Oliver González. When we were invited to choose a placement provider, I was hoping to find something related to my dissertation. That way I could gain real insight into how theory is put into practice and also practice solving various NLP tasks. My dissertation topic is domain-specific automatic term extraction, and I was glad to discover that Professor Antoni Oliver was, and still is, working on this topic. He had delivered several lectures during the academic year as part of our invited talk series. Even before the placement started, I realized that it could be a unique opportunity to learn about the field I was interested in.

Time planning and communication are key to an efficient and productive internship, which is why we agreed on a detailed schedule of the tasks and experiments I was to carry out during the four weeks. As data constraints pose a significant challenge in NLP, we began by looking for open-source comparable corpora for the English–Russian language pair, and Professor Antoni Oliver recommended many resources where I could look for the necessary data. He also advised me on suitable corpus pre-processing tools and libraries. The corpora we found were small and easy to process, so I quickly obtained lists of terms. I could then compare the strengths and weaknesses of different methods when applied to distant language pairs. The placement was also very interactive. Regular meetings with my supervisor and his immediate feedback helped me stay on track, because I could discuss the results and change strategy when necessary.

My placement was very hands-on. There are many approaches to domain-specific term extraction, and my main goal was to try as many of them as time allowed. We began with more conventional techniques, extracting terms on the basis of statistical and linguistic information with TBXTools, developed by Prof. Antoni Oliver González. We also had time to test more recent techniques, such as word embeddings, to obtain additional features and increase the number of correctly extracted terms. This small-scale experiment reproduced, in miniature, the research that will form the foundation of my dissertation. It helped me discover small steps and details of research planning that I had not initially taken into account.

I was very happy to test myself in a different setting, where I could both practice the programming skills I had gained in the first year of my studies and apply my background in translation and linguistics when evaluating the results.

I believe that such placements are incredibly valuable for networking and gaining experience, and that they ultimately help students with their research and future work.

Post written by Darya Filippova