Author Archives: Antonio Toral

Abu-MaTran at EAMT 2017

The Abu-MaTran project was present at the 20th Conference of the European Association for Machine Translation (EAMT 2017), held in Prague (Czech Republic).

Project researchers Miquel Esplà and Mikel Forcada from Universitat d’Alacant and Antonio Toral from Dublin City University presented a poster about the Abu-MaTran project focusing on a set of selected project outcomes of interest to the conference audience.

In addition, Antonio presented the research paper Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation, co-authored with project researchers Filip Klubička from the University of Zagreb and Víctor M. Sánchez-Cartagena from Prompsit Language Engineering.


Antonio (left) and Miquel (right) presenting the Abu-MaTran poster of selected final results.

Workshop on Hybridisation of Machine Translation for Irish

Abu-MaTran, in conjunction with the Tapadóir project, hosted a workshop on Hybridisation of Machine Translation (MT) to build translation solutions for English–Irish at Dublin City University on the 29th of April 2016.

The workshop consisted of three talks from researchers working on MT for Irish (one of them from the Tapadóir project), two talks on hybrid MT from Abu-MaTran researchers, an invited talk on hybrid MT in the HyghTra project and a final open discussion session. It was well attended, with 15 participants including some from outside Ireland (e.g. Wales), which shows that the topic is also of interest for MT involving other Celtic languages.

Below we include the programme with links to the slides of each talk (where available) and the motivation to carry out this workshop.

Programme
09:30 Welcome. John Judge (ADAPT Centre, Dublin City University)
09:40 Tapadóir: Statistical Machine Translation for Irish. Meghan Dowling (ADAPT Centre, Dublin City University)
10:00 IRIS: English-Irish Translation System. Mihael Arcan (Insight, NUI Galway)
10:20 Rule-based MT for Irish. Jim O’Regan (Trinity College Dublin)
10:40 Developing a Hybrid MT system from Rule-based architecture: Experience from HyghTra project. Invited talk by Bogdan Babych (University of Leeds, UK)
11:20 Coffee break
11:40 Hybrid MT in the Abu-MaTran project. Víctor M. Sánchez-Cartagena (Prompsit Language Engineering, Spain)
12:20 Hybridisation through system combination. Antonio Toral (ADAPT Centre, Dublin City University)
12:50 Open discussion
13:30 Final remarks

Motivation
While data-driven, statistical approaches to MT have proven successful in rapidly developing translation solutions, these approaches do not explicitly model the linguistic rules and structures of a language, which need to be considered when translating. High-performing open-source reference implementations of data-driven and rule-driven MT exist in the Moses and Apertium platforms, respectively, and much recent work has focused on combining the advances made in both paradigms into hybrid MT systems.

This workshop discussed recent advances in both rule-driven and statistical MT approaches for translating English text into Irish and presented a variety of hybridisation techniques that have been successfully developed and tested on other languages. The workshop included an open discussion session in which researchers considered how best to replicate these hybridisation successes for the English–Irish language pair and proposed next steps.

Organisers
John Judge (ADAPT Centre, Dublin City University)
Antonio Toral (ADAPT Centre, Dublin City University)

Abu-MaTran at WMT15 Machine Translation and Quality Estimation Shared Tasks

Project researchers Raphaël Rubino and Miquel Esplà-Gomis represented the Abu-MaTran consortium at the Tenth Workshop on Statistical Machine Translation (WMT 15), co-located with EMNLP. We participated in two shared tasks, Machine Translation and Quality Estimation, and in both our submissions ranked first: for English-to-Finnish in the Machine Translation task and in the word-level sub-task of the Quality Estimation task.

Raphaël presented the systems submitted by the Abu-MaTran consortium to the Machine Translation shared task. We participated in the Finnish–English language pair, where we tackled the lack of resources for Finnish and its complex morphology by (i) crawling parallel (FiEnWaC) and monolingual (FiWaC) data from the Web and (ii) applying rule-based and unsupervised methods for morphological segmentation. Our submissions were the top-performing English-to-Finnish unconstrained system (according to all automatic metrics) and constrained system (according to BLEU), and the top-performing Finnish-to-English constrained system (according to TER).

Miquel presented the systems submitted to the Quality Estimation shared task. We participated in the word-level sub-task with a method that uses external sources of bilingual information as a black box to spot sub-segment correspondences between a source segment and the translation hypothesis produced by a machine translation system. We used two sources of bilingual information in our submissions: machine translation (Apertium and Google Translate) and the bilingual concordancer Reverso Context. Our system ranked first in the sub-task.
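The post only outlines the idea behind the word-level submission, so the following Python sketch illustrates it under simplifying assumptions. A callable `translate` stands in for any black-box bilingual source (an MT system or a concordancer) and is replaced here by a hypothetical toy dictionary; every source sub-segment of up to `max_len` words is translated and searched for in the translation hypothesis, and hypothesis words covered by at least one match are tagged OK, the rest BAD. This naive coverage rule and all names (`covered_target_positions`, `tag_words`, `TOY_DICT`) are illustrative assumptions, not the submitted system's actual decision procedure.

```python
from typing import Callable, Dict, List, Set


def covered_target_positions(
    source: List[str],
    hypothesis: List[str],
    translate: Callable[[str], str],
    max_len: int = 3,
) -> Set[int]:
    """Indices of hypothesis words matched by the translation of some
    source sub-segment of 1..max_len words."""
    covered: Set[int] = set()
    for n in range(1, max_len + 1):
        for i in range(len(source) - n + 1):
            # Query the black-box bilingual source with one sub-segment.
            target = translate(" ".join(source[i:i + n])).split()
            m = len(target)
            if m == 0:
                continue  # no translation available for this sub-segment
            # Search for the translation as a contiguous span of the hypothesis.
            for j in range(len(hypothesis) - m + 1):
                if hypothesis[j:j + m] == target:
                    covered.update(range(j, j + m))
    return covered


def tag_words(source, hypothesis, translate, max_len=3):
    """Naive word-level labels: OK if covered by a correspondence, else BAD."""
    covered = covered_target_positions(source, hypothesis, translate, max_len)
    return ["OK" if k in covered else "BAD" for k in range(len(hypothesis))]


# Hypothetical stand-in for a real MT system or bilingual concordancer.
TOY_DICT: Dict[str, str] = {
    "the": "el",
    "black": "negro",
    "cat": "gato",
    "sleeps": "duerme",
    "black cat": "gato negro",
    "the black cat": "el gato negro",
}


def toy_translate(segment: str) -> str:
    return TOY_DICT.get(segment, "")


print(tag_words("the black cat sleeps".split(), "el gato negro come".split(), toy_translate))
# prints ['OK', 'OK', 'OK', 'BAD']
```

The last hypothesis word gets BAD because no source sub-segment translates to it. A real system would weigh such evidence (for example, from several bilingual sources and in both directions) rather than apply a hard coverage rule.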


Abu-MaTran at RANLP 2015

This week the Abu-MaTran project has also been present at the RANLP 2015 (Recent Advances in Natural Language Processing) conference in Hissar (Bulgaria).

Nikola Ljubešić and Filip Klubička from the University of Zagreb are attending the conference and presenting a paper, written jointly with another Abu-MaTran researcher, Miquel Esplà-Gomis from Universitat d’Alacant, that describes a tool for predicting the inflectional paradigms of unknown words to be added to a morphological lexicon (conference proceedings). Much of the work for this paper was carried out during the authors’ respective academia-to-industry secondments at Prompsit Language Engineering in Elx (Spain).


Abu-MaTran at the Machine Translation Marathon 2015

The Abu-MaTran project is present this week at the tenth Machine Translation Marathon in Prague (Czech Republic).

Project researcher Jorge Ferrández Tordera from Prompsit Language Engineering presents a poster about CloudLM, a novel tool that makes it possible to use cloud-based language models in statistical machine translation systems. The paper is co-authored by project researchers Sergio Ortiz-Rojas and Antonio Toral. Most of the work leading to the publication was carried out during Jorge’s industry-to-academia secondment at Dublin City University.


Guest lecture by Antonio Toral at Dublin City University

Project researcher Antonio Toral is giving a guest lecture at Dublin City University as part of the Digital World module.

Multimedia in Mobile-based Machine Translation (PDF slides)

Advances in recent years in the fields of machine translation and mobile computing have made the long-awaited dream of automatic translation on a handheld device feasible.
This lecture aims to give a general overview of the techniques behind machine translation and of the technologies for handling multimedia, so that one can build mobile applications that translate not only text but also other types of media such as speech and images.

Time: 10.00, 30th October 2014
Venue: Room XG20, Science Building, Dublin City University

Three highlights of the Abu-MaTran project at the mid-term review

We have prepared a short visual presentation of three highlights of the Abu-MaTran project as it reaches its mid-term review. These highlights cover the following topics:

Talk by Andy Way at Universitat d’Alacant

Project researcher Andy Way is giving an invited talk at Universitat d’Alacant.

Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation

Bilingual termbanks are important for many natural language processing (NLP) applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. Then, using a Phrase-Based Statistical Machine Translation model, we create a bilingual terminology from the extracted monolingual term lists. We manually evaluate our novel terminology extraction model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains. Furthermore, we report the performance of our monolingual terminology extraction model compared with a number of state-of-the-art terminology extraction models on the English-to-Hindi data sets.

Time: 10.30, 12th September 2014
Venue: Postgraduate seminar room (Seminari de III cicle), Departament de Llenguatges i Sistemes Informàtics, Edifici Politècnica IV (Building 39), Module 2, first floor
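The abstract does not give the exact log-likelihood formulation or say what the corpus is compared against. A common way to do such a comparison is the corpus-comparison log-likelihood (G²) statistic, which scores how strongly a word is over-represented in a domain corpus relative to a general reference corpus. The Python sketch below assumes that formulation, single-word candidates and whitespace tokenisation; the function names and the toy corpora are hypothetical, and the later bilingual step using the phrase-based SMT model is not shown.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 score of a word seen a times in a domain corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the domain corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:  # the term 0 * log(0) is taken as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def rank_terms(domain_tokens, reference_tokens, top_n=10):
    """Rank words over-represented in the domain corpus by G2 score."""
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(dom.values()), sum(ref.values())
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        # Keep only words relatively more frequent in the domain corpus.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    # Highest score first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored[:top_n]]


domain = "the kinase inhibitor binds the kinase domain".split()
reference = "the cat sat on the mat and the dog ran".split()
print(rank_terms(domain, reference, top_n=2))  # ['kinase', 'binds']
```

"the" is filtered out because it is relatively more frequent in the reference corpus, and "kinase" scores highest because it occurs twice in the domain corpus and never in the reference corpus.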


Talk by Mikel Artetxe at Dublin City University

Mikel Artetxe from the University of the Basque Country will be giving an invited talk as part of the NCLT Seminar Series.

Mitzuli: offline machine translation on a mobile phone

Mobile platforms are changing the way in which people interact with technology, and they offer a whole new world of possibilities to make something like machine translation more useful for the general public. This talk is about Mitzuli, a translator app for Android that includes support for OCR, TTS and a full offline mode. The challenges of creating something like this will be presented in the talk, analyzing the main features and restrictions of these mobile platforms when compared to the traditional desktop platforms. As an example of this, we will see how Apertium, the RBMT system that Mitzuli is based on, was ported to Android.

Time: 3pm, Tuesday June 10th
Venue: S206 (Engineering)

Talk by Tommi Pirinen at Dublin City University

Project researcher Tommi Pirinen is giving a talk at Dublin City University as part of the NCLT seminar series.

Weighted finite-state methods as a bridge between strictly rule-based and mostly statistical NLP systems

As some of you may know, the University of Helsinki is best known for its strictly rule-based approach to computational linguistics, with major contributions such as the TWOL system by Prof. Koskenniemi (1983) and the CG system by Prof. Karlsson (1995). In my doctoral dissertation I experimented with some basic approaches to combining statistical information with weighted finite-state models of language (cf. OpenFst <http://www.openfst.org/> and Mohri’s academic papers), especially for morphologically complex languages with limited resources (e.g. Greenlandic).

The presentation will consist of some slides from my FSMNLP 2012 tutorial <http://www.helsinki.fi/~tapirine/publications/fsmnlp-2012-spelling-tutorial.pdf> and parts of my lectio praecursoria for my PhD <http://www.helsinki.fi/~tapirine/publications/Pirinen-2014-dissertation.pdf>.

Time: 14.00, Thursday March 13th
Venue: L221 (School of Computing)
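The abstract only mentions combining statistical information with weighted finite-state models. One common approach is to turn corpus frequencies into tropical-semiring weights (negative log probabilities) and let them rank the alternative analyses that a rule-based analyser produces. The plain-Python sketch below mimics that ranking step without OpenFst, assuming add-one smoothing. The analysis tag strings, the counts and the function names are hypothetical, and this is not presented as the method used in the dissertation.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tropical_weight(count: int, total: int, vocab_size: int) -> float:
    """Negative log probability with add-one smoothing (lower is better)."""
    return -math.log((count + 1) / (total + vocab_size))


def disambiguate(analyses: List[str], counts: Counter) -> Tuple[str, Dict[str, float]]:
    """Weight each analysis produced by a rule-based analyser and keep the best.

    In the tropical semiring, alternative paths are combined with min,
    so the preferred analysis is the one with the lowest weight.
    """
    total = sum(counts.values())
    vocab_size = len(set(counts) | set(analyses))
    weights = {a: tropical_weight(counts[a], total, vocab_size) for a in analyses}
    best = min(weights, key=weights.__getitem__)
    return best, weights


# Hypothetical analyses of the ambiguous Finnish word "kuusi"
# ("six", "spruce" or "your moon") and hypothetical corpus counts.
analyses = ["kuusi+Num", "kuusi+N+Sg+Nom", "kuu+N+Sg+Nom+PxSg2"]
counts = Counter({"kuusi+Num": 50, "kuusi+N+Sg+Nom": 10, "kuu+N+Sg+Nom+PxSg2": 1})
best, weights = disambiguate(analyses, counts)
print(best)  # kuusi+Num
```

The rule-based analyser still decides which analyses are possible. The statistics only order them, which is one way such weights can bridge the two paradigms.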