Abu-MaTran at EAMT 2017

The Abu-MaTran project was present at the 20th Conference of the European Association for Machine Translation (EAMT 2017), held in Prague (Czech Republic).

Project researchers Miquel Esplà and Mikel Forcada from Universitat d’Alacant and Antonio Toral from Dublin City University presented a poster about the Abu-MaTran project, focusing on a set of selected project outcomes of interest to the conference audience.

In addition, Antonio presented a research paper entitled Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation, co-authored with project researchers Filip Klubička from the University of Zagreb and Víctor M. Sánchez-Cartagena from Prompsit Language Engineering.


Antonio (left) and Miquel (right) presenting Abu-MaTran’s selected final results poster.

Workshop on Statistical Machine Translation for Curious Translators

We are glad to share with you the materials of the Workshop on Statistical Machine Translation for Curious Translators, created within the Abu-MaTran project as one of its planned outreach activities.

This workshop took place in December 2016 at Universitat d’Alacant (Spain) and aimed at helping translators and students understand how statistical machine translation works with the help of two tools: Bicrawler, a service that allows you to easily obtain bilingual data from multilingual websites, and MTradumàtica, a web interface with which you can train and test phrase-based statistical machine translation systems in a couple of clicks. This workshop builds upon the Workshop on Tools for Teaching Machine Translation that took place at Dublin City University in November 2016.

We are making available:

– the workshop guide
– the workshop slides

All materials are distributed with free/open-source licenses. For a copy of the original files, just e-mail us as indicated in the PDF files.

If you use these materials, please let us know. We would be glad to receive your feedback.

Workshop on Tools for Teaching Machine Translation

We are glad to share with you the materials of the Workshop on Tools for Teaching Machine Translation, created within the Abu-MaTran project as one of its planned outreach activities.

This workshop took place in November 2016 at Dublin City University (Ireland) and aimed at presenting two web-based tools that make teaching machine translation easier: Bicrawler, a service that allows you to easily obtain bilingual data from multilingual websites, and MTradumàtica, a web interface with which you can train and test phrase-based statistical machine translation systems in a couple of clicks.

We are making available:

– the workshop guide
– the workshop slides

All materials are distributed with free/open-source licenses. For a copy of the original files, just e-mail us as indicated in the PDF files.

If you use these materials, please let us know. We would be glad to receive your feedback.

Abu-MaTran at the Machine Translation Marathon 2016

The Abu-MaTran project was present at the eleventh Machine Translation Marathon in Prague (Czech Republic).

Project researcher Víctor M. Sánchez-Cartagena from Prompsit Language Engineering presented a poster about ruLearn, a novel tool for the automatic inference of rules for shallow-transfer machine translation from scarce parallel corpora. Part of the work leading to the publication was carried out during Víctor’s industry-to-academia secondment at Universitat d’Alacant.


Abu-MaTran at WMT16 Machine Translation Shared Task

Project researcher Víctor M. Sánchez-Cartagena represented the Abu-MaTran consortium at the Machine Translation shared task of the First Conference on Machine Translation (WMT 16), co-located with ACL. Our submission ranked first for the language pair in which we participated: English-to-Finnish.

Víctor presented the systems submitted by the Abu-MaTran consortium to the Machine Translation shared task. We applied morphological segmentation and deep learning in order to address (i) the data scarcity problem caused by the lack of in-domain parallel data in the constrained task and (ii) the complex morphology of Finnish. Our neural machine translation and system combination submissions were the top performing English-to-Finnish constrained submissions according to all automatic metrics and human evaluation.

 


Open position at Prompsit for Abu-MaTran

Join us for 2.5 months to work on data acquisition and machine translation!

Prompsit offers a short-term vacancy for an experienced researcher in natural language processing, particularly data acquisition and machine translation.

Candidates are asked to submit their applications, containing a CV and a letter of motivation, to info at prompsit dot com by 23rd July 2016.

Download the following document to see more information about the position: Prompsit’s recruitment

Don’t think twice: join us this autumn!


Workshop on Hybridisation of Machine Translation for Irish

Abu-MaTran, in conjunction with the Tapadóir project, hosted a workshop on Hybridisation of Machine Translation (MT) to build translation solutions for English–Irish at Dublin City University on the 29th of April 2016.

The workshop consisted of three talks from researchers working on MT for Irish (one of them from the Tapadóir project), two talks on hybrid MT from researchers of the Abu-MaTran project, an invited talk on hybrid MT in the HyghTra project and a final session for open discussion. The workshop was well attended, with 15 participants, some of them from outside Ireland (e.g. Wales), which shows that the topic is also of interest for MT involving other Celtic languages.

Below we include the programme, with links to the slides of each talk (where available), and the motivation for holding this workshop.

Programme
09:30 Welcome. John Judge (ADAPT Centre, Dublin City University)
09:40 Tapadóir: Statistical Machine Translation for Irish. Meghan Dowling (ADAPT Centre, Dublin City University)
10:00 IRIS: English-Irish Translation System. Mihael Arcan (Insight, NUI Galway)
10:20 Rule-based MT for Irish. Jim O’Regan (Trinity College Dublin)
10:40 Developing a Hybrid MT system from Rule-based architecture: Experience from HyghTra project. Invited talk by Bogdan Babych (University of Leeds, UK)
11:20 Coffee break
11:40 Hybrid MT in the Abu-MaTran project. Víctor M. Sánchez-Cartagena (Prompsit Language Engineering, Spain)
12:20 Hybridisation through system combination. Antonio Toral (ADAPT Centre, Dublin City University)
12:50 Open discussion
13:30 Final remarks

Motivation
While data-driven, statistical approaches to MT have proven successful in rapidly developing translation solutions, these approaches do not recognise the inherent linguistic rules and structures of a language and the need for these to be considered when translating. High-performing open-source reference implementations of both data-driven and rule-driven MT exist in the Moses and Apertium platforms, and much recent work has focused on combining the advances made in both paradigms into hybrid MT systems.

This workshop discussed recent advances in the state of the art in both rule-driven and statistics-based MT approaches to translate English text to Irish and presented a variety of hybridisation techniques which have been successfully developed and tested on other languages. The workshop included an open discussion session for researchers to consider how best to replicate the hybridisation successes for the English–Irish language pair and to propose next steps.

Organisers
John Judge (ADAPT Centre, Dublin City University)
Antonio Toral (ADAPT Centre, Dublin City University)

Apertium lecture course in Tartu

Project researcher Tommi Pirinen, jointly with Francis Tyers, held a course on Rule-Based Machine Translation using Apertium as part of the project's dissemination and outreach activities. The course was held to enrich the teaching materials, to show students new research alongside traditional Apertium-based systems and to gather feedback.


The course was an intensive two-week course in which we taught the creation of lexical data using the Apertium framework and the development of baseline machine translation systems for a number of local languages, including Finnish, Estonian, Võru and North Saami, as well as one for Malayalam and English.

The course materials are available on the Apertium wiki.

Abu-MaTran at WMT15 Machine Translation and Quality Estimation Shared Tasks

Project researchers Raphaël Rubino and Miquel Esplà-Gomis represented the Abu-MaTran consortium at the Tenth Workshop on Statistical Machine Translation (WMT 15), co-located with EMNLP. We participated in two shared tasks (Machine Translation and Quality Estimation), and in both cases our submissions ranked first: in Machine Translation for English-to-Finnish and in Quality Estimation at the word level.

Raphaël presented the systems submitted by the Abu-MaTran consortium to the Machine Translation shared task. We participated in the Finnish–English language pair, in which we tackled the lack of resources and complex morphology of the Finnish language by (i) crawling parallel (FiEnWaC) and monolingual (FiWaC) data from the Web and (ii) applying rule-based and unsupervised methods for morphological segmentation. Our submissions were the top performing English-to-Finnish unconstrained (according to all automatic metrics) and constrained (according to BLEU), and Finnish-to-English constrained (according to TER) systems.

Miquel presented the systems submitted to the Quality Estimation shared task. We participated in the word-level sub-task with a method that uses external sources of bilingual information as a black box to spot sub-segment correspondences between a source segment and the translation hypothesis produced by a machine translation system. We used two sources of bilingual information in our submissions: machine translation (Apertium and Google Translate) and the bilingual concordancer Reverso Context. Our system ranked first in the sub-task.
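To give a feel for the idea of spotting sub-segment correspondences, here is a minimal sketch in Python. It is only an illustration under our own assumptions, not the submitted system: the function names (`subsegments`, `covered_positions`, `toy_translate`), the toy translation table and the simple "supported / unsupported" decision are all hypothetical. In the sketch, every sub-segment of the source is sent to a black-box translator, and a word in the translation hypothesis counts as supported when it lies inside a translated sub-segment that also appears in the hypothesis.

```python
from typing import Callable, List, Sequence


def subsegments(words: Sequence[str], max_len: int) -> List[List[str]]:
    """Return every contiguous sub-segment of up to max_len words."""
    return [
        list(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def covered_positions(
    source: Sequence[str],
    hypothesis: Sequence[str],
    translate: Callable[[List[str]], List[str]],
    max_len: int = 3,
) -> List[bool]:
    """Mark hypothesis words that lie inside a translated source sub-segment.

    `translate` is treated as a black box: it receives a source sub-segment
    and returns its translation as a list of words (empty if unknown).
    """
    covered = [False] * len(hypothesis)
    for segment in subsegments(source, max_len):
        target = translate(segment)
        if not target:
            continue
        n = len(target)
        # Look for the translated sub-segment anywhere in the hypothesis.
        for j in range(len(hypothesis) - n + 1):
            if list(hypothesis[j:j + n]) == target:
                for k in range(j, j + n):
                    covered[k] = True
    return covered


# Hypothetical stand-in for an external source of bilingual information.
TOY_TABLE = {
    ("the", "house"): ["la", "casa"],
    ("is",): ["es"],
    ("white",): ["blanca"],
}


def toy_translate(segment: List[str]) -> List[str]:
    """Look the sub-segment up in the toy table (assumption for the demo)."""
    return TOY_TABLE.get(tuple(segment), [])


if __name__ == "__main__":
    source = "the house is white".split()
    hypothesis = "la casa es verde".split()
    flags = covered_positions(source, hypothesis, toy_translate)
    for word, ok in zip(hypothesis, flags):
        print(word, "supported" if ok else "unsupported")
```

In this toy example the last word of the hypothesis is the only one without support, which makes it a candidate for a word-level "bad" label. A real system would replace `toy_translate` with calls to actual bilingual resources and would typically feed such evidence to a classifier rather than take a hard decision.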


Abu-MaTran at RANLP 2015

This week the Abu-MaTran project has also been present at the RANLP 2015 (Recent Advances in Natural Language Processing) conference in Hissar (Bulgaria).

Nikola Ljubešić and Filip Klubička from the University of Zagreb are attending the conference and presenting a paper jointly written with another Abu-MaTran researcher, Miquel Esplà-Gomis from Universitat d’Alacant. The paper describes a tool for predicting inflectional paradigms for unknown words to be added to a morphological lexicon (conference proceedings). Much of the work for this paper was carried out during the authors’ respective academia-to-industry secondments at Prompsit Language Engineering in Elx (Spain).
