Abu-MaTran at EAMT 2017

The Abu-MaTran project was present at the 20th Conference of the European Association for Machine Translation (EAMT 2017), held in Prague (Czech Republic).

Project researchers Miquel Esplà and Mikel Forcada from Universitat d’Alacant and Antonio Toral from Dublin City University presented a poster about the Abu-MaTran project, focusing on a set of selected project outcomes of interest to the conference audience.

In addition, Antonio presented a research paper entitled “Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation”, co-authored with project researchers Filip Klubička from the University of Zagreb and Víctor M. Sánchez-Cartagena from Prompsit Language Engineering.


Antonio (left) and Miquel (right) presenting Abu-MaTran’s selected final results poster.

Open position at Prompsit for Abu-MaTran

Join us for 2.5 months to work on data acquisition and machine translation!

Prompsit offers a short-term vacancy for an experienced researcher in natural language processing, particularly data acquisition and machine translation.

Candidates are asked to submit their applications, containing a CV and a letter of motivation, to info at prompsit dot com by 23rd July 2016.

Download the following document to see more information about the position: Prompsit’s recruitment

Don’t think twice and join us this autumn!

Workshop on Hybridisation of Machine Translation for Irish

Abu-MaTran, in conjunction with the Tapadóir project, hosted a workshop on Hybridisation of Machine Translation (MT) to build translation solutions for English–Irish at Dublin City University on the 29th of April 2016.

The workshop consisted of three talks from researchers working on MT for Irish (one of them from the Tapadóir project), two talks on hybrid MT from researchers of the Abu-MaTran project, an invited talk on hybrid MT in the HyghTra project and a final session for open discussion. The workshop was well attended, with 15 participants, some of them from outside Ireland (e.g. Wales), which shows that the topic is also of interest for MT involving other Celtic languages.

Below we include the programme with links to the slides of each talk (where available) and the motivation to carry out this workshop.

Programme
09:30 Welcome. John Judge (ADAPT Centre, Dublin City University)
09:40 Tapadóir: Statistical Machine Translation for Irish. Meghan Dowling (ADAPT Centre, Dublin City University)
10:00 IRIS: English-Irish Translation System. Mihael Arcan (Insight, NUI Galway)
10:20 Rule-based MT for Irish. Jim O’Regan (Trinity College Dublin)
10:40 Developing a Hybrid MT system from Rule-based architecture: Experience from HyghTra project. Invited talk by Bogdan Babych (University of Leeds, UK)
11:20 Coffee break
11:40 Hybrid MT in the Abu-MaTran project. Víctor M. Sánchez-Cartagena (Prompsit Language Engineering, Spain)
12:20 Hybridisation through system combination. Antonio Toral (ADAPT Centre, Dublin City University)
12:50 Open discussion
13:30 Final remarks

Motivation
While data-driven, statistical approaches to MT have proven successful in rapidly developing translation solutions, these approaches do not recognise the inherent linguistic rules and structures of a language or the need to consider them when translating. High-performing open-source reference implementations of both data-driven and rule-driven MT exist in the Moses and Apertium platforms, and much recent work has focused on combining the advances made in both paradigms into hybrid MT systems.

This workshop discussed recent advances in the state of the art in both rule-driven and statistics-based MT approaches to translate English text to Irish and presented a variety of hybridisation techniques which have been successfully developed and tested on other languages. The workshop included an open discussion session for researchers to consider how best to replicate the hybridisation successes for the English–Irish language pair and to propose next steps.

Organisers
John Judge (ADAPT Centre, Dublin City University)
Antonio Toral (ADAPT Centre, Dublin City University)

Apertium lecture course in Tartu

Project researcher Tommi Pirinen, jointly with Francis Tyers, held a course on Rule-Based Machine Translation using Apertium as part of the project’s dissemination and outreach activities. The course was held to enrich teaching materials, to show students new research alongside traditional Apertium-based systems and to gather feedback.

The course was an intensive two-week course in which we taught how to create lexical data with the Apertium framework and how to develop baseline machine translation systems for a number of local languages, including Finnish, Estonian, Võru and North Saami, as well as for Malayalam and English.

The course materials are available on the Apertium wiki.

Abu-MaTran at WMT15 Machine Translation and Quality Estimation Shared Tasks

Project researchers Raphaël Rubino and Miquel Esplà-Gomis represented the Abu-MaTran consortium at the Tenth Workshop on Statistical Machine Translation (WMT15), co-located with EMNLP. We participated in two shared tasks (Machine Translation and Quality Estimation), and in both cases our submissions ranked first: in Machine Translation for English-to-Finnish and in Quality Estimation at the word level, respectively.

Raphaël presented the systems submitted by the Abu-MaTran consortium to the Machine Translation shared task. We participated in the Finnish–English language pair, in which we tackled the lack of resources and the complex morphology of Finnish by (i) crawling parallel (FiEnWaC) and monolingual (FiWaC) data from the Web and (ii) applying rule-based and unsupervised methods for morphological segmentation. Our submissions were the top-performing English-to-Finnish unconstrained (according to all automatic metrics) and constrained (according to BLEU) systems, and the top-performing Finnish-to-English constrained system (according to TER).

Miquel presented the systems submitted to the Quality Estimation shared task. We participated in the word-level sub-task with a method that uses external sources of bilingual information as a black box to spot sub-segment correspondences between a source segment and the translation hypothesis produced by a machine translation system. We used two sources of bilingual information in our submissions: machine translation (Apertium and Google Translate) and the bilingual concordancer Reverso Context. Our system ranked first in the sub-task.
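
To give a flavour of the idea behind the word-level method described above, here is a minimal Python sketch. It is not the submitted system: the toy lookup table merely stands in for a black-box source of bilingual information such as Apertium, Google Translate or Reverso Context, and all names (TOY_TABLE, toy_translate, covered_positions, label_words) are hypothetical. How the evidence was actually turned into word labels is not described here, so the sketch assumes the simplest possible rule: a hypothesis word is marked OK if it is covered by the translation of some source sub-segment, and BAD otherwise.

```python
from typing import Callable, Iterator, List, Set

# Hypothetical toy stand-in for a black-box bilingual source.
TOY_TABLE = {
    ("la",): [["the"]],
    ("casa",): [["house"]],
    ("blanca",): [["white"]],
    ("casa", "blanca"): [["white", "house"]],
}


def toy_translate(segment: List[str]) -> List[List[str]]:
    """Return the candidate translations of a source sub-segment."""
    return TOY_TABLE.get(tuple(segment), [])


def source_subsegments(tokens: List[str], max_len: int) -> Iterator[List[str]]:
    """Yield every contiguous sub-segment of at most max_len tokens."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield tokens[start:end]


def find_sublist(haystack: List[str], needle: List[str]) -> List[int]:
    """Return the start indices at which needle occurs inside haystack."""
    n = len(needle)
    if n == 0:
        return []
    return [i for i in range(len(haystack) - n + 1) if haystack[i:i + n] == needle]


def covered_positions(
    source: List[str],
    hypothesis: List[str],
    translate: Callable[[List[str]], List[List[str]]],
    max_len: int = 3,
) -> Set[int]:
    """Collect hypothesis positions matched by a translated source sub-segment."""
    covered: Set[int] = set()
    for segment in source_subsegments(source, max_len):
        for translation in translate(segment):
            for start in find_sublist(hypothesis, translation):
                covered.update(range(start, start + len(translation)))
    return covered


def label_words(source, hypothesis, translate, max_len=3) -> List[str]:
    """Assumed decision rule: covered words are OK, all others are BAD."""
    covered = covered_positions(source, hypothesis, translate, max_len)
    return ["OK" if i in covered else "BAD" for i in range(len(hypothesis))]


if __name__ == "__main__":
    print(label_words(["la", "casa", "blanca"], ["the", "white", "horse"], toy_translate))
    # ['OK', 'OK', 'BAD']
```

Because the bilingual source is only called through the translate argument, any other source of sub-segment translations could be plugged in without changing the rest of the sketch.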

Abu-MaTran at RANLP 2015

This week the Abu-MaTran project has also been at the RANLP 2015 (Recent Advances in Natural Language Processing) conference in Hissar (Bulgaria).

Nikola Ljubešić and Filip Klubička from the University of Zagreb are attending the conference and presenting a paper written jointly with another Abu-MaTran researcher, Miquel Esplà-Gomis from Universitat d’Alacant. The paper describes a tool for predicting inflectional paradigms for unknown words to be added to a morphological lexicon (conference proceedings). Much of the work for this paper was done during the authors’ respective academia-to-industry secondments at Prompsit Language Engineering in Elx (Spain).

Abu-MaTran at the Machine Translation Marathon 2015

The Abu-MaTran project is present this week at the tenth Machine Translation Marathon in Prague (Czech Republic).

Project researcher Jorge Ferrández Tordera from Prompsit Language Engineering presents a poster about CloudLM, a novel tool that allows cloud-based language models to be used in statistical machine translation systems. The paper is co-authored by project researchers Sergio Ortiz-Rojas and Antonio Toral. Most of the work leading to the publication was carried out during Jorge’s industry-to-academia secondment at Dublin City University.

Second Workshop on Data Creation for Apertium

The second Workshop on data creation has been completed!

This workshop, entitled “Workshop on the Apertium free/open-source machine translation platform: transferring structures from one language to another”, took place on 22nd May 2015 at the University of Zagreb (Croatia) and aimed to encourage people to contribute to the Apertium platform by creating data (transfer rules) for South-Slavic languages.

Participants of the first workshop acquired advanced knowledge of one of the most interesting modules inside Apertium: the transfer module, where the translation of structures from one language to another takes place.

These are the materials that were produced and used for the workshop:

– the workshop guide: abumatran-apertium-workshop2-guide

– the workshop slides: abumatran-apertium-workshop2-slides

All materials are distributed with free/open-source licenses. For a copy of the original files, just contact us.

If you use these materials, please let us know. We will be glad to receive your feedback.

Workshop on data creation for Apertium RBMT language pairs

We are back to share the materials of the Workshop on data creation, organised within the Abu-MaTran project as one of its planned outreach activities.

This workshop, entitled “Workshop on the Apertium free/open-source machine translation platform: basics on how to control the engine through linguistics”, took place in November 2014 at the University of Zagreb (Croatia) and aimed to encourage people to contribute to the Apertium platform by creating data (dictionaries and manually disambiguated corpora) for South-Slavic languages.

Both newcomers and existing Apertium contributors were targeted as testers of a new approach that seeks to lower the bar for contributions through experimental user interfaces.

We are making available:

– the workshop guide: for days 1 and 2

– the workshop slides: for day 1 and day 2

All materials are distributed with free/open-source licenses. For a copy of the original files, just contact us.

If you use these materials, please let us know. We will be glad to receive your feedback.

Talk by Gema Ramírez at UZ NLP circle

This afternoon, Gema Ramírez, from project partner Prompsit, has been invited to give a talk at the NLP circle of the University of Zagreb, which meets on the last Monday of each month.

More than 50 people, including researchers, students and professionals, will attend this talk about the free/open-source machine translation platform Apertium and about Prompsit, the company Gema manages, which offers services related to this platform as well as other NLP services.

If you are not able to make it, the slides of the talk are available here.

Enjoy!

When: 27 October 2014, 17:00
Where: Meeting Room. 2nd floor. Faculty of Humanities and Social Sciences at the University of Zagreb
Title: The Apertium platform: opportunities for research and business