Project researcher Antonio Toral is giving a guest lecture at Dublin City University as part of the Digital World module.
Multimedia in Mobile-based Machine Translation (PDF slides)
Recent advances in machine translation and mobile computing have made the long-awaited dream of automatic translation on a handheld device feasible.
This lecture gives a general overview of the techniques behind machine translation and of the technologies for handling multimedia, so that one can build mobile applications that translate not only text but also other types of media, such as speech and images.
Time: 10.00, 30th October 2014
Venue: Room XG20, Science Building, Dublin City University
Project researcher Andy Way is giving an invited talk at Universitat d’Alacant.
Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation
Bilingual termbanks are important for many natural language processing (NLP) applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. Then, using a Phrase-Based Statistical Machine Translation model, we create a bilingual terminology from the extracted monolingual term lists. We manually evaluate our novel terminology extraction model on English-to-Spanish and English-to-Hindi datasets, and observe excellent performance for all domains. Furthermore, we report the performance of our monolingual terminology extraction model compared with a number of state-of-the-art terminology extraction models on the English-to-Hindi datasets.
Time: 10.30, 12th September 2014
Venue: Seminari de III cicle del Departament de Llenguatges i Sistemes Informàtics, Edifici Politècnica IV (Edifici 39), Mòdul 2, 1a planta
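The log-likelihood comparison mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the comparison is made between a domain corpus and a general reference corpus, scores single words only (real term extraction also handles multiword terms), and uses hypothetical function names. The phrase-based SMT step that pairs source and target terms is not shown.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) score for one candidate.

    a: count of the candidate in the domain corpus
    b: count of the candidate in the reference corpus
    c: total tokens in the domain corpus
    d: total tokens in the reference corpus
    """
    # Expected counts if the candidate were equally frequent in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2.0 * score


def rank_terms(domain_tokens, reference_tokens):
    """Rank words that are relatively more frequent in the domain corpus.

    Both token lists are assumed to be non-empty.
    """
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c = sum(dom.values())
    d = sum(ref.values())
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        # Keep only words over-represented in the domain corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    domain = "the translation memory stores the translation units".split()
    reference = "the cat sat on the mat and the dog barked".split()
    for word, score in rank_terms(domain, reference):
        print(f"{word}\t{score:.3f}")
```

Running the same ranking separately on the source and target sides of a parallel corpus would give the two monolingual term lists that the abstract describes as input to the bilingual step.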
Project researcher Tommi Pirinen is giving a talk at Dublin City University as part of the NCLT seminar series.
Weighted finite-state methods as a bridge between strictly rule-based and mostly statistical NLP systems
As some of you may know, the University of Helsinki is mostly known for its strictly rule-based approach to computational linguistics, with main contributions like the TWOL system by Prof. Koskenniemi in 1983 and the CG system by Prof. Karlsson in 1995. In my doctoral dissertation I experimented with some basic approaches to combining statistical information with weighted finite-state models of language (cf. OpenFst <http://www.openfst.org/> and Mohri’s academic papers), especially for morphologically complex languages with limited resources (e.g. Greenlandic).
The presentation will consist of some slides from my FSMNLP 2012 tutorial <http://www.helsinki.fi/~tapirine/publications/fsmnlp-2012-spelling-tutorial.pdf> and parts of my lectio praecursoria for my PhD <http://www.helsinki.fi/~tapirine/publications/Pirinen-2014-dissertation.pdf>.
Time: 14.00, Thursday March 13th
Venue: L221 (School of Computing)
Project researcher Felipe Sánchez Martínez from Universitat d’Alacant is giving an invited talk at Dublin City University as part of the NCLT Seminar Series:
Generalised alignment templates for the inference of shallow-transfer MT rules from small parallel corpora – Felipe Sánchez Martínez, Universitat d’Alacant
Rule-based machine translation (MT) is the paradigm of choice when the amount of bilingual resources available is not large enough to train a full-fledged statistical MT system. Building a rule-based MT system usually implies a considerable investment in the development of linguistic resources. However, even in those cases in which bilingual parallel corpora are scarce, inference methods can be used to automatically infer structural transfer rules.
In this talk I will present the current developments at Universitat d’Alacant aimed at learning shallow-transfer MT rules from small parallel corpora for use by the shallow-transfer MT platform Apertium. Inspired by the work of Sánchez-Martínez & Forcada (2009), we use alignment templates (ATs), like those used in statistical MT, and overcome the main limitations of their approach: the inability to find the appropriate level of generalisation for the ATs from which rules are generated; the inability to perform context-dependent lexicalisations so as to give a different treatment to those words that are incorrectly translated by more general ATs; and the deficient selection of the sequences of lexical categories for which transfer rules are generated. Preliminary experiments show that translation quality is improved as compared to the method by Sánchez-Martínez & Forcada (2009), and the number of inferred rules is considerably smaller.
Time: 3-4pm on Friday, November 29th
Venue: CG05 (Henry Grattan Building)
Project researcher Gema Ramírez Sánchez is giving an invited talk on July 17th at ILSP/Athena Research Centre.
The Apertium Machine Translation platform and Prompsit: opportunities for research and business.
Apertium (http://www.apertium.org) is a free/open-source rule-based machine translation platform that provides tools for managing the linguistic data necessary to build a machine translation system for a given language pair, and linguistic data for a growing number of language pairs. The participation of engineers and linguists from Prompsit Language Engineering SA (http://www.prompsit.com) in the development of Apertium since its beginning 8 years ago has made Prompsit an expert in this technology and its potential applications. We are currently developing, and helping others develop, more than 20 language pairs inside the Apertium platform. We are also developing hybrid systems combining rule-based approaches to MT with translation memories and statistical and example-based MT systems, and we have been involved in the development of the parallel corpora crawler Bitextor (http://bitextor.sourceforge.net/). While machine translation has been the core business of Prompsit, an increasing demand for related technologies for named-entity or opinion classification has also brought new development and business opportunities to the company, which sees itself as a vehicle for marketing applied-research results as services.
Gema Ramírez-Sánchez is a translator, a computational linguist and the CEO of Prompsit Language Engineering, a spin-off company which was created inside the Transducens Research Group at the University of Alicante (http://transducens.dlsi.ua.es/) in 2006. Prompsit specialises in language technologies, mainly machine translation, information extraction and sentiment analysis. Gema is visiting ILSP in the framework of Abu-MaTran (Automatic Building of Machine Translation, https://www.abumatran.eu/), an FP7 Marie Curie Industry-Academia Partnerships and Pathways project that aims to enhance industry-academia cooperation in developing and exploiting Machine Translation technologies and resources. Prompsit and ILSP plan to exchange expertise in technologies for automatic acquisition of parallel and monolingual resources from the web.