Publications

2017

  1. Antonio Toral and Víctor M. Sánchez-Cartagena.
    A Multifaceted Evaluation of Neural versus Statistical Machine Translation for 9 Language Directions.
    In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017).
  2. Filip Klubička, Antonio Toral and Víctor M. Sánchez-Cartagena.
    Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation.
    The Prague Bulletin of Mathematical Linguistics.
  3. Arefeh Kazemi, Antonio Toral, Andy Way, Amirhassan Monadjemi and Mohammadali Nematbakhsh.
    Syntax- and semantic-based reordering in hierarchical phrase-based statistical machine translation.
    Expert Systems with Applications.

2016

    1. Miquel Esplà-Gomis.
      Using sources of bilingual information for word-level quality estimation in translation technologies.
      PhD thesis, Universitat d’Alacant.
      [BibTeX | PDF]
    2. Miquel Esplà-Gomis, Rafael C. Carrasco, Víctor M. Sánchez-Cartagena, Mikel L. Forcada, Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz.
      Assisting non-expert speakers of under-resourced languages in assigning stems and inflectional paradigms to new word entries of morphological dictionaries.
      In Language Resources and Evaluation, online first, DOI: 10.1007/s10579-016-9360-9.
      [BibTeX | PDF]
    3. Miquel Esplà-Gomis, Mikel L. Forcada, Sergio Ortiz Rojas, Jorge Ferrández-Tordera.
      Bitextor’s participation in WMT’16: shared task on document alignment.
      In Proceedings of the First Conference on Machine Translation (WMT 2016).
      [BibTeX | PDF]
    4. Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel L. Forcada.
      UAlacant word-level and phrase-level machine translation quality estimation systems at WMT 2016.
      In Proceedings of the First Conference on Machine Translation (WMT 2016).
      [BibTeX | PDF]
    5. Jorge Ferrández-Tordera, Sergio Ortiz-Rojas, Antonio Toral.
      CloudLM: a Cloud-based Language Model for Machine Translation.
      The Prague Bulletin of Mathematical Linguistics.
      [BibTeX | PDF]
    6. Mikel L. Forcada, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz.
      Stand-off Annotation of Web Content as a Legally Safer Alternative to Crawling for Distribution.
      Baltic Journal of Modern Computing.
      [BibTeX | PDF]
    7. Arefeh Kazemi, Antonio Toral, Andy Way.
      Using Wordnet to Improve Reordering in Hierarchical Phrase-Based Statistical Machine Translation.
      In Proceedings of the 8th Global WordNet Conference.
      [BibTeX | PDF]
    8. Filip Klubička, Gema Ramírez-Sánchez, Nikola Ljubešić.
      Collaborative Development of a Rule-Based Machine Translator between Croatian and Serbian.
      Baltic Journal of Modern Computing.
      [BibTeX | PDF]
    9. Nikola Ljubešić, Tomaž Erjavec, Darja Fišer, Tanja Samardžić, Maja Miličević, Filip Klubička, Filip Petkovski.
      Easily accessible language technologies for Slovene, Croatian and Serbian.
      Language Technologies and Digital Humanities Conference.
      [BibTeX | PDF]
    10. Nikola Ljubešić, Miquel Esplà-Gomis, Antonio Toral, Sergio Ortiz Rojas, Filip Klubička.
      Producing Monolingual and Parallel Web Corpora at the Same Time – SpiderLing and Bitextor’s Love Affair.
      In Proceedings of the Tenth International Conference on Language Resources and Evaluation.
      [BibTeX | PDF]
    11. Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec.
      New inflectional lexicons and training corpora for improved morphosyntactic annotation of Croatian and Serbian.
      In Proceedings of the Tenth International Conference on Language Resources and Evaluation.
      [BibTeX | PDF]
    12. Vassilis Papavassiliou, Prokopis Prokopidis, Stelios Piperidis.
      The ILSP/ARC submission to the WMT 2016 Bilingual Document Alignment Shared Task.
      In Proceedings of the First Conference on Machine Translation (WMT 2016).
      [BibTeX | PDF]
    13. Tommi A. Pirinen, Antonio Toral, Raphael Rubino.
      Rule-Based and Statistical Morph Segments in English-to-Finnish SMT.
      In Proceedings of the 2nd International Workshop on Computational Linguistics for Uralic Languages.
      [BibTeX | PDF]
    14. Maja Popović, Mihael Arcan, Filip Klubička.
      Language Related Issues for Machine Translation between Closely Related South Slavic Languages.
      Third Workshop on NLP for Similar Languages, Varieties and Dialects, COLING.
      [BibTeX | PDF]
    15. Prokopis Prokopidis, Vassilis Papavassiliou, Stelios Piperidis.
      Parallel Global Voices: a Collection of Multilingual Corpora with Citizen Media Stories.
      In Proceedings of the Tenth International Conference on Language Resources and Evaluation.
      [BibTeX | PDF]
    16. Iñaki San Vicente, Iñaki Alegria, Cristina España-Bonet, Pablo Gamallo, Hugo Gonçalo Oliveira, Eva Martinez Garcia, Antonio Toral, Arkaitz Zubiaga.
      TweetMT: A parallel microblog corpus.
      In Proceedings of the Tenth International Conference on Language Resources and Evaluation.
      [BibTeX | PDF]
    17. Víctor M. Sánchez-Cartagena, Nikola Ljubešić, Filip Klubička.
      Dealing with data sparseness in SMT with factored models and morphological expansion: a Case Study on Croatian.
      Baltic Journal of Modern Computing.
      [BibTeX | PDF]
    18. Víctor M. Sánchez-Cartagena, Juan A. Pérez-Ortiz, Felipe Sánchez-Martínez.
      Integrating rules and dictionaries from shallow-transfer machine translation into phrase-based statistical machine translation.
      Journal of Artificial Intelligence Research.
      [BibTeX | PDF]
    19. Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez.
      RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation.
      The Prague Bulletin of Mathematical Linguistics.
      [BibTeX | PDF]
    20. Víctor M. Sánchez-Cartagena and Antonio Toral.
      Abu-MaTran at WMT 2016 Translation Task: Deep Learning, Morphological Segmentation and Tuning on Character Sequences.
      In Proceedings of the First Conference on Machine Translation (WMT 2016).
      [BibTeX | PDF]
    21. Antonio Toral, Miquel Esplà-Gomis, Filip Klubička, Nikola Ljubešić, Vassilis Papavassiliou, Prokopis Prokopidis, Raphaël Rubino, Andy Way.
      Crawl and crowd to bring machine translation to under-resourced languages.
      Language Resources and Evaluation, online first, DOI: 10.1007/s10579-016-9363-6.
      [BibTeX | PDF]
    22. Antonio Toral, Raphael Rubino, Gema Ramírez-Sánchez.
      Re-assessing the Impact of SMT Techniques with Human Evaluation: a Case Study on English–Croatian.
      Baltic Journal of Modern Computing.
      [BibTeX | PDF]

2015

  1. Željko Agić, Nikola Ljubešić
    Universal Dependencies for Croatian (that Work for Serbian, too)
    In The 5th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015).
  2. Tomaž Erjavec, Nikola Ljubešić, Nataša Logar.
    The slWaC Corpus of the Slovene Web
    In Informatica.
  3. Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel L. Forcada
    UAlacant word-level machine translation quality estimation system at WMT 2015
    In Proceedings of the Tenth Workshop on Statistical Machine Translation.
  4. Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel L. Forcada
    Using machine translation to provide target-language edit hints in computer aided translation based on translation memories
    In Journal of Artificial Intelligence Research.
  5. Miquel Esplà-Gomis, Felipe Sánchez-Martínez, and Mikel L. Forcada.
    Using on-line available sources of bilingual information for word-level machine translation quality estimation.
    In Proceedings of the Eighteenth Annual Conference of the European Association for Machine Translation (EAMT).
    [BibTeX | PDF]
  6. Nikola Ljubešić, Kaja Dobrovoljc, Darja Fišer
    MWELex–MWE Lexica of Croatian, Slovene and Serbian Extracted from Parsed Corpora
    In Informatica.
  7. Nikola Ljubešić, Miquel Esplà-Gomis, Filip Klubička, and Nives Mikelić Preradović.
    Predicting Inflectional Paradigms and Lemmata of Unknown Words for Semi-automatic Expansion of Morphological Lexicons.
    In Proceedings of Recent Advances in Natural Language Processing (RANLP).
  8. Nikola Ljubešić, Denis Kranjčić.
    Discriminating Between Closely Related Languages on Twitter
    In Informatica.
  9. Pavel Pecina, Antonio Toral, Vassilis Papavassiliou, Prokopis Prokopidis, Aleš Tamchyna, Andy Way, Josef van Genabith
    Domain adaptation of statistical machine translation with domain-focused web crawling
    In Language Resources and Evaluation Journal.
    [BibTeX | PDF]
  10. Raphael Rubino, Miquel Esplà-Gomis, Antonio Toral, Vassilis Papavassiliou and Prokopis Prokopidis.
    DIY Domain Specific Parallel Corpora for Translators.
    In Proceedings of the IV International Conference on Corpus Use and Learning to Translate (CULT).
  11. Raphael Rubino, Tommi Pirinen, Miquel Esplà-Gomis, Nikola Ljubešić, Sergio Ortiz Rojas, Vassilis Papavassiliou, Prokopis Prokopidis, Antonio Toral
    Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling
    In Proceedings of the Tenth Workshop on Statistical Machine Translation
  12. Víctor M. Sánchez-Cartagena, Juan A. Pérez-Ortiz, Felipe Sánchez-Martínez.
    A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora
    In Computer Speech & Language (Special Issue on Hybrid Machine Translation).
    [BibTeX | PDF]
  13. Antonio Toral, Pavel Pecina, Longyue Wang, Josef van Genabith
    Linguistically-augmented Perplexity-based Data Selection for Language Models
    In Computer Speech and Language (Special Issue on Hybrid Machine Translation).
    [BibTeX | PDF]
  14. Antonio Toral, Tommi Pirinen, Andy Way, Raphaël Rubino, Gema Ramírez-Sánchez, Sergio Ortiz-Rojas, Víctor Sánchez-Cartagena, Jorge Ferrández-Tordera
    Automatic Acquisition of Machine Translation Resources in the Abu-MaTran Project
    In Procesamiento del Lenguaje Natural Journal
  15. Antonio Toral, Andy Way.
    Translating Literary Text between Related Languages using SMT
    In Workshop on Computational Linguistics for Literature, NAACL.
  16. Antonio Toral, Xiaofeng Wu, Tommi Pirinen, Zhengwei Qiu, Ergun Bicici, Jinhua Du.
    Dublin City University at the TweetMT 2015 Shared Task
    In TweetMT@SEPLN.
  17. Mikel L. Forcada, Felipe Sánchez-Martínez.
    A general framework for minimizing translation effort: towards a principled combination of translation technologies in computer-aided translation.
    In Proceedings of EAMT 2015, the Eighteenth Annual Conference of the European Association for Machine Translation.

2014

  1. Željko Agić, Nikola Ljubešić
    The SETimes.HR Linguistically Annotated Corpus of Croatian
    In Proceedings of the 9th Language Resources and Evaluation Conference (LREC).
    [BibTeX | PDF]
  2. Miquel Esplà-Gomis, Filip Klubička, Nikola Ljubešić, Sergio Ortiz-Rojas, Vassilis Papavassiliou, Prokopis Prokopidis
    Comparing Two Acquisition Systems for Automatically Building an English-Croatian Parallel Corpus from Multilingual Websites
    In Proceedings of the 9th Language Resources and Evaluation Conference (LREC).
    [BibTeX | PDF]
  3. Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena, Juan A. Pérez-Ortiz, Felipe Sánchez-Martínez, Mikel L. Forcada, Rafael C. Carrasco
    An Efficient Method to Assist Non-expert Users in Extending Dictionaries by Assigning Stems and Inflectional Paradigms to Unknown Words
    In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT).
    [BibTeX | PDF]
  4. Filip Klubička, Nikola Ljubešić
    Using crowdsourcing in building a morphosyntactically annotated and lemmatized silver standard corpus of Croatian
    To appear in Language Technologies conference.
    [BibTeX | PDF]
  5. Nikola Ljubešić, Darja Fišer, Tomaž Erjavec
    TweetCaT: A Tool for Building Twitter Corpora of Smaller Languages
    In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC).
    [BibTeX | PDF]
  6. Nikola Ljubešić, Filip Klubička
    {bs,hr,sr}WaC – Web Corpora of Bosnian, Croatian and Serbian
    In Proceedings of the 9th Workshop Web as Corpus (WaC).
    [BibTeX | PDF]
  7. Nikola Ljubešić, Denis Kranjčić
    Discriminating between VERY similar languages among Twitter users
    To appear in Language Technologies conference.
    [BibTeX | PDF]
  8. Nikola Ljubešić and Antonio Toral
    caWaC – A Web Corpus of Catalan and its Application to Language Modeling and Machine Translation
    In Proceedings of the 9th Language Resources and Evaluation Conference (LREC).
    [BibTeX | PDF]
  9. Maria Mouroutsou, Vassilis Papavassiliou
    Developing Thesauri of Historical Period Names from Web Acquired Data
    In Proceedings of the 11th International Conference on Greek Linguistics.
    [BibTeX | PDF]
  10. Maja Popović, Nikola Ljubešić
    Exploring cross-language statistical machine translation for closely related South Slavic languages
    To appear in LT4CloseLang.
    [BibTeX | PDF]
  11. Prokopis Prokopidis, Harris Papageorgiou
    Experiments for Dependency Parsing of Greek
    In Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages (SPMRL-SANCL).
    [BibTeX | PDF]
  12. Raphael Rubino, Antonio Toral, Nikola Ljubešić and Gema Ramírez-Sánchez
    Quality Estimation for Synthetic Parallel Data Generation
    In Proceedings of the 9th Language Resources and Evaluation Conference (LREC).
    [BibTeX | PDF]
  13. Raphael Rubino, Antonio Toral, Víctor M. Sánchez-Cartagena, Jorge Ferrández-Tordera, Sergio Ortiz-Rojas, Gema Ramírez-Sánchez, Felipe Sánchez-Martínez, Andy Way
    Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-Style Synthetic Rules
    In Proceedings of the 9th Workshop on Statistical Machine Translation (WMT).
    [BibTeX | PDF]
  14. Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez
    The UA-Prompsit hybrid machine translation system for the 2014 Workshop on Statistical Machine Translation
    In Proceedings of the 9th Workshop on Statistical Machine Translation (WMT).
    [BibTeX | PDF]
  15. Antonio Toral
    TLAXCALA: A Multilingual Corpus of Independent News
    In Proceedings of the 9th Language Resources and Evaluation Conference (LREC).
    [BibTeX | PDF]
  16. Antonio Toral, Guillermo Latour, Stanislav Gurevich, Mikel Forcada, Gema Ramírez-Sánchez
    Establishing a Linguistic Olympiad in Spain, Year 1
    Procesamiento del Lenguaje Natural Journal No 53, 2014, pp. 171-174. ISSN 1135-5948.
    [BibTeX | PDF]
  17. Antonio Toral, Raphael Rubino, Miquel Esplà-Gomis, Tommi Pirinen, Andy Way, Gema Ramírez-Sánchez
    Extrinsic Evaluation of Web-Crawlers in Machine Translation: a Case Study on Croatian–English for the Tourism Domain
    In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT).
    [BibTeX | PDF]

2013

  1. Željko Agić, Nikola Ljubešić, Danijela Merkler
    Lemmatization and Morphosyntactic Tagging of Croatian and Serbian
    In Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing.
    [BibTeX | PDF]
  2. Sudip Kumar Naskar, Antonio Toral, Federico Gaspari, Declan Groves
    Meta-Evaluation of a Diagnostic Quality Metric for Machine Translation
    In Proceedings of the XIV Machine Translation Summit (MT Summit).
    [BibTeX | PDF]
  3. Nikola Ljubešić, Darja Fišer
    Identifying False Friends Between Closely Related Languages
    In Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing.
    [BibTeX | PDF]
  4. Vassilis Papavassiliou, Prokopis Prokopidis, Gregor Thurmair
    A Modular Open-source Focused Crawler for Mining Monolingual and Bilingual Corpora from the Web
    In Proceedings of the 6th Workshop on Building and Using Comparable Corpora (BUCC).
    [BibTeX | PDF]
  5. Raphael Rubino, Antonio Toral, Santiago Cortés Vaíllo, Jun Xie, Xiaofeng Wu, Stephen Doherty, Qun Liu
    The CNGL-DCU-Prompsit Translation Systems for WMT13
    In Proceedings of the Eighth Workshop on Statistical Machine Translation (WMT).
    [BibTeX | PDF]
  6. Antonio Toral
    Hybrid Selection of Language Model Training Data Using Linguistic Information and Perplexity
    In Proceedings of the Second Workshop on Hybrid Approaches to Translation (HyTra).
    [BibTeX | PDF]
  7. Antonio Toral, Sudip Kumar Naskar, Joris Vreeke, Federico Gaspari, Declan Groves
    A Web Application for the Diagnostic Evaluation of Machine Translation over Specific Linguistic Phenomena
    In Proceedings of the 2013 NAACL HLT Demonstration Session.
    [BibTeX | PDF]