Creating a Multilingual Open Corpus of Academic Knowledge from OJS Journals

Authors

  • Pilar Sánchez-Gijón, Universitat Autònoma de Barcelona
  • Ramon Piqué, Universitat Autònoma de Barcelona

Abstract

Languages, like knowledge, are part of the intangible cultural heritage of humanity. The aim of the Tradumàtica Research Group is to compile the multilingual texts (originals and translations) that journals publish in open access, in order to create a corpus for each language combination that can feed one or more machine translation (MT) engines. These engines can link minority and/or minorised languages with a language that has a greater number of speakers (e.g. English or Spanish), and can even link minority languages with one another by chaining translation engines that share the majority language. The aim of this presentation is to share this initiative and to invite publishers of multilingual journals to collaborate. It will also present the text collection workflow, the way the texts will be processed and, as a medium-term objective, the possibility of developing specific MT engines for each language combination and making them available to the community.

Published

2019-11-21

Issue

Section

Presentations