Creating a Multilingual Open Corpus of Academic Knowledge from OJS Journals
Abstract
Languages, like knowledge, are part of the intangible cultural heritage of humanity. The aim of the Tradumàtica Research Group is to compile the multilingual texts (originals and translations) that journals publish in open access, in order to create a corpus for each language combination that can feed one or more machine translation (MT) engines. These engines can link minority and/or minoritised languages with a language that has a greater number of speakers (e.g. English or Spanish), and can even link minority languages with one another by chaining translation engines that share the majority language. The aim of this presentation is to share this initiative and to invite publishers of multilingual journals to collaborate. It will also present the workflow for collecting the texts, the way they will be processed and, as a medium-term objective, the possibility of developing specific MT engines for each language combination and making them available to the community.
This work is licensed under a Creative Commons Attribution 3.0 International License.