Periodic Reporting for period 2 - TraMOOC (Translation for Massive Open Online Courses)
Reporting period: 2016-08-01 to 2018-01-31
To achieve its goal, the project pursues the following challenging scientific and technological objectives:
-High-quality machine translation of MOOC text. The high quality of the machine translation will be achieved through a hybrid translation schema that combines automatic processes (i.e. text processing tools and MT resources adapted to cope with multi-genre MOOC educational text) with limited, focused human intervention (i.e. crowdsourcing in a novel time- and cost-efficient setup for evaluating the generated translation).
-Novel translation evaluation schemata and metrics. The second goal is to establish a new, appropriate, standardized, multi-level evaluation schema for determining the value of the produced translation. Taking into account the idiosyncrasies of the particular text type and its multi-genre character, the evaluation schema will consist of a human and an automatic aspect, as well as a traditional explicit (direct) and an innovative implicit mode, that will be facilitated via a separate text mining application, namely topic identification. An analysis of the results from a first-phase evaluation process will be used to provide the translation engines with more accurate and wider-coverage data for re-training, and thereby to improve translation in a second phase.
-Infrastructure bootstrapping. In line with the need for high-quality translation, even for poorly equipped languages, and the automatic nature of the proposed approach to translation, an important objective of TraMOOC is the automatic bootstrapping of new resources for languages that are fragmentarily or weakly equipped with infrastructure.
-Language independence. The machine translation process will be language-independent. The translation approach will be statistical, relying in principle on no language-dependent resources or thesauri. Eleven languages are targeted to prove the language-independent aspect of TraMOOC: nine European and two BRIC languages, namely German (DE), Italian (IT), Portuguese (PT), Dutch (NL), Bulgarian (BG), Greek (EL), Polish (PL), Czech (CZ), Croatian (HR), Russian (RU) and Chinese (ZH). These languages were selected because they constitute strong use cases, i.e. they warrant a market need for MOOC translation, they are of significant importance to the political and commercial agendas of the European Commission, and they also constitute challenging translation pairs, i.e. languages that are weakly equipped with tools and resources and languages that have proven difficult to translate into.
-creation of the project governing bodies: Project Coordination Board, Project Technical Board, work package groups
-development and implementation of internal procedures for reporting, quality control, and risk management
-implementation of the project communication channels: Intranet, mailing lists, regular teleconferencing meetings and project meetings in person
-dedicated protected data repository for data used/created in the project
-establishment of the visual identity of the project: multilingual website, social media groups, leaflets, posters, presentations, and other dissemination activities
-business analysis of the potential market for the project results and a business plan for continuing the project legacy after the end of the project
-the final system architecture for the TraMOOC platform is already in place
-collection of significant amounts of MOOC-related parallel data
-2 machine translation prototypes developed
-initial integration of the MT systems in a real-life MOOC platform, which proved the viability of the work performed within the project
-thorough comparison of phrase-based, syntax-based, and neural machine translation algorithms on MOOC-related test data
The first category concerns the expected impacts listed in the Horizon 2020 ICT work program (Objective ICT 17 - 2014). Those include: i) cracking the language barrier within the EU internal free market by developing MT systems with superior translation quality, ii) a significant improvement in the MT infrastructure of weakly and fragmentarily supported EU languages, and iii) support of a platform where hundreds of contributors of language technology tools can share and make use of the tools and resources developed within the project. The second category concerns the scientific impact the project will have, while the social impact falls under the third category of expected impacts.
The project is on the right track to deliver on all types of impact. The first and the second MT prototypes delivered comparable or better results for MOOCs than the general state-of-the-art systems. We have created and continue to create in-domain parallel data for weakly supported languages like Croatian and Bulgarian. The TraMOOC platform will be open-source and freely available after the end of the project, so that people will be able not only to use it but also to contribute to it. Addressing the linguistics, natural language processing, text analytics, data mining and machine translation scientific communities, the project introduces several novel translation evaluation schemata that add significantly to the value of existing tools and resources in all these areas. The most important social impact of TraMOOC is in the area of education. By providing access to free multilingual online courses run by the most prestigious universities and colleges worldwide, the project outcome will benefit people who do not speak English adequately, people who cannot afford to pay for other educational means, people with disabilities, people living in isolated areas, and social groups with specific characteristics and needs (mothers, elderly people) who cannot move around easily. Focusing on Europe, multilingualism will be supported, wide access to multilingual educational material will be ensured, even for the less served language communities, and translation costs will be cut down. The initial integration of the MT service into the platform of Iversity has proven the feasibility and usefulness of our approach for the wider public.