MMT will deliver a language independent commercial online translation service based on a new open-source machine translation distributed architecture

Periodic Reporting for period 3 - MMT (MMT will deliver a language independent commercial online translation service based on a new open-source machine translation distributed architecture)

Reporting period: 2017-01-01 to 2017-12-31

The original goal of ModernMT (MMT) was to create machine translation (MT) software able to overcome four technology barriers hindering the wide adoption of available solutions by end-users and language service providers:

- MMT will be a ready-to-install application that will not require any initial training phase.
- MMT will manage context automatically so that it will not require building domain-specific systems.
- MMT will enable scalability of data and users so that no more expensive ad-hoc hardware installations are needed.
- MMT will create a data collection infrastructure that accelerates the process of filling the data gap between large web companies and the machine translation industry.

Remarkably, although the goals of MMT were fixed over three years ago, they have remained valid even after the recent paradigm shift in MT from phrase-based to neural technology.

The last twelve months of the MMT project were rather intense and turbulent. We started the year still believing that phrase-based MT (PBMT) could be competitive with neural MT (NMT) in the realm of technical translation, in particular when a single system has to handle translation requests from many different domains. Thus, during Q1 we invested significant effort in finalising our PBMT implementation and released the first MMT plug-in for a commercial CAT tool (Trados Studio). In parallel, we continued investigating an original adaptive NMT solution developed at FBK that seemed to fit the use cases of the project well and, more importantly, to overcome the limitations of NMT in the multi-domain setting. Finally, the imminent finalisation of the MMT PBMT software also gave impetus to a large-scale collection of training data for seven language pairs (14 translation directions) and to the development of a large-scale multi-domain evaluation benchmark.

During Q2 we presented the MMT PBMT solution to several large companies, which later started to run tests with the MMT open source code, either independently or with our support. During Q2, in accordance with the work plan, FBK spun off a newco, MMT Srl, which was soon joined by Translated in the role of investor.

By the end of Q2, however, empirical evidence convinced us that NMT would outpace PBMT much more quickly than we had thought. From Q3, we set about completely redesigning the MMT architecture, rewriting and replacing most of its core components. In just a few months of very intensive work, the MMT team was able to release the first real-time adaptive NMT open source software and to complete the implementation of a plugin for the MateCat tool. Subsequently, 14 translation engines for seven language pairs were trained and put into production with the plugin. The new software was then widely disseminated and showcased at several public venues and companies.

In Q4, evaluation activities of the new NMT architecture took off. The fast development of new and better baseline neural systems also called for a simple and effective human evaluation protocol to monitor progress rapidly. This period was very intense but rewarding, as we achieved significant enhancements of the NMT baselines. Until the very last days of 2017, our work focused on improving the 14 baseline NMT systems as well as the MMT open source code, which, thanks to feedback from several users, was steadily improved through bug fixes and enhancements to its functions and performance.

Nowadays, computer-assisted translation (CAT) tools represent the dominant technology in the translation market – and those including machine translation (MT) engines are on the increase. In this new scenario, where MT and post-editing are becoming the standard portfolio for professional translators, it is of the utmost importance that MT systems are specifically tailored to translators.

Little more than a year ago, neural MT seemed to be the ideal solution for machine translation. Yet introducing neural MT meant missing out on some of the improvements of phrase-based MT that most helped translation practitioners. With ModernMT, we created an open source system that overcomes two main issues with neural MT: its limited applicability to heterogeneous data (covering many different topics) and its limited ability to evolve over time. Hence, our novel approach merges the best of the two worlds by offering state-of-the-art neural MT with real-time adaptivity.

The distinguishing features of MMT are most evident to CAT tool users: (i) MMT does not require any initial training: as soon as translators upload their translation memories to the CAT tool, MMT can seamlessly and quickly learn from their data; (ii) MMT adapts to the content to be translated in real time: the system leverages the training data most similar to the document being translated; (iii) MMT learns from user corrections: during the translation workflow, MMT constantly learns from the post-edited sentences to improve its translation suggestions.
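The idea behind features (ii) and (iii) can be illustrated with a small sketch. This is not the ModernMT implementation: the class `ToyTranslationMemory`, its methods, the bag-of-words cosine similarity and the example sentences are all assumptions made up for illustration. It only shows how a system might pick the stored segments most similar to the text being translated, and how a post-edited sentence could become available immediately without retraining.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyTranslationMemory:
    """Hypothetical in-memory store of (source, target) segment pairs."""

    def __init__(self):
        self.segments = []

    def add(self, source, target):
        # Used both for uploaded translation memories and for post-edited
        # sentences: a new pair is visible to the very next lookup.
        self.segments.append((source, target))

    def most_similar(self, context, k=2):
        """Return the k stored pairs whose source side best matches the context."""
        query = bag_of_words(context)
        ranked = sorted(
            self.segments,
            key=lambda pair: cosine(query, bag_of_words(pair[0])),
            reverse=True,
        )
        return ranked[:k]


# Invented example data covering two domains (printer manual, medical leaflet).
memory = ToyTranslationMemory()
memory.add("Press the power button to restart the printer.",
           "Premere il pulsante di accensione per riavviare la stampante.")
memory.add("The patient should take the tablet with water.",
           "Il paziente deve assumere la compressa con acqua.")
memory.add("Replace the printer cartridge when the light blinks.",
           "Sostituire la cartuccia della stampante quando la spia lampeggia.")

# Segments from the printer domain rank above the medical one for this input.
top = memory.most_similar(
    "The printer does not restart after replacing the cartridge.", k=2)
for source, target in top:
    print(source, "->", target)
```

In an adaptive NMT system, the retrieved pairs would then be used to bias or fine-tune the model for the current request; that step is deliberately left out here, since the source does not describe it.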

In conclusion, we believe MMT offers a unique solution for the specific needs of enterprises and translators. MMT has consolidated current state-of-the-art MT technology into a single, easy-to-use product capable of learning from – and evolving through – interaction with users, with the final aim of increasing MT-output utility for the translator in a real professional environment.

ModernMT Poster

