Periodic Reporting for period 3 - MMT (MMT will deliver a language independent commercial online translation service based on a new open-source machine translation distributed architecture)
Reporting period: 2017-01-01 to 2017-12-31
- MMT will be a ready-to-install application that will not require any initial training phase.
- MMT will manage context automatically, so that it will not be necessary to build domain-specific systems.
- MMT will scale with data and users, so that expensive ad hoc hardware installations are no longer needed.
- MMT will create a data collection infrastructure that helps close the data gap between large web companies and the machine translation industry more quickly.
Remarkably, although the goals of MMT were set over three years ago, they have remained valid even after the recent paradigm shift in MT from phrase-based to neural technology.
During Q2, we presented the MMT phrase-based MT (PBMT) solution to several large companies, which then began testing the MMT open source code, either independently or with our support. In the same quarter, in accordance with the work plan, FBK spun off a new company, MMT Srl, which Translated soon joined as an investor.
By the end of Q2, however, empirical evidence convinced us that neural MT (NMT) would outpace PBMT much more quickly than we had expected. From Q3 onwards, we completely redesigned the MMT architecture, rewriting and replacing most of its core components. In just a few months of very intensive work, the MMT team released the first real-time adaptive NMT open source software and completed the implementation of a plugin for the MateCat tool. Fourteen translation engines covering seven language pairs were then trained and put into production with the plugin. The new software was widely disseminated and showcased at several public venues and companies.
In Q4, evaluation of the new NMT architecture began. The rapid development of new and better baseline neural systems also called for a simple and effective human evaluation protocol to monitor progress quickly. This period was very intense but rewarding, as we achieved significant improvements over the NMT baselines. Through the final days of 2017, our work focused on improving the 14 baseline NMT systems as well as the MMT open source code. Thanks to feedback from several users, the code was continuously refined through bug fixes and improvements to its functionality and performance.
A little more than a year ago, neural MT seemed to be the ideal solution for machine translation. Yet adopting neural MT meant giving up some of the improvements of phrase-based MT that had most helped translation practitioners. With ModernMT, we created an open source system that addresses two main limitations of neural MT: its applicability to heterogeneous data covering many different topics, and its ability to evolve over time. Our approach thus combines the best of both worlds, offering state-of-the-art neural MT with real-time adaptivity.
The distinguishing features of MMT are most evident to CAT tool users: (i) MMT does not require any initial training: as soon as translators upload their translation memories to the CAT tool, MMT can seamlessly and quickly learn from their data; (ii) MMT adapts to the content to be translated in real time: the system leverages the training data most similar to the document being translated; (iii) MMT learns from user corrections: during the translation workflow, MMT continuously learns from post-edited sentences to improve its translation suggestions.
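Point (ii) can be illustrated with a minimal sketch. It is not MMT's actual implementation. It assumes each translation memory is weighted by how similar its content is to the document being translated, so that the most relevant data has the greatest influence. The bag-of-words cosine similarity is used only as a simple stand-in for whatever context-matching method the real system uses. The function names `bag_of_words`, `cosine` and `context_weights` are hypothetical.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Lowercased token counts; a deliberately simple text representation (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_weights(document, memories):
    """Hypothetical helper: weight each translation memory by its similarity to the document.

    `memories` maps a memory name to a list of its segments. The returned
    weights sum to 1, and the most similar memories get the largest weights.
    """
    if not memories:
        return {}
    doc_vec = bag_of_words(document)
    scores = {
        name: cosine(doc_vec, bag_of_words(" ".join(segments)))
        for name, segments in memories.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No overlap with any memory: fall back to uniform weights.
        return {name: 1 / len(memories) for name in memories}
    return {name: score / total for name, score in scores.items()}
```

For example, suppose `context_weights` is called with the sentence "the patient was given a second dose" and two memories, one holding legal segments and one holding medical segments. The medical memory receives the larger weight because it shares more words with the input. A production system would use far richer representations, but the principle is the same: the document itself selects and weights the most relevant training data, without building a separate domain-specific engine.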
In conclusion, we believe MMT offers a unique solution for the specific needs of enterprises and translators. MMT consolidates current state-of-the-art MT technology into a single, easy-to-use product capable of learning from, and evolving through, interaction with users. Its ultimate aim is to make MT output more useful to translators in real professional environments.