Periodic Reporting for period 4 - MOUSSE (Multilingual, Open-text Unified Syntax-independent SEmantics)
Reporting period: 2021-12-01 to 2023-05-31
- Multi-inventory and Multilingual Word Sense Disambiguation (WSD)
Thanks to novel neural models, we put forward innovative techniques that scale robustly across languages and that integrate neuro-symbolic information both within the network, making the models more interpretable, and within the loss functions. We also investigated the generation of definitions that explain the meaning of words in context, proposed a new task for multilingual/cross-lingual Word-in-Context disambiguation, and cast WSD as a Question Answering task. Finally, we pioneered silver-data creation in WSD by proposing new frameworks for acquiring large amounts of sense-tagged sentences across languages.
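To make the gloss-based framing of WSD concrete, the following minimal Python sketch ranks candidate sense definitions by their similarity to the sentence in context using a generic multilingual encoder; the checkpoint name and the toy sense inventory are illustrative assumptions, not the project's released models.

```python
# Illustrative sketch only: gloss-ranking WSD with a generic multilingual encoder.
# The checkpoint and the toy sense inventory below are assumptions for the example,
# not the project's actual WSD systems.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "bert-base-multilingual-cased"  # any multilingual encoder works here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)

def embed(text: str) -> torch.Tensor:
    """Mean-pooled sentence embedding."""
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (1, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

def disambiguate(sentence: str, glosses: dict[str, str]) -> str:
    """Pick the sense whose gloss is most similar to the sentence in context."""
    ctx = embed(sentence)
    scores = {
        sense: torch.cosine_similarity(ctx, embed(gloss)).item()
        for sense, gloss in glosses.items()
    }
    return max(scores, key=scores.get)

# Toy inventory for the ambiguous word "bank"
senses = {
    "bank%financial": "a financial institution that accepts deposits",
    "bank%river": "sloping land beside a body of water",
}
print(disambiguate("She deposited the check at the bank.", senses))
```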
- Multi-inventory and Multilingual Semantic Role Labeling (SRL)
For the first time, we introduced a multilingual and multi-inventory approach to SRL, work that received an outstanding paper award at NAACL 2021. This changed the field's landscape by: 1) reducing the gap between high- and low-resource languages and 2) bringing together the different inventories of predicates and roles in the literature, across languages as well. We also unified two different task "styles", namely dependency-based and span-based SRL. We presented a novel multilingual, multi-inventory resource, UniteD-SRL, and provided APIs and software, including InVeRo-SRL. More recently, we explored definition modeling to empower Semantic Role Labeling and analyzed the behavior of multilingual SRL via probing.
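For illustration, the sketch below encodes one predicate-argument analysis in both task styles and in two inventories; the data structures and the VerbAtlas-style frame and role names are assumptions made for the example, not excerpts from UniteD-SRL.

```python
# Illustrative data structures only: one predicate-argument analysis rendered in
# both span-based and dependency-based style, and in two inventories
# (PropBank-style vs. VerbAtlas-style labels; the latter names are assumptions).
from dataclasses import dataclass

@dataclass
class SpanArgument:
    role: str
    start: int  # index of the first token of the argument span
    end: int    # index one past the last token of the span

@dataclass
class DepArgument:
    role: str
    head: int   # index of the argument's syntactic head token only

tokens = ["Mary", "bought", "a", "new", "car", "yesterday"]

# Span-based SRL with a PropBank-style inventory
span_propbank = {
    "predicate": (1, "buy.01"),
    "arguments": [SpanArgument("ARG0", 0, 1),
                  SpanArgument("ARG1", 2, 5),
                  SpanArgument("ARGM-TMP", 5, 6)],
}

# Dependency-based SRL with a VerbAtlas-style inventory (frame/role names illustrative)
dep_verbatlas = {
    "predicate": (1, "BUY"),
    "arguments": [DepArgument("Agent", 0),
                  DepArgument("Theme", 4),
                  DepArgument("Time", 5)],
}

print(span_propbank)
print(dep_verbatlas)
```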
- Language-independent, semantically-grounded Semantic Parsing
We created meaning representations at the sentence level that are independent of the language (current representations are instead bound to English or a few other languages). To achieve this goal, we proposed VerbAtlas, a novel unified resource which enables state-of-the-art multilingual Semantic Role Labeling; SyntagNet, the first large-scale language-independent resource of semantic collocations; and multilingual sense embeddings. We enabled cross-lingual and multilingual Semantic Parsing with a novel model for cross-lingual Abstract Meaning Representation (AMR) parsing and graph generation in a seq2seq fashion. To move away from English AMR, we proposed SPRING, a new seq2seq model which, after only two years, has become the reference approach to the task. Last but not least, we proposed a novel formalism, the BabelNet Meaning Representation (BMR), together with a truly semantic parsing approach aimed at overcoming the current issues of explicit sentence representations such as AMR and enabling, for the first time, language-independent, semantically grounded representations.
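The following sketch illustrates the seq2seq framing of AMR parsing, in which the graph is linearized and used as the decoder target. The backbone checkpoint is a generic placeholder that would need fine-tuning on sentence/graph pairs; this is not the SPRING implementation itself.

```python
# Sketch of the seq2seq framing for AMR parsing: the graph is linearized into a
# token sequence and an encoder-decoder model learns sentence -> graph.
# The base checkpoint below is a generic BART model (an assumption for illustration),
# not a released project checkpoint; it would need fine-tuning on many such pairs.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "facebook/bart-base"  # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

# One training pair: the sentence and its AMR graph in linearized PENMAN notation.
sentence = "The boy wants to go."
linearized_amr = "( w / want-01 :ARG0 ( b / boy ) :ARG1 ( g / go-02 :ARG0 b ) )"

# Standard seq2seq supervision: the linearized graph is the decoder target.
batch = tokenizer(sentence, text_target=linearized_amr, return_tensors="pt")
loss = model(**batch).loss  # fine-tuning would minimize this over many pairs
print(f"toy training loss: {loss.item():.3f}")
```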
- Related areas, including Machine Translation, LLMs, Multilingual Named Entity Recognition, Entity Disambiguation, Relation Extraction
We put forward multi-genre, fine-grained Named Entity Recognition. We proposed a novel approach to Entity Disambiguation that casts the task as extractive span selection, also empowered with textual definitions. We concluded the project by showing the relevance of lexical bias in Machine Translation with a novel benchmark for MT evaluation, work which received the best resource paper award at ACL 2022.
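As an illustration of the extractive framing of Entity Disambiguation, the sketch below concatenates candidate entities with short textual definitions into a single passage and lets an off-the-shelf extractive QA model select a span; the SQuAD-trained checkpoint is a stand-in assumption, not the project's trained disambiguator.

```python
# Illustration of the extractive framing for Entity Disambiguation: candidates
# (with short textual definitions) are concatenated into a single passage and the
# model extracts the span naming the intended entity. The SQuAD-trained checkpoint
# below is a stand-in assumption, not the project's trained model.
from transformers import pipeline

extractor = pipeline("question-answering", model="deepset/roberta-base-squad2")

mention_in_context = (
    "Which entity does 'Jaguar' refer to here: "
    "'The Jaguar accelerated down the motorway.'?"
)
candidates_passage = (
    "Jaguar Cars: a British manufacturer of luxury automobiles. "
    "Jaguar (animal): a large wild cat native to the Americas. "
    "Jacksonville Jaguars: an American football team."
)

result = extractor(question=mention_in_context, context=candidates_passage)
print(result["answer"])  # ideally a span from the 'Jaguar Cars' candidate
```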
Close to the end of the project, we organized in Rome a Workshop on Ten Years of BabelNet and Multilingual Neuro-Symbolic Natural Language Understanding, featuring talks from the MOUSSE team and from colleagues all around the world. The workshop was a big success, with guests including Simon Krek, Anna Rogers, Bonnie Webber, Mark Steedman, Luke Zettlemoyer, Ed Hovy, Hinrich Schütze, Iryna Gurevych, Nathan Schneider, Ekaterina Shutova, Rico Sennrich, Alexander Koller, Daniel Hershcovich, Johan Bos, Steven Schockaert, Thierry Declerck, Jan Hajic and Carla Marello. As a result of one of the brainstorming sessions held during the workshop, a position paper on the hype around superhuman performance of LLMs in NLU was presented at ACL 2023, where it received an outstanding paper award.
2) Knowledge-based approaches have been shown to rival neural supervised approaches thanks to the integration of lexical-semantic syntagmatic information.
3) It is now possible to perform Semantic Role Labeling in arbitrary languages, thanks to the availability of VerbAtlas, a novel verb resource which overcomes the issues (scalability, language specificity, human readability) of PropBank and related resources in the literature, and encodes the semantics of verb predicates and their arguments in a language-independent manner. It is also possible to perform Semantic Role Labeling with multiple inventories, an outcome enabled by the project.
4) Semantic parsing can now be carried out multilingually and in a truly semantic, language-independent fashion, thanks to the BabelNet Meaning Representation and neuro-symbolic semantic parsers.
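To give an intuition for point 4), the minimal sketch below contrasts lemma-anchored AMR-style concepts with interlingual, synset-anchored BMR-style concepts; the graph encoding, synset identifiers and frame names are simplified placeholders, not the official BMR format.

```python
# Minimal structural sketch (assumptions: synset ids and frame names are placeholders,
# and the graph encoding is simplified, not the official BMR format). It shows why
# interlingual concept identifiers make sentence representations comparable across
# languages, while lemma-based AMR concepts are not.

def graph(concept, *edges):
    """A tiny helper: a node is (concept, ((relation, subgraph), ...))."""
    return (concept, tuple(edges))

# AMR-style graphs anchor concepts to language-specific lemmas / PropBank senses,
# so the English and (hypothetical) Italian analyses of the same meaning differ.
amr_en = graph("want-01", (":ARG0", graph("boy")))
amr_it = graph("volere-01", (":ARG0", graph("ragazzo")))

# BMR-style graphs anchor concepts to interlingual identifiers (placeholder ids),
# so the two analyses collapse onto the same structure.
bmr_en = graph("WANT<bn:xxxxxxx-v>", (":Agent", graph("boy<bn:yyyyyyy-n>")))
bmr_it = graph("WANT<bn:xxxxxxx-v>", (":Agent", graph("boy<bn:yyyyyyy-n>")))

print(amr_en == amr_it)   # False: the representation is bound to the language
print(bmr_en == bmr_it)   # True: the representation is language-independent
```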