Periodic Reporting for period 1 - LT-BRIDGE (“Bridging the technology gap: Integrating Malta into European Research and Innovation efforts for AI-based language technologies”)
Reporting period: 2021-01-01 to 2022-03-31
At the same time, the University of Malta (UM), being in a widening country, might not always have access to the latest technology and infrastructure needed to compete in a highly competitive and fast-paced research environment. Nearly all of the tools and resources developed for the Maltese language have been produced by UM with minimal access to funding, resources, knowledge and infrastructure. This slows research output and prevents researchers from delivering timely results in the field of language technologies (LT). The administrative and research support infrastructure is also limited, and academics might not be able to avail themselves of the resources and support that would usually be available in larger countries and universities, which often leads to a reluctance on the academic’s part to engage with such research programmes.
In this context, LT-BRIDGE is a project aimed at integrating the University of Malta (UM), in particular the Department of AI and the Institute of Linguistics and Language Technologies, into the European research community in the area of AI-based language technologies. This is being done by significantly strengthening UM’s research and networking capacities and reputation, with the aim of creating a European-level Centre of Excellence in the field of AI-Language Technologies in Malta and thus closing the technological and research gap. The partnership created with the German Research Centre for Artificial Intelligence (DFKI) and Dublin City University’s ADAPT Centre (DCU) will enable UM to achieve its overall goals and objectives.
The project aims to significantly strengthen the research and innovation capacities of UM, as well as boost UM’s reputation in the scientific community with the aim of creating a European-level Centre of Excellence in this field in Malta. It also aims to maximise the research management and administrative capacities of both UM’s research staff and its Research Support office, thus ensuring the sustainable growth of UM as a whole.
Its operational objectives are:
O1: To design and launch a Scientific Strategy for the UM NLP Group, identifying key areas for joint research.
O2: To implement a targeted Research Capacity Building Programme to support and build UM’s scientific excellence, positioning UM as a key institution in the European Language Technology research community.
O3: To improve the long-term prospects of research excellence by investing in young talent and by providing training opportunities to ESRs.
O4: To build the capacity of the innovation and research management skills for the UM research staff through the implementation of a targeted Research Development and Innovation Capacity Building Programme.
O5: To position UM and improve its visibility in the local, regional and European communities as a viable and cross-disciplinary research partner with a strong portfolio of competencies related to AI Language Technology.
O6: To evaluate the impact of the project activities on the research and innovation capacities in accordance with the development of the research directions at UM.
It has produced an initial scientific strategy for the NLP group at UM and is testing a possible setup for the Virtual Laboratory, which needs to be in place by the end of the project.
It has established a number of connections between UM and LT bodies, including DARIAH, ELRC, ELRA and ELG.
It has also organised two advanced workshops, one on anonymisation and one on chatbots and their design.
It has organised the first shared task, on Machine Translation for low-resource languages, together with a scientific workshop and a preceding series of webinars intended to encourage young researchers to participate.
The same approach will be followed for the shared task currently being organised, on Natural Language Generation (WebNLG). For this shared task, we have also produced new datasets in Maltese and Irish. This is the first time that such low-resource languages are participating in generation tasks, which is in itself a contribution to the scientific community and to those working with low-resource languages.
Moreover, a submission for the third shared task has been sent and we are waiting for the outcome of the evaluation process.
The first summer school has been organised and initial plans are underway for the second summer school.
The first round of in-person training activities, aimed at the research administration and support staff, is also being held in Malta at UM.
Overall, all the project partners are very active and are collaborating towards the successful execution of the LT-BRIDGE project.
K. Micallef, A. Gatt, M. Tanti, L. van der Plas and C. Borg (2022). Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese. To appear in the Proceedings of the 3rd Deep Learning for Low-resource NLP (DeepLo) workshop (co-located with NAACL 2022).
Models & Datasets: https://huggingface.co/MLRS
Code: https://github.com/MLRS/BERTu
This is a very positive step in the development of state-of-the-art (SOTA) resources for the Maltese language.