Periodic Reporting for period 1 - vera.ai (vera.ai: VERification Assisted by Artificial Intelligence)
Reporting period: 2022-09-15 to 2023-09-14
vera.ai's work falls under the following specific objectives (SOs):
SO1: AI methods for content analysis, enhancement, and evidence retrieval
SO2: AI tools for the detection of synthetic media (including deepfakes) and manipulated content
SO3: Discovery, tracking, and impact measurement of disinformation narratives and campaigns across social platforms, modalities, and languages
SO4: Intelligent verification and debunk authoring assistant, based on chatbot NLP technology
SO5: Fact-checker-in-the-loop approach to gather new feedback as a side effect of verification
SO6: Adoption and sustainability of the new AI tools in real-world applications through integration in leading verification tools and platforms with established communities.
D2.1 reports on the methodology, the co-creation activities, and the resulting use cases and requirements; it serves as a reference for the research and development of the WP3, WP4 and WP5 methods that address those user requirements.
Details on the conducted research are presented in the Scientific Advances in AI Methods for Detecting and Mitigating Disinformation report.
Towards SO1, partners made progress in developing methods for extracting credibility indicators and trustworthy evidence, audio-visual content analysis and enhancement, extraction of verification clues from visual content, and cross-modal detection of decontextualised content. Additionally, they built on the existing Near Duplicate Detection service for image and video retrieval, and on cross-lingual and multimodal search, to support retrieving already debunked narratives, videos, and images from the project's Database of Known Fakes (DBKF) and presenting them to users as authoritative evidence.
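As an illustration of how near-duplicate retrieval over a database like the DBKF can work, the sketch below ranks stored visual embeddings by cosine similarity to a query embedding. It is a minimal sketch assuming precomputed embeddings; the function names and the similarity threshold are illustrative, not the actual service API.

```python
import numpy as np

def normalise(v: np.ndarray) -> np.ndarray:
    # L2-normalise so that dot products equal cosine similarity
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def near_duplicates(query: np.ndarray, index: np.ndarray,
                    threshold: float = 0.9) -> np.ndarray:
    # Cosine similarity of the query against every indexed embedding
    sims = normalise(index) @ normalise(query)
    hits = np.where(sims >= threshold)[0]
    return hits[np.argsort(-sims[hits])]  # most similar first

# Toy example: a 512-d query that is a slightly perturbed copy of item 42
rng = np.random.default_rng(0)
index = rng.normal(size=(1000, 512))
query = index[42] + 0.01 * rng.normal(size=512)
print(near_duplicates(query, index))  # -> [42]
```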
Towards SO2, the main limitations of prior image and video deepfake detection methods were investigated, focusing on the poor generalisation of deepfake detection models to unseen and novel synthesis methods. Progress was also made on the challenge of detecting AI-generated false news and narratives, through a study of the capability of state-of-the-art language models to generate misleading content and trustworthy-looking arguments in favour of disinformation narratives. vera.ai also contributed to creating a dataset of synthetic speech, on which synthetic speech detection methods were evaluated with encouraging first results.
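To make the generalisation problem concrete: a common evaluation protocol is leave-one-synthesis-method-out, i.e. training a detector on all but one generation method and testing on the held-out one. The sketch below reproduces that protocol on toy data with hypothetical method names and a simple classifier; it is not the project's experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
METHODS = ["faceswap", "face2face", "neuraltextures", "diffusion"]
FINGERPRINT = {m: i for i, m in enumerate(METHODS)}  # each method shifts one feature

def make_split(methods, n=200):
    # Real samples centred at zero; each fake method adds its own "fingerprint"
    X, y = [rng.normal(size=(n, 8))], [np.zeros(n)]
    for m in methods:
        shift = np.zeros(8)
        shift[FINGERPRINT[m]] = 2.5
        X.append(rng.normal(size=(n, 8)) + shift)
        y.append(np.ones(n))
    return np.vstack(X), np.concatenate(y)

# Train on all but one synthesis method, test on the unseen one
for held_out in METHODS:
    X_tr, y_tr = make_split([m for m in METHODS if m != held_out])
    X_te, y_te = make_split([held_out])
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"held out {held_out}: accuracy {acc:.2f}")  # the drop mirrors the generalisation gap
```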
Towards SO3, research aimed to support professionals in uncovering coordinated inauthentic behaviour and other disinformation campaigns, and to measure their impact and spread within the target communities. A workflow for the periodic monitoring of a known list of coordinated social media accounts across different platforms was designed and implemented to study the 2022 Italian general election. Finally, an approach was developed to evaluate the impact of social media posts, based on an implementation of the Misinformation Amplification Factor method introduced by the Integrity Institute.
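As we understand the Misinformation Amplification Factor, it relates the engagement a misinformation post receives to the engagement its account typically receives, with values above 1 indicating amplification. A minimal sketch, assuming the median of the account's recent engagement as the baseline (the Integrity Institute's exact baseline may differ):

```python
from statistics import median

def misinformation_amplification_factor(post_engagement: float,
                                        account_history: list[float]) -> float:
    # Ratio of a post's engagement to the account's typical engagement;
    # MAF > 1 means the post outperformed the account's own baseline.
    baseline = median(account_history)
    return post_engagement / baseline if baseline else float("inf")

# Example: a flagged post with 5,400 interactions from an account
# whose recent posts typically draw around 300.
history = [250, 310, 290, 330, 275]
print(misinformation_amplification_factor(5400, history))  # ~18.6
```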
No progress towards SO4 yet (the related tasks start in M13).
Towards SO5, a fact-checker-in-the-loop approach has been implemented to seamlessly gather new feedback as a side effect of verification workflows; this feedback is used by our AI methods to continuously adapt to evolving disinformation targets, narratives, and types.
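One plausible way to picture the feedback gathered this way is as lightweight labelled records emitted whenever a fact-checker confirms or rejects a tool's suggestion during normal verification work; the schema below is an illustrative assumption, not the project's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class VerificationFeedback:
    # A single fact-checker judgement captured as a side effect of verification
    content_id: str       # identifier of the analysed item
    tool: str             # which AI tool produced the suggestion
    model_output: str     # e.g. "likely deepfake"
    checker_verdict: str  # e.g. "confirmed" or "rejected"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Records like this can later be replayed as fresh training labels, letting
# models adapt to evolving disinformation targets, narratives, and types.
fb = VerificationFeedback("img-001", "deepfake-detector", "likely deepfake", "confirmed")
```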
Towards SO6, a list of existing and planned services was created, and integration planning has started to ensure the adoption and sustainability of the new AI tools in real-world applications through integration in leading verification tools (Truly Media and the InVID-WeVerify verification plugin). D5.1 presents a common data annotation model that represents the inputs and outputs of all tools developed within the project, as well as the improvements to the DBKF-specific services, workflows, and functionalities.
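The D5.1 annotation model itself is not reproduced here; purely as an illustration of the idea of one shared envelope covering the inputs and outputs of all tools, a record might look as follows (all field names and values are hypothetical):

```python
# Hypothetical shape of a shared annotation envelope; the actual D5.1 model differs.
annotation = {
    "target": {"type": "video", "url": "https://example.org/clip.mp4"},
    "tool": "near-duplicate-detection",
    "result": {
        "label": "near-duplicate",
        "confidence": 0.94,
        "matches": ["dbkf:debunk/1234"],  # link back to known debunks
    },
    "provenance": {"model_version": "hypothetical-1.0",
                   "timestamp": "2023-06-01T12:00:00Z"},
}
```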
For the detection of decontextualised content, we identified the challenge that test sets generated by Synthetic Misinformers are used instead of real-world multimodal misinformation. To investigate this, we conducted an extensive comparative study in which we trained a Transformer-based model and compared its performance on the COSMOS benchmark, which encompasses real-world multimodal misinformation. We found that the COSMOS evaluation benchmark allows models to exploit text-side unimodal biases on an inherently multimodal task.
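One simple way to expose such biases is to compare a text-only baseline against a multimodal model on the same benchmark: if the unimodal baseline performs comparably, the benchmark can be solved without looking at the images. The sketch below demonstrates this probe on synthetic toy data whose labels deliberately leak into the text features; it is not the actual COSMOS setup, and all data and model choices are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Toy stand-in for an image-text benchmark whose labels are predictable
# from the text features alone (the bias we probe for)
n = 2000
text = rng.normal(size=(n, 16))
image = rng.normal(size=(n, 16))
y = (text[:, 0] > 0).astype(int)  # label leaks through the text modality

X_multi = np.hstack([text, image])
for name, X in [("text-only", text), ("multimodal", X_multi)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {acc:.2f}")  # comparable scores signal a unimodal shortcut
```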
To enable the discovery, tracking, and impact measurement of disinformation narratives and campaigns across social platforms, modalities, and languages, we designed a workflow for the periodic monitoring of a known list of coordinated social media accounts across different platforms. This workflow had multiple objectives: to automatically update the account list, to keep a quasi-real-time record of these accounts' top-performing content and narratives, and to generalise the detection logic for coordinated sharing by considering multimodal near-duplicates. An early version of this workflow was implemented to study the 2022 Italian general election. To improve near-duplicate detection capabilities, we developed and published a novel approach that addresses multiple video retrieval and detection tasks at once, with no requirement for labelled data.
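To illustrate the generalised detection logic for coordinated sharing, a common heuristic is to flag distinct accounts that post (near-)identical content within a short time window. In the sketch below, exact content keys stand in for multimodal near-duplicate matches, and the window size and all names are illustrative assumptions rather than the project's implementation.

```python
from collections import defaultdict
from itertools import combinations

def coordinated_pairs(posts, window_s=60):
    # posts: (account, content_key, unix_time) tuples; in a real pipeline the
    # content_key would come from multimodal near-duplicate matching.
    by_content = defaultdict(list)
    for account, key, t in posts:
        by_content[key].append((account, t))
    pairs = set()
    for shares in by_content.values():
        for (a1, t1), (a2, t2) in combinations(shares, 2):
            if a1 != a2 and abs(t1 - t2) <= window_s:
                pairs.add(tuple(sorted((a1, a2))))  # accounts sharing in sync
    return pairs

posts = [("acct_a", "img_42", 1000), ("acct_b", "img_42", 1030),
         ("acct_c", "img_42", 9000), ("acct_a", "img_7", 2000)]
print(coordinated_pairs(posts))  # {('acct_a', 'acct_b')}
```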