Piloting a Cooperative Open Web Search Infrastructure to Support Europe's Digital Sovereignty

Periodic Reporting for period 1 - OpenWebSearch.EU (Piloting a Cooperative Open Web Search Infrastructure to Support Europe's Digital Sovereignty)

Reporting period: 2022-09-01 to 2024-02-29

Web search has become an essential technology and commodity, not only driving future innovations but also forming a backbone of our digital economy. Regrettably, a few non-European gatekeepers currently control Web search, which creates biased, one-sided information access centred on economic success rather than on the needs of citizens or on European values and jurisdiction. This one-sided ecosystem puts pressure on many small Web contributors from science, economy, art, culture, media and society, requiring them to optimize their content for a few gatekeepers. This creates a self-reinforcing cycle, which leads to lock-in effects and a closed search engine market. To promote an open, human-centred search engine market, OpenWebSearch.EU proposes to develop and pilot the core of a European Open Web Index (OWI) and the foundation for an open and extensible European Open Web Search and Analysis Infrastructure (OWSAI).

Our approach is based on four objectives, namely to
1. develop a core suite of search, discovery and analytics services to create, maintain and utilize the OWI;
2. develop relevant search engine verticals and new search paradigms demonstrating the impact;
3. establish a network of European HPC-infrastructure, research and business organizations to pilot the OWSAI based on Europe’s values, principles, legislation, ethics and standards;
4. stimulate an ecosystem around the OWI.

The envisioned infrastructure will not only contribute to Europe’s sovereignty in navigating and searching the Web; it will also empower Europe’s researchers, innovators and businesses to systematically tap into the Web as a business and innovation resource without paying huge upfront costs. This will be particularly crucial for future AI innovations and relevant for other European infrastructures.

By developing the Open Web Index (OWI) and the European Open Web Search and Analysis Infrastructure (OWSAI), we aim to enhance two strategic areas:
1. Enabling a diverse ecosystem of internet innovators, startups, industries, and public entities to create various search solutions, thereby boosting Europe's sovereignty and competitiveness in search engine markets.
2. Supporting the development of numerous Web-based data products and services, focusing on AI-based innovations and providing alternatives to the prevalent ad-based Web markets.

The main outcome of OpenWebSearch.EU will be the pilot of the OWI, which will facilitate various vertical search applications and AI-driven web data products.
The project started with the development of a conceptual framework and first versions of the necessary pipelines and software stacks, comprising seven core software components for realising an Open Web Index, and published corresponding papers. During the first 18 months, we built and set up a federated crawler running at three HPC data centres and crawled around 77 TB of data in total, which is now used to build an index. Index slices are provided daily and are partitioned by language. They cover around 29 million hosts and 1.23 billion URLs in about 185 languages, while a further 9.8 billion URLs are still queued for download. The crawled data is further preprocessed in a scalable pipeline that extracts the main text and metadata from the crawled web pages and enriches them, for example by extracting microdata or adding geo-coordinates. Based on the crawled data, we developed potential search applications, including first prototypes, and concepts for ensuring privacy, transparency and trust.
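The partitioning of crawled pages into daily, per-language index slices can be illustrated with a minimal sketch. This is not the project's actual pipeline or data format: the `CrawledPage` record, its fields, and the helper functions are hypothetical names chosen for illustration, and the sketch assumes each page already carries a language code and crawl date from preprocessing.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class CrawledPage:
    # Hypothetical record; the real index schema is not described in this report.
    url: str
    language: str    # assumed language code assigned during preprocessing
    crawl_date: str  # assumed ISO date of the crawl, e.g. "2024-02-01"


def partition_into_slices(pages):
    """Group pages into slices keyed by (crawl_date, language)."""
    slices = defaultdict(list)
    for page in pages:
        slices[(page.crawl_date, page.language)].append(page)
    return dict(slices)


def slice_stats(pages):
    """Return (number of distinct hosts, number of distinct URLs) for one slice."""
    urls = {page.url for page in pages}
    hosts = {urlsplit(url).hostname for url in urls}
    return len(hosts), len(urls)


if __name__ == "__main__":
    pages = [
        CrawledPage("https://example.org/a", "en", "2024-02-01"),
        CrawledPage("https://example.org/b", "en", "2024-02-01"),
        CrawledPage("https://example.de/", "de", "2024-02-01"),
    ]
    for key, members in sorted(partition_into_slices(pages).items()):
        print(key, slice_stats(members))
```

Keying slices by date and language keeps each slice independently downloadable, which is one plausible reason to partition this way; host and URL counts like those reported above would then be aggregated over all slices.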

OpenWebSearch.eu also explored the legal implications and identified legal issues related to crawling, storing, and processing web data, as well as to sharing raw web data and a web index. In addition, an ethical framework has been elaborated that provides guidance on the ethical aspects of creating and using the open web index. As part of the framework, ethical values have been elaborated that underpin measures to address potential ethical issues. Initial societal aspects relevant to the creation, usage, and maintenance of the open web index have been identified: sovereignty is related to the availability of open web search, environmental aspects are related to the energy consumption of data centres, and democracy is related to opening up web information to a broader range of societal groups.

Beyond scientific, technical and social developments, OpenWebSearch.eu is very much concerned with building communities around the Open Web Index and with governing its sustainable future development. We identified, addressed and started to involve a wide range of OpenWebSearch.eu stakeholders according to their possible engagement in and contribution to the OpenWebSearch.eu ecosystem, and actively grew this network of supporters during the first 18 months. The level of proximity to the project was used to identify four main groups: Core Stakeholders, Contributors, Supporters, and Followers. We developed an in-depth Dissemination, Exploitation and Communication Plan to address relevant communities and the public in the context of Open Web Search and Web Analysis. We also studied, stratified and evaluated possible legal forms for operating a future European Open Web Search infrastructure and derived a governance model for operating such an infrastructure with all relevant and involved players, computing facilities, oversight and development components.
As a major result, a first version of an open index of the Web is available, together with corresponding pipelines and infrastructures at Europe's HPC centres. The results are relevant not only for Web search but also for AI in particular and Europe's data industry in general. Research conducted in the project has been published at various venues, including top conferences in the field of information retrieval. For more details, we refer to https://openwebsearch.eu .
Workflow of OpenWebSearch.eu