NewsEye: A Digital Investigator for Historical Newspapers

Deliverables

Automatic Text Recognition (final)

Reports on software tools and modules, including documentation, for Automatic Text Recognition. Technical reports on further development and innovative adaptation of algorithms and methods for Automatic Text Recognition.

Dissemination, communication and exploitation of results (e) (final)

The PEDR will be delivered at M3, and the project will follow through by maintaining a rolling plan of activities to disseminate and exploit project results, including reports or publications for each event on a particular topic. This deliverable includes rapid dissemination channels in the form of blog posts, tweets and other online media, as well as more traditional dissemination outputs (conference papers, scholarly articles).

At M12, M24 and M36 we will provide yearly reports on the execution of the PEDR as well as on all dissemination and communication events organized during the project. Main dissemination and communication events are planned at M3, M14, M24, M25, M26 and M30, but will be reported on yearly together with smaller-scale events.

This deliverable, under the lead of WP7 by BNF after M36, will provide details on the dissemination, communication and exploitation of results during the project extension.

Layout Analysis (final)

Reports on software tools and modules, including documentation, for Layout Analysis. Technical reports on further development and innovative adaptation of algorithms and methods for Layout Analysis.

Usability/Fit for research purpose test of tools and user interfaces (c) (final)

The deliverables will report on testing the methods, tools and interfaces to the core. They are the result of collaboration on the mockups and prototypes and of workshop/hackathon participation with the computer science groups and the libraries, as indicated in Task T74, providing extensive feedback on tools and methods. UIBKICH will supervise the production of reports in preparation for, and as a follow-up to, the tools (prototypes, beta versions and publishable tools), along the timeline of WP7. The final version is due at M34, with a possible update at M45.

Contextualized Case Studies for academic use (d) (final)

The deliverables will report on the four digital humanities case studies, prepared by using already existing methods and tools as well as the ones to be developed in this project, showing progress and improvement of search and research outcome. UIBKICH will be responsible for the case studies on migration, UHDH for the case study on nationalisms and revolutions, UNIVIE for the case study on media and journalism, and UPVM for the case study on gender. The members of the DH group will furthermore compare and contrast the results of the case studies in order to show how newspapers work both as a space for change and as a space for stability, while addressing the relationship between press, politics and society in different regions and languages across Europe, thus showing the transformation of our societies.

The deliverables will (a) include thorough literature and background research for each of the case studies; (b) work with the semantically enriched text, as well as the application/utilization of the developed dynamic text analysis features in different languages, in order to improve the quality of the case studies; (c) show how the developed tools contribute to change and continuity discussions for European societies.

Draft reports will be delivered at M6 and complete reports at M12, while final reports, to be submitted for publication in renowned humanities and digital humanities journals, will be completed at M24 and M36.

Personal Research Assistant: Explainer (b) (final)

This deliverable describes the Explainer component. The first version (M24) will be able to produce initial descriptions of strategies, goals and decisions of the Investigator, while the second version (M36) describes the final version. The final version is due at M36, with a possible update at M45.

Article separation (c) (final)

Reports on software tools and modules, including documentation, for Article Separation. Technical reports on further development and innovative adaptation of algorithms and methods for Article Separation; journal/research paper submissions on new, preferably Machine Learning based (neural), algorithms and technologies for Article Separation, along with the inherently used Layout Analysis, Text Line Detection and Automatic Text Recognition. The final version is due at M36, with a possible update at M45.

Event detection (final)

Report on the level of completion of the event detection tool. The first version, at M24, presents the state of the art in event detection, relying on the detection of events based on the sole document content, using string-based multilingual approaches based on rhetoric and specificities of the news genre, as previously developed at ULR. The second version, at M36, will integrate contrastive knowledge from other documents. The final version is due at M36, with a possible update at M45.

Personal Research Assistant: Reporter (c) (final)

This deliverable describes the Reporter component and how it is used. The first version (M12) will be capable of some simple natural language generation, using relatively rigid document structures and mechanisms for talking about the results of tools produced in WP34 during year one. The second version (M24) will have more elaborate document structuring and will be able to report more flexibly on a wider range of analysis results. The second version will also have a first version of summarization of textual contents. The third version of the deliverable (M36) will describe the final version with full functionality. The final version is due at M36, with a possible update at M45.

Use of project results for the general public (b) (final)

The deliverables will report on the texts, podcasts and social media activities by the digital humanities group. UNIVIE will be supervising the podcast production, UPVM the linking with Wikipedia, and UHDH the social media activities.

NewsEye Demonstrator (c) (final)

Reports and software on the development of the NewsEye Demonstrator, a web-based user interface for tools developed in WP3 and 4 and for the Personal Research Assistant (WP5). Tools for the user interface of WP3 will be provided at M12, while the complete Minimum Viable Product (MVP) will be delivered at M24 and the final version at M36. The final version is due at M36, with a possible update at M45.

Sustainability plan (c) (final)

The project will conceptualize a sustainability strategy for long-term access to the tools and data generated by the project, to be planned in full detail at M26, being implemented at M36, and fully implemented at M45.

Stance detection (final)

Reports on the level of completion of the software tool for stance detection (M12). The first version, at M12, will rely on standards of the state of the art, and the second version, at M24, contains our principal research contribution, robust to noise and language independent.

Showcase case studies for the user interface (b) (final)

The deliverables will consist of texts, videos, statistics, search paths, how-tos, etc. on the user interface and on the project homepage. All partners of the digital humanities group will contribute to the deliverable.

Personal Research Assistant: Investigator (c) (final)

The deliverable describes the Investigator tool. In the first iteration (M12), the Investigator will be capable of planning, forming and running some queries using analysis tools developed in parallel in WP34, and of interacting with the user in simple ways to continue the investigation. In the second iteration (M24), the Investigator will also be able to create strategies for investigation, to analyze the results obtained, and to adjust its strategy accordingly. The third iteration (M36) describes the final version with full functionality. The final version is due at M36, with a possible update at M45.

Advanced tool to query the enriched data sets (final)

Report on the software to query the data sets (M6). The first version is delivered early, at M6, to allow querying the data set as soon as possible, without the semantic enrichment produced in other deliverables of WP3. The second version, at M12, reports on the software to analyze the data and the enriched data sets; it allows querying both the data set and the enriched data set, including the semantic text enrichment to be produced in the rest of WP3 (D3.1-D3.3).

Data models (d) (final)

Regular reports providing a detailed description of the data models, formats and specifications used in the project, including publicly available example data.

Data collection and preservation (d) (final)

Report and data collection

Comparative analysis of data between contexts (b) (final)

Reports on the developed methods and tools for dynamic comparative analysis of data between given contexts. The first version, at M24, describes the methods to extract sets of characteristics that describe similarities or contrasts between document groups, and the second version, at M36, describes the final methods to extract contrasting characteristics from groups of documents, integrated with work on intelligible descriptions. The final version is due at M36, with a possible update at M45.

Educational material for teachers, pupils and lay historians (b) (final)

The deliverables consist of prototypes of the educational material in M24 and the online published material in M36. While all partners of the digital humanities group will contribute to the production of the material, UHDH will supervise the production of material for teachers, UPVM for pupils and students, and UIBKICH for lay historians, in different languages.

A report on educational material prototypes will be delivered at M24; the final report will be delivered at M36.

Analysis of data in a given context (c) (final)

Reports on the level of completion of the software tool for dynamic analysis of data in a given context. The first version, at M12, will be tools for building multilingual topic models, topic hierarchies and dynamic topic models, and using them to analyze articles in the initial dataset. The second version, at M24, contains document analysis methods for article similarity and link discovery to suggest related articles, combining multilingual, hierarchical and dynamic topic models. The third version, at M36, contains document analysis methods refined on the basis of feedback from their use in the Personal Research Assistant and evaluation of their integration with intelligible descriptions. The final version is due at M36, with a possible update at M45.

NE recognition and linking (final)

Reports on the level of completion of the software tool to recognize and link named entities (NEs). The first version, at M12, will rely on standards of the state of the art, and the second version, at M24, contains our principal research contribution, robust to noise and language independent.

Intelligible representation of statistical analysis (b) (final)

Reports on the methods and tools for outputting human-intelligible representations based on the outputs from statistical models developed in T41 and T42. The first version, at M24, describes the methods that provide intelligible names/descriptions of topics and extracted characteristics for use in the Personal Research Assistant, and the second version, at M36, describes the final methods to provide intelligible descriptions, refined after integration in the Personal Research Assistant. The final version is due at M36, with a possible update at M45.

Project website (to be continuously updated)

The project will maintain a website that will act as a portal for the communications activities. In M1 a web page will be published to advertise and announce the project. By M8 the full website structure will be in place, integrating social media (such as Twitter) channels. The website will be maintained throughout the duration of the project and content will be contributed by all project partners.

Data management plan

The NewsEye project will contribute to the open research data pilot. According to the guidelines for Research Data Management of Horizon 2020 (http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf), a Data Management Plan will be written during the first six months, explaining what data will be generated, collected, shared and curated during the project as well as after the project's end. It will consider the different kinds of research outcomes (WP6) and data (WP2-5) resulting from the project. One important goal of NewsEye is to make its data findable, accessible, interoperable and reusable (FAIR).

Publications

Exploring Entities in Event Detection as Question Answering

Authors: Boros, Emanuela; Moreno, Jose G.; Doucet, Antoine
Published in: Proceedings of the 44th European Conference on Information Retrieval (ECIR), 2022
Publisher: Springer
DOI: 10.5281/zenodo.5779941

L3i at SemEval-2022 Task 11: Straightforward Additional Context for Multilingual Named Entity Recognition

Authors: Emanuela Boros, Carlos-Emiliano Gonzalez-Gallardo, Jose G. Moreno, Antoine Doucet
Published in: International Workshop on Semantic Evaluation (SemEval), Issue Task 11, 2022
Publisher: ACL
DOI: 10.5281/zenodo.6369947

A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers

Authors: Ahmed Hamdi; Elvys Linhares Pontes; Emanuela Boros; Thi Tuyet Hai Nguyen; Günter Hackl; Jose G. Moreno; Antoine Doucet
Published in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, Page(s) 2328–2334
Publisher: ACM
DOI: 10.1145/3404835.3463255

Assessing and Minimizing the Impact of OCR Quality on Named Entity Recognition

Authors: Ahmed Hamdi; Axel Jean-Caurant; Nicolas Sidere; Mickaël Coustaty; Antoine Doucet
Published in: Proceedings of the 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020, Issue 12246, 2020, Page(s) 87–101
Publisher: Springer
DOI: 10.1007/978-3-030-54956-5_7

Alleviating Digitization Errors in Named Entity Recognition for Historical Documents

Authors: Emanuela Boros; Ahmed Hamdi; Elvys Linhares Pontes; Luis Adrián Cabrera-Diego; Jose G. Moreno; Nicolas Sidere; Antoine Doucet
Published in: Proceedings of the 24th Conference on Computational Natural Language Learning (CoNLL), 2020, Page(s) 431–441
Publisher: ACL
DOI: 10.18653/v1/2020.conll-1.35

Exploring Entities in Event Detection as Question Answering

Authors: Boros, Emanuela; Moreno, Jose G.; Doucet, Antoine
Published in: European Conference on Information Retrieval (ECIR 2022), 2022, Page(s) 65-79, ISBN 978-3-030-99735-9
Publisher: Springer
DOI: 10.1007/978-3-030-99736-6_5

Grammatical Profiling for Semantic Change Detection

Authors: Giulianelli, Mario; Kutuzov, Andrey; Pivovarova, Lidia
Published in: Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL 2021), 2021
Publisher: ACL
DOI: 10.18653/v1/2021.conll-1.33

Multilingual Epidemic Event Extraction

Authors: Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Published in: Proceedings of the 23rd International Conference on Asian Digital Libraries (ICADL), Issue 13133, 2021, Page(s) 139–156
Publisher: Springer
DOI: 10.5281/zenodo.5779966

Transformer-based Methods for Recognizing Ultra Fine-grained Entities (RUFES)

Authors: Boros, Emanuela; Doucet, Antoine
Published in: Thirteenth Text Analysis Conference (TAC 2020), 2021
Publisher: NIST
DOI: 10.5281/zenodo.4555778

Information Extraction from Invoices

Authors: Ahmed Hamdi; Elodie Carel; Aurelie Joseph; Mickael Coustaty; Antoine Doucet
Published in: International Conference on Document Analysis and Recognition ICDAR 2021, Issue 12822, 2021, Page(s) 699–714
Publisher: Springer
DOI: 10.1007/978-3-030-86331-9_45

Event Detection with Entity Markers

Authors: Emanuela Boros; Jose G. Moreno; Antoine Doucet
Published in: Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), Issue 12657, 2021, Page(s) 233–240
Publisher: Springer
DOI: 10.1007/978-3-030-72240-1_20

An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish

Authors: Quan Duong; Mika K Hämäläinen; Simon Hengchen
Published in: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 2020, Page(s) 240–248
Publisher: ACL
DOI: 10.5281/zenodo.4242890

Dataset for Temporal Analysis of English-French Cognates

Authors: Frossard, Esteban; Coustaty, Mickael; Doucet, Antoine; Jatowt, Adam; Hengchen, Simon
Published in: Proceedings of the 12th Language Resources and Evaluation Conference, 2020, Page(s) 855–859
Publisher: European Language Resources Association
DOI: 10.5281/zenodo.3693650

NewsEye: A digital investigator for historical newspapers

Authors: Doucet, Antoine; Gasteiner, Martin; Granroth-Wilding, Mark; Kaiser, Max; Kaukonen, Minna; Labahn, Roger; Moreux, Jean-Philippe; Muehlberger, Guenter; Pfanzelter, Eva; Therenty, Marie-Eve; Toivonen, Hannu; Tolonen, Mikko
Published in: 15th Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2020, 2020
Publisher: ADHO
DOI: 10.5281/zenodo.3895269

Robust Named Entity Recognition and Linking on Historical Multilingual Documents

Authors: Emanuela Boros; Elvys Linhares Pontes; Luis Adrián Cabrera-Diego; Ahmed Hamdi; José Moreno; Nicolas Sidère; Antoine Doucet
Published in: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Issue 2696, 2020, Page(s) 1-17
Publisher: CEUR
DOI: 10.5281/zenodo.4068074

Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems

Authors: Cabrera-Diego, Luis Adrián; Moreno, Jose G.; Doucet, Antoine
Published in: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing (BSNLP at ACL), 2021, Page(s) 98–104
Publisher: ACL
DOI: 10.5281/zenodo.4730477

SpaceWars: A Web Interface for Exploring the Spatio-temporal Dimensions of WWI Newspaper Reporting

Authors: Gutehrlé, Nicolas; Harlamov, Oleg; Karimi, Farimah; Wei, Haoyu; Jean-Caurant, Axel; Pivovarova, Lidia
Published in: Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021), 2021
Publisher: CEUR
DOI: 10.5281/zenodo.5566463

Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections

Authors: Zosa, Elaine; Hengchen, Simon; Marjanen, Jani; Pivovarova, Lidia; Tolonen, Mikko
Published in: Digital Humanities in the Nordic countries (DHN 2020), 2020
Publisher: Institute of Literature, Folklore and Art
DOI: 10.5281/zenodo.3631613

Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques

Authors: Boros, Emanuela; Hamdi, Ahmed; Linhares Pontes, Elvys; Cabrera-Diego, Luis Adrián; Moreno, José G.; Sidere, Nicolas; Doucet, Antoine
Published in: Conférence en Recherche d’Informations et Applications - CORIA 2021, French Information Retrieval Conference, 2021
Publisher: ARIA
DOI: 10.24348/coria.2021.mini_24

Neural Machine Translation with BERT for Post-OCR Error Detection and Correction

Authors: Thi Tuyet Hai Nguyen; Adam Jatowt; Nhu-Van Nguyen; Mickael Coustaty; Antoine Doucet
Published in: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2020, Page(s) 333–336
Publisher: ACM
DOI: 10.1145/3383583.3398605

Post-OCR Error Detection by Generating Plausible Candidates

Authors: Thi-Tuyet-Hai Nguyen, Adam Jatowt, Mickael Coustaty, Nhu-Van Nguyen, Antoine Doucet
Published in: 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, Page(s) 876-881, ISBN 978-1-7281-3014-9
Publisher: IEEE
DOI: 10.1109/ICDAR.2019.00145

Elastic Embedded Background Linking for News Articles with Keywords, Entities and Events.

Authors: Luis Adrián Cabrera-Diego, Emanuela Boros, Antoine Doucet
Published in: Text REtrieval Conference (TREC) 2021, Issue News Track, 2022
Publisher: NIST
DOI: 10.5281/zenodo.6334523

Opening Digitized Newspapers for Different User Groups - Successes and Challenges

Authors: Juha Rautiainen
Published in: IFLA World Library and Information Congress 2019, 2019
Publisher: IFLA
DOI: 10.5281/zenodo.3403158

A Baseline Document Planning Method for Automated Journalism

Authors: Leo Leppänen; Hannu Toivonen
Published in: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 2021, Page(s) 101–111
Publisher: ACL
DOI: 10.5281/zenodo.4694492

Personal Research Assistant for Online Exploration of Historical News

Authors: Lidia Pivovarova; Axel Jean-Caurant; Jari Avikainen; Khalid Alnajjar; Mark Granroth-Wilding; Leo Leppänen; Elaine Zosa; Hannu Toivonen
Published in: Proceedings of the 42nd European Conference on IR Research, Issue 12036, 2020, Page(s) 481–485, ISBN 9783030454418
Publisher: Springer
DOI: 10.1007/978-3-030-45442-5_62

Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages

Authors: Piskorski, Jakub; Babych, Bogdan; Kancheva, Zara; Kanishcheva, Olga; Lebedeva, Maria; Marcinczuk, Michał; Nakov, Preslav; Osenova, Petya; Pivovarova, Lidia; Pollak, Senja; Přibáň, Pavel; Radev, Ivaylo; Robnik-Šikonja, Marko; Starko, Vasyl; Steinberger, Josef; Yangarber, Roman
Published in: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, 2021, Page(s) 122–133
Publisher: ACL
DOI: 10.5281/zenodo.4635585

When to Use OCR Post-correction for Named Entity Recognition?

Authors: Vinh-Nam Huynh; Ahmed Hamdi; Antoine Doucet
Published in: Proceedings of the 14th International Conference on Data Analytics in Logistics (ICDAL 2020), Issue 12504, 2020, Page(s) 33–42, ISBN 9783030644512
Publisher: Springer
DOI: 10.1007/978-3-030-64452-9_3

A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Authors: Elaine Zosa; Mark Granroth-Wilding; Lidia Pivovarova
Published in: Proceedings of the Workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020), 2020, Page(s) 32-37
Publisher: ACL
DOI: 10.5281/zenodo.3751036

"Transformer-based Methods with #Entities for Detecting Emergency Events on Social Media"

Authors: Emanuela Boros, Nhu Khoa Nguyen, Gaël Lejeune, Mickaël Coustaty, Antoine Doucet
Published in: Text REtrieval Conference (TREC) 2021, Issue Incident Streams Track, 2022
Publisher: NIST
DOI: 10.5281/zenodo.6334513

Simple ways to improve NER in every language using markup

Authors: Luis Adrián Cabrera-Diego; Moreno, J. G.; Doucet, A.
Published in: Proceedings of the 2nd International Workshop on Cross-Lingual Event-Centric Open Analytics Co-Located with the 30th The Web Conference (WWW 2021), 2021, ISSN 1613-0073
Publisher: CEUR-WS
DOI: 10.5281/zenodo.4680998

Digging Deeper into the Finnish Parliamentary Protocols – Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman's Rights (allemansrätten)

Authors: Kettunen, Kimmo; La Mela, Matti
Published in: Proceedings of the Digital Humanities in the Nordic Countries (5th Conference), 2020, Page(s) 63–80
Publisher: Institute of Literature, Folklore and Art
DOI: 10.5281/zenodo.3676371

Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents

Authors: Ehrmann, Maud; Romanello, Matteo; Doucet, Antoine; Clematide, Simon
Published in: European Conference on Information Retrieval (ECIR 2022), 2022, Page(s) 347–354, ISBN 978-3-030-99739-7
Publisher: Springer
DOI: 10.1007/978-3-030-99739-7_44

Event Related Document Retrieval with Multilingual Real World Event Representation

Authors: Guillaume Bernard, Cyrille Suire, Cyril Faucher, Antoine Doucet
Published in: Proceedings of the 20th International Semantic Web Conference (ISWC), 2021
Publisher: CEUR-WS
DOI: 10.5281/zenodo.5900742

Three-part diachronic semantic change dataset for Russian

Authors: Andrey Kutuzov; Lidia Pivovarova
Published in: Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021, 2021, Page(s) 7-13
Publisher: ACL
DOI: 10.18653/v1/2021.lchange-1.2

ICDAR 2019 Competition on Post-OCR Text Correction

Authors: Christophe Rigaud; Antoine Doucet; Mickaël Coustaty; Jean-Philippe Moreux
Published in: 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, ISBN 978-1-7281-3015-6
Publisher: IEEE
DOI: 10.1109/icdar.2019.00255

Multilingual Dynamic Topic Model

Authors: Zosa, Elaine; Granroth-Wilding, Mark; Department of Computer Science, University of Helsinki, Finland
Published in: Proceedings - Natural Language Processing in a Deep Learning World (RANLP), 2019, Page(s) 1388–1396
Publisher: RANLP
DOI: 10.26615/978-954-452-056-4_159

Visual Topic Modelling for NewsImage Task at MediaEval 2021

Authors: Lidia Pivovarova, Elaine Zosa
Published in: Working Notes Proceedings of the MediaEval 2021 Workshop, 2021
Publisher: CEUR-WS
DOI: 10.5281/zenodo.5900719

Linking Named Entities across Languages using Multilingual Word Embeddings

Authors: Elvys Linhares Pontes; Jose G. Moreno; Antoine Doucet
Published in: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020, Page(s) 329–332
Publisher: ACM
DOI: 10.1145/3383583.3398597

Can Umlauts Ruin Your Research in Digitized Newspaper Collections? A NewsEye Case Study on 'The Dark Sides of War' (1914–1918)

Authors: Klaus, Barbara
Published in: Proceedings of the Digital Humanities in the Nordic Countries (5th Conference), Issue 2612, 2020, Page(s) 267–274
Publisher: Institute of Literature, Folklore and Art
DOI: 10.5281/zenodo.4686731

Large Scale Analysis of Semantic and Temporal Aspects in Cultural Heritage Collection's Search

Authors: Sumikawa, Yasunobu; Jatowt, Adam; Doucet, Antoine; Moreux, Jean-Phillippe
Published in: 2019 Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019, Issue yearly, 2019, Page(s) 77-86, ISBN 978-1-7281-1547-4
Publisher: IEEE Computer Society
DOI: 10.1109/jcdl.2019.00021

Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing

Authors: Nguyen, Thi-Tuyet-Hai; Jatowt, Adam; Coustaty, Mickael; Nguyen, Nhu-Van; Doucet, Antoine
Published in: 2019 Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019, Issue yearly, 2019, Page(s) 29-38, ISBN 978-1-7281-1547-4
Publisher: IEEE Computer Society
DOI: 10.1109/jcdl.2019.00015

Towards Data-Driven Generation of Visualizations for Automatically Generated News Articles

Authors: Rola Alhalaseh, Myriam Munezero, Miika Leinonen, Leo Leppänen, Jari Avikainen, Hannu Toivonen
Published in: Proceedings of the 22nd International Academic Mindtrek Conference on - Mindtrek '18, Issue yearly, 2018, Page(s) 100-109, ISBN 9781-450365895
Publisher: ACM Press
DOI: 10.1145/3275116.3275131

An Analysis of the Performance of Named Entity Recognition over OCRed Documents

Authors: Hamdi, Ahmed; Jean-Caurant, Axel; Sidere, Nicolas; Coustaty, Mickael; Doucet, Antoine
Published in: 2019 Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019, Issue yearly, 2019, Page(s) 333-334, ISBN 978-1-7281-1547-4
Publisher: IEEE Computer Society
DOI: 10.1109/jcdl.2019.00057

Impact Analysis of Document Digitization on Event Extraction

Authors: Nhu Khoa Nguyen; Emanuela Boroş; Gaël Lejeune; Antoine Doucet
Published in: Proceedings of the 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020), Issue 2735, 2020, Page(s) 17–28
Publisher: CEUR-WS
DOI: 10.5281/zenodo.4734267

Scalable and Interpretable Semantic Change Detection

Authors: Syrielle Montariol; Matej Martinc; Lidia Pivovarova
Published in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2021, Page(s) 4642–4652
Publisher: ACL
DOI: 10.18653/v1/2021.naacl-main.369

Word Clustering for Historical Newspapers Analysis

Authors: Lidia Pivovarova; Jani Marjanen; Elaine Zosa
Published in: Proceedings of the Workshop on Language Technology for Digital Historical Archives, 2019, Page(s) 3-10
Publisher: ACL Bulgaria
DOI: 10.26615/978-954-452-059-5_002

Multilingual Epidemiological Text Classification: A Comparative Study

Authors: Stephen Mutuvi; Emanuela Boros; Antoine Doucet; Adam Jatowt; Gaël Lejeune; Moses Odeo
Published in: Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020, Page(s) 6172–6183
Publisher: ACL
DOI: 10.18653/v1/2020.coling-main.543

Impact of OCR Quality on Named Entity Linking

Authors: Elvys Linhares Pontes; Ahmed Hamdi; Nicolas Sidere; Antoine Doucet
Published in: International Conference on Asia-Pacific Digital Libraries 2019, 2019, Page(s) 102–115, ISBN 978-3-030-34058-2
Publisher: Springer
DOI: 10.1007/978-3-030-34058-2_11

Entity Linking for Historical Documents: Challenges and Solutions

Authors: Pontes, Elvys Linhares; Cabrera-Diego, Luis Adrián; Moreno, José G.; Boros, Emanuela; Hamdi, Ahmed; Sidère, Nicolas; Coustaty, Mickaël; Doucet, Antoine
Published in: Proceedings of the 22nd International Conference on Asia-Pacific Digital Libraries (ICADL 2020), Issue 12504, 2020, Page(s) 215–231, ISBN 9783030644512
Publisher: Springer
DOI: 10.1007/978-3-030-64452-9_19

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Authors: Jani Pekka Marjanen; Lidia Pivovarova; Elaine Zosa; Jussi Kurunmäki
Published in: HistoInformatics 2019: International Workshop on Computational History 2019, part of TPDL 2019, 2019
Publisher: Springer
DOI: 10.5281/zenodo.3689466

Evaluating the Robustness of Embedding-Based Topic Models to OCR Noise

Authors: Elaine Zosa, Stephen Mutuvi, Mark Granroth-Wilding, Antoine Doucet
Published in: International Conference on Asian Digital Libraries (ICADL), 2021, ISBN 978-3-030-91668-8
Publisher: Springer
DOI: 10.1007/978-3-030-91669-5_30

Topic Modelling Discourse Dynamics in Historical Newspapers

Authors: Marjanen, Jani; Zosa, Elaine; Hengchen, Simon; Pivovarova, Lidia; Tolonen, Mikko
Published in: Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020), 2020, Page(s) 63-77
Publisher: CEUR-WS
DOI: 10.5281/zenodo.5648114

Benchmarks for Unsupervised Discourse Change Detection

Authors: Duong, Quan; Pivovarova, Lidia; Zosa, Elaine
Published in: Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021), Issue 2981, 2021
Publisher: Springer
DOI: 10.5281/zenodo.5780033

Capturing Evolution in Word Usage: Just Add More Clusters?

Authors: Matej Martinc; Syrielle Montariol; Elaine Zosa; Lidia Pivovarova
Published in: WWW '20: Companion Proceedings of the Web Conference 2020, 2020, Page(s) 343-349
Publisher: ACM
DOI: 10.1145/3366424.3382186

A Dataset for Multi-lingual Epidemiological Event Extraction

Authors: Mutuvi, Stephen; Doucet, Antoine; Lejeune, Gael; Odeo, Moses
Published in: Proceedings of the 12th Language Resources and Evaluation Conference, 2020, Page(s) 4139–4144
Publisher: European Language Resources Association
DOI: 10.5281/zenodo.3709626

Not All Comments are Equal: Insights into Comment Moderation from a Topic-Aware Model

Authors: Elaine Zosa; Ravi Shekhar; Mladen Karan; Matthew Purver
Published in: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021, Page(s) 1652–1662
Publisher: RANLP
DOI: 10.5281/zenodo.5648098

EMBEDDIA at SemEval-2022 Task 8: Investigating Sentence, Image, and Knowledge Graph Representations for Multilingual News Article Similarity

Author(s): Elaine Zosa, Emanuela Boros, Boshko Koloski, Lidia Pivovarova
Published in: Proceedings of SemEval-2022 Workshop Task 8, 2022
Publisher: ACL
DOI: 10.5281/zenodo.6369944

Token-Level Multilingual Epidemic Dataset for Event Extraction

Author(s): Stephen Mutuvi; Emanuela Boros; Antoine Doucet; Gaël Lejeune; Adam Jatowt; Moses Odeo
Published in: Proceedings of the 25th International Conference on Theory and Practice of Digital Libraries (TPDL), Issue 12866, 2021, Page(s) 55–59
Publisher: Springer
DOI: 10.5281/zenodo.5780019

Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition

Author(s): Johannes Michael, Roger Labahn, Tobias Gruning, Jochen Zollner
Published in: 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, Page(s) 1286-1293, ISBN 978-1-7281-3014-9
Publisher: IEEE
DOI: 10.1109/icdar.2019.00208

L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers

Author(s): Nhu Khoa Nguyen; Emanuela Boros; Gaël Lejeune; Antoine Doucet; Thierry Delahaut
Published in: Companion Proceedings of the Web Conference, 2020, Page(s) 302–306
Publisher: ACM
DOI: 10.5281/zenodo.4734321

The Helsinki Digital Humanities Hackathon: Two Perspectives on Multidisciplinary Historical Newspapers Research in a Hackathon Context

Author(s): Ros, Ruben; Oberbichler, Sarah
Published in: Proceedings of the Twin Talks 2 and 3 Workshops at DHN 2020 and DH 2020, 2020, Page(s) 66–74
Publisher: Institute of Literature, Folklore and Art
DOI: 10.5281/zenodo.3689228

Multilingual Topic Labelling of News Topics using Ontological Mapping

Author(s): Elaine Zosa, Lidia Pivovarova, Michele Boggia, Sardana Ivanova
Published in: European Conference on Information Retrieval (ECIR), 2022
Publisher: Springer
DOI: 10.5281/zenodo.6334491

Étude comparative de méthodes de classification multilingue appliquées à l'épidémiologie

Author(s): Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Published in: COnférence en Recherche d'Informations et Applications - CORIA 2021, French Information Retrieval Conference, 2021
Publisher: ARIA
DOI: 10.5281/zenodo.4734471

A Comprehensive Extraction of Relevant Real-World-Event Qualifiers for Semantic Search Engines

Author(s): Guillaume Bernard, Cyrille Suire, Cyril Faucher, Antoine Doucet
Published in: International Conference on Theory and Practice of Digital Libraries (TPDL), 2021, Page(s) 153-164, ISBN 978-3-030-86323-4
Publisher: Springer
DOI: 10.1007/978-3-030-86324-1_19

A Method for Wavelet-Based Time Series Analysis of Historical Newspapers

Author(s): Avikainen, Jari
Published in: 2019
Publisher: University of Helsinki
DOI: 10.5281/zenodo.3628262

"Wir dürfen wieder Österreicher sein!" Die Rolle der Tagespresse in österreichischen Nation-Building-Prozessen 1945–1948 – eine quantitative Analyse ausgewählter digitaler Zeitungskorpora samt Vorschlägen zur didaktischen Umsetzung

Author(s): Stefan Patrick Hechl
Published in: 2021
Publisher: Universität Innsbruck
DOI: 10.5281/zenodo.4468295

Wortvektoren

Author(s): Laasch, Bastian Marc
Published in: 2018
Publisher: University of Rostock
DOI: 10.18453/rosdok_id00002309

Embeddings built on 19th century newspapers from Finland

Author(s): Lidia Pivovarova, Elaine Zosa, Jani Marjanen
Published in: 2019
Publisher: Zenodo
DOI: 10.5281/zenodo.3557480

Doing historical research with digital newspapers – perspectives of DH scholars

Author(s): Sarah Oberbichler, Eva Pfanzelter, Stefan Hechl, Jani Marjanen
Published in: Europeana Tech, Issue 16: Newspapers, 2021
Publisher: Europeana

Using LDA and Jensen-Shannon Distance (JSD) to group similar newspaper articles

Author(s): Sarah Oberbichler
Published in: 2020
Publisher: Zenodo
DOI: 10.5281/zenodo.3887193

The Book of Abstracts for What’s Past is Prologue: The NewsEye International Conference.

Author(s): Antti Kanner, Eetu Mäkelä, Jani Marjanen, Mikko Tolonen, Sarah Oberbichler, Quan Duong, Lidia Pivovarova, Dilawar Ali, Steven Verstockt, Étienne Ollion, Rubing Shen, Matthias Arnold, David Brown, Raven Adam, Saranya Balasubramanian, Vera Maria Charvat, Manfred Füllsack, Jörn Kleinert, Hanna Misera, Nenad Pantelic, Jakob Sonnberger, Georg Vogelor, Alessandra De Mulder, Heikki K
Published in: 2021
Publisher: Zenodo
DOI: 10.5281/zenodo.5167375

Covid-19 et grippe espagnole: Quand la presse du XXe siècle rappelle celle de 2020

Author(s): Nejma Omari, Antoine Doucet
Published in: 2020
Publisher: The Conversation

Annotation Guidelines for Named Entity Recognition, Entity Linking and Stance Detection (v3.1)

Author(s): Ahmed Hamdi, Elvys Linhares Pontes, Antoine Doucet
Published in: 2021
Publisher: Zenodo
DOI: 10.5281/zenodo.4574199

NewsEye Policy Brief

Author(s): NewsEye consortium
Published in: 2020
Publisher: Zenodo
DOI: 10.5281/zenodo.4291895

Assessing the Impact of OCR Noise on Multilingual Event Detection over Digitised Documents

Author(s): Emanuela Boros, Nhu Khoa Nguyen, Gaël Lejeune, Antoine Doucet
Published in: International Journal on Digital Libraries, Issue 14325012, 2022, ISSN 1432-5012
Publisher: Springer Verlag
DOI: 10.1007/s00799-022-00325-2

The expansion of isms, 1820-1917: Data-driven analysis of political language in digitized newspaper collections

Author(s): Jani Marjanen; Jussi Antero Kurunmäki; Lidia Pivovarova; Elaine Zosa
Published in: Journal of Data Mining & Digital Humanities, HistoInformatics, Issue 6159, 2020, ISSN 2416-5999
Publisher: EPIsciences
DOI: 10.5281/zenodo.4447025

A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming

Author(s): Linhares Pontes, Elvys; Huet, Stéphane; Torres Moreno, Juan Manuel; Gouveia da Silva, Thiago; Carneiro Linhares, Andréa
Published in: Computación y Sistemas, Issue 24 (2), 2020, ISSN 2007-9737
Publisher: IPN
DOI: 10.13053/cys-24-2-3335

Integrated interdisciplinary workflows for research on historical newspapers: Perspectives from humanities scholars, computer scientists, and librarians

Author(s): Sarah Oberbichler; Emanuela Boros; Antoine Doucet; Jani Marjanen; Eva Pfanzelter; Juha Rautiainen; Hannu Toivonen; Mikko Tolonen
Published in: Journal of the Association for Information Science and Technology, Issue 73 (2), 2022, Page(s) 225–239, ISSN 2330-1643
Publisher: John Wiley and Sons Ltd
DOI: 10.1002/asi.24565

In Depth Analysis of the Impact of OCR Errors on Named Entity Recognition and Linking

Author(s): Ahmed Hamdi, Elvys Linhares Pontes, Nicolas Sidère, Mickaël Coustaty, Antoine Doucet
Published in: Natural Language Engineering, 2022, Page(s) 1-24, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/s1351324922000110

Digital interfaces of historical newspapers: opportunities, restrictions and recommendations

Author(s): Eva Pfanzelter; Sarah Oberbichler; Jani Marjanen; Pierre-Carl Langlais; Stefan Hechl
Published in: Journal of Data Mining and Digital Humanities, Volume on HistoInformatics, Issue 6121, 2021, ISSN 2416-5999
Publisher: EPIsciences
DOI: 10.5281/zenodo.4446818

Als eine andere Epidemie die Welt in Atem hielt: Die Spanische Grippe 1918/19 in der österreichischen Presse

Author(s): Sarah Oberbichler, Stefan Hechl, Eva Pfanzelter
Published in: Tiroler Chronist - Fachblatt von und für Chronisten in Nord-, Süd- und Osttirol, Issue 154, 2020, Page(s) 15-22, ISSN 1990-9799
Publisher: Tiroler Bildungsforum

A data-driven approach to studying changing vocabularies in historical newspaper collections

Author(s): Hengchen, Simon; Ros, Ruben; Marjanen, Jani; Tolonen, Mikko
Published in: Digital Scholarship in the Humanities, Issue 36, 2021, Page(s) 109–126, ISSN 2055-7671
Publisher: Oxford University Press
DOI: 10.5281/zenodo.5783070

Survey of Post-OCR Processing Approaches

Author(s): Thi Tuyet Hai Nguyen; Adam Jatowt; Mickaël Coustaty; Antoine Doucet
Published in: ACM Computing Surveys, Issue 54(6), 2022, Page(s) 1–37, ISSN 0360-0300
Publisher: Association for Computing Machinery, Inc.
DOI: 10.1145/3453476

A National Public Sphere? Analyzing the Language, Location, and Form of Newspapers in Finland, 1771–1917

Author(s): Jani Marjanen; Ville Vaara; Antti Kanner; Hege Roivainen; Eetu Mäkelä; Leo Lahti; Mikko Tolonen
Published in: Journal of European Periodical Studies, Issue 4 (1), 2019, Page(s) 55–78, ISSN 2506-6587
Publisher: ESPRit (European Society for Periodical Research)
DOI: 10.21825/jeps.v4i1.10483

MELHISSA: a multilingual entity linking architecture for historical press articles

Author(s): Elvys Linhares Pontes; Luis Adrián Cabrera-Diego; Jose G. Moreno; Emanuela Boros; Ahmed Hamdi; Antoine Doucet; Nicolas Sidere; Mickaël Coustaty
Published in: International Journal on Digital Libraries, 2021, ISSN 1432-5012
Publisher: Springer Verlag
DOI: 10.1007/s00799-021-00319-6

Topic-specific corpus building: A step towards a representative newspaper corpus on the topic of return migration using text mining methods

Author(s): Sarah Oberbichler, Eva Pfanzelter
Published in: Journal of Digital History, 2021
Publisher: De Gruyter

Tracing Discourses in Digital Newspaper Collections: A Contribution to Digital Hermeneutics while Investigating 'Return Migration' in Historical Press Coverage

Author(s): Sarah Oberbichler, Eva Pfanzelter
Published in: Digitised Newspapers – A New Eldorado for Historians?, 2022, ISBN 9783110729214
Publisher: De Gruyter Oldenbourg

Crossing or Intersecting the Emperor’s Desk with digitized Newspaper Data: Entity-source-networks in the late Habsburg Empire

Author(s): Martin Gasteiner, Andreas Enderlin
Published in: Digitised Newspapers – A New Eldorado for Historians?, 2022, ISBN 9783110729214
Publisher: De Gruyter Oldenbourg

ICPR 2020 Competition on Text Block Segmentation on a NewsEye Dataset

Author(s): Johannes Michael; Max Weidemann; Bastian Laasch; Roger Labahn
Published in: Proceedings of ICPR International Workshops and Challenges (2020), Issue 12668, 2021, Page(s) 405–418
Publisher: Springer
DOI: 10.1007/978-3-030-68793-9_30

International: From Legal to Civic Discourse and Beyond in the Nineteenth Century

Author(s): Jani Marjanen, Ruben Ros
Published in: Nationalism and Internationalism Intertwined - A European History of Concepts Beyond the Nation State, 2022, Page(s) 60-85, ISBN 978-1-80073-314-5
Publisher: Berghahn

Adaptive Edit-Distance and Regression Approach for Post-OCR Text Correction

Author(s): Thi-Tuyet-Hai Nguyen, Mickael Coustaty, Antoine Doucet, Adam Jatowt, Nhu-Van Nguyen
Published in: Maturity and Innovation in Digital Libraries - 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings, Issue 11279, 2018, Page(s) 278-289, ISBN 978-3-030-04256-1
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-04257-8_29

Evaluating the Impact of OCR Errors on Topic Modeling

Author(s): Stephen Mutuvi, Antoine Doucet, Moses Odeo, Adam Jatowt
Published in: Maturity and Innovation in Digital Libraries - 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings, Issue 11279, 2018, Page(s) 3-14, ISBN 978-3-030-04256-1
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-04257-8_1

National Sentiment: Nation Building and Emotional Language in Nineteenth-Century Finland

Author(s): Jani Marjanen
Published in: Lived Nation as the History of Experiences and Emotions in Finland, 1800-2000, 2021, Page(s) 61–83, ISBN 978-3-030-69881-2
Publisher: Springer
DOI: 10.1007/978-3-030-69882-9_3
