CORDIS - EU research results

NewsEye: A Digital Investigator for Historical Newspapers

CORDIS provides links to publicly available publications and deliverables of projects carried out under the HORIZON framework programmes.

Links to deliverables and publications related to individual FP7 projects, as well as links to certain specific result categories such as datasets and software, are dynamically retrieved from OpenAIRE.

Deliverables

Automatic Text Recognition (final)

Reports on software tools and modules, including documentation, for Automatic Text Recognition. Technical reports on further development and innovative adaptation of algorithms and methods for Automatic Text Recognition.

Dissemination, communication and exploitation of results (e) (final)

The PEDR will be delivered at M3, and the project will follow through by maintaining a rolling plan of activities to disseminate and exploit project results, including reports or publications for each event on a particular topic. This deliverable includes rapid dissemination channels in the form of blog posts, tweets and other online media, as well as more traditional dissemination outputs (conference papers, scholarly articles). At M12, M24 and M36 we will provide yearly reports on the execution of the PEDR as well as on all dissemination and communication events organized during the project. Main dissemination and communication events are planned at M3, M14, M24, M25, M26 and M30, but will be reported on yearly together with smaller-scale events. This deliverable, under the lead of WP7 by BNF after M36, will provide details on the dissemination, communication and exploitation of results during the project extension.

Layout Analysis (final)

Reports on software tools and modules, including documentation, for Layout Analysis. Technical reports on further development and innovative adaptation of algorithms and methods for Layout Analysis.

Usability/Fit for research purpose test of tools and user interfaces (c) (final)

The deliverables will report on testing the methods, tools and interfaces to the core. They are the result of collaboration on the mockups and prototypes and of workshop/hackathon participation with the computer science groups and the libraries, as indicated in Task T7.4, providing extensive feedback on tools and methods. UIBKICH will supervise the production of reports in preparation for, and as a follow-up to, the tools, prototypes, beta versions and publishable tools, along the timeline of WP7. The final version is due at M34, with a possible update at M45.

Contextualized Case Studies for academic use (d) (final)

The deliverables will report on the four digital humanities case studies prepared by using already existing methods and tools as well as the ones to be developed in this project, showing progress and improvement of search and research outcomes. UIBKICH will be responsible for the case studies on migration, UHDH for the case study on nationalisms and revolutions, UNIVIE for the case study on media and journalism, and UPVM for the case study on gender. The members of the DH group will furthermore compare and contrast the results of the case studies in order to show how newspapers work both as a space for change and as a space for stability, while addressing the relationship between press, politics and society in different regions and languages across Europe, thus showing the transformation of our societies. The deliverables will (a) include thorough literature and background research for each of the case studies, (b) work with the semantically enriched text and apply the developed dynamic text analysis features in different languages in order to improve the quality of the case studies, and (c) show how the developed tools contribute to change and continuity discussions for European societies. Draft reports will be delivered at M6 and complete reports at M12, while final reports, to be submitted for publication in renowned humanities and digital humanities journals, will be completed at M24 and M36.

Personal Research Assistant: Explainer (b) (final)

This deliverable describes the Explainer component. The first version (M24) will be able to produce initial descriptions of strategies, goals and decisions of the Investigator, while the second version (M36) describes the final version. The final version is due at M36, with a possible update at M45.

Article separation (c) (final)

Reports on software tools and modules, including documentation, for Article Separation. Technical reports on further development and innovative adaptation of algorithms and methods for Article Separation; journal and research paper submissions on new, preferably machine-learning-based (neural), algorithms and technologies for Article Separation, along with the inherently used Layout Analysis, Text Line Detection and Automatic Text Recognition. The final version is due at M36, with a possible update at M45.

Event detection (final)

Report on the level of completion of the event detection tool. The first version, at M24, presents the state of the art in event detection, relying on the detection of events based solely on document content and using string-based multilingual approaches based on rhetoric and specificities of the news genre, as previously developed at ULR. The second version, at M36, will integrate contrastive knowledge from other documents. The final version is due at M36, with a possible update at M45.

Personal Research Assistant: Reporter (c) (final)

This deliverable describes the Reporter component and how it is used. The first version (M12) will be capable of some simple natural language generation, using relatively rigid document structures and mechanisms for talking about the results of tools produced in WP3 and WP4 during year one. The second version (M24) will have more elaborate document structuring and will be able to report more flexibly on a wider range of analysis results. The second version will also have a first version of summarization of textual contents. The third version of the deliverable (M36) will describe the final version with full functionality. The final version is due at M36, with a possible update at M45.

Use of project results for the general public (b) (final)

The deliverables will report on the texts, podcasts and social media activities by the digital humanities group. UNIVIE will supervise the podcast production, UPVM the linking with Wikipedia, and UHDH the social media activities.

NewsEye Demonstrator (c) (final)

Reports and software on the development of the NewsEye Demonstrator, a web-based user interface for tools developed in WP3 and WP4 and for the Personal Research Assistant (WP5). Tools for the user interface of WP3 will be provided at M12, while the complete minimum viable product (MVP) will be delivered at M24 and the final version at M36. The final version is due at M36, with a possible update at M45.

Sustainability plan (c) (final)

The project will conceptualize a sustainability strategy for long-term access to the tools and data generated by the project, to be planned in full detail at M26, implemented from M36 and fully implemented at M45.

Stance detection (final)

Reports on the level of completion of the software tool for stance detection (M12). The first version, at M12, will rely on standards of the state of the art, and the second version, at M24, contains our principal research contribution, robust to noise and language independent.

Showcase case studies for the user interface (b) (final)

The deliverables will consist of texts, videos, statistics, search paths, how-tos, etc. on the user interface and on the project homepage. All partners of the digital humanities group will contribute to the deliverable.

Personal Research Assistant: Investigator (c) (final)

The deliverable describes the Investigator tool. In the first iteration (M12), the Investigator will be capable of planning, forming and running some queries using analysis tools developed in parallel in WP3 and WP4, and of interacting with the user in simple ways to continue the investigation. In the second iteration (M24), the Investigator will also be able to create strategies for investigation, to analyze the results obtained and to adjust its strategy accordingly. The third iteration (M36) describes the final version with full functionality. The final version is due at M36, with a possible update at M45.

Advanced tool to query the enriched data sets (final)

Report on the software to query the data sets (M6). The first version is delivered early, at M6, to allow querying the data set as soon as possible, without the semantic enrichment produced in other deliverables of WP3. The second version, at M12, reports on the software to analyze the data and the enriched data sets; it allows querying both the data set and the enriched data set, including the semantic text enrichment to be produced in the rest of WP3 (D3.1-D3.3).

Data models (d) (final)

Regular reports providing a detailed description of the data models, formats and specifications used in the project, including publicly available example data.

Data collection and preservation (d) (final)

Report and data collection

Comparative analysis of data between contexts (b) (final)

Reports on the developed methods and tools for dynamic comparative analysis of data between given contexts. The first version, at M24, describes the methods to extract sets of characteristics to describe similarities or contrasts between document groups, and the second version, at M36, describes the final methods to extract contrasting characteristics from groups of documents, integrated with work on intelligible descriptions. The final version is due at M36, with a possible update at M45.

Educational material for teachers, pupils and lay historians (b) (final)

The deliverables consist of prototypes of the educational material in M24 and the online published material in M36. While all partners of the digital humanities group will contribute to the production of the material, UHDH will supervise the production of material for teachers, UPVM for pupils and students, and UIBKICH for lay historians, in different languages. A report on educational material prototypes will be delivered at M24; the final report will be delivered at M36.

Analysis of data in a given context (c) (final)

Reports on the level of completion of the software tool for dynamic analysis of data in a given context. The first version, at M12, will be tools for building multilingual topic models, topic hierarchies and dynamic topic models, and for using them to analyze articles in the initial dataset. The second version, at M24, contains document analysis methods for article similarity and link discovery to suggest related articles, combining multilingual, hierarchical and dynamic topic models. The third version, at M36, contains document analysis methods refined on the basis of feedback from their use in the Personal Research Assistant and evaluation of their integration with intelligible descriptions. The final version is due at M36, with a possible update at M45.

NE recognition and linking (final)

Reports on the level of completion of the software tool to recognize and link named entities (NEs). The first version, at M12, will rely on standards of the state of the art, and the second version, at M24, contains our principal research contribution, robust to noise and language independent.

Intelligible representation of statistical analysis (b) (final)

Reports on the methods and tools for outputting human-intelligible representations based on the outputs from statistical models developed in T4.1 and T4.2. The first version, at M24, describes the methods that provide intelligible names and descriptions of topics and extracted characteristics for use in the Personal Research Assistant, and the second version, at M36, describes the final methods to provide intelligible descriptions, refined after integration in the Personal Research Assistant. The final version is due at M36, with a possible update at M45.

Project website (to be continuously updated)

The project will maintain a website that will act as a portal for the communications activities. In M1 a web page will be published to advertise and announce the project. By M8 the full website structure will be in place, integrating social media (such as Twitter) channels. The website will be maintained throughout the duration of the project and content will be contributed by all project partners.

Data management plan

The NewsEye project will contribute to the open research data pilot. According to the guidelines for Research Data Management of Horizon 2020 (http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf), a Data Management Plan will be written during the first six months, explaining what data will be generated, collected, shared and curated during the project as well as after the project’s end. It will consider the different kinds of research outcomes (WP6) and data (WP2-5) resulting from the project. One important goal of NewsEye is to make its data findable, accessible, interoperable and reusable (FAIR).

Publications

Exploring Entities in Event Detection as Question Answering

Authors: Boros, Emanuela; Moreno, Jose G.; Doucet, Antoine
Published in: Proceedings of the 44th European Conference on Information Retrieval (ECIR), 2022
Publisher: Springer
DOI: 10.5281/zenodo.5779941

L3i at SemEval-2022 Task 11: Straightforward Additional Context for Multilingual Named Entity Recognition

Authors: Emanuela Boros, Carlos-Emiliano Gonzalez-Gallardo, Jose G. Moreno, Antoine Doucet
Published in: International Workshop on Semantic Evaluation (SemEval), Issue Task 11, 2022
Publisher: ACL
DOI: 10.5281/zenodo.6369947

A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers

Authors: Ahmed Hamdi; Elvys Linhares Pontes; Emanuela Boros; Thi Tuyet Hai Nguyen; Günter Hackl; Jose G. Moreno; Antoine Doucet
Published in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, Page(s) 2328–2334
Publisher: ACM
DOI: 10.1145/3404835.3463255

Assessing and Minimizing the Impact of OCR Quality on Named Entity Recognition

Authors: Ahmed Hamdi; Axel Jean-Caurant; Nicolas Sidere; Mickaël Coustaty; Antoine Doucet
Published in: Proceedings of the 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020, Issue 12246, 2020, Page(s) 87–101
Publisher: Springer
DOI: 10.1007/978-3-030-54956-5_7

Alleviating Digitization Errors in Named Entity Recognition for Historical Documents

Authors: Emanuela Boros; Ahmed Hamdi; Elvys Linhares Pontes; Luis Adrián Cabrera-Diego; Jose G. Moreno; Nicolas Sidere; Antoine Doucet
Published in: Proceedings of the 24th Conference on Computational Natural Language Learning (CoNLL), 2020, Page(s) 431–441
Publisher: ACL
DOI: 10.18653/v1/2020.conll-1.35

Exploring Entities in Event Detection as Question Answering

Authors: Boros, Emanuela; Moreno, Jose G.; Doucet, Antoine
Published in: European Conference on Information Retrieval (ECIR 2022), 2022, Page(s) 65-79, ISBN 978-3-030-99735-9
Publisher: Springer
DOI: 10.1007/978-3-030-99736-6_5

Grammatical Profiling for Semantic Change Detection

Authors: Giulianelli, Mario; Kutuzov, Andrey; Pivovarova, Lidia
Published in: Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL 2021), 2021
Publisher: ACL
DOI: 10.18653/v1/2021.conll-1.33

Multilingual Epidemic Event Extraction

Authors: Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Published in: Proceedings of the 23rd International Conference on Asian Digital Libraries (ICADL), Issue 13133, 2021, Page(s) 139–156
Publisher: Springer
DOI: 10.5281/zenodo.5779966

Transformer-based Methods for Recognizing Ultra Fine-grained Entities (RUFES)

Authors: Boros, Emanuela; Doucet, Antoine
Published in: Thirteenth Text Analysis Conference (TAC 2020), 2021
Publisher: NIST
DOI: 10.5281/zenodo.4555778

Information Extraction from Invoices

Authors: Ahmed Hamdi; Elodie Carel; Aurelie Joseph; Mickael Coustaty; Antoine Doucet
Published in: International Conference on Document Analysis and Recognition ICDAR 2021, Issue 12822, 2021, Page(s) 699–714
Publisher: Springer
DOI: 10.1007/978-3-030-86331-9_45

Event Detection with Entity Markers

Authors: Emanuela Boros; Jose G. Moreno; Antoine Doucet
Published in: Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), Issue 12657, 2021, Page(s) 233–240
Publisher: Springer
DOI: 10.1007/978-3-030-72240-1_20

An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish

Authors: Quan Duong; Mika K Hämäläinen; Simon Hengchen
Published in: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 2020, Page(s) 240–248
Publisher: ACL
DOI: 10.5281/zenodo.4242890

Dataset for Temporal Analysis of English-French Cognates

Authors: Frossard, Esteban; Coustaty, Mickael; Doucet, Antoine; Jatowt, Adam; Hengchen, Simon
Published in: Proceedings of the 12th Language Resources and Evaluation Conference, 2020, Page(s) 855–859
Publisher: European Language Resources Association
DOI: 10.5281/zenodo.3693650

NewsEye: A digital investigator for historical newspapers

Authors: Doucet, Antoine; Gasteiner, Martin; Granroth-Wilding, Mark; Kaiser, Max; Kaukonen, Minna; Labahn, Roger; Moreux, Jean-Philippe; Muehlberger, Guenter; Pfanzelter, Eva; Therenty, Marie-Eve; Toivonen, Hannu; Tolonen, Mikko
Published in: 15th Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2020, 2020
Publisher: ADHO
DOI: 10.5281/zenodo.3895269

Robust Named Entity Recognition and Linking on Historical Multilingual Documents

Authors: Emanuela Boros; Elvys Linhares Pontes; Luis Adrián Cabrera-Diego; Ahmed Hamdi; José Moreno; Nicolas Sidère; Antoine Doucet
Published in: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Issue 2696, 2020, Page(s) 1-17
Publisher: CEUR
DOI: 10.5281/zenodo.4068074

Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems

Authors: Cabrera-Diego, Luis Adrián; Moreno, Jose G.; Doucet, Antoine
Published in: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing (BSNLP at ACL), 2021, Page(s) 98–104
Publisher: ACL
DOI: 10.5281/zenodo.4730477

SpaceWars: A Web Interface for Exploring the Spatio-temporal Dimensions of WWI Newspaper Reporting

Authors: Gutehrlé, Nicolas; Harlamov, Oleg; Karimi, Farimah; Wei, Haoyu; Jean-Caurant, Axel; Pivovarova, Lidia
Published in: Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021), 2021
Publisher: CEUR
DOI: 10.5281/zenodo.5566463

Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections

Authors: Zosa, Elaine; Hengchen, Simon; Marjanen, Jani; Pivovarova, Lidia; Tolonen, Mikko
Published in: Digital Humanities in the Nordic countries (DHN 2020), 2020
Publisher: Institute of Literature, Folklore and Art
DOI: 10.5281/zenodo.3631613

Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques

Authors: Boros, Emanuela; Hamdi, Ahmed; Linhares Pontes, Elvys; Cabrera-Diego, Luis Adrián; Moreno, José G.; Sidere, Nicolas; Doucet, Antoine
Published in: Conférence en Recherche d’Informations et Applications - CORIA 2021, French Information Retrieval Conference, 2021
Publisher: ARIA
DOI: 10.24348/coria.2021.mini_24

Neural Machine Translation with BERT for Post-OCR Error Detection and Correction

Authors: Thi Tuyet Hai Nguyen; Adam Jatowt; Nhu-Van Nguyen; Mickael Coustaty; Antoine Doucet
Published in: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2020, Page(s) 333–336
Publisher: ACM
DOI: 10.1145/3383583.3398605

Post-OCR Error Detection by Generating Plausible Candidates

Authors: Thi-Tuyet-Hai Nguyen, Adam Jatowt, Mickael Coustaty, Nhu-Van Nguyen, Antoine Doucet
Published in: 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, Page(s) 876-881, ISBN 978-1-7281-3014-9
Publisher: IEEE
DOI: 10.1109/ICDAR.2019.00145

Elastic Embedded Background Linking for News Articles with Keywords, Entities and Events.

Authors: Luis Adrián Cabrera-Diego, Emanuela Boros, Antoine Doucet
Published in: Text REtrieval Conference (TREC) 2021, Issue News Track, 2022
Publisher: NIST
DOI: 10.5281/zenodo.6334523

Opening Digitized Newspapers for Different User Groups - Successes and Challenges

Authors: Juha Rautiainen
Published in: IFLA World Library and Information Congress 2019, 2019
Publisher: IFLA
DOI: 10.5281/zenodo.3403158

A Baseline Document Planning Method for Automated Journalism

Authors: Leo Leppänen; Hannu Toivonen
Published in: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 2021, Page(s) 101–111
Publisher: ACL
DOI: 10.5281/zenodo.4694492

Personal Research Assistant for Online Exploration of Historical News

Authors: Lidia Pivovarova; Axel Jean-Caurant; Jari Avikainen; Khalid Alnajjar; Mark Granroth-Wilding; Leo Leppänen; Elaine Zosa; Hannu Toivonen
Published in: Proceedings of the 42nd European Conference on IR Research, Issue 12036, 2020, Page(s) 481–485, ISBN 9783030454418
Publisher: Springer
DOI: 10.1007/978-3-030-45442-5_62

Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages

Authors: Piskorski, Jakub; Babych, Bogdan; Kancheva, Zara; Kanishcheva, Olga; Lebedeva, Maria; Marcinczuk, Michał; Nakov, Preslav; Osenova, Petya; Pivovarova, Lidia; Pollak, Senja; Přibáň, Pavel; Radev, Ivaylo; Robnik-Šikonja, Marko; Starko, Vasyl; Steinberger, Josef; Yangarber, Roman
Published in: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, 2021, Page(s) 122–133
Publisher: ACL
DOI: 10.5281/zenodo.4635585

When to Use OCR Post-correction for Named Entity Recognition?

Authors: Vinh-Nam Huynh; Ahmed Hamdi; Antoine Doucet
Published in: Proceedings of the 14th International Conference on Data Analytics in Logistics (ICDAL 2020), Issue 12504, 2020, Page(s) 33–42, ISBN 9783030644512
Publisher: Springer
DOI: 10.1007/978-3-030-64452-9_3

A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Authors: Elaine Zosa; Mark Granroth-Wilding; Lidia Pivovarova
Published in: Proceedings of the Workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020), 2020, Page(s) 32-37
Publisher: ACL
DOI: 10.5281/zenodo.3751036

"Transformer-based Methods with #Entities for Detecting Emergency Events on Social Media"

Authors: Emanuela Boros, Nhu Khoa Nguyen, Gaël Lejeune, Mickaël Coustaty, Antoine Doucet
Published in: Text REtrieval Conference (TREC) 2021, Issue Incident Streams Track, 2022
Publisher: NIST
DOI: 10.5281/zenodo.6334513

Simple ways to improve NER in every language using markup

Authors: Luis Adrián Cabrera-Diego; Moreno, J. G.; Doucet, A.
Published in: Proceedings of the 2nd International Workshop on Cross-Lingual Event-Centric Open Analytics Co-Located with the 30th The Web Conference (WWW 2021), 2021, ISSN 1613-0073
Publisher: CEUR-WS
DOI: 10.5281/zenodo.4680998

Digging Deeper into the Finnish Parliamentary Protocols – Using a Lexical Semantic Tagger for Studying Meaning Change of Everyman's Rights (allemansrätten)

Authors: Kettunen, Kimmo; La Mela, Matti
Published in: Proceedings of the Digital Humanities in the Nordic Countries (5th Conference), 2020, Page(s) 63–80
Publisher: Institute of Literature, Folklore and Art
DOI: 10.5281/zenodo.3676371

Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents

Authors: Ehrmann, Maud; Romanello, Matteo; Doucet, Antoine; Clematide, Simon
Published in: European Conference on Information Retrieval (ECIR 2022), 2022, Page(s) 347–354, ISBN 978-3-030-99739-7
Publisher: Springer
DOI: 10.1007/978-3-030-99739-7_44

Event Related Document Retrieval with Multilingual Real World Event Representation

Authors: Guillaume Bernard, Cyrille Suire, Cyril Faucher, Antoine Doucet
Published in: Proceedings of the 20th International Semantic Web Conference (ISWC), 2021
Publisher: CEUR-WS
DOI: 10.5281/zenodo.5900742

Three-part diachronic semantic change dataset for Russian

Authors: Andrey Kutuzov; Lidia Pivovarova
Published in: Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021, 2021, Page(s) 7-13
Publisher: ACL
DOI: 10.18653/v1/2021.lchange-1.2

ICDAR 2019 Competition on Post-OCR Text Correction

Authors: Christophe Rigaud; Antoine Doucet; Mickaël Coustaty; Jean-Philippe Moreux
Published in: 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, ISBN 978-1-7281-3015-6
Publisher: IEEE
DOI: 10.1109/icdar.2019.00255

Multilingual Dynamic Topic Model

Authors: Zosa, Elaine; Granroth-Wilding, Mark (Department of Computer Science, University of Helsinki, Finland)
Published in: Proceedings - Natural Language Processing in a Deep Learning World (RANLP), 2019, Page(s) 1388–1396
Publisher: RANLP
DOI: 10.26615/978-954-452-056-4_159

Visual Topic Modelling for NewsImage Task at MediaEval 2021

Authors: Lidia Pivovarova, Elaine Zosa
Published in: Working Notes Proceedings of the MediaEval 2021 Workshop, 2021
Publisher: CEUR-WS
DOI: 10.5281/zenodo.5900719

Linking Named Entities across Languages using Multilingual Word Embeddings

Authors: Elvys Linhares Pontes; Jose G. Moreno; Antoine Doucet
Published in: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020, Page(s) 329–332
Publisher: ACM
DOI: 10.1145/3383583.3398597

Can Umlauts Ruin Your Research in Digitized Newspaper Collections? A NewsEye Case Study on 'The Dark Sides of War' (1914–1918)

Authors: Klaus, Barbara
Published in: Proceedings of the Digital Humanities in the Nordic Countries (5th Conference), Issue 2612, 2020, Page(s) 267–274
Publisher: Institute of Literature, Folklore and Art
DOI: 10.5281/zenodo.4686731

Large Scale Analysis of Semantic and Temporal Aspects in Cultural Heritage Collection's Search

Authors: Sumikawa, Yasunobu; Jatowt, Adam; Doucet, Antoine; Moreux, Jean-Phillippe
Published in: 2019 Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019, Issue yearly, 2019, Page(s) 77-86, ISBN 978-1-7281-1547-4
Publisher: IEEE Computer Society
DOI: 10.1109/jcdl.2019.00021

Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing

Authors: Nguyen, Thi-Tuyet-Hai; Jatowt, Adam; Coustaty, Mickael; Nguyen, Nhu-Van; Doucet, Antoine
Published in: 2019 Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019, Issue yearly, 2019, Page(s) 29-38, ISBN 978-1-7281-1547-4
Publisher: IEEE Computer Society
DOI: 10.1109/jcdl.2019.00015

Towards Data-Driven Generation of Visualizations for Automatically Generated News Articles

Authors: Rola Alhalaseh, Myriam Munezero, Miika Leinonen, Leo Leppänen, Jari Avikainen, Hannu Toivonen
Published in: Proceedings of the 22nd International Academic Mindtrek Conference on - Mindtrek '18, Issue yearly, 2018, Page(s) 100-109, ISBN 9781-450365895
Publisher: ACM Press
DOI: 10.1145/3275116.3275131

An Analysis of the Performance of Named Entity Recognition over OCRed Documents

Authors: Hamdi, Ahmed; Jean-Caurant, Axel; Sidere, Nicolas; Coustaty, Mickael; Doucet, Antoine
Published in: 2019 Joint Conference on Digital Libraries (JCDL), Urbana-Champaign, Illinois, June 2-6, 2019, Issue yearly, 2019, Page(s) 333-334, ISBN 978-1-7281-1547-4
Publisher: IEEE Computer Society
DOI: 10.1109/jcdl.2019.00057

Impact Analysis of Document Digitization on Event Extraction

Authors: Nhu Khoa Nguyen; Emanuela Boroş; Gaël Lejeune; Antoine Doucet
Published in: Proceedings of the 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020), Issue 2735, 2020, Page(s) 17–28
Publisher: CEUR-WS
DOI: 10.5281/zenodo.4734267

Scalable and Interpretable Semantic Change Detection

Authors: Syrielle Montariol; Matej Martinc; Lidia Pivovarova
Published in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2021, Page(s) 4642–4652
Publisher: ACL
DOI: 10.18653/v1/2021.naacl-main.369

Word Clustering for Historical Newspapers Analysis

Authors: Lidia Pivovarova; Jani Marjanen; Elaine Zosa
Published in: Proceedings of the Workshop on Language Technology for Digital Historical Archives, 2019, Page(s) 3-10
Publisher: ACL Bulgaria
DOI: 10.26615/978-954-452-059-5_002

Multilingual Epidemiological Text Classification: A Comparative Study

Authors: Stephen Mutuvi; Emanuela Boros; Antoine Doucet; Adam Jatowt; Gaël Lejeune; Moses Odeo
Published in: Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020, Page(s) 6172–6183
Publisher: ACL
DOI: 10.18653/v1/2020.coling-main.543

Impact of OCR Quality on Named Entity Linking

Authors: Elvys Linhares Pontes; Ahmed Hamdi; Nicolas Sidere; Antoine Doucet
Published in: International Conference on Asia-Pacific Digital Libraries 2019, 2019, Page(s) 102–115, ISBN 978-3-030-34058-2
Publisher: Springer
DOI: 10.1007/978-3-030-34058-2_11

Entity Linking for Historical Documents: Challenges and Solutions

Authors: Pontes, Elvys Linhares; Cabrera-Diego, Luis Adrián; Moreno, José G.; Boros, Emanuela; Hamdi, Ahmed; Sidère, Nicolas; Coustaty, Mickaël; Doucet, Antoine
Published in: Proceedings of the 22nd International Conference on Asia-Pacific Digital Libraries (ICADL 2020), Issue 12504, 2020, Page(s) 215–231, ISBN 9783030644512
Publisher: Springer
DOI: 10.1007/978-3-030-64452-9_19

Clustering Ideological Terms in Historical Newspaper Data with Diachronic Word Embeddings

Authors: Jani Pekka Marjanen; Lidia Pivovarova; Elaine Zosa; Jussi Kurunmäki
Published in: HistoInformatics 2019: International Workshop on Computational History 2019, part of TPDL 2019, 2019
Publisher: Springer
DOI: 10.5281/zenodo.3689466

Evaluating the Robustness of Embedding-Based Topic Models to OCR Noise

Authors: Elaine Zosa, Stephen Mutuvi, Mark Granroth-Wilding, Antoine Doucet
Published in: International Conference on Asian Digital Libraries (ICADL), 2021, ISBN 978-3-030-91668-8
Publisher: Springer
DOI: 10.1007/978-3-030-91669-5_30

Topic Modelling Discourse Dynamics in Historical Newspapers

Authors: Marjanen, Jani; Zosa, Elaine; Hengchen, Simon; Pivovarova, Lidia; Tolonen, Mikko
Published in: Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020), 2020, Page(s) 63-77
Publisher: CEUR-WS
DOI: 10.5281/zenodo.5648114

Benchmarks for Unsupervised Discourse Change Detection

Authors: Duong, Quan; Pivovarova, Lidia; Zosa, Elaine
Published in: Proceedings of the 6th International Workshop on Computational History (HistoInformatics 2021), Issue 2981, 2021
Publisher: Springer
DOI: 10.5281/zenodo.5780033

Capturing Evolution in Word Usage: Just Add More Clusters?

Authors: Matej Martinc; Syrielle Montariol; Elaine Zosa; Lidia Pivovarova
Published in: WWW '20: Companion Proceedings of the Web Conference 2020, 2020, Page(s) 343-349
Publisher: ACM
DOI: 10.1145/3366424.3382186

A Dataset for Multi-lingual Epidemiological Event Extraction

Authors: Mutuvi, Stephen; Doucet, Antoine; Lejeune, Gael; Odeo, Moses
Published in: Proceedings of the 12th Language Resources and Evaluation Conference, 2020, Page(s) 4139–4144
Publisher: European Language Resources Association
DOI: 10.5281/zenodo.3709626

Not All Comments are Equal: Insights into Comment Moderation from a Topic-Aware Model

Authors: Elaine Zosa; Ravi Shekhar; Mladen Karan; Matthew Purver
Published in: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021, Page(s) 1652–1662
Publisher: RANLP
DOI: 10.5281/zenodo.5648098

EMBEDDIA at SemEval-2022 Task 8: Investigating Sentence, Image, and Knowledge Graph Representations for Multilingual News Article Similarity

Authors: Elaine Zosa, Emanuela Boros, Boshko Koloski, Lidia Pivovarova
Published in: Proceedings of SemEval-2022 Workshop Task 8, 2022
Publisher: ACL
DOI: 10.5281/zenodo.6369944

Token-Level Multilingual Epidemic Dataset for Event Extraction

Authors: Stephen Mutuvi; Emanuela Boros; Antoine Doucet; Gaël Lejeune; Adam Jatowt; Moses Odeo
Published in: Proceedings of the 25th International Conference on Theory and Practice of Digital Libraries (TPDL), Issue 12866, 2021, Page(s) 55–59
Publisher: Springer
DOI: 10.5281/zenodo.5780019

Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition

Authors: Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner
Published in: 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, Page(s) 1286-1293, ISBN 978-1-7281-3014-9
Publisher: IEEE
DOI: 10.1109/icdar.2019.00208

L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers

Authors: Nhu Khoa Nguyen; Emanuela Boros; Gaël Lejeune; Antoine Doucet; Thierry Delahaut
Published in: Companion Proceedings of the Web Conference, 2020, Page(s) 302–306
Publisher: ACM
DOI: 10.5281/zenodo.4734321

The Helsinki Digital Humanities Hackathon: Two Perspectives on Multidisciplinary Historical Newspapers Research in a Hackathon Context

Authors: Ros, Ruben; Oberbichler, Sarah
Published in: Proceedings of the Twin Talks 2 and 3 Workshops at DHN 2020 and DH 2020, 2020, Page(s) 66–74
Publisher: Institute of Literature, Folklore and Art
DOI: 10.5281/zenodo.3689228

Multilingual Topic Labelling of News Topics using Ontological Mapping

Authors: Elaine Zosa, Lidia Pivovarova, Michele Boggia, Sardana Ivanova
Published in: European Conference on Information Retrieval (ECIR), 2022
Publisher: Springer
DOI: 10.5281/zenodo.6334491

Étude comparative de méthodes de classification multilingue appliquées à l'épidémiologie

Authors: Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine; Lejeune, Gaël; Jatowt, Adam; Odeo, Moses
Published in: COnférence en Recherche d'Informations et Applications - CORIA 2021, French Information Retrieval Conference, 2021
Publisher: ARIA
DOI: 10.5281/zenodo.4734471

A Comprehensive Extraction of Relevant Real-World-Event Qualifiers for Semantic Search Engines

Authors: Guillaume Bernard, Cyrille Suire, Cyril Faucher, Antoine Doucet
Published in: International Conference on Theory and Practice of Digital Libraries (TPDL), 2021, Page(s) 153-164, ISBN 978-3-030-86323-4
Publisher: Springer
DOI: 10.1007/978-3-030-86324-1_19

A Method for Wavelet-Based Time Series Analysis of Historical Newspapers

Authors: Avikainen, Jari
Published in: 2019
Publisher: University of Helsinki
DOI: 10.5281/zenodo.3628262

"Wir dürfen wieder Österreicher sein!" Die Rolle der Tagespresse in österreichischen Nation-Building-Prozessen 1945–1948 – eine quantitative Analyse ausgewählter digitaler Zeitungskorpora samt Vorschlägen zur didaktischen Umsetzung

Authors: Stefan Patrick Hechl
Published in: 2021
Publisher: Universität Innsbruck
DOI: 10.5281/zenodo.4468295

Wortvektoren

Authors: Laasch, Bastian Marc
Published in: 2018
Publisher: University of Rostock
DOI: 10.18453/rosdok_id00002309

Embeddings built on 19th century newspapers from Finland

Authors: Lidia Pivovarova, Elaine Zosa, Jani Marjanen
Published in: 2019
Publisher: Zenodo
DOI: 10.5281/zenodo.3557480

Doing historical research with digital newspapers – perspectives of DH scholars

Authors: Sarah Oberbichler, Eva Pfanzelter, Stefan Hechl, Jani Marjanen
Published in: Europeana Tech, Issue 16: Newspapers, 2021
Publisher: Europeana

Using LDA and Jensen-Shannon Distance (JSD) to group similar newspaper articles

Authors: Sarah Oberbichler
Published in: 2020
Publisher: Zenodo
DOI: 10.5281/zenodo.3887193

The Book of Abstracts for What’s Past is Prologue: The NewsEye International Conference.

Authors: Antti Kanner, Eetu Mäkelä, Jani Marjanen, Mikko Tolonen, Sarah Oberbichler, Quan Duong, Lidia Pivovarova, Dilawar Ali, Steven Verstockt, Étienne Ollion, Rubing Shen, Matthias Arnold, David Brown, Raven Adam, Saranya Balasubramanian, Vera Maria Charvat, Manfred Füllsack, Jörn Kleinert, Hanna Misera, Nenad Pantelic, Jakob Sonnberger, Georg Vogeler, Alessandra De Mulder, Heikki K
Published in: 2021
Publisher: Zenodo
DOI: 10.5281/zenodo.5167375

Covid-19 et grippe espagnole: Quand la presse du XXe siècle rappelle celle de 2020

Authors: Nejma Omari, Antoine Doucet
Published in: 2020
Publisher: The Conversation

Annotation Guidelines for Named Entity Recognition, Entity Linking and Stance Detection (v3.1)

Authors: Ahmed Hamdi, Elvys Linhares Pontes, Antoine Doucet
Published in: 2021
Publisher: Zenodo
DOI: 10.5281/zenodo.4574199

NewsEye Policy Brief

Authors: NewsEye consortium
Published in: 2020
Publisher: Zenodo
DOI: 10.5281/zenodo.4291895

Assessing the Impact of OCR Noise on Multilingual Event Detection over Digitised Documents

Authors: Emanuela Boros, Nhu Khoa Nguyen, Gaël Lejeune, Antoine Doucet
Published in: International Journal on Digital Libraries, Issue 14325012, 2022, ISSN 1432-5012
Publisher: Springer Verlag
DOI: 10.1007/s00799-022-00325-2

The expansion of isms, 1820-1917: Data-driven analysis of political language in digitized newspaper collections

Authors: Jani Marjanen; Jussi Antero Kurunmäki; Lidia Pivovarova; Elaine Zosa
Published in: Journal of Data Mining & Digital Humanities, HistoInformatics, Issue 6159, 2020, ISSN 2416-5999
Publisher: EPIsciences
DOI: 10.5281/zenodo.4447025

A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming

Authors: Linhares Pontes, Elvys; Huet, Stéphane; Torres Moreno, Juan Manuel; Gouveia da Silva, Thiago; Carneiro Linhares, Andréa
Published in: Computación y Sistemas, Issue 24 (2), 2020, ISSN 2007-9737
Publisher: IPN
DOI: 10.13053/cys-24-2-3335

Integrated interdisciplinary workflows for research on historical newspapers: Perspectives from humanities scholars, computer scientists, and librarians

Authors: Sarah Oberbichler; Emanuela Boros; Antoine Doucet; Jani Marjanen; Eva Pfanzelter; Juha Rautiainen; Hannu Toivonen; Mikko Tolonen
Published in: Journal of the Association for Information Science and Technology, Issue 73 (2), 2022, Page(s) 225–239, ISSN 2330-1643
Publisher: John Wiley and Sons Ltd
DOI: 10.1002/asi.24565

In Depth Analysis of the Impact of OCR Errors on Named Entity Recognition and Linking

Authors: Ahmed Hamdi, Elvys Linhares Pontes, Nicolas Sidère, Mickaël Coustaty, Antoine Doucet
Published in: Natural Language Engineering, 2022, Page(s) 1-24, ISSN 1351-3249
Publisher: Cambridge University Press
DOI: 10.1017/s1351324922000110

Digital interfaces of historical newspapers: opportunities, restrictions and recommendations

Authors: Eva Pfanzelter; Sarah Oberbichler; Jani Marjanen; Pierre-Carl Langlais; Stefan Hechl
Published in: Journal of Data Mining and Digital Humanities, Volume on HistoInformatics, Issue 6121, 2021, ISSN 2416-5999
Publisher: EPIsciences
DOI: 10.5281/zenodo.4446818

Als eine andere Epidemie die Welt in Atem hielt: Die Spanische Grippe 1918/19 in der österreichischen Presse

Authors: Sarah Oberbichler, Stefan Hechl, Eva Pfanzelter
Published in: Tiroler Chronist - Fachblatt von und für Chronisten in Nord-, Süd- und Osttirol, Issue 154, 2020, Page(s) 15-22, ISSN 1990-9799
Publisher: Tiroler Bildungsforum

A data-driven approach to studying changing vocabularies in historical newspaper collections

Authors: Hengchen, Simon; Ros, Ruben; Marjanen, Jani; Tolonen, Mikko
Published in: Digital Scholarship in the Humanities, Issue 36, 2021, Page(s) 109–126, ISSN 2055-7671
Publisher: Oxford University Press
DOI: 10.5281/zenodo.5783070

Survey of Post-OCR Processing Approaches

Authors: Thi Tuyet Hai Nguyen; Adam Jatowt; Mickaël Coustaty; Antoine Doucet
Published in: ACM Computing Surveys, Issue 54(6), 2022, Page(s) 1–37, ISSN 0360-0300
Publisher: Association for Computing Machinery, Inc.
DOI: 10.1145/3453476

A National Public Sphere? Analyzing the Language, Location, and Form of Newspapers in Finland, 1771–1917

Authors: Jani Marjanen; Ville Vaara; Antti Kanner; Hege Roivainen; Eetu Mäkelä; Leo Lahti; Mikko Tolonen
Published in: Journal of European Periodical Studies, Issue 4 (1), 2019, Page(s) 55–78, ISSN 2506-6587
Publisher: ESPRit (European Society for Periodical Research)
DOI: 10.21825/jeps.v4i1.10483

MELHISSA: a multilingual entity linking architecture for historical press articles

Authors: Elvys Linhares Pontes; Luis Adrián Cabrera-Diego; José G. Moreno; Emanuela Boros; Ahmed Hamdi; Antoine Doucet; Nicolas Sidère; Mickaël Coustaty
Published in: International Journal on Digital Libraries, 2021, ISSN 1432-5012
Publisher: Springer Verlag
DOI: 10.1007/s00799-021-00319-6

Topic-specific corpus building: A step towards a representative newspaper corpus on the topic of return migration using text mining methods

Authors: Sarah Oberbichler, Eva Pfanzelter
Published in: Journal of Digital History, 2021
Publisher: De Gruyter

Tracing Discourses in Digital Newspaper Collections: A Contribution to Digital Hermeneutics while Investigating 'Return Migration' in Historical Press Coverage

Authors: Sarah Oberbichler, Eva Pfanzelter
Published in: Digitised Newspapers – A New Eldorado for Historians?, 2022, ISBN 9783110729214
Publisher: De Gruyter Oldenbourg

Crossing or Intersecting the Emperor’s Desk with digitized Newspaper Data: Entity-source-networks in the late Habsburg Empire

Authors: Martin Gasteiner, Andreas Enderlin
Published in: Digitised Newspapers – A New Eldorado for Historians?, 2022, ISBN 9783110729214
Publisher: De Gruyter Oldenbourg

ICPR 2020 Competition on Text Block Segmentation on a NewsEye Dataset

Authors: Johannes Michael; Max Weidemann; Bastian Laasch; Roger Labahn
Published in: Proceedings of ICPR International Workshops and Challenges (2020), Issue 12668, 2021, Page(s) 405–418
Publisher: Springer
DOI: 10.1007/978-3-030-68793-9_30

International: From Legal to Civic Discourse and Beyond in the Nineteenth Century

Authors: Jani Marjanen, Ruben Ros
Published in: Nationalism and Internationalism Intertwined - A European History of Concepts Beyond the Nation State, 2022, Page(s) 60-85, ISBN 978-1-80073-314-5
Publisher: Berghahn

Adaptive Edit-Distance and Regression Approach for Post-OCR Text Correction

Authors: Thi-Tuyet-Hai Nguyen, Mickaël Coustaty, Antoine Doucet, Adam Jatowt, Nhu-Van Nguyen
Published in: Maturity and Innovation in Digital Libraries - 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings, Issue 11279, 2018, Page(s) 278-289, ISBN 978-3-030-04256-1
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-04257-8_29

Evaluating the Impact of OCR Errors on Topic Modeling

Authors: Stephen Mutuvi, Antoine Doucet, Moses Odeo, Adam Jatowt
Published in: Maturity and Innovation in Digital Libraries - 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings, Issue 11279, 2018, Page(s) 3-14, ISBN 978-3-030-04256-1
Publisher: Springer International Publishing
DOI: 10.1007/978-3-030-04257-8_1

National Sentiment: Nation Building and Emotional Language in Nineteenth-Century Finland

Authors: Jani Marjanen
Published in: Lived Nation as the History of Experiences and Emotions in Finland, 1800-2000, 2021, Page(s) 61–83, ISBN 978-3-030-69881-2
Publisher: Springer
DOI: 10.1007/978-3-030-69882-9_3
