Yes but.. Can ChatGPT Identify Entities in Historical Documents?

Gonzalez-Gallardo, Carlos-Emiliano; Boros, Emanuela; Girdhar, Nancy; Hamdi, Ahmed; Moreno, Jose G.; Doucet, Antoine; ACM

doi:10.1109/JCDL57899.2023.00034

2023

Abstract

Large language models (LLMs) have been leveraged for several years now, obtaining state-of-the-art performance in recognizing entities from modern documents. For the last few months, the conversational agent ChatGPT has "prompted" a lot of interest in the scientific community and the public due to its capacity to generate plausible-sounding answers. In this paper, we explore this ability by probing it in the named entity recognition and classification (NERC) task in primary sources (e.g., historical newspapers and classical commentaries) in a zero-shot manner and by comparing it with state-of-the-art LM-based systems. Our findings indicate several shortcomings in identifying entities in historical text that range from the consistency of entity annotation guidelines, entity complexity, and code-switching, to the specificity of prompting. Moreover, as expected, the inaccessibility of historical archives to the public (and thus on the Internet) also impacts its performance.

Details

Title Yes but.. Can ChatGPT Identify Entities in Historical Documents?

Author(s) Gonzalez-Gallardo, Carlos-Emiliano ; Boros, Emanuela ; Girdhar, Nancy ; Hamdi, Ahmed ; Moreno, Jose G. ; Doucet, Antoine ; ACM

Published in 2023 ACM/IEEE Joint Conference on Digital Libraries, JCDL

Pages 184-189

Conference 23rd ACM/IEEE Joint Conference on Digital Libraries (JCDL), June 26-30, 2023, Santa Fe, NM

Date 2023-01-01

Publisher Association for Computing Machinery, New York

ISSN 2575-7865
2575-8152

ISBN 979-8-3503-9931-8

Keywords

Named Entity Recognition and Classification; Large Language Models; Generative Pretrained Transformer; Historical Documents

DOI https://doi.org/10.1109/JCDL57899.2023.00034

Laboratories DHLAB

Record Appears in Scientific production and competences > CDH - College of Humanities and social sciences > Digital Humanities Institute > DHLAB - Digital Humanities Laboratory
Peer-reviewed publications
Conference Papers
Work produced at EPFL
Published

Grant ANNA: 2019-1R40226
TER-MITRAD: AAPR2020-2019-8510010
Pypa: AAPR2021-2021-12263410
Nouvelle-Aquitaine Region, France: AAPR2022-2021-17014610

Record creation date 2024-02-20