The Impact of Data Persistence Bias on Social Media Studies

Elmas, Tugrulcan; ACM

doi:10.1145/3578503.3583630

2023

Abstract

Social media studies often collect data retrospectively to analyze public opinion. Social media data may decay over time and such decay may prevent the collection of the complete dataset. As a result, the collected dataset may differ from the complete dataset and the study may suffer from data persistence bias. Past research suggests that datasets collected retrospectively are largely representative of the original dataset in terms of textual content. However, no study has analyzed the impact of data persistence bias on social media studies such as those focusing on controversial topics. In this study, we analyze data persistence and the bias it introduces in datasets of three types: controversial topics, trending topics, and framing of issues. We report which topics are more likely to suffer from data persistence among these datasets. We quantify the data persistence bias using the change in political orientation, the presence of potentially harmful content, and topics as measures. We found that controversial datasets are more likely to suffer from data persistence and that they lean towards the political left upon recollection. The turnout of the data that contain potentially harmful content is significantly lower on non-controversial datasets. Overall, we found that the topics promoted by right-aligned users are more likely to suffer from data persistence. Account suspensions are the primary factor contributing to data removals, if not the only one. Our results emphasize the importance of accounting for the data persistence bias by collecting the data in real time when the dataset employed is vulnerable to data persistence bias.

Details

Title The Impact of Data Persistence Bias on Social Media Studies

Author(s) Elmas, Tugrulcan ; ACM

Published in Proceedings of the 15th ACM Web Science Conference, WebSci 2023

Pages 196-207

Conference 15th ACM Web Science Conference (WebSci), April 30 to May 1, 2023, Austin, TX

Date 2023-01-01

Publisher Association for Computing Machinery, New York

ISBN 979-8-4007-0089-7

Keywords

Data Persistence; Bias; Reproducibility; Social Media; Twitter; Deletions; Datasets; Political Orientation; Sampling

DOI https://doi.org/10.1145/3578503.3583630

Laboratories LSIR

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > LSIR - Distributed Information Systems Laboratory
Peer-reviewed publications
Conference Papers
Work produced at EPFL
Published

Record creation date 2024-02-20