What Can Go Wrong in Authorship Profiling: Cross-Domain Analysis of Gender and Age Prediction

June 1, 2024 / IRIS3D

Hongyu Chen, Dr. Michael Roth, Dr. Agnieszka Faleńska

Authorship Profiling (AP) aims to predict demographic attributes of authors, such as gender and age, from their writing style. As models improve, the task is attracting growing interest and finding more applications. Wider use, however, also brings the risk that authors are misclassified more frequently, and it remains unclear to what extent better models capture bias and who is affected by their mistakes. In this paper, we investigate three established AP datasets together with classical and neural classifiers for the task. Our analyses show that it is often possible to predict authors' demographic information from textual features. However, some of the features the models learn are dataset-specific. Moreover, the models are prone to errors based on stereotypes associated with topical bias.

