Diversity-Aware NLP Intelligent Systems

Independent Research Group

Reflecting Intelligent Systems for Diversity, Demography, and Democracy (IRIS3D)

Diversity-Aware NLP Intelligent Systems (DANIS)

Project focus: Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to understand, interpret, and generate human language. An NLP Intelligent System is a tool, such as a search engine, voice assistant, or chatbot, designed to assist people by processing and responding to their requests in a natural and seemingly intelligent way.

The Diversity-Aware NLP Intelligent Systems group (DANIS) is focused on making these tools inclusive and fair. Our main motivation is recognizing that people communicate in diverse ways based on their life experiences, communication contexts, or personal preferences. For instance, individuals may express their gender identity through language, construct arguments in ways that resonate with them, or speak in specific dialects. Since standard NLP systems are often not equipped to recognize or accommodate these linguistic variations, they can discriminate against people from underrepresented groups or those who use non-standard linguistic expressions. DANIS explores how to model such linguistic phenomena computationally and design NLP intelligent systems that treat all users equally, regardless of how they communicate.
Duration: January 2023 - December 2026
Cooperation: SRF IRIS
Funding: The project is funded by the Ministry of Science, Research and the Arts of the State of Baden-Württemberg.

Publications

2024
1. Knupleš, Urban, Agnieszka Falenska and Filip Miletić. 2024. Gender Identity in Pretrained Language Models: An Inclusive Approach to Data Creation and Probing. In: Findings of the Association for Computational Linguistics: EMNLP 2024, ed. by Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, 11612–11631. Miami, Florida, USA: Association for Computational Linguistics, November. https://aclanthology.org/2024.findings-emnlp.680.
  Abstract
  Pretrained language models (PLMs) have been shown to encode binary gender information of text authors, raising the risk of skewed representations and downstream harms. This effect is yet to be examined for transgender and non-binary identities, whose frequent marginalization may exacerbate harmful system behaviors. Addressing this gap, we first create TRANsCRIPT, a corpus of YouTube transcripts from transgender, cisgender, and non-binary speakers. Using this dataset, we probe various PLMs to assess if they encode the gender identity information, examining both frozen and fine-tuned representations as well as representations for inputs with author-specific words removed. Our findings reveal that PLM representations encode information for all gender identities but to different extents. The divergence is most pronounced for cis women and non-binary individuals, underscoring the critical need for gender-inclusive approaches to NLP systems.
  BibTeX
  @inproceedings{knuples-etal-2024-gender, abstract = {Pretrained language models (PLMs) have been shown to encode binary gender information of text authors, raising the risk of skewed representations and downstream harms. This effect is yet to be examined for transgender and non-binary identities, whose frequent marginalization may exacerbate harmful system behaviors. Addressing this gap, we first create TRANsCRIPT, a corpus of YouTube transcripts from transgender, cisgender, and non-binary speakers. Using this dataset, we probe various PLMs to assess if they encode the gender identity information, examining both frozen and fine-tuned representations as well as representations for inputs with author-specific words removed. Our findings reveal that PLM representations encode information for all gender identities but to different extents. The divergence is most pronounced for cis women and non-binary individuals, underscoring the critical need for gender-inclusive approaches to NLP systems.}, address = {Miami, Florida, USA}, author = {Knuple{\v{s}}, Urban and Falenska, Agnieszka and Mileti{\'c}, Filip}, booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024}, editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung}, month = {11}, pages = {11612--11631}, publisher = {Association for Computational Linguistics}, title = {Gender Identity in Pretrained Language Models: An Inclusive Approach to Data Creation and Probing}, url = {https://aclanthology.org/2024.findings-emnlp.680}, year = 2024 }
  Link
  https://aclanthology.org/2024.findings-emnlp.680
2. Kaiser, Jens and Agnieszka Falenska. 2024. How to Translate SQuAD to German? A Comparative Study of Answer Span Retrieval Methods for Question Answering Dataset Creation. In: Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024), ed. by Pedro Henrique Luz de Araujo, Andreas Baumann, Dagmar Gromann, Brigitte Krenn, Benjamin Roth, and Michael Wiegand, 134–140. Vienna, Austria: Association for Computational Linguistics, September. https://aclanthology.org/2024.konvens-main.15.
  BibTeX
  @inproceedings{kaiser-falenska-2024-translate, address = {Vienna, Austria}, author = {Kaiser, Jens and Falenska, Agnieszka}, booktitle = {Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)}, editor = {Luz de Araujo, Pedro Henrique and Baumann, Andreas and Gromann, Dagmar and Krenn, Brigitte and Roth, Benjamin and Wiegand, Michael}, month = {09}, pages = {134--140}, publisher = {Association for Computational Linguistics}, title = {How to Translate {SQ}u{AD} to {G}erman? A Comparative Study of Answer Span Retrieval Methods for Question Answering Dataset Creation}, url = {https://aclanthology.org/2024.konvens-main.15}, year = 2024 }
  Link
  https://aclanthology.org/2024.konvens-main.15
3. Go, Paul and Agnieszka Falenska. 2024. Is there Gender Bias in Dependency Parsing? Revisiting “Women’s Syntactic Resilience”. In: Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), ed. by Agnieszka Faleńska, Christine Basta, Marta Costa jussà, Seraphina Goldfarb-Tarrant, and Debora Nozza, 269–279. Bangkok, Thailand: Association for Computational Linguistics, August. https://aclanthology.org/2024.gebnlp-1.17.
  Abstract
  In this paper, we revisit the seminal work of Garimella et al. 2019, who reported that dependency parsers learn demographically-related signals from their training data and perform differently on sentences authored by people of different genders. We re-run all the parsing experiments from Garimella et al. 2019 and find that their results are not reproducible. Additionally, the original patterns suggesting the presence of gender biases fail to generalize to other treebank and parsing architecture. Instead, our data analysis uncovers methodological shortcomings in the initial study that artificially introduced differences into female and male datasets during preprocessing. These disparities potentially compromised the validity of the original conclusions.
  BibTeX
  @inproceedings{go-falenska-2024-gender, abstract = {In this paper, we revisit the seminal work of Garimella et al. 2019, who reported that dependency parsers learn demographically-related signals from their training data and perform differently on sentences authored by people of different genders. We re-run all the parsing experiments from Garimella et al. 2019 and find that their results are not reproducible. Additionally, the original patterns suggesting the presence of gender biases fail to generalize to other treebank and parsing architecture. Instead, our data analysis uncovers methodological shortcomings in the initial study that artificially introduced differences into female and male datasets during preprocessing. These disparities potentially compromised the validity of the original conclusions.}, address = {Bangkok, Thailand}, author = {Go, Paul and Falenska, Agnieszka}, booktitle = {Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)}, editor = {Fale{\'n}ska, Agnieszka and Basta, Christine and Costa juss{\`a}, Marta and Goldfarb-Tarrant, Seraphina and Nozza, Debora}, month = {08}, pages = {269--279}, publisher = {Association for Computational Linguistics}, title = {Is there Gender Bias in Dependency Parsing? Revisiting {``}Women{'}s Syntactic Resilience{''}}, url = {https://aclanthology.org/2024.gebnlp-1.17}, year = 2024 }
  Link
  https://aclanthology.org/2024.gebnlp-1.17
4. Costa jussà, Marta, Pierre Andrews, Christine Basta, Juan Ciro, Agnieszka Falenska, Seraphina Goldfarb-Tarrant, Rafael Mosquera, Debora Nozza and Eduardo Sánchez. 2024. Overview of the Shared Task on Machine Translation Gender Bias Evaluation with Multilingual Holistic Bias. In: Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), ed. by Agnieszka Faleńska, Christine Basta, Marta Costa jussà, Seraphina Goldfarb-Tarrant, and Debora Nozza, 399–404. Bangkok, Thailand: Association for Computational Linguistics, August. https://aclanthology.org/2024.gebnlp-1.26.
  Abstract
  We describe the details of the Shared Task of the 5th ACL Workshop on Gender Bias in Natural Language Processing (GeBNLP 2024). The task uses dataset to investigate the quality of Machine Translation systems on a particular case of gender robustness. We report baseline results as well as the results of the first participants. The shared task will be permanently available in the Dynabench platform.
  BibTeX
  @inproceedings{costa-jussa-etal-2024-overview, abstract = {We describe the details of the Shared Task of the 5th ACL Workshop on Gender Bias in Natural Language Processing (GeBNLP 2024). The task uses dataset to investigate the quality of Machine Translation systems on a particular case of gender robustness. We report baseline results as well as the results of the first participants. The shared task will be permanently available in the Dynabench platform.}, address = {Bangkok, Thailand}, author = {Costa juss{\`a}, Marta and Andrews, Pierre and Basta, Christine and Ciro, Juan and Falenska, Agnieszka and Goldfarb-Tarrant, Seraphina and Mosquera, Rafael and Nozza, Debora and S{\'a}nchez, Eduardo}, booktitle = {Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)}, editor = {Fale{\'n}ska, Agnieszka and Basta, Christine and Costa juss{\`a}, Marta and Goldfarb-Tarrant, Seraphina and Nozza, Debora}, month = {08}, pages = {399--404}, publisher = {Association for Computational Linguistics}, title = {Overview of the Shared Task on Machine Translation Gender Bias Evaluation with Multilingual Holistic Bias}, url = {https://aclanthology.org/2024.gebnlp-1.26}, year = 2024 }
  Link
  https://aclanthology.org/2024.gebnlp-1.26
5. Dönmez, Esra, Thang Vu and Agnieszka Falenska. 2024. Please note that I’m just an AI: Analysis of Behavior Patterns of LLMs in (Non-)offensive Speech Identification. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, ed. by Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, 18340–18357. Miami, Florida, USA: Association for Computational Linguistics, November. https://aclanthology.org/2024.emnlp-main.1019.
  Abstract
  Offensive speech is highly prevalent on online platforms. Being trained on online data, Large Language Models (LLMs) display undesirable behaviors, such as generating harmful text or failing to recognize it. Despite these shortcomings, the models are becoming a part of our everyday lives by being used as tools for information search, content creation, writing assistance, and many more. Furthermore, the research explores using LLMs in applications with immense social risk, such as late-life companions and online content moderators. Despite the potential harms from LLMs in such applications, whether LLMs can reliably identify offensive speech and how they behave when they fail are open questions. This work addresses these questions by probing sixteen widely used LLMs and showing that most fail to identify (non-)offensive online language. Our experiments reveal undesirable behavior patterns in the context of offensive speech detection, such as erroneous response generation, over-reliance on profanity, and failure to recognize stereotypes. Our work highlights the need for extensive documentation of model reliability, particularly in terms of the ability to detect offensive language.
  BibTeX
  @inproceedings{donmez-etal-2024-please, abstract = {Offensive speech is highly prevalent on online platforms. Being trained on online data, Large Language Models (LLMs) display undesirable behaviors, such as generating harmful text or failing to recognize it. Despite these shortcomings, the models are becoming a part of our everyday lives by being used as tools for information search, content creation, writing assistance, and many more. Furthermore, the research explores using LLMs in applications with immense social risk, such as late-life companions and online content moderators. Despite the potential harms from LLMs in such applications, whether LLMs can reliably identify offensive speech and how they behave when they fail are open questions. This work addresses these questions by probing sixteen widely used LLMs and showing that most fail to identify (non-)offensive online language. Our experiments reveal undesirable behavior patterns in the context of offensive speech detection, such as erroneous response generation, over-reliance on profanity, and failure to recognize stereotypes. Our work highlights the need for extensive documentation of model reliability, particularly in terms of the ability to detect offensive language.}, address = {Miami, Florida, USA}, author = {D{\"o}nmez, Esra and Vu, Thang and Falenska, Agnieszka}, booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing}, editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung}, month = {11}, pages = {18340--18357}, publisher = {Association for Computational Linguistics}, title = {Please note that {I}{'}m just an {AI}: Analysis of Behavior Patterns of {LLM}s in (Non-)offensive Speech Identification}, url = {https://aclanthology.org/2024.emnlp-main.1019}, year = 2024 }
  Link
  https://aclanthology.org/2024.emnlp-main.1019
6. Erhard, Lukas, Sara Hanke, Uwe Remer, Agnieszka Falenska and Raphael Heiko Heiberger. 2024. PopBERT. Detecting Populism and Its Host Ideologies in the German Bundestag. Political Analysis: 1–17. doi:10.1017/pan.2024.12, https://www.cambridge.org/core/article/popbert-detecting-populism-and-its-host-ideologies-in-the-german-bundestag/06C14C50B50D5A7AB45C4A7C8A5AD945.
  Abstract
  The rise of populism concerns many political scientists and practitioners, yet the detection of its underlying language remains fragmentary. This paper aims to provide a reliable, valid, and scalable approach to measure populist rhetoric. For that purpose, we created an annotated dataset based on parliamentary speeches of the German Bundestag (2013–2021). Following the ideational definition of populism, we label moralizing references to “the virtuous people” or “the corrupt elite” as core dimensions of populist language. To identify, in addition, how the thin ideology of populism is “thickened,” we annotate how populist statements are attached to left-wing or right-wing host ideologies. We then train a transformer-based model (PopBERT) as a multilabel classifier to detect and quantify each dimension. A battery of validation checks reveals that the model has a strong predictive accuracy, provides high qualitative face validity, matches party rankings of expert surveys, and detects out-of-sample text snippets correctly. PopBERT enables dynamic analyses of how German-speaking politicians and parties use populist language as a strategic device. Furthermore, the annotator-level data may also be applied in cross-domain applications or to develop related classifiers.
  BibTeX
  @article{erhard2024popbert, abstract = {The rise of populism concerns many political scientists and practitioners, yet the detection of its underlying language remains fragmentary. This paper aims to provide a reliable, valid, and scalable approach to measure populist rhetoric. For that purpose, we created an annotated dataset based on parliamentary speeches of the German Bundestag (2013–2021). Following the ideational definition of populism, we label moralizing references to “the virtuous people” or “the corrupt elite” as core dimensions of populist language. To identify, in addition, how the thin ideology of populism is “thickened,” we annotate how populist statements are attached to left-wing or right-wing host ideologies. We then train a transformer-based model (PopBERT) as a multilabel classifier to detect and quantify each dimension. A battery of validation checks reveals that the model has a strong predictive accuracy, provides high qualitative face validity, matches party rankings of expert surveys, and detects out-of-sample text snippets correctly. PopBERT enables dynamic analyses of how German-speaking politicians and parties use populist language as a strategic device. Furthermore, the annotator-level data may also be applied in cross-domain applications or to develop related classifiers.}, author = {Erhard, Lukas and Hanke, Sara and Remer, Uwe and Falenska, Agnieszka and Heiberger, Raphael Heiko}, journal = {Political Analysis}, doi = {10.1017/pan.2024.12}, issn = {10471987}, pages = {1--17}, publisher = {Cambridge University Press}, title = {PopBERT. Detecting Populism and Its Host Ideologies in the German Bundestag}, url = {https://www.cambridge.org/core/article/popbert-detecting-populism-and-its-host-ideologies-in-the-german-bundestag/06C14C50B50D5A7AB45C4A7C8A5AD945}, year = 2024 }
  Link
  https://www.cambridge.org/core/article/popbert-detecting-populism-and-its-host-ideologies-in-the-german-bundestag/06C14C50B50D5A7AB45C4A7C8A5AD945
7. Faleńska, Agnieszka, Christine Basta, Marta Costa jussà, Seraphina Goldfarb-Tarrant and Debora Nozza, eds. 2024. Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP). Bangkok, Thailand: Association for Computational Linguistics. https://aclanthology.org/2024.gebnlp-1.0.
  BibTeX
  @proceedings{gebnlp-2024-gender, address = {Bangkok, Thailand}, editor = {Fale{\'n}ska, Agnieszka and Basta, Christine and Costa juss{\`a}, Marta and Goldfarb-Tarrant, Seraphina and Nozza, Debora}, month = {08}, publisher = {Association for Computational Linguistics}, title = {Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)}, url = {https://aclanthology.org/2024.gebnlp-1.0}, year = 2024 }
  Link
  https://aclanthology.org/2024.gebnlp-1.0
8. Falenska, Agnieszka, Eva Maria Vecchi and Gabriella Lapesa. 2024. Self-reported Demographics and Discourse Dynamics in a Persuasive Online Forum. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ed. by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 14606–14621. Torino, Italia: ELRA and ICCL, May. https://aclanthology.org/2024.lrec-main.1272.
  Abstract
  Research on language as interactive discourse underscores the deliberate use of demographic parameters such as gender, ethnicity, and class to shape social identities. For example, by explicitly disclosing one's information and enforcing one's social identity to an online community, the reception by and interaction with the said community is impacted, e.g., strengthening one's opinions by depicting the speaker as credible through their experience in the subject. Here, we present a first thorough study of the role and effects of self-disclosures on online discourse dynamics, focusing on a pervasive type of self-disclosure: author gender. Concretely, we investigate the contexts and properties of gender self-disclosures and their impact on interaction dynamics in an online persuasive forum, ChangeMyView. Our contribution is twofold. At the level of the target phenomenon, we fill a research gap in the understanding of the impact of these self-disclosures on the discourse by bringing together features related to forum activity (votes, number of comments), linguistic/stylistic features from the literature, and discourse topics. At the level of the contributed resource, we enrich and release a comprehensive dataset that will provide a further impulse for research on the interplay between gender disclosures, community interaction, and persuasion in online discourse.
  BibTeX
  @inproceedings{falenska-etal-2024-self-reported, abstract = {Research on language as interactive discourse underscores the deliberate use of demographic parameters such as gender, ethnicity, and class to shape social identities. For example, by explicitly disclosing one{'}s information and enforcing one{'}s social identity to an online community, the reception by and interaction with the said community is impacted, e.g., strengthening one{'}s opinions by depicting the speaker as credible through their experience in the subject. Here, we present a first thorough study of the role and effects of self-disclosures on online discourse dynamics, focusing on a pervasive type of self-disclosure: author gender. Concretely, we investigate the contexts and properties of gender self-disclosures and their impact on interaction dynamics in an online persuasive forum, ChangeMyView. Our contribution is twofold. At the level of the target phenomenon, we fill a research gap in the understanding of the impact of these self-disclosures on the discourse by bringing together features related to forum activity (votes, number of comments), linguistic/stylistic features from the literature, and discourse topics. At the level of the contributed resource, we enrich and release a comprehensive dataset that will provide a further impulse for research on the interplay between gender disclosures, community interaction, and persuasion in online discourse.}, address = {Torino, Italia}, author = {Falenska, Agnieszka and Vecchi, Eva Maria and Lapesa, Gabriella}, booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)}, editor = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen}, month = {05}, pages = {14606--14621}, publisher = {ELRA and ICCL}, title = {Self-reported Demographics and Discourse Dynamics in a Persuasive Online Forum}, url = {https://aclanthology.org/2024.lrec-main.1272}, year = 2024 }
  Link
  https://aclanthology.org/2024.lrec-main.1272
9. Chen, Hongyu, Michael Roth and Agnieszka Falenska. 2024. What Can Go Wrong in Authorship Profiling: Cross-Domain Analysis of Gender and Age Prediction. In: Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), ed. by Agnieszka Faleńska, Christine Basta, Marta Costa jussà, Seraphina Goldfarb-Tarrant, and Debora Nozza, 150–166. Bangkok, Thailand: Association for Computational Linguistics, August. https://aclanthology.org/2024.gebnlp-1.9.
  Abstract
  Authorship Profiling (AP) aims to predict the demographic attributes (such as gender and age) of authors based on their writing styles. Ever-improving models mean that this task is gaining interest and application possibilities. However, with greater use also comes the risk that authors are misclassified more frequently, and it remains unclear to what extent the better models can capture the bias and who is affected by the models' mistakes. In this paper, we investigate three established datasets for AP as well as classical and neural classifiers for this task. Our analyses show that it is often possible to predict the demographic information of the authors based on textual features. However, some features learned by the models are specific to datasets. Moreover, models are prone to errors based on stereotypes associated with topical bias.
  BibTeX
  @inproceedings{chen-etal-2024-go, abstract = {Authorship Profiling (AP) aims to predict the demographic attributes (such as gender and age) of authors based on their writing styles. Ever-improving models mean that this task is gaining interest and application possibilities. However, with greater use also comes the risk that authors are misclassified more frequently, and it remains unclear to what extent the better models can capture the bias and who is affected by the models{'} mistakes. In this paper, we investigate three established datasets for AP as well as classical and neural classifiers for this task. Our analyses show that it is often possible to predict the demographic information of the authors based on textual features. However, some features learned by the models are specific to datasets. Moreover, models are prone to errors based on stereotypes associated with topical bias.}, address = {Bangkok, Thailand}, author = {Chen, Hongyu and Roth, Michael and Falenska, Agnieszka}, booktitle = {Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)}, editor = {Fale{\'n}ska, Agnieszka and Basta, Christine and Costa juss{\`a}, Marta and Goldfarb-Tarrant, Seraphina and Nozza, Debora}, month = {08}, pages = {150--166}, publisher = {Association for Computational Linguistics}, title = {What Can Go Wrong in Authorship Profiling: Cross-Domain Analysis of Gender and Age Prediction}, url = {https://aclanthology.org/2024.gebnlp-1.9}, year = 2024 }
  Link
  https://aclanthology.org/2024.gebnlp-1.9
2023
1. Fanton, Nicola, Agnieszka Falenska and Michael Roth. 2023. How-to Guides for Specific Audiences: A Corpus and Initial Findings. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 321–333. Toronto, Canada: Association for Computational Linguistics, July. doi:10.18653/v1/2023.acl-srw.46, https://aclanthology.org/2023.acl-srw.46.
  Abstract
  Instructional texts for specific target groups should ideally take into account the prior knowledge and needs of the readers in order to guide them efficiently to their desired goals. However, targeting specific groups also carries the risk of reflecting disparate social norms and subtle stereotypes. In this paper, we investigate the extent to which how-to guides from one particular platform, wikiHow, differ in practice depending on the intended audience. We conduct two case studies in which we examine qualitative features of texts written for specific audiences. In a generalization study, we investigate which differences can also be systematically demonstrated using computational methods. The results of our studies show that guides from wikiHow, like other text genres, are subject to subtle biases. We aim to raise awareness of these inequalities as a first step to addressing them in future work.
  BibTeX
  @inproceedings{fanton-etal-2023-guides, abstract = {Instructional texts for specific target groups should ideally take into account the prior knowledge and needs of the readers in order to guide them efficiently to their desired goals. However, targeting specific groups also carries the risk of reflecting disparate social norms and subtle stereotypes. In this paper, we investigate the extent to which how-to guides from one particular platform, wikiHow, differ in practice depending on the intended audience. We conduct two case studies in which we examine qualitative features of texts written for specific audiences. In a generalization study, we investigate which differences can also be systematically demonstrated using computational methods. The results of our studies show that guides from wikiHow, like other text genres, are subject to subtle biases. We aim to raise awareness of these inequalities as a first step to addressing them in future work.}, address = {Toronto, Canada}, author = {Fanton, Nicola and Falenska, Agnieszka and Roth, Michael}, booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)}, doi = {10.18653/v1/2023.acl-srw.46}, month = {07}, pages = {321--333}, publisher = {Association for Computational Linguistics}, title = {How-to Guides for Specific Audiences: A Corpus and Initial Findings}, url = {https://aclanthology.org/2023.acl-srw.46}, year = 2023 }
  Link
  https://aclanthology.org/2023.acl-srw.46