| Time: | January 22, 2025, 2:00 p.m. – 3:00 p.m. |
|---|---|
| Event language: | English |
| Meeting mode: | In person, with a WebEx link to attend the talk virtually |
| Venue: | U32EGO.131, Universitätsstr. 32, Campus Vaihingen |
Offensive speech is highly prevalent on online platforms. Because they are trained on online data, Large Language Models (LLMs) display undesirable behaviors, such as generating harmful text or failing to recognize it. Despite these potential harms, whether LLMs can reliably identify offensive speech, and how they behave when they fail, remain open questions. In this work, we probed sixteen widely used LLMs and showed that most fail to identify (non-)offensive online language. Our experiments reveal undesirable behavior patterns in the context of offensive speech detection, such as erroneous response generation, over-reliance on profanity, and failure to recognize stereotypes.
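The abstract does not spell out the probing setup; the sketch below shows one plausible zero-shot formulation of such a probe. The prompt wording, the label set, and the `query_llm` placeholder are all assumptions for illustration, not the speakers' actual method.

```python
# Minimal sketch of a zero-shot offensive-speech probe (assumed setup,
# not the talk's actual prompts or models).
PROMPT_TEMPLATE = (
    "Classify the following text as 'offensive' or 'not offensive'. "
    "Answer with exactly one of these two labels.\n\n"
    "Text: {text}\nLabel:"
)

VALID_LABELS = {"offensive", "not offensive"}


def query_llm(prompt: str) -> str:
    """Hypothetical placeholder: plug in any chat/completion API or local model."""
    raise NotImplementedError("connect a model client here")


def classify(text: str) -> str:
    """Send one probe and check whether the reply stays inside the label set.

    Replies outside the set correspond to the "erroneous response
    generation" failure mode the abstract mentions.
    """
    raw = query_llm(PROMPT_TEMPLATE.format(text=text)).strip().lower()
    return raw if raw in VALID_LABELS else "erroneous response"
```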