| Time: | May 28, 2025, 2:00 p.m. – 3:00 p.m. |
|---|---|
| Venue: | Room 101 (UN 32.101), ground floor, Universitätsstr. 32 (entrance via Universitätsstr. 34), Campus Vaihingen |
Large language models (LLMs) can often be trusted to produce honest, harmless responses—yet they are not foolproof. We demonstrate a “deception attack” that fine-tunes LLMs to mislead users on chosen topics while remaining accurate elsewhere. Not only do these deceptive models undermine user trust, but they also exhibit toxic behaviors, including hate speech and harmful stereotypes. Our findings underscore the urgent need for stronger safeguards as LLMs become increasingly integrated into everyday applications.