IRIS Colloquium | Laurène Vaugrante: Compromising Honesty and Harmlessness in Language Models via Deception Attacks


May 28, 2025, 2:00 p.m. (CEST)

Our third colloquium this year will be given by Laurène Vaugrante. The event is open to students and other interested members of the university.

Time:	May 28, 2025, 2:00 p.m. – 3:00 p.m.
Venue:	Room 101 (UN 32.101), ground floor, Universitätsstr. 32 (entrance via Universitätsstr. 34), Campus Vaihingen

Large language models (LLMs) can often be trusted to produce honest, harmless responses—yet they are not foolproof. We demonstrate a “deception attack” that fine-tunes LLMs to mislead users on chosen topics while remaining accurate elsewhere. Not only do these deceptive models undermine user trust, but they also exhibit toxic behaviors, including hate speech and harmful stereotypes. Our findings underscore the urgent need for stronger safeguards as LLMs become increasingly integrated into everyday applications.

The lecture will be held in English.

Join us for cake after the colloquium.

[Picture: SRF IRIS / S. Brandes]

IRIS Newsletter Registration

We send out a newsletter at irregular intervals with information on IRIS events. To make sure you don't miss anything, simply enter your e-mail address. You will shortly receive a confirmation e-mail to make sure that you really are the person who wants to subscribe. After receiving your confirmation, you will be added to the mailing list. This is a hidden mailing list, which means that the subscriber list can only be viewed by the administrator.

Note: It is not possible to process your subscription to the newsletter without providing your e-mail address. The information you provide is voluntary and you can unsubscribe from the newsletter at any time.

Newsletter Subscription Page

Past Events

April 2025

IRIS Insights | Nico Formanek: Are hyperparameters vibes?

Event
4/24/25

Film Screening: After Yang & Creative Writing Workshop with Alexander Weinstein

Event
4/16/25

March 2025

IRIS Colloquium | Mara Seyfert: Uncertainty and robustness against persuasion in large language models

Event
3/26/25

February 2025

IRIS Insights | Prof. Zamira Daw: Human-AI Teaming in Cockpit Systems (HAITICS)

Event
2/27/25

January 2025

IRIS Colloquium | Analysis of behavior patterns of LLMs

Event
1/22/25

November 2024

IRIS Fall Symposium

Event
11/28/24

AI and Coffee?

Event
11/22/24

October 2024

IRIS and Friends: Technology and reflection - A Day of Discovery and Interaction

Event
10/17/24

Day for school classes - Science Festival

Event
10/16/24

IRIS Colloquium | Navigating Trust and Distrust in Literary AI Narratives

Event
10/9/24

July 2024

IRIS Colloquium | The Spectrum of Demographics in Natural Language Processing: Moving from Gender Categories to Gender Continuum through Style Variation

Event
7/31/24

Thilo Hagendorff of IRIS3D at Next Frontiers

Event
7/19/24

June 2024

IRIS Colloquium | Evaluating Behavior in Language Models: A Looming Replication Crisis?

Event
6/26/24

Science Festival 2024

Event
6/8/24

New Date! Form and Meaning of New Ways of Remembrance Culture

Event
6/5/24

May 2024

16th ACM Web Science Conference 2024

Event
5/22 – 5/24/24

Brave Conversations

Event
5/21/24

Right to the City 4.0

Event
5/16 – 5/17/24

March 2024

IRIS Colloquium | Engaging Student Diversity in Self-Adaptive Learning Management Systems through Intelligent Tutoring

Event
3/6/24

February 2024

Use of digital visualization tools in participatory processes in infrastructure planning

Event
2/29/24

The ‘Ordinary Magic’ of Resilience in Anglophone Literatures: Past, Present, Futures

Event
2/22 – 2/23/24

January 2024

IRIS Colloquium | Project Presentation by Solange Vega

Event
1/31/24

AI and cultural memory. Technology as a key to the past

Event
1/17/24

December 2023

IRIS Coffee Chat | Online information consumption and political opinion formation from a psychological perspective

Event
12/5/23

November 2023

Archival work in times of AI: technical complexity and democratic challenges

Event
11/29/23

Bots under control: who should regulate AI - and how?

Event
11/8/23

October 2023

IRIS Coffee Chat | Governing Platforms. Internet platforms and social order

Event
10/31/23

IRIS Colloquium | Computational Digital Psychology

Event
10/17/23

September 2023

Between responsible and responsive democratic governance: Imagining intelligent democratic futures in the 21st century

Event
9/12 – 9/13/23

July 2023

"Bias in generative AI" with Algoright e.V.

Event
7/18/23

The bots and the teachers - how is AI transforming schools?

Event
7/11/23

June 2023

Learning coach or ghost writer? Academic Work in the Age of AI Chatbots

Event
6/12/23

IRIS Symposium and Poster Session

Event
6/5/23

May 2023

IRIS at the Science Day 2023!

Event
5/13/23

Perspectives on a "key technology" from the history of technology: The German-German AI development between 1960 and 1990

Event
5/2/23

April 2023

Intelligent transformation of energy infrastructures

Event
4/27/23

March 2023

AI and a Future Community

Event
3/17/23

February 2023

Brave New Storyworlds: Literary AI Narratives in Contemporary English Literature

Event
2/20/23

January 2023

Digitisation and surveillance

Event
1/19/23

December 2022

IRIS Coffee Chat | AI for Architecture, Engineering and Construction

Event
12/20/22

The future of citizens' councils

Event
12/13/22

November 2022

IRIS Coffee Chat | Wild bees meet app: Fostering environmental awareness with user-centered technology design

Event
11/29/22

Chances and risks of the platform economy

Event
11/15/22

Data Paradoxes: The Politics of Intensified Data Sourcing in Contemporary Healthcare

Event
11/14/22

Cancelled: Reflecting Intelligent Transformations in Healthcare: What’s Critical?

Event
11/14/22

October 2022

Digital Workshop "Reflection on intelligent systems: towards a cross-disciplinary definition"

Event
10/20/22

Colleague AI makes music – How new technologies influence music

Event
10/18/22

July 2022

What should AI be allowed to do? The stage belongs to the citizens!

Event
7/12/22

June 2022

Healing with algorithms? AI in medicine – Chances, risks, challenges

Event
6/28/22

May 2022

With colleague AI in the field – Intelligent systems in agriculture

Event
5/17/22

April 2022

The new colleague AI. What consequences do self-learning systems have for the working world?

Event
4/26/22

IRIS Coffee Chat | Can AI help us deal with disinformation?

Event
4/20/22

February 2022

Where are you going, colleague AI?

Event
2/1/22

January 2022

IRIS Coffee Chat | NLP-supported (e-) deliberation: interdisciplinary challenges and real-world applications

Event
1/25/22

December 2021

Literature & Culture and/as Intelligent Systems

Event
12/16 – 12/17/21

November 2021

IRIS Coffee Chat | Careers in the (intelligent?) system of science (in German)

Event
11/9/21

Reflection Lounge (in German)

Event
11/8/21

October 2021

Reflection Lounge (English Event)

Event
10/18/21

September 2021

NoBIAS Summer School 2021

Event
9/20 – 9/22/21

July 2021

IRIS Coffee Chat | About the role and responsibility of science in the pandemic

Event
7/7/21
