New Publication: "Deception abilities emerged in large language models"

June 5, 2024 / IRIS3D

Dr. Thilo Hagendorff

Thilo Hagendorff's research on LLM behavior has resulted in another top-tier journal publication: the Proceedings of the National Academy of Sciences (PNAS) accepted his paper on deception abilities in LLMs. In the paper, he presents a series of experiments demonstrating that state-of-the-art LLMs have a conceptual understanding of deceptive behavior. These findings have significant implications for AI alignment, as there is growing concern that future LLMs may develop the ability to deceive human operators and use this skill to evade monitoring efforts.


