New Publication: "Deception abilities emerged in large language models"

June 5, 2024 / IRIS3D

Dr. Thilo Hagendorff

Thilo Hagendorff's research on LLM behavior has resulted in another top-tier journal publication: the Proceedings of the National Academy of Sciences (PNAS) accepted his paper on deception abilities in LLMs. In the paper, he presents a series of experiments demonstrating that state-of-the-art LLMs have a conceptual understanding of deceptive behavior. These findings have significant implications for AI alignment, as there is growing concern that future LLMs may develop the ability to deceive human operators and use this skill to evade monitoring efforts.


