[AI Security and Privacy Team Seminar] Talk by Eric Wong

Prof. Eric Wong (UPenn) on LLM safety and alignment through mechanistic theory, in person and on Zoom.

When
Tue, June 30, 2026 · 16:00–17:00 JST
Where
Institute of Science Tokyo, Ookayama Campus, West Bldg. 8E, 10F, Department Meeting Room (1004), and online (Zoom) · Hybrid
Region
Kanto (Tokyo)
Organizer
RIKEN Center for Advanced Intelligence Project
Language
EN
Source
Doorkeeper
Summary
A talk on AI security by Prof. Eric Wong of the University of Pennsylvania, titled "Understanding Safety & Alignment with Mechanistic Theory." You can attend in person at the Institute of Science Tokyo Ookayama campus or online via Zoom. The talk formalizes a mechanistic theory of why LLM guardrails are so easily broken and how they can be enforced. Starting from one-layer transformers, it presents the LogicBreaks framework, which identifies rule-breaking as an architectural vulnerability in the attention mechanism. It then extends this analysis to attention-based interventions, arriving at InstaBoost, a steering method that controls large language models with just five lines of code. Eric Wong leads Brachio Lab, which works on debugging machine learning and building systems that behave as intended, and is part of the ASSET Center, which focuses on safe, explainable, and trustworthy AI.
About the community

A research seminar series hosted by RIKEN's AI research center, featuring talks by domestic and international researchers on topics such as AI security and privacy. It is aimed mainly at researchers and graduate students.

#ai-security #llm #alignment #machine-learning #research-seminar #privacy