[AI Security and Privacy Team Seminar] Talk by Eric Wong

Name: [AI Security and Privacy Team Seminar] Talk by Eric Wong
Start: 2026-06-30T16:00:00+09:00
End: 2026-06-30T17:00:00+09:00
Location: Institute of Science Tokyo, Ookayama Campus, West Building 8E, 10F, Department Meeting Room (1004), and online (Zoom)

Prof. Eric Wong (UPenn) on LLM safety and alignment through mechanistic theory, in person and on Zoom.

When: Tue, June 30, 2026 · 16:00–17:00 JST
Where: Institute of Science Tokyo, Ookayama Campus, West Building 8E, 10F, Department Meeting Room (1004), and online (Zoom) · Hybrid
Region: Kanto (Tokyo)
Organizer: RIKEN Center for Advanced Intelligence Project
Language: EN
Source: Doorkeeper

Summary

A talk on AI security by Prof. Eric Wong of the University of Pennsylvania, titled "Understanding Safety & Alignment with Mechanistic Theory." Attendance is possible both in person at the Ookayama Campus of the Institute of Science Tokyo and online via Zoom. The talk formalizes a mechanistic theory of why LLM guardrails are so easily broken and how they can be enforced. Starting from one-layer transformers, it presents the LogicBreaks framework, which identifies rule-breaking as an architectural vulnerability in the attention mechanism. It then extends this analysis to attention-based interventions, arriving at InstaBoost, a steering method that controls large LLMs with just five lines of code. Wong leads the Brachio Lab, which works on debugging machine learning and building systems that behave as intended, and is a member of the ASSET Center, which focuses on safe, explainable, and trustworthy AI.

About the community

A research seminar series hosted by RIKEN's AI research center, featuring talks by domestic and international researchers on topics such as AI security and privacy. It is aimed mainly at researchers and graduate students.

#ai-security #llm #alignment #machine-learning #research-seminar #privacy