[104th TrustML Young Scientist Seminar] Talk by John Robertson (UT Austin) "Language Model Control and Reliability: Understanding Steering Vectors and Agentic Aging"

Start: 2026-06-23T14:00:00+09:00
End: 2026-06-23T15:00:00+09:00
Location: Online + Meeting Room B at Nihonbashi (AIP researchers only)

A RIKEN AIP seminar with UT Austin's John Robertson on activation steering, concept granularity, and the longitudinal reliability of LLM agents.

When: Tue, June 23, 2026 · 14:00–15:00 JST
Where: Online + Meeting Room B at Nihonbashi (AIP researchers only) · Hybrid
Region: Other
Organizer: RIKEN Center for Advanced Intelligence Project
Language: EN
Source: Doorkeeper

Summary

RIKEN AIP's TrustML Young Scientist Seminar series hosts John Robertson, a PhD student at UT Austin, for a talk on controlling and trusting large language models. Robertson opens with activation steering, a lightweight way to adjust model behavior without retraining, and argues that the wide variation in its effectiveness reflects search difficulty rather than a fundamental limit. He shows that the directional alignment of contrastive activations at the prompt boundary predicts where useful interventions emerge, letting geometry-guided optimization find them with roughly 40% fewer evaluations across three model families. The talk then introduces concept granularity, a measure of how much a steering direction rotates across input contexts. Computable from cached activations before any steering runs, it predicts both how hard a concept is to optimize and the quality ultimately achievable. Robertson closes by shifting from control to reliability over time, presenting AgingBench, a longitudinal benchmark that tracks how frozen-weight agents degrade as they compress history, retrieve from growing memory, and revise facts. The seminar runs online and in Meeting Room B at the Nihonbashi office, with the physical room open to AIP researchers only. The session is conducted in English.
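
To make the idea of a steering direction that "rotates across input contexts" concrete, here is a minimal Python sketch. It is not the speaker's method: this listing does not give the actual definition of concept granularity. The sketch assumes a difference-of-means steering direction per context and uses one minus the mean pairwise cosine similarity as a stand-in rotation score. The function names `steering_direction` and `rotation_score` and the toy activations are hypothetical.

```python
import math
from itertools import combinations


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_direction(pos_acts, neg_acts):
    """Unit-length difference of means between contrastive activations.

    Assumption: a difference-of-means direction, a common choice in
    activation steering; the talk may define its direction differently.
    """
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    diff = [p - q for p, q in zip(pos_mean, neg_mean)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("positive and negative means coincide")
    return [x / norm for x in diff]


def rotation_score(directions):
    """One minus the mean pairwise cosine similarity of unit directions.

    Hypothetical stand-in for a granularity-style measure:
    0.0 means the direction is identical in every context,
    larger values mean it rotates more between contexts.
    """
    pairs = list(combinations(directions, 2))
    if not pairs:
        raise ValueError("need at least two contexts")
    mean_cos = sum(
        sum(a * b for a, b in zip(u, v)) for u, v in pairs
    ) / len(pairs)
    return 1.0 - mean_cos


# Toy cached activations for two input contexts (made-up numbers).
d_a = steering_direction([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
d_b = steering_direction([[0.0, 1.0], [0.0, 3.0]], [[0.0, 0.0]])

same_context_score = rotation_score([d_a, d_a])   # identical directions
cross_context_score = rotation_score([d_a, d_b])  # orthogonal directions
```

In this toy case the two directions are orthogonal, so `cross_context_score` is 1.0, while `same_context_score` is 0.0. Like the measure described in the talk, the score is computed from cached activations alone, with no steering run required.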

About the community

The TrustML Young Scientist Seminar is a recurring research seminar series focused on the trustworthiness, reliability, and controllability of machine learning systems. Sessions feature early-career researchers presenting recent work, held online and at the Nihonbashi office, and are aimed at researchers and graduate students in machine learning.

#machine-learning #llm #interpretability #activation-steering #ai-agents #research-seminar