Coordination & Cooperation Session
Uncertainty constrains coordination: robustness to humans without human data
4:00 PM - 4:30 PM | Merrill Hall
Repeated AI Interaction: How Agents Learn and Strategize Over Time
4:30 PM - 4:50 PM | Merrill Hall
The Habermas Machine: AI-Mediated Deliberation to Protect Human Agency
4:50 PM - 5:10 PM | Merrill Hall
Panel Discussion
Eugene Vinitsky, Natalie Collina, Michiel Bakker, Cassidy Laidlaw
5:30 PM - 6:00 PM | Merrill Hall
Model Organisms of Misalignment Session
Friday, June 6 | 👑 Program Committee
Robustness & Guaranteed Safety Session
Friday, June 6 | 👑 Program Committee
Trustworthy and Transparent Alignment of Large Language Models
10:30 AM - 11:00 AM | Nautilus
Student Lightning Talks
Friday, June 6
Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts
1:30 PM - 1:40 PM | Merrill Hall
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
1:40 PM - 1:50 PM | Merrill Hall
Political Neutrality in AI is Impossible – But Here is How to Approximate it
2:10 PM - 2:20 PM | Merrill Hall
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
2:20 PM - 2:30 PM | Merrill Hall
Robust and Diverse Multi-agent Learning via Rational Policy Gradient
3:00 PM - 3:10 PM | Merrill Hall
Spooky Demos Session
o3 prevents itself from being shutdown: empirical evidence of instrumental convergence
1:30 PM - 1:50 PM | Nautilus
Live Demos of AI Risks for Policymakers and Civil Society
1:50 PM - 2:10 PM | Nautilus
Panel Discussion
Max Tegmark, Jeffrey Ladish, Siddharth Hiregowdara, Holly Elmore
2:30 PM - 3:00 PM | Nautilus
Well-Founded AI Session
Symbolic Reasoning about Large Language Models
10:30 AM - 11:00 AM | Merrill Hall
Human-like Concept Induction through Library Learning and Probabilistic Program Synthesis
11:00 AM - 11:30 AM | Merrill Hall
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
11:30 AM - 12:00 PM | Merrill Hall
Human Value Learning Session
Understanding what people want: Towards descriptive and identifiable psychological models for reward inference
9:30 AM - 10:00 AM | Merrill Hall
Neglected Approaches to Value Learning and Alignment
10:30 AM - 10:50 AM | Nautilus
Panel Discussion
Brad Knox, Diogo Schwerz de Lucena, Michael Bowling, Arushi Somani
11:30 AM - 12:00 PM | Nautilus
Researcher Spotlight Talks
Saturday, June 7
The Agentic Turn: Philosophical Reflections on AI Agents, Alignment and Impact
1:50 PM - 2:10 PM | Merrill Hall
A Meta-Game Evaluation Framework for Advanced Interactive AI
2:30 PM - 2:50 PM | Merrill Hall
Reward Model Interpretability via Optimal and Pessimal Tokens
2:50 PM - 3:10 PM | Merrill Hall
Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
3:10 PM - 3:30 PM | Merrill Hall
Explainability & Interpretability Session
Areas of consensus and disagreement in interpretability research
10:30 AM - 10:50 AM | Merrill Hall
Scaling AI Understanding with an Automated Interpretability Agent
10:50 AM - 11:10 AM | Merrill Hall
Benchmarking Methods for Understanding and Controlling Large Language Models
11:10 AM - 11:30 AM | Merrill Hall
Panel Discussion
David Bau, Chris Potts, Tamar Rott Shaham, Atticus Geiger
11:30 AM - 12:00 PM | Merrill Hall
Adversarial Robustness Session
Booking Meeting Space in Triton Room
Book Triton Room
Surf & Sand Meeting Room
If you’d just like a space with some tables, chairs, a whiteboard, and markers, please feel free to use the Surf & Sand Room (see map in Logistics).
This room is first come, first served. No reservation required.