Upcoming Events
-
Auditing language models for hidden objectives
Thursday April 3, 2025, 6-8pm
Last month, Anthropic released a new paper about "systematic investigations into whether models are pursuing hidden objectives".
> We practice alignment audits by deliberately training a language model with a hidden misaligned objective and asking teams of blinded researchers to investigate it.
Join us as Shivam Arora walks us through the paper's key findings and offers some critiques of its approach and conclusions.
Past Events
-
Generalization and Out of Context Reasoning
Thursday February 20, 2025, 6-8pm
A recent set of papers describes a phenomenon called "out-of-context reasoning", which shows that AI models aren't just able to understand and recall what they have been trained on, but can also make complicated inferences about their training data.
Max Kaufmann, who was part of the original team that discovered this, presented on what out-of-context reasoning is, as well as its implications for our fundamental understanding of LLMs and AI Safety.
-
The Science of AI Evals
Thursday January 30, 2025, 6-8pm
For this week’s AI Safety event, Annie Sorkin presented on the Science of AI Safety Evals (Evaluations). We discussed how they are developed, what purpose they serve, and how effective they are. We also dove into possible avenues that could make the field of evaluations more rigorous in the future.
-
AI and Explosive Economic Growth
Thursday January 23, 2025, 6-8pm
This week, we had Epoch AI researcher Anson Ho present on the possibility of explosive economic growth due to the development of highly advanced AI. We dove into the various feedback loops and diminishing returns that might (or might not) push economic growth far higher than ever before.