Upcoming Events
-
AI Policy Tuesday: The Case for Regulating AI Companies, Not AI Models
Tuesday, September 30th, 6:00pm - 9:00pm
Wim Howson Creutzberg will discuss the case for treating business entities — as opposed to models or use cases — as the main focal unit for preemptive risk regulation of advanced AI systems.
The talk will give an overview of "Entity-Based Regulation in Frontier AI Governance" by Ketan Ramakrishnan and Dean Ball, followed by commentary on the article and surrounding discussion.
-
AI Safety Thursday: Attempts and Successes of LLMs Persuading on Harmful Topics
Thursday, October 2nd, 6:00pm - 9:00pm
Large Language Models can persuade people at unprecedented scale—but how effectively, and are they willing to try persuading us toward harmful ideas?
In this talk, Matthew Kowal and Jasper Timm will present findings showing that LLMs can shift beliefs toward conspiracy theories as effectively as they debunk them, and that many models are willing to attempt harmful persuasion on dangerous topics.
-
AI Policy Tuesday: The Concept of Political Space and AI Safety
Tuesday, October 7th, 6:00pm - 9:00pm
International cooperation often becomes possible only after shocks, crises, or dramatic shifts in perception. ChatGPT's release in 2022 created space for x-risk discussions at the UK AI Safety Summit, while just two years later, the Paris Summit cast these same concerns as "science fiction".
In this talk, Jason Yung will examine the concept of political space: how it opens and closes, what factors enable or constrain it, and how it can inform the way AI safety advocacy is advanced.
-
AI Safety Thursday: Building an Economic Model of AI Automation
Thursday, October 9th, 6:00pm - 9:00pm
In this presentation, Anson Ho will discuss Epoch AI's work on the GATE model, one of the first attempts at bridging the gap between growth economics and modern deep learning in a rigorous way.
-
AI Policy Tuesday: Redlines for AI
Tuesday, October 14th, 6:00pm - 9:00pm
Kathrin Gardhouse will walk us through a three-part report by The Future Society introducing the concept of redlines and their relevance in global AI governance.
-
AI Safety Thursday: Modeling and Detecting Deceptive Alignment
Thursday, October 16th, 6:00pm - 9:00pm
Annie Szorkin will give a talk on modeling and detecting deceptive alignment.
-
AI Policy Tuesday: The Spectre of Transformative AI
Tuesday, October 21st, 6:00pm - 9:00pm
Will AI-driven security risks be more severe before the development of transformative AI, or after?
Wim Howson Creutzberg will give an overview of current research on how the severity and nature of risks stemming from the development of advanced AI are expected to change over time, drawing centrally on “The Artificial General Intelligence Race and International Security.”
-
AI Safety Thursday: The Limitations of Reinforcement Learning for LLMs in Achieving AI for Science
Thursday, October 23rd, 6:00pm - 9:00pm
LLMs combined with Reinforcement Learning (RL) have unlocked impressive new capabilities. But do we simply need more scaling to reach the next step: AI for science and research? If not, what are the limitations, and what else is required?
In this talk, Yongjin Yang will share research on three fundamental bottlenecks of reinforcement learning for LLMs: skewed queries, limited exploration, and sparse reward signals. We will also discuss potential solutions to these challenges, as well as safety concerns.
-
AI Safety Tuesday: Debunking the US-Chinese AGI Race
Tuesday, October 28th, 6:00pm - 9:00pm
In this talk, Kristy Loke will present her research findings by drawing on both Chinese policy discussions and post-ChatGPT actions.
She'll discuss (a) how China has been responding to the AGI race underway in Silicon Valley, and how and why China sees AI development differently, and (b) what this implies for the trajectory of US-Chinese relations and the room for global AI coordination.
-
AI Safety Thursday: Chain-of-Thought Monitoring for AI Control
Thursday, October 30th, 6:00pm - 9:00pm
Modern reasoning models do a lot of thinking in natural language before producing their outputs. Can we catch misbehaviors by our LLMs and interpret their motivations simply by reading these chains of thought?
In this talk, Rauno Arike and Rohan Subramani will give an overview of research on chain-of-thought monitorability and AI control, and discuss their recent work on how useful chain-of-thought monitoring is for ensuring that LLM agents pursue only the objectives their developers intended.
Past Events
-
AI Safety Thursdays: Avoiding Gradual Disempowerment
Thursday, July 3rd, 6pm-8pm
This talk explored the concept of gradual disempowerment as an alternative to the abrupt takeover scenarios often discussed in AI safety. Dr David Duvenaud examined how even incremental improvements in AI capabilities can erode human influence over critical societal systems, including the economy, culture, and governance.
-
AI Policy Tuesdays: Agent Governance
Tuesday, June 24th.
Kathrin Gardhouse presented on the nascent field of Agent Governance, drawing from a recent report by IAPS.
The presentation covered current agent capabilities, expected developments, governance challenges, and proposed solutions.
-
Hackathon: Apart x Martian Mechanistic Router Interpretability Hackathon
Friday, May 30th - Sunday, June 1st.
We hosted a jamsite for Apart Research and Martian's hackathon.
-
AI Safety Thursdays: Advanced AI's Impact on Power and Society
Thursday, May 29th, 6pm-8pm
Historically, significant technological shifts often coincide with political instability, and sometimes violent transfers of power. Should we expect AI to follow this pattern, or are there reasons to hope for a smooth transition to the post-AI world?
Anson Ho drew upon economic models, broad historical trends, and recent developments in deep learning to guide us through an exploration of this question.
-
AI + Human Flourishing: Policy Levers for AI Governance
Sunday, May 4, 2025.
Considerations of AI governance are increasingly urgent as powerful models become more capable and widely deployed. Kathrin Gardhouse delivered a presentation on the available mechanisms we can use to govern AI, from policy levers to technical AI governance, offering a high-level introduction to the AI policy landscape.
-
AI Safety Thursday: "AI-2027"
Thursday April 24th, 2025.
On April 3rd, a team of AI experts and superforecasters at the AI Futures Project published a narrative called AI-2027, outlining a possible scenario of explosive AI development and takeover unfolding over the next two years.
Mario Gibney guided us through a presentation and discussion of the scenario, exploring how likely it is to track reality in the coming years.