The following papers have been accepted to the Technical AI Governance Workshop at ICML 2025:
Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications
Jia Yi Goh, Shaun Khoo, Gabriel Chua, Leanne Tan, Nyx Iskandar, Jessica Foo
CALMA: Context-Aligned Axes for Language Model Alignment
Prajna Soni, Deepika Raman, Dylan Hadfield-Menell
Deprecating Benchmarks: Criteria and Framework
Ayrton San Joaquin, Rokas Gipiškis, Leon Staufer, Ariel Gil
LLMs Can Covertly Sandbag On Capability Evaluations Against Chain-of-Thought Monitoring
Chloe Li, Noah Y. Siegel, Mary Phuong
Distributed and Decentralised Training: Technical Governance Challenges in a Shifting AI Landscape
Jakub Kryś, Yashvardhan Sharma, Janet Egan
Trends in AI Supercomputers
Konstantin Friedemann Pilz, James Sanders, Robi Rahman, Lennart Heim
Hardware-Enabled Mechanisms for Verifying Responsible AI Development
Aidan O'Gara, Gabriel Kulp, Will Hodgkins, James Petrie, Vincent Immler, Aydin Aysu, Kanad Basu, Shivam Bhasin, Stjepan Picek, Ankur Srivastava
Compute Requirements for Algorithmic Innovation in Frontier AI Models
Peter Barnett
Acceleration potential in the GPU design-to-manufacturing pipeline
Maximilian Negele
Scaling Limits to AI Chip Production
Maximilian Negele, Lennart Heim, Peter Ruschhaupt
Access Controls Will Solve the Dual-Use Dilemma
Evžen Wybitul
Exploring an Agenda on Memorization-based Copyright Verification
Harry H. Jiang, Aster Plotnik, Carlee Joe-Wong
Distinguishing Pre-AI and Post-AI Baselines in Marginal Risk Reporting
Jide Alaga, Michael Chen
Societal Capacity Assessment Framework: Measuring Vulnerability, Resilience, and Transformation from Advanced AI
Milan M. Gandhi, Peter Cihon, Owen C. Larter
Meek Models Shall Inherit The Earth
Hans Gundlach, Jayson Lynch, Neil Thompson
Relative Bias: A Comparative Approach for Quantifying Bias in LLMs
Alireza Arbabi, Florian Kerschbaum
A Conceptual Framework for AI Capability Evaluations
María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Luca Nicolás Forziati Gangi, Matheo Sandleris Musa, Lola Ramos Pereyra, Mario Leiva, Juan Gustavo Corvalan, Maria Vanina Martinez, Gerardo Simari
Watermarking Without Standards Is Not AI Governance
Alexander Nemecek, Yuzhou Jiang, Erman Ayday
From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms
Jessica Dai, Paula Gradu, Inioluwa Deborah Raji, Benjamin Recht
Position: Generative AI Regulation Can Learn From Social Media Regulation
Ruth Elisabeth Appel
In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
Shayne Longpre, Kevin Klyman, Ruth E. Appel, Sayash Kapoor, Rishi Bommasani, Michelle Sahar, Sean McGregor, Avijit Ghosh, Borhane Blili-Hamelin, Nathan Butters, Alondra Nelson, Dr. Amit Elazari, Andrew Sellars, Casey John Ellis, Dane Sherrets, Dawn Song, Harley Geiger, Ilona Cohen, Lauren McIlvenny, Madhulika Srikumar et al.
Expert Survey: AI Safety & Security Research Priorities
Joe O'Brien, Jeremy Dolan, Jeba Sania, Jay Kim, Rocio Cara Labrador, Jonah Dykhuizen, Sebastian Becker, Jam Kraprayoon
Trends in Frontier AI Model Count: A Forecast to 2028
Iyngkarran Kumar, Sam Manning
Exploring Functional Similarities of Backdoored Models
Yufan Feng, Benjamin Tan, Yani Ioannou
Fragile by Design: Formalizing Watermarking Tradeoffs via Paraphrasing
Ali Falahati, Lukasz Golab
A Taxonomy for Design and Evaluation of Prompt-Based Natural Language Explanations
Isar Nejadgholi, Mona Omidyeganeh, Marc-Antoine Drouin, Jonathan Boisvert
Position: Formal Methods are the Principled Foundation of Safe AI
Gagandeep Singh, Deepika Chawla
Technical Requirements for Halting Dangerous AI Activities
Peter Barnett, Aaron Scher, David Abecassis
Proofs of Autonomy: Scalable and Practical Verification of AI Autonomy
Artem Grigor, Christian Schroeder de Witt, Ivan Martinovic
Guaranteeable Memory: An HBM-Based Chiplet for Verifiable AI Workloads
James Petrie
Detecting Compute Structuring in AI Governance is likely feasible
Emmanouil Seferis, Timothy Fist
LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries
Zekun Wu, Seonglae Cho, Umar Mohammed, Cristian Enrique Munoz Villalobos, Kleyton Da Costa, Xin Guan, Theo King, Ze Wang, Emre Kazim, Adriano Koshiyama
Probing Evaluation Awareness of Language Models
Jord Nguyen, Hoang Huu Khiem, Carlo Leonardo Attubato, Felix Hofstätter
AI Benchmarks: Interdisciplinary Issues and Policy Considerations
Maria Eriksson, Erasmo Purificato, Arman Noroozian, João Vinagre, Guillaume Chaslot, Emilia Gomez, David Fernández-Llorca
The Strong, weak and benign Goodhart's law. An independence-free and paradigm-agnostic formalisation
Adrien Majka, El-Mahdi El-Mhamdi
Locking Open Weight Models with Spectral Deformation
Domenic Rosati, Sebastian Dionicio, Xijie Zeng, Subhabrata Majumdar, Frank Rudzicz, Hassan Sajjad
Reproducibility: The New Frontier in AI Governance
Israel Mason-Williams, Gabryel Mason-Williams
Fallacies of Data Transparency: Rethinking Nutrition Facts for AI
Judy Hanwen Shen, Ken Liu, Angelina Wang, Sarah H. Cen, Andy K Zhang, Caroline Meinhardt, Daniel Zhang, Kevin Klyman, Rishi Bommasani, Daniel E. Ho
Robust ML Auditing using Prior Knowledge
Jade Garcia Bourrée, Augustin Godinot, Martijn De Vos, Milos Vujasinovic, Sayan Biswas, Gilles Tredan, Erwan Le Merrer, Anne-Marie Kermarrec
A Blueprint for a Secure EU AI Audit Ecosystem
Alejandro Tlaie
Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments
Christoph Schnabl, Daniel Hugenroth, Bill Marino, Alastair R. Beresford
Practical Principles for AI Cost and Compute Accounting
Stephen Casper, Luke Bailey, Tim Schreier
ExpProof: Operationalizing Explanations for Confidential Models with ZKPs
Chhavi Yadav, Evan Laufer, Dan Boneh, Kamalika Chaudhuri
Methodological Challenges in Agentic Evaluations of AI Systems
Kevin Wei, Stephen Guth, Gabriel Wu, Patricia Paskov