
How much can we really know about AI safety without interpretability?


Registration is closed

Time & Location

Jul 14, 2025, 12:00 PM – 1:00 PM EDT

Link to be Provided on Registration

About the event

A cross-disciplinary panel discussion between AI researchers and AI governance lawyers, moderated by an AI governance lawyer and an ethical AI developer, on AI interpretability and how it impacts AI safety.



Learning objectives (what we hope our audience will come away understanding):


- Differentiate interpretability from explainability (with examples of each) [NAZELI]


- Understand why mechanistic interpretability is uniquely challenging for LLMs and why traditional interpretability tools are insufficient for frontier AI models [NAZELI]


- Understand the key issues and recent advances in interpretability (e.g., dictionary learning; feature extraction, such as Anthropic's Golden Gate Bridge feature, and feature amplification or suppression; circuit tracing and attribution graphs) and how these advances could help with de-biasing or other safety enhancements in the future [NAZELI]


- Identify regulatory obligations around explanation (GDPR Article 22, EU AI Act transparency clauses) and connect these obligations to practical tools [KATALINA]


- Recognize that an explanation does not equal a justification (to meet regulatory obligations, a human in the loop is still needed for sensitive documents or use cases) [KATALINA]


- How do we, as in-house lawyers, interact with and handle lags or gaps in AI interpretability? What are we requiring of in-house solutions and third-party vendors? [CAROLINA]


- What other benefits does interpretability offer for AI safety (outside of regulatory and contractual obligations)? Market potential? Developing new tools (e.g., a human's ability to control expressive elements, to meet the "human authorship" requirement of most Western copyright frameworks)? [CAROLINA]


Speakers and Facilitators:

- Nazeli Ter-Petrosyan - AI researcher and Data Scientist at Opply (London, UK)

- Katalina Hernandez - Data Privacy & AI Governance Officer, VOIS (Málaga, Spain), and creator of the newsletter "Why Should AI Governance Professionals & Tech Lawyers Care About AI Safety?"

- Carolina Braga - Privacy Program Manager, Meta (UK)

Moderators: Yelena Ambartsumian and Alexsai Srourali

Target Audience:

AI governance professionals, in-house lawyers and fractional GCs, anyone interested in AI interpretability / mechanistic interpretability

