Bilal Chughtai
About
I spend my time thinking about how to make powerful AI go well for humanity. I'll be joining the Google DeepMind language model interpretability team in February 2025. Opinions my own.
This site serves as a minimal home for artefacts I produce on the internet.
Links: google scholar // linkedin // lesswrong // twitter // instapaper // strava // email.
Posts
2025
- Open problems in mechanistic interpretability
- Product recommendations
- Intellectual progress in 2024
- Activation space interpretability may be doomed
2024
- Book Summary: Zero to One
- Reasons for and against working on technical AI safety at a frontier AI lab
- You should remap your caps lock key
- You should consider applying to PhDs (soon!)
- Understanding positional features in layer 0 SAEs
- Unlearning via RMU is mostly shallow
- Transformer circuit faithfulness metrics are not robust
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
2023
2021