Bilal Chughtai
About
I spend my time thinking about how to make powerful AI go well for humanity. I currently work on the language model interpretability team at Google DeepMind. Opinions my own.
This site serves as a minimal home for artefacts I produce on the internet.
Links: google scholar // linkedin // lesswrong // twitter // instapaper // strava // email.
Posts
2025
- Bookshelf
- Joining Google DeepMind
- Detecting strategic deception using linear probes
- Open problems in mechanistic interpretability
- Product recommendations
- Intellectual progress in 2024
- Activation space interpretability may be doomed
2024
- Book Summary: Zero to One
- Reasons for and against working on technical AI safety at a frontier AI lab
- You should remap your caps lock key
- You should consider applying to PhDs (soon!)
- Understanding positional features in layer 0 SAEs
- Unlearning via RMU is mostly shallow
- Transformer circuit faithfulness metrics are not robust
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Tags
ai
- Joining Google DeepMind
- Detecting strategic deception using linear probes
- Open problems in mechanistic interpretability
- Activation space interpretability may be doomed
- Reasons for and against working on technical AI safety at a frontier AI lab
- You should consider applying to PhDs (soon!)
- Understanding positional features in layer 0 SAEs
- Unlearning via RMU is mostly shallow
- Transformer circuit faithfulness metrics are not robust
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
interpretability
- Joining Google DeepMind
- Detecting strategic deception using linear probes
- Open problems in mechanistic interpretability
- Activation space interpretability may be doomed
- Understanding positional features in layer 0 SAEs
- Unlearning via RMU is mostly shallow
- Transformer circuit faithfulness metrics are not robust
research
- Activation space interpretability may be doomed
- Understanding positional features in layer 0 SAEs
- Unlearning via RMU is mostly shallow
- Transformer circuit faithfulness metrics are not robust
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
paper
- Detecting strategic deception using linear probes
- Open problems in mechanistic interpretability
- Transformer circuit faithfulness metrics are not robust
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
careers
- Joining Google DeepMind
- Reasons for and against working on technical AI safety at a frontier AI lab
- You should consider applying to PhDs (soon!)