Bilal Chughtai
About
I spend my time thinking about how to make powerful AI go well for humanity. I work on language model interpretability and AGI safety at Google DeepMind. Opinions my own.
This site serves as a minimal home for artefacts I produce on the internet.
Links
google scholar // github // linkedin // lesswrong // twitter // instapaper // strava // email
Posts
2025
- Should you spend time making things more efficient?
- Product recommendations
- An opinionated guide to building a good to-do system
- everything2prompt
- My health dashboard
- Bookshelf
- Joining Google DeepMind
- Detecting strategic deception using linear probes
- Open problems in mechanistic interpretability
- Intellectual progress in 2024
- Activation space interpretability may be doomed
2024
- Book Summary: Zero to One
- Reasons for and against working on technical AI safety at a frontier AI lab
- You should remap your caps lock key
- You should consider applying to PhDs (soon!)
- Understanding positional features in layer 0 SAEs
- Unlearning via RMU is mostly shallow
- Transformer circuit faithfulness metrics are not robust
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
2023
2021
ai
- everything2prompt
- Joining Google DeepMind
- Detecting strategic deception using linear probes
- Open problems in mechanistic interpretability
- Activation space interpretability may be doomed
- Reasons for and against working on technical AI safety at a frontier AI lab
- You should consider applying to PhDs (soon!)
- Understanding positional features in layer 0 SAEs
- Unlearning via RMU is mostly shallow
- Transformer circuit faithfulness metrics are not robust
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
interpretability
- Joining Google DeepMind
- Detecting strategic deception using linear probes
- Open problems in mechanistic interpretability
- Activation space interpretability may be doomed
- Understanding positional features in layer 0 SAEs
- Unlearning via RMU is mostly shallow
- Transformer circuit faithfulness metrics are not robust
productivity
- Should you spend time making things more efficient?
- Product recommendations
- An opinionated guide to building a good to-do system
- everything2prompt
- You should remap your caps lock key
research
- Activation space interpretability may be doomed
- Understanding positional features in layer 0 SAEs
- Unlearning via RMU is mostly shallow
- Transformer circuit faithfulness metrics are not robust
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
paper
- Detecting strategic deception using linear probes
- Open problems in mechanistic interpretability
- Transformer circuit faithfulness metrics are not robust
- Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
careers
- Joining Google DeepMind
- Reasons for and against working on technical AI safety at a frontier AI lab
- You should consider applying to PhDs (soon!)
books
startups
health
sport
physics
personal