My team at GDM has written up a blog post outlining what our current high level research philosophy is, how we got here, and why we think it's an improvement over our previous philosophy -- we're calling it "pragmatic" interpretability: neelnanda.io/vision
We've also shared a list of concrete directions we are currently working on (along with theories of change) where we think interpretability techniques or mindsets have some comparative advantage over other methods, and might be uniquely well placed to contribute: neelnanda.io/agenda