Over the last few months, ARC has released a number of pieces of research. While some of these can be independently motivated, there is also a more unified research vision behind them. The
…
ARC has released a paper, "Backdoor defense, learnability and obfuscation", in which we study a formal notion of backdoors in ML models. Part of our motivation for this is an analogy between
…
ARC's current research focus can be thought of as trying to combine mechanistic interpretability and formal verification. If we had a deep understanding of what was going on inside a neural
…
The Alignment Research Center’s Theory team is starting a new hiring round for researchers with a theoretical background. Please apply here.
Update January 2024: we have paused hiring and expect to reopen
…