Jacob Hilton - Alignment Research Center

A bird's eye view of ARC's research

Over the last few months, ARC has released a number of pieces of research. While some of these can be independently motivated, there is also a more unified research vision behind them. The … »

Backdoors as an analogy for deceptive alignment

ARC has released a paper on Backdoor defense, learnability and obfuscation in which we study a formal notion of backdoors in ML models. Part of our motivation for this is an analogy between … »

Formal verification, heuristic explanations and surprise accounting

ARC's current research focus can be thought of as trying to combine mechanistic interpretability and formal verification. If we had a deep understanding of what was going on inside a neural … »

ARC is hiring theoretical researchers

The Alignment Research Center’s Theory team is starting a new hiring round for researchers with a theoretical background. Please apply here. Update January 2024: we have paused hiring and expect to reopen … »