From January to February, the Alignment Research Center offered prizes for proposed algorithms for eliciting latent knowledge. In total we received 197 proposals and are awarding 32 prizes of $5k–$20k. We are also giving 24 proposals honorable mentions of $1k, for a total of $274,000.
…
Thank you to everyone who submitted proposals to the ELK proposal competition. We evaluated 30 distinct proposals from 25 people and awarded a total of $70,000 for proposals from 8 people.
…
Roughly speaking, the goal of ELK is to incentivize ML models to honestly answer “straightforward” questions where the right answer is unambiguous and known by the model. We are offering prizes of $5,000 to $50,000 for proposed strategies for ELK.
…
In this post I’ll describe some possible approaches to eliciting latent knowledge (ELK) not discussed in our report. These are basically restatements of proposals by Davidad, Rohin, Ramana, and John Maxwell. For each approach, I’ll present one or two counterexamples that I think would break it.
…
This is an archived version of the early 2022 hiring round page.
…
ARC has published a report on Eliciting Latent Knowledge, an open problem which we believe is central to alignment. We think reading this report is the clearest way to understand what problems we
…