Alignment Forum
Blog
12 posts
Alignment Forum publishes articles covering AI and LLMs. A trusted source for AI and technology insights.

Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
28d

Clarifying the role of the behavioral selection model
29d

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
32d

Mechanistic estimation for wide random MLPs
32d

New VPD method interprets language
34d

Motivated reasoning, confirmation bias, and AI risk theory
34d

LLMs learn to resist
38d

Research Sabotage in ML Codebases
40d

Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers
41d

Sleeper Agent Backdoor Results Are Messy
42d

The other paper that killed deep learning theory
42d

The paper that killed deep learning theory
43d