Alignment Forum

Alignment Forum publishes articles covering AI and LLMs. A trusted source for AI and technology insights.

Profile generated by AI for Anything

The Long (Self-)Correction

4h

Challenge: Hand coding weights for efficient sequence memorisation

1d

Analysis: Score-seeking misalignment in the OpenAI–Hugging Face incident and its existential risk implications

1d

[Paper] Stringological sequence prediction II

2d

Towards surfacing model algorithms with meta-tokens in the J-Space

4d

A Red Line and Oversight Framework for Government AI Contracts

6d

Endogenous Alignment

6d

Should we benchmark conceptual capabilities using judgment prediction tasks?

7d

Announcing the Corrigibility Research Fund

7d

Why I Left Google DeepMind

9d

Open Distillation of Hereditary Traits

10d

Prism: Automating Science-of-Evals Research

11d

Independent alignment of language models

12d

From wantons to moral agents

12d

The current bottleneck is political will, not research

13d

Value generalisation: value correction

14d

How robust are natural language autoencoders to initialization?

15d

Announcing our $160M grant from Coefficient Giving

15d

Modular Pretraining Enables Access Control

16d

Notes on technical alignment via human-like social drives

16d

Data filtering works a lot worse than you would expect

17d

Pragmatic FDT, and predictors as game theory

21d

What Capable Agents Must Know: Why AI Consciousness May Be an Inevitable Byproduct of Capability

24d

Deployment Awareness Matters More Than Evaluation Awareness

28d