
Weak-to-Strong Generalization

The empirical question of whether a weaker supervisor model can reliably elicit aligned behavior from a stronger model that it cannot fully evaluate.

Defined in the 150th Edition (W10), Mar 10, 2026

Appearances: 1

First appeared: Mar 2026

Most recent: Mar 2026

Category: Technologies

150th Edition: The question of whether a less capable AI system can successfully supervise and train a more capable one, a key open problem in alignment research.
