Alternatives to Humanity's Last Exam (HLE)
Humanity's Last Exam (HLE) and two alternative tools evaluated on the Tekai technology radar.
Humanity's Last Exam (HLE)
A 2,500-question expert-level benchmark curated by roughly 1,000 subject-matter specialists, designed to measure AI capabilities in areas where frontier models still score only 40-50%.
open-source CC-BY-4.0
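The evaluation loop itself is straightforward: pose each question to a model and grade the response. The sketch below illustrates that loop under stated assumptions: the Hugging Face dataset id `cais/hle`, the `question`/`answer` field names, and the `answer_question()` stub are placeholders, and plain exact match stands in for HLE's judge-model grading of free-form answers.

```python
# Minimal HLE-style evaluation loop (illustrative sketch, not the official harness).
from datasets import load_dataset

def answer_question(question: str) -> str:
    """Hypothetical model call; replace with real inference code."""
    return ""  # placeholder so the sketch runs end to end

def grade(prediction: str, reference: str) -> bool:
    # HLE's official grading uses a judge model for free-form answers;
    # normalized exact match is used here only to keep the sketch short.
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(limit: int = 100) -> float:
    ds = load_dataset("cais/hle", split="test")  # assumed dataset id and split
    items = list(ds)[:limit]
    correct = sum(grade(answer_question(x["question"]), x["answer"]) for x in items)
    return correct / len(items)

if __name__ == "__main__":
    print(f"HLE accuracy on first {100} questions: {evaluate():.1%}")
```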
Alternatives
MMLU (Massive Multitask Language Understanding)
A benchmark of 15,908 multiple-choice questions across 57 academic subjects for evaluating LLM knowledge, now effectively saturated by frontier models.
open-source MIT
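Because MMLU is pure multiple choice, scoring reduces to comparing a predicted letter against the answer index, typically reported per subject. The sketch below assumes the `cais/mmlu` Hugging Face release and its `question`/`choices`/`answer`/`subject` fields; `choose_letter()` is a hypothetical stand-in for your model.

```python
# Minimal MMLU-style multiple-choice scoring, grouped by subject (illustrative sketch).
from collections import defaultdict
from datasets import load_dataset

LETTERS = "ABCD"

def format_prompt(question: str, choices: list[str]) -> str:
    options = "\n".join(f"{LETTERS[i]}. {c}" for i, c in enumerate(choices))
    return f"{question}\n{options}\nAnswer:"

def choose_letter(prompt: str) -> str:
    """Hypothetical model call; replace with real inference. Returns 'A'-'D'."""
    return "A"  # placeholder so the sketch runs end to end

def evaluate(limit: int = 500) -> dict[str, float]:
    ds = load_dataset("cais/mmlu", "all", split="test")  # assumed id and config
    hits, totals = defaultdict(int), defaultdict(int)
    for item in list(ds)[:limit]:
        pred = choose_letter(format_prompt(item["question"], item["choices"]))
        hits[item["subject"]] += pred == LETTERS[item["answer"]]
        totals[item["subject"]] += 1
    return {s: hits[s] / totals[s] for s in totals}

if __name__ == "__main__":
    for subject, acc in sorted(evaluate().items()):
        print(f"{subject}: {acc:.1%}")
```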
HCAST (Human-Calibrated Autonomy Software Tasks)
METR's primary benchmark for measuring how well frontier AI systems autonomously complete software tasks, calibrated against 140 human experts across 189 tasks.
open-source MIT
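Human calibration is what distinguishes HCAST from a plain pass/fail suite: agent success is typically summarized against how long each task takes a human expert. The sketch below shows one such summary, agent success rate bucketed by human completion time; the `TaskResult` schema, bucket boundaries, and demo data are illustrative assumptions, not HCAST's actual format.

```python
# Summarize agent success rate by human-calibrated task length (illustrative sketch).
from dataclasses import dataclass

@dataclass
class TaskResult:
    name: str
    human_minutes: float   # calibrated human expert completion time
    agent_succeeded: bool  # did the AI agent complete the task autonomously?

BUCKETS = [(0, 15), (15, 60), (60, 240), (240, float("inf"))]  # minutes

def success_by_human_time(results: list[TaskResult]) -> dict[str, float]:
    summary = {}
    for lo, hi in BUCKETS:
        in_bucket = [r for r in results if lo <= r.human_minutes < hi]
        if in_bucket:
            rate = sum(r.agent_succeeded for r in in_bucket) / len(in_bucket)
            summary[f"{lo}-{hi} min"] = rate
    return summary

if __name__ == "__main__":
    demo = [  # made-up results purely to exercise the function
        TaskResult("fix-lint-error", 10, True),
        TaskResult("write-scraper", 45, True),
        TaskResult("debug-race-condition", 180, False),
        TaskResult("build-ci-pipeline", 600, False),
    ]
    for bucket, rate in success_by_human_time(demo).items():
        print(f"{bucket}: {rate:.0%} agent success")
```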
Comparison Summary
| Tool | Radar ring | Type | License |
|---|---|---|---|
| Humanity's Last Exam (HLE) | assess | open-source | CC-BY-4.0 |
| MMLU (Massive Multitask Language Understanding) | hold | open-source | MIT |
| HCAST (Human-Calibrated Autonomy Software Tasks) | assess | open-source | MIT |