Alternatives to Humanity's Last Exam (HLE)
Humanity's Last Exam (HLE) and two alternative tools evaluated on the Tekai technology radar.
Humanity's Last Exam (HLE)
A 2,500-question expert-level benchmark curated by roughly 1,000 subject-matter specialists, designed to measure AI capabilities in areas where frontier models still score only 40-50%.
open-source CC-BY-4.0
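The evaluation loop itself is straightforward: pose each question to a model and grade the response. The sketch below illustrates that loop under stated assumptions: the Hugging Face dataset id `cais/hle`, the `question`/`answer` field names, and the `answer_question()` stub are placeholders, and plain exact match stands in for HLE's judge-model grading of free-form answers.

```python
# Minimal HLE-style evaluation loop (illustrative sketch, not the official harness).
from datasets import load_dataset

def answer_question(question: str) -> str:
    """Hypothetical model call; replace with real inference code."""
    return ""  # placeholder so the sketch runs end to end

def grade(prediction: str, reference: str) -> bool:
    # HLE's official grading uses a judge model for free-form answers;
    # normalized exact match is used here only to keep the sketch short.
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(limit: int = 100) -> float:
    ds = load_dataset("cais/hle", split="test")  # assumed dataset id and split
    items = list(ds)[:limit]
    correct = sum(grade(answer_question(x["question"]), x["answer"]) for x in items)
    return correct / len(items)

if __name__ == "__main__":
    print(f"HLE accuracy on first {100} questions: {evaluate():.1%}")
```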
Alternatives
MMLU (Massive Multitask Language Understanding)
A benchmark of 15,908 multiple-choice questions across 57 academic subjects for evaluating LLM knowledge, now effectively saturated by frontier models.
open-source MIT
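Because MMLU is pure multiple choice, scoring reduces to comparing a predicted letter against the answer index, typically reported per subject. The sketch below assumes the `cais/mmlu` Hugging Face release and its `question`/`choices`/`answer`/`subject` fields; `choose_letter()` is a hypothetical stand-in for your model.

```python
# Minimal MMLU-style multiple-choice scoring, grouped by subject (illustrative sketch).
from collections import defaultdict
from datasets import load_dataset

LETTERS = "ABCD"

def format_prompt(question: str, choices: list[str]) -> str:
    options = "\n".join(f"{LETTERS[i]}. {c}" for i, c in enumerate(choices))
    return f"{question}\n{options}\nAnswer:"

def choose_letter(prompt: str) -> str:
    """Hypothetical model call; replace with real inference. Returns 'A'-'D'."""
    return "A"  # placeholder so the sketch runs end to end

def evaluate(limit: int = 500) -> dict[str, float]:
    ds = load_dataset("cais/mmlu", "all", split="test")  # assumed id and config
    hits, totals = defaultdict(int), defaultdict(int)
    for item in list(ds)[:limit]:
        pred = choose_letter(format_prompt(item["question"], item["choices"]))
        hits[item["subject"]] += pred == LETTERS[item["answer"]]
        totals[item["subject"]] += 1
    return {s: hits[s] / totals[s] for s in totals}

if __name__ == "__main__":
    for subject, acc in sorted(evaluate().items()):
        print(f"{subject}: {acc:.1%}")
```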
HCAST (Human-Calibrated Autonomy Software Tasks)
METR's primary benchmark for measuring how well frontier AI systems autonomously complete software tasks, calibrated against 140 human experts across 189 tasks.
open-source MIT
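Human calibration is what distinguishes HCAST from a plain pass/fail suite: agent success is typically summarized against how long each task takes a human expert. The sketch below shows one such summary, agent success rate bucketed by human completion time; the `TaskResult` schema, bucket boundaries, and demo data are illustrative assumptions, not HCAST's actual format.

```python
# Summarize agent success rate by human-calibrated task length (illustrative sketch).
from dataclasses import dataclass

@dataclass
class TaskResult:
    name: str
    human_minutes: float   # calibrated human expert completion time
    agent_succeeded: bool  # did the AI agent complete the task autonomously?

BUCKETS = [(0, 15), (15, 60), (60, 240), (240, float("inf"))]  # minutes

def success_by_human_time(results: list[TaskResult]) -> dict[str, float]:
    summary = {}
    for lo, hi in BUCKETS:
        in_bucket = [r for r in results if lo <= r.human_minutes < hi]
        if in_bucket:
            rate = sum(r.agent_succeeded for r in in_bucket) / len(in_bucket)
            summary[f"{lo}-{hi} min"] = rate
    return summary

if __name__ == "__main__":
    demo = [  # made-up results purely to exercise the function
        TaskResult("fix-lint-error", 10, True),
        TaskResult("write-scraper", 45, True),
        TaskResult("debug-race-condition", 180, False),
        TaskResult("build-ci-pipeline", 600, False),
    ]
    for bucket, rate in success_by_human_time(demo).items():
        print(f"{bucket}: {rate:.0%} agent success")
```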
Comparison Summary
| Tool | Radar ring | Type | License |
|---|---|---|---|
| Humanity's Last Exam (HLE) | assess | open-source | CC-BY-4.0 |
| MMLU (Massive Multitask Language Understanding) | hold | open-source | MIT |
| HCAST (Human-Calibrated Autonomy Software Tasks) | assess | open-source | MIT |