TL;DR: Benchmark datasets on Hugging Face can now host leaderboards, models store their own eval scores, and the community can submit results via PR. Verified badges show that a result can be reproduced. Let's be real about where we are with evals in 2026.