Dynabench: Rethinking Benchmarking In NLP
@owid.dynabench
This dataset tracks how AI evaluation benchmarks have progressed over time as they adapt to rapid advances in AI technology. The benchmarks cover a wide range of tasks, from language understanding to image processing, and are designed to test AI models' capabilities across domains. Each row records a benchmark, a year, a performance score, and the assessment domain the benchmark belongs to, giving insight into model proficiency in different areas of machine learning research.
CREATE TABLE owid_dynabench (
"benchmark" VARCHAR,
"year" INTEGER,
"performance" FLOAT,
"assessment_domain" VARCHAR
);
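
As a rough illustration of how this schema might be queried, the sketch below loads the table definition into an in-memory SQLite database and retrieves the highest recorded score per benchmark. The rows, benchmark names, and domain names are placeholders invented for the example, not values from the dataset. The helper `best_per_benchmark` is hypothetical, and the sketch assumes that a higher `performance` value means a better result, which the schema itself does not state.

```python
import sqlite3

# Table definition taken from the schema above.
DDL = """
CREATE TABLE owid_dynabench (
    "benchmark" VARCHAR,
    "year" INTEGER,
    "performance" FLOAT,
    "assessment_domain" VARCHAR
);
"""

# Placeholder rows for illustration only; these are NOT real dataset values.
SAMPLE_ROWS = [
    ("benchmark_a", 2018, -0.50, "domain_x"),
    ("benchmark_a", 2019, -0.10, "domain_x"),
    ("benchmark_b", 2019, -0.80, "domain_y"),
    ("benchmark_b", 2020, 0.05, "domain_y"),
]


def best_per_benchmark(conn):
    """Return (benchmark, assessment_domain, best performance) per benchmark.

    Assumes higher performance is better (an assumption of this sketch).
    """
    query = """
        SELECT benchmark, assessment_domain, MAX(performance) AS best_performance
        FROM owid_dynabench
        GROUP BY benchmark, assessment_domain
        ORDER BY benchmark
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany("INSERT INTO owid_dynabench VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

result = best_per_benchmark(conn)
for row in result:
    print(row)
```

The same `SELECT` statement should work largely unchanged against other SQL engines; SQLite is used here only because it ships with Python and needs no setup.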