Community-Aligned AI Benchmarks

AI is reshaping the world, so the public needs to shape AI. With Community-Aligned AI Benchmarks, Aspen Digital is putting the public in the driver’s seat, enabling people to define the goals for future AI research and development so that AI reflects their priorities.

THE CHALLENGE

Today’s AI development ecosystem is out of step with what many people want and need. Communities around the world are rightly concerned about their safety, livelihoods, and futures. Without adequate public input, AI developers may continue to pursue goals that are unrelated—or even counter—to the public interest. In order for AI systems to uphold the values and priorities of the public, we must proactively create mechanisms to meaningfully steer the frontier of AI research.

OUR APPROACH

Developers of new AI systems are not adequately incentivized to center the public interest in their work. The proliferation of “AI for good” hackathons and projects demonstrates the AI community’s appetite to apply technical skills in the public interest, but these initiatives regularly fail to deliver durable impacts. Fundamentally, the capabilities prioritized in today’s AI systems are not the types of transformative capabilities that communities and organizations around the world need to make the world a better place.

Starting with food security, the public’s number one goal for a better future, we’re bringing together subject matter experts and community leaders with machine learning researchers to identify concrete challenges and opportunities for impact.

ABOUT A.I. BENCHMARKS

When people develop machine learning models for AI products and services, they iterate to improve performance.

What it means to “improve” a machine learning model depends on what you want the model to do, like correctly transcribe an audio sample or generate a reliable summary of a long document.

Machine learning benchmarks are similar to standardized tests that AI researchers and builders can score their work against. Benchmarks allow us to both see if different model tweaks improve the performance for the intended task and compare similar models against one another.
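To make the standardized-test analogy concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the four-question benchmark, the two toy "models" (simple lookup tables standing in for real systems), and the exact-match scoring rule. Real benchmarks are far larger and often use more nuanced metrics.

```python
from typing import Callable, List, Tuple

# Hypothetical benchmark: (question, expected answer) pairs.
BENCHMARK: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
    ("3 * 3", "9"),
]


def score(model: Callable[[str], str], benchmark: List[Tuple[str, str]]) -> float:
    """Return the fraction of questions answered correctly (exact match, case-insensitive)."""
    correct = sum(
        1
        for question, expected in benchmark
        if model(question).strip().lower() == expected.lower()
    )
    return correct / len(benchmark)


def model_a(question: str) -> str:
    # Toy stand-in for a model: knows two of the four answers.
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(question, "unknown")


def model_b(question: str) -> str:
    # Toy stand-in for a tweaked model: knows three of the four answers.
    answers = {"2 + 2": "4", "capital of France": "paris", "opposite of hot": "cold"}
    return answers.get(question, "unknown")


# Score both models against the same benchmark so they can be compared.
scores = {"model_a": score(model_a, BENCHMARK), "model_b": score(model_b, BENCHMARK)}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Because both models are scored against the same questions with the same rule, the numbers are directly comparable. That also means the choice of questions determines what counts as "better," which is why it matters who decides what goes into a benchmark.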

Some famous benchmarks in AI include ImageNet and the Stanford Question Answering Dataset (SQuAD).

Benchmarks are important, but their development and adoption have historically been somewhat arbitrary. The capabilities that benchmarks measure should reflect what the public wants AI tools to be and do.

We can build positive AI futures, ones that emphasize what the public wants out of these emerging technologies. To get there, it’s imperative that we build benchmarks worth striving for.

Measuring A.I. Matters

Benchmarks are just one piece of the picture. There’s a whole ecosystem of AI evaluations dedicated to measuring AI systems and how they are used in the real world. Other organizations are already focused on evaluating the impacts of AI models after they are built. To complement this important work, Aspen Digital is approaching this space from early in the development cycle, when the goals and targets for a system are still being set.

FOLLOW ALONG

Feeding the Future (event)

Intelligence in the Public Interest (report)

Making the Most of the Global Digital Compact (article)