Aspen Digital

Community-Aligned AI Benchmarks

Many people today do not feel that AI works for them. They are concerned about their privacy, their livelihoods, and their futures.

We need to build AI systems that serve the public. To do that, we have to understand what people actually want from these systems and build the technical and policy structures that incentivize that kind of development.

Other organizations are already focused on evaluating the impacts of AI models after they are built. To complement this important work, Aspen Digital approaches the problem earlier in the development cycle, when the goals and targets for a system are still being set. We focus on reimagining the technical machine learning benchmarks that drive model development so that they reflect and encode public values.

When people develop machine learning models for AI products and services, they iterate to improve performance. 

What it means to “improve” a machine learning model depends on what you want the model to do, such as correctly transcribing an audio sample or generating a reliable summary of a long document.

Machine learning benchmarks are like standardized tests that AI researchers and builders can score their work against. Benchmarks let us both check whether a given model tweak improves performance on the intended task and compare similar models against one another.
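To make this concrete, here is a minimal sketch of what “scoring against a benchmark” can look like, written in Python with toy data and stand-in models. Everything here is hypothetical, not a real benchmark or evaluation harness:

```python
# Minimal sketch of benchmark scoring (hypothetical data and models).
# A "benchmark" here is just a fixed set of inputs paired with expected
# outputs, and a model's score is the fraction of examples it gets right.

benchmark = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 5", "15"),
]

def score(model, benchmark):
    """Return the model's accuracy on the benchmark (0.0 to 1.0)."""
    correct = sum(1 for prompt, expected in benchmark if model(prompt) == expected)
    return correct / len(benchmark)

# Two stand-in "models": in practice these would be calls to real systems.
model_a = {"2 + 2": "4", "capital of France": "Paris", "3 * 5": "16"}.get
model_b = {"2 + 2": "4", "capital of France": "Paris", "3 * 5": "15"}.get

# The shared, fixed test set is what makes the comparison meaningful.
print(f"model A: {score(model_a, benchmark):.0%}")  # 67%
print(f"model B: {score(model_b, benchmark):.0%}")  # 100%
```

Because every model is graded on the same fixed test set, developers can see whether a change helped and how their system stacks up against others. It also means that whatever a benchmark measures becomes the target that development optimizes toward.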

Benchmarks are important, but their development and adoption have historically been somewhat arbitrary. The capabilities that benchmarks measure should reflect what the public wants AI tools to be and do.

We can build positive AI futures, ones that emphasize what the public wants from these emerging technologies. To get there, we need to build benchmarks worth striving for.

This work is made possible thanks to generous support from Siegel Family Endowment.