It’s time to build technical benchmarks that prioritize community needs and values. This will incentivize the development of AI systems that people actually want and need.
THE CHALLENGE
People today do not feel like AI works for them. They are concerned about their privacy, their livelihoods, and their futures.
We need to build AI systems that serve the public. To do that, we need to understand what people actually want from these systems and to build the technical and policy structures to incentivize that development.
OUR APPROACH
Other organizations are already focused on evaluating the impacts of AI models after they are built. To complement this important work, Aspen Digital is approaching this space from early in the development cycle, when the goals and targets for a system are still being set. Our priority is reimagining the technical machine learning benchmarks that drive model development so that they reflect and encode public values.
CONTRIBUTE TO THIS WORK
Want to elevate community priorities for AI tools? Or spur academics and technologists to develop value-driven benchmarks? Please reach out about how to contribute to this work.
ABOUT A.I. BENCHMARKS
When people develop machine learning models for AI products and services, they iterate to improve performance.
What it means to “improve” a machine learning model depends on what you want the model to do, like correctly transcribe an audio sample or generate a reliable summary of a long document.
Machine learning benchmarks are like standardized tests that AI researchers and builders can score their work against. Benchmarks let us both see whether different model tweaks improve performance on the intended task and compare similar models against one another.
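To make the standardized-test analogy concrete, here is a minimal sketch of what scoring against a benchmark can look like. Everything in it is a hypothetical stand-in: the four test items, the exact-match scoring rule, and the two toy "models" are assumptions made for illustration and are not drawn from ImageNet, SQuAD, or any other real benchmark.

```python
from typing import Callable, Sequence

# Hypothetical toy benchmark: a fixed set of (question, expected answer) pairs.
# Real benchmarks hold thousands of items and often use more nuanced metrics.
BENCHMARK: Sequence[tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
    ("3 * 3", "9"),
]


def score(model: Callable[[str], str],
          benchmark: Sequence[tuple[str, str]] = BENCHMARK) -> float:
    """Return the fraction of benchmark items the model answers exactly right."""
    correct = sum(1 for question, expected in benchmark
                  if model(question) == expected)
    return correct / len(benchmark)


# Two stand-in "models" (assumed for illustration): simple lookup tables.
def model_a(question: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(question, "unknown")


def model_b(question: str) -> str:
    answers = {
        "2 + 2": "4",
        "capital of France": "Paris",
        "opposite of hot": "cold",
    }
    return answers.get(question, "unknown")


# Every model takes the same test, so the scores are directly comparable.
scores = {"model_a": score(model_a), "model_b": score(model_b)}
print(scores)
```

Because both models are scored on the same items with the same rule, the higher number becomes the target that builders chase. Changing which items and which scoring rule go into the test changes what counts as "better."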
Some famous benchmarks in AI include ImageNet and the Stanford Question Answering Dataset (SQuAD).
Benchmarks are important, but their development and adoption have historically been somewhat arbitrary. The capabilities that benchmarks measure should reflect the priorities for what the public wants AI tools to be and do.
We can build positive AI futures, ones that emphasize what the public wants out of these emerging technologies. To get there, it’s imperative that we build benchmarks worth striving for.
ACKNOWLEDGEMENTS
This work is made possible thanks to generous support from Siegel Family Endowment.