
Intelligence in the Public Interest

How AI research is falling short & what to do about it

Hands holding rulers and other implements over a stylized representation of a deep neural net
June 18, 2025

AI investment isn’t slowing down, and much of it is informed by performance on AI benchmarks. Benchmarks are metrics used to evaluate and compare the performance of AI systems, and they shape the incentives of the field by defining what counts as “progress.” They guide AI developers toward the capabilities deemed worth pursuing, and developers use them to measure improvements in their models.

Unfortunately, the improvements being measured are often disconnected from what people need, and they are arriving at a time when distrust in AI is growing. Artists feel sidelined, educators worry about job security and classroom impacts, and activists raise concerns about cultural loss. While the contexts vary, the sentiment is consistent: many see AI as something being done to them, not with them or even for them.

If we want AI to serve the public good, we need new benchmarks that measure what actually matters: progress on issues like hunger, clean water, and sustainability.

Aspen Digital is leveraging AI benchmarking as a new way to include community voice in AI development. Many AI developers want their technologies to serve a social benefit, but are not versed in the needs and priorities of the public. Our approach is grounded in a simple idea: make it as easy as possible to do the right thing by translating public needs into the language that AI developers already understand. By defining new benchmarks that measure success in terms of impact on the United Nations Sustainable Development Goals (SDGs), we can realign incentives, encouraging researchers and engineers to build systems that tackle real-world problems and deliver tangible public value. 

What we’ve found:

  • The SDGs most prioritized by the global public are Zero Hunger, Clean Water & Sanitation, No Poverty, and Good Health & Well-being, with food security being the top priority.
  • Unfortunately, food security is one of the most at-risk SDGs with many of its subgoals stagnant or in regression.
  • On our current trajectory, a majority of experts do not believe AI will deliver progress on the SDGs anytime soon, with 42% of experts from lower- and lower-middle income countries, where food security is a more pressing public issue, saying it never will.
  • But evidence suggests that there is opportunity for AI to be helpful on over half of the SDGs, including for food security.
  • There is little alignment between where AI development today is focused and where it is needed, with only 3.6% of over thirteen thousand SDG-focused AI projects targeting food security.

Following a series of preliminary convenings, Aspen Digital will survey global food security experts to surface the biggest challenges in the field. Then, working with both community representatives and AI researchers, we’ll translate these challenges into benchmarks that reflect real-world needs and can guide future development.

This project is about moving beyond vague hopes that AI will help the world, toward concrete tools that will deliver the better AI futures we deserve.

A.I. progress should mean human progress

AI developers around the world are investing heavily in building larger and larger models and ever-more-capable systems. AI researchers have been caught up in this momentum, striving toward a much-hyped frontier: to emulate, replicate, or even surpass human intelligence, often referred to as Artificial General Intelligence or AGI.

AGI has a long history of tantalizing technologists. Some pursue it out of fascination with “impossible” problems, while others hope to bring about a new form of life. Many AI developers profess that their pursuit of AGI is in service of humanity, a claim that has become a major talking point of several AI researchers turned AI company executives.

AI should work in service of humanity, but the dominant measures of AI “progress” fail to reflect this pursuit. As a cross-sectoral coalition of authors recognize in a recent paper, “In a quest to achieve AGI, [AI research] communities often lose sight of the needs of people as a goal, in favor of focusing on just the technology.”

So what are the needs of people? And how can we help AI researchers keep those goals in their sights?

Many of the biggest and most urgent problems the world faces are expressed in the form of the United Nations Sustainable Development Goals (SDGs). The SDGs are a globally agreed upon set of targets that individuals, governments, and organizations use to measure their progress towards making the world a better place by 2030. Adopted unanimously by all UN Member States in 2015, the SDGs cover diverse topics from Gender Equality (#5) to Good Health and Well-being (#3). Each SDG is composed of subgoals with targets and indicators for measuring progress. While they are behind schedule and have faced their fair share of critique, the SDGs nevertheless represent important challenges for humanity to tackle that the AI research community should consider central to measures of progress.

If AI should work in service of humanity, today’s AI ecosystem seems poorly positioned to bring about the revolutionary benefits promised by AGI proponents. A 2024 UN survey of over 120 experts from 38 countries on the potential impact of current AI capabilities on the SDGs found that the majority believed AI was not currently delivering major progress, nor did they expect it to in the near future; 42% of respondents from lower- and lower-middle-income countries said they didn’t expect it ever to have an impact.

As Nobel Prize winning physicist Dennis Gabor wrote, “The future cannot be predicted, but futures can be invented. It was man’s ability to invent which has made human society what it is.” The last 15 years have seen tremendous progress in AI research on a number of capabilities including 3D modeling, voice transcription, image generation, and playing games like Go. According to IDC, worldwide spending on AI is expected to reach $631 billion by 2028, and around the world, many governments are investing in AI as part of their national strategies.

How can we ensure these AI investments are used to invent futures that serve all of humanity? How do we change the incentive structure of the AI research community? We need new, better goals for both what AI systems can be and do. We need to change what is rewarded. 

We need better benchmarks.

A.I. benchmarks are disconnected from public benefit

Benchmarks are a tool researchers and other AI developers can use to evaluate their AI models. They are like the standardized tests of the AI research world, used to both track progress over time and to compare and contrast how different models perform on a fixed objective. For example, the MS COCO dataset is used to evaluate model performance on tasks such as object recognition, object localization, and image captioning. 
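At its core, a benchmark pairs a fixed dataset with a scoring rule, so that any model can be compared on the same task. The sketch below illustrates that idea in miniature; it is a toy exact-match example, not MS COCO’s actual evaluation protocol, and the models and data are hypothetical.

```python
def evaluate(model, benchmark):
    """Score a model against a fixed set of (input, expected) pairs."""
    correct = sum(1 for x, expected in benchmark if model(x) == expected)
    return correct / len(benchmark)

# A toy "benchmark" for exact-match captioning, and two hypothetical models.
toy_benchmark = [("img1", "a cat"), ("img2", "a dog"), ("img3", "a bird")]
model_a = {"img1": "a cat", "img2": "a dog", "img3": "a fish"}.get
model_b = {"img1": "a cat", "img2": "a cow", "img3": "a fish"}.get

# Because the dataset and scoring rule are fixed, the two scores are
# directly comparable: model_a scores 2/3, model_b scores 1/3.
print(evaluate(model_a, toy_benchmark))
print(evaluate(model_b, toy_benchmark))
```

Real benchmarks use richer metrics (e.g., mean average precision for object detection), but the structure is the same: a shared test set plus a shared score.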

In fact, benchmarks are not only measures, but also drivers of the AI research frontier. They are relatively fast and easy for researchers to use, and they can bring significant reputational benefits for the developers of high-scoring models. For these reasons, benchmarks are widely used by both the research community and industry.

Benchmarking is related to but different from impact assessments (or other evaluation tools). While impact assessments are focused on performance and impact after a model is built and deployed, benchmarking is applicable earlier in the development cycle, while a model is being built.

Popular benchmarks can influence the type of research that gets funded and what types of projects are pursued. While benchmarks are not a perfect measurement tool (and there are some inherent limitations to the types of problem that can be solved or approached via benchmarking), their uptake is encouraged by the incentives driving the academic community. Performing well on a popular benchmark acts as a shorthand for successful research, which makes it easier to publish, easier to win grants, and easier to get support for research labs.

Many of today’s AI benchmarks are not constructed with important goals like the SDGs in mind. They prioritize compelling headlines (like passing the bar exam) or mimicking human performance in pursuit of AGI. As a humanitarian technologist we interviewed remarked, “Who said that [AGI] is what we should rally around? If we are to double down behind capabilities… we should probably agree on better things to be working towards.” Even researchers who are inspired by the goal of AGI ought to prioritize capabilities with the public interest in mind and ensure that the systems that they are building support the needs of people and the planet. At the 2025 AI Action Summit in Paris, Dr. Alondra Nelson said:

“It is not inevitable that AI will lead to great public benefits. The outcomes many of us hope for, or anticipate, are not inherent features of the technology itself. … These benefits will not emerge only from the invisible hand of market dynamics. They must be cultivated in partnership with civil society and with our democratic institutions. And most critically, AI in the public interest demands the meaningful involvement of the very people whose lives could be transformed by these technologies.”

Many communities see more harm than good from A.I.

Over the past six months, Aspen Digital has been interviewing people who are part of, or work with, communities likely to be impacted by AI deployment. We’ve spoken with a few dozen experts, asking them what their communities might actually want out of an AI system.

For many, the focus right now is responding to an onslaught of harms. A researcher who works with artists said that, although artists are typically excited about technology and tools, many feel increasingly sidelined by AI development, as most tools are not made with their vision and needs in mind. An educator told us that teachers are worried about threats to their jobs and how AI tools are impacting learning. A writer said there’s fear that the whole entertainment landscape will change irrevocably because public engagement with their work has already been disrupted by algorithmic recommendations. An indigenous activist is concerned about exploitation, even as they expressed a need for language support. 

These specific concerns reflect broader trends of mistrust. Despite their proliferation and the many billions of dollars invested in them, today’s AI systems are less reliable than advertised, yet hard to avoid, as buggy AI systems are increasingly embedded into other technologies like web search. Public trust in AI is eroding in many developed countries even as AI adoption rises. In the UK, for example, the word people most associate with AI is “scary.” If the public is to trust these technologies, people need more meaningful ways to shape what gets built and how.

While there are a number of organizations working to incorporate public input and impact assessment later in the AI lifecycle, like red-teaming generative AI outputs, there is a gap in the critical early development period when researchers decide what capabilities are worth pursuing. That early development period is primarily determined by the availability of research funding and publication potential, which is often contingent on demonstrations of early success, like promising scores on benchmarks. However, right now we lack measures of success which prioritize the needs of communities, leading to a gap in how the AI research agenda is set.

A new approach to building A.I. systems in the public interest

We have approached the challenge of incentivizing the development of public interest AI by setting aside the pursuit of “intelligence” as a goal in and of itself and instead pursuing capabilities that are helpful to the public. To do this, we are focusing on the intersection of public priority, global targets for prosperity like the SDGs, and the most promising opportunities for AI interventions. 

As part of incentivizing lasting change, Aspen Digital has taken the approach of “making it as easy as possible to do the right thing.” We believe that we can better align AI with the public interest by articulating problems in the language that AI developers are already familiar with, such as benchmarks. Experts in the public interest and experts in AI research come from different contexts, each bringing their own specialized language, knowledge, and experience. In order to bridge these worlds, we are taking the following steps:

  1. Find gaps in public interest AI research: Prioritize pressing public issues where AI research is lacking, where pressing issues are defined based on advancing a public priority
  2. Identify high-opportunity applications: Narrow in on new capabilities needed to fill a gap, as validated by public interest experts
  3. Co-design benchmark targets: Co-develop measures of success in the form of benchmark requirements with both public interest experts and AI researchers

Future work would involve a public contest for AI developers to create specific benchmarks (datasets and metrics) that meet the targets defined in the process above. Results would be judged by a panel of both public interest and AI research experts. We believe that this is the way AI contests in the public interest should be done.

To work on the first step outlined above, to find gaps in public interest AI research, we analyzed four gating factors: 

Global polling by Ipsos of 20,000 adults across 28 countries in 2021 shows that the public’s top five priorities are Zero Hunger (#2), No Poverty (#1), Good Health and Well-being (#3), Clean Water and Sanitation (#6), and Decent Work and Economic Growth (#8). This echoes the findings of a second paper surveying 366 sustainability experts from 66 countries. It similarly shows that their highest priority SDGs are Zero Hunger (#2), Clean Water and Sanitation (#6), and No Poverty (#1), with respondents from Asia, Latin America, the Caribbean, and Africa rating Zero Hunger significantly higher than their European, North American, and Oceanian counterparts. While it is difficult to accurately represent the sentiments of a global public, these surveys provide meaningful direction to what it means to build AI systems in the public interest.

We investigated which of the SDGs are most off-track, based on the categories and data in the UN’s 2024 report on SDG progress. Affordable and Clean Energy (#7) is the only goal that appears to be mostly on-track, with No Poverty (#1), Good Health and Well-being (#3), Industry, Innovation, and Infrastructure (#9), and Partnerships for the Goals (#17) following behind. Many goals were impacted by significant regression. Among them, Zero Hunger (#2), a high-priority goal for the public, is one of the most behind schedule.

The evidence reinforces this conclusion. Data from the UN shows that global hunger surged between 2019 and 2021 and has remained persistently high since, with approximately 733 million people affected in 2023, roughly one in eleven worldwide. In the same year, 2.33 billion people experienced moderate or severe food insecurity, an increase of 383 million since 2019. Child malnutrition remains a critical issue: 148 million children under five were affected by stunting in 2022, and at the current pace, one in five will still be stunted by 2030. Additionally, nearly 60% of countries faced moderately to abnormally high food prices in 2022, largely driven by disruptions caused by global conflicts.

Next, we cross-referenced these results with a survey paper looking at where there is published evidence of AI either enabling or inhibiting each subgoal of each SDG. We calculated a simple opportunity score that highlights goals which are not on track but where there is evidence that AI could help correct course. We found that opportunity was high for almost half of the goals, and notably low for Gender Equality (#5), Good Health and Well-being (#3), Affordable and Clean Energy (#7), and Partnerships for the Goals (#17). These results are explained by #3 and #7 being relatively on track, and by there being less published evidence of AI enabling any of the subgoals of #5 and #17.
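The report does not publish the exact formula for its opportunity score, but one minimal way to combine the two inputs it names is as a product of how off-track a goal is and how much published AI-enabling evidence exists for it, each normalized to 0–1. The sketch below uses that assumed formula with hypothetical illustrative values, not the report’s data.

```python
def opportunity_score(off_track, ai_enabling_evidence):
    """Assumed combination rule: high only when a goal is both off-track
    AND there is published evidence that AI could help (both in 0-1)."""
    return off_track * ai_enabling_evidence

# Hypothetical values for illustration only, NOT the report's figures.
goals = {
    "Zero Hunger (#2)":      (0.9, 0.6),  # badly off-track, AI evidence exists
    "Good Health (#3)":      (0.3, 0.8),  # relatively on-track -> low score
    "Gender Equality (#5)":  (0.7, 0.2),  # little AI-enabling evidence -> low
}
for name, (off, ev) in goals.items():
    print(f"{name}: {opportunity_score(off, ev):.2f}")
```

A multiplicative rule captures the “both conditions must hold” logic: a goal that is on track, or one with no evidence of AI usefulness, scores low either way.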

There is a body of existing work at the intersection of AI and the SDGs, from scholarly publications to startups to competitions to international multi-stakeholder convenings. According to a survey paper by researchers at Carnegie Mellon University, the published academic work in the “AI for Social Good” space is largely focused on health. The authors of a separate survey paper suggest this focus on health is in part because 

“…most of the projects in our survey already represent a ‘scaling down’ of existing technology. More specifically, most examples of AI×SDG reflect the repurposing of existing AI tools and techniques (developed in silico in academic or industrial research contexts) for the specific problem at hand.” 

We incorporated analysis from another UN report which surveyed experts on whether and where they expected AI systems to have positive impacts on the SDGs in the next three years. Experts feel that AI is most likely to positively impact Good Health and Well-being (#3) and Quality Education (#4) in the next three years, with Affordable and Clean Energy (#7), Climate Action (#13), and Sustainable Cities (#11) following behind. (N.B., Goals #8, #9, and #17 were not included in the figures we referenced from the report.) 

To validate these results, we looked at the relative concentration of AI projects to advance the SDGs in a database from AI for Sustainable Development Goals (AI4SDGs). Out of 13,592 listed projects (with some being applicable to multiple SDGs), the majority are concentrated in Industry, Innovation, and Infrastructure (#9) at 36%, Good Health and Well-being (#3) at around 25%, Affordable and Clean Energy (#7) at around 16%, and Quality Education (#4) tied with Sustainable Cities (#11) at around 11%. These results largely confirm the findings from the UN survey of AI experts above, indicating that there is an extant body of work in the AI space addressing SDGs #3, #4, #7, and #11, while Zero Hunger (#2) makes up only 3.6% of the database. This is concerning as it is also the goal which is most behind schedule.
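Because a single project can count toward multiple SDGs, per-goal shares are computed over the total number of projects and can sum to more than 100%. The sketch below shows that counting method on toy data; it does not reproduce the AI4SDGs database or its schema.

```python
from collections import Counter

# Toy project records; the "sdgs" field lists every goal a project targets.
projects = [
    {"sdgs": [9, 3]},
    {"sdgs": [9]},
    {"sdgs": [3, 7]},
    {"sdgs": [2]},
]

# Count each (project, goal) pairing, then divide by the project total,
# so multi-goal projects contribute to several shares at once.
counts = Counter(sdg for p in projects for sdg in p["sdgs"])
shares = {sdg: counts[sdg] / len(projects) for sdg in sorted(counts)}
print(shares)  # SDG 9 appears in 2 of 4 projects -> share 0.5
```

Applied to the 13,592-project database, the same calculation yields the concentrations cited above (36% for #9, ~25% for #3, and only 3.6% for #2).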

Animated gif illustrating the findings from the report

Food security: A first sector for community-aligned benchmarking

These findings show that there is little alignment between where AI development is focused today and where it is needed and wanted at the global scale of the SDGs, especially for the public’s number one priority, food security.

Building off of the work to find gaps in public interest AI research, the next phase of our work will focus on identifying high-opportunity applications for AI within food security. To do this, we will consult with community representatives in the design process of a series of benchmarks to address food security. We are running a series of summer convenings and a global survey for food security experts. The survey and these convenings aim to surface blockers across the food security ecosystem that would benefit most from focused attention. We will then work with both AI researchers and food security experts to co-design benchmark targets and challenges for the AI development ecosystem to solve.

We know that the world will continue to invest in AI. We also know that investment and research are often driven by performance on benchmarks. To ensure that these investments in AI actually support the public interest, we must develop benchmarks that measure the capabilities that matter most: those that advance progress on global priorities like the SDGs. We need to move beyond prognosticating that AI will be good for the world, and get to a concrete vision for how we want things to be. 

Acknowledgements

This work was made possible with the support of the Rockefeller Foundation.

Thanks also to Francisco Jure, Shanthi Bolla, and Amy Schenk for their support and contributions.

Share your thoughts

If this work is helpful to you, please let us know! We actively solicit feedback on our work so that we can make it more useful to people like you.

Intelligence in the Public Interest © 2025 by Aspen Digital is licensed under CC BY 4.0.
