Aspen Digital

Building a Fair Data Future

Photo: a volunteer working in a community center.
May 22, 2024

Only a few years ago, few people thought about their personal data beyond their credit scores or medical records. Data was the domain of scientists, analysts, policymakers, and the financial industry. Today, the sheer volume and variety of data generated by our use of new technologies, including smartphones and other IoT devices, has raised new awareness that our every action creates valuable personal information that can be mined and monetized by others, often without our knowledge, consent, or ability to opt out: data that is being collected, stored, used, and sold by an unknown number of actors.

That data can also be used to train multi-use AI models that are subsequently deployed to drive decisions on everything from financial accessibility to hiring. The result is a concerning imbalance: too few people control and capitalize on the data of the many, while individuals, communities, and society as a whole miss out on innumerable opportunities to harness that data for the greater good.

In order to more equitably realize the benefits, and protect people from the negative ramifications, every level of society needs to be part of the future data paradigm.

We extend our immense gratitude to all the members of the Council for a Fair Data Future for generously sharing their time and expertise over the span of six months during our virtual working sessions and our in-person convening. In addition to the members, we extend a heartfelt thank you to Jasmine McNealy, Associate Professor at the University of Florida, who participated in our in-person convening.

The Council members truly enriched this endeavor with their contributions:

  • Alex Krasodomski
  • Astha Kapoor
  • David Carroll
  • Diane Coyle
  • Dirk Bergemann
  • Emily Chi
  • Fred Benenson
  • Jack Hardinges
  • Jeni Tennison
  • John Wilbanks
  • Kelly Jin
  • Khuram Zaman
  • Lili Gangas
  • Lorrayne Porciuncula
  • Manar Waheed
  • Matt Prewitt
  • Natalie Evans Harris
  • Parminder Singh
  • Paromita Shah
  • Rajesh De
  • Ravi Naik
  • Robert (Bob) Fay
  • Sushant Kumar
  • Sylvie Delacroix
  • Valentina Pavel

Thank you to members of the Aspen Digital team who supported the many aspects of this work. This was a perfect example of how it takes a village to do good work, and this team is amongst the best people to have behind you:

  • Beth Semel – Event Support 
  • Eleanor Tursman – Event Support
  • Morgan McMurray – Researcher, Writer
  • Ryan Merkley – Project Lead, Facilitator, Writer
  • Shanthi Bolla – Project Manager, Writer
  • Vivian Schiller – Advisor

Finally, thank you to the Omidyar Network for supporting this work and for their commitment to a fair data future. The team provided crucial guidance and expertise in the understanding of the data economy today and how it could evolve tomorrow to benefit the many:

  • Govind Shivkumar
  • Liza Paudel
  • Sushant Kumar

To address this imbalance, the Aspen Institute assembled the Council for a Fair Data Future, a global community of academics, technologists, policymakers, and civil society actors from across the data-driven economy, each a leader in their areas of focus. Our core question: how can we help establish a future where the benefits of data accrue to individuals and communities, not solely to private companies?

By design, each member of the Council brought a different focus to the work, revealing the complexity and difficulty of identifying common areas of interest. The group approached the topic through a variety of entry points: some academic, some systemic, some specifically attuned to particular specializations, communities, risks, or harms. Some members ambitiously sought to explore benefits yet to be imagined; others cautiously attempted to rein in the worst harms and most egregious exploitation of data, which too often hurts marginalized communities. The Council conceded that the topic is too broad to yield a single overarching solution. Instead, it chose to recommend a series of independent but interconnected remedies aimed at expanding access for all while curtailing the concentrated power of corporations and governments, in pursuit of a fair data future.

Each of the Council’s ideas and solutions should be thought of as a lever in a complex, multilevel system with many moving parts, one that needs to be built with the intention of empowering the communities who — intentionally or not — are the sources of the data that informs and advances our world. Where we found broad agreement was on the need for affected communities to be engaged, empowered, resourced, and centered in any future discussions. Too often communities, particularly those that are historically marginalized or underserved, are overlooked or intentionally left out.

The conversations of the Council took place during the explosion of generative AI tools in 2023, after the launch of ChatGPT and its counterparts raised concerns, excitement, and uncertainty across industries, especially as the general public became more aware of the data used to train these tools. While it was important to allow space for a discussion of AI, the group was encouraged to move on to other areas of conflict common in the data space and to think through some of the areas crucial to a new fair data paradigm.

The group convened online over a period of seven months to align on its understanding of the issues and opportunities, and to discuss how to build a fair data future at a time when data has become the critical fuel for innovation. The group of 30 experts was split into two cohorts that met monthly, starting with a very broad question: what is the current state of the data paradigm, and what should a future version of it look like? From there, a list of discussion areas was developed and referred to in the following virtual meetings. At the same time, space was made for members to comment on, question, or concur with the discussion happening in the other group.

The virtual discussions culminated in an in-person convening in July 2023, when the Council members met in New York City for two and a half days to dive deep into the areas of interest, agreement, and conflict that had emerged throughout the process. The aim of the meeting was to build on the work the Council had done virtually and to develop insights and tangible recommendations that enhance data equity and shared collective benefits.

From its online sessions, the group identified three major themes for exploration:

  • Empowered Communities
  • Regulations and Policies
  • Corporate Behavior

The Council for a Fair Data Future chose to center communities, especially marginalized and underserved communities, in every conversation. “Community” in the context of data can mean many different things, from a geographic area, to an ethnic or cultural group, to individuals and families with specific medical conditions, to users of a common platform or service. For the most part, communities are unable to obtain data that could be used to generate collective knowledge and insights, as they are frequently excluded from its collection, redistribution, and use.

The Council agreed that there can be no real progress without the inclusion and consistent involvement of the communities whose data is being utilized. It followed that thread to explore ways that communities could engage directly in the collection, management, and use of their own data to enhance or advocate for themselves. Marginalized and underserved communities in particular could make significant progress towards any number of goals if they were able to utilize the treasure trove of data collected about them and the factors affecting them.

Data rights advocates have already begun to strike out on their own to collect and analyze data in support of specific communities. Dr. Desi Small-Rodriguez uses a traveling trailer she has dubbed the “Data War Pony” to support data autonomy for Indigenous Peoples, assisting with the collection and analysis of data including “language repositories, health assessments, demographic and economic surveys, and even fish counts”. With this information, Indigenous communities can properly advocate for the resources owed to them, learn how to better address common health concerns, and make decisions about vital sources of food and employment.

The vast majority of data is collected, stored, and exploited by private actors. As new AI tools emerge, those private actors not only pursue data pulled from their own products, but are increasingly reliant on alternative sources of data acquired from third party data vendors or scraped from public-facing websites, despite objections from the owners of those websites or their users.

Companies and users are scrambling to find ways to protect publicly available data. LinkedIn has updated its terms of service to ban data scraping in an attempt to stop hiQ Labs from using this method to help determine when an employee is most likely to quit. On the other side of the coin, Google is currently fighting a class action lawsuit from eight anonymous individuals who claim “the company’s scraping of data to train generative artificial-intelligence systems violates millions of people’s privacy and property rights”. Meanwhile, some companies, like Reddit and Quora, are signing licensing deals with AI providers for access to content for use in training, often over the objections of the users who created that content. As the question of what is and is not permitted is fought out in courtrooms over the next few years, it is clear that the collection and combination of these datasets has created new implications for privacy, ethics, intellectual property, and human rights as they are turned into products. The speed of new product development, deployment, and mainstream adoption has outpaced essential public policy discussions that might have led to more responsible deployment, or to an opportunity to weigh the potential harms and benefits to communities.

The Council shared the concerns of leading data rights organizations, advocates, and academics about the lack of transparency provided by companies who collect, exploit, and broker the sale of data. In a new data paradigm, the Council called for new norms that would require the collectors of data to articulate their data sources, methods, and intent prior to collection.

The Council pushed itself to look beyond how data is traditionally used and to ask not only what can be done with data, but what should be done with it. For instance, heart disease is the leading cause of death in Black communities in the US. Should corporate healthcare researchers trying to develop new medications be given greater access to the health data of Black patients? Given the sensitive nature of the data, and the history of data from Black communities being used against them under the guise of helping them, should greater access be given at all? These questions forced the Council to think more carefully about data, and they should be applied to any entity that handles it. If the answers to these questions were required to be public, great strides could be made towards a more transparent and equitable approach to the collection and use of data.

Regulations provide an important opportunity to set rules and standards for corporations, governments, and all other entities, private or public, that interact with data. Existing laws at various levels of government, such as the European Union’s General Data Protection Regulation or, more recently, the California Consumer Privacy Act, are being used to rein in bad actors, but they focus primarily on protecting an individual’s data and do not go far enough even in this area. While progress continues on the data privacy front, movement towards opening data stores has been much slower, even though those stores hold enormous possibilities to benefit the collective good, from advancing research fields to protecting the environment to increasing the success rate of social programs.

Regulations can also protect and empower communities, and they can push corporations and even governments towards a more transparent data paradigm, which in turn can lead to fairer outcomes. Unfortunately, today we see a considerable amount of regulatory capture of governing bodies by corporate actors, who too often use their economic power to shape laws to their own benefit. The Council discussed the need for more data and technical experts among lawyers, judges, and legislators in order to reduce governments’ reliance on the expertise of corporations when authoring new regulations.

In a new data paradigm, advanced regulatory benefits could include: enhanced privacy protection for citizens; anti-monopoly and data-combination restrictions; and enhanced transparency and required disclosure into how data is collected, combined, brokered, and employed in systems, especially those used in decision-making that impacts the lives of individuals and communities.

By the end of the convening, the Council had generated close to 100 ideas and recommendations for consideration. They ranged widely, from the achievable to the improbable, and while few ideas garnered the complete consensus of the group, one general area of agreement was the unique power of philanthropy to lead in this space.

Funders can play an important role in effecting change in each of the three themes mentioned above: empowered communities, regulations and policies, and corporate behavior. They are influencers of public policy, trusted collaborators with communities, and often large data utilizers themselves, either directly or indirectly through the projects they fund.

Philanthropy has an opportunity to lead toward a fairer data future by breaking through the noise around the advancement of generative AI and elevating issues such as community involvement, enforcement mechanisms, and corporate behavior, while also pushing the conversation forward about what the data opportunities of the future might be. In the rush of that advancement, ethical, transparent, and community-focused processes for collection, analysis, sharing, and use have become an afterthought at best, and an intentional omission at worst.

Funders can lead by example and establish new norms and higher expectations within their own organizations, with the organizations they fund and engage with, and in the subject areas and peers they influence.

  • Build, implement, and maintain robust data management systems that can be directed towards community uses and benefits. These organizations can also provide upskilling opportunities to their staff to enable and ensure proper data use.
  • Incentivize grantees that collect and use data to become better stewards of that data, which may include the creation of collaborative organizations in the form of data trusts or cooperatives. This would give groups such as academics and advocates better access to data that could help them advance their work.
  • Prioritize or require good data principles in foundation investments, using LPs, pension funds, endowments, and the metrics that drive business.
  • Invest in projects that make data available for community use. For example, the Asian American community comprises over 20 distinct groups, but more often than not, all data collected from these groups is put in the same bucket, with no way to disaggregate it by racial or ethnic background. At the same time, many communities still pass down vital information about their cultures only orally, and more and more of that information is being lost to time; if recorded properly, it could help close the digital divide within those communities. The resulting digital data can unlock progress towards more representative datasets, foundational AI models, and products.
  • Support community engagement and upskilling so communities can better control data collection, management, and use. Funders should look for opportunities to support the training of local leaders who are already doing some form of data management, often in an unofficial capacity. Trusted local institutions, such as libraries, can be partners in this work. Community training has the added benefit of opening up other areas of stable work that can provide financial stability and advancement for underrepresented and marginalized communities.
  • Consider data collection as an opportunity to improve the equitable and accurate representation of communities. Funders should ensure that data related to communities represents them as accurately as possible, and where that is not possible, transparently acknowledge who is missing from the data and why. For example, AI healthcare tools used to identify the health needs of older adults are often trained on data from predominantly younger populations, potentially leading to incorrect diagnoses or trend predictions.
  • Establish data impact assessments as a requirement of all grantees. That might include community effects, equity assessments, privacy evaluations, or human rights criteria. This would go a long way towards preventing harms and lead to more inclusive outcomes.
  • Support an inventory of both existing and new policies and regulations that are being developed, debated, and implemented at an accelerated pace. Attempting to keep up with technological advancements, particularly as it relates to artificial intelligence, is difficult. Such a database would be an invaluable resource to anyone seeking to ensure communities benefit from data, and are protected from harm.
  • Give space for research on the possible long-term impacts that different standards and regulatory systems might have in the future. With support to think through outcomes beyond the issues of today, experts will be better equipped to inform and advise regulatory bodies as technologies continue to advance.
  • Establish a set of foundational questions that need to be answered by every applicant and grantee regarding the collection or utilization of data sets. These might include:
    • Why is the data being collected or used? What is the context in which it is collected or used?
    • What is your data stewardship plan, and how are you evaluating your suppliers and partners who will access or collect data?
    • What is your plan for public disclosure of data collected?

While most attention is focused on the technology that the data powers, there is a very real opportunity to shift the data economy towards one in which the benefits of data accrue to individuals and communities rather than solely to a few private companies. After discussing and debating for nine months, the Council for a Fair Data Future concluded that while this is a lofty goal, it is a crucial one to work towards.

Funders are uniquely positioned, both as major influencers in all sectors and as changemakers in their own right through the projects, organizations, and people they fund. As a result, the Council created a set of recommendations to help guide funders towards leading the data economy into a more equitable and beneficial future for all. These recommendations focus on creating new structures and altering current ones, while also exploring ways to ensure that more diverse voices are elevated and often-ignored communities are heard.

This is an opportunity for Funders to reimagine an industry that impacts every single person in the digital age: to re-establish it for the better, mitigate the worst, and create a future that allows for innovation, real privacy, and creative collaboration. Multiple levers need to be established in all sectors and at all levels for there to be a truly fair data paradigm, and while Funders are just one piece of a very large, incomplete puzzle, they can and should be at the forefront of building a strong and equitable foundation for the data paradigm powering this new age of digital innovation.
