Aspen Digital

Responsible Data Practices for Product Equity 

February 4, 2025

This evolving resource is primarily intended for product teams—such as researchers, designers, product managers, and strategists—and data teams, including scientists, architects, and analysts, to guide their work on Product Equity and responsible data practices. Practitioners in compliance, security, legal, and academia may also find this resource helpful.

The examples and case studies are based on extensive research and consultation with Product Equity practitioners and civil society groups.

To effectively build products for all, and especially for systemically marginalized communities, we need to deepen our relationship with the people using the products. Anyone engaged in product design and development needs to understand the diverse identities, contexts, and experiences that inform how people interact with the world—and with products. Product Equity is not only about eliminating and minimizing harm—it also seeks to introduce and enhance positive experiences, ultimately enabling people to thrive.

Doing so means intentionally creating an experience that empowers individuals to succeed, grow, and feel fulfilled in their interactions with a product, service, or platform. It goes beyond just meeting people’s basic needs or expectations, focusing instead on fostering positive, empowering experiences that help them achieve their goals, improve their well-being, and develop their potential. In the context of digital platforms, fostering individual thriving can include elements like personalized experiences, ease of use, accessibility, and support for long-term success, contributing to overall user satisfaction and loyalty.

Demographic data can be a powerful tool for assessing and improving equity and fairness in products, but there are multiple ways to analyze and apply this data once collected.

Companies already collect data about their customers, but that data is not always helpful for Product Equity purposes. 

If companies do collect demographic data, it might be:

  • collected in a way that’s not inclusive;
  • segmented in a way that doesn’t give us enough information about particular user needs and experiences; and/or
  • collected in a way that makes the data unreliable for statistical analysis.

Responsible data practices require an eye toward harm mitigation. They also require an acknowledgment that demographic data is not like other data. For example, multinational companies may be interested in improving experiences for individuals in the LGBTQIA+ community, but they also may operate in countries where identifying as LGBTQIA+ is criminalized. How might this company build inclusive products for this community without putting those individuals at risk of harm?

This primer gives an overview of the main concerns product practitioners and their teams should consider before and during demographic data collection. Our goal is to empower practitioners to collect data safely and effectively, making harm mitigation a foundational part of how products are developed.

There are ethical dilemmas associated with collecting and using data to make decisions that impact people’s lives. Responsible data practices are essential in addressing these challenges. For example, considering how an individual’s gender or marital status might impact their creditworthiness raises important questions about fairness and transparency, as well as the potential for discrimination based on data-driven insights and decisions.

Discriminatory practices and bias in data collection: Algorithms may inadvertently or intentionally discriminate against certain groups based on characteristics, experiences, and/or demographics due to incomplete data collection. For example, Daneshjou et al. (2022) document how dermatology AI models trained on datasets composed predominantly of images from lighter-skinned patients performed worse at predicting skin diseases for patients with darker skin tones.

Unequal access and experiences: Companies use collected data to develop products and services that may not benefit all individuals equally. Early versions of facial recognition technology were less effective at recognizing people with darker skin tones, a problem rooted in biased training data that lacked sufficient representation of those skin tones. This created unequal experiences for some people and perpetuated the digital divide by making certain features less accessible to some systemically marginalized groups.

Due to the potential harms listed above, ask yourself and your stakeholders if data collection is the only way forward. Perhaps there are other ways of understanding your target or current audience, such as data you can infer by proxy or publicly available sources of data.

Defining your use case ahead of time helps protect against collecting more sensitive data than is necessary. Establishing clear use case boundaries can also reduce the likelihood of later misuse. If you don’t know why you’re collecting a category of personal identity data, chances are your customers won’t know either, which can damage credibility and trust.


To collect demographic data effectively, teams need to define what data is needed and how it will be used. For the purposes of this primer, we define demographic data as measurable traits of any given population such as, but not limited to, age, gender, and race.

Before data collection, the practitioner’s goal is to understand how the data will be used in as much detail as possible. This practice may include any of the approaches outlined in the Using Demographic Data to Advance Product Equity section, including fairness testing. Fairness testing, however, is only necessary if it is a focus of your analysis, and will be explored further in the sections below.

The following sections detail each of the six components of the pre-analysis plan, outlining key considerations, promoting responsible data practices, and identifying relevant stakeholders.

Data Landscape Audit

The first step in the process is to understand the structures, governance, and datasets already in place. Companies rarely start with a blank slate, and a practitioner should not dive into data collection without developing a sense of the landscape. 

Data assessments should evaluate the impact of your company’s data collection practices on systemically marginalized groups to identify areas for improvement. Explore other data collection and retention policies that exist. Your team might determine that regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) are not doing enough for your use case.

Are there better models and/or approaches your company can adopt to ensure responsible data practices?

Minimizing harm by anticipating what could go wrong when collecting data is a top priority. Harms can arise from improper use, insecure storage, or data leaks, and they can manifest in many forms.

To address these risks:

Even if your company has established data retention policies and timelines, revisit those and decide if stricter policies are needed. 

Tips for data retention:

  • Only keep identity data for as long as it is necessary for your defined use case. 
  • Determine who in your internal team needs to have access and who does not. Establish these permissions before data collection begins.

To minimize risks associated with data breaches, regularly review and purge data that is no longer needed. Data leaks or theft can expose individuals to malicious actors, identity theft, financial loss, and other serious consequences. People who hold systemically marginalized identities are more vulnerable to these risks, and leaked data may expose these communities to additional negative—potentially lifelong—consequences.
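As one illustration of the purge step, the sketch below deletes identity records older than a defined retention window. This is a minimal sketch with assumed names: the `identity_data.db` database, the `demographic_responses` table, the `collected_at` column, and the 180-day window are all hypothetical, and real retention periods and deletion mechanics (including backups) should follow your own policies and legal review.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Illustrative retention window; set this according to your defined use case,
# company policy, and legal guidance.
RETENTION_DAYS = 180
cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)

# Hypothetical database, table, and column names.
conn = sqlite3.connect("identity_data.db")
with conn:  # commits on success, rolls back on error
    conn.execute(
        "DELETE FROM demographic_responses WHERE collected_at < ?",
        (cutoff.isoformat(),),
    )
conn.close()
```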

Equity assessments and fairness tests must be aligned with the product being evaluated. If the product is designed to address a specific user experience, the focus should be aimed at measuring disparities within that experience. This, too, should inform the design of the assessment or fairness test.

The nature of the product also shapes the discussions that follow the evaluation. Findings from an assessment or fairness test do not exist in a vacuum; any burdens or benefits identified for one group need to be weighed against other considerations. If the goal is to modify the product based on the findings, product teams should set realistic expectations about what can and can’t be changed.

Key Stakeholders
  • Collaboration with product managers and engineers is vital. They can provide insights into what aspects of the product are most important and what changes are feasible. Including these teams in the pre-analysis process builds goodwill and secures their buy-in.
  • Each participant in this process—lawyers, product managers, data scientists, UX researchers, and others—brings a unique perspective informed by their domain expertise. This stage is a chance for the researcher or fairness tester to understand the product team’s concerns and goals while building a collaborative relationship to effectively interpret the eventual findings.
  • Consulting external sources, such as law firms, academics, and NGOs, can provide valuable broader perspectives that may be difficult to capture internally.

At this stage, the team should align on the specific types of fairness or equity that are the primary concern, ensuring that responsible data practices are integrated into the process.

For example, if the team is conducting an equity-focused assessment, they might consider questions like:

  • Are certain demographic groups disproportionately facing barriers or gaining benefits from the product?
  • Are there differences in how the product or service is used or experienced by different demographic groups?
  • Do certain groups face higher risks, costs, or limitations in benefiting from the product?

In contrast, a fairness test may be used to evaluate an algorithm. For instance, suppose a new algorithm is developed to decide whether an individual qualifies for a loan. The team might ask the following questions (a minimal sketch of one such check appears after the list):

  • Is there racial disparity in who receives a loan offer? 
  • Relative to past default rates, is there racial disparity in who is approved for a loan? 
  • Are there differences in the average size of loans offered?
  • Does the effort required to secure a loan differ across demographic groups? 
  • Are there disparities in other positive outcomes?
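
To make these questions concrete, here is a minimal sketch of one such check, approval-rate disparity between two groups, using pandas and a two-proportion z-test from statsmodels. The `loans` DataFrame and its `race` and `approved` columns are hypothetical placeholders, and the numbers are made up for illustration.

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical loan-decision data; column names and counts are illustrative.
loans = pd.DataFrame({
    "race": ["Group A"] * 500 + ["Group B"] * 500,
    "approved": [1] * 320 + [0] * 180 + [1] * 260 + [0] * 240,
})

# Approval rate by group.
print(loans.groupby("race")["approved"].mean())

# Two-proportion z-test: is the gap in approval rates larger than
# sampling noise alone would explain?
counts = loans.groupby("race")["approved"].agg(["sum", "count"])
stat, p_value = proportions_ztest(counts["sum"].values, counts["count"].values)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```

A raw gap like this is a starting point, not a verdict; as discussed later in the pre-analysis plan, confounding variables and past outcomes (such as default rates) need to be considered before drawing conclusions.
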
Key Stakeholders

Gather input from a diverse range of stakeholder teams, including:

  • Product
  • Engineering
  • Data Science
  • Legal and policy
  • Communications and marketing
  • Employee resource groups and other relevant working groups
  • Most importantly, the individuals who will be impacted by the product

Engaging with these groups will help support a robust discussion about what fairness and equity mean in the specific context of the product being tested.

Once the outcome to be measured is selected, the pre-analysis plan should specify which data points are required. The type of demographic data needed will depend on the dimensions being assessed—for instance, is the evaluation going to measure disparities by national origin, gender, race, or some other attribute? You will want to prioritize the dimensions most relevant to your product; there is no one-size-fits-all solution.

For example:

  • Information on skin tone may be critical for ensuring a camera application is inclusive and performs equitably across its diverse user base.
  • For a personal finance app, barring a very specific reason for doing so, segmenting metrics by skin tone would be inappropriate and could open the door to misuse and user harm.

The team should also identify any existing data relevant to the assessment or fairness test. For example, in the previous loan scenario, past default rates or income might provide valuable context. The goal is to identify all data that might help explain a disparity.

When choosing attributes, consider intersectionality to address the nuanced experiences of people who belong to multiple marginalized groups. Without capturing intersectionality, there is a risk that these individuals’ experiences are lost in the aggregation of one of the identities. For example:

  • A Black woman or a disabled LGBTQ+-identifying individual might face distinct challenges when using your application that might be overlooked if you just analyzed race or LGBTQ+ status in isolation.

The level of granularity (or disaggregation) is another facet of acknowledging the nuanced experiences of your customers. When considering user groups, examine the level of granularity or coarseness a dimension of identity requires.  For example:

  • Is it sufficient to know that a person is “Asian,” or is it crucial to distinguish that they are “Japanese”?
  • Is it enough to know a person has accessibility needs, or should it be specified that they are visually impaired? 
  • Is it adequate to know that an individual selects the “LGBTQIA+” checkbox, or is it essential to know that the person is transgender?

Since many identity categories are based on social constructs—many of which are not created by members within those identity categories—it is important to apply responsible data practices to determine when granularity is necessary.

While identity data is valuable for measuring inclusivity, using such data to directly personalize user experiences without explicit, opt-in consent can be harmful. For example:

  • A recommendation engine that incorporates race or ethnicity predictions implicitly assumes that an individual’s preferences are, at least in part, determined by their racial or ethnic identity, which can effectively tokenize them and deny their individuality.
  • Personalization is often better served by emphasizing demonstrated behavior and directly expressed preferences over immutable identity characteristics.

The decision on the level of granularity is not one-size-fits-all: 

  • On the one hand, minimizing data collection can have privacy benefits, simplify the process, and support smaller-scale datasets. 
  • On the other hand, overly coarse granularity risks grouping identities in ways that obscure meaningful differences in the way your product is used.

Deciding on the right balance between intersectionality and disaggregation should be guided by the size of your data set. This is often an iterative process. This resource from the Partnership on AI grapples with the privacy/accuracy trade-offs at length. It can help anticipate challenges during the analysis phase and inform the establishment of feedback mechanisms and monitoring practices to validate that your chosen trade-off yields useful insights.
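
One practical way to ground that iteration is to check whether each disaggregated or intersectional cell in your dataset actually contains enough observations to analyze. The sketch below is a minimal, assumption-laden example: the `survey_responses.csv` file, the `race` and `gender` columns, and the minimum cell size of 100 are hypothetical stand-ins for your own data and thresholds.

```python
import pandas as pd

# Hypothetical dataset and threshold; adjust to your own data,
# statistical needs, and privacy requirements.
df = pd.read_csv("survey_responses.csv")
MIN_CELL_SIZE = 100

# Count observations in each intersectional cell (e.g., race x gender).
cell_counts = df.groupby(["race", "gender"]).size().rename("n").reset_index()

# Cells below the threshold are candidates for coarser categories,
# additional targeted data collection, or suppression in reporting.
too_small = cell_counts[cell_counts["n"] < MIN_CELL_SIZE]
print(too_small)
```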

Key Stakeholders
  • Data scientists can provide guidance on what data is required.
  • Legal teams can assess which variables may have substantive legal implications.

This process ensures that the data collected is both relevant and compliant, setting a strong foundation for equity and fairness evaluations.

If the team is pursuing a fairness test, then once there is agreement on the type of fairness being tested, the groups of interest, and the relevant variables, the pre-analysis plan should specify the statistical approach for assessing fairness. For example (a sketch of the first two approaches follows the list below):

  • To assess whether the average outcome for group 1 differs from that of group 2, a two-sample t-test of means can be performed.
  • If there are confounding variables—e.g., group 1 is more likely to be younger, have longer tenure on the platform, or live in a city—a multiple regression analysis that controls for these variables would be more appropriate.
  • For products undergoing an A/B test, fairness can be assessed by comparing the causal impact of the product across demographic groups.
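
As one illustration, here is a minimal sketch of the first two approaches using scipy and statsmodels. The `outcomes.csv` file, the `outcome` metric, the `group` indicator, and the control variables (`age`, `tenure`, `urban`) are hypothetical placeholders, not a prescribed schema.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Hypothetical data: one row per user, with an outcome metric,
# a demographic group label, and possible confounders.
df = pd.read_csv("outcomes.csv")

# 1) Two-sample t-test comparing average outcomes between two groups.
g1 = df.loc[df["group"] == "group_1", "outcome"]
g2 = df.loc[df["group"] == "group_2", "outcome"]
t_stat, p_value = stats.ttest_ind(g1, g2, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# 2) Multiple regression that controls for confounders such as age,
#    tenure on the platform, and urban residence.
model = smf.ols("outcome ~ C(group) + age + tenure + urban", data=df).fit()
print(model.summary())
```

The coefficient on the group indicator in the regression estimates the disparity that remains after accounting for the listed controls; which controls are appropriate is exactly the kind of decision the pre-analysis plan should settle in advance.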

Describing statistical tests in advance is considered the gold standard in the social sciences because it enforces discipline and guards against hindsight bias. This upfront specification reduces the risk of teams inadvertently manipulating the testing process by running repeated analyses to achieve more favorable results. Even in good faith, a team might add new control variables after finding a problematic result.

A well-defined pre-analysis plan clarifies the statistical approach from the outset, highlighting any deviations from the planned approach. Responsible data practices ensure transparency in these decisions. Departures are often warranted as new things are learned or new data comes in, but being clear that these are departures from the plan is key.

With the data and testing approach clearly outlined, the pre-analysis plan should anticipate potential outcomes and remediations. 

In the plan, write out all possible findings—e.g., group A does worse than group B; group A is equal to group B; group A does better than group B; and so on. For each of these findings, describe potential remediations, like retraining an algorithm, modifying the product, delaying the product launch, and so on.

Key Stakeholders

All teams should weigh in:

  • Engineering teams can assess technical feasibility.
  • Product teams can evaluate the practicality of remediations.
  • UX teams can gauge the impact of changes on the intended groups.
  • Legal teams can ensure defensibility with regulators.
  • Employee Resource Groups (ERGs) can provide insights into whether proposed changes will have meaningful impacts for their respective communities.

The period before data collection should be devoted to outlining a thorough pre-analysis plan. In doing so, the team can coalesce around a strategy and answer all the relevant questions necessary to embark on responsible data collection.

Once you’ve determined your use cases, established policies, and built safeguards, consider user rights around transparency, consent, and ownership during the data collection process.

Consider Co-Design

Ideally, this process is owned by user experience (UX) stakeholders, such as UX researchers and UX designers. Co-design fosters direct engagement with communities, enabling teams to build with, rather than just for, these groups.

Key elements of collaboration include:

  • Aligning research goals with participant needs and experiences
  • Providing fair compensation
  • Utilizing co-design methodologies
  • Sharing results in a way that benefits all involved organizations and individuals

Offer Self-Identification

Tools & Collection Methods

When selecting tools or vendors for data collection, confirm they align with your team’s privacy and security standards, as third-party platforms may have differing safeguards. 

Additionally, determine whether data collection will be conducted in-person, online, within a product experience, or separately. Each method requires careful consideration of user rights, transparency, consent, and data ownership.

Transparency is essential in explaining the purpose for data collection and addressing common concerns such as privacy, data storage, and retention. 

Consider creating a Frequently Asked Questions (FAQ) resource to answer key questions like “Who will have access to this data?” and “What data will they access?” Use clear and accessible language so that all participants, including those with disabilities or limited English proficiency, can understand. Keep communications concise and link to more detailed policies as needed.

Legal review of company policies and user communication is recommended at this stage. Clearly articulate the value exchange—how data collection will benefit participants—as this fosters trust and helps them make informed decisions.

Informed consent is a critical process often facilitated through written documents, such as Non-Disclosure Agreements (NDAs). However, participants may overlook these materials or be overwhelmed by legal jargon. To promote genuine informed consent, include a verbal explanation in moderated data collection settings, outlining the research approach, data types, access permissions, and any sensitive topics that will be discussed. 

Obtain affirmative consent for each element shared, and emphasize participants’ right to withdraw at any time without any negative consequences. To enhance understanding, consider using visual aids or videos, similar to Patient Decision Aids (PDAs) in healthcare, which help individuals make informed decisions based on their values and preferences.

Anonymity is another important consideration. Clearly communicate the actual level of anonymity participants can expect and their rights regarding data retention and retrieval. Participants should be made aware of the risks associated with sharing identifying information, from severe consequences like exposure to harm due to their identity or experiences, to milder yet still significant risks, like exclusion from systems. Never share personally identifiable information (PII), and ensure participants are aware of this from the outset.

Before launching externally, conduct internal testing with representative groups, such as employee resource groups (ERGs), to gather feedback and make necessary adjustments. 

Always offer participants the option to opt out of, or refuse participation in, any user research or data collection process. Additionally, individuals should have the option to withdraw their data after it has been collected. This mechanism plays a crucial role in ensuring that data collection is ethical, respects privacy, and adheres to consent principles.

Deciding which groups to include in data collection should align with the predetermined research goals and intended use cases, as well as the groups you may be co-designing with.

It is important to also address potential biases in the dataset. You may aim for proportional representation to mirror the general population or over-sample specific groups for more granular insights.

For example, oversampling is vital for disaggregating data by race and generating meaningful results for smaller populations like American Indian/Alaska Native and Asian American/Pacific Islander groups, often overlooked in broad categories. Even if you’re targeting a narrow market, it is beneficial to include diverse perspectives to gain a broader understanding or uncover contrasting insights.
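
As a small worked example of why oversampling matters, the arithmetic sketch below shows how large a proportionally sampled study would need to be to reach a target number of respondents from a small group. The population share and target are illustrative numbers, not recommendations.

```python
# Illustrative numbers only.
population_share = 0.013   # e.g., a group that makes up ~1.3% of the population
target_respondents = 200   # respondents needed for reliable subgroup estimates

# Under proportional sampling, the group's share of the total sample
# must by itself yield the target count.
total_needed = target_respondents / population_share
print(f"Proportional sampling would require ~{total_needed:,.0f} total respondents.")

# Oversampling recruits the target number from that group directly and then
# reweights during analysis so overall estimates stay representative.
```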

We hope this guide serves as a valuable starting point for product and data teams to adopt responsible data practices in advancing Product Equity. By using this resource, teams can better navigate the complexities of ethical data collection, minimize potential harms, and build more inclusive, equitable products. Future iterations will incorporate broader insights and case studies to further refine and strengthen this resource.

We invite you to collaborate with the Product Equity Working Group to help shape industry-wide best practices. Through collective effort, we can accelerate change and work toward a future where more people have access to all that tech has to offer. 

If you are interested in joining the Product Equity Working Group, please email us at TechAccountabilityCoalition@AspenInstitute.org. For more Product Equity resources or to learn about the Product Equity Working Group, visit our Resource Hub.

We extend immense gratitude to the members of the Product Equity Working Group, both past and present, for sharing their time and expertise over the past two years to develop this foundational guide on responsible data practices. Their perspectives were crucial in ensuring we approached this work with care and consideration. A special thanks to our Chair, Dr. Madihah Akther (Product Inclusion Manager, PayPal), who drove this initiative forward, ensuring care and consideration at every step of the process.

Thank you to the subject matter experts who reviewed drafts of this primer and were immensely generous with their time and expertise:

  • Miranda Bogen
  • Daniel Camacho
  • Dr. Symone Campbell
  • Dr. Jen King
  • Eliza McCullough
  • Jaryn Miller
  • Tammarrian Rogers
  • Alex Rosalez
  • Dr. Dan Svirsky
  • Rebekah Tweed

This effort was led by the following team members at Aspen Digital: 

  • Zaki Barzinji, Senior Director, Empowered Communities
  • Cindy Joung, Product Equity Lead
  • Shreya Singh Hernández, Senior Research Manager, Tech Accountability Coalition

Responsible Data Practices for Product Equity by Aspen Digital is licensed under CC-BY 4.0.