Lineage Entries
EU AI Act by Anna Lenhart
EU Guidelines on General-purpose AI Models by Stan Adams
Safe, Secure, & Trustworthy Development & Use of Artificial Intelligence (EO 14110) by Dr. Ranjit Singh
California Bill SB-53 by Miranda Bogen
EU AI Act

Anna Lenhart
Policy Fellow at the Institute for Data, Democracy, and Politics at George Washington University
Read about Anna
Anna Lenhart is a Policy Fellow at the Institute for Data, Democracy, and Politics (IDDP) at George Washington University and a researcher at the University of Maryland Ethics and Values in Design Lab. Her research focuses on public engagement in tech policy and the intersections of privacy, transparency, and competition policy. She most recently served in the House of Representatives as Senior Technology Legislative Aide to Rep. Lori Trahan (117th Congress) and as a Congressional Innovation Fellow for the House Judiciary Digital Markets Investigation (116th).
Prior to working for Congress, Anna was a Senior Consultant and the AI Ethics Initiative Lead for IBM’s Federal Government Consulting Division, training data scientists and operationalizing principles of transparency, algorithmic bias, and privacy rights in AI and machine learning systems. She has researched the human right to freedom from discrimination in algorithms, public views on autonomous vehicles, and the impact of AI on the workforce. She holds a master’s degree from the Ford School at the University of Michigan and a BS in Civil Engineering and Engineering Public Policy from Carnegie Mellon University. Prior to graduate school, Anna was the owner of Anani Cloud Solutions, a consulting firm that implemented and optimized Salesforce.com for non-profit organizations.
Definition Text
‘General-purpose AI model’ means an AI model, including where such an AI model is trained with a large amount of data using self-supervision at scale, that displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications, except AI models that are used for research, development or prototyping activities before they are placed on the market
This is a Lineage Definition
Source text: Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act)
Place of origin: European Union
Enforcing entity: European Commission (EU AI Office)
Date of introduction: April 21, 2021
Date enacted or adopted: August 1, 2024
Current status: Enacted (with staged enforcement)
Motivation
The European Commission released the proposed AI Act in April 2021, introducing a risk-based framework organized around the intended purpose of an AI system—defined in line with OECD stakeholder processes. When OpenAI’s ChatGPT and similar generative AI tools captured the zeitgeist in late 2022, EU policymakers suddenly faced the challenge of fitting technologies capable of multiple, shifting purposes into a framework designed for narrowly defined use cases. During the subsequent “trilogues,” the Council, Commission, and Parliament deliberated over ways to include general-purpose AI models up until the last minute. The Parliament proposed a series of public-facing disclosures for all foundation models (defined similarly to GPAI above) regardless of risk category. The Council wanted to allow for self-regulation through a code of conduct, fearing that regulation would hinder innovation, especially among emerging European firms. The Commission presented a tiered approach, placing more burdens on models with high-impact capabilities. The final version included a tiered approach.
Approach
Requirements are tiered across three categories: GPAI models, GPAI models with systemic risk (GPAISR), and GPAI systems.
All GPAI models must follow a base level of obligations including maintaining up-to-date technical documentation of the model, making available relevant documentation to downstream providers (those who integrate the GPAI model into AI systems), implementing a copyright / text and data mining (TDM) policy to ensure the model respects EU copyright law, and providing a public summary of the content used for training (e.g., data sources, nature of the data).
If a GPAI model presents (or is presumed to present) systemic risk (GPAISR), additional obligations apply. The provider must notify the Commission (or relevant authority) that the model presents systemic risk; conduct model evaluations (including adversarial testing and assessment of misuse potential, emergent behavior, and similar issues); perform risk assessment and mitigation for systemic-level risks (including but not limited to public health, safety, societal impacts, and fundamental rights); report serious incidents (e.g., accidents, misuse, malfunctions) and corrective measures to the AI Office or national supervisory authorities; ensure cybersecurity protections and physical safeguards appropriate to the scale and impact of the model; and document and report the estimated energy consumption of training and deployment.
When a GPAI model is used within an AI system (i.e., coupled with other models, infrastructures and user interfaces), the system may fall under the broader “high-risk AI systems” regime in the Act (depending on its purpose/use case) and may trigger additional obligations. The tiered obligations focus first on the model provider side. But downstream deployers/integrators of GPAI systems still need to check whether their use of the model triggers “high-risk” classifications for the system and comply with those requirements.
Open source GPAI models do not have the same reporting requirements as proprietary GPAI models, reflecting the EU’s recognition of the transparency built into open source and decentralized development. However, if a GPAI model falls into the systemic risk category, proprietary and open source models are treated the same.
Weakness 1: Floating-point operations (FLOP) are a bad proxy for systemic risk.
Scholars argue that some systemic risks occur in all GPAI models regardless of computational power, most notably misinformation, bias, worker displacement, privacy violations, and content that inspires physical harm. Computer scientists note that GPAI capabilities can be improved via data quality, optimization breakthroughs, architectural adjustments, and other factors that do not rely on compute. The threshold outlined in the systemic risk definition may simply incentivize model developers to optimize models to fall just below it rather than to make their models safer.
Weakness 2: The systemic risk definition states it must be “specific to the high-impact capabilities of general-purpose AI models” which is vague and causes confusion.
There are many systemic risks to the Union from GPAI systems that are also present in AI systems based on algorithmic decision systems and machine learning systems that have been in production for decades (e.g. discrimination, dark patterns, errors, etc.). When these risks are present in a GPAI, are they “specific to the high-impact capabilities”? This question was actively debated during deliberations related to the code of practice.
Weakness 3: GPAI models that are not considered GPAISR can be used in high-risk AI systems, creating unfair/unworkable requirements for providers.
According to the AI Act, any person will be considered the provider of a high-risk system “if they modify the intended purpose of an AI system, including a general-purpose AI system, which has not been classified as high-risk and has already been placed on the market or put into service in such a way that the AI system concerned becomes a high-risk AI system.” Boine and Rolnick (2025) offer the example of a school teacher who decides to use an LLM to evaluate their students. It is unclear whether the teacher or school is then considered a provider of a high-risk system and required to complete the related risk management and documentation requirements. Advocates for strong regulation of GPAI emphasize that developers have the deepest insight into how these models are trained and their potential for harm.
One way to address this challenge would be to require the provider of the GPAI model to mitigate risks. The challenge with that approach is that a GPAI provider may not be able to predict every high-risk context in which its model may be deployed. For the contexts it can predict, it would likely have to train and tune a different model for each high-risk context, meaning the models would no longer be “general purpose.”
Reception
Implementation is underway at the time of this writing.
The ambiguity in “systemic risks” has frustrated stakeholders from a range of ideological standpoints, with some arguing that major risks, particularly from GPAI systems, will be left unchecked and reliant on self-reporting and compliance checkboxes, and others arguing that a range of deployers, including smaller and less-resourced organizations, will be caught up in high-risk scenarios and struggle to comply. The new Code of Practice added some additional clarity, but issues remain (described in the next section on the EU Guidelines on General-purpose AI Models).
The European Commission is facing pressure to simplify not only the AI Act but a range of EU tech regulation as part of a Digital Omnibus.
The EU AI Act has the potential to inform standards for how the world regulates AI systems including GPAI; it’s too early to tell what the exact impact of these definitions will be.
EU Guidelines on General-purpose AI Models

Stan Adams
Lead Public Policy Specialist for North America at the Wikimedia Foundation
Read about Stan
Stan Adams is the Wikimedia Foundation’s Lead Public Policy Specialist for North America. He advocates for laws and policies to help projects like Wikipedia thrive and works to protect the Wikimedia model from harmful policies. This work includes defending legal protections, like Section 230, that protect volunteer editors from frivolous lawsuits and advocating for stronger laws to protect the privacy of people who read and edit Wikipedia. Stan has spent nearly a decade working on tech policy issues including digital copyright, free expression, privacy, and AI. Prior to his role at the Wikimedia Foundation, Stan served as a general counsel to a US Senator and as a deputy general counsel at the Center for Democracy and Technology. Stan enjoys reading, listening to music, playing games, and spending time in the woods.
Definition Text
(17) Based on the considerations above, an indicative criterion for a model to be considered a general-purpose AI model is that its training compute is greater than 10^23 FLOP and it can generate language (whether in the form of text or audio), text-to-image or text-to-video.
(18) This threshold corresponds to the approximate amount of compute typically used to train a model with one billion parameters on a large amount of data. While such models may be trained with varying amounts of compute depending on how much data is used, 10^23 FLOP is typical for models trained on large amounts of data as of the time of writing (see the examples in Annex A.3).
(19) The modalities are chosen based on the fact that models trained to generate language – be it via text or speech (as a type of audio) – are able to use language to communicate, store knowledge, and reason. No other modality confers such a wide range of capabilities. Consequently, models that generate language are typically more capable of competently performing a wider range of tasks than other models. Although models that generate images or video typically exhibit a narrower range of capabilities and use cases compared to those that generate language, such models may nevertheless be considered to be general-purpose AI models. Text-to-image and text-to-video models are capable of generating a wide range of visual outputs, which enables flexible content generation that can readily accommodate a wide range of distinct tasks.
(20) If a general-purpose AI model meets the criterion from paragraph 17 but, exceptionally, does not display significant generality or is not capable of competently performing a wide range of distinct tasks, it is not a general-purpose AI model. Similarly, if a general-purpose AI model does not meet that criterion but, exceptionally, displays significant generality and is capable of competently performing a wide range of distinct tasks, it is a general-purpose AI model.
Lineage Definition
‘General-purpose AI model’ means an AI model, including where such an AI model is trained with a large amount of data using self-supervision at scale, that displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications, except AI models that are used for research, development or prototyping activities before they are placed on the market
Source text: Guidelines on the scope of the obligations for general-purpose AI models established by Regulation (EU) 2024/1689 (AI Act)
Place of origin: European Union
Enforcing entity: European Commission
Date of introduction: July 18, 2025
Date enacted or adopted: August 2, 2025
Current status: Enacted, with staged enforcement (rules for high-risk AI systems take effect later)
Motivation
Generally, the Guidance attempts to clarify the Commission’s intended scope and application of the definition by including the rationale for certain aspects of the definition and by focusing the definition to capture models the Commission currently wishes to regulate, but to regulate differently than “high-risk” models. By using a guidance document, rather than including these additional details in the definition, the Commission retains the ability to modify its guidance over time and to adapt its scope to fit future models. For example, if the scale of computing power needed to train a model or the typical number of parameters changes, the Commission will be able to issue new guidance rather than having to go through the more rigorous process of adopting a new definition. Although this approach provides less certainty for potentially regulated entities, the tradeoff is a more “future-proof” approach to regulation. However, given the rapid pace of development and the closed-door nature of emerging research at the leading firms, it is not clear whether this approach, despite its flexibility, will allow the Commission to adjust its regulatory scope quickly enough to keep up. More likely, the Commission’s evolving guidance will always be slightly behind (and perhaps not applicable to) the leading edge of development.
Approach
TL;DR:
- The Commission relies on “compute” and “modality” as primary factors in designating GPAI models.
- Assumptions based on frontier models may not hold true over time, but guidance can be more easily adjusted than legislative text.
- Edge cases will be difficult to capture under any approach.
Two factors to add certainty: compute and modality.
In general, the Commission’s approach is to augment the underlying definition from the AI Act with additional details, attributes, and thresholds to clarify the Commission’s current intentions for the scope of the regulation of “general-purpose AI models” under the Act. First, the Guidance adds a technical threshold based on the amount of compute used to train the model, expressed in the number of floating point operations or FLOPs. Second, the Guidance focuses on the “modalities” of various models, specifically noting that models built to generate language outputs, whether text or speech, typically have the broadest range of capabilities and use cases and are therefore more likely to be considered “general-purpose” than models that generate images or video. However, the Commission also explains that neither of these factors alone is dispositive—regardless of modality or training compute, any given model may or may not be considered “general-purpose” under the Guidance.
To mitigate the uncertainty around model classification under both the AI Act definition and the Guidance, the Commission adds a series of hypothetical models with various combinations of the two factors (compute and modality) to illustrate when models fall in or out of the scope of the definition. The single example of an in-scope model is positive on both factors, and so the Commission says that the model “is likely a general-purpose AI model” without being definitive. The lengthier set of examples of out-of-scope models all highlight when the Commission considers a model’s capabilities to be too narrow to be “general-purpose,” including models designed to “transcribe speech to text,” “increase the resolution of images,” and “generate music,” even if the training compute of those models exceeds the FLOP threshold.
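To make the Guidance’s two-factor test concrete, the minimal sketch below applies the indicative criterion (training compute above 10^23 FLOP plus a generative modality) and the paragraph 20 overrides. It is only an illustration: the function name, modality labels, and example values are assumptions, not anything specified in the Guidelines.

```python
# Illustrative sketch of the indicative GPAI criterion and its overrides
# (paragraphs 17 and 20 of the Guidelines). The threshold and modalities follow
# the quoted text; function and parameter names are hypothetical.

GPAI_COMPUTE_THRESHOLD_FLOP = 1e23
GENERATIVE_MODALITIES = {"text", "speech", "text-to-image", "text-to-video"}

def is_indicatively_gpai(training_flop: float,
                         output_modalities: set[str],
                         generality_assessment: bool | None = None) -> bool:
    """generality_assessment: None if no assessment is available; otherwise a
    judgment of whether the model displays significant generality and can
    competently perform a wide range of distinct tasks (paragraph 20)."""
    meets_criterion = (training_flop > GPAI_COMPUTE_THRESHOLD_FLOP
                       and bool(output_modalities & GENERATIVE_MODALITIES))
    if generality_assessment is None:
        return meets_criterion        # indicative presumption (paragraph 17)
    return generality_assessment      # paragraph 20: generality overrides either way

# A language model trained with ~5e23 FLOP is presumptively in scope:
print(is_indicatively_gpai(5e23, {"text"}))                                 # True
# A speech-transcription model that outputs text and exceeds the threshold is
# nonetheless out of scope once assessed as lacking significant generality
# (cf. the Guidance's out-of-scope examples):
print(is_indicatively_gpai(5e23, {"text"}, generality_assessment=False))    # False
```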
Assumptions may be flawed, but they can be adjusted.
Several questionable assumptions underlie the Commission’s approach, but issuing new guidance is far easier than amending the legislative text.
First, the Guidance assumes that training compute, which it describes as a number proportional to the product of parameters and training examples, is a reliable proxy for a model’s range of capabilities. The Commission further relies on compute as a proxy for capability when assessing whether a model presents “systemic risk,” presuming that models requiring 10^25 FLOPs present such risks. See the discussion on FLOP thresholds in the previous section addressing the EU AI Act GPAI definition. The Guidance provides one rationale for using compute as a factor in assessing capability: developers will know how much compute was required in development, which provides at least some certainty when assessing a model’s classification. The FLOP threshold may also have been chosen as a proxy to capture the most well-resourced firms—those able to bear the expense of training, refinement, and deployment. Regardless of motivation, the Commission’s choice to set a FLOP threshold, at least, is relatively easy to adjust compared to rewriting the underlying definition.
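The Guidance’s description of training compute as roughly proportional to the product of parameters and training examples can be made concrete with the commonly cited rule of thumb that training FLOP is approximately six times parameters times training tokens. The sketch below applies that heuristic; the token counts are illustrative assumptions, not figures from the Guidance.

```python
# Back-of-the-envelope training-compute estimates using the widely cited
# approximation FLOP ~ 6 * parameters * training tokens. Illustrative only.

def estimated_training_flop(parameters: float, training_tokens: float) -> float:
    return 6 * parameters * training_tokens

# A 1-billion-parameter model trained on ~1.5e13 tokens lands near the
# 10^23 FLOP indicative GPAI criterion:
print(f"{estimated_training_flop(1e9, 1.5e13):.1e}")   # ~9.0e+22

# A run two orders of magnitude larger approaches the 10^25 FLOP presumption
# of systemic risk under the AI Act:
print(f"{estimated_training_flop(1e11, 1.5e13):.1e}")  # ~9.0e+24
```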
Second, the Guidance assumes that a model’s range of capabilities will necessarily be apparent upon its introduction to the market. This assumption depends on several variables. For example, it is possible that some firms will underestimate or not adequately test their models’ capabilities before making them available, whether to avoid regulation or for other reasons. It is also possible that sophisticated systems could become able to deceive researchers trying to assess their capabilities. There have already been reports that some models exhibit deceptive behavior when tested, leading to speculation that more sophisticated models may be able to hide any number of attributes, including their relative “alignment” with human goals. In any case, the Commission may face difficulty refining the definition’s scope in this regard given the wording of the definition itself.
Curiously not addressed is the idea of “competence.” The Commission did not address its intended scope or application of the word “competently” as used in the AI Act definition. This leaves significant room for interpretation as to whether a model performs above or below any given threshold of competence. In future iterations of official guidance, perhaps the Commission will try to clarify what it means by competence. Further, inclusion of the word “competently” reveals another assumption the Commission makes about AI models and their potential risks: that only “competent” models pose risks worthy of regulation. Time may tell whether this is a valid assumption, but at least where humans are concerned, incompetence is often a cause for regulation.
The regulator’s dilemma: is 80% good enough?
Despite the potential flaws in its approach, the Commission’s gamble that compute and modality will serve as effective proxies for AI model capability is, at least, understandable. In a rapidly developing field with many unknowns, it is likely impossible to tailor the scope of a regulatory definition to account for all potential risks. There will always be outliers, edge cases, and misuses of otherwise safe technology that even a case-by-case approach would struggle to address. Is it better, then, to address most of the foreseeable risks or invest significantly more resources to account for the long tail? This is a dilemma for every regulator.
Overall, the Commission has taken an approach typical to many regulators—set broad initial parameters to capture the intended features of a technology it wishes to regulate, then create more focused interpretations of the scope of the rule to address an evolving, difficult to predict reality. In this case, there are many reasons why the Commission may struggle to adapt the AI Act’s definition to respond to the current state of the art. Perhaps most prominent—the rate of tech development already outpaces that of even the nimblest regulatory body.
Reception
To date, there has not been significant public feedback on the Guidance. Several law firms have published blogs describing what the Guidance does, with some praise for the added clarity it provides. Some have noted that the Guidance was issued very late, considering that the AI Act obligations for GPAI took effect only about two weeks later.
Safe, Secure, & Trustworthy Development & Use of Artificial Intelligence (EO 14110)

Dr. Ranjit Singh
Director of AI on the Ground at Data & Society
Read about Ranjit
Dr. Ranjit Singh is the director of Data & Society’s AI on the Ground program, where he oversees research on the social impacts of algorithmic systems, the governance of AI in practice, and emerging methods for organizing public engagement and accountability. His own work focuses on how people live with and make sense of AI, examining how algorithmic systems and everyday practices shape each other. He also guides research ethics at Data & Society and works to sustain equity in collaborative research practices, both internally and with external partners. His work draws on majority world scholarship, public policy analysis, and ethnographic fieldwork in settings ranging from scientific laboratories and bureaucratic agencies to public services and civic institutions. At Data & Society, he has previously led projects mapping the conceptual vocabulary and stories of living with AI in/from the majority world, framing the place of algorithmic impact assessments in regulating AI, and investigating the keywords that ground ongoing research into the datafied state. He holds a PhD in Science and Technology Studies from Cornell University. His dissertation research focused on Aadhaar, India’s biometrics-based national identification system, advancing public understanding of how identity infrastructures both enable and constrain inclusive development and reshape the nature of Indian citizenship.
Definition Text
The term “dual-use foundation model” means an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters, such as by:
(i) substantially lowering the barrier of entry for non-experts to design, synthesize, acquire, or use chemical, biological, radiological, or nuclear (CBRN) weapons;
(ii) enabling powerful offensive cyber operations through automated vulnerability discovery and exploitation against a wide range of potential targets of cyber attacks; or
(iii) permitting the evasion of human control or oversight through means of deception or obfuscation.
Models meet this definition even if they are provided to end users with technical safeguards that attempt to prevent users from taking advantage of the relevant unsafe capabilities.
Lineage Definition
‘General-purpose AI model’ means an AI model, including where such an AI model is trained with a large amount of data using self-supervision at scale, that displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications, except AI models that are used for research, development or prototyping activities before they are placed on the market
Source text: Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
Place of origin: Federal, USA
Enforcing entity: Executive Departments & Agencies
Date of introduction: November 1, 2023 in EO 14110
Date enacted or adopted: November 1, 2023
Current status: Revoked January 20, 2025
Motivation
These definitions emerged in a rapidly shifting landscape of AI capabilities, shaped by the generative AI surge of 2022–2023 and quickening international coordination—the G7 Hiroshima AI process, the finalization of the EU AI Act—even as differences in scope and emphasis saw the EU orient toward fundamental rights and societal impacts while the US leaned toward national security. The EU named the phenomenon directly as “general-purpose AI models,” tying obligations to systems whose capabilities travel across tasks and sectors (see the discussion on the EU AI Act GPAI definition for further details).
The US Executive Order (EO) 14110 instead coined “dual-use foundation models” to single out a narrower slice of frontier systems: large, broadly capable models whose performance on certain tasks (for example, in chemical, biological, radiological, or nuclear weapon design; sophisticated cyber operations; or evasion of human control) creates national security-relevant misuse risks, including when model weights are widely available. The definition is coupled to a reporting architecture that treats training compute and cluster capacity as proxies for power, directing the Department of Commerce to monitor very large training runs, models trained primarily on biological sequence data, and unusually capable compute clusters.
Even after the EO’s revocation by the subsequent administration, this dual-use framing has persisted in Commerce proposals and NIST guidance. In practice, the definition provides a regulatory template: a security-centered, compute-gated way of specifying which foundation models merit public oversight and why. It marks a shift from regulating what AI does to governing what it can do and the externalities that follow.
Approach
Lineage in the governance of dual-use technologies.
The EO’s definition of “dual-use foundation models” borrowed a long-standing security concept and narrowed it for the AI context. In long-running governance debates, “dual-use” refers to technologies and research that can be turned toward both beneficial and harmful ends: fertilizers and chemical weapons, nuclear power and bombs, viral genetics and engineered pathogens, intrusion tools for securing networks and for breaking into them. Governance efforts in these domains have tried to prevent malign repurposing without shutting down underlying science, using a mix of treaties, export controls, dual-use research of concern (DURC) review, and professional ethics to manage the “deviation of intent” from legitimate research to misuse.
The EO picked up that lineage and applied it to foundation models, but with a specific emphasis. Rather than defining dual-use broadly as “beneficial and harmful potential,” the order identified a subset of models whose capabilities intersected with a defined set of national security concerns—chemical, biological, radiological, nuclear, cyber, and deceptive risks. A dual-use foundation model, in this frame, was: (a) a large, broadly capable model trained on extensive data (a foundation model), which (b) either exhibited or could be readily modified to exhibit high-level performance on tasks that posed serious risks to national security, national economic security, or national public health and safety, with concrete examples in CBRN design, sophisticated cyber operations, and evasion of human control. The definition was indifferent to stated intent or safeguards; a model could qualify even if the provider had attempted to prevent misuse.
Two-layer structure: task capabilities and compute thresholds.
The approach was structured in two layers. Qualitatively, it centered capability on specific dangerous tasks rather than general social harm: disinformation, privacy and intellectual property (IP) leakage, labor impacts, and discrimination were treated elsewhere in the EO and did not anchor the dual-use definition itself. Quantitatively, it coupled that capability frame to compute-based triggers. Training runs above 10^26 FLOPs were used as a gate for general dual-use models in Commerce’s proposed reporting rules, with a lower 10^23 FLOP threshold for models trained primarily on biological sequence data, and the same logic was extended to unusually capable clusters (high-bandwidth, high-throughput compute). In effect, compute and cluster characteristics were treated like the “special materials” of earlier dual-use regimes, much as fissile material or certain pathogens functioned as markers of heightened concern in nuclear and bio governance.
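As a rough illustration of how those compute gates operate as reporting triggers, the sketch below compares a model’s training compute against the two thresholds described above. The function and parameter names are hypothetical, and actual determinations under the EO and the proposed Commerce rules involved far more than these two numbers.

```python
# Illustrative reporting-trigger check based on the compute thresholds described
# above: 10^26 operations generally, and 10^23 operations for models trained
# primarily on biological sequence data. Names are hypothetical; this is not
# the EO's text.

GENERAL_THRESHOLD_OPS = 1e26
BIO_SEQUENCE_THRESHOLD_OPS = 1e23

def triggers_reporting(training_ops: float, primarily_bio_sequence_data: bool) -> bool:
    threshold = (BIO_SEQUENCE_THRESHOLD_OPS if primarily_bio_sequence_data
                 else GENERAL_THRESHOLD_OPS)
    return training_ops > threshold

print(triggers_reporting(2e26, primarily_bio_sequence_data=False))  # True
print(triggers_reporting(5e23, primarily_bio_sequence_data=True))   # True
print(triggers_reporting(5e23, primarily_bio_sequence_data=False))  # False
```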
A response to the AI safety debates.
Seen through the wider dual-use governance debates, this move exhibited both similarities and differences from earlier regimes. It was similar in that it tried to create a monitored perimeter around the most consequential AI models, using reporting, red-teaming, and export-style oversight of hardware as early-warning tools, much as non-proliferation and DURC frameworks do in nuclear and life sciences. In the AI context, that perimeter-building was shaped by AI safety debates that elevated red-teaming of frontier models for catastrophic misuse—particularly biosecurity and cyber threats—as a central governance practice, even as public-interest and civil-rights groups were pushing to broaden red-teaming toward wider sociotechnical harms. It was different from earlier regimes in at least two ways. First, the “object” of regulation was intangible and easily copied: model weights and training code are far harder to track than uranium stockpiles, a challenge that has already been flagged for cyber tools. Second, the EO’s dual-use definition bracketed off many of the dual-use concerns emphasized in human-rights-oriented debates on AI (surveillance, scalable persuasion, repression, structural discrimination), and instead codified a relatively narrow, security-first slice of the dual-use dilemma.
Using the definition as a targeted policy instrument.
As a definitional strategy, “dual-use foundation model” was less a general theory of AI dual-use and more a targeted instrument for frontier scenarios involving catastrophic risk. It marked out a class of models where the US government claimed a particular stake in visibility and assurance, and it built that category using tools inherited from earlier dual-use regimes: task-based risk criteria, capability thresholds, and a reporting perimeter rather than blanket prohibition. Smaller or domain-specific systems, or models that primarily raised concerns about sociotechnical harms such as surveillance or labor exploitation, typically fell outside this national security framing even if they raised serious policy questions. Thus, it left these more diffuse, rights- and labor-related dual-use harms to other regulatory regimes, from sectoral and civil-rights law to labor and consumer protection.
To conclude, this definition is well suited to policy instruments aimed at national security and catastrophic misuse—for example, reporting obligations tied to frontier-scale training—but it is not sufficient on its own for legislation focused on everyday civil-rights impacts of AI. In those domains, “dual-use foundation model” risks narrowing the field of concern to a small subset of harms and a particular way of measuring “frontier” AI.
When to use this definition in legislation:
USE “dual-use foundation model” when legislating on:
✓ CBRN weapons risks
✓ Offensive cyber capabilities
✓ AI deception/oversight evasion
✓ Frontier model reporting requirements
DON’T USE for legislation primarily about:
✗ Workplace discrimination
✗ Surveillance/privacy
✗ Consumer protection
✗ Labor displacement
Reception
The EU and US approaches resonate with each other in recognizing foundation models as a pivotal regulatory concern; however, they diverge in regulatory philosophy (broad vs. narrow scope, ex ante rules applied before market release vs. ex post oversight after deployment) and immediate priorities (rights vs. security). In the US, industry tended to treat the dual-use definition as a manageable, security-focused overlay on work they were already doing, not as a general template for AI regulation. Companies were already investing in red-teaming and security reviews for frontier models, and the EO largely formalized those practices, even as industry commenters warned about the volume and sensitivity of technical information they would have to disclose, and analysts noted that the same compute thresholds could later underpin export-style controls.
Civil society responses were more divided. Organizations focused on catastrophic and national security risks welcomed the move to single out frontier models with CBRN and cyber capabilities, and saw the dual-use definition as overdue recognition of those hazards. Rights, labor, and digital justice groups, by contrast, criticized the EO for structuring its most concrete model category around national security alone, leaving surveillance, discrimination, and workplace impacts to weaker or more diffuse parts of the policy landscape. For them, the dual-use definition risked crowding out a broader conversation about AI’s everyday harms.
Technical experts and standards bodies were left with the practical task of operationalizing this definition alongside the EU’s GPAI and systemic-risk concepts: harmonizing compute thresholds across NIST and EU guidance, developing evaluation methods that speak to both security-oriented and rights-oriented concerns, and creating benchmarks that travel across jurisdictions. Emerging work in ISO/IEC and at NIST’s AI Safety Institute (since reorganized as the Center for AI Standards and Innovation) can be read as attempts to build this interoperability, even as the underlying regulatory philosophies remain different. In this sense, EU definitions of GPAI and systemic risk have continued to shape the technical criteria that US actors have worked with. Effective governance of foundation models will likely depend on translating these divergent regulatory logics into shared technical reference points that companies and regulators can use on both sides of the Atlantic.
California Bill SB-53

Miranda Bogen
Director of the AI Governance Lab at CDT
Read about Miranda
Miranda Bogen is the founding Director of CDT’s AI Governance Lab, where she works to develop and promote adoption of robust, technically-informed solutions for the effective regulation and governance of AI systems.
An AI policy expert and responsible AI practitioner, Miranda has led advocacy and applied work around AI accountability across both industry and civil society. She most recently guided strategy and implementation of responsible AI practices at Meta, including driving large-scale efforts to measure and mitigate bias in AI-powered products and building out company-wide governance practices. Miranda previously worked as senior policy analyst at Upturn, where she conducted foundational research at the intersection of machine learning and civil rights, and served as co-chair of the Fairness, Transparency, and Accountability Working Group at the Partnership on AI.
Miranda has co-authored widely cited research, including empirically demonstrating the potential for discrimination in personalized advertising systems and illuminating the role artificial intelligence plays in the hiring process, and has helped to develop technical contributions including AI benchmarks to measure bias and robustness, privacy-preserving methods to measure racial disparities in AI systems, and reinforcement-learning-driven interventions to advance equitable outcomes in products that mediate access to economic opportunity. Miranda’s writing, analysis, and work have been featured in media including the Harvard Business Review, NPR, The Atlantic, Wired, Last Week Tonight, and more.
Miranda holds a master’s degree from The Fletcher School at Tufts University with a focus on international technology policy, and graduated summa cum laude and Phi Beta Kappa from UCLA with degrees in Political Science and Middle Eastern & North African Studies.
Definition Text
“Foundation model” means an artificial intelligence model that is all of the following:
(1) Trained on a broad data set.
(2) Designed for generality of output.
(3) Adaptable to a wide range of distinctive tasks.
(1) “Frontier model” means a foundation model that was trained using a quantity of computing power greater than 10^26 integer or floating-point operations.
(2) The quantity of computing power described in paragraph (1) shall include computing for the original training run and for any subsequent fine-tuning, reinforcement learning, or other material modifications the developer applies to a preceding foundation model.
Lineage Definition
‘General-purpose AI model’ means an AI model, including where such an AI model is trained with a large amount of data using self-supervision at scale, that displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications, except AI models that are used for research, development or prototyping activities before they are placed on the market
Source text: Transparency in Frontier Artificial Intelligence Act
Place of origin: California, USA
Enforcing entity: State Attorney General & Government Operations Agency
Date of introduction: January 7, 2025 in SB-53
Date enacted or adopted: September 29, 2025
Current status: Enacted
Motivation
While a variety of AI definitions describe basic characteristics of the technology, the definition of “foundation model” reflected in SB-53 responded to the same advances that motivated the inclusion of general-purpose AI models in the EU AI Act: models able to perform a variety of tasks without being specifically designed or trained to do so. Such capabilities were a notable and somewhat sudden departure from machine learning models and earlier artificial intelligence systems that could conduct perception, pattern detection, and prediction tasks but generally required developers to design tools for specific purposes.
Some have also been concerned that particularly capable general-purpose models might pose serious, systemic, and even catastrophic risks and so require heightened due diligence, safeguards, or oversight. In particular, AI safety researchers became concerned that models themselves might become so capable that controlling their behavior and proliferation would be critical to preventing dramatic harm to humanity. Without reliable ways to predict the conditions that would lead to models demonstrating these advanced capabilities, compute thresholds became a popular—albeit contested—proxy for identifying highly capable models, driven by early research suggesting a correlation between training compute and model capability. SB-53’s definition of frontier models differs from similar, compute-based definitions by including within its threshold the amount of computation used not only for initial model training but also for fine-tuning and other post-training modifications, which research has increasingly shown can meaningfully affect model behavior and capability.
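As a rough illustration of how SB-53’s cumulative threshold differs from a training-only threshold, the sketch below sums compute across the original training run and later modifications before comparing against 10^26 operations. The phase names and values are hypothetical assumptions used only for illustration.

```python
# Illustrative check of SB-53's frontier-model threshold, which counts compute
# from the original training run plus subsequent fine-tuning, reinforcement
# learning, or other material modifications. Phase names and values are
# hypothetical.

FRONTIER_THRESHOLD_OPS = 1e26

def is_frontier_model(compute_by_phase: dict[str, float]) -> bool:
    return sum(compute_by_phase.values()) > FRONTIER_THRESHOLD_OPS

# A model just under the threshold on pre-training alone can cross it once
# post-training compute is counted:
phases = {
    "pretraining": 9.5e25,
    "supervised_fine_tuning": 3.0e24,
    "reinforcement_learning": 4.0e24,
}
print(is_frontier_model(phases))                   # True  (1.025e26 > 1e26)
print(is_frontier_model({"pretraining": 9.5e25}))  # False (training-only view)
```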
Approach
While it is slightly more nuanced than similar predecessors, SB-53 still targets a narrow set of developers of advanced AI models. The bill’s definition of foundation models draws from a widely cited paper by the Center for Research on Foundation Models (CRFM) at Stanford University, which coined the term and defined it as “any model that is trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.” The bill adapts this definition by removing technical implementation details and instead describing models “designed for generality of output”—a revision likely influenced by deliberations surrounding the EU Artificial Intelligence Act. The definition is sufficiently general to capture models that may not have a specific goal or purpose but may still present issues of concern to policymakers. On the other hand, concepts like “broad data set,” “generality of output,” and “wide range of distinctive tasks” are under-defined and will likely be contested.
Most of SB-53’s hallmark requirements apply to frontier models, a small subset of foundation models determined by the amount of computing power used to train or modify them, given the risks some stakeholder communities presume such models will introduce. In this respect, the definition deviates from similar ones, which have typically tied compute thresholds only to model training, presumably because many recent advances in model capability have come not purely from increasing the data and compute used to train models but also from the compute spent after the initial pre-training phase. However, even this more nuanced definition was not intended to cover a wide variety of what some perceive to be more “mundane” risks of AI, such as its use in consequential domains or use cases in ways that could affect people’s economic security, health, or freedom.
The definition may not cover increasingly popular AI development techniques or the broader systems powered by covered models. It is not clear whether this definition would account for the increasingly substantial amount of compute used for model inference and reasoning—that is, running the model and integrating so-called “chain of thought” into the model’s operation once it has been deployed. And while the bill’s definition of foundation and frontier models covers certain large-scale models such as GPT-4 or Claude 4.5, it may not cover elements of the systems within which such models are deployed, like ChatGPT or Claude, which involve additional elements such as scaffolding, model routers, safety filters, or other features that interact with the foundation model itself. This definition would also not account for narrower models that pose foreseeable harm, smaller models that have been “distilled” from larger, covered ones, or ensembles of multiple small models or systems that interact in ways that produce heightened capabilities. Periodic updates are important to ensure threshold-based definitions still capture the technology or actors intended to be in scope.
Rather than imposing requirements on specific AI models, SB-53 focuses mostly on entities developing these models. As such, definitions of foundation model and frontier model feed into companion definitions for “frontier developer” and “large frontier developer” (distinguished by annual gross revenue). Some have commented that actor-based thresholds better capture large developers while avoiding placing burden on startups and other small/medium-sized businesses, though such an approach may leave gaps where smaller organizations manage to build models that either surpass defined thresholds or stay within them but nevertheless pose a high risk of harm. To account for this limitation, the law directs the California Department of Technology to annually review definitions of frontier model, frontier developer, and large frontier developer, and submit recommendations to the legislature to consider changes. Such a directive is critical given that threshold-based definitions are likely to be quickly outdated and are unlikely to capture all models of concern.
The definition of “frontier model” is quite narrow, on purpose. SB-53 was sponsored by California State Senator Scott Wiener following Governor Gavin Newsom’s veto of a previous, related legislative effort (SB-1047) that aimed to mitigate certain safety risks, such as chemical, biological, radiological, and nuclear threats, that the sponsors worried could be posed by capable AI models. While that bill passed both the state Assembly and Senate, subsequent backlash from a wide variety of industry, academic, and public interest stakeholders led to the governor’s veto and the subsequent convening of an expert policy panel to advise California on how to regulate AI. The resulting California Report on Frontier AI Policy heavily influenced the substance of SB-53. In crafting SB-53, Senator Wiener explained his intent to impose requirements on only “a small number of well-resourced companies, and only to the most advanced models,” a goal effectuated by the high compute threshold in the bill’s definition of frontier model. The bill’s definitions also serve to exempt from its testing obligations simpler models perceived by the bill’s supporters to pose less risk, in part, perhaps, to defend the bill against claims that it would impose undue burdens on innovations critical to both economic growth and national security.
The bill’s substantive requirements are a start, but may not be as effective in changing safety practices as its proponents hope. The theory of change around bills like SB-53 seems to be that if covered entities that develop a potentially harmful technology are required to disclose their risk management practices, they will need to define and operationalize those practices to begin with, and will therefore be more likely to manage the risks they encounter. However, if a law requires insufficiently detailed or overly narrow documentation or fails to provide sufficient resources to enable active and savvy enforcement, it can be all too easy for organizations to performatively comply without meaningfully changing their safety practices.
Reception
SB-53 received support from Encode AI, Economic Security Action California, the Secure AI Project, and prominent AI researchers Geoffrey Hinton and Yoshua Bengio, who argued the bill would advance transparency and accountability from frontier AI companies. Former White House AI advisor Dean Ball praised its technical sophistication, and frontier AI lab Anthropic endorsed the bill after substantive involvement. OpenAI and industry groups like CTA and Chamber of Progress pushed back against the obligations it imposed. Some critics questioned the FLOP threshold in its definition of frontier model, though less vocally than in earlier compute-based efforts—likely because similar (and blunter) approaches had appeared in other prominent legislation and the bill includes mechanisms for periodic threshold reviews.
Additional Resources
- California Senate Bill 53 (2025) (not updated to account for amendments)
- Assessing AI: Surveying the Spectrum of Approaches to Understanding and Auditing AI Systems (2025)
Defining Technologies of our Time: Artificial Intelligence © 2026 by Aspen Digital. This work in full is licensed under CC BY 4.0.
Individual entries are © 2026 their respective authors. The authors retain copyright in their contributions, which are published as part of this volume under CC BY 4.0 pursuant to a license granted to Aspen Digital.
The views represented herein are those of the author(s) and do not necessarily reflect the views of the Aspen Institute, its programs, staff, volunteers, participants, or its trustees.

