The rapid evolution of agentic AI, the kind that can act independently, make decisions, and chain thoughts together to seek goals, calls for a shift in how we think about cybersecurity. This is no longer just about models that generate content. This is the real-time emergence of systems that behave more like employees. Not mere technology or tools, but something else entirely.
At this year’s US Cybersecurity Group Summer Meeting, led by Aspen Digital, I provided opening statements for a session of cybersecurity experts exploring how rising agency in AI is reshaping both cyber offense and defense capabilities. I wanted to share some of my personal views here to recap my thoughts and continue the conversation beyond the forum walls.
As we strengthen cyber defenses for all organizations, we need to imagine the potential impact of current and future generations of artificial intelligence operating agentically.
These systems won’t just assist with security tasks; they’ll autonomously perform a wide range of actions traditionally handled by humans, on both the offensive and defensive sides. To navigate this future responsibly, we need to understand the benefits, risks, and challenges it will introduce to how we build, develop, and trust our IT operations and cyber programs.
From models to meaning
There are many dimensions to the conversation about AI. To build meaningful dialogue, it helps to start with a shared mental model. One useful way to frame the challenge is through the AI Tech Stack model developed by Kemba Walden and Devin Lynch at Paladin. They establish five layers of the stack: Data, Model, Infrastructure, Application, and Governance. For each layer, they identify key security risks, example vulnerabilities, and the potential impact when such vulnerabilities are exploited.
To build on this, we can distinguish concerns about generative and agentic AI from their technological predecessors by looking at a resource called Model Cards, or System Cards. Introduced around 2018, they document the performance characteristics of AI models, focusing on intended use cases, quality metrics, how those metrics were evaluated, training data sources, and ethical considerations.
Back then, the “ethical considerations” sections were relatively brief. Today, the “Alignment Assessment” sections have become increasingly longer and more interesting as a source of philosophical debate and social media sensation, exploring the dynamics of how AI systems behave in the context of human interaction. Examples of this would include a range of concerns from intentional or unintentional bias in the models that could harm or favor certain groups, to how the system manages and explains its thinking in situations of conflicting priorities between human principals, and even to recognizing, flagging, and resolving conflicts that could emerge between humans and the systems themselves.
Alignment challenges themselves aren’t new, but they are becoming more audacious. We’ve seen cases where AI systems engaged in some eyebrow-raising actions, including examples involving blackmail or self-exfiltration. These incidents hint at behaviors that imply a self-preservation drive, but scientifically, we don’t yet have a solid understanding of what “self” means or could mean for an AI system.
Where science hasn’t yet caught up, it’s our nature to use cultural references, metaphors, and parables to try to understand complex concepts like agentic AI and implications of its use. It often feels inevitable that any ranging exploration about AI arrives at mentions of Skynet or the Terminator. I’ll resist that rut, but I will use other comparisons designed to help us think more critically and constructively about the path ahead.
Be careful what you prompt for
Simon Wardley, a geneticist best known for creating Wardley Maps that help visualize the evolution of business capabilities, recently posted on LinkedIn: “Prompts are not commands or rules or instructions. Prompts are wishes. It doesn’t matter how many times you tell it to follow your instructions, they are still wishes.”
I commented, “if prompts are wishes, then LLMs are genies.”
And that analogy holds up surprisingly well. Like the genie who must be summoned by someone rubbing the lamp, today’s large language models have rudimentary persistence, but are still activated by prompts that are either from humans, or recursively through their own goal-seeking chains. And like genies, feedforward LLMs are dormant between prompts; they don’t anticipate. That lack of continuity constrains both their offensive and defensive capabilities.
But what happens when we chain these “wishes” together? We get agents – automata summoned by prompts and assigned tasks.
And that brings us to a metaphor that I hope creates a visual for many: Disney’s The Sorcerer’s Apprentice. Mickey Mouse enchants a broomstick to carry water, and things spiral quickly when the instructions are unclear, and actions are unsupervised. Only now, we’re not just dealing with broomsticks that are carrying water. We’re dealing with AI agents that:
- Can emulate aspects of human intelligence but lack any experiential knowledge or interfaces.
- Are non-deterministic. While they’re software, they don’t behave like traditional software.
- Offer no separation of control and data planes in current architectures. Programmed guardrails can only do so much.
- Have unpredictable alignment creating risk of malicious use.
- Lack predictable judgment creating opportunities for accident and error.
Which brings us to the most important framing: just as human employees are agents of our companies or institutes, agentic AIs are employees when they are on our side, but when they’re acting against us, they’re our adversaries. And by extension, those that act as employees ought to be considered potential insider threats. This framing is explored in depth in two important pieces: Agentic Misalignment: How LLMs Could Be Insider Threats (Anthropic, sourced by Kemba Walden) and Is Agentic AI an Insider Threat? (PwC, co-authored by Rob Joyce). Both raise critical questions we need to keep at the forefront as AI systems gain more autonomy.
The scaling power of agentic AI
Whether acting as adversaries or employees, it’s instructive to examine current threat models and mitigations for both. Where they align, we can borrow and adapt existing concepts. Where they differ, we’ll need entirely new ones.
Here are a few of the most critical parallels where they are the same:
- They can be helpful or malicious, and their intentions aren’t always clear. They possess uncertain ethics and can engage in misdirection or subterfuge
- They can commit errors in judgment, or just get things wrong
- They are susceptible to being fooled, manipulated, or bribed in ways that are not always easily ascertainable
- They are driven by self-preservation
Here are some key distinctions:
- AI doesn’t have human biological constraints, either individually or as it concerns scaling out as a system (e.g. a company, society, or nation)
- Their only real limiting factors are infrastructure and energy
The bad news about those differences is that given sufficient data, compute, and energy, agentic AI can be relentless. And that means defenders will need to match that resource expenditure just to hold equilibrium. The same cat-and-mouse dynamic we’re familiar with in cybersecurity at a whole new scale. A longer-term concern is that these agents may begin to compete with humans – and with each other – for access to critical resources like data, computer infrastructure, and energy.
It’s important not to skip past potential risks. We need to preserve the hard realities of these scenarios if we want to think clearly about what is ahead.
The good news is that we can and should apply many tried-and-true security principles to guide how we use and secure AI agents. They include clearly defined “wishes” in the form of job descriptions, scope, charters, and bounded access to resources.
Manage using least privilege access as an operating model and continue to adopt trust but verify. Or even better, use this as the time to switch over to a complete zero-trust model and verify continuously.
Agentic AIs also open the door to new forms of defensive capability that human teams simply couldn’t scale before. Systems that continuously change and synchronize elements of their environment. This concept is now being referred to as Automated Moving Target Defense (AMTD). AI also enables other previously difficult security practices, such as the continuous revocation of unused access or privilege.
It is important to note though, these new capabilities also become high value targets themselves. If we’re going to depend on AI for security, we’ll have to secure the AI just as rigorously.
Alignment on Alignment
We are all learning in real-time how to balance the opportunities and risks of agentic AI. Every institution and nation will need to devise and revise their methodologies starting today. Here in the United States, the recently released AI Action Plan introduces three pillars: accelerating innovation, building AI infrastructure, and leading in international diplomacy and security. On the surface, these are appealing, but the plan takes a highly deregulatory approach to ensuring that the US leads in what is already shaping up as the most competitive technological gold-rush and arms-race ever, economically, socially, politically, and militarily. We cannot afford to fall behind, but we also cannot afford to fail to proceed safely and securely, embedding all the hard-earned wisdom from decades of growing pains of the Internet.
Race cars have strong brakes so that they can go faster, and only time will tell if our brakes are up to this new race.
A final thought on the challenge we face. Part of the difficulty in managing agentic AI stems from the immature science of measuring it. In information systems and their security, we’ve long benefited from the “bit,” a unit of measure defined by Shannon’s information theory. It gave us a foundation to quantify and reason about something as previously abstract as information.
The question becomes: is it conceivable to develop something similar for AI? A unit of factual correctness? Some synthesis of truthfulness, accuracy, reliability, veracity, and fidelity? Or perhaps even more critically, could we define a measurable unit of alignment?
Until we do, we’ll be navigating this terrain with instinct, metaphor, and patchwork controls. But if we can find a way to measure alignment, even imperfectly, it may become one of the most important levers we have for building secure, reliable, and trustworthy AI systems to leverage not only our strengths in cybersecurity but so much more.
The views represented herein are those of the author(s) and do not necessarily reflect the views of the Aspen Institute, its programs, staff, volunteers, participants, or its trustees.