Every few weeks, there’s a new headline: a bad actor tried to coerce an LLM into doing something nefarious, and the model said no. Or the lab caught it after the fact and shut it down. Anthropic, OpenAI, Google — they patch the exploit, tighten the policy, ship the update. The internet breathes a collective sigh of relief, files it under “crisis averted,” and moves on.
It shouldn’t.
Not because the labs are doing a bad job — they’re not. Jailbreak resistance keeps improving. Safety training keeps getting more sophisticated. When someone finds a new way to manipulate a model, the turnaround on a fix is often measured in days, not quarters. By most reasonable measures, AI safety teams are doing exactly what they should be doing, and doing it well.
But your bank having world-class fraud detection doesn’t mean you give your account number to a Nigerian prince.
Keeping the model from behaving badly is one job. Keeping your network safe when it doesn’t is another. Both jobs still have to get done — and only one of them is yours.
The job the labs own
Model providers are responsible for model behavior. Will it refuse the obviously malicious prompt? Will it resist being talked into something it shouldn’t do, three creative reframings deep? Will it recognize when it’s being used as a tool in someone else’s attack chain? That’s hard, important work, and it’s improving fast.
The job you own
None of that tells you anything about what happens after the model behaves exactly as it’s supposed to.
A perfectly behaved model behind flat network access, standing privilege, and no segmentation is still a five-alarm fire. Not because the AI did something wrong — because everything around it was built to assume nothing would ever go wrong, by anyone, ever.
That’s the part nobody else can patch for you. No lab update fixes:
- An AI agent (or a human, for that matter) with standing access to systems it only needed to touch once
- A network with no segmentation, so one compromised credential is a tour pass to everything
- Endpoints nobody’s actually verifying before they connect
- Privilege that was granted for convenience and never revisited
These aren’t AI problems. They’re the same access control fundamentals that have mattered since long before anyone was worried about prompt injection — they just matter more now, because the thing operating inside your network might be moving faster, working longer hours, and asking for access more persistently than any human ever did.
AI guardrails reduce the odds. They don’t reduce the blast radius.
This is really the crux of it. Better model safety lowers the likelihood that something gets through. It does nothing for what happens if it does. And “if” is doing a lot of work in a world where agentic AI is increasingly allowed to act — not just answer questions, but actually do things, with credentials, with access, with permissions someone granted it.
The labs can’t see your network. They don’t know your segmentation, your privilege model, your identity sprawl, your shadow IT. They were never going to be the ones to fix it, and it was never realistic to expect them to. That’s not a gap in their product. That’s just the boundary of what model-level safety can do.
So what’s actually in your control?
Quite a lot, as it turns out:
- Least privilege, enforced for every identity that touches your network — human, machine, or AI agent
- Network segmentation, so a compromise in one place doesn’t become a compromise everywhere
- Access control that verifies before it trusts — not once at login, but continuously
- Visibility into what’s actually connecting, including the things nobody remembered to ask permission for
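To make the two lists above concrete (the standing access, flat network and unverified endpoints that no lab update fixes, and the least privilege, segmentation and continuous verification that are in your control), here is an added illustration that is not code from the original post. It is a toy policy check that holds a human, a machine and an AI agent to the same rules, plus a blast-radius calculation showing what a stolen credential reaches in a flat versus a segmented network. The segment map, identity names, grant structure and the 15-minute re-verification window are all assumptions made up for the example, not any real product's API or policy.

```python
"""Added illustration (not from the original post): a toy access-policy check
plus a blast-radius calculation for a network an AI agent operates in.

Everything here is constructed: the segment map, identity names, grant shape
and the 15-minute re-verification window are assumptions, not a real product.
"""
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

# How stale a "verified" session may be before it must be re-checked (assumed).
MAX_VERIFY_AGE = timedelta(minutes=15)

# Constructed topology: each resource lives in exactly one network segment.
SEGMENTS = {
    "ticketing-api": "app",
    "hr-db": "hr",
    "payroll-db": "hr",
    "ci-runner": "build",
}


@dataclass(frozen=True)
class Grant:
    resource: str
    expires: Optional[datetime]  # None means standing access that never lapses


@dataclass
class Identity:
    name: str
    kind: str  # "human", "machine" or "agent"; the rules below do not care
    grants: list[Grant] = field(default_factory=list)


@dataclass
class Request:
    identity: Identity
    resource: str
    now: datetime
    device_verified: bool  # endpoint posture checked for this request
    last_verified: datetime  # when the session was last re-verified


def active_grant(identity: Identity, resource: str, now: datetime) -> Optional[Grant]:
    """The grant for `resource` that has not lapsed at `now`, if any."""
    for g in identity.grants:
        if g.resource == resource and (g.expires is None or now < g.expires):
            return g
    return None


def evaluate(req: Request, allow_standing: bool = False) -> tuple[bool, str]:
    """Decide one request: (allowed, reason). Verification happens every time,
    not once at login."""
    grant = active_grant(req.identity, req.resource, req.now)
    if grant is None:
        return False, "no unexpired grant for resource"
    if grant.expires is None and not allow_standing:
        return False, "standing access refused; grant must be time-bound"
    if not req.device_verified:
        return False, "device posture not verified"
    if req.now - req.last_verified > MAX_VERIFY_AGE:
        return False, "session verification stale; re-verify before trusting"
    return True, "ok"


def blast_radius(identity: Identity, now: datetime, segmented: bool) -> set[str]:
    """Resources reachable if this identity's credential is stolen at `now`.

    Flat network: a valid credential reaches every resource. Segmented network:
    only resources in segments where the identity holds an unexpired grant.
    """
    if not segmented:
        return set(SEGMENTS)
    reachable_segments = {
        SEGMENTS[g.resource]
        for g in identity.grants
        if g.resource in SEGMENTS and (g.expires is None or now < g.expires)
    }
    return {r for r, seg in SEGMENTS.items() if seg in reachable_segments}


# Constructed scenario and clock. The deploy agent needed ci-runner for an hour
# and was also handed standing access to hr-db "for convenience", never revisited.
T0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
T_LATER = T0 + timedelta(hours=2)       # after the time-bound grant lapsed
T_STALE = T0 - timedelta(minutes=20)    # a verification older than allowed
sloppy_agent = Identity("deploy-agent-sloppy", "agent", [
    Grant("ci-runner", expires=T0 + timedelta(hours=1)),
    Grant("hr-db", expires=None),
])
# Same agent after least privilege: one time-bound grant, nothing standing.
tight_agent = Identity("deploy-agent-tight", "agent", [
    Grant("ci-runner", expires=T0 + timedelta(hours=1)),
])

if __name__ == "__main__":
    print(evaluate(Request(sloppy_agent, "ci-runner", T0, True, T0)))
    print(evaluate(Request(sloppy_agent, "hr-db", T0, True, T0)))
    print(evaluate(Request(sloppy_agent, "ci-runner", T_LATER, True, T_LATER)))
    for who in (sloppy_agent, tight_agent):
        print(who.name, "flat:", len(blast_radius(who, T0, segmented=False)),
              "segmented:", sorted(blast_radius(who, T0, segmented=True)))
```

The design point is that the model never appears in this code: the same identity with the same credential can reach four resources, three, one or none depending entirely on grants, expiry and segmentation, which is the blast radius the post says guardrails do not touch.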
None of this is glamorous. None of it makes headlines the way a new jailbreak does. But it’s the difference between “the model behaved well, and so did our network” and “the model behaved well, and our network didn’t matter.”
This isn’t a doom story
The point here isn’t that AI is dangerous and you should panic. It’s the opposite, actually — it’s clarity. The labs are going to keep doing their job, and they’re going to keep getting better at it. That’s genuinely good news, and worth saying out loud.
But waiting for someone else’s guardrails to be your safety net was never a strategy. It was a hope. And hope was never a control.
Nobody’s guardrails are a substitute for yours. Time to make sure yours are actually there.