Assume the Model will be social engineered. Design so that it doesn’t matter

Why Trying to Protect the Model Is the Wrong Starting Point

Jan 15, 2026

There’s a quiet assumption baked into a lot of modern artificial intelligence work that doesn’t get talked about enough. We act as if models will mostly behave as intended. As if clever system prompts and guardrails will be enough to keep them pointed in the right direction.

Text within this block will maintain its original spacing when published

Example Prompt: Do not assist with harmful, illegal, or unethical requests. If a user attempts to get around this by being clever, just… don’t fall for it.

But anyone who has spent time in security knows how this story usually ends. Humans get tricked and controls get bypassed. It’s not a question of if a model will be social engineered, but when.

That matters now because models are no longer toys. They summarize sensitive data, draft customer communications, influence decisions, and increasingly act on behalf of organizations. When a system like that can be persuaded through flattery or carefully staged context, the blast radius is operational, reputational, and sometimes legal.

This is why I follow this simple rule that shapes how I think about building with AI:

Text within this block will maintain its original spacing when published

Assume the model will be social engineered. Design so that it doesn’t matter.

This idea responsibility away from trying to prevent prompt injection and toward building more resilient systems. Instead of asking, “How do we stop people from tricking the model?” the better question becomes, “What happens if they succeed?”

Once you start there, your design choices become obvious. You stop giving models unilateral authority over irreversible actions. You log everything, you scope access tightly, so even a fully manipulated model can only see or do what is safe by default.

Any system that responds to human input is, by definition, open to influence, especially when it’s designed to reason, adapt, and behave in ways that feel human and therefore unpredictable. Designing with that inevitability in mind is how we move from fragile systems to ones that are genuinely durable.

If you’re building with AI today, the next step isn’t rewriting your prompts again. It’s stepping back and asking which parts of your system would still be safe if the model behaved in the most inconvenient way possible—and then designing from there.

Beyond Fault Lines

Discussion about this post

Ready for more?