TIL Harness
I was catching up on Ben’s Bites, which had this paragraph.
People are just realizing that “Harness is the new app,” but this was very clear early this year when we saw Claude and Gemini play Pokémon. We’ve been talking about it since May, but it’s gaining even more momentum now. Cursor is talking about improving their harness for Codex models, and Poetic used a custom harness with Gemini 3 Pro to get the best score on ARC-AGI 2.
I could infer a bit, but I hadn’t heard the term “harness” applied to AI agents before, so I asked ChatGPT. I’m sharing this because it was such a good explanation, and I literally laughed out loud at “The harness is everything that stops it from lighting your house on fire while ‘optimizing.’”
In AI agent land, a harness is basically the rig that keeps the agent useful instead of feral.
More concretely, an agent harness is the surrounding system that wraps, constrains, observes, and directs an AI agent so it can operate safely and predictably in the real world.
Think of it like this
The model is the brain. The agent is the brain plus goals and actions. The harness is everything that stops it from lighting your house on fire while “optimizing.”
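To make the brain/agent/harness split concrete, here is a toy Python sketch. The names `fake_model`, `agent_step`, and `harness` are hypothetical, and the "model" is a stub rather than a real LLM call; the only point is where each layer sits and which one gets the final say.

```python
from typing import Callable

# The "brain": any function that maps a prompt to text (a stub, not a real LLM).
Model = Callable[[str], str]


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM that gets a little too enthusiastic about optimizing."""
    return "delete_everything" if "optimize" in prompt else "noop"


def agent_step(model: Model, goal: str) -> str:
    """The "agent": the brain plus a goal, producing a proposed action."""
    return model(f"Goal: {goal}. What action do you take?")


def harness(model: Model, goal: str, allowed_actions: set[str]) -> str:
    """The "harness": decides whether the proposed action is allowed to run."""
    action = agent_step(model, goal)
    if action not in allowed_actions:
        return f"blocked: {action}"
    return f"ran: {action}"


print(harness(fake_model, "optimize the house", {"noop", "turn_off_lights"}))
# blocked: delete_everything
```

The model and agent never check anything themselves; the allowlist lives entirely in the harness, which is what makes it swappable and testable.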
What a harness usually includes
- Inputs and outputs
What data the agent can see
How prompts are structured
How responses are parsed or validated
- Tools and permissions
Which tools it can call (APIs, files, browsers, databases)
Guardrails on those tools: read-only vs. write access, rate limits, sandboxes
- State and memory
Short-term context
Long-term memory storage
Rules for what gets remembered or forgotten
- Control flow
When the agent is allowed to act
When it must ask for approval
Retry logic, fallbacks, and timeouts
- Constraints and policies
System rules
Safety filters
Business logic
Cost limits. The CFO’s favorite feature.
- Evaluation and monitoring
Logging decisions and tool calls
Scoring outputs
Detecting hallucinations or drift
Alerting when things go sideways
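A lot of that list fits in a few dozen lines of plain code. Below is a minimal Python sketch, not any framework's API: the `Harness` and `Tool` classes, the JSON tool-call format `{"tool": ..., "args": ...}`, the `deny_all` approval policy, and the call budget are all hypothetical choices for illustration. It validates the model's output, only allows registered tools, gates write tools behind approval, enforces a budget as a stand-in for cost limits, and logs what actually ran. State and memory are left out to keep it short.

```python
import json
import logging
from dataclasses import dataclass, field
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("harness")


def deny_all(name: str, args: dict) -> bool:
    """Default approval policy: nobody said yes, so the answer is no."""
    return False


@dataclass
class Tool:
    name: str
    func: Callable[..., str]
    writes: bool = False  # write tools need approval; read-only tools don't


@dataclass
class Harness:
    tools: dict[str, Tool]
    max_calls: int = 5  # crude stand-in for a cost limit
    approve: Callable[[str, dict], bool] = deny_all
    calls: int = 0
    audit: list = field(default_factory=list)

    def run_tool_call(self, raw: str) -> str:
        # Inputs/outputs: parse and validate the model's proposed tool call.
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            return "rejected: not valid JSON"
        if not isinstance(call, dict) or not isinstance(call.get("tool"), str):
            return "rejected: malformed tool call"
        name, args = call["tool"], call.get("args", {})
        if not isinstance(args, dict):
            return "rejected: malformed tool call"
        # Tools and permissions: only registered tools can be called.
        tool = self.tools.get(name)
        if tool is None:
            return f"rejected: unknown tool {name!r}"
        # Constraints: stop once the call budget is spent.
        if self.calls >= self.max_calls:
            return "rejected: call budget exhausted"
        # Control flow: write tools need explicit approval.
        if tool.writes and not self.approve(name, args):
            return f"rejected: {name} needs approval"
        self.calls += 1
        try:
            result = tool.func(**args)
        except TypeError:
            return f"rejected: bad arguments for {name}"
        # Monitoring: keep an audit trail and log every call that ran.
        self.audit.append((name, args, result))
        log.info("tool=%s args=%s result=%s", name, args, result)
        return result


files = {"notes.txt": "hello"}


def write_file(path: str, text: str) -> str:
    files[path] = text
    return "ok"


h = Harness(
    tools={
        "read_file": Tool("read_file", lambda path: files.get(path, "")),
        "write_file": Tool("write_file", write_file, writes=True),
    },
    max_calls=2,
)
print(h.run_tool_call('{"tool": "read_file", "args": {"path": "notes.txt"}}'))  # hello
print(h.run_tool_call('{"tool": "write_file", "args": {"path": "notes.txt", "text": "x"}}'))
# rejected: write_file needs approval
print(h.run_tool_call("rm -rf /"))  # rejected: not valid JSON
```

The order of the checks is a design choice: cheap validation first, then permissions and budget, and only then the side effect, so a rejected call never costs anything or touches state.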
Why people bother with harnesses
Without a harness, an “agent” is just vibes and API calls. With a harness:
You get repeatable behavior
You can test it
You can debug it
You can turn it off quickly, which is underrated
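Those last three points come almost for free once the loop lives in a harness. In this hypothetical Python sketch, `scripted_model` replays canned responses so a run is repeatable and testable, and a `threading.Event` acts as the kill switch, which another thread or a signal handler could set. The function names and the `"done"` convention are assumptions for illustration.

```python
import threading
from typing import Callable, Iterator


def scripted_model(responses: list[str]) -> Callable[[str], str]:
    """A fake model that replays canned responses, so runs are repeatable."""
    it: Iterator[str] = iter(responses)
    return lambda prompt: next(it, "done")


def run_agent(
    model: Callable[[str], str], kill_switch: threading.Event, max_steps: int = 10
) -> list[str]:
    """Run until the model says "done", the step limit hits, or the kill switch is set."""
    actions: list[str] = []
    for step in range(max_steps):
        if kill_switch.is_set():
            actions.append("killed")
            break
        action = model(f"step {step}")
        if action == "done":
            break
        actions.append(action)
    return actions


stop = threading.Event()
assert run_agent(scripted_model(["search", "summarize"]), stop) == ["search", "summarize"]

stop.set()  # the big red button
assert run_agent(scripted_model(["search"]), stop) == ["killed"]
```

Because the model is injected, the same `run_agent` can be exercised against a scripted stub in tests and a real model in production.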
Real-world examples
A coding agent whose harness limits file access and requires tests to pass
A research agent whose harness verifies sources before returning answers
A marketing agent whose harness enforces brand voice and legal rules
A UI automation agent whose harness says “absolutely not” to prod data
Skeptical take
If someone says they “built an AI agent” and can’t explain the harness, they probably built a prompt and a dream.
If you want, I can map this to something concrete like LangChain, OpenAI Assistants, AutoGen, or how you’d design one for a front-end or product workflow.