I keep seeing posts on social media asking some version of "how do I catch weird behavior from my AI agent in production?"
Every time, I wonder the same thing: don't you have a process?
Every business runs on a process, whether or not anyone wrote it down. You move a deal through a sales cycle. You onboard a new hire through a set of steps. You close the books the same way every month. The work has structure. Someone just never drew the map.
So when someone ships one big agent plus tools to do the whole job and then asks how to "observe weird behavior," I read that as a team that never mapped its process. They've asked the agent to figure out what the job even is, not just how to do it. No wonder the output drifts in ways nobody can evaluate.
If you've mapped your process, weird behavior is a contract violation. You know what step 4 should produce, so you can check whether it produced it. If you haven't, weird behavior is a vibes assessment.
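To make the contract idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the "step 4" quote example, the field names, and the `StepContract` and `check_contract` helpers are made up for illustration, and a real process would bring its own fields and rules.

```python
from dataclasses import dataclass


@dataclass
class StepContract:
    """What a step's output must contain (hypothetical example)."""
    step: str
    required: dict  # field name -> expected Python type


def check_contract(contract, output):
    """Return a list of violations; an empty list means the contract held."""
    violations = []
    for name, expected_type in contract.required.items():
        if name not in output:
            violations.append(f"{contract.step}: missing field '{name}'")
        elif not isinstance(output[name], expected_type):
            violations.append(
                f"{contract.step}: field '{name}' should be "
                f"{expected_type.__name__}, got {type(output[name]).__name__}"
            )
    return violations


# Hypothetical contract for "step 4" of a sales process: draft a quote.
QUOTE_CONTRACT = StepContract(
    step="step 4 (draft quote)",
    required={"customer_id": str, "line_items": list, "total": float},
)
```

With something like this in place, "weird" has a definition: `check_contract(QUOTE_CONTRACT, output)` either comes back empty or tells you which field step 4 got wrong.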
Most teams never wrote their process down, and for a long time that was fine. The work lived in people's heads, and there was no agent to hand it to. Writing it all down sounded expensive anyway. Someone has to interview each role, write down each step, and argue about what "good" even means.
Big aha moment: writing it down isn't expensive anymore.
The same big agent that's tempting you to skip the process work can do that work for you. Give it your artifacts. Let it ask you annoying questions about who does what. Have it write down the process you're already running. That's an hour, not a six-month consulting engagement.
Now the big agent stops being your production system. It just built you the map. You drop smaller, targeted agents into each step instead. They run faster. You can actually reason about them. Since each step has a contract, weird behavior shows up as a broken contract instead of a vibe.
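One way to picture this layout, as a rough sketch rather than a prescription: each step is a small function (standing in for a targeted agent) paired with a check on its output. The `Step` and `run_process` helpers, the step names, and the toy onboarding logic below are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ContractViolation(Exception):
    """Raised when a step's output fails its contract."""


@dataclass
class Step:
    name: str
    run: Callable[[Any], Any]        # stand-in for a small, targeted agent
    contract: Callable[[Any], bool]  # True when the output is acceptable


def run_process(steps, payload):
    """Run steps in order and stop at the first broken contract."""
    for step in steps:
        payload = step.run(payload)
        if not step.contract(payload):
            raise ContractViolation(f"{step.name} produced {payload!r}")
    return payload


# Hypothetical two-step slice of a new-hire onboarding process.
ONBOARDING = [
    Step(
        name="collect_details",
        run=lambda hire: {"name": hire.strip(), "email": None},
        contract=lambda out: bool(out["name"]),
    ),
    Step(
        name="assign_email",
        run=lambda out: {
            **out,
            "email": out["name"].lower().replace(" ", ".") + "@example.com",
        },
        contract=lambda out: out["email"].endswith("@example.com"),
    ),
]
```

Calling `run_process(ONBOARDING, "   ")` with a blank name fails at `collect_details`, and the error says so. The failure names a step instead of leaving you to guess where things went sideways.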
So the real choice isn't whether to do the process work. It's when. You can use a big agent once to drag the process into the open, then run smaller agents against it. Or you can keep pointing one big agent at the whole job forever and hope it follows the same process every time.
The first is cheaper in the long run. And the step that used to be expensive is now cheap.
If your business has a process, and it does, your AI deployment should look like that process. Not like a generalist trying to do the whole thing.