One week ago I was learning about the exploding popularity of OpenClaw (previously Clawdbot, Moltbot). Indulging my continuing severe case of Not Invented Here Syndrome, I dug back into some of my earlier personal-bot work.
Prior to the AI revolution I had a Telegram bot that ran on a home server. Inspired by Hubot (from GitHub), the idea was to have a chat-based interface to “home stuff”: controlling lights, security cameras, and whatever other local software I might run. Just to see if it was interesting.
I really wanted a seemingly “natural” language interface. The entire system operated on trigger words or phrases to fake it. But then computers got a lot smarter. I briefly tried to integrate the early OpenAI API into it, but it was too early. I played with ideas like memory (remember trying to get by with just 4k tokens?), tool calling (models didn’t really know how) and even sub-agents. But it didn’t really come together and I went on to other projects.
But now all these things can clearly be a reality. So let’s build.
Hydra would be my new AI platform. Self-improving and integrating with anything and everything. I should be able to communicate any way I like, and it can find me to let me know what’s going on. Asynchronous operation should be the norm, being driven internally by alarms and schedules, kicking off sub-agents to accomplish longer running tasks. All the bells and whistles.
The core innovation I had in mind was to design the primary agent loop (that’s the LLM conversation) to be entirely event driven. That is, the core loop runs based on being awoken from its electric dreams by:
- Timers
- Incoming communication
- Asynchronous tasks completing
All work would be delegated to asynchronous tasks in the form of sub-agents: task-specific agent loops. It’ll be AI talking to AI. Architectural-AI Jazz, man.
The Beginning
Progress was immediately fast and productive. I was trying to work completely in Codex (GPT-5.3): the Codex Mac app on one side for building bigger features, coming up with designs, and reviewing code, and Codex CLI with tmux in a VM managing my integration workflow:
- Pull updates from Git
- Rebuild and restart the server
- Read the logs and diagnose when it inevitably breaks.
Within the day I had my core loop running. By the next day I had a sub-agent named Builder who was authorized to customize the Hydra install by writing plugins. I immediately had it write up a Telegram integration to take over my old bot’s account.
It was getting pretty late into the evening when I hit this point. I couldn’t wait for it to complete. So I left an instruction:
When you get it working, ping me on Telegram to let me know.
It worked!
One Week Later
I’ve learned a lot about pure-agent Vibe Coding over the past couple of months, experimenting with different points along the spectrum. On one end you have full control over the code, watching it edit each file, reviewing each line. This is what you get with a VS Code agent plugin or Cursor: a fully AI IDE.
On the other end you just chat with your AI Coder and it ships code.
The latter is working surprisingly well these days. All my progress so far had resulted in me looking at exactly 0 lines of code. What I did do was push the agent really hard to improve itself: after making changes, I have it tell me what could have gone better and what design changes we should consider. The other key is keeping the feedback loop really tight. The AI writes the code; the AI runs the code. And you have to be a real stickler about insisting the agent makes the software self-observable, meaning the AI can observe what the program is doing.
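In a Go service, the cheapest self-observability hook is just logging the runtime’s own numbers. A sketch (my own illustration, not Hydra’s code) of the two values that matter most in a long-running process:

```go
package main

import (
	"fmt"
	"runtime"
)

// memSnapshot returns the two numbers to watch first in a long-running
// Go service: live heap bytes and goroutine count. A steady climb in
// either usually means a leak.
func memSnapshot() (heapBytes uint64, goroutines int) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, runtime.NumGoroutine()
}

func main() {
	heap, gs := memSnapshot()
	// Log this periodically (or on every agent turn) so the AI -- or
	// you -- can watch the trend instead of guessing.
	fmt.Printf("heap=%dB goroutines=%d\n", heap, gs)
}
```

Dropping a line like this into the log on every loop iteration turns “the server disappeared” into a trend you can see coming.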
But today I still hit a wall. After a fairly significant change to the core control loop, I hit a bug where the server would consume all available memory and shut down. I’m actually not 100% sure this was triggered by that change, as I’d had several cases of the server process simply disappearing since the beginning. My Integration agent always assured me it would run the server in a way that would let us catch the failure next time. We never did. Until now: OOM (out of memory).
This is not a failure mode I expected from this project. It’s not a high-scale service, or doing anything but processing text and calling APIs. Yet it consumed gigs of RAM in just a few minutes after my latest change.
I had some theories. Codex had some theories. We added logging, retrieved profiling data. Triggered some panics just for fun. Nothing it was telling me sounded right. Codex’s intuition just wasn’t convincing and we weren’t making progress. Ugh, fuck it, fine, I’ll look at some code.
Oh boy, this is what I was afraid of. While Codex was having no difficulty adding whatever features I dreamed up, it never really got the architecture I implicitly imagined. It wasn’t necessarily an architecture problem causing the bug, but the architecture made it exceedingly difficult to reason about what this now rather complex system was ever doing.
What had we created? There is a central “Manager” object with two tiers of locks, a half dozen sub-components broken out as attributes with callbacks to communicate between them, and goroutines (oh yeah, this is all written in Go) spawned from literally everywhere. There were duplicated concepts, similar functionality written in completely different ways. Sigh.
Now I’m not saying any individual decision was wrong. But it lacked a cohesive design. In my mind there was a theory about how this code would manifest. It was going to be beautiful. Well, I’m here to report it was not.
But it almost worked.
Epilogue
Look, I can’t leave you (or me) in this place. Disappointed about the revolution. Defeated by complexity. A dream left to die on the digital vine.
Here is the thing: An ugly, partially working dump of a codebase is not fatal. Fixing shit like this has been essentially my entire career. This is what happens when you move fast. In the old days it took years to create this kind of mess. Codex and I did it in a couple days.
And here’s the other thing: coding with AI also makes fixing it so much faster.
Back to work.