Current Agentic Development Workflow (June)

Using AI for development is changing pretty fast (maybe a little too fast), and I keep evolving my workflow as it progresses. So I thought I'd start documenting it at certain points, just to reflect and see how it changes over time.

My current development setup:

3 Cursor workspaces: 1 for large tasks, 1 for debugging/bugs/etc., and 1 (in Agent mode) that has all the OSS repos I like to explore and learn from (Pi, Opencode, etc.).

Cloud agents fully set up with their envs and prod OTEL, so I can kick them off via Slack for any issues/debugging/etc.

We've built our Littlebird codebase to be pretty agent-friendly. An agent can do so much more than just write code. It can run tests, access my local DB, do the manual testing itself via browsers, run LB agents via CLI, run evals in loops and analyze... The list goes on and on.

Usual flow for an M/L-size task:

Brainstorm and plan with the agent. Mostly done via ask-user-questions tools and a lot of back and forth. It helps that I understand the codebase well, so I know where the plan is lacking. Sometimes I'll let it implement something, look at the code to understand the tricky parts, extract the learning, throw away that branch and refine my plan. I don't do this more than once (and not too frequently). I use Opus 90% of the time for this.
Let the agent rip. Depending on the task, either Opus 4.8 or, if it's straightforward, gpt 5.5 medium.
While the agent is running, I'll go check up on my cloud agents, debug other small tasks, etc. Again, it depends on the task; sometimes I just sit and watch the agent's trace and keep nudging it a bit.
Once it's all done, I'll have it test via browser/CLI, evals, etc. These are new sessions where I set the expectations with reference to the session where we built it.
If all is done, run /deslop and /thermo-nuclear-pr-review.
Make changes.
Raise PR, run /ci-watcher.

This is roughly what it looks like for general dev work. Different amounts of effort go into different tasks depending on what they do. Sometimes a lot of iteration will go into evals. The flow looks different when I'm building evals or optimizing agents for them instead of just regression checking or something. More on that later...
