Four coding agents, four pricing models, one honest cost comparison
Hi, Michael here.
Four coding agents shipped meaningful updates this month, and the pricing models are now so different that "which is best?" is the wrong question. The right one: what do you actually pay per shipped PR?
This week I ran Devin, Cursor Agent, Cline and Codex CLI through the same exercise: give each one the same three tasks (a refactor, a bug fix with a failing test, a small greenfield feature) and track the all-in cost: subscription plus tokens plus my time reviewing.
The spread was bigger than I expected. Not 2x. Closer to 40x at the extremes.
→ Full breakdown with the per-PR math
The 30-second version
- Devin ($500+/mo). The only one that genuinely runs unattended overnight. Expensive per seat, cheapest per shipped PR if you actually queue work for it.
- Cursor Agent ($20+/mo). Best price-to-capability if you're already living in the editor. The default for most working engineers.
- Cline (OSS). Bring your own API key. Cheapest on paper, but token spend on a real codebase lands around $80 to $150/mo. Worth it for the transparency.
- Codex CLI (OSS). The terminal-native option. Built for refactors, audits and migrations: the jobs you'd otherwise script yourself.
Pick by your actual constraint, not the marketing
Same conclusion as the cold email post a few weeks back: the use-case framing is a trap. All four of these will write code. They differ on how you pay and how much supervision you'll tolerate.
Match the agent to the constraint, not the task:
- Pick Devin if your scarce resource is engineering time and you'd rather pay than babysit.
- Pick Cursor Agent if your budget is under $30/mo and your workflow is IDE-native.
- Pick Cline if you need auditability and model choice (you want to see every prompt).
- Pick Codex CLI if your jobs are one-shot terminal work (migrations, audits, codemods).
The per-PR math (the part that surprised me)
I tracked 12 PRs across the four agents on the same Next.js codebase. Rough numbers:
- Devin: $42/PR all-in. High variance: cheap on the greenfield feature ($18), expensive on the legacy refactor ($91) because it burned cycles re-reading files.
- Cursor Agent: $4/PR amortized. It doesn't ship unattended, and that figure leaves out my review time: add ~15 min of supervision per PR.
- Cline: $11/PR in tokens (Sonnet 4.6). Same supervision overhead as Cursor. The transparency is genuinely useful for debugging why it made a choice.
- Codex CLI: $2/PR for the kind of jobs it's good at. Useless for the ones it isn't. Don't try to use it like Devin.
The headline: if you're not queueing overnight work, you cannot make Devin's economics work. And if you are queueing overnight work, nothing else gets close.
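The per-PR numbers mix two kinds of cost: Devin's $42 is all-in, while Cursor's and Cline's figures leave out roughly 15 minutes of supervision each. Here is a minimal Python sketch that puts them on one axis. It assumes a hypothetical $100/hour value for your time, and `HOURLY_RATE`, `all_in_cost_per_pr` and `break_even_hourly_rate` are illustrative names. None of them come from any of these tools.

```python
# Sketch: put agents on one per-PR axis by pricing supervision time.
# HOURLY_RATE is a hypothetical placeholder; substitute your own loaded cost.
HOURLY_RATE = 100.0  # USD per hour (assumption, not from the measurements)


def all_in_cost_per_pr(tool_cost_per_pr: float, review_minutes: float,
                       hourly_rate: float = HOURLY_RATE) -> float:
    """Tool spend per PR plus the dollar value of time spent supervising it."""
    return tool_cost_per_pr + review_minutes * hourly_rate / 60.0


def break_even_hourly_rate(flat_cost_per_pr: float, tool_cost_per_pr: float,
                           review_minutes: float) -> float:
    """Hourly rate at which a supervised PR costs the same as a flat per-PR cost."""
    return (flat_cost_per_pr - tool_cost_per_pr) * 60.0 / review_minutes


# (tool cost per PR, minutes of supervision). Devin's $42 is treated as
# already all-in, so no extra minutes are added for it.
agents = {
    "Devin": (42.0, 0.0),
    "Cursor Agent": (4.0, 15.0),
    "Cline": (11.0, 15.0),
}

for name, (tool_cost, minutes) in agents.items():
    print(f"{name}: ${all_in_cost_per_pr(tool_cost, minutes):.2f}/PR")

print(f"Cursor vs Devin break-even: ${break_even_hourly_rate(42.0, 4.0, 15.0):.0f}/h")
```

At $100/hour, Cursor Agent lands at $29/PR and Cline at $36/PR once supervision is priced in. The break-even helper shows when paying beats babysitting: a supervised $4 PR only costs more than Devin's $42 if your time is worth more than about $152/hour.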
One stack I'd actually run this week
For a working engineer shipping daily:
- Cursor Agent ($20/mo). Primary, all in-editor work.
- Cline (OSS, ~$60/mo in tokens). When I want to see the reasoning trace.
- Codex CLI (OSS, ~$15/mo in tokens). Migrations and codemods in the terminal.
~$95/mo total. Skip Devin until you have a backlog of well-specified work that can run while you sleep. That's the only condition where the $500 floor pencils out.
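The stack total and the "$500 floor" argument both come down to simple amortization. Here is a small Python sketch using the dollar figures from this issue. The PR volumes in the loop are hypothetical, and `amortized_per_pr` is an illustrative helper that ignores token spend.

```python
# Sketch: monthly stack total, and how a flat subscription amortizes with volume.
# Dollar figures are this issue's estimates; PR volumes below are hypothetical.
stack = {
    "Cursor Agent": 20.0,        # subscription
    "Cline (tokens)": 60.0,      # BYOK token spend, approximate
    "Codex CLI (tokens)": 15.0,  # BYOK token spend, approximate
}
monthly_total = sum(stack.values())


def amortized_per_pr(monthly_floor: float, prs_per_month: int) -> float:
    """Flat monthly cost spread over the PRs it ships (token spend excluded)."""
    if prs_per_month <= 0:
        raise ValueError("prs_per_month must be positive")
    return monthly_floor / prs_per_month


print(f"Stack: ${monthly_total:.0f}/mo")
for prs in (5, 20, 50):  # hypothetical monthly volumes
    print(f"Devin floor at {prs} PRs/mo: ${amortized_per_pr(500.0, prs):.2f}/PR")
```

The same helper covers the Cursor line: $20 spread over five PRs, for example, comes to $4/PR. Devin needs 50 PRs a month just to bring the subscription down to $10/PR, which is why a backlog of queued work is the deciding condition.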
The open-source caveat nobody mentions
"OSS" in this category doesn't mean free. Cline and Codex CLI both let you bring your own key, which is great for transparency and terrible for budgeting if you don't watch it. A single Cline session refactoring a large module can burn $8 to $12 in Sonnet tokens. Before you install either one, set a monthly spend limit with whichever provider issued the key: the Anthropic Console for the Claude models Cline runs here, your OpenAI account for Codex CLI.
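It helps to size the cap before you set it. At $8 to $12 per large-module session, a budget of around $60/mo (the Cline line in the stack above) covers only a handful of heavy sessions. Here is a minimal Python sketch of that arithmetic, with `sessions_before_cap` and `can_start_session` as hypothetical helpers. The real enforcement should still be the limit you set with your provider.

```python
import math


def sessions_before_cap(monthly_cap: float, cost_per_session: float) -> int:
    """Whole sessions of a given cost that fit under a monthly spend cap."""
    if cost_per_session <= 0:
        raise ValueError("cost_per_session must be positive")
    return math.floor(monthly_cap / cost_per_session)


def can_start_session(spent_so_far: float, worst_case_session: float,
                      monthly_cap: float) -> bool:
    """True if a worst-case session would still fit under the cap."""
    return spent_so_far + worst_case_session <= monthly_cap


cap = 60.0  # roughly the Cline token budget from the stack above
print(sessions_before_cap(cap, 12.0), sessions_before_cap(cap, 8.0))  # 5 7
print(can_start_session(50.0, 12.0, cap))  # False
```

Budgeting against the worst case ($12) rather than the average is the conservative choice here: five heavy sessions, not seven, before the cap bites.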
That's the issue. Reply and tell me which one you're running. I'm collecting per-PR numbers from readers for a follow-up.
Michael