Most AI agent purchases fail for the same reason: people buy on demo, not on fit. Here are five questions that actually predict success.
Q1: What's the verb?
What single action does the agent perform on your behalf?
- "Writes code" — Cursor Agent, Devin, Claude Code
- "Replies to emails" — Lindy, Martin
- "Schedules meetings" — Lindy, Granola
- "Generates leads" — Clay, Apollo
- "Answers customer tickets" — Sierra, Decagon, Intercom Fin
If you can't say the verb in one sentence, you don't actually know what you're buying. Agents that "do everything" usually do nothing well.
Q2: What's the autonomy level you actually want?
Three tiers:
- Assistant — drafts/suggests, you approve every action. Lowest risk, smallest leverage.
- Semi-autonomous — agent acts on whitelisted scenarios, escalates ambiguous ones. Most common in production.
- Autonomous — agent acts independently with logging. Highest leverage, biggest blast radius if wrong.
Match autonomy to stakes:
- Writing internal docs → assistant or semi-auto is fine
- Sending external emails → semi-auto with escalation
- Modifying production code → autonomous only if you have rollback infrastructure
- Customer-facing decisions → semi-auto with audit trail
Most companies overshoot on autonomy because demos show full autonomy. Production deployments usually settle into semi-auto and stay there for months.
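The three tiers can be pictured as a small gate in front of every action. The Python sketch below is only an illustration: the scenario names, the confidence score, and the 0.8 threshold are assumptions, not features of any particular product.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ASSISTANT = "assistant"              # drafts only, human approves everything
    SEMI_AUTONOMOUS = "semi-autonomous"  # acts on whitelisted scenarios, escalates the rest
    AUTONOMOUS = "autonomous"            # acts independently, with logging


@dataclass
class Action:
    scenario: str      # hypothetical label, e.g. "reschedule_meeting"
    confidence: float  # hypothetical agent self-estimate in [0, 1]


# Assumed policy values, chosen only for illustration.
ALLOWLIST = {"reschedule_meeting", "send_receipt"}
MIN_CONFIDENCE = 0.8


def decide(action: Action, tier: Tier) -> str:
    """Return 'execute' or 'escalate' for a proposed action."""
    if tier is Tier.ASSISTANT:
        return "escalate"  # every action needs human approval
    if tier is Tier.AUTONOMOUS:
        return "execute"   # acts alone; logging is assumed to happen elsewhere
    # Semi-autonomous: act only on known, high-confidence scenarios.
    if action.scenario in ALLOWLIST and action.confidence >= MIN_CONFIDENCE:
        return "execute"
    return "escalate"
```

Under these assumptions, `decide(Action("issue_refund", 0.99), Tier.SEMI_AUTONOMOUS)` escalates even at high confidence, because the scenario is not on the list. When evaluating a vendor, it is worth asking where the equivalent of `ALLOWLIST` lives and who can edit it.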
Q3: How does it fail?
Every agent fails. The question is how — gracefully or catastrophically.
Good failure modes:
- "I'm not sure, asking the human" (graceful)
- "I tried but couldn't complete; here's where I got stuck" (recoverable)
- "I attempted and rolled back" (safe)
Bad failure modes:
- "I confidently did the wrong thing" (silent corruption)
- "I kept retrying for 6 hours and burned $500" (cost runaway)
- "I'm done!" (when it's not)
During the trial, deliberately give the agent an ambiguous task and watch what it does. How it handles uncertainty predicts how it will behave in production.
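The "cost runaway" failure is the easiest one to guard against mechanically. Here is a minimal Python sketch of an attempt cap plus a spend cap. It assumes a hypothetical `step` callable that reports whether it succeeded and what the attempt cost; real agents expose this differently, if at all.

```python
from typing import Callable, Tuple


class BudgetExceeded(Exception):
    """Raised when the agent hits its attempt or spend limit."""


def run_with_limits(
    step: Callable[[], Tuple[bool, float]],
    max_attempts: int = 3,
    max_spend_usd: float = 5.0,
) -> str:
    """Call `step` until it succeeds or a limit is hit.

    `step` is a hypothetical callable returning (succeeded, cost_in_usd).
    """
    spent = 0.0
    for attempt in range(1, max_attempts + 1):
        ok, cost = step()
        spent += cost
        if ok:
            return f"done after {attempt} attempt(s), ${spent:.2f} spent"
        if spent >= max_spend_usd:
            # Stop loudly instead of retrying for hours.
            raise BudgetExceeded(f"stopped at ${spent:.2f} after {attempt} attempt(s)")
    raise BudgetExceeded(f"gave up after {max_attempts} attempts, ${spent:.2f} spent")
```

The point of raising an exception rather than returning quietly is that the failure lands in the "here's where I got stuck" category, not the silent one. If a product offers no comparable limit, treat that as a finding from the trial.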
Q4: Where does it live in your stack?
Three integration patterns:
- Standalone product — separate dashboard you check (Lindy, Lavender)
- In your existing tools — invoked inside the app you use (Notion AI, GitHub Copilot)
- Background daemon — fires on events, you only see results (Devin, autonomous SDR agents)
Pick by where you'd naturally see the work. An inbox agent should live in your email client (not a separate dashboard). A coding agent should live in your editor. A research agent can live anywhere — you visit it deliberately.
Don't buy an agent that adds another browser tab to check daily. You'll forget about it within two weeks.
Q5: What does the math show?
Before buying, calculate:
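- Hours/week the task currently takes (be honest)
- Your hourly cost (fully loaded: roughly annual salary × 1.5, divided by about 2,080 working hours per year)
- Realistic time saved (50-70% if the agent works well)
- Tool cost per month
Monthly ROI = (hours saved per week × 4 weeks × hourly cost) - monthly tool cost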
For most knowledge workers (monthly figures assume 4 weeks per month):
- Inbox automation: 5-8 hrs/week saved × $50/hr ≈ $1,000-1,600/mo value, $50-200/mo cost → strong ROI
- Coding assistant: 5-10 hrs/week saved × $80/hr ≈ $1,600-3,200/mo value, $20-100/mo cost → very strong ROI
- Cold outreach: 10-15 hrs/week saved × $40/hr (junior SDR) ≈ $1,600-2,400/mo value, $300-500/mo cost → strong ROI
If the math doesn't work at 50% time savings, don't buy. Demos tend to show 80%+ savings, and production rarely matches that.
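The formula mixes weekly hours with a monthly tool price, so a short Python sketch can make the units explicit. It assumes 4 weeks per month (which reproduces the monthly figures in the examples above) and reads the example hours as hours saved; the function names are made up for illustration.

```python
WEEKS_PER_MONTH = 4  # rough figure that matches the monthly examples above


def monthly_roi(hours_saved_per_week, hourly_cost, tool_cost_per_month):
    """Net monthly value in dollars: value of time saved minus tool cost."""
    return hours_saved_per_week * WEEKS_PER_MONTH * hourly_cost - tool_cost_per_month


def passes_at_half_savings(task_hours_per_week, hourly_cost, tool_cost_per_month):
    """Apply the 50% rule: assume the agent saves only half the task time."""
    return monthly_roi(task_hours_per_week * 0.5, hourly_cost, tool_cost_per_month) > 0


# Worst case of the inbox example: 5 hrs/week saved, $50/hr, $200/mo tool.
print(monthly_roi(5, 50, 200))  # 800
```

A task that takes only 2 hours a week fails the 50% rule against a $300/mo tool at $50/hr (`passes_at_half_savings(2, 50, 300)` is `False`), which is the kind of purchase this check is meant to catch.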
The decision flow
- Define the verb in one sentence (Q1)
- Pick autonomy tier based on stakes (Q2)
- Test failure mode during trial (Q3)
- Verify it lives where you'd naturally see it (Q4)
- Math out ROI at 50% savings (Q5)
- If ✓ all five → buy. If ✗ on any → next agent.
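For anyone comparing several agents side by side, the flow can be written down as a scorecard. This Python sketch is one possible encoding; the field names are invented here and map to Q1 through Q5.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentEvaluation:
    verb_is_one_sentence: bool       # Q1
    autonomy_matches_stakes: bool    # Q2
    fails_gracefully_in_trial: bool  # Q3
    lives_where_you_work: bool       # Q4
    roi_positive_at_half: bool       # Q5


def verdict(e: AgentEvaluation) -> str:
    """Return 'buy' only if all five checks pass; otherwise name the failures."""
    failed = [f.name for f in fields(e) if not getattr(e, f.name)]
    return "buy" if not failed else "next agent (failed: " + ", ".join(failed) + ")"
```

Recording which check failed is useful later, because it shows whether a rejected agent is worth revisiting after a product update.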