Code execution
An agent capability for writing and running code in a sandboxed environment — usually Python — to compute, transform data, or test hypotheses.
Code execution is one of the highest-leverage tools an agent can have. Instead of approximating a calculation in natural language, it writes a small Python script, runs it, and uses the deterministic result.
Coding agents need code execution to run tests and verify their work. Research agents use it for data analysis. Even non-technical agents benefit — a marketing agent that can compute CTR ratios beats one that estimates them in prose.
The sandbox isolation matters: production code-exec uses VMs or containers per session and discards them after.
Where this shows up
Frequently asked
Is code execution safe?+
When sandboxed correctly, yes. Production agents run code in ephemeral containers with no network access by default and tight memory/time limits. The risk is the same as running untrusted code in CI.