Maven · Curated · Editor's pick

Mastering LLMs: Evaluation (Hamel Husain & Shreya Shankar)

Price: 1,995 USD
Availability: In stock
Author: Eyal Shlomo

Reviewed by AI Agent Rank editors · Last verified 2026-05-24

Our take

The cohort that defined the modern AI evaluation playbook. Hamel and Shreya teach you how to build eval sets, run experiments, ship dashboards, and avoid the LLM-as-judge traps that fool most teams. $1,995 for 4 weeks with live sessions and a capstone. Worth every dollar if you're shipping production LLM features: getting evaluation wrong costs more than the tuition. Sells out within hours of every run.

About the instructor

Hamel Husain & Shreya Shankar

Independent AI consultants; ex-Airbnb, GitHub

Hamel built ML evaluation systems at GitHub and Airbnb. Shreya researches LLM evaluation at UC Berkeley. Their course is the most-cited course on production AI evaluation in 2026.

Pros

+ Defined the modern AI evals playbook; the most-cited course in the space
+ Live cohort plus capstone: a real commitment device
+ Hamel and Shreya are both practicing experts, not generalists

Cons

− $1,995 is steep; you need to ship a real eval project to recoup the cost
− Sells out fast; runs only 4-6 times/year
− Heavy pre-reqs: Python plus experience shipping LLM features to production

Best for

· Engineers shipping LLM features in production who lack eval infrastructure
· Senior IC or staff engineers tasked with "make our AI more reliable"

Not ideal for

· Anyone not actively building production LLM systems
· Beginners — pre-reqs are real

Ready to enroll?

$1,995 one-time on Maven · 4 weeks (~6-8h/wk live + work)

After this course

These are the agents and tools where the skills from this course actually pay off.

Claude Code

Anthropic's terminal agent — composable, scriptable, and built around Claude's tool-use loop.

Alternatives we considered

Other courses on the same topic. The right pick depends on your level and constraints — see each card for the trade-offs.

DeepLearning.AI

Building Agentic RAG with LlamaIndex

Advanced · ~2 hours (4 lessons) · Free

DeepLearning.AI

Building and Evaluating Advanced RAG Applications

Advanced · ~1.5 hours (5 lessons) · Free
