Week 2

Recorded Lesson: Automated Evaluators

Lesson 1

Automated evaluators are essential because humans cannot label every output of an AI system: manual review is too slow, too costly, and too inconsistent at scale. Automated evaluators let us scale our judgment instead, providing fast, repeatable measurements of whether a system meets its quality standards. This lesson covers two main kinds: code-based evaluators, which suit objective, rule-based checks such as parsing or structural validity, and LLM-as-Judge evaluators, which can capture more subjective qualities such as helpfulness or tone. You’ll see how to define clear failure modes, write judge prompts with precise pass/fail criteria and examples, and validate evaluators against human labels. We’ll also discuss how to correct for bias when using LLM judges, so you can estimate the true success rate with a confidence interval. The aim is to give you practical methods for building evaluators that provide trustworthy, scalable insight into your system’s performance.
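A code-based evaluator is ordinary deterministic code that returns pass or fail. As a minimal sketch, the check below tests the structural validity of a JSON reply. The required fields (`title`, `ingredients`, `steps`), the `EvalResult` type, and the function name are hypothetical placeholders, not taken from the lesson; swap in your own output schema.

```python
import json
from dataclasses import dataclass


@dataclass
class EvalResult:
    passed: bool
    reason: str


# Hypothetical schema: the system under test is assumed to return a JSON
# object with these keys and types. Replace with your own format.
REQUIRED_FIELDS = {"title": str, "ingredients": list, "steps": list}


def eval_structured_output(output: str) -> EvalResult:
    """Code-based evaluator: deterministic pass/fail on output structure."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return EvalResult(False, f"not valid JSON: {exc.msg}")
    if not isinstance(data, dict):
        return EvalResult(False, "top-level value is not a JSON object")
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in data:
            return EvalResult(False, f"missing field '{key}'")
        if not isinstance(data[key], expected_type):
            return EvalResult(False, f"field '{key}' is not a {expected_type.__name__}")
    return EvalResult(True, "ok")


if __name__ == "__main__":
    r = eval_structured_output('{"title": "Soup", "ingredients": [], "steps": []}')
    print(r.passed, r.reason)  # True ok
    r = eval_structured_output('{"title": "Soup"}')
    print(r.passed, r.reason)  # False missing field 'ingredients'
```

Because checks like this are cheap and never disagree with themselves, they are worth using for any failure mode that can be stated as a rule, leaving LLM judges for the criteria code cannot express.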
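For an LLM-as-Judge evaluator, most of the work goes into the prompt: one narrowly scoped failure mode, explicit PASS/FAIL definitions, and a few labeled examples. The sketch below assumes a single hypothetical criterion ("respects the user's stated constraints") and a caller-supplied `call_llm` function standing in for whatever model client you use. The prompt wording, the examples, and the JSON reply format are illustrative choices, not the course's template.

```python
import json
from typing import Callable

# Hypothetical criterion and few-shot examples; in practice these come from
# your own error analysis and human-labeled traces.
CRITERION = (
    "PASS if the response answers the user's question and respects every "
    "constraint the user stated (e.g. dietary restrictions). "
    "FAIL if it ignores or contradicts any stated constraint."
)

EXAMPLES = [
    {"query": "Vegan dinner idea?", "response": "Try a chickpea curry with coconut milk.",
     "critique": "Answers the question and the dish is vegan.", "verdict": "PASS"},
    {"query": "Vegan dinner idea?", "response": "How about grilled salmon?",
     "critique": "Salmon violates the vegan constraint.", "verdict": "FAIL"},
]


def build_judge_prompt(query: str, response: str) -> str:
    """Assemble criterion, labeled examples, and the trace to grade."""
    shots = "\n\n".join(
        f"Query: {e['query']}\nResponse: {e['response']}\n"
        + json.dumps({"critique": e["critique"], "verdict": e["verdict"]})
        for e in EXAMPLES
    )
    return (
        "You are evaluating an AI assistant's response.\n\n"
        f"Criterion:\n{CRITERION}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Now evaluate:\nQuery: {query}\nResponse: {response}\n\n"
        'Reply with JSON only: {"critique": "...", "verdict": "PASS" or "FAIL"}'
    )


def judge(query: str, response: str, call_llm: Callable[[str], str]) -> bool:
    """Return True for PASS. `call_llm` is a hypothetical stand-in: any
    function that sends a prompt to your model and returns its text reply."""
    raw = call_llm(build_judge_prompt(query, response))
    verdict = str(json.loads(raw).get("verdict", "")).strip().upper()
    if verdict not in {"PASS", "FAIL"}:
        raise ValueError(f"unparseable verdict: {raw!r}")
    return verdict == "PASS"


if __name__ == "__main__":
    # Stub model so the sketch runs without an API key.
    fake_llm = lambda prompt: '{"critique": "Uses beef.", "verdict": "FAIL"}'
    print(judge("Vegan lunch?", "Beef tacos.", fake_llm))  # False
```

Asking for the critique before the verdict gives the judge room to reason before committing, and the strict parse makes malformed replies fail loudly instead of being silently counted as a pass or a fail.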
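Validating a judge against human labels yields its true positive rate (TPR, how often it says PASS when a human says PASS) and true negative rate (TNR, how often it says FAIL when a human says FAIL). One standard way to correct the judge's raw pass rate p_obs on unlabeled traces is the Rogan-Gladen estimator, theta = (p_obs + TNR - 1) / (TPR + TNR - 1), which follows from p_obs = theta * TPR + (1 - theta) * (1 - TNR). The sketch below applies it and adds a percentile bootstrap confidence interval. This is a common technique for the problem, not necessarily the exact procedure taught in the lesson, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def tpr_tnr(human: Sequence[bool], judge: Sequence[bool]) -> Tuple[float, float]:
    """Judge agreement with human labels on a labeled dev set (True = PASS)."""
    on_pass = [j for h, j in zip(human, judge) if h]      # judge verdicts where human said PASS
    on_fail = [j for h, j in zip(human, judge) if not h]  # judge verdicts where human said FAIL
    tpr = sum(on_pass) / len(on_pass)
    tnr = sum(not j for j in on_fail) / len(on_fail)
    return tpr, tnr


def corrected_rate(p_obs: float, tpr: float, tnr: float) -> float:
    """Rogan-Gladen estimate of the true pass rate, clipped to [0, 1]."""
    denom = tpr + tnr - 1
    if denom <= 0:
        raise ValueError("judge is no better than chance; correction undefined")
    return min(1.0, max(0.0, (p_obs + tnr - 1) / denom))


def bootstrap_ci(
    human: Sequence[bool],
    judge_dev: Sequence[bool],
    judge_prod: Sequence[bool],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the corrected pass rate."""
    rng = random.Random(seed)
    dev = list(zip(human, judge_dev))
    estimates: List[float] = []
    for _ in range(n_boot):
        # Resample the labeled set (uncertainty in TPR/TNR) and the
        # unlabeled verdicts (uncertainty in the observed pass rate).
        h, j = zip(*rng.choices(dev, k=len(dev)))
        prod = rng.choices(judge_prod, k=len(judge_prod))
        try:
            tpr, tnr = tpr_tnr(h, j)
            estimates.append(corrected_rate(sum(prod) / len(prod), tpr, tnr))
        except (ZeroDivisionError, ValueError):
            continue  # skip degenerate resamples (e.g. no human FAILs drawn)
    if not estimates:
        raise ValueError("every bootstrap resample was degenerate")
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi
```

With made-up numbers: a judge with TPR 0.9 and TNR 0.8 that passes 76% of unlabeled traces implies a corrected success rate of (0.76 + 0.8 - 1) / (0.9 + 0.8 - 1) = 0.8. If TPR + TNR is close to 1, the denominator approaches zero and the interval becomes very wide, which is a signal to improve the judge before trusting its numbers.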

- [Lesson 1 · Recorded Lesson: Automated Evaluators](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=oumjjqd57c)
- [Lesson 2 · Chapter 1: Introduction](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=ger3pdhumf)
- [Lesson 3 · Chapter 2: Error Analysis Recap](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=1kesnlso95e)
- [Lesson 4 · Chapter 3: Code-based vs LLM-based Evaluators](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=l0d8vp059j)
- [Lesson 5 · Chapter 4: Overview of Creating an LLM-as-Judge Evaluator](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=kwvk6ff9exn)
- [Lesson 6 · Chapter 5: Example Criterion for LLM-as-a-Judge](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=rpbapruasgn)
- [Lesson 7 · Chapter 6: Crafting and Refining the LLM Judge Prompt](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=j51tj9fyzsc)
- [Lesson 8 · Chapter 7: LLM as Judge Coding Demo](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=pt4ba03ydnb)
- [Lesson 9 · Chapter 8: Correcting Bias in LLM-as-Judge Evaluators](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=wtd458l9bub)
- [Lesson 10 · Chapter 9: Pitfalls to Avoid When Building Automated Evaluators](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=4tnvqupsv1g)
- [Lesson 11 · Optional HW](/parlance-labs/evals/2025-3/syllabus/modules/b51ca3?item=o8wmm10ftsg)