Week 1

📄 This Week's Overview: Objectives, Readings, and HWs

Lesson 4

This week we set the foundation: what “evaluation” means in LLM pipelines, why it’s harder than in traditional ML or software, and how to begin error analysis. Outputs from these systems are not just right or wrong; they can fail in more subtle ways. For example, a pipeline extracting information from emails might misidentify a celebrity mentioned in the body as the sender, or generate a summary in the wrong format even if the facts are correct. These are not simple bugs you can catch with unit tests; they reflect deeper gulfs of comprehension (not knowing your data or outputs at scale), specification (underspecified prompts that force the model to guess), and generalization (unexpected failures on new inputs).

Error analysis is the first tool you’ll practice to bridge these gulfs. The goal isn’t to create a polished metric yet, but to read traces systematically, label first failures, and build a taxonomy of error modes. This will feel messy; that’s expected. By the end of the week you should have an intuition for where your pipeline breaks and why, which becomes the foundation for quantitative evaluation later.
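As a rough illustration of that workflow, here is a minimal Python sketch of tallying labeled traces into a first-pass taxonomy. Everything in it is an assumption made for illustration: the trace IDs, the free-text notes, the category names, and the `tally_failure_modes` helper are hypothetical and loosely modeled on the email example above, not taken from the course materials.

```python
from collections import Counter

# Hypothetical annotations: one row per trace, with a free-text note on the
# first failure observed and the category name assigned to it afterwards.
annotations = [
    {"trace_id": "t1", "note": "celebrity in body treated as sender", "category": "wrong sender"},
    {"trace_id": "t2", "note": "summary returned as bullets, not a paragraph", "category": "wrong format"},
    {"trace_id": "t3", "note": "", "category": None},  # no failure observed
    {"trace_id": "t4", "note": "signature name used as sender", "category": "wrong sender"},
]


def tally_failure_modes(rows):
    """Count how often each named category appears, skipping clean traces."""
    return Counter(row["category"] for row in rows if row["category"])


counts = tally_failure_modes(annotations)
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

Even a tally this crude shows which error modes dominate, which is the kind of intuition the week aims to build before any formal metric exists.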

Objectives

- Understand why evaluation is indispensable in LLM application development

- Learn the Three Gulfs framework and connect it to real failures

- Begin open coding: identifying, grouping, and naming categories of errors in traces

Readings

Chapters 1–4 of the course reader. These chapters introduce the Three Gulfs model, define evaluation in application contexts, cover prompting fundamentals, and walk through the basics of error analysis and collaborative annotation. We encourage you to read carefully and try some of the end-of-chapter questions; they’ll make the lectures and homeworks much easier.

Specifically, chapters 1 & 2 correspond to Lesson 1, and chapters 3 & 4 correspond to the combined Lessons 2 and 3.

Coding Homeworks

- After Lesson 1: Coding HW 1

- After Lessons 2 & 3: Coding HW 2
