Week 3


Optional: Live Office Hours 7

Oct 22, 2025, 9:00 AM - 10:00 AM GMT+5:30

Audio Transcript


Shreya Shankar

I need to admit everybody.00:00:23

That… would fix my problem.00:00:27

Okay, cool.00:00:33

Happy night in Pacific time, or in the US.00:00:36

Looks like Francesco has a dog.00:00:43

Oh, that's so cute.00:00:48

My dog is here as well.00:00:52

Well done.00:00:55

Francesco Lanciana

Let's see…00:00:57

Shreya Shankar

Say goodbye!00:00:58

Francesco Lanciana

A much bigger dog.00:01:00

Shreya Shankar

Yeah, he's a huge dog. Hold on, I'm trying to not blur myself. He's, like, coming in and out.00:01:02

But… Anyways, I give up on this background blur thing.00:01:09

Francesco Lanciana

He's also given up.00:01:14

Shreya Shankar

Cool! Small group today, but as usual, if people have questions, feel free to raise your hand.00:01:19

And… we'll just take them one by one. Otherwise, we can just chat and do nothing,00:01:27

or joke around if people don't have questions, I guess.00:01:33

ganesh

Awesome.00:01:39

katenesmyelova

I didn't realize my video wasn't on.00:01:41

Can you hear me?00:01:44

Shreya Shankar

All good, yeah.00:01:44

katenesmyelova

Awesome, thank you.00:01:45

Oof, I've been kind of slacking a little bit with00:01:49

the homework, because I have lots of things to do at work, and… That's okay.00:01:56

Yeah, people have a real…00:02:03

Shreya Shankar

They can't spend all 40 hours of their week learning about evals.00:02:04

katenesmyelova

I would say that evals are super important for my job, because I am responsible for driving engineering with evals at my company, so it's in my best interest to learn them.00:02:09

But I had a question about the optional live coding workshop tomorrow. Will it be recorded, and does it make sense to watch the recording if I can't be fully present?00:02:22

For the whole…00:02:40

Shreya Shankar

Oh, so they will be recorded, and the…00:02:42

format is that Isaac, one of our TAs, basically goes through… I think tomorrow's the first one, so it's probably going to be homework one, and he'll do live coding with it. I think he's also really good at leveraging AI tools to help him do the homework, so…00:02:46

It's also very accessible for folks who don't have much coding experience. I would say, if you can't attend the whole thing,00:03:03

maybe watch the recording, or at least start from the beginning, because if you haven't done the first homeworks, you'll probably be lost if you join in the middle. So I just recommend starting from the beginning.00:03:13

katenesmyelova

I did the first homework. Right now, I'm just procrastinating a little bit on the fun part of the second one, which is open coding.00:03:28

Shreya Shankar

Fun part.00:03:38

katenesmyelova

Well, again, it's a fun part because I'm learning, but, you know, I need to be in the right state for it. But, yeah, thank you. Thank you so much, yeah.00:03:40

Shreya Shankar

Yeah, of course. No, they're usually pretty fun, so I always like watching them, because it feels like watching a wizard doing a thing.00:03:50

Anyone?00:03:57

katenesmyelova

Fuck.00:03:58

Shreya Shankar

Ganesh.00:03:59

ganesh

Yeah, I'm a vibe coder. I create a lot of AI agents for automating my workflows.00:04:02

One example: I do lead scoring for our company, and lots of leads come in through the website and various other places.00:04:09

Let's say the scoring is not coming out as desired. There is a tendency to go back and fix it by changing the prompt or whatever. I just wanted to know: how do I go about recording the errors?00:04:17

Shreya Shankar

Hmm.00:04:34

Okay, so this is something Hamel actually has a good answer for, which I'll steal: you don't always have to do error analysis, and you don't always have to use evals. They're not a good fit for applications where you are the only consumer and it's, like, a one-, two-, or three-time use; like, if you know that the lifetime of this application is limited.00:04:36

Like, if you're only gonna run it two or three more times and only you are gonna use it, then, yeah, fix the prompt, hack away to get to the artifact that you need. I think evals only become important when you know that you're gonna keep using it for a while, or somebody else is getting onboarded00:04:59

onto it, or, like, they're gonna help you maintain this…00:05:16

workflow or whatever you've authored. And then you want to start doing error analysis, kind of like we do in the course.00:05:19

ganesh

Got it. I have a lot of agents that are recurring. For example, we have a chief-of-staff note-taker agent: it picks up the action items and then converts them into a plan, etc.00:05:28

Now, I do get a lot of errors, right? For example, sometimes people request a meeting00:05:41

in a way I haven't trained the model for.00:05:49

Now, there is always a tendency to go back and fix it. And let's say that scenario gets fixed.00:05:54

Right? But something else comes up.00:06:01

Now, you know, what is…00:06:03

Shreya Shankar

I would say if you feel like you're playing whack-a-mole and you want to stop,00:06:06

then you have to start with error analysis. Like, you have to do the evals process from week one: get some traces, do error analysis, find failure modes, and try to, like, systematically00:06:12

or at least systematize your process, is what I would say. But say it's an AI note-taker, and it makes errors, but you don't really care because you just read the notes, correct them in your head, and move on, or whatever.00:06:23

Then I think you use your own judgment on that. It's hard to say, right? Like,00:06:38

what the stakes are for the thing that you're building. And you only have finite time, so you can't be evaluating everything that you're building.00:06:43
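A minimal sketch of the "systematize your process" step, assuming you have already reviewed a batch of traces and written down the first failure you saw in each. The annotation format and failure-mode names are hypothetical; the point is that counting turns a pile of one-off prompt fixes into a ranked list of what to fix first.

```python
from collections import Counter

# Hypothetical open-coded notes: one entry per reviewed trace, tagged with
# the first failure mode observed (None means the trace looked fine).
annotations = [
    {"trace_id": "t1", "failure_mode": "wrong meeting time parsed"},
    {"trace_id": "t2", "failure_mode": None},
    {"trace_id": "t3", "failure_mode": "missed action item"},
    {"trace_id": "t4", "failure_mode": "wrong meeting time parsed"},
]

def tally_failure_modes(notes):
    """Count how often each failure mode appears across reviewed traces."""
    return Counter(n["failure_mode"] for n in notes if n["failure_mode"])

counts = tally_failure_modes(annotations)
for mode, n in counts.most_common():
    print(f"{n:>3}  {mode}")
```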

katenesmyelova

Shreya, would it be correct to sum it up like this: if you need repeatability in checking for errors, the same way you would use regression tests in regular software, then it makes sense to invest in evals,00:06:54

because they make sure that you didn't break something, or that you're actually improving something.00:07:12
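The regression-test analogy can be made concrete with a tiny pytest-style check. This is a sketch under assumptions: `score_lead` is a hypothetical stand-in for the LLM pipeline (a trivial rule here so the example runs), `EVAL_CASES` is made-up data, and the 0.9 baseline is arbitrary.

```python
# Hypothetical fixed eval set for a lead-scoring step; inputs and labels are made up.
EVAL_CASES = [
    {"lead": {"company_size": 500, "budget": "approved"}, "expected": "hot"},
    {"lead": {"company_size": 5, "budget": "none"}, "expected": "cold"},
]
BASELINE_PASS_RATE = 0.9  # hypothetical: the last pass rate you accepted

def score_lead(lead: dict) -> str:
    """Stand-in for the real LLM pipeline under test (hypothetical)."""
    return "hot" if lead["budget"] == "approved" else "cold"

def pass_rate(cases, fn) -> float:
    """Fraction of cases where the pipeline output matches the expected label."""
    passed = sum(fn(c["lead"]) == c["expected"] for c in cases)
    return passed / len(cases)

def test_lead_scoring_does_not_regress():
    # Fails the run if a prompt change drops quality below the accepted baseline.
    assert pass_rate(EVAL_CASES, score_lead) >= BASELINE_PASS_RATE
```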

Shreya Shankar

Yeah, yeah.00:07:17

katenesmyelova

Thank you.00:07:19

Shreya Shankar

So, SRP.00:07:22

Oh, you're muted.00:07:28

srp

Sorry about that. So, my question is again about the class schedule. The live coding workshop starts tonight, my time, at about 3:30 in the morning. So, are different00:07:32

problems going to be taken up in each session, or is there some continuity? I just want to make sure I'm not missing something, because time-wise I will have to miss some of the sessions.00:07:49

Shreya Shankar

Yeah, they're supposed to start with the first homework and then continue through the homeworks in order. Now, if a bunch of people show up and it turns out they're interested in jumping to another homework, or… I don't know what's going to happen. I think Isaac is going to make a judgment call00:08:00

when folks show up. But our plan is at least to do live coding for as much of the homeworks as we can without losing everybody.00:08:19

katenesmyelova

Perfect.00:08:27

srp

Okay, I think then I will have to use the Discord to communicate with Isaac and get what I need out of it. Okay, got it. Thank you.00:08:28

Shreya Shankar

And you'll be able to watch them; they will all be recorded, and they will be put on…00:08:36

srp

Understood. Yeah. Thank you.00:08:42

Shreya Shankar

Francesco.00:08:46

Francesco Lanciana

Yeah, I was…00:08:48

I remember last week I asked whether human-in-the-loop is, like, a special case,00:08:51

like, whether you would have to change how you analyze anything. And the answer was no, that's actually a super common case, which makes sense. I think in, like, LangGraph and stuff, there will usually be multiple traces if there's that back-and-forth.00:08:59

Are you, like…00:09:19

And I know we're supposed to go through a trace until we find the first thing that's wrong, because anything after that might depend on that first error, so you just want to fix that first.00:09:20

Is that still the case? It's not like you're analyzing each trace on its own? Like, if it was a multi-turn thing, are you still starting from the very start and working your way through, or are the traces a bit more independent?00:09:36

Shreya Shankar

Yeah, so it depends on the provider that you're using. I think what you're saying is that, depending on the provider, you might end up logging the entire prefix of messages after every turn, so there's a lot of duplication in the messages across traces. In those cases, I would say try to log a session ID,00:09:49

and then, for each unique session ID, take the last trace, like, the trace with the greatest timestamp.00:10:08

So all you're doing is looking at the sequence of messages until the user dropped off for that session, or that chat, or whatever.00:10:17

And then look until, like, first failure.00:10:27

And, you know, that's a heuristic that we recommend. You're gonna get very good at error analysis, and once you've developed this muscle, you'll realize it's more efficient to note down multiple different points of failure, because you just know that they're independent, isolated incidents.00:10:31
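The session-ID recipe can be sketched in a few lines of Python. The record fields (`session_id`, `timestamp`, `messages`) are assumptions about what your logging captures, and `first_failure_index` takes a hypothetical `is_failure` callable that stands in for the judgment you make while reading the trace.

```python
from datetime import datetime

# Hypothetical log records: each trace stores the full message prefix so far,
# so a later trace in the same session contains everything earlier ones did.
traces = [
    {"session_id": "s1", "timestamp": "2025-10-22T09:00:00", "messages": ["hi"]},
    {"session_id": "s1", "timestamp": "2025-10-22T09:01:00",
     "messages": ["hi", "hello", "book a demo"]},
    {"session_id": "s2", "timestamp": "2025-10-22T09:05:00", "messages": ["pricing?"]},
]

def latest_trace_per_session(records):
    """Keep only the trace with the greatest timestamp for each session ID."""
    latest = {}
    for r in records:
        ts = datetime.fromisoformat(r["timestamp"])
        sid = r["session_id"]
        if sid not in latest or ts > latest[sid][0]:
            latest[sid] = (ts, r)
    return [r for _, r in latest.values()]

def first_failure_index(messages, is_failure):
    """Walk the conversation in order and stop at the first failing message."""
    for i, m in enumerate(messages):
        if is_failure(m):
            return i
    return None

for t in latest_trace_per_session(traces):
    print(t["session_id"], t["messages"])
```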

Francesco Lanciana

And you should do that as well. Like, basically.00:10:48

Shreya Shankar

Start out with the simple recipe.00:10:51

Be rigid, build your skill, and then, like, you'll be a machine at error analysis later.00:10:53

Francesco Lanciana

Yeah, okay, that makes a lot of sense. I just had one more, a little bit more open-ended. I just started reading, I think it was Chapter 8, on…00:11:00

Shreya Shankar

Remind me, sorry, which one is this? What's the title?00:11:11

Francesco Lanciana

Like, starting… so, "Specific Architectures and Data Modalities," which I think is mainly getting into agents? I could be wrong.00:11:14

Shreya Shankar

Well, it's architectures and data modalities: we talk about long documents, we talk about images briefly, and then I do think there's a little bit on agents, but yeah.00:11:22

Francesco Lanciana

Yeah, okay, cool. With agents, is it still fundamentally the same thing, where you have a trace, and you'll have tool calls and things, but you're still doing the same exercise of classifying failure modes, and the failure mode might be, like,00:11:33

duffed out this door, or, you know.00:11:51

Shreya Shankar

Yes. Yeah, so you'll actually notice that in the week one lectures we do error analysis on an agent for this reason. At first we had a lecture where we did error analysis on a more rigid workflow, and then everybody asked, "Oh, but what about agents?" And no matter how much we tried to tell people that the process doesn't change when you get to agents, people were so lost.00:11:53

So, now we're just starting with agents.00:12:16

That way folks don't have the question "Oh, but what about agents?", because agents are not different, right? You do the same error analysis process. But sometimes I find that,00:12:18

if your agent is more agentic, like, it has more tools that it's allowed to call, and you don't have a good conceptual model for what it does at which points in the workflow… Like, in some workflows, for example text-to-SQL or00:12:30

agentic retrieval, I have a very good00:12:47

model in my head: okay, first it's gonna try to synthesize a query, run the SQL query, look at the result, re… I kind of know what's gonna happen.00:12:51

So I can reason about these steps in my head as I do error analysis and check that it matches my expectations. Now, with other agents, say I was building ChatGPT, I have absolutely no idea what tools it's gonna call or where it's gonna call them; it's the wild, wild west, the deep end of agents.00:13:01

In those settings, I find the heat map tool that we talk about in Chapter 8, which I think is what you mentioned, very useful, just to understand what the common transitions are, like, the common patterns of steps my agent is taking.00:13:21

And that's why we talk about this failure heat map as a particular tool you can use. It's not always applicable, but if you find yourself lost on how to do error analysis, or on what your agent is doing, then you should00:13:38

use it.00:13:52
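One way to build the counts behind a transition heat map, sketched under assumptions: each trace has already been reduced to an ordered list of step names, and error analysis has recorded the index of the first failed step. `transition_counts` shows the common paths through the agent; `failure_transitions` counts (last successful step, first failed step) pairs, which is one way to lay out a failure heat map. The step names are hypothetical, and the course reader's exact construction may differ.

```python
from collections import Counter

# Hypothetical traces: ordered step names plus the index of the first step a
# reviewer marked as failed during error analysis (None = no failure).
traces = [
    {"steps": ["plan", "search_docs", "run_sql", "answer"], "first_fail": 2},
    {"steps": ["plan", "run_sql", "run_sql", "answer"], "first_fail": None},
    {"steps": ["plan", "search_docs", "answer"], "first_fail": 2},
]

def transition_counts(traces):
    """Count every (from_step, to_step) pair: the common paths through the agent."""
    counts = Counter()
    for t in traces:
        counts.update(zip(t["steps"], t["steps"][1:]))
    return counts

def failure_transitions(traces):
    """Count (last_good_step, first_failed_step) pairs for traces that failed.
    A failure at step 0 has no predecessor and is skipped here."""
    counts = Counter()
    for t in traces:
        i = t["first_fail"]
        if i is not None and i > 0:
            counts[(t["steps"][i - 1], t["steps"][i])] += 1
    return counts

def print_matrix(counts):
    """Print counts as a rows=from, columns=to grid (the heat map's numbers)."""
    states = sorted({s for pair in counts for s in pair})
    print(f"{'':>12}" + "".join(f"{s[:11]:>12}" for s in states))
    for a in states:
        print(f"{a[:11]:>12}" + "".join(f"{counts[(a, b)]:>12}" for b in states))

print_matrix(failure_transitions(traces))
```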

Francesco Lanciana

Yeah, okay, that makes sense. I just have one last follow-up,00:13:54

about building an agent, which I'm doing at the moment. You can give a description to each tool and also write the system prompt. Do you usually find that you're giving a pretty lengthy…00:13:59

description for each tool that goes into a lot of detail, and then the system prompt might be a bit shorter, just saying, hey, you have these tools at a high level, that's all you need to know? Or is it…00:14:12

you know, yeah, I guess, what's your experience with that?00:14:28

Shreya Shankar

Yeah, that's actually what I try to do now. I try to make my system prompt very short.00:14:31

Oh, excuse me, sorry, I'm coming off of a cold.00:14:39

And then the tool descriptions longer, as you said. I try not to have both of them be long, because then I just increase the surface area where I'm conflicting with myself.00:14:46
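A hypothetical example of this split, written in the OpenAI-style `tools` format (a `function` object with `name`, `description`, and a JSON Schema `parameters` field). The scheduling tool and its rules are made up, and other providers use similar but not identical schemas.

```python
# Short, high-level system prompt: what the agent is and the one global rule.
SYSTEM_PROMPT = (
    "You are a scheduling assistant. Use the available tools to answer; "
    "do not guess calendar data."
)

# Detailed usage rules live in each tool's description (hypothetical tool).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "find_free_slots",
            "description": (
                "Return open meeting slots for the given attendees. "
                "Call this before proposing any time. Dates must be ISO 8601 "
                "(YYYY-MM-DD). Durations are in minutes and default to 30. "
                "If an attendee's calendar is unavailable, the tool returns an "
                "error; report that to the user instead of inventing a slot."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "attendees": {"type": "array", "items": {"type": "string"}},
                    "date": {"type": "string", "description": "YYYY-MM-DD"},
                    "duration_minutes": {"type": "integer", "default": 30},
                },
                "required": ["attendees", "date"],
            },
        },
    }
]
```

Keeping each rule in exactly one place is the point: if the system prompt also restated the date format, the two copies could drift apart and conflict.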

Francesco Lanciana

Yeah, okay, that's very helpful. Thank you.00:14:57

Shreya Shankar

Pardeep?00:15:06

Pardeep

Hey, Shreya, how are you? I think we've probably discussed this in the past,00:15:07

and it may not be a proper eval question; it could be an architectural question, so…00:15:14

I know somebody who's trying to build, like, financial agents, where they want to make sure the accuracy of responses is significantly higher, because you don't want to00:15:22

give errors when you're talking about financial things.00:15:35

Now, one way to think about this is, oh, well, do this, you know, error coding and open coding to see where the errors are.00:15:38

But I feel like it's a systems problem and a data problem to start with. For example, if you cannot write a SQL query on your data that gives the correct output, then your agent is probably not going to be able to figure it out either, right?00:15:46

So what is the advice here? There is one thread pulling towards, oh, we should implement something like GraphRAG, which is, you know…00:16:02

relationships between these multiple entities, and it will ultimately be able to figure it out. But there is another line of thought, which is, you know, mine, in a way: if you have data ingestion, you should have a very clean database and then expose it as a tool, so you now have00:16:13

a very reliable way of querying it, and you can restrict the queries you allow. See, this is, like, the traditional way of00:16:30

thinking about it, but then there's another pull towards, no, no, no, AI00:16:37

stuff. So, in general, what is the design pattern when solving these types of problems?00:16:41
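The "clean database exposed as a tool" option can be sketched with Python's built-in `sqlite3`. The schema, data, and `get_account_balance` tool are hypothetical; the design point is that the agent calls a fixed, parameterized query instead of writing free-form SQL, so the set of possible queries is restricted and testable. Amounts are stored as integer cents to avoid floating-point rounding in financial sums.

```python
import sqlite3

# Hypothetical clean data model: accounts plus their transactions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id TEXT PRIMARY KEY, owner TEXT NOT NULL);
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    account_id TEXT NOT NULL REFERENCES accounts(id),
    amount_cents INTEGER NOT NULL
);
INSERT INTO accounts VALUES ('A1', 'Acme Corp');
INSERT INTO transactions (account_id, amount_cents) VALUES ('A1', 125000), ('A1', -25000);
""")

def get_account_balance(account_id: str) -> dict:
    """Tool exposed to the agent: one fixed, parameterized query, not free-form SQL."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return {"account_id": account_id, "balance_cents": row[0]}

print(get_account_balance("A1"))
```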

Shreya Shankar

Am I understanding your question correctly in that, like, how do you express common data processing queries if not all of them can be expressed in SQL? Is that your question?00:16:51

Pardeep

Yeah, sure, fair enough.00:17:03

Shreya Shankar

I don't know, like, you tell me if I'm understanding the question incorrectly.00:17:06

Pardeep

I think the problem, ultimately, is how do I get my agent to respond correctly, right? One line of thought is: give the data to the AI agent, build a vector database or a graph vector database, and it will already have the complex relationships managed.00:17:10

We can query it, and it will give us good responses.00:17:30

The other.00:17:33

Shreya Shankar

Oh, okay.00:17:34

Pardeep

Well, you define your data models very cleanly, so you…00:17:35

Shreya Shankar

I see, like, put it in a relational database, and then, like, do text-to-SQL.00:17:38

Pardeep

into a traditional database. You're like, this is architectural…00:17:41

Shreya Shankar

I see, I see what you're saying.00:17:45

Okay, so what I've usually seen in production is that everything evolves to some hybrid architecture, where you're taking the union of results from, like, semantic search, as well as some SQL query, or a keyword search, or a BM25 baseline,00:17:47

and it just ends up getting to this point, because if your application is successful, you want to support all of these kinds of queries, and not all of them can be expressed in every single query language, and they don't all have the same data model.00:18:07
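A minimal sketch of "taking the union of results" from several retrievers, assuming each one returns a ranked list of document IDs. The round-robin interleaving is one simple choice, not necessarily what any particular production system does; reciprocal rank fusion is a common alternative when you want a combined score rather than plain interleaving.

```python
def hybrid_union(result_lists, k=10):
    """Merge ranked doc-ID lists from several retrievers (e.g. semantic, BM25, SQL),
    keeping each document once and interleaving them round-robin by rank."""
    seen, merged = set(), []
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged[:k]

semantic = ["d3", "d1", "d7"]   # hypothetical vector-search results
keyword = ["d1", "d9"]          # hypothetical BM25 results
sql = ["d4"]                    # hypothetical structured-query results
print(hybrid_union([semantic, keyword, sql]))
```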

So you just end up unifying everything. But the order that I try to follow to get to this state of success is: first try a basic keyword search baseline with an agent that just00:18:22

tries to come up with keywords to search against a BM25 or keyword retriever. Only sometimes, not in all cases, do I use a semantic similarity engine.00:18:36

I don't know, I find that people who start off with semantic similarity end up going to keyword search, and then…00:18:48

they find that it's better for some reason, and there's enough research to back that up.00:18:56
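For the keyword baseline, the agent only needs to propose search terms; ranking can be done with Okapi BM25. This is a from-scratch sketch for illustration, assuming whitespace tokenization and the commonly used `k1=1.5`, `b=0.75` settings; in practice you would likely use a search engine's or library's BM25 rather than this class.

```python
import math
from collections import Counter

class BM25:
    """Minimal Okapi BM25 over lowercased, whitespace-tokenized documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [d.lower().split() for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        """BM25 score of document i for a whitespace-separated keyword query."""
        tf, dl = self.tf[i], len(self.docs[i])
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def search(self, query, k=5):
        """Return the indices of the top-k documents for the query."""
        ranked = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return ranked[:k]

docs = [
    "quarterly revenue report for acme",
    "acme office party photos",
    "tax filing deadline reminder",
]
bm25 = BM25(docs)
print(bm25.search("acme revenue", k=2))  # keywords an agent might propose
```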

For GraphRAG, I would say a lot of people use GraphRAG when they don't need to. Hamel has a great talk on, like, why you shouldn't use GraphRAG. I don't remember what the name of that talk is. Oh, Hamel's here.00:19:01

Hey, Hamel.00:19:16

Hamel Husain

Put it in the chat, sorry for being…00:19:17

Shreya Shankar

You know, you know what talk I'm talking about?00:19:19

Hamel Husain

Yeah, yeah, I'll get it.00:19:21

Shreya Shankar

Yeah, like, "You don't need GraphRAG." I'm probably grossly misremembering the title.00:19:22

Hamel Husain

It's like "You don't need a graph database." I'll get it.00:19:28

Shreya Shankar

Yeah, totally.00:19:31

katenesmyelova

That would be lovely.00:19:32

Pardeep

Yeah, I mean, as soon as I hear something like this, you know, it's super complex in my mind, which is like.00:19:34

Shreya Shankar

Yeah, just don't go there unless you're really convinced: you've tried other things, they just don't work for you, and you have a good reason for doing it. But I'd say start simple, build up your complexity, and don't be afraid of just having all of the…00:19:38

retrievals there, because that's what people end up getting to.00:19:55

katenesmyelova

Yeah, because unfortunately, we are all in a very fast-moving community with technologies evolving, and what's working right now might not work tomorrow, and there's over-engineering, just like…00:20:00

It costs too much.00:20:13

Hamel Husain

You should always be skeptical of complexity, in general.00:20:18

Pardeep

Yeah, yeah, yeah.00:20:21

Anything that, you know, I cannot understand very, very clearly, I just leave it right there.00:20:23

The bar is very, very low.00:20:30

Cool. Thanks, folks.00:20:33

Vikas Pratap Singh

Hey, hello. So, my question is on guardrails.00:20:44

In the use case I'm working on, I'm just trying to…00:20:49

I will use the word evaluate: evaluate the guardrails,00:20:55

and one of the things I really want to understand is how much latency and overhead00:21:00

these guardrails are adding, right? In my use case, there are some rules like schema enforcement and PII detection that I have to set, and then there are some subtotals that have to tally, and also the taxes cannot be…00:21:06

there are certain types of taxes which should not be missing, right? So those are some rules that I'm adding. So…00:21:27

my question is: is there any specific strategy I should follow to evaluate the guardrails in this case? Because00:21:34

guardrails will run at inference time, right? Any time I'm interacting with the model. And another thing I observe is that when I change models, like, go from, let's say, a 7B to a 32B,00:21:46

some of the guardrails are handled much better by the 32B than by the 7B. So are there any techniques you've come across that we can apply to the 7B to make sure the response is still consistent?00:22:02

Shreya Shankar

Yeah, so, first question: are you using some black-box API for your guardrail, or are you actually able to serve the model and control the inference engine there? Because I've found that with Azure guardrails,00:22:18

it's really hard to measure or model the latency. Like, the tail latencies for these guardrails are just so high, and I've given up any hope of understanding the latencies of these00:22:32

APIs that I can't really control. But if you're serving the model,00:22:45

or you have control over this, then I recommend trying to fine-tune your own guardrails, because then you can make them smaller. I've gone down to 3B pretty successfully, and, like, you can…00:22:50

So you can try to generate a bunch of training data00:23:03

using the guardrail API. Maybe I shouldn't say that; I don't know if it's legal or not. And then train your own.00:23:07

But yeah, I don't know what your setup is there.00:23:15
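Before comparing guardrail options, it helps to measure latency the way Shreya frames it: look at the tail (p95/p99), not just the average. A minimal standard-library sketch; `generate` and `guard` are hypothetical callables standing in for your model call and your guardrail check. Measuring against a warmed-up server can avoid counting one-off startup costs.

```python
import statistics
import time

def measure_latency(fn, inputs):
    """Call fn on each input; return p50/p95/p99 wall-clock latency in ms.
    statistics.quantiles needs at least two samples on older Pythons."""
    samples = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        samples.append((time.perf_counter() - start) * 1000)
    qs = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98], "n": len(samples)}

def guardrail_overhead(generate, guard, inputs):
    """Latency added by running guard on generate's output, per percentile."""
    base = measure_latency(generate, inputs)
    guarded = measure_latency(lambda x: guard(generate(x)), inputs)
    return {k: guarded[k] - base[k] for k in ("p50", "p95", "p99")}
```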

Vikas Pratap Singh

The setup is, I have two models hosted. One is a 7B vision model, and the second one is a 32B vision model. Both are from the Qwen family, and00:23:18

I host them in my AWS account, so…00:23:30

katenesmyelova

Are you using AWS guardrails, or are you using your own model as a guardrail?00:23:36

Vikas Pratap Singh

I'm using my own. Right now I have a two-pass strategy, where00:23:42

I first ask the model to give the response, and then in the second pass I apply all the rules, and if a rule fails, I don't accept the output.00:23:48

I just notify the user that PII data was detected, or, let's say,00:24:01

if the user is giving me00:24:09

an invoice where the taxes were missing, right? Those are strict rules: if they fail, the output should not be sent back to the user. I'm doing it as a two-pass strategy because if I add these as rules in my initial prompt, the inference time goes00:24:11

from, like, 40 seconds straight to more than 2 minutes.00:24:29

Shreya Shankar

Oh, that's… Weird.00:24:34

katenesmyelova

That's weird.00:24:37

Vikas Pratap Singh

And honestly, I'm not able to find the…00:24:39

root cause behind this inference issue, like, why, if I create…00:24:42

a prompt with all these rules,00:24:48

the model somehow takes a lot of time. Both the 7B…00:24:50

Shreya Shankar

Are you serving?00:24:54

Vikas Pratap Singh

do you believe.00:24:55

Shreya Shankar

Sorry, maybe I'm misunderstanding. So, the guardrails are separate from the model, right? So, like, you have a 32B guardrail, as well as…00:24:56

like…00:25:05

a 32B model that you're serving, or you're using, like, an API for the main model, and then the 32B model is for the guardrail?00:25:06

Vikas Pratap Singh

No, no, no, both are through an API, like a wrapper API. I'm using FastAPI to…00:25:14

So, the /chat/completions endpoint provided by vLLM serve, I'm wrapping it in a FastAPI endpoint and then calling that endpoint for both:00:25:23

getting the first response, which is extracting00:25:38

the JSON from the invoice, and, in the second pass, enforcing these rules on the response I received from the first pass. That's how I'm doing it right now.00:25:42
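The invoice rules listed earlier (required fields, subtotals that tally, tax types that must not be missing) are deterministic, so a second pass can check them in plain code instead of a second model call. A sketch with hypothetical field names (`line_items`, `subtotal`, `taxes`, `total`) and a hypothetical `REQUIRED_TAXES` set; amounts are parsed with `Decimal` so that sums like 10.10 + 20.20 compare exactly. PII detection is not covered here and would still need a classifier or a pattern-based detector.

```python
from decimal import Decimal

REQUIRED_TAXES = {"VAT"}  # hypothetical: tax types that must be present

def check_invoice(inv: dict) -> list[str]:
    """Deterministic checks on extracted invoice JSON.
    Returns the names of failed rules (an empty list means pass)."""
    failures = []
    # Schema: required top-level fields must exist before anything else is checked.
    for field in ("line_items", "subtotal", "taxes", "total"):
        if field not in inv:
            failures.append(f"missing_field:{field}")
    if failures:
        return failures
    # Subtotal must equal the sum of the line items.
    items_sum = sum(Decimal(str(li["amount"])) for li in inv["line_items"])
    subtotal = Decimal(str(inv["subtotal"]))
    if items_sum != subtotal:
        failures.append("subtotal_mismatch")
    # Required tax types must not be missing.
    present = {t["type"] for t in inv["taxes"]}
    for tax in sorted(REQUIRED_TAXES - present):
        failures.append(f"missing_tax:{tax}")
    # Total must equal subtotal plus all taxes.
    tax_sum = sum(Decimal(str(t["amount"])) for t in inv["taxes"])
    if subtotal + tax_sum != Decimal(str(inv["total"])):
        failures.append("total_mismatch")
    return failures
```

Because each rule returns its own name, a failed check tells you exactly which field to re-ask about or which message to show the user.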

Shreya Shankar

Oh, so it's the same model that you're just, like, reusing as a guardrail.00:25:52

Vikas Pratap Singh

Yes, yes.00:25:56

Shreya Shankar

I see, I see.00:25:57

Yeah, okay,00:25:59

So, something might be up with your vLLM setup. There's a lot of tuning you need to do to optimize vLLM.00:26:02

It's known for being able to execute things in…00:26:09

batch, so I would just check into that and make sure you're doing things correctly there. I'm actually not surprised that if you're, like…00:26:14

Vikas Pratap Singh

Doubling the size of your prompt.00:26:21

Shreya, I'm somehow feeling that the bottleneck is the instance I'm using. I'm using a g5.xlarge,00:26:24

which has a cap of 24 GB, so with the 7B model, I think with all the KV cache, there is hardly any…00:26:32

Shreya Shankar

Yeah, that makes sense.00:26:43

Vikas Pratap Singh

Yeah, the GPU is at almost 98-99%, so batching is also not working as expected. I've requested another GPU instance, which will have00:26:44

more GPU memory, to see if the GPU really is the problem, but I'm still waiting for some approvals to go through, so maybe I will test again:00:26:55

deploy the model on that instance and see if I still have these problems.00:27:03

Shreya Shankar

Yeah, I would say, if you… oh.00:27:08

katenesmyelova

Sorry, no, you go.00:27:11

Shreya Shankar

No, like, if you already know your GPU utilization is high, it sounds like your GPU is not big enough for the 32B model. Just create two instances of the 7B, or try to use a smaller model. I think you probably already know how to debug this, but it sounds like you're in a tough spot,00:27:13

not having enough memory for what you need.00:27:32
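Rough arithmetic for why memory runs out, as a sketch under stated assumptions. The per-token KV cache is 2 (one K and one V tensor) times layers times KV heads times head dimension times bytes per element. The config below assumes a Qwen2.5-7B-style decoder (28 layers, 4 KV heads, head dimension 128) in bf16 and roughly 8.3B total parameters for the vision-language checkpoint; check the model's `config.json` for the real values. `gpu_memory_utilization=0.9` mirrors the fraction of GPU memory vLLM reserves by default. The estimate ignores activations, CUDA graphs, and vision-encoder buffers, so real headroom is smaller.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """K and V tensors for every layer: 2 * layers * kv_heads * head_dim * bytes."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def max_cached_tokens(gpu_gb, weight_params_billion, kv_bytes_per_token,
                      dtype_bytes=2, gpu_memory_utilization=0.9):
    """Rough upper bound on tokens that fit in the KV cache after loading weights."""
    budget = gpu_gb * 1e9 * gpu_memory_utilization
    weights = weight_params_billion * 1e9 * dtype_bytes
    return max(0, int((budget - weights) // kv_bytes_per_token))

# Assumed 7B-class config (check config.json): 28 layers, 4 KV heads, head_dim 128.
per_tok = kv_cache_bytes_per_token(28, 4, 128)
print(per_tok)  # 57344 bytes (about 56 KiB) per cached token
print(max_cached_tokens(24, 8.3, per_tok), "tokens of KV cache on a 24 GB GPU")
print(f"{32e9 * 2 / 1e9:.0f} GB of bf16 weights for a 32B model vs. 24 GB on the GPU")
```

That leaves room for only a handful of long, rule-heavy prompts at once, which is consistent with batching degrading as prompts grow.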

Hamel Husain

And for guardrails, don't shy away from classic machine learning,00:27:36

because a lot of times, guardrails…00:27:42

you know, you can have a linear model.00:27:46

Yeah, I know it's, like, really…00:27:49

Like, have, like, whatever, a…00:27:51

you know, ElasticNet or whatever in scikit-learn.00:27:54

But it'll work really well, and it'll be super fast.00:27:57

katenesmyelova

Right.00:28:02

Shreya Shankar

And I'd really say train your own guardrails. Have small guardrails; don't try to use 32B models for guardrails, because that's gonna waste all your resources, unfortunately.00:28:02

Vikas Pratap Singh

Okay, I think, yeah, that is something I can.00:28:13

katenesmyelova

Have you tried Bedrock Guardrails?00:28:16

Vikas Pratap Singh

No, I haven't.00:28:19

katenesmyelova

They are pretty good. Right now we are experimenting; of course, our staff is pretty small, and we are just starting. I'm looking at the latency, and it's, like, 300 milliseconds00:28:21

of guardrail processing latency, and I have, like, around…00:28:38

well, a bunch of stuff set up, and it's pretty fast, so…00:28:43

maybe just divorcing the guardrails from your main LLM would help.00:28:48

Vikas Pratap Singh

Sure, I will check that as well. I…00:28:54

I know there were some reasons for not using some of the00:28:56

AWS stuff, because we were initially using Nova Pro as well, but then it was decided not to use it because of some cost constraints. But I will definitely check it, and if latency is the main concern, then obviously we have to pay the cost.00:29:00

katenesmyelova

Again, I'm experimenting with Bedrock Guardrails. So far it's okay; we are able to modify them via Terraform, so all the changes are documented and go through a PR, so it's…00:29:17

a no-brainer.00:29:30

Vikas Pratap Singh

And Hamel, just a quick follow-up. If I understood what you mentioned, you are saying that the output00:29:33

from the first call should not be passed to the model again, and that I should instead write something using scikit-learn that can still give me the same capability? Is that what you were trying to say?00:29:39

Hamel Husain

Yeah, what I'm trying to say is, look, a guardrail is a classifier that says: is it pass or fail?00:29:55

And usually, the failure you're trying to detect is, a lot of times, fairly narrow.00:30:01

And sometimes, very simple models are really good at that.00:30:09

You know, and if classic ML doesn't work, ModernBERT is really good, or BERT variants, or modern versions of BERT. There's even a model called ModernBERT,00:30:14

which a bunch of my friends worked on,00:30:25

which is very small.00:30:27

And… but it's very performant. It's like the workhorse, I would say, of a lot of ML.00:30:31

And you can fine-tune that,00:30:36

you know, on your classification task.00:30:39

And it'll run very fast.00:30:43

Vikas Pratap Singh

Garden.00:30:47

Hamel Husain

So, you know, it's, like, more… but you might not even need ModernBERT; depending on what the task is, you might be able to get away with just, like,00:30:48

a linear classifier, or something really simple.00:30:58
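A minimal sketch of the classic-ML route using scikit-learn: a TF-IDF featurizer plus a linear classifier, trained for one narrow pass/fail check. The six examples and labels are made up and far too few for real use; you would train on labeled outputs from your own error analysis. Character n-grams are an assumption that may suit pattern-like content such as IDs and emails, and `LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=...)` is one way to get the ElasticNet-style regularization mentioned above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples for one narrow guardrail (1 = block, 0 = allow).
texts = [
    "Customer SSN is 123-45-6789",
    "Card number 4111 1111 1111 1111 on file",
    "Contact me at jane.doe@example.com",
    "Invoice total is 450 USD",
    "Shipment arrives on Tuesday",
    "Line item: 3 widgets at 12 each",
]
labels = [1, 1, 1, 0, 0, 0]

guard = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
guard.fit(texts, labels)

def passes_guardrail(text: str, threshold: float = 0.5) -> bool:
    """True if the classifier's probability of 'block' is below the threshold."""
    return bool(guard.predict_proba([text])[0][1] < threshold)

print(passes_guardrail("Please ship 5 widgets by Friday"))
```

Inference is a sparse dot product, so each check typically takes well under a millisecond on a CPU, and one small model per narrow rule keeps each check independently tunable.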

Vikas Pratap Singh

Yeah. The only reason I thought of00:31:04

using the same model as a guardrail as well is because there are…00:31:07

like, many rules, so many rules that I need to, kind of,00:31:12

News, so… so that was, like, my rationale behind00:31:19

using the model itself to enforce the guardrails as well.00:31:23

But I can definitely look into the possibility of the linear classifier and see if that helps.00:31:29

Hamel Husain

So you can split it up: you can have these different guardrails00:31:34

be totally separate, and you can run them in parallel, you know? So they can all run concurrently.00:31:38

And then you can see which guardrails fire,00:31:44

which could be useful to you, by the way: if you need to do retries, you know exactly what failed.00:31:49
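Splitting guardrails into independent checks and running them concurrently can be sketched with `asyncio.gather`. The three checks here are hypothetical and trivially fast; concurrency only pays off when each guardrail awaits something slow, such as a model endpoint. Returning the names of the guardrails that fired gives you exactly what to retry.

```python
import asyncio
import re

# Hypothetical independent guardrails; each returns True if the output passes.
async def no_email(output: str) -> bool:
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", output) is None

async def not_empty(output: str) -> bool:
    return bool(output.strip())

async def under_length(output: str) -> bool:
    return len(output) <= 2000

GUARDRAILS = {"no_email": no_email, "not_empty": not_empty, "under_length": under_length}

async def run_guardrails(output: str) -> list[str]:
    """Run all guardrails concurrently and return the names of the ones that fired."""
    results = await asyncio.gather(*(check(output) for check in GUARDRAILS.values()))
    return [name for name, ok in zip(GUARDRAILS, results) if not ok]

print(asyncio.run(run_guardrails("Reach me at a@b.com")))  # ['no_email']
```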

Vikas Pratap Singh

Yeah, that's another piece of it. In some cases, when there are some, like, pipers, and we know that the model is,00:31:56

especially in, like, those 64 characters, getting confused between00:32:05

a B and an H, for that particular field we are00:32:12

asking the model again to go back and check, and in the second pass, it gives us the correct value. I don't know why the model does this; in the first pass, it doesn't give us the correct value, even though we have00:32:17

put it in the prompt too, but in the second pass, when we ask for just that field, it goes and gives us the correct value. This is the behavior of the Qwen 32B vision model that I'm using.00:32:27
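The targeted second pass described here (re-asking only for the field that came back wrong) can be sketched as below. `ask_model` is a hypothetical callable wrapping the vision-model endpoint (in practice it would also send the image), and the prompt wording is an assumption.

```python
from typing import Callable

def reask_failed_fields(extracted: dict, failed_fields: list[str],
                        ask_model: Callable[[str], str]) -> dict:
    """Re-query the model only for fields a guardrail flagged, one field at a time.
    Returns a new dict; the original extraction is left unchanged."""
    fixed = dict(extracted)
    for field in failed_fields:
        prompt = (
            f"Look at the invoice image again and return only the value of "
            f"'{field}'. Check visually similar characters carefully."
        )
        fixed[field] = ask_model(prompt).strip()
    return fixed

# Usage with a stub standing in for the real endpoint (hypothetical).
first_pass = {"invoice_id": "INV-4Z", "total": "450.00"}
print(reask_failed_fields(first_pass, ["invoice_id"], lambda prompt: " INV-42 \n"))
```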

Yeah, I will look into the options that you have shared. Thank you, Hamel, for answering the question.00:32:43

padma chandra

Well, I can go next, since everybody's silent.00:33:00

Shreya Shankar

Yeah, sorry, go for it.00:33:03

padma chandra

Yeah, no worries. I just wanted to say thank you. I went through your RAG evaluation metrics, and you used traditional metrics such as precision and recall as00:33:05

a benchmark. And I've been approaching my team members about using traditional metrics, but a few nerdy AI engineers have been throwing around terms like BLEU and ROUGE, saying we should go with these00:33:21

instead of precision and recall, and I'm not too familiar with these two metrics. I've heard about them, but I was just wondering under what situations or circumstances00:33:37

BLEU and ROUGE would make sense and be more useful than precision and recall?00:33:50

I mean, I…00:33:56

Shreya Shankar

They're very good for benchmarks. They're good for, like, if you work at OpenAI, I mean, I don't know where you work, so I'm not gonna make assumptions, but if there's already a well-established benchmark out there for a particular task, like translation or summarization, and you have00:33:57

reference summaries and, like, this is the only good summary there is, then you want to use BLEU and ROUGE scores, because that is what ML researchers used to, like,00:34:15

improve on these, like, existing benchmarks that are out there in the academic literature. I find, like, very little to no mapping between these metrics and, like, actually, if you're building an application, and trying to, like, embed that in a product and have some sort of business value. Maybe that's just my, like, very hot take.00:34:27

But I will say, when writing the course reader, I had these metrics, because I wanted to, like, even, like, just mention the difference, and then Hamel said, take it out, it's just gonna confuse everybody. Everybody focuses on the wrong thing, so maybe I'll give it to him to…00:34:46

padma chandra

I mean, I agree completely. It just complicates stuff, and when you can get things done with precision and recall, you don't want to go with them.00:35:04

Shreya Shankar

Yeah, that's not a business… there's no business mapping in my mind, but…00:35:11

Hamel Husain

Yeah, to answer your question about BLEU and ROUGE, like, when is it… It's almost never…00:35:16

the right… it's never the right metric, like, unless… like, it's… this… it would be very strange if it was the right metric for you. You would have to be training models, like, foundation models, and no one in this course is doing that.00:35:24

So… Maybe somebody is, I don't want to, like… Yeah, maybe somebody is.00:35:39

But I don't think so. That's like an edge case, it's like a really rare edge case that, like, whatever. You know, if you're building a product, you certainly don't need that.00:35:44

Unless you're really sure that, like, string similarity is a good…00:35:55

No, it's not. But even then, like, if someone says, like, ROUGE,00:36:02

It just triggers me. Because,00:36:07

So there are, like, some… you know, there are some RAG…00:36:12

off-the-shelf RAG evaluation frameworks. And if you open these off-the-shelf RAG evaluation frameworks,00:36:16

They have, like, this platter of metrics they'll throw at you: ROUGE,00:36:25

BLEU, whatever.00:36:32

And it just doesn't make any sense at all, and it just… it's just like, oh, look at my metrics, like, I have my metrics, show and tell.00:36:34

But it doesn't do anything.00:36:43

So, if somebody is approaching you and saying, we want to use BLEU and ROUGE,00:36:45

you know, I would say there probably aren't…00:36:52

they may not understand what the metrics mean at all. They probably don't, because when you look at what they measure,00:36:55

it doesn't really… I would be hard-pressed to even imagine a situation00:37:03

from a product perspective where it would make sense to use them. And it's actually a negative sign. Like, if that comes up, someone says, okay, BLEU and ROUGE, it means, like, oh, we have a lot of work to do to understand retrieval here.00:37:11

Like, let's go back to the basics. So I would, like, kind of say, okay, be very confident: if someone's coming to you with BLEU and ROUGE, you need to say, oh my goodness, we need to talk about retrieval. Let's have a lesson on retrieval. What does it mean? What are we trying to do? Stuff like that.00:37:25

padma chandra

Yeah, agreed.00:37:44

Shreya Shankar

Don't be afraid to tell them that, like, hey, these are not correlated with the product metrics; can you measure product… like, who cares, if they want to Slack each other the BLEU score, that's fine, I guess, but…00:37:45

when they, like, report it to the broader team… Yeah.00:37:57

padma chandra

I mean, I completely agree. No one in leadership recognizes these metrics. They're more familiar with precision, which is more explainable. I mean, that's what we want eventually. So, one more question: you mentioned a little bit about tool calling.00:38:01

And, I wanted to…00:38:15

ask you… I mean, I know you touched a little bit on the evaluation side. Apart from that, do you have any links, or could you share any additional information that could…00:38:18

help us with deeper evaluation, especially related to tool calling?00:38:32

That would be helpful.00:38:39

Shreya Shankar

I don't know why I'm blanking.00:38:45

padma chandra

It's, it's like one of your complex RAG applications.00:38:49

Shreya Shankar

Yeah.00:38:52

padma chandra

Yeah.00:38:53

Shreya Shankar

Yeah, I don't know…00:38:54

Other articles that immediately come to mind? Well, I know Hamel has, like, a RAG course reader…00:38:56

not a reader, like a series of lecture notes on RAG that could be useful.00:39:03

Let me look for that.00:39:08

Hamel Husain

So when you say tool calling, are we talking about retrieval, or are we talking about just general…00:39:09

Shreya Shankar

Two of them, gentlemen.00:39:14

padma chandra

Yeah, you use tool calling either to access, let's say, data, or query vector DB, or… I know I shouldn't be mentioning GraphDB here, but GraphDB as well. So, and use that data to, you know,00:39:15

feed the data to the LLM to answer questions.00:39:33

Shreya Shankar

Check out the link that I sent for Hamel's RAG handbook thing.00:39:43

So just, like, another reading resource.00:39:48

padma chandra

Okay, thank you.00:39:54

Hamel Husain

Yeah, I'm not sure that's different, really, I mean, because, like, the tool call, like, you know, if you have a tool call that is, like, retrieving data, then it's a retrieval step.00:39:57

And that's…00:40:07

like, precision, recall, all that stuff is the same thing. Like, you have to debug that, like, is the tool, what is it retrieving? Okay, like, how good is the retrieval? And that analysis is the same, regardless of, like…00:40:10

You know, if it's wrapped in a tool.00:40:23

Everything is a tool, right? Like, retrieval… the retrieval itself is always a tool.00:40:26

whether you call it a tool or not, some code is being invoked to do the retrieval. Whether the LLM is doing it, or you are doing it programmatically, or whatever, it doesn't matter; it's the same thing. So I would say… I would almost say it's not different.00:40:31

The thing, the resource that… Shreya showed…00:40:48

It'll talk a little bit about…00:40:54

like, different… it has a little bit about evals of retrieval, some additional things to think about, but it's mostly, like,00:40:57

more advanced RAG… more advanced retrieval.00:41:07

Like, how to think about it beyond just vector search.00:41:12

Like, a naive vector search?00:41:16

I mean, we can get into what that thing is, but… yeah, I don't think… I don't know if that thing is…00:41:21

Gonna answer your question, like, directly, which is, hey, like.00:41:27

Just focus on what you know, it's the same thing.00:41:32
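
Hamel's point that retrieval evaluation is the same whether or not the search is wrapped in a tool can be made concrete with a small sketch. It assumes you have a labeled set mapping each query to the IDs of the documents that should come back; `search` is a hypothetical callable standing in for the retriever, whether the LLM invokes it as a tool or your code calls it directly.

```python
from typing import Callable


def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision@k and recall@k for one query, given document IDs in ranked order."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate_retriever(
    search: Callable[[str], list[str]],
    labeled: dict[str, set[str]],
    k: int = 5,
) -> dict[str, float]:
    """Average precision@k and recall@k over a labeled set of queries."""
    scores = [precision_recall_at_k(search(q), rel, k) for q, rel in labeled.items()]
    n = len(scores)
    return {
        "precision@k": sum(p for p, _ in scores) / n,
        "recall@k": sum(r for _, r in scores) / n,
    }


# Hypothetical canned results standing in for a real retriever or tool call.
fake_index = {"q1": ["a", "b"], "q2": ["c", "x"]}
print(evaluate_retriever(lambda q: fake_index[q], {"q1": {"a"}, "q2": {"c", "d"}}, k=2))
# {'precision@k': 0.5, 'recall@k': 0.75}
```

Nothing in the metric code knows or cares whether `search` is exposed to the model as a tool, which is the sense in which the analysis does not change.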

padma chandra

Got it. Thank you.00:41:35

Hamel Husain

Shree.00:41:41

shree

Hi. This one is about ROUGE. So we have an endpoint where the task is to summarize into different formats. So, given text in a paragraph, summarize it into bullet points or key-value pairs.00:41:43

So, I was trying to evaluate different models, how they perform for this particular task. And I asked LLM to give me some metrics to see how good they are.00:42:01

So one of the metrics ChatGPT suggested was ROUGE, specifically ROUGE-L, to identify the amount of overlap between the words. It's a very crude way to00:42:15

say how good of a summary that is, but it was a metric, nevertheless. And then there was cosine similarity, so even though the words did not overlap that much, the summary had the same semantic meaning.00:42:30

So that was a little bit better than the ROUGE-L metric. So, in this context, would using a ROUGE-L metric be bad?00:42:50

Shreya Shankar

My question for you is, like, is there a single ground truth summary? So, like, the reference… you cannot compute the metric without the reference summary. So, is that reference summary that you have in your dataset the only reference summary? Like, there aren't other summaries that could be equally good.00:43:02

shree

I mean, I don't know if I have a ground truth summary. I just have… one of the inputs is the summary, and the output is the summary in key-value format or bullet-point format.00:43:21

Shreya Shankar

okay, so it's just, like, bulletizing the summary.00:43:36

shree

Yeah.00:43:43

Shreya Shankar

That's the task.00:43:43

I see.00:43:44

I think it's hard to tell, because there are two things. One is if you just do keyword ROUGE, right, it…00:43:48

Maybe that's good enough, because you're checking that your…00:43:56

output has all the keywords that are in the input. Like, it's not actually…00:44:00

using the LLM for any skill beyond creating the bullets, right? It's, like, not synthesizing new knowledge or, like, trying to have understanding of the original knowledge.00:44:05

So maybe keyword ROUGE is fine to use. I would hesitate to use, like, a semantic similarity thing here, because that requires the embedding to capture all the meaning00:44:15

in the original summary, and you have no idea whether it does. The embedding might, you know, not capture one of the concepts mentioned in there, because embeddings cannot capture every single idea.00:44:27

So, yeah, but, like, you don't need to do ROUGE for that, right? You can think of it as just, like, keyword recall: make sure that you've recalled, like, the most important keywords, or the longest keywords, or whatever, from your original text in your output.00:44:39
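
A minimal sketch of the keyword recall Shreya describes, under simple assumptions: keywords are lowercase alphanumeric tokens of at least four characters that are not in a small, hypothetical stopword list, a crude stand-in for "the most important or longest keywords." Returning the missing keywords alongside the score makes failures easy to inspect.

```python
import re

# Assumed, deliberately tiny stopword list; tune it for your own domain.
STOPWORDS = {"that", "this", "with", "from", "have", "were", "will"}


def keywords(text: str, min_len: int = 4) -> set[str]:
    """Lowercased alphanumeric tokens of at least min_len characters, minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if len(t) >= min_len and t not in STOPWORDS}


def keyword_recall(source: str, output: str) -> tuple[float, set[str]]:
    """Fraction of the source's keywords that also appear in the output, plus the missing ones."""
    expected = keywords(source)
    if not expected:
        return 1.0, set()
    missing = expected - keywords(output)
    return 1 - len(missing) / len(expected), missing


source = "Quarterly revenue increased; customer churn decreased in Europe."
bullets = "- Revenue: increased\n- Churn: decreased"
score, missing = keyword_recall(source, bullets)
print(round(score, 2), sorted(missing))  # 0.57 ['customer', 'europe', 'quarterly']
```

Unlike a cosine similarity score, a low value here points directly at which facts from the input were dropped from the bullets.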

shree

So, in this case, what I did was compare the OpenAI APIs and the Gemini APIs. ROUGE was one of the metrics: okay, this one had the most word overlap, but this one also had a higher cosine similarity score, so even though there was a small drop in the ROUGE-L value,00:44:54

the semantic value was higher for this particular model, so we can00:45:17

technically use this model for production and so on. Would that be a wrong conclusion?00:45:22

Shreya Shankar

I would not trust semantic similarity here, because semantic similarity between two pieces of text is just, like,00:45:26

very arbitrary. It's like, are they somewhat close00:45:35

in the universe of meaning, right? That doesn't tell you whether this output captures all the bullets that are in the input. It has no product cor… it has no metric correlation whatsoever. ROUGE is the right metric. I don't want to say ROUGE. Keyword recall is the right metric,00:45:40

to make sure that your output captures the keywords.00:46:00

shree

I see.00:46:03

Hamel Husain

I would want to ask, like, can you have something better, like an LLM judge that you train and refine,00:46:04

to say, like, is the summary what you want?00:46:14

Because, like, these metrics, like cosine similarity, rouge, they're very coarse-grained.00:46:16

And just to give you a little more intuition, they're so coarse, like…00:46:22

They were used… this has been used in machine learning research for a really long time. Really…00:46:26

far before, like, modern LLMs, when, like,00:46:32

It was a miracle to even get00:46:38

A model to create a sentence.00:46:42

In that regime where it was a miracle for a model to create a sentence, we thought that was cool. Like, it's producing a coherent sentence. We thought that was magic. Okay, in that regime, ROUGE was okay, because the bar was so low, like, all the way down here, right?00:46:45

And, like, cosine similarity was, like, okay, because it's like, oh, it's like…00:47:01

I'll take it. The meaning is… whatever. But now, it's like…00:47:06

We're not… we're way past that.00:47:10

And, you know, my intuition is that this is not gonna measure anything close to what you want. So we need to be really critical of it. Like, if your goal is to produce good summaries,00:47:12

then try to use an LLM judge, because, like, "good summaries" has enough00:47:26

fuzziness in it. Like, I don't even know what we mean by that. You don't know what you mean by that. No one knows. We have to refine, iterate, whatever, and that's a good LLM judge thing, maybe, potentially.00:47:32

shree

Oh, cool, this was very useful, thank you both.00:47:42

Francesco Lanciana

I just have kind of a random question on RAG, because I'm just getting into it. I was just looking at… so essentially, I have calendar events, and those can have descriptions or notes attached to them, so you have, like, structured and unstructured data. And I was like, do you…00:47:47

like, are you supposed to vectorize all of that and, like, use RAG on… like, I might be butchering this, because I just don't understand it.00:48:03

like, and use RAG on that, or are you still supposed to kind of, like, operate on the…00:48:12

structured data, like, by, like, making queries against the database, and then, I don't know, just deal with the unstructured data by vectorizing it.00:48:17

Hamel Husain

So you have this, like, sea of data, and the idea with data is, like, you want multiple loci, as many loci as you can00:48:26

conjure. Or, like…00:48:38

Not as many as you can conjure, but as many as you think make sense. Like, there are different ways of representing data.00:48:40

You know,00:48:47

And, like, you don't necessarily just want one embedding of that data. There are many different embeddings, and that's what… you might be interested in Shreya's link that she posted in the chat. It will take you through that journey of what I'm talking about.00:48:48

But essentially, like, yeah, there are different representations that you can think of00:49:05

for, like, a document that can focus on different…00:49:11

types of meaning. It's hard to, like, encapsulate meaning in just one vector, right? So, you can have…00:49:14

Many different representations, they can be vectors, and then one way to…00:49:21

Let's say, quote, one locus into a document is, like, keyword search.00:49:26

Or, another one is, like, if it's structured data, is it running a SQL query?00:49:32

And it's like, you want to think about, okay, your data, like, what are all the ways you might want to find that data?00:49:36

Okay, it could be semantic search, it could be keyword search, it could be…00:49:43

Whatever, could be very… it could be whatever you are doing, like, as a human being, like, how would you find it? You want to give the LLM00:49:46

all of that, the same facilities that you have to find that data, plus more. Plus, like…00:49:56

The semantic stuff, or whatever. And you want to stitch it together.00:50:02

So that if you're asking your LLM a question, it has all the tools00:50:07

to, like, decide, okay, I'm gonna try, like, these different ways to, like, search for this data.00:50:14

Francesco Lanciana

So you can kind of leave it up to it to be like, yeah, you've given it the primitives to be like, well, yeah, I might want to do keyword search on this, because I think it'll be enough, or I might want to…00:50:21

just look, yeah, like, just look at the event name, or I might want to blah, or I might actually need to look in, like, this feels like a higher-level meaning thing, where I might need to look at, like, the notes across a bunch of them.00:50:30

Hamel Husain

Yeah, I mean, it's not so much like giving it the steering wheel as it sounds; you will have to tune things, you know? There are a lot of knobs that you can…00:50:42

affect here, like…00:50:52

You know, so, when it comes to, you know, semantic search, there's a lot of different knobs that you can tune.00:50:54

Of… and then, like, how do you coalesce these results? How do you rank them? How do you figure out which ones are relevant? Which ones are not relevant? How do you filter it correctly? How do you prompt your agent to, like,00:51:02

think about what it should do, how can you be smart about it? There's a lot that goes into it.00:51:18

So I don't want to make it sound like magic.00:51:25
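
One common way to coalesce ranked results from several retrievers (say, keyword search and embedding search) is reciprocal rank fusion. It was not named in the session; it is shown here only as an example of the kind of knob Hamel mentions. The constant `k = 60` is a conventional default rather than a tuned value, and the event IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    so items ranked highly by multiple retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["evt_12", "evt_40", "evt_7"]   # hypothetical keyword-search results
semantic_hits = ["evt_40", "evt_3", "evt_12"]  # hypothetical embedding-search results
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['evt_40', 'evt_12', 'evt_3', 'evt_7']
```

Fusion by rank avoids having to calibrate keyword scores against cosine similarities, which live on different scales; whether it beats a learned reranker for a given dataset is something you would still need to measure.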

Shreya Shankar

Yeah, it's very query-dependent. I would say sometimes people's data queries are point queries. So, like, when they're looking in their sea of data, there's, like, one or two hits for queries.00:51:28

So for something like that, you might use, like, keyword retrieval or semantic similarity retrieval. Sometimes people want aggregations, like, they want the sum of the sales, or they want to group by all of their different store locations and then figure out which products are not selling as much. This is a smell of, like, we should use a SQL query, like, we should store things in a table, and then00:51:39

write a SQL query with an aggregation to compute that. Or, sorry, not we, the agent should do that. Sort of, like, think about, kind of, how much of the data your…00:52:03

queries need, and, like, how that's kind of located, for lack of a better term, in your data, and then design the retrieval mechanism that's, like, best suited for that.00:52:13
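
Shreya's distinction between point queries and aggregations, applied to Francesco's calendar example, might look like the sketch below. Everything here is hypothetical: the `events` schema, the sample rows, and the two functions that an agent could be given as tools. The aggregation is answered by a SQL `GROUP BY` over every row rather than by retrieving a handful of matching chunks.

```python
import sqlite3

# Hypothetical calendar table: structured columns plus free-text notes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT, category TEXT,"
    " minutes INTEGER, notes TEXT)"
)
conn.executemany(
    "INSERT INTO events (title, category, minutes, notes) VALUES (?, ?, ?, ?)",
    [
        ("Dentist", "health", 60, "Bring insurance card"),
        ("Team standup", "work", 15, "Discuss launch blockers"),
        ("Launch review", "work", 90, "Go/no-go for launch"),
        ("Gym", "health", 30, ""),
    ],
)


def point_lookup(keyword: str) -> list[str]:
    """Point query: return the few events whose title or notes mention the keyword."""
    pattern = f"%{keyword}%"  # SQLite LIKE is case-insensitive for ASCII by default
    rows = conn.execute(
        "SELECT title FROM events WHERE title LIKE ? OR notes LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]


def minutes_by_category() -> dict[str, int]:
    """Aggregation: needs every row, so it is computed in SQL."""
    rows = conn.execute(
        "SELECT category, SUM(minutes) FROM events GROUP BY category ORDER BY category"
    ).fetchall()
    return dict(rows)


print(point_lookup("launch"))  # ['Team standup', 'Launch review']
print(minutes_by_category())   # {'health': 90, 'work': 105}
```

In an agent, these two functions would be exposed as separate tools, and the prompt would describe when each one applies.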

Francesco Lanciana

Gotcha. And you might be putting in the prompt how to think about it for the agent, like, you know, if it is that aggregation thing, I don't want you to use this tool and come up with blah, and… okay.00:52:23

Shreya Shankar

Yep.00:52:36

Francesco Lanciana

Cool. Also, what was… what's loci?00:52:36

Hamel Husain

Loci is just, like,00:52:40

Maybe I should look up the definition so I don't butcher it; it's intuitive to me. So it's like,00:52:43

So the definition in the dictionary is a particular position, point, or place. Basically, like… okay, it's,00:52:50

the best way that I can describe it is… Bryan Bischof, our TA and our friend, has this talk, "the map is not the territory," and it's just this idea that, like, you can have…00:52:58

different points.00:53:10

In your… you can have different hooks.00:53:12

Like, in your data that allow you to say, like… you know.00:53:15

They'll allow you… so, like, in memory. So, like, one place where this is used a lot is memory, like, if you study the way humans remember things, it's, like, how many associations can you make with something?00:53:21

like, if you have, like, the more associations you have with something, the more likely you are to remember it, right? And so,00:53:36

you know, and it's, like, loci. How, you know, can you, like, have a lot of loci, you know.00:53:46

Francesco Lanciana

Yeah, and so, and so it's kind of like.00:53:53

Hamel Husain

I feel like that, you know, kind of works here. This is an analogy; it's kind of similar. It's like, how many associations can you…00:53:55

You know, make with that document to allow you to retrieve that document, because you might not find it00:54:05

you know, you might not use the right keyword, but you might say something similar, or you might describe it. Like, Bryan has this project called semantic.art, which is, like, how can you find artwork? Which is a very extreme example00:54:10

of this concept, because a lot of times with art, there is no keyword you can think of, it's just a feeling.00:54:25

So it's, like, super semantic, right? It's like, how do you find artwork with a feeling? Well, okay, you need to make sure that you have lots of different loci into the artwork, right? Like, because you, like…00:54:34

It's just so fuzzy.00:54:46

Francesco Lanciana

Hmm.00:54:50

Hamel Husain

And so, I hope that gives intuition. That might be a better way of explaining it.00:54:50

Francesco Lanciana

No, it does, thank you.00:54:54

katenesmyelova

Actually, Hamel, I have another question. I really enjoyed your… I skimmed through your patterns for building LLM-based systems and products, and it was literally a gold mine for me. Love it. Do you have anything similar on how you are…00:55:00

creating traceability across everything that you have in your agentic system, from… I don't know,00:55:16

the health checks on your Docker containers, to evals, to actual stuff? So, like, if something happens… our biggest problem usually is that we need to troubleshoot.00:55:25

And to troubleshoot, we need very clear traceability across everything. And I also remember you mentioned that RAG should not be a black box; RAG should be quite transparent. So, the thing is, we need to be able to troubleshoot where exactly we failed.00:55:38

But do you have any best practices on how to make it consistent, how to make it most effective, where to start?00:55:56

Hamel Husain

Yeah, with the tracing stuff, I try not to be overzealous,00:56:08

with it, and just, like, respect the engineering that… because, like.00:56:12

You know, so a lot of, like, software already has telemetry?00:56:18

I mean, it depends what your situation is. Like, if you have some software that has telemetry already, like, let's say you're using Datadog00:56:24

or Honeycomb, or, I don't know, whatever you're using,00:56:31

I just leave it there. I don't want to, like, refactor everything to be so…00:56:36

AI-centric. I try… I don't try to fight that battle. And I say, okay, like, if I do have these separate systems,00:56:43

You know, can I just put links to the other system…00:56:52

where it makes sense, as part of the metadata of the trace? But I don't try to, like…00:56:55

create a master view of everything, necessarily. I try to work within the…00:57:02

The bounds of whatever is there, so…00:57:09

You know, if I try to do that, then it would kick me out.00:57:13

So…00:57:16
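
Hamel's suggestion of storing links to existing telemetry as trace metadata, rather than rebuilding everything into one AI-centric view, could look roughly like this. The `LLMTrace` class, the `link.*` metadata keys, and the URL templates are all hypothetical placeholders; a real setup would use the deep-link formats of whatever tools (Datadog, Honeycomb, and so on) the team already runs.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Hypothetical URL templates for observability tools the team already runs.
# Replace them with the real deep-link formats of your own systems.
EXTERNAL_LINKS = {
    "app_logs": "https://logs.example.com/search?request_id={request_id}",
    "container_health": "https://monitoring.example.com/hosts/{host}",
}


@dataclass
class LLMTrace:
    request_id: str
    host: str
    prompt: str
    response: str
    started_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    metadata: dict[str, str] = field(default_factory=dict)

    def attach_links(self) -> None:
        """Store pointers to the other systems instead of copying their data into the trace."""
        for name, template in EXTERNAL_LINKS.items():
            # str.format ignores keyword arguments a template does not use.
            self.metadata[f"link.{name}"] = template.format(
                request_id=self.request_id, host=self.host
            )


trace = LLMTrace(request_id=str(uuid.uuid4()), host="worker-3",
                 prompt="Summarize ticket", response="...")
trace.attach_links()
print(json.dumps(asdict(trace), indent=2))
```

A custom trace viewer can then render these links as clickable jumps into the existing tools, which keeps the trace small while still connecting the data sources.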

katenesmyelova

No, actually, I think this kind of partially answers my question, because I think adding these pointers…00:57:18

as part of the trace metadata is a really brilliant idea.00:57:26

Thank you. That was my question.00:57:31

Hamel Husain

Yeah, and like, I guess what you could do is,00:57:35

Yeah, like, when you build your own trace viewer?00:57:40

You know, this is something you could render, if it makes sense? Like, if you find, like, hey, it's, like, really useful to know something about… like, if your particular system has some…00:57:44

idiosyncrasies, like…00:57:54

you know, you want… you really need to know what the container environment logs are, or something, I don't know, I'm just making it up. Then, like, maybe you render that in your custom viewer.00:57:57

But keep it simple, like, don't overdo it, unless you feel like you really need it. Even though it sounds cool.00:58:09

katenesmyelova

No, it's not about sounding cool, but what we sometimes struggle with, even with our application, and we're not adding the agent to the works yet, is that when we have a problem, sometimes it's hard for us to troubleshoot.00:58:16

Because we have too many data sources, and there is not much, kind of…00:58:33

not much connection, so it's hard to understand, because… again, our software has been built like00:58:38

a typical business-to-business application, which means lots of interesting variants, and sometimes it's really hard to troubleshoot all this stuff, and I want to avoid this same pattern when I am building. On top of it, I'm also adding agents to the works.00:58:48

So that's a problem that I want to avoid. I don't want to create something that we won't be able to troubleshoot easily.00:59:05

Hamel Husain

Makes sense, yeah. Yeah, try the metadata approach, see…00:59:16

katenesmyelova

Yeah, that's… that's a really good thing. Thank you.00:59:19

Shreya Shankar

Alright.00:59:33

Whoa?00:59:34

Sleeping time.00:59:36

katenesmyelova

Yeah, awesome.00:59:38

Shreya Shankar

Thank you so much. No other questions?00:59:40

katenesmyelova

Thank you for spending time with us. It's really useful, and I just absolutely love your course, and00:59:43

I love how you explain things, which is brilliant. Thank you so much.00:59:50

Shreya Shankar

Oh, thank you. It's super fun!00:59:56

And, we'll see everyone on Discord 24-7, so…00:59:58

katenesmyelova

Cheers, bye.01:00:05

Francesco Lanciana

Bye.01:00:06

Live session where instructors will address questions. Instructors may present answers to common questions, followed by live Q&A
