Testing and Evals
TODO
Principles:

- Unit tests are no different from those of any other app: you just swap in `TestModel` or `FunctionModel` in place of a real model. We know how to do unit tests; there's no magic, just good practice.
- Evals are more like benchmarks. They never "pass", although they do "fail", and what you mostly care about is how they change over time. We (and, we think, most other people) don't really know what a "good" eval is. We provide some useful tools, and we'll improve this if or when a common best practice emerges, or when we think we have something interesting to say.