The Python eval_dataset() and TypeScript evalDataset() functions run a series of evaluations against a dataset using a provided interaction() function. These functions must be called within the context of an experiment() and automatically create individual test spans for each test case in the dataset.
The Gentrace SDK automatically configures OpenTelemetry when you call init(). If you have an existing OpenTelemetry setup or need custom configuration, see the manual setup guide.

Basic usage

Overview
Dataset evaluation functions allow you to:
- Run batch evaluations against multiple test cases from a dataset
- Validate inputs using optional schema validation (Pydantic for Python, Zod or any Standard Schema-compliant schema library for TypeScript)
- Trace each test case as individual OpenTelemetry spans within the experiment context
- Handle errors gracefully with automatic span error recording
- Process results from both synchronous and asynchronous data providers and interaction functions
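To make the behavior above concrete, here is a minimal stdlib-only sketch of the evaluation loop these functions implement. All names are illustrative, not the Gentrace SDK's actual API: the real functions also create OpenTelemetry spans per test case.

```python
# Minimal stdlib-only sketch of a dataset-evaluation loop.
# Illustrative only -- NOT the Gentrace SDK itself.

def run_dataset(data, interaction, schema=None):
    """Run `interaction` over every test case returned by `data`.

    Failed test cases produce None placeholders instead of aborting
    the whole run, mirroring the behavior described above.
    """
    results = []
    for case in data():
        try:
            if schema is not None:
                schema(case)  # raises on invalid input
            results.append(interaction(case))
        except Exception:
            # The real SDK records the error on the test case's
            # OpenTelemetry span; here we just keep the placeholder.
            results.append(None)
    return results

def require_query(case):
    """Toy validator standing in for a Pydantic/Zod schema."""
    if "query" not in case:
        raise ValueError("missing 'query' field")

cases = [{"query": "hi"}, {"bad": True}, {"query": "bye"}]
out = run_dataset(lambda: cases, lambda c: c["query"].upper(), schema=require_query)
print(out)  # ['HI', None, 'BYE']
```

Note how the invalid second case yields None while the other cases still run, matching the error-isolation behavior described in the Error handling section below.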
Parameters
Function signature
Parameters
Configuration object containing data provider, interaction function, and optional schema
TestInput Type
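As a rough sketch of the shape involved, a test input pairs an optional label with an inputs mapping that is passed to the interaction function. The field names below are assumptions for illustration; consult the SDK's own TestInput definition for the authoritative shape.

```python
from typing import Any, TypedDict

class TestInput(TypedDict, total=False):
    # Field names are assumptions for illustration only; see the
    # SDK's TestInput definition for the authoritative shape.
    name: str               # optional human-readable label
    inputs: dict[str, Any]  # arguments passed to the interaction

case: TestInput = {"name": "greeting", "inputs": {"query": "hi"}}
print(case["inputs"]["query"])  # hi
```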
Return Value
Returns a sequence of results from the interaction function. Failed test cases (due to validation errors) will have None values in the corresponding positions.
Advanced usage
With schema validation
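To show what schema validation buys you, here is a stdlib-only stand-in for what a Pydantic or Zod schema does in this position: check structure and types before the interaction ever runs. All names are illustrative, not the SDK's API.

```python
# Stdlib-only stand-in for a Pydantic/Zod input schema.
# Illustrative names only -- not the Gentrace SDK's API.
from dataclasses import dataclass

@dataclass
class QueryInput:
    query: str
    max_tokens: int = 100

def validate(raw: dict) -> QueryInput:
    parsed = QueryInput(**raw)  # rejects unknown/missing fields
    if not isinstance(parsed.query, str):
        raise TypeError("query must be a string")
    return parsed

ok = validate({"query": "hello"})
print(ok.max_tokens)  # 100

try:
    validate({"prompt": "oops"})  # wrong field name
except TypeError:
    print("rejected")  # invalid shape never reaches the interaction
```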
Schema validation ensures that your test cases have the correct structure and data types before being passed to your interaction function. Use Zod for TypeScript or Pydantic for Python to define your input schemas.

Custom data providers
You can provide test cases from any source by implementing a custom data provider function. Each data point must conform to the TestInputs structure from above.
This is useful when you want to pull test cases from the Gentrace API via the test cases SDK or define them directly inline.
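Both styles can be sketched as plain functions that return a list of test cases. The names below are illustrative (the API-backed provider is stubbed rather than calling the real Gentrace test cases SDK):

```python
# Illustrative custom data providers -- names assumed, not the SDK's API.
# A provider is just a function that returns the list of test cases.
import asyncio

def inline_provider():
    """Define test cases directly in code."""
    return [
        {"inputs": {"query": "What is 2 + 2?"}},
        {"inputs": {"query": "Name a prime number."}},
    ]

async def api_provider():
    """Fetch test cases from an external source (stubbed here)."""
    # A real provider might call the Gentrace test cases API;
    # this stub just simulates an async fetch.
    return [{"inputs": {"query": "fetched case"}}]

print(len(inline_provider()))            # 2
print(len(asyncio.run(api_provider())))  # 1
```

Both synchronous and asynchronous providers are supported, as noted in the overview.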
Error handling
The dataset evaluation functions handle errors gracefully and automatically associate all errors and exceptions with OpenTelemetry spans. When validation or interaction errors occur, they are captured as span events and attributes.

OTEL span error integration
When errors occur during dataset evaluation:
- Automatic Capture: All validation errors and interaction exceptions are automatically captured as span events
- Individual Spans: Each test case gets its own span, so errors are isolated to specific test cases
- Continued Processing: Failed test cases don’t stop the evaluation of other test cases
- Error Attributes: Error messages, types, and metadata are recorded as span attributes
- Span Status: Individual test case spans are marked with ERROR status when exceptions occur
- Error Handling: See OpenTelemetry error recording and exception recording for more details
Example span structure
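The original page's span diagram is omitted here; roughly, the hierarchy described above looks like the following (names illustrative), with one child span per test case under the experiment span and errors isolated to the failing case:

```text
experiment (span)
├── test case 1 (span)    status: OK
├── test case 2 (span)    status: ERROR
│   └── event: exception  (type, message recorded as span attributes)
└── test case 3 (span)    status: OK
```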
Requirements
- Gentrace SDK Initialization: Must call init() with a valid API key. The SDK automatically configures OpenTelemetry for you. For custom OpenTelemetry setups, see the manual setup guide
- Experiment Context: Must be called within an experiment() function
- Valid Data Provider: The data function must return an array of test cases
Related functions
- init() - Initialize the Gentrace SDK
- interaction() - Instrument AI functions for tracing within experiments
- experiment() - Create experiment context for dataset evaluations
- eval() / evalOnce() - Run individual test cases within an experiment
- traced() - Alternative approach for tracing functions