The Python eval() and TypeScript evalOnce() functions create individual test cases within an experiment(). They capture the execution of specific test logic, automatically create OpenTelemetry spans with detailed tracing information, and associate the results with the parent experiment.
These functions must be called within the context of an experiment(), and each evaluation gets its own test span.
The Gentrace SDK automatically configures OpenTelemetry when you call init(). If you have an existing OpenTelemetry setup or need custom configuration, see the manual setup guide.
Basic usage
Overview
Individual evaluations in Gentrace represent specific test cases or validation steps within your AI pipeline tests. The eval() and evalOnce() functions provide the following test execution capabilities:
Key features
- Generate OpenTelemetry spans with detailed execution tracing for each test case
- Record the return value of eval() or @eval()-decorated functions as the output for each test span
- Link test cases to the parent experiment context for proper grouping and analysis
- Preserve full error information in traces while allowing tests to continue running
- Support both synchronous and asynchronous test functions seamlessly
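The following self-contained toy model illustrates the features above. The names `toy_eval`, `SpanRecord`, and `SPANS` are hypothetical stand-ins and are not Gentrace's API. The sketch shows how a single decorator can wrap both synchronous and asynchronous test functions, open one span record per call, and store the return value as that span's output.

```python
import asyncio
import functools
import inspect
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class SpanRecord:
    """Hypothetical stand-in for an OpenTelemetry span."""
    name: str
    output: Any = None


# Collected spans; a real SDK would export these to a tracing backend instead.
SPANS: List[SpanRecord] = []


def toy_eval(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: each call of the wrapped function records one span."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                span = SpanRecord(name)
                SPANS.append(span)
                span.output = await fn(*args, **kwargs)  # return value becomes the output
                return span.output

            return async_wrapper

        @functools.wraps(fn)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
            span = SpanRecord(name)
            SPANS.append(span)
            span.output = fn(*args, **kwargs)  # return value becomes the output
            return span.output

        return sync_wrapper

    return decorator


@toy_eval("sync-case")
def check_upper() -> str:
    return "hello".upper()


@toy_eval("async-case")
async def check_length() -> int:
    await asyncio.sleep(0)
    return len("hello")


check_upper()
asyncio.run(check_length())
print([(s.name, s.output) for s in SPANS])  # [('sync-case', 'HELLO'), ('async-case', 5)]
```

The decorator checks `inspect.iscoroutinefunction` once, at wrap time. This lets callers `await` async tests and call sync tests directly without any change at the call site.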
Parameters
Multiple test cases with different evaluation types
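The hypothetical sketch below runs three kinds of check as separately named cases and groups their results under one experiment label. The checks are an exact match, a case-insensitive substring check, and a numeric tolerance check. `run_cases`, the check helpers, and the sample outputs are invented for illustration. With Gentrace, each case would instead be its own eval() or evalOnce() call inside an experiment().

```python
import math
from typing import Callable, Dict, List, Tuple


# Hypothetical evaluation types: each returns True when the output passes.
def exact_match(output: str, expected: str) -> bool:
    return output == expected


def contains(output: str, expected: str) -> bool:
    return expected.lower() in output.lower()


def within_tolerance(output: float, expected: float, tol: float = 0.01) -> bool:
    return math.isclose(output, expected, abs_tol=tol)


def run_cases(
    experiment: str, cases: List[Tuple[str, Callable[[], bool]]]
) -> Dict[str, Dict[str, bool]]:
    """Run each named case separately and group the results under the experiment name."""
    return {experiment: {name: check() for name, check in cases}}


results = run_cases("qa-regression", [
    ("capital-exact", lambda: exact_match("Paris", "Paris")),
    ("summary-mentions-topic", lambda: contains("The report covers Q3 revenue.", "revenue")),
    ("score-close", lambda: within_tolerance(0.847, 0.85)),
    ("capital-wrong", lambda: exact_match("Lyon", "Paris")),
])
print(results)
```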
OTEL span error integration
When errors occur during individual evaluations:
- Automatic Capture: All validation errors and interaction exceptions are automatically captured as span events
- Individual Spans: Each evaluation gets its own span, so errors are isolated to specific test cases
- Continued Processing: Failed evaluations don’t stop the execution of other evaluations in the experiment
- Error Attributes: Error messages, types, and metadata are recorded as span attributes
- Span Status: Individual evaluation spans are marked with ERROR status when exceptions occur
- Error Handling: See the OpenTelemetry documentation on error recording and exception recording for more details
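The error-isolation behavior described above can be modeled in plain Python. In this hypothetical sketch, `run_isolated` and `ToySpan` are invented names, and the `"output"` event name is an assumption. The `exception` event and its `exception.type`, `exception.message`, and `exception.stacktrace` keys, along with the `error.type` attribute, follow OpenTelemetry's semantic conventions. Each evaluation gets its own span. A failure is recorded on that span and does not stop the next evaluation.

```python
import traceback
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToySpan:
    """Hypothetical span stand-in; OpenTelemetry span status defaults to UNSET."""
    name: str
    status: str = "UNSET"
    attributes: Dict[str, Any] = field(default_factory=dict)
    events: List[Dict[str, Any]] = field(default_factory=list)


def run_isolated(name: str, fn: Callable[[], Any]) -> ToySpan:
    """Run one evaluation in its own span; never let its exception escape."""
    span = ToySpan(name)
    try:
        # "output" is a hypothetical event name for the recorded return value.
        span.events.append({"name": "output", "value": fn()})
    except Exception as exc:
        # Mirror OpenTelemetry's exception event attributes, then mark the span as failed.
        span.events.append({
            "name": "exception",
            "exception.type": type(exc).__name__,
            "exception.message": str(exc),
            "exception.stacktrace": traceback.format_exc(),
        })
        span.attributes["error.type"] = type(exc).__name__
        span.status = "ERROR"
    return span


def failing_case() -> str:
    raise ValueError("expected 'Paris', got 'Lyon'")


spans = [
    run_isolated("passes", lambda: 2 + 2),
    run_isolated("fails", failing_case),
    run_isolated("also-passes", lambda: "ok"),
]
for s in spans:
    print(s.name, s.status)
# passes UNSET
# fails ERROR
# also-passes UNSET
```

Successful spans are left as UNSET rather than OK. This follows the OpenTelemetry convention that instrumentation sets the status only to flag errors.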
Requirements
- Gentrace SDK Initialization: Must call init() with a valid API key. The SDK automatically configures OpenTelemetry for you. For custom OpenTelemetry setups, see the manual setup guide
- Experiment Context: Must be called within an experiment() function
- Valid Pipeline ID: When using an explicit pipeline ID, it must be valid
Context requirements
Both eval() and evalOnce() must be called within an active experiment context. They automatically:
- Retrieve experiment context from the parent experiment() function
- Associate spans with the experiment ID for proper grouping
- Inherit experiment metadata and configuration
OpenTelemetry integration
The evaluation functions create OpenTelemetry spans with detailed tracing information:
Span attributes
- A link from the evaluation to its parent experiment
- The name provided to the evaluation function
- The error type and message, captured automatically
Span events
- The function's output, recorded for result tracking
- Exceptions, recorded automatically with full stack traces
Example span structure
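This hypothetical sketch builds a small span tree to illustrate the parent/child linkage described above. It follows OpenTelemetry's model: every child shares the experiment's 128-bit trace ID and records its parent's 64-bit span ID. The span names and layout are invented for illustration and do not reflect Gentrace's actual attribute keys.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeSpan:
    """Hypothetical span node carrying W3C-sized trace and span IDs."""
    name: str
    trace_id: str
    parent_span_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))  # 64-bit span ID
    children: List["TreeSpan"] = field(default_factory=list)

    def child(self, name: str) -> "TreeSpan":
        # A child shares the trace ID and points back at its parent's span ID.
        node = TreeSpan(name, trace_id=self.trace_id, parent_span_id=self.span_id)
        self.children.append(node)
        return node


def render(span: TreeSpan, depth: int = 0) -> List[str]:
    """Indent each span under its parent for display."""
    lines = ["  " * depth + f"{span.name} (parent={span.parent_span_id})"]
    for c in span.children:
        lines.extend(render(c, depth + 1))
    return lines


root = TreeSpan("experiment: qa-regression", trace_id=secrets.token_hex(16))  # 128-bit trace ID
root.child("eval: capital-exact")
root.child("eval: summary-mentions-topic")
print("\n".join(render(root)))
```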
Related functions
- init() - Initialize the Gentrace SDK
- interaction() - Instrument AI functions for tracing within experiments
- evalDataset() / eval_dataset() - Run tests against a dataset within an experiment
- experiment() - Create experiment contexts for grouping evaluations
- traced() - Alternative approach for tracing functions