
experiment()

The experiment() function creates a testing context for grouping related evaluations and tests. It manages the lifecycle of a Gentrace experiment, automatically starting and finishing the experiment while providing context for evaluation functions like eval() and evalDataset().

Overview

An experiment in Gentrace represents a collection of test cases or evaluations run against your AI pipeline. The experiment() function:

  1. Creates an experiment run in Gentrace with a unique experiment ID
  2. Provides context for evaluation functions to associate their results with the experiment
  3. Manages lifecycle by automatically starting and finishing the experiment
  4. Captures metadata and organizes test results for analysis

Basic Usage

typescript
import { init, experiment, evalOnce } from 'gentrace';

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const PIPELINE_ID = process.env.GENTRACE_PIPELINE_ID!;

// Basic experiment usage
experiment(PIPELINE_ID, async () => {
  await evalOnce('simple-test', async () => {
    // Your test logic here
    return 'test result';
  });
});

Parameters

Function Signature

typescript
function experiment<T>(
  pipelineId: string,
  callback: () => T | Promise<T>,
  options?: ExperimentOptions
): Promise<T>

Parameters

  • pipelineId (string, required): The UUID of the Gentrace pipeline to associate with this experiment
  • callback (function, required): The function containing your experiment logic and test cases
  • options (ExperimentOptions, optional): Additional configuration options
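
Because experiment() resolves with the callback's return value (the T in Promise<T> above), a run can produce a summary value that you use after it completes. A minimal sketch, reusing the PIPELINE_ID pattern from Basic Usage:

typescript
import { experiment } from 'gentrace';

const PIPELINE_ID = process.env.GENTRACE_PIPELINE_ID!;

async function main() {
  // experiment() returns Promise<T>, where T is whatever the callback returns,
  // so the result of the run can be captured directly.
  const summary = await experiment(PIPELINE_ID, async () => {
    return { testsRun: 2 };
  });
  console.log(summary.testsRun); // 2
}

main();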

ExperimentOptions

typescript
type ExperimentOptions = {
  metadata?: Record<string, any>;
};
  • metadata (object, optional): Custom metadata to associate with the experiment run

Advanced Usage

With Metadata

typescript
import { experiment, evalOnce } from 'gentrace';

experiment(
  PIPELINE_ID,
  async () => {
    await evalOnce('model-comparison-test', async () => {
      // Test logic comparing different models
      return { accuracy: 0.95, latency: 120 };
    });
  },
  {
    metadata: {
      model: 'gpt-4o',
      temperature: 0.7,
      version: '1.2.0',
      environment: 'staging',
    },
  }
);

Multiple Test Cases

typescript
import { experiment, evalOnce, evalDataset, testCases } from 'gentrace';

experiment(PIPELINE_ID, async () => {
  // Individual test cases
  await evalOnce('accuracy-test', async () => {
    const result = await myAIFunction('test input');
    return { accuracy: calculateAccuracy(result) };
  });

  await evalOnce('latency-test', async () => {
    const start = Date.now();
    await myAIFunction('test input');
    const latency = Date.now() - start;
    return { latency };
  });

  // Dataset evaluation
  await evalDataset({
    data: async () => {
      const DATASET_ID = process.env.GENTRACE_DATASET_ID!;
      const testCasesList = await testCases.list({ datasetId: DATASET_ID });
      return testCasesList.data;
    },
    interaction: myAIFunction,
  });
});

Context and Lifecycle

The experiment() function manages the experiment lifecycle automatically:

  1. Start: Creates a new experiment run in Gentrace
  2. Context: Provides experiment context to nested evaluation functions
  3. Execution: Runs your experiment callback
  4. Finish: Marks the experiment as complete in Gentrace
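
Conceptually, this lifecycle resembles a start/run/finish wrapper. The sketch below is illustrative only, with placeholder helpers standing in for the SDK's real API calls:

typescript
// Conceptual sketch only (not the SDK's actual implementation): it illustrates
// the start/context/execute/finish lifecycle described above.
type ExperimentRun = { experimentId: string; pipelineId: string };

async function startRun(pipelineId: string): Promise<ExperimentRun> {
  // Placeholder: the real SDK creates the experiment run via the Gentrace API.
  return { experimentId: 'exp_placeholder', pipelineId };
}

async function finishRun(run: ExperimentRun): Promise<void> {
  // Placeholder: the real SDK marks the run complete via the Gentrace API.
}

async function experimentSketch<T>(
  pipelineId: string,
  callback: () => T | Promise<T>,
): Promise<T> {
  const run = await startRun(pipelineId);
  try {
    // The real SDK also propagates `run` to nested eval functions via context.
    return await callback();
  } finally {
    await finishRun(run); // runs even if the callback throws
  }
}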

Accessing Experiment Context

typescript
import { getCurrentExperimentContext } from 'gentrace';

experiment(PIPELINE_ID, async () => {
  const context = getCurrentExperimentContext();
  console.log('Experiment ID:', context?.experimentId);
  console.log('Pipeline ID:', context?.pipelineId);
  // Your test logic here
});

Error Handling

The experiment() function handles errors gracefully and automatically associates them with the active OpenTelemetry span. When an error occurs within an experiment or evaluation, it is captured as a span event with error attributes, providing full traceability in your observability stack.

typescript
experiment(PIPELINE_ID, async () => {
  try {
    await evalOnce('test-that-might-fail', async () => {
      // Test logic that might throw an error
      // This error will be automatically captured in the OTEL span
      throw new Error('Test failed');
    });
  } catch (error) {
    console.log('Test failed as expected:', (error as Error).message);
    // The error is already recorded in the span with full stack trace
  }
  // Experiment will still finish properly
  // All error information is preserved in the OpenTelemetry trace
});

OTEL Span Error Integration

When errors occur within experiments:

  • Automatic Capture: All Error objects (TypeScript) and exceptions (Python) are automatically captured as span events
  • Stack Traces: Full stack traces are preserved in the span attributes for debugging
  • Error Attributes: Error messages, types, and metadata are recorded as span attributes
  • Span Status: The span status is automatically set to ERROR when unhandled exceptions occur

This integration ensures that failed experiments and evaluations are fully observable and debuggable through your OpenTelemetry-compatible monitoring tools.
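
If your application does not already export spans anywhere, a console exporter is a quick way to see these error events during local debugging. A minimal sketch using the standard OpenTelemetry Node SDK (the exporter choice is an assumption; production setups typically use an OTLP exporter, and your existing tracing setup may already cover this):

typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ConsoleSpanExporter } from '@opentelemetry/sdk-trace-node';

// Print finished spans, including any error events recorded by
// experiment() and evalOnce(), to stdout for local inspection.
const sdk = new NodeSDK({
  traceExporter: new ConsoleSpanExporter(),
});
sdk.start();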

Best Practices

1. Use Descriptive Names and Metadata

typescript
// Good: Descriptive metadata
experiment(PIPELINE_ID, async () => {
  // tests...
}, {
  metadata: {
    model: 'gpt-4o',
    prompt_version: 'v2.1',
    test_suite: 'regression',
    branch: 'feature/new-prompts',
  },
});

2. Group Related Test Cases

Organize related test cases within a single experiment:

typescript
experiment(PIPELINE_ID, async () => {
  // All accuracy-related tests
  await evalOnce('accuracy-basic', async () => { /* ... */ });
  await evalOnce('accuracy-edge-cases', async () => { /* ... */ });

  // All performance-related tests
  await evalOnce('latency-test', async () => { /* ... */ });
  await evalOnce('throughput-test', async () => { /* ... */ });
});

3. Handle Async Operations Properly

typescript
// Ensure all async operations are awaited
experiment(PIPELINE_ID, async () => {
  await evalOnce('test-1', async () => { /* ... */ });
  await evalOnce('test-2', async () => { /* ... */ });

  // Run tests in parallel if they're independent
  await Promise.all([
    evalOnce('parallel-test-1', async () => { /* ... */ }),
    evalOnce('parallel-test-2', async () => { /* ... */ }),
  ]);
});

Requirements

  • OpenTelemetry Setup: The experiment() function requires OpenTelemetry to be configured for tracing
  • Valid Pipeline ID: Must provide a valid UUID for an existing Gentrace pipeline
  • API Key: Gentrace API key must be configured via init()
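
Putting these together, a minimal setup mirroring the Basic Usage example might look like the following (the environment variable names are the ones used throughout this page):

typescript
import { init } from 'gentrace';

// Fail fast if required configuration is missing.
if (!process.env.GENTRACE_API_KEY || !process.env.GENTRACE_PIPELINE_ID) {
  throw new Error('GENTRACE_API_KEY and GENTRACE_PIPELINE_ID must be set');
}

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const PIPELINE_ID = process.env.GENTRACE_PIPELINE_ID;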
