The test cases SDK provides programmatic access to Gentrace test cases. While commonly used with evalDataset() for batch evaluations, test cases can also be accessed and managed independently.
Basic usage
import { init, testCases } from 'gentrace';

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

// List test cases from a dataset
const testCasesList = await testCases.list({
  datasetId: 'your-dataset-id',
});

// Access the test cases
for (const testCase of testCasesList.data) {
  console.log(testCase.name);
  console.log(testCase.inputs);
}
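Once a list result is in hand, it is often convenient to look test cases up by name. The following is a minimal sketch that uses plain local data rather than the SDK; `indexByName` and the sample objects are hypothetical helpers for illustration, not part of the gentrace package.

```typescript
// Hypothetical helper: builds a name -> test case lookup from any array
// of objects that carry a `name` field (such as the `data` array of a list result).
function indexByName<T extends { name: string }>(cases: T[]): Map<string, T> {
  const byName = new Map<string, T>();
  for (const testCase of cases) {
    // If two test cases share a name, the later one wins.
    byName.set(testCase.name, testCase);
  }
  return byName;
}

// Sample data standing in for testCasesList.data.
const sample = [
  { name: 'Basic AI question', inputs: { query: 'What is AI?' } },
  { name: 'Follow-up', inputs: { query: 'What is ML?' } },
];

const lookup = indexByName(sample);
console.log(lookup.get('Follow-up')?.inputs);
```

Names are not guaranteed to be unique by anything stated on this page, so a real implementation might key on an identifier instead.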
Overview
The SDK is built by Stainless and provides type-safe access to Gentrace entities. The testCases object exposes methods to list, create, update, and delete test cases.
Test case structure
Each test case contains:
- name: Human-readable name for the test case
- inputs: Dictionary/object containing the input data for your AI function
- expectedOutputs: Optional expected outputs for validation
- datasetId: UUID of the dataset this test case belongs to
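The four fields above could be modeled roughly as follows. This is a sketch under assumptions: the field names are taken from the create() parameters shown on this page, and `TestCaseShape` and `isTestCaseShape` are hypothetical names for illustration. The types exported by the SDK itself are authoritative and may include additional fields.

```typescript
// Hypothetical shape for illustration only; not the SDK's exported type.
interface TestCaseShape {
  name: string;                               // human-readable name
  inputs: Record<string, unknown>;            // input data for your AI function
  expectedOutputs?: Record<string, unknown>;  // optional, used for validation
  datasetId: string;                          // UUID of the owning dataset
}

// Returns true for non-null, non-array objects.
function isPlainObject(x: unknown): boolean {
  return typeof x === 'object' && x !== null && !Array.isArray(x);
}

// Type guard: checks that an unknown value carries the fields described above.
function isTestCaseShape(value: unknown): value is TestCaseShape {
  if (!isPlainObject(value)) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === 'string' &&
    typeof v.datasetId === 'string' &&
    isPlainObject(v.inputs) &&
    (v.expectedOutputs === undefined ||
      v.expectedOutputs === null ||
      isPlainObject(v.expectedOutputs))
  );
}

const example: TestCaseShape = {
  name: 'Basic AI question',
  inputs: { query: 'What is AI?' },
  datasetId: 'your-dataset-id',
};
```

A guard like this can be handy when test case data arrives from a source other than the SDK, such as a JSON file.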
Resource methods
Create a test case
const testCase = await testCases.create({
  datasetId: 'your-dataset-id',
  inputs: { query: 'What is AI?' },
  name: 'Basic AI question',
  expectedOutputs: { answer: 'Artificial Intelligence is...' }, // optional
});
Retrieve a test case
const testCase = await testCases.retrieve('test-case-id');
console.log(testCase.inputs);
Delete a test case
await testCases.delete('test-case-id');
List with filters
// Filter by pipeline
const testCasesList = await testCases.list({
  pipelineId: 'pipeline-id',
  // or use pipelineSlug: 'pipeline-slug'
});
Common usage with evalDataset()
Test cases are frequently used with evalDataset() for running batch evaluations:
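import { evalDataset, testCases } from 'gentrace';

// DATASET_ID and yourAIFunction are placeholders defined elsewhere in your code
await evalDataset({
  // Load the test cases to evaluate
  data: async () => {
    const testCasesList = await testCases.list({
      datasetId: DATASET_ID,
    });
    return testCasesList.data;
  },
  interaction: yourAIFunction, // See interaction() docs
});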
The interaction parameter should be a function wrapped with interaction() for proper OpenTelemetry tracing within experiments.
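To make the data/interaction pairing concrete, here is a simplified stand-in for the pattern. It is not Gentrace's implementation: `runOverDataset`, `MinimalTestCase`, and both stubs are hypothetical, and the sketch assumes the function is invoked once per test case with that test case's inputs. It omits tracing and any experiment bookkeeping.

```typescript
// Simplified stand-in types; not the SDK's types.
type Inputs = Record<string, unknown>;

interface MinimalTestCase {
  name: string;
  inputs: Inputs;
}

// Hypothetical runner: loads test cases via `data`, then calls
// `interaction` once per test case, collecting the outputs in order.
async function runOverDataset<T>(options: {
  data: () => Promise<MinimalTestCase[]>;
  interaction: (inputs: Inputs) => Promise<T>;
}): Promise<Array<{ name: string; output: T }>> {
  const cases = await options.data();
  const results: Array<{ name: string; output: T }> = [];
  for (const testCase of cases) {
    const output = await options.interaction(testCase.inputs);
    results.push({ name: testCase.name, output });
  }
  return results;
}

// Stub data source standing in for a testCases.list(...) call.
const stubData = async (): Promise<MinimalTestCase[]> => [
  { name: 'Basic AI question', inputs: { query: 'What is AI?' } },
  { name: 'Follow-up', inputs: { query: 'What is ML?' } },
];

// Stub AI function: upper-cases the query instead of calling a model.
const stubInteraction = async (inputs: Inputs): Promise<string> =>
  String(inputs.query).toUpperCase();

runOverDataset({ data: stubData, interaction: stubInteraction }).then((results) => {
  for (const result of results) {
    console.log(result.name, result.output);
  }
});
```

Keeping data loading behind a callback means the same evaluation code can run against a live dataset or a local fixture, which is useful when testing an AI function offline.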
See also
- evalDataset() - Common usage pattern for batch evaluations
- Datasets - Managing datasets that contain test cases