Version: 4.7.66

Test Cases

🛑Alpha

OpenTelemetry support is currently in alpha and may undergo significant changes.

The test cases SDK provides programmatic access to Gentrace test cases. Test cases are commonly used with evalDataset() (eval_dataset() in Python) for batch evaluations, but they can also be accessed and managed independently.

Overview

The SDK is built by Stainless and provides type-safe access to Gentrace entities. The testCases object (test_cases in Python) exposes methods to list, create, retrieve, and delete test cases.

Basic usage

typescript
import { init, testCases } from 'gentrace';
init({
  apiKey: process.env.GENTRACE_API_KEY,
});
// List test cases from a dataset
const testCasesList = await testCases.list({ 
  datasetId: 'your-dataset-id' 
});
// Access the test cases
for (const testCase of testCasesList.data) {
  console.log(testCase.name);
  console.log(testCase.inputs);
}

python
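import asyncio
import os

from gentrace import init, test_cases

init(api_key=os.environ["GENTRACE_API_KEY"])

async def main() -> None:
    # List test cases from a dataset
    test_case_list = await test_cases.list(dataset_id="your-dataset-id")
    # Access the test cases
    for test_case in test_case_list.data:
        print(test_case.name)
        print(test_case.inputs)

# `await` is only valid inside an async function, so run main() with asyncio.
# The later Python snippets assume they run inside an async function like this one.
asyncio.run(main())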

Test case structure

Each test case contains:

name (optional): Human-readable name for the test case
id (optional): Unique identifier
inputs: Dictionary/object containing the input data for your AI function
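
The fields above can be mirrored locally, for example when preparing rows before uploading them. The following is a minimal sketch under that assumption: LocalTestCase and from_row are hypothetical names used for illustration and are not part of the Gentrace SDK.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LocalTestCase:
    """Local stand-in mirroring the fields listed above (not an SDK type)."""

    inputs: Dict[str, Any]       # required: input data for your AI function
    name: Optional[str] = None   # optional: human-readable name
    id: Optional[str] = None     # optional: unique identifier


def from_row(row: Dict[str, Any]) -> LocalTestCase:
    """Build a LocalTestCase from a plain dict, requiring `inputs` to be a dict."""
    inputs = row.get("inputs")
    if not isinstance(inputs, dict):
        raise ValueError("a test case needs an 'inputs' dictionary")
    return LocalTestCase(inputs=inputs, name=row.get("name"), id=row.get("id"))
```

Validating rows this way before any upload catches a missing inputs dictionary early, while leaving name and id unset when they are absent.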

Resource methods

Create a test case

typescript
const testCase = await testCases.create({
  datasetId: 'your-dataset-id',
  inputs: { query: 'What is AI?' },
  name: 'Basic AI question',
  expectedOutputs: { answer: 'Artificial Intelligence is...' } // optional
});

python
test_case = await test_cases.create(
    dataset_id="your-dataset-id",
    inputs={"query": "What is AI?"},
    name="Basic AI question",
    expected_outputs={"answer": "Artificial Intelligence is..."}  # optional
)
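
When several test cases need to be created at once, the single create call above can be wrapped in a small helper. This is a sketch, not an SDK feature: create_many and max_concurrency are hypothetical names, and the create callable is passed in as a parameter so the helper works with any async function that accepts the same keyword arguments as the call shown above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def create_many(
    create: Callable[..., Awaitable[Any]],
    dataset_id: str,
    rows: List[Dict[str, Any]],
    max_concurrency: int = 5,
) -> List[Any]:
    """Call `create` once per row, with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def create_one(row: Dict[str, Any]) -> Any:
        async with semaphore:
            # Each row supplies keyword arguments such as `inputs` and `name`.
            return await create(dataset_id=dataset_id, **row)

    # gather() returns results in the same order as `rows`.
    return await asyncio.gather(*(create_one(row) for row in rows))
```

Assuming the SDK import from the basic usage example, a call might look like await create_many(test_cases.create, "your-dataset-id", rows), where each row is a dict with inputs and, optionally, name and expected_outputs.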

Retrieve a test case

typescript
const testCase = await testCases.retrieve('test-case-id');
console.log(testCase.inputs);

python
test_case = await test_cases.retrieve("test-case-id")
print(test_case.inputs)

Delete a test case

typescript
await testCases.delete('test-case-id');

python
await test_cases.delete("test-case-id")
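
Deletion is often combined with listing, for instance to clean up temporary test cases. The sketch below shows one way to do that; delete_by_name_prefix is a hypothetical helper, the list and delete callables are passed in rather than imported, and only the cases returned in .data by a single list call are considered.

```python
from typing import Any, Awaitable, Callable, List


async def delete_by_name_prefix(
    list_cases: Callable[..., Awaitable[Any]],
    delete_case: Callable[[str], Awaitable[Any]],
    dataset_id: str,
    prefix: str,
) -> List[str]:
    """Delete test cases whose name starts with `prefix`; return the deleted IDs."""
    listing = await list_cases(dataset_id=dataset_id)
    deleted: List[str] = []
    for case in listing.data:
        # `name` and `id` are optional, so skip cases that lack either one.
        if case.id and (case.name or "").startswith(prefix):
            await delete_case(case.id)
            deleted.append(case.id)
    return deleted
```

With the SDK, the two callables would presumably be test_cases.list and test_cases.delete, as in await delete_by_name_prefix(test_cases.list, test_cases.delete, "your-dataset-id", "tmp-").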

List with filters

typescript
// Filter by pipeline
const testCasesList = await testCases.list({
  pipelineId: 'pipeline-id',
  // or use pipelineSlug: 'pipeline-slug'
});

python
# Filter by pipeline
test_case_list = await test_cases.list(
    pipeline_id="pipeline-id",
    # or use pipeline_slug="pipeline-slug"
)
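
When the filter comes from configuration or user input, the keyword arguments for a list call can be assembled ahead of time. The helper below is a hypothetical sketch: build_list_filters is not an SDK function, and treating pipeline_id and pipeline_slug as mutually exclusive is an assumption based on the "or use" comment above.

```python
from typing import Dict, Optional


def build_list_filters(
    dataset_id: Optional[str] = None,
    pipeline_id: Optional[str] = None,
    pipeline_slug: Optional[str] = None,
) -> Dict[str, str]:
    """Return keyword arguments for a list call, omitting filters that are unset."""
    if pipeline_id is not None and pipeline_slug is not None:
        # Assumption: the ID and the slug are alternatives for naming one pipeline.
        raise ValueError("pass pipeline_id or pipeline_slug, not both")
    candidates = {
        "dataset_id": dataset_id,
        "pipeline_id": pipeline_id,
        "pipeline_slug": pipeline_slug,
    }
    return {key: value for key, value in candidates.items() if value is not None}
```

The result can be unpacked into the call, for example await test_cases.list(**build_list_filters(pipeline_slug="pipeline-slug")).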

Common usage with evalDataset()

Test cases are frequently used with evalDataset() for running batch evaluations:

typescript
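import { evalDataset, testCases } from 'gentrace';

await evalDataset({
  data: async () => {
    // Fetch the dataset's test cases when the evaluation starts
    const testCasesList = await testCases.list({ datasetId: DATASET_ID });
    return testCasesList.data;
  },
  interaction: yourAIFunction, // See interaction() docs
});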

python
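from gentrace import eval_dataset, test_cases

async def fetch_test_cases():
    # Fetch the dataset's test cases when the evaluation starts
    test_case_list = await test_cases.list(dataset_id=DATASET_ID)
    return test_case_list.data

# Call from within an async function
await eval_dataset(
    data=fetch_test_cases,
    interaction=your_ai_function,  # See interaction() docs
)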

The interaction parameter should be a function wrapped with interaction() for proper OpenTelemetry tracing within experiments.
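
The data parameter in the example above is an async function that returns a list of test cases, so it can be produced by a small factory when the same pattern is needed for several datasets. This is a sketch under that assumption: make_data_provider and limit are hypothetical names, and the list callable is passed in rather than imported.

```python
from typing import Any, Awaitable, Callable, List, Optional


def make_data_provider(
    list_cases: Callable[..., Awaitable[Any]],
    dataset_id: str,
    limit: Optional[int] = None,
) -> Callable[[], Awaitable[List[Any]]]:
    """Return an async, zero-argument function that fetches test cases."""

    async def fetch() -> List[Any]:
        listing = await list_cases(dataset_id=dataset_id)
        cases = list(listing.data)
        # Optionally cap the number of cases, e.g. for a quick smoke run.
        return cases if limit is None else cases[:limit]

    return fetch
```

The returned function could then be passed as data, for example data=make_data_provider(test_cases.list, DATASET_ID, limit=10).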
