Datasets in Gentrace are collections of test cases used to evaluate your AI models and pipelines. They provide a structured way to organize your test data, track performance over time, and ensure consistent evaluation across different model versions.

What are datasets?

A dataset is a container that holds multiple test cases for a specific pipeline. Each test case consists of:
  • Inputs - The data passed to your AI model
  • Expected outputs - The desired response from your model
  • Name - A descriptive identifier for the test case
  • Metadata - Additional context and information
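For illustration, the structure above can be sketched as a TypeScript shape (a hypothetical type; field names follow the SDK examples later on this page, not an official SDK export):

```typescript
// Illustrative shape of a single test case (hypothetical type, not part of the SDK).
interface TestCase {
  name: string;                              // descriptive identifier
  inputs: Record<string, unknown>;           // data passed to your AI model
  expectedOutputs?: Record<string, unknown>; // desired response from your model
  metadata?: Record<string, unknown>;        // additional context and information
}

const example: TestCase = {
  name: 'Billing inquiry',
  inputs: { query: 'How do I cancel my subscription?' },
  expectedOutputs: { response: 'Visit your account settings...' },
  metadata: { source: 'support-tickets' },
};
```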
Datasets enable you to:
  • Run systematic evaluations across multiple test cases
  • Compare model performance between different versions
  • Track evaluation metrics over time
  • Import and export test data in various formats

Creating datasets

You can create datasets through the Gentrace web interface or programmatically using the SDK.

Creating via web interface

  1. Navigate to your pipeline in the Gentrace dashboard
  2. Click “New Dataset” to create a dataset for your pipeline
  3. Give your dataset a descriptive name and description
Creating a new dataset

Creating via SDK

import { init, datasets } from 'gentrace';

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const dataset = await datasets.create({
  name: 'Customer Support Evaluation',
  description: 'Test cases for customer support chatbot responses',
  pipelineId: 'your-pipeline-id' // or use pipelineSlug
});

console.log('Created dataset:', dataset.id);

Managing test cases

Test cases are the individual data points within your dataset. You can create them one at a time or in bulk.

Creating single test cases

import { init, testCases } from 'gentrace';

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const testCase = await testCases.create({
  datasetId: 'your-dataset-id',
  name: 'Billing inquiry',
  inputs: {
    query: 'How do I cancel my subscription?',
    context: 'Customer has active premium plan'
  },
  expectedOutputs: {
    response: 'To cancel your subscription, please visit your account settings...',
    confidence: 0.9
  }
});

Listing test cases

// List all test cases in a dataset
const testCasesList = await testCases.list({ 
  datasetId: 'your-dataset-id' 
});

// Access the test cases
for (const testCase of testCasesList.data) {
  console.log(testCase.name);
  console.log(testCase.inputs);
}

Retrieving and deleting test cases

// Retrieve a specific test case
const testCase = await testCases.retrieve('test-case-id');

// Delete a test case
await testCases.delete('test-case-id');

Managing datasets

Listing datasets

// List all datasets
const datasetList = await datasets.list();

// Filter by pipeline
const filteredDatasets = await datasets.list({
  pipelineId: 'your-pipeline-id',
  archived: false // Exclude archived datasets
});

// Access the datasets
for (const dataset of datasetList.data) {
  console.log(dataset.name);
  console.log(dataset.description);
}

Updating datasets

// Update dataset properties
const dataset = await datasets.update('dataset-id', {
  name: 'Updated dataset name',
  description: 'New description',
  isArchived: false,
  isGolden: true // Mark as golden dataset
});

Importing data

You can import test cases from CSV, JSON, or JSONL files to quickly populate your datasets.

CSV import

The easiest way to import large amounts of test data is through CSV files. Your CSV should have columns for the test case name, inputs, and expected outputs. Example CSV structure:
name,input_query,input_context,expected_response
"Billing question","How much does the premium plan cost?","New customer inquiry","The premium plan costs $29/month..."
"Technical issue","Login not working","Existing customer","Please try clearing your browser cache..."
To import via the web interface:
  1. Navigate to your dataset in the Gentrace dashboard
  2. Click “Import” and select your CSV file
  3. Map the CSV columns to your dataset fields
  4. Review and confirm the import
Importing CSV data
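If you import programmatically instead, you first need to map each parsed CSV row onto the test case fields. A minimal sketch of that mapping, assuming the example column names above (`rowToTestCase` is a hypothetical helper, not an SDK function):

```typescript
// Hypothetical helper: map a parsed CSV row (using the example column
// names above) onto the test case fields expected by testCases.create.
type CsvRow = Record<string, string>;

function rowToTestCase(row: CsvRow) {
  return {
    name: row['name'],
    inputs: {
      query: row['input_query'],
      context: row['input_context'],
    },
    expectedOutputs: {
      response: row['expected_response'],
    },
  };
}

const row: CsvRow = {
  name: 'Billing question',
  input_query: 'How much does the premium plan cost?',
  input_context: 'New customer inquiry',
  expected_response: 'The premium plan costs $29/month...',
};

const payload = rowToTestCase(row);
// payload.name === 'Billing question'
```

Each payload could then be passed to `testCases.create({ datasetId, ...payload })` as shown in the earlier examples.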

JSON/JSONL import

You can also import JSON or JSONL files with structured test case data:
[
  {
    "name": "Billing question",
    "inputs": {
      "query": "How much does the premium plan cost?",
      "context": "New customer inquiry"
    },
    "expectedOutputs": {
      "response": "The premium plan costs $29/month and includes..."
    }
  }
]
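JSONL files hold one JSON object per line rather than a single array. As a sketch, parsing JSONL text into test case objects might look like this (`parseJsonl` is a hypothetical helper, not an SDK function):

```typescript
// Hypothetical helper: parse JSONL text (one JSON object per line)
// into an array of test case objects.
function parseJsonl(
  text: string
): Array<{ name: string; inputs: object; expectedOutputs?: object }> {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0) // skip blank lines
    .map((line) => JSON.parse(line));
}

const jsonl = [
  '{"name":"Billing question","inputs":{"query":"How much does the premium plan cost?"},"expectedOutputs":{"response":"The premium plan costs $29/month..."}}',
  '{"name":"Technical issue","inputs":{"query":"Login not working"}}',
].join('\n');

const cases = parseJsonl(jsonl);
// cases.length === 2
```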

Using datasets in evaluation

Once you have datasets with test cases, you can use them to evaluate your models and pipelines.

Running evaluations with evalDataset()

Datasets integrate seamlessly with Gentrace’s evaluation system using evalDataset():
import { evalDataset, testCases } from 'gentrace';

await evalDataset({
  data: async () => {
    const testCasesList = await testCases.list({ datasetId: 'your-dataset-id' });
    return testCasesList.data;
  },
  interaction: yourAIFunction, // See interaction() docs
});

Golden datasets

Each pipeline can have a “golden dataset” - a special dataset that represents your core test cases. You can mark a dataset as golden when creating or updating it:
// Mark a dataset as golden during creation
const goldenDataset = await datasets.create({
  name: 'Golden Test Cases',
  description: 'Core evaluation test cases',
  pipelineId: 'your-pipeline-id',
  isGolden: true
});

// Or update an existing dataset to be golden
await datasets.update('dataset-id', {
  isGolden: true
});

Best practices

  • Organize by use case - Create separate datasets for different types of evaluations (accuracy, safety, performance)
  • Use descriptive names - Make test case names clear and searchable
  • Include edge cases - Test boundary conditions and error scenarios
  • Version your datasets - Keep historical versions as your use cases evolve
  • Update regularly - Add new test cases based on real-world usage patterns
  • Archive old datasets - Use the isArchived flag to maintain clean dataset lists

Next steps