Datasets in Gentrace are collections of test cases used to evaluate your AI models and pipelines. They provide a structured way to organize your test data, track performance over time, and ensure consistent evaluation across different model versions.

What are datasets?

A dataset is a container that holds multiple test cases for a specific pipeline. Each test case consists of:
  • Inputs - The data passed to your AI model
  • Expected outputs - The desired response from your model
  • Name - A descriptive identifier for the test case
  • Metadata - Additional context and information
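For illustration, the structure above can be sketched as a TypeScript shape (a hypothetical type; field names follow the SDK examples later on this page, not an official SDK export):

```typescript
// Illustrative shape of a single test case (hypothetical type, not part of the SDK).
interface TestCase {
  name: string;                              // descriptive identifier
  inputs: Record<string, unknown>;           // data passed to your AI model
  expectedOutputs?: Record<string, unknown>; // desired response from your model
  metadata?: Record<string, unknown>;        // additional context and information
}

const example: TestCase = {
  name: 'Billing inquiry',
  inputs: { query: 'How do I cancel my subscription?' },
  expectedOutputs: { response: 'Visit your account settings...' },
  metadata: { source: 'support-tickets' },
};
```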
Datasets enable you to:
  • Run systematic evaluations across multiple test cases
  • Compare model performance between different versions
  • Track evaluation metrics over time
  • Import and export test data in various formats

Creating datasets

You can create datasets through the Gentrace web interface or programmatically using the SDK.

Creating via web interface

  1. Navigate to your pipeline in the Gentrace dashboard
  2. Click “New Dataset” to create a dataset for your pipeline
  3. Give your dataset a descriptive name and description
Creating a new dataset

Creating via SDK

import { init, datasets } from 'gentrace';

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const dataset = await datasets.create({
  name: 'Customer Support Evaluation',
  description: 'Test cases for customer support chatbot responses',
  pipelineId: 'your-pipeline-id' // or use pipelineSlug
});

console.log('Created dataset:', dataset.id);

Managing test cases

Test cases are the individual data points within your dataset. You can create them one at a time or in bulk.

Creating single test cases

import { init, testCases } from 'gentrace';

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const testCase = await testCases.create({
  datasetId: 'your-dataset-id',
  name: 'Billing inquiry',
  inputs: {
    query: 'How do I cancel my subscription?',
    context: 'Customer has active premium plan'
  },
  expectedOutputs: {
    response: 'To cancel your subscription, please visit your account settings...',
    confidence: 0.9
  }
});

Listing test cases

// List all test cases in a dataset
const testCasesList = await testCases.list({ 
  datasetId: 'your-dataset-id' 
});

// Access the test cases
for (const testCase of testCasesList.data) {
  console.log(testCase.name);
  console.log(testCase.inputs);
}

Retrieving and deleting test cases

// Retrieve a specific test case
const testCase = await testCases.retrieve('test-case-id');

// Delete a test case
await testCases.delete('test-case-id');

Managing datasets

Listing datasets

// List all datasets
const datasetList = await datasets.list();

// Filter by pipeline
const filteredDatasets = await datasets.list({
  pipelineId: 'your-pipeline-id',
  archived: false // Exclude archived datasets
});

// Access the datasets
for (const dataset of datasetList.data) {
  console.log(dataset.name);
  console.log(dataset.description);
}

Updating datasets

// Update dataset properties
const dataset = await datasets.update('dataset-id', {
  name: 'Updated dataset name',
  description: 'New description',
  isArchived: false,
  isGolden: true // Mark as golden dataset
});

Importing data

You can import test cases from CSV, JSON, or JSONL files to quickly populate your datasets.

CSV import

The easiest way to import large amounts of test data is through CSV files. Your CSV should have columns for the test case name, inputs, and expected outputs. Example CSV structure:
name,input_query,input_context,expected_response
"Billing question","How much does the premium plan cost?","New customer inquiry","The premium plan costs $29/month..."
"Technical issue","Login not working","Existing customer","Please try clearing your browser cache..."
To import via the web interface:
  1. Navigate to your dataset in the Gentrace dashboard
  2. Click “Import” and select your CSV file
  3. Map the CSV columns to your dataset fields
  4. Review and confirm the import
Importing CSV data
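If you import programmatically instead, you first need to map each parsed CSV row onto the test case fields. A minimal sketch of that mapping, assuming the example column names above (`rowToTestCase` is a hypothetical helper, not an SDK function):

```typescript
// Hypothetical helper: map a parsed CSV row (using the example column
// names above) onto the test case fields expected by testCases.create.
type CsvRow = Record<string, string>;

function rowToTestCase(row: CsvRow) {
  return {
    name: row['name'],
    inputs: {
      query: row['input_query'],
      context: row['input_context'],
    },
    expectedOutputs: {
      response: row['expected_response'],
    },
  };
}

const row: CsvRow = {
  name: 'Billing question',
  input_query: 'How much does the premium plan cost?',
  input_context: 'New customer inquiry',
  expected_response: 'The premium plan costs $29/month...',
};

const payload = rowToTestCase(row);
// payload.name === 'Billing question'
```

Each payload could then be passed to `testCases.create({ datasetId, ...payload })` as shown in the earlier examples.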

JSON/JSONL import

You can also import JSON or JSONL files with structured test case data:
[
  {
    "name": "Billing question",
    "inputs": {
      "query": "How much does the premium plan cost?",
      "context": "New customer inquiry"
    },
    "expectedOutputs": {
      "response": "The premium plan costs $29/month and includes..."
    }
  }
]
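JSONL files hold one JSON object per line rather than a single array. As a sketch, parsing JSONL text into test case objects might look like this (`parseJsonl` is a hypothetical helper, not an SDK function):

```typescript
// Hypothetical helper: parse JSONL text (one JSON object per line)
// into an array of test case objects.
function parseJsonl(
  text: string
): Array<{ name: string; inputs: object; expectedOutputs?: object }> {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0) // skip blank lines
    .map((line) => JSON.parse(line));
}

const jsonl = [
  '{"name":"Billing question","inputs":{"query":"How much does the premium plan cost?"},"expectedOutputs":{"response":"The premium plan costs $29/month..."}}',
  '{"name":"Technical issue","inputs":{"query":"Login not working"}}',
].join('\n');

const cases = parseJsonl(jsonl);
// cases.length === 2
```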

Using datasets in evaluation

Once you have datasets with test cases, you can use them to evaluate your models and pipelines.

Running evaluations with evalDataset()

Datasets integrate seamlessly with Gentrace’s evaluation system using evalDataset():
import { evalDataset, testCases } from 'gentrace';

await evalDataset({
  data: async () => {
    const testCasesList = await testCases.list({ datasetId: 'your-dataset-id' });
    return testCasesList.data;
  },
  interaction: yourAIFunction, // See interaction() docs
});

Golden datasets

Each pipeline can have a “golden dataset” - a special dataset that represents your core test cases. You can mark a dataset as golden when creating or updating it:
// Mark a dataset as golden during creation
const goldenDataset = await datasets.create({
  name: 'Golden Test Cases',
  description: 'Core evaluation test cases',
  pipelineId: 'your-pipeline-id',
  isGolden: true
});

// Or update an existing dataset to be golden
await datasets.update('dataset-id', {
  isGolden: true
});

Best practices

  • Organize by use case - Create separate datasets for different types of evaluations (accuracy, safety, performance)
  • Use descriptive names - Make test case names clear and searchable
  • Include edge cases - Test boundary conditions and error scenarios
  • Version your datasets - Keep historical versions as your use cases evolve
  • Update regularly - Add new test cases based on real-world usage patterns
  • Archive old datasets - Use the isArchived flag to maintain clean dataset lists

Next steps