Version: 4.7.28

Manage test data with datasets and test cases

Test cases are examples that your generative AI pipeline might encounter. Test cases are uniquely associated with a pipeline and a dataset. A test case contains:

  • A unique name
  • Inputs that will be passed to your AI pipeline
  • Expected output (optional, depending on the evaluators that you need)
  • Expected output steps (optional, also depending on the evaluators that you need)

Datasets are used to organize test cases into groups within a pipeline. Each dataset contains multiple test cases.

Schema

This section breaks down the test case schema in more detail.

Name

Simple, human-readable name for your test case.

Inputs

Inputs are the parameters to your AI pipeline, expressed as a JSON string.

Let's say you have a simple AI pipeline (a function with an OpenAI invocation) that composes an email. The function accepts a sender, a receiver, and a query, all as strings.

typescript
import { init } from "@gentrace/core";
import { OpenAI } from "@gentrace/openai";

init({
  apiKey: "my-gentrace-api-key", // TODO: Add your Gentrace API key
});

const openai = new OpenAI({
  apiKey: "my-open-ai-api-key", // TODO: Add your OpenAI API key
});

export const compose = async ({
  sender,
  receiver,
  query,
}: {
  sender: string;
  receiver: string;
  query: string;
}) => {
  const response = await openai.chat.completions.create({
    pipelineId: "draft",
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content: `Write a concise and complete email from ${sender} to ${receiver} ${query}.`,
      },
    ],
  });

  return {
    content: response.choices[0]!.message!.content,
    pipelineRunId: response.pipelineRunId!,
  };
};
 

The following JSON object could represent the inputs to this pipeline.

json
{
  "query": "bragging about superiority",
  "sender": "[email protected]",
  "receiver": "[email protected]"
}
Exact match required

Each key in the inputs must exactly match a parameter that your AI pipeline expects. The inputs must be a JSON object. Arrays and primitive types (e.g. numbers, strings, booleans) are not permitted.
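
The two rules above (the inputs must be a JSON object, and the keys must match the pipeline parameters) can be checked locally before a test case is created. The following is a minimal sketch, not part of the Gentrace SDK: the helper name `parse_test_case_inputs`, the `COMPOSE_KEYS` set, and the sample sender and receiver values are assumptions made for illustration.

```python
import json


def parse_test_case_inputs(raw: str, expected_keys: set) -> dict:
    """Parse a JSON inputs string and check it against the pipeline's parameter names.

    Hypothetical helper for local validation; it does not call Gentrace.
    """
    inputs = json.loads(raw)
    # Only JSON objects are allowed; arrays and primitive values are rejected.
    if not isinstance(inputs, dict):
        raise ValueError(f"inputs must be a JSON object, got {type(inputs).__name__}")
    # Keys must match the pipeline's parameter names exactly.
    actual_keys = set(inputs)
    missing = expected_keys - actual_keys
    extra = actual_keys - expected_keys
    if missing or extra:
        raise ValueError(f"key mismatch: missing={sorted(missing)}, extra={sorted(extra)}")
    return inputs


# Parameter names taken from the compose() example; the values are placeholders.
COMPOSE_KEYS = {"sender", "receiver", "query"}
raw = '{"query": "bragging about superiority", "sender": "Superman", "receiver": "Joker"}'
inputs = parse_test_case_inputs(raw, COMPOSE_KEYS)
```

Running a check like this in a data-preparation script catches a misspelled key before the pipeline receives an undefined parameter.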

Expected outputs (optional)

This object captures the expected, ideal outputs of your pipeline.

Referring to the code example in the prior section, the expected output would be the ideal chat completion string returned from the function. Here's an example string that could work well as the expected output for the case.

Dear Joker,
It has come to our attention that instances of bragging about superiority with respect to the
Justice League have been made, and we want to emphasize that such behavior is not condoned
or representative of our organization's values of justice, respect, and collaboration.
Best,
Superman

The expected output must be provided as a JSON object rather than a bare string, e.g. { "value": "Dear Joker..." }
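
As a rough illustration of how the pieces of the schema fit together, the sketch below assembles a test case as a plain dictionary and wraps the expected text in a `value` key. The field names `name`, `inputs`, and `expectedOutputs` are the ones used in the SDK examples on this page; the helper `build_test_case` and the sample values are hypothetical.

```python
import json
from typing import Optional


def build_test_case(name: str, inputs: dict, expected_text: Optional[str] = None) -> dict:
    """Assemble a test case payload; hypothetical helper, not an SDK function."""
    payload = {"name": name, "inputs": inputs}
    # Expected outputs are optional and must be a JSON object, so wrap the string.
    if expected_text is not None:
        payload["expectedOutputs"] = {"value": expected_text}
    return payload


case = build_test_case(
    "Superman to Joker",
    {"sender": "Superman", "receiver": "Joker", "query": "bragging about superiority"},
    expected_text="Dear Joker...",
)
print(json.dumps(case, indent=2))
```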

Managing datasets and test cases

First, navigate to "Datasets" for the Pipeline. You can create a new dataset by selecting "New Dataset" or choose an existing dataset from the list. Within each dataset, you can add test cases.

New dataset

In the UI (small datasets)

If you only need to specify a few test cases, you can create them directly from the UI by selecting "New test case".

New test case

Alternatively, you can use the golden dataset for the pipeline.

From CSV

You can also bulk import test cases from a CSV. Navigate to the dataset view and click the Import from CSV button in the top right.

Select import CSV

Select the relevant CSV file.

Import CSV modal

Pay attention to the in-app instructions for dealing with headers.

With API/SDK

We expose methods to perform CRUD operations on both test cases and datasets. This is helpful for creating internal workflows to manage your data.

Test Case Operations

Refer to the SDK documentation for the test case methods, and to the API documentation for the corresponding endpoints.

Dataset Operations

For dataset operations, refer to the SDK documentation for the dataset methods, and to the API documentation for the corresponding endpoints.

When using the SDK to retrieve test cases, you can either get all test cases for a pipeline's golden dataset or specify a particular dataset:

typescript
import { init, getTestCases } from "@gentrace/core";

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

async function main() {
  // If no dataset ID is provided, the golden dataset will be selected by default
  const cases = await getTestCases("main");

  // To specify a particular dataset, you can provide its ID:
  // const cases = await getTestCasesForDataset("123e4567-e89b-12d3-a456-426614174000")
}

main();
 

Adding images / other files for multi-modal

If you're evaluating multi-modal pipelines, you can upload images or other files to Gentrace and use them as inputs to your test cases.

Alternatively, you can link to them instead (see Option 2 below).

Option 1: Upload files

If your pipeline depends on file inputs (e.g. images, PDFs), you can upload your files to our object storage. We then return an authenticated URL to add to your test case inputs.

python
import os

import gentrace
from dotenv import load_dotenv

load_dotenv()

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY")
)

GENTRACE_PIPELINE_SLUG = "main"

with open("/home/user/gentrace-icon.png", "rb") as f:
    # This SDK method receives a file handle and returns an authenticated URL
    url = gentrace.upload_file(f)

print("Gentrace file URL: ", url)

gentrace.create_test_case(
    # Pipeline slug
    GENTRACE_PIPELINE_SLUG,
    {
        "name": "Gentrace Icon",
        "inputs": {
            # Any Gentrace-uploaded URL will be rendered in a pretty format in the UI
            "iconUrl": url,
        },
        "expectedOutputs": {
            "value": "Gentrace logo"
        },
    },
)

When you view test cases that contain Gentrace file URLs, the UI detects image extensions and renders the images directly.

Images render in UI

Uploading file content bytes

If your file does not have a presence on the filesystem, you can directly upload the bytes to Gentrace. This approach requires you to name your file.

Python only

Uploading file content bytes is only supported in Python at this time.

python
import os

import gentrace

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY")
)

with open("examples/files/gentrace-icon.png", "rb") as f:
    # Remember to specify the file extension! The Gentrace UI relies on the file
    # extension to render file contents correctly.
    url = gentrace.upload_bytes("gentrace-icon.png", f.read())

print("Uploaded URL: ", url)

gentrace.create_test_case(
    "main",
    {
        "name": "Gentrace Icon",
        "inputs": {
            "imageUrl": url,
        },
        "expectedOutputs": {
            "value": "Gentrace logo"
        },
    },
)

Retrieving files

All uploaded files require authentication by an API key. When pulling test cases for a Gentrace pipeline, you need to construct an authenticated HTTP request to download the files associated with each case.

Here's a script that:

  • Pulls test cases for a Gentrace pipeline
  • Downloads the Gentrace-uploaded images from the input URL
  • Runs the image data through our AI business logic
  • Submits the outputs for grading
python
import os
from urllib.parse import urlparse

import gentrace
import requests
from dotenv import load_dotenv

# Import your AI pipeline
from ai.pipelines import image_to_words

GENTRACE_PIPELINE_SLUG = "main"

load_dotenv()

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY")
)

cases = gentrace.get_test_cases(pipeline_slug=GENTRACE_PIPELINE_SLUG)

outputs = []

for case in cases:
    image_url = case.get("inputs").get("imageUrl")

    # Image URLs are authenticated. You must provide an API key as the bearer token.
    headers = {
        "Authorization": "Bearer {}".format(os.getenv("GENTRACE_API_KEY"))
    }
    response = requests.get(image_url, headers=headers)

    # Run the AI pipeline on the raw image content
    image_description = image_to_words(response.content)

    outputs.append({
        "value": image_description
    })

response = gentrace.submit_test_result(GENTRACE_PIPELINE_SLUG, cases, outputs)
print(response["resultId"])
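
One pitfall with building the bearer token inline is that `os.getenv` returns `None` when the variable is unset, which yields the header value `Bearer None` and a confusing authentication failure. The sketch below fails early instead; the helper name `gentrace_auth_headers` is hypothetical and not part of the SDK.

```python
import os


def gentrace_auth_headers(env_var: str = "GENTRACE_API_KEY") -> dict:
    """Build the Authorization header for file downloads, failing fast if the key is unset."""
    api_key = os.getenv(env_var)
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; cannot authenticate file downloads")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as `headers` to `requests.get`. Calling `response.raise_for_status()` afterwards surfaces a rejected key as an exception instead of passing an error body to the pipeline.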

Option 2: Link to external files

To link to external images or files and have them render in Gentrace, you need to authorize the domains that host them.

Administrators can do this by navigating to security settings, scrolling to "Authorized file URL domains," and pressing "Add domain."

Once you've added a domain, you can place URLs from that domain in your test case inputs. These URLs will render in the UI and be accessible as URLs in your pipeline.
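
Before adding external URLs to test case inputs in bulk, it can help to check locally that each URL points at a domain you have authorized. This is a client-side sketch under stated assumptions: the `AUTHORIZED_DOMAINS` set is a placeholder you maintain yourself, the helper `is_authorized_url` is hypothetical, and the exact-host matching rule is an assumption that may differ from how Gentrace matches domains.

```python
from urllib.parse import urlparse

# Placeholder list; mirror the domains added under "Authorized file URL domains".
AUTHORIZED_DOMAINS = {"assets.example.com"}


def is_authorized_url(url: str, domains: set = AUTHORIZED_DOMAINS) -> bool:
    """Return True if the URL is http(s) and its host exactly matches an authorized domain."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return host in domains


urls = [
    "https://assets.example.com/icons/logo.png",
    "https://cdn.other.example/logo.png",
]
unauthorized = [u for u in urls if not is_authorized_url(u)]
```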