Manage test data with datasets and test cases
Test cases are examples that your generative AI pipeline might encounter. Test cases are uniquely associated with a pipeline and a dataset. A test case contains:
- A unique name
- Inputs that will be passed to your AI pipeline
- Expected output (optional, depending on the evaluators that you need)
- Expected output steps (optional, also depending on the evaluators that you need)
Datasets are used to organize test cases into groups within a pipeline. Each dataset contains multiple test cases.
Schema
This section breaks down the test case schema in more detail.
Name
Simple, human-readable name for your test case.
Inputs
Inputs are the parameters to your AI pipeline, expressed as a JSON string.
Let's say you have a simple AI pipeline (a function with an OpenAI invocation) that composes an email. The function accepts a sender, a receiver, and a query, all as strings.
- TypeScript
- Python
typescript
import { init } from "@gentrace/core";
import { OpenAI } from "@gentrace/openai";

init({
  apiKey: 'my-gentrace-api-key', // TODO: Add your Gentrace API key
});

const openai = new OpenAI({
  apiKey: 'my-open-ai-api-key', // TODO: Add your OpenAI API key
});

export const compose = async ({
  sender,
  receiver,
  query,
}: {
  sender: string;
  receiver: string;
  query: string;
}) => {
  const response = await openai.chat.completions.create({
    pipelineId: "draft",
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content: `Write a concise and complete email from ${sender} to ${receiver} ${query}.`,
      },
    ],
  });

  return {
    content: response.choices[0]!.message!.content,
    pipelineRunId: response.pipelineRunId!,
  };
};
python
import openai


def compose(sender, receiver, query):
    prompt = f"Write a concise and complete email from {sender} to {receiver} {query}."
    chat_completion = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return chat_completion.choices[0].message.content
The following JSON object could represent the inputs to this pipeline.
json
{
  "query": "bragging about superiority"
}
Each key in the inputs should exactly match what your AI pipeline expects. The inputs must be a JSON object. Arrays and primitive types (e.g. numbers, strings, booleans) are not permitted.
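As a rough illustration of these two rules, the sketch below checks an inputs string before it is saved as a test case. The helper `validate_inputs`, the stand-in `compose` function, and the sample sender and receiver values are assumptions made for this example; none of them are part of the Gentrace SDK.

```python
import inspect
import json


def compose(sender, receiver, query):
    """Stand-in for the pipeline function; only its parameter names matter here."""
    return f"Write a concise and complete email from {sender} to {receiver} {query}."


def validate_inputs(raw_inputs, pipeline_fn):
    """Parse a JSON inputs string and check it against the pipeline's parameters.

    Hypothetical helper: raises ValueError if the inputs are not a JSON object
    or if the keys differ from the function's parameter names.
    """
    inputs = json.loads(raw_inputs)
    if not isinstance(inputs, dict):
        raise ValueError("inputs must be a JSON object, not an array or primitive")

    expected = set(inspect.signature(pipeline_fn).parameters)
    provided = set(inputs)
    if provided != expected:
        missing = sorted(expected - provided)
        extra = sorted(provided - expected)
        raise ValueError(f"key mismatch: missing={missing}, unexpected={extra}")
    return inputs


# Sample values are illustrative only.
raw = '{"sender": "Superman", "receiver": "Joker", "query": "bragging about superiority"}'
inputs = validate_inputs(raw, compose)
prompt = compose(**inputs)
```

Because the keys are compared with the function signature, a renamed or missing parameter shows up as a `ValueError` before the test case is ever run.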
Expected outputs (optional)
This object captures the expected, ideal outputs of your pipeline.
Referring to the code example in the prior section, the expected output would be the ideal chat completion string returned from the function. Here's an example string that could work well as the expected output for the case.
Dear Joker,

It has come to our attention that instances of bragging about superiority with respect to the
Justice League have been made, and we want to emphasize that such behavior is not condoned
or representative of our organization's values of justice, respect, and collaboration.

Best,
Superman
To store this as an expected output, wrap the string in a JSON object, e.g. { "value": "Dear Joker..." }
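A minimal sketch of that wrapping step, using only the standard library. The helper name `to_expected_outputs` is hypothetical; the `value` key comes from the example above.

```python
import json


def to_expected_outputs(text):
    """Wrap a plain expected-output string in the {"value": ...} structure."""
    return {"value": text}


# Multi-line text is fine: json.dumps escapes the line breaks as \n.
expected = to_expected_outputs("Dear Joker,\n\nIt has come to our attention...")
serialized = json.dumps(expected)
```

Serializing with `json.dumps` rather than building the string by hand keeps quotes and line breaks in the email body correctly escaped.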
Managing datasets and test cases
First, navigate to "Datasets" for the Pipeline. You can create a new dataset by selecting "New Dataset" or choose an existing dataset from the list. Within each dataset, you can add test cases.
In the UI (small datasets)
If you only need to specify a few test cases, you can create them directly from the UI by selecting "New test case".
Alternatively, you can use the golden dataset for the pipeline.
From CSV
You can also bulk import test cases from a CSV file. Navigate to the dataset view and click the "Import from CSV" button in the top right.
Select the relevant CSV file.
Pay attention to the in-app instructions for dealing with headers.
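If your test cases already live in code, one way to prepare a file for import is Python's `csv` module. This is only a sketch: the column names `name`, `inputs`, and `expectedOutputs`, and the choice to store the structured fields as JSON strings, are assumptions. Check the in-app instructions for the headers Gentrace actually expects.

```python
import csv
import io
import json

# Illustrative data; replace with your own cases.
sample_cases = [
    {
        "name": "Bragging email",
        "inputs": {"query": "bragging about superiority"},
        "expectedOutputs": {"value": "Dear Joker..."},
    },
]


def cases_to_csv(cases):
    """Serialize test cases to CSV text, JSON-encoding the structured columns."""
    buffer = io.StringIO()
    # Column names are assumed, not taken from the Gentrace import format.
    writer = csv.DictWriter(buffer, fieldnames=["name", "inputs", "expectedOutputs"])
    writer.writeheader()
    for case in cases:
        writer.writerow(
            {
                "name": case["name"],
                "inputs": json.dumps(case["inputs"]),
                "expectedOutputs": json.dumps(case["expectedOutputs"]),
            }
        )
    return buffer.getvalue()


csv_text = cases_to_csv(sample_cases)
```

The `csv` writer quotes each JSON cell, so commas and double quotes inside the inputs do not break the column layout.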
With API/SDK
We expose methods to perform CRUD operations on both test cases and datasets. This is helpful for creating internal workflows to manage your data.
Test Case Operations
Refer to the following SDK methods for test cases:
And corresponding API endpoints for test cases:
Dataset Operations
For dataset operations, refer to these SDK methods:
And these corresponding API endpoints:
When using the SDK to retrieve test cases, you can either get all test cases for a pipeline's golden dataset or specify a particular dataset:
- TypeScript
- Python
typescript
import { init, getTestCases } from "@gentrace/core";

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

async function main() {
  // If no dataset ID is provided, the golden dataset will be selected by default
  const cases = await getTestCases("main");

  // To specify a particular dataset, you can provide its ID:
  // const cases = await getTestCasesForDataset("123e4567-e89b-12d3-a456-426614174000")
}

main();
python
import os

import gentrace

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

PIPELINE_SLUG = "main"


def main():
    # If no dataset ID is provided, the golden dataset will be selected by default
    cases = gentrace.get_test_cases(pipeline_slug=PIPELINE_SLUG)

    # To specify a particular dataset, you can provide its ID.
    # cases = gentrace.get_test_cases(dataset_id="123e4567-e89b-12d3-a456-426614174000")


main()
Adding images / other files for multi-modal
If you're evaluating multi-modal pipelines, you can upload images or other files to Gentrace and use them as inputs to your test cases.
Alternatively, you can link to them instead (see Option 2 below).
Option 1: Upload files
If your pipeline depends on file inputs (e.g. images, PDFs), you can upload your files to our object storage. We then return an authenticated URL to add to your test case inputs.
- TypeScript
- Python
python
import os

import gentrace
from dotenv import load_dotenv

load_dotenv()

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

GENTRACE_PIPELINE_SLUG = "main"

with open("/home/user/gentrace-icon.png", "rb") as f:
    # This SDK method receives a file handle and returns an authenticated URL
    url = gentrace.upload_file(f)

print("Gentrace file URL: ", url)

gentrace.create_test_case(
    # Pipeline slug
    GENTRACE_PIPELINE_SLUG,
    {
        "name": 'Gentrace Icon',
        'inputs': {
            # Any Gentrace-uploaded URL will be rendered in a pretty format in the UI
            'iconUrl': url,
        },
        "expectedOutputs": {"value": "Gentrace logo"},
    },
)
When viewing the test cases with Gentrace file URLs, we detect image extensions and render them directly in the UI.
Uploading file content bytes
If your file does not have a presence on the filesystem, you can directly upload the bytes to Gentrace. This approach requires you to name your file.
Uploading file content bytes is only supported in Python at this time.
- Python
python
import os

import gentrace

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

with open("examples/files/gentrace-icon.png", "rb") as f:
    # Remember to specify the file extension! The Gentrace UI relies on the file
    # extension to render file contents correctly.
    url = gentrace.upload_bytes("gentrace-icon.png", f.read())

print("Uploaded URL: ", url)

gentrace.create_test_case(
    "main",
    {
        "name": 'Gentrace Icon',
        'inputs': {
            'imageUrl': url,
        },
        "expectedOutputs": {"value": "Gentrace logo"},
    },
)
typescript
import { init, uploadBuffer, createTestCase } from "@gentrace/core";
import fs from "fs/promises";

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

async function upload() {
  const buffer = await fs.readFile("./icon.png");

  // This SDK method receives a file name + buffer and returns an authenticated URL
  const url = await uploadBuffer("icon.png", buffer);

  const caseId = await createTestCase({
    pipelineSlug: "main",
    name: "Gentrace Icon",
    inputs: {
      // Any Gentrace-uploaded URL will be rendered in a pretty format in the UI
      iconUrl: url,
    },
    expectedOutputs: { value: "Gentrace Icon" },
  });

  console.log("Case ID", caseId);
}

upload();
Retrieving files
All uploaded files require authentication by an API key. When pulling test cases for a Gentrace pipeline, you need to construct an authenticated HTTP request to download the files associated with each case.
Here's a script that:
- Pulls test cases for a Gentrace pipeline
- Downloads the Gentrace-uploaded images from the input URL
- Runs the image data through our AI business logic
- Submits the outputs for grading
- TypeScript
- Python
python
import os
from urllib.parse import urlparse

import gentrace
import requests
from dotenv import load_dotenv

# Import your AI pipeline
from ai.pipelines import image_to_words

GENTRACE_PIPELINE_SLUG = "main"

load_dotenv()

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

cases = gentrace.get_test_cases(pipeline_slug=GENTRACE_PIPELINE_SLUG)

outputs = []

for case in cases:
    image_url = case.get("inputs").get("imageUrl")

    # Image URLs are authenticated. You must provide an API key as the bearer token.
    headers = {'Authorization': 'Bearer {}'.format(os.getenv("GENTRACE_API_KEY"))}

    response = requests.get(image_url, headers=headers)

    # Run the AI pipeline on the raw image content
    image_description = image_to_words(response.content)

    outputs.append({"value": image_description})

response = gentrace.submit_test_result(GENTRACE_PIPELINE_SLUG, cases, outputs)

print(response["resultId"])
typescript
import { init, submitTestResult, getTestCases } from "@gentrace/core";
import { imageToWords } from "../api/pipelines";

const GENTRACE_PIPELINE_SLUG = "main";
const GENTRACE_API_KEY = process.env.GENTRACE_API_KEY;

init({
  apiKey: GENTRACE_API_KEY,
});

async function runTest() {
  const cases = await getTestCases(GENTRACE_PIPELINE_SLUG);

  const outputs: Record<string, any>[] = [];

  for (const testCase of cases) {
    const url = testCase.inputs.imageUrl;

    const response = await fetch(url, {
      method: "GET",
      headers: {
        Authorization: `Bearer ${GENTRACE_API_KEY}`,
      },
    });

    const blob = await response.blob();
    const words = await imageToWords(blob);

    outputs.push({
      value: words,
    });
  }

  const response = await submitTestResult(
    GENTRACE_PIPELINE_SLUG,
    cases,
    outputs,
  );

  console.log("Result ID:", response.resultId);
}

runTest();
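If you would rather not depend on `requests`, the same bearer-token download can be sketched with Python's standard library. The helper names, the placeholder URL, and the placeholder key below are assumptions for illustration; the only detail taken from the scripts above is that the API key is sent as a bearer token in the `Authorization` header.

```python
import urllib.request


def build_authenticated_request(file_url, api_key):
    """Build a GET request that carries the API key as a bearer token."""
    return urllib.request.Request(
        file_url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def download_file(file_url, api_key, timeout=30):
    """Download the file bytes; urlopen raises HTTPError on a 4xx/5xx response."""
    request = build_authenticated_request(file_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Placeholder URL and key; building the request does not touch the network.
request = build_authenticated_request(
    "https://example.com/files/icon.png", "my-gentrace-api-key"
)
```

Keeping request construction separate from the download makes it easy to check the header in a unit test without making a network call.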
Option 2: Link to files
In order to link to external images or files and have them render in Gentrace, you need to authorize external domains.
Administrators can do this by navigating to security settings, scrolling to "Authorized file URL domains," and pressing "Add domain."
Once you've added a domain, you can place URLs from that domain in your test case inputs. These URLs will render in the UI and be accessible as URLs in your pipeline.
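Before creating test cases with linked files, it can help to check locally that each URL points at a domain you have authorized. The sketch below is a hypothetical pre-flight check, not a Gentrace feature: the domain list is made up, and exact hostname matching is an assumption, since this page does not describe how Gentrace matches subdomains.

```python
from urllib.parse import urlparse

# Hypothetical list mirroring the domains added under "Authorized file URL domains".
AUTHORIZED_DOMAINS = {"images.example.com", "cdn.example.org"}


def is_authorized_url(url, authorized_domains=AUTHORIZED_DOMAINS):
    """Return True if the URL is http(s) and its hostname is in the allow list."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # urlparse lowercases the hostname and strips any port number.
    return parsed.hostname in authorized_domains


inputs = {"imageUrl": "https://images.example.com/icons/logo.png"}

# Collect the input keys whose URLs would not render in the UI under this check.
unauthorized = [key for key, value in inputs.items() if not is_authorized_url(value)]
```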