Manage test data with test cases
Test cases are examples that your generative AI pipeline might encounter. Test cases are uniquely associated with a pipeline. A test case contains:
- A unique name
- Inputs that will be passed to your AI pipeline
- Expected output (optional, depending on the evaluators that you need)
- Expected output steps (optional, also depending on the evaluators that you need)
Schema
This section breaks down the test case schema in more detail.
Name
Simple, human-readable name for your test case.
Inputs
Inputs are the parameters to your AI pipeline, expressed as a JSON string.
Let's say you have a simple AI pipeline (a function with an OpenAI invocation) that composes an email. The function accepts a sender, a receiver, and a query, all as strings.
typescript
import { init } from "@gentrace/core";
import { OpenAI } from "@gentrace/openai";

init({
  apiKey: "my-gentrace-api-key", // TODO: Add your Gentrace API key
});

const openai = new OpenAI({
  apiKey: "my-open-ai-api-key", // TODO: Add your OpenAI API key
});

export const compose = async ({
  sender,
  receiver,
  query,
}: {
  sender: string;
  receiver: string;
  query: string;
}) => {
  const response = await openai.chat.completions.create({
    pipelineId: "draft",
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content: `Write a concise and complete email from ${sender} to ${receiver} ${query}.`,
      },
    ],
  });

  return {
    content: response.choices[0]!.message!.content,
    pipelineRunId: response.pipelineRunId!,
  };
};
python
import openai


def compose(sender, receiver, query):
    prompt = f"Write a concise and complete email from {sender} to {receiver} {query}."
    chat_completion = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return chat_completion.choices[0].message.content
The following JSON object could represent the inputs to this pipeline.
json
{
  "sender": "Superman",
  "receiver": "Joker",
  "query": "bragging about superiority"
}
Each key in the inputs should exactly match what your AI pipeline expects. The inputs must be a JSON object. Arrays and primitive types (e.g. numbers, strings, booleans) are not permitted at the top level.
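The two rules above (top-level JSON object, keys matching the pipeline's parameters) can be checked before a test case is saved. The following is a minimal sketch using only the standard library; parse_inputs and compose_stub are hypothetical names introduced for illustration, and compose_stub stands in for the real pipeline so that the example makes no OpenAI call.

```python
import json


def parse_inputs(raw: str) -> dict:
    """Parse a test case inputs string and require a JSON object at the top level."""
    # json.loads raises json.JSONDecodeError on invalid JSON, e.g. a trailing comma.
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError(f"inputs must be a JSON object, got {type(parsed).__name__}")
    return parsed


def compose_stub(sender: str, receiver: str, query: str) -> str:
    # Hypothetical stand-in for the real pipeline; it only formats a string.
    return f"Email from {sender} to {receiver} {query}."


inputs = parse_inputs(
    '{"sender": "Superman", "receiver": "Joker", "query": "bragging about superiority"}'
)

# The keys are unpacked as keyword arguments, so a misspelled or missing key
# fails immediately with a TypeError instead of silently producing a bad prompt.
email = compose_stub(**inputs)
```

Unpacking the parsed object with ** is one simple way to confirm that the stored keys and the function signature agree.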
Expected outputs (optional)
This object captures the expected, ideal outputs of your pipeline.
Referring to the code example in the prior section, the expected output would be the ideal chat completion string returned from the function. Here's an example string that could work well as the expected output for the case.
Dear Joker,

It has come to our attention that instances of bragging about superiority with respect to the Justice League have been made, and we want to emphasize that such behavior is not condoned or representative of our organization's values of justice, respect, and collaboration.

Best,
Superman
The expected output must be stored as a JSON object rather than a bare string, e.g. { "value": "Dear Joker..." }
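As a rough illustration of how the pieces of the schema fit together, the sketch below assembles a test case as a plain dictionary. The key names (name, inputs, expectedOutputs) follow the SDK examples later on this page; build_test_case itself is a hypothetical helper, not part of the Gentrace SDK.

```python
import json
from typing import Optional


def build_test_case(name: str, inputs: dict, expected_text: Optional[str] = None) -> dict:
    """Assemble a test case payload, wrapping the expected text in a JSON object."""
    if not isinstance(inputs, dict):
        raise TypeError("inputs must be a dict so that it serializes to a JSON object")
    case = {"name": name, "inputs": inputs}
    # Expected outputs are optional; include them only when an evaluator needs them.
    if expected_text is not None:
        case["expectedOutputs"] = {"value": expected_text}
    return case


case = build_test_case(
    "Superman to Joker",
    {"sender": "Superman", "receiver": "Joker", "query": "bragging about superiority"},
    "Dear Joker, ...",
)

# Serializing confirms that the payload is valid JSON before it is sent anywhere.
serialized = json.dumps(case, indent=2)
```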
Managing test cases
In the UI (small datasets)
If you only need to specify a few test cases, you can create them directly from the UI by selecting "New test case".
From CSV
You can also bulk import test cases from a CSV. Navigate to the test cases page and click the Import from CSV button.
Select the relevant CSV file.
Pay attention to the in-app instructions for dealing with headers.
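If the test data already lives in code, the CSV can be generated rather than written by hand. The sketch below uses Python's csv module so that commas and line breaks inside an email body are quoted correctly. The column names are assumptions made for illustration only; the in-app import instructions determine which headers Gentrace actually expects.

```python
import csv
import io

# Hypothetical column layout; adjust it to match the in-app import instructions.
COLUMNS = ["name", "sender", "receiver", "query", "expected_output"]

rows = [
    {
        "name": "Superman to Joker",
        "sender": "Superman",
        "receiver": "Joker",
        "query": "bragging about superiority",
        "expected_output": "Dear Joker,\n\nSuch behavior is not condoned.\n\nBest,\nSuperman",
    },
]


def to_csv(records: list) -> str:
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)

# To produce a file for upload:
# with open("test-cases.csv", "w", newline="") as f:
#     f.write(csv_text)
```

Building rows with string concatenation instead would break as soon as a field contains a comma or a newline, which email text almost always does.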
With API/SDK
We expose a few methods to perform CRUD operations on test cases. This is helpful for creating internal workflows to manage test cases.
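One possible internal workflow is to keep test cases in version control and create any that are missing from a pipeline. The sketch below assumes a client object exposing get_test_cases(pipeline_slug=...) and create_test_case(slug, payload), matching how those calls are used in the examples later on this page. It also assumes that each returned case exposes its name through .get("name"), which this page does not confirm. InMemoryClient is a hypothetical stand-in so that the sketch runs without an API key.

```python
class InMemoryClient:
    """Hypothetical stand-in for the SDK module; stores cases in a dict."""

    def __init__(self):
        self._cases = {}

    def get_test_cases(self, pipeline_slug):
        return list(self._cases.get(pipeline_slug, []))

    def create_test_case(self, pipeline_slug, payload):
        self._cases.setdefault(pipeline_slug, []).append(dict(payload))


def create_missing_cases(client, pipeline_slug, desired_cases):
    """Create each desired case whose name is not already present in the pipeline."""
    existing = client.get_test_cases(pipeline_slug=pipeline_slug)
    # Assumption: returned cases are dict-like and carry a "name" key.
    existing_names = {case.get("name") for case in existing}
    created = []
    for payload in desired_cases:
        if payload["name"] in existing_names:
            continue
        client.create_test_case(pipeline_slug, payload)
        existing_names.add(payload["name"])
        created.append(payload["name"])
    return created


desired = [
    {"name": "Superman to Joker", "inputs": {"sender": "Superman", "receiver": "Joker", "query": "bragging about superiority"}},
    {"name": "Gentrace Icon", "inputs": {"iconUrl": "https://example.com/icon.png"}},
]

client = InMemoryClient()
first_run = create_missing_cases(client, "main", desired)
second_run = create_missing_cases(client, "main", desired)  # nothing left to create
```

With the real SDK, the initialized gentrace module would be passed in place of InMemoryClient, provided its return shapes match the assumptions above.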
Refer to the SDK reference for these methods and to the API reference for the corresponding endpoints.
Adding images / other files for multi-modal
If you're evaluating multi-modal pipelines, you can upload images or other files to Gentrace and use them as inputs to your test cases.
Alternatively, you can link to files hosted elsewhere (see Option 2 below).
Option 1: Upload files
If your pipeline depends on file inputs (e.g. images, PDFs), you can upload your files to our object storage. We then return an authenticated URL to add to your test case inputs.
python
import os

import gentrace
from dotenv import load_dotenv

load_dotenv()

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
    host="http://localhost:3000/api/v1",
)

GENTRACE_PIPELINE_SLUG = "main"

with open("/home/user/gentrace-icon.png", "rb") as f:
    # This SDK method receives a file handle and returns an authenticated URL
    url = gentrace.upload_file(f)

print("Gentrace file URL: ", url)

gentrace.create_test_case(
    # Pipeline slug
    GENTRACE_PIPELINE_SLUG,
    {
        "name": "Gentrace Icon",
        "inputs": {
            # Any Gentrace-uploaded URL will be rendered in a pretty format in the UI
            "iconUrl": url,
        },
        "expectedOutputs": {"value": "Gentrace logo"},
    },
)
When viewing the test cases with Gentrace file URLs, we detect image extensions and render them directly in the UI.
Uploading file content bytes
If your file does not exist on the filesystem (for example, it is generated in memory), you can upload its bytes directly to Gentrace. This approach requires you to provide a file name.
Uploading file content bytes is only supported in Python at this time.
python
import os

import gentrace

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

with open("examples/files/gentrace-icon.png", "rb") as f:
    # Remember to specify the file extension! The Gentrace UI relies on the file
    # extension to render file contents correctly.
    url = gentrace.upload_bytes("gentrace-icon.png", f.read())

print("Uploaded URL: ", url)

gentrace.create_test_case(
    "main",
    {
        "name": "Gentrace Icon",
        "inputs": {
            "imageUrl": url,
        },
        "expectedOutputs": {"value": "Gentrace logo"},
    },
)
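Because the UI relies on the file extension, content that exists only in memory needs a name chosen for it. The sketch below derives one from a MIME type using the standard mimetypes module. file_name_for is a hypothetical helper, and the upload call is left as a comment so that the example runs without the SDK or an API key.

```python
import io
import mimetypes


def file_name_for(stem: str, content_type: str) -> str:
    """Return a file name whose extension matches the given MIME type."""
    extension = mimetypes.guess_extension(content_type)
    if extension is None:
        raise ValueError(f"no known file extension for content type {content_type!r}")
    return f"{stem}{extension}"


# Bytes that exist only in memory. This is just the 8-byte PNG signature,
# not a complete image; real code would hold the full file contents.
buffer = io.BytesIO(b"\x89PNG\r\n\x1a\n")
file_name = file_name_for("gentrace-icon", "image/png")

# With the SDK initialized, the pair would be passed to the call shown above:
# url = gentrace.upload_bytes(file_name, buffer.getvalue())
```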
typescript
import { init, uploadBuffer, createTestCase } from "@gentrace/core";
import fs from "fs/promises";

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

async function upload() {
  const buffer = await fs.readFile("./icon.png");

  // This SDK method receives a file name + buffer and returns an authenticated URL
  const url = await uploadBuffer("icon.png", buffer);

  const caseId = await createTestCase({
    pipelineSlug: "main",
    name: "Gentrace Icon",
    inputs: {
      // Any Gentrace-uploaded URL will be rendered in a pretty format in the UI
      iconUrl: url,
    },
    expectedOutputs: { value: "Gentrace Icon" },
  });

  console.log("Case ID", caseId);
}

upload();
Retrieving files
All uploaded files require authentication by an API key. When pulling test cases for a Gentrace pipeline, you need to construct an authenticated HTTP request to download the files associated with each case.
Here's a script that:
- Pulls test cases for a Gentrace pipeline
- Downloads the Gentrace-uploaded images from the input URL
- Runs the image data through your AI pipeline
- Submits the outputs for grading
python
import os
from urllib.parse import urlparse

import gentrace
import requests
from dotenv import load_dotenv

# Import your AI pipeline
from ai.pipelines import image_to_words

GENTRACE_PIPELINE_SLUG = "main"

load_dotenv()

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

cases = gentrace.get_test_cases(pipeline_slug=GENTRACE_PIPELINE_SLUG)

outputs = []

for case in cases:
    image_url = case.get("inputs").get("imageUrl")

    # Image URLs are authenticated. You must provide an API key as the bearer token.
    headers = {"Authorization": "Bearer {}".format(os.getenv("GENTRACE_API_KEY"))}
    response = requests.get(image_url, headers=headers)

    # Run the AI pipeline on the raw image content
    image_description = image_to_words(response.content)

    outputs.append({"value": image_description})

response = gentrace.submit_test_result(GENTRACE_PIPELINE_SLUG, cases, outputs)

print(response["resultId"])
typescript
import { init, submitTestResult, getTestCases } from "@gentrace/core";
import { imageToWords } from "../api/pipelines";

const GENTRACE_PIPELINE_SLUG = "main";
const GENTRACE_API_KEY = process.env.GENTRACE_API_KEY;

init({
  apiKey: GENTRACE_API_KEY,
});

async function runTest() {
  const cases = await getTestCases(GENTRACE_PIPELINE_SLUG);

  const outputs: Record<string, any>[] = [];

  for (const testCase of cases) {
    const url = testCase.inputs.imageUrl;

    const response = await fetch(url, {
      method: "GET",
      headers: {
        Authorization: `Bearer ${GENTRACE_API_KEY}`,
      },
    });

    const blob = await response.blob();
    const words = await imageToWords(blob);

    outputs.push({
      value: words,
    });
  }

  const response = await submitTestResult(
    GENTRACE_PIPELINE_SLUG,
    cases,
    outputs,
  );

  console.log("Result ID:", response.resultId);
}

runTest();
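The download step can also be isolated into a small helper, which keeps the bearer-token handling in one place and makes it easy to test without network access. This is a minimal sketch using only the standard library; build_file_request and download_file are hypothetical names, not SDK functions.

```python
import urllib.request


def build_file_request(file_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the API key as a bearer token."""
    if not api_key:
        raise ValueError("an API key is required to download Gentrace-hosted files")
    return urllib.request.Request(
        file_url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def download_file(file_url: str, api_key: str, timeout: float = 30.0) -> bytes:
    """Download the file contents; urlopen raises HTTPError on 4xx/5xx responses."""
    request = build_file_request(file_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Building the request performs no network call, so it can be inspected safely.
request = build_file_request("https://example.com/files/icon.png", "test-key")
```

Failing early on a missing key avoids sending a request with the literal header "Bearer None", which is what happens when an unset environment variable is formatted into the header.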
Option 2: Link to files
In order to link to external images or files and have them render in Gentrace, you need to authorize external domains.
Administrators can do this by navigating to security settings, scrolling to "Authorized file URL domains," and pressing "Add domain."
Once you've added a domain, you can place URLs from that domain in your test case inputs. These URLs will render in the UI and be accessible as URLs in your pipeline.
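Before bulk-creating test cases that link to external files, it can help to check each URL against the domains an administrator has authorized. The sketch below is a local pre-check only. The domain list is hypothetical, and exact host matching is an assumption: this page does not specify how Gentrace matches domains (for example, whether subdomains are included).

```python
from urllib.parse import urlparse

# Hypothetical copy of the domains listed under "Authorized file URL domains".
AUTHORIZED_DOMAINS = {"assets.example.com"}


def is_authorized(url: str, authorized_domains: set) -> bool:
    """Return True when the URL is http(s) and its host is in the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or parsed.hostname is None:
        return False
    # urlparse lowercases the hostname, so normalize the allowlist the same way.
    return parsed.hostname in {domain.lower() for domain in authorized_domains}


candidate_urls = [
    "https://assets.example.com/images/icon.png",
    "https://cdn.other-site.com/images/icon.png",
]

# Keep only the URLs expected to render in the UI under the assumptions above.
linkable = [url for url in candidate_urls if is_authorized(url, AUTHORIZED_DOMAINS)]
```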