Version: 2.0.0

Result basics

Test results are the reports generated on your generative AI pipeline.

Creating test results

After your code pulls the cases and generates the outputs for each case, you submit the outputs using our SDK. A test result represents a single submission of a group of test cases and their outputs. The result is then evaluated by the evaluators you defined.

TypeScript
Python

typescript
import { init, getTestCases, submitTestResult } from "@gentrace/core";
import { pipeline } from "../src/pipeline"; // TODO: REPLACE WITH YOUR PIPELINE
 
init({ apiKey: process.env.GENTRACE_API_KEY });
 
const PIPELINE_SLUG = "your-pipeline-slug";
 
async function main() {
  const testCases = await getTestCases(PIPELINE_SLUG);
 
  const promises = testCases.map((test) => pipeline(test));
  
  const outputs = await Promise.all(promises);
 
  // This SDK method creates a single result entity on our servers
  const response = await submitTestResult(PIPELINE_SLUG, testCases, outputs);
}
 
main();

python
import gentrace
import os
from dotenv import load_dotenv
from pipeline import pipeline
load_dotenv()
gentrace.init(os.getenv("GENTRACE_API_KEY"))
PIPELINE_SLUG = "your-pipeline-slug"
def main():
    test_cases = gentrace.get_test_cases(PIPELINE_SLUG)
    outputs = [] 
    for test_case in test_cases:
        inputs = test_case["inputs"]
        result = pipeline(inputs)
        outputs.append(result)
    # This SDK method creates a single result entity on our servers
    response = gentrace.submit_test_result(PIPELINE_SLUG, test_cases, outputs)
   
    print("Run ID: ", response["runId"]);
main()

Once the test result has submitted, your evaluators start grading your results. You can view the in-progress and finished results in the Gentrace UI. In the below example, we have several evaluators which score the result.

Result basics

Get test results from the SDK

We expose a few methods to retrieve your submitted test results.

Get test result (single)

TypeScript
Python

typescript
import { getTestResult } from "@gentrace/core";
 
const result = await getTestResult("ede8271a-699f-4db7-a198-2c51a99e2dab");
 
console.log("Created at: ", result.createdAt);

python
import gentrace
gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY")
)
result = gentrace.get_test_result(result_id="ede8271a-699f-4db7-a198-2c51a99e2dab")
print("Created at: ", result["createdAt"])

This test result endpoint returns a deeply expanded object that includes the runs, pipeline, evaluations, and evaluators associated with that test result. This entity allows you to perform deeper analysis on a given test result.

Since the returned object is more complex, consult the API response reference for this route here to learn how to traverse the structure. Select the 200 status code button once you reach the page.

Get test results (multiple)

TypeScript
Python

typescript
import { getTestResults } from "@gentrace/core";
 
const results = await getTestResults("guess-the-year");
 
for (const result of results) {
  console.log("Result ID: ", result.id);
}

python
import gentrace
gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY")
)
results = gentrace.get_test_results(pipeline_slug="guess-the-year")
for result in results:
    print(result.get("id"))

This test result endpoint returns a list of test result objects. Each test result object is purposefully thin, containing only the test result metadata (e.g. the name, branch, commit information). Use the gentrace.get_test_result() SDK method to pull detailed information for a given test result.

To traverse this structure, consult the API response reference for this route here. Select the 200 status code button once you reach the page.

Setting a main Git branch

To track progress, teams often compare their generative AI performance against a baseline. This baseline is typically the performance of the code on the main Git branch.

Gentrace allows you to make this comparison easier by tagging results with branches and commits. For example, the below image compares the results of a recent test result against the current performance on the main branch.

Result comparison

To configure this feature, firstly define your main branch name [in your pipeline settings] (pipeline/basics#setting-a-comparison-git-main-branch). Then, modify your scripts to accept two environmental variables, $GENTRACE_BRANCH and $GENTRACE_COMMIT. Our SDK will check for the presence of these variables and automatically tag your branches.

TypeScript
Python

typescript
import { init, getTestCases, submitTestResult } from "@gentrace/core";
import { pipeline } from "../src/pipeline"; // TODO: REPLACE WITH YOUR PIPELINE
 
console.log("Branch name: ", process.env.GENTRACE_BRANCH);
console.log("Commit: ", process.env.GENTRACE_COMMIT);
 
init({ apiKey: process.env.GENTRACE_API_KEY });
 
const PIPELINE_SLUG = "your-pipeline-slug";
 
async function main() {
  const testCases = await getTestCases(PIPELINE_SLUG);
 
  const promises = testCases.map((test) => pipeline(test));
  
  const outputs = await Promise.all(promises);
 
  // This SDK method creates a single result entity on our servers. Since
  // $GENTRACE_BRANCH AND $GENTRACE_COMMIT were defined, they will be
  // automatically used to tag test results.
  const response = await submitTestResult(PIPELINE_SLUG, testCases, outputs);
}
 
main();

python
import gentrace
import os
from dotenv import load_dotenv
from pipeline import pipeline
load_dotenv()
print("Branch name: ", os.getenv("GENTRACE_BRANCH"));
print("Commit: ", os.getenv("GENTRACE_COMMIT"));
gentrace.init(os.getenv("GENTRACE_API_KEY"))
PIPELINE_SLUG = "your-pipeline-slug"
def main():
    test_cases = gentrace.get_test_cases(pipeline_slug=PIPELINE_SLUG)
    outputs = [] 
    for test_case in test_cases:
        inputs = test_case["inputs"]
        result = pipeline(inputs)
        outputs.append(result)
    # This SDK method creates a single result entity on our servers. Since
    # $GENTRACE_BRANCH AND $GENTRACE_COMMIT were defined, they will be
    # automatically used to tag test runs.
    response = gentrace.submit_test_result(PIPELINE_SLUG, test_cases, outputs)
   
    print("Result ID: ", response["resultId"]);
main()

Attaching metadata

You can attach metadata to your test results. For example, If you want to associate a URL to an external service or a prompt variant, you can use metadata to tag your result.

TypeScript
Python

typescript
import { init, getTestCases, submitTestResult } from "@gentrace/core";
import { pipeline } from "../src/pipeline"; // TODO: REPLACE WITH YOUR PIPELINE
 
console.log("Branch name: ", process.env.GENTRACE_BRANCH);
console.log("Commit: ", process.env.GENTRACE_COMMIT);
 
init({ apiKey: process.env.GENTRACE_API_KEY });
 
const PIPELINE_SLUG = "your-pipeline-slug";
 
async function main() {
  const testCases = await getTestCases(PIPELINE_SLUG);
 
  const promises = testCases.map((test) => pipeline(test));
  
  const outputs = await Promise.all(promises);
 
  const response = await submitTestResult(
    PIPELINE_SLUG, 
    testCases,
    outputs,
    {
      metadata: {
        // We support multiple metadata keys within the metadata object
        externalServiceUrl: {
          // Every metadata object must have a "type" key
          type: "url",
          url: "https://external-service.example.com",
          text: "External service"
        },
        gitSha: {
          type: "string",
          value: "71548c22b939f3ef4ab3a5f067e1a5746fa352ff"
        }
      }
    }
  );
}
 
main();

python
import gentrace
gentrace.init(os.getenv("GENTRACE_API_KEY"))
PIPELINE_SLUG = "your-pipeline-slug"
def main():
    test_cases = gentrace.get_test_cases(PIPELINE_SLUG)
    outputs = [] 
    for test_case in test_cases:
        inputs = test_case["inputs"]
        result = pipeline(inputs)
        outputs.append(result)
    # This SDK method creates a single result entity on our servers
    response = gentrace.submit_test_result(PIPELINE_SLUG, test_cases, outputs, {
      "metadata": {
        "externalServiceUrl": {
          # Every metadata object must have a "type" key
          "type": "url",
          "url": "https://external-service.example.com",
          "text": "External service"
        },
        "promptVariantId": {
          "type": "string",
          "value": "SGVsbG8sIFdvcmxkIQ=="
        },
      }
    })
    
    print("Result ID: ", response["resultId"])
main()

The value object for each metadata value is a union type. We only support two value types at this time. We will add more upon request.

type: url - if specified as a URL, you must specify url and text fields.

type: string - if specified as a string, you must specify a value field.

Malformed metadata

If your metadata is not properly formatted with the specified type, our API will reject your request.

When viewing the detailed result page, the supplied metadata will render in the sidebar in the detailed result page.

Attaching result metadata

You can create additional metadata keys and adjust the associated values for each result directly in the UI.

Edit result metadata

Edit result metadata in-line

Setting a name

Setting custom names for test results is useful to better identify different test results.

If you are testing between gpt-4 and gpt-3.5-turbo models, you could create test results with the names Experiment: GPT-4 and Experiment: GPT-3.5 Turbo, respectively.

Gentrace allows you to set names through your code or our UI.

SDK

You can use the SDKs to set result names.

TypeScript
Python

typescript
import { init, getTestCases, submitTestResult } from "@gentrace/core";
import { pipeline } from "../src/pipeline"; // TODO: REPLACE WITH YOUR PIPELINE
 
const PIPELINE_SLUG = "your-pipeline-slug";
 
init({
  apiKey: process.env.GENTRACE_API_KEY,
  // Set the run name when initializing the SDK.
  resultName: "try gpt-4"
});
 
// Alternatively, you can set the run name with the GENTRACE_RESULT_NAME env variable. Our 
// library will access this if `resultName` was not passed to the init() function. If both 
// values are supplied, the `resultName` parameter is given precedence.
console.log("Result name: ", process.env.GENTRACE_RESULT_NAME);
 
async function main() {
  const testCases = await getTestCases(PIPELINE_SLUG);
 
  const promises = testCases.map((test) => pipeline(test));
  
  const outputs = await Promise.all(promises);
 
  // Upon submission, the test result will have the provided run name in the Gentrace UI.
  const response = await submitTestResult(PIPELINE_SLUG, testCases, outputs);
}

python
import gentrace
import os
from dotenv import load_dotenv
from pipeline import pipeline
load_dotenv()
print("Branch name: ", os.getenv("GENTRACE_BRANCH"));
print("Commit: ", os.getenv("GENTRACE_COMMIT"));
gentrace.init(
  api_key=os.getenv("GENTRACE_API_KEY"), 
  run_name="try gpt-4"
)
# Alternatively, you can set the run name with the GENTRACE_RUN_NAME env variable. Our
# library will access the variable if `run_name` was not passed to the init() function. If both 
#  values are supplied, the `run_name` parameter is given preference.
console.log("Run name: ", os.getenv("GENTRACE_RUN_NAME"));
PIPELINE_SLUG = "your-pipeline-slug"
def main():
    test_cases = gentrace.get_test_cases(pipeline_slug=PIPELINE_SLUG)
    outputs = [] 
    for test_case in test_cases:
        inputs = test_case["inputs"]
        result = pipeline(inputs)
        outputs.append(result)
    # Upon submission, the test result will have the provided run name in the Gentrace UI.
    response = gentrace.submit_test_result(PIPELINE_SLUG, test_cases, outputs)
   
    print("Result ID: ", response["resultId"]);
main()

UI

Once your test result has been submit, you can also adjust the name by selecting the settings gear on the test result page (in the top right).

Edit result name settings

Edit result name input

Creating test results​

Get test results from the SDK​

Get test result (single)​

Get test results (multiple)​

Setting a main Git branch​

Attaching metadata​

Setting a name​

SDK​

UI​