Skip to main content
Version: 4.7.23

Test runners - Get and Submit

The getTestRunners() and submitTestRunners() functions allow for more flexible use of test runs, which are instances of the PipelineRun class.

When used together, these functions allow for a parallelized version of Run test to be implemented.

Parallelization Example

typescript
import {
init,
Pipeline,
PipelineRunTestCaseTuple,
getTestRunners,
submitTestRunners,
} from '@gentrace/core';
 
function exampleResponse(inputs: any) {
return 'This is a generated response from the AI';
}
 
// utility function to enable parallelism
export const enableParallelism = async <T, U>(
items: T[],
callbackFn: (t: T) => Promise<U>,
{ parallelThreads = 10 }: { parallelThreads?: number } = {},
) => {
const results = Array<U>(items.length);
 
const iterator = items.entries();
const doAction = async (iterator: IterableIterator<[number, T]>) => {
for (const [index, item] of iterator) {
results[index] = await callbackFn(item);
}
};
const workers = Array(parallelThreads).fill(iterator).map(doAction);
await Promise.allSettled(workers);
return results;
};
 
async function main() {
init({
apiKey: process.env.GENTRACE_API_KEY ?? '',
});
 
const PIPELINE_SLUG = 'guess-the-year';
 
// get the existing pipeline (if already exists)
const pipelineBySlug = new Pipeline({
slug: PIPELINE_SLUG,
});
 
// example pipeline by ID
const pipelineById = new Pipeline({
id: 'c10408c7-abde-5c19-b339-e8b1087c9b64',
});
 
const pipeline = pipelineBySlug;
 
const exampleHandler = async ([
runner,
testCase,
]: PipelineRunTestCaseTuple) => {
await runner.measure(
(inputs: any) => {
return {
example: exampleResponse(inputs),
};
},
[testCase.inputs],
);
};
 
const pipelineRunTestCases = await getTestRunners(pipeline);
 
await enableParallelism(pipelineRunTestCases, exampleHandler, {
parallelThreads: 5,
});
 
const response = await submitTestRunners(pipeline, pipelineRunTestCases);
console.log(response);
}
 
main();

getTestRunners(): Arguments

pipeline: Pipeline class

datasetId?: string

Optional ID of the dataset to use. If not specified, the golden dataset for the pipeline will be used.

typescript
const datasetId = '550e8400-e29b-41d4-a716-446655440000';
const pipelineRunTestCases = await getTestRunners(pipeline, datasetId);

caseFilter?: (testCase: TestCase) => boolean

Optional filter function that is called for each test case. For example, you can define a function to only run test cases that have a certain name prefix.

getTestRunners(): Return value

Returns an array of PipelineRunTestCaseTuples.

Each item in this array is a tuple of [PipelineRun, TestCase].

submitTestRunners(): Arguments

pipeline: Pipeline class

pipelineRunTestCases: Array<PipelineRunTestCaseTuple>

options?: { context?: { name: string, metadata: MetadataValueObject }, caseFilter?: (testCase: TestCase) => boolean } An optional object containing:

  • context?: An object with metadata for the test run.
    • name: A string to name the test run.
    • metadata: An object with additional metadata for the test run.
  • caseFilter?: A function to filter test cases.
  • triggerRemoteEvals?: A boolean to trigger evaluations hosted by Gentrace (defaults to true). Setting this to false is common when creating local evaluations

Example usage:

typescript
const options = {
context: {
name: "example-name",
metadata: {
model: {
type: 'string',
value: 'gpt-4'
}
}
},
caseFilter: (testCase) => testCase.name.startsWith('priority-'),
// If you're opting to use local evaluations
triggerRemoteEvals: false
};
const result = await submitTestRunners(pipeline, pipelineRunTestCases, options);

submitTestRunners(): Return value

Returns a simple object with the test result ID as a UUID string. Here's an example response structure.

json
{
"resultId": "FACB6642-4725-4FAE-9323-634E72533C89"
}

You can then use this ID to retrieve the test result using the getTestResult() function or check the status with the getTestResultStatus() function.