Test runners - Get and Submit
The getTestRunners() and submitTestRunners() functions (get_test_runners() and submit_test_runners() in Python) give you finer control over test runs, which are instances of the PipelineRun class. getTestRunners() returns one (runner, test case) pair per test case, and submitTestRunners() submits the measured runners as a single test result.
Used together, these functions let you implement a parallelized version of Run test.
Parallelization Example
typescript
import {
  init,
  Pipeline,
  PipelineRunTestCaseTuple,
  getTestRunners,
  submitTestRunners,
} from '@gentrace/core';

function exampleResponse(inputs: any) {
  return 'This is a generated response from the AI';
}

// utility function to enable parallelism
export const enableParallelism = async <T, U>(
  items: T[],
  callbackFn: (t: T) => Promise<U>,
  { parallelThreads = 10 }: { parallelThreads?: number } = {},
) => {
  const results = Array<U>(items.length);
  const iterator = items.entries();
  const doAction = async (iterator: IterableIterator<[number, T]>) => {
    for (const [index, item] of iterator) {
      results[index] = await callbackFn(item);
    }
  };
  const workers = Array(parallelThreads).fill(iterator).map(doAction);
  await Promise.allSettled(workers);
  return results;
};

async function main() {
  init({
    apiKey: process.env.GENTRACE_API_KEY ?? '',
  });

  const PIPELINE_SLUG = 'guess-the-year';

  // get the existing pipeline (if already exists)
  const pipelineBySlug = new Pipeline({
    slug: PIPELINE_SLUG,
  });

  // example pipeline by ID
  const pipelineById = new Pipeline({
    id: 'c10408c7-abde-5c19-b339-e8b1087c9b64',
  });

  const pipeline = pipelineBySlug;

  const exampleHandler = async ([
    runner,
    testCase,
  ]: PipelineRunTestCaseTuple) => {
    await runner.measure(
      (inputs: any) => {
        return {
          example: exampleResponse(inputs),
        };
      },
      [testCase.inputs],
    );
  };

  const pipelineRunTestCases = await getTestRunners(pipeline);

  await enableParallelism(pipelineRunTestCases, exampleHandler, {
    parallelThreads: 5,
  });

  const response = await submitTestRunners(pipeline, pipelineRunTestCases);
  console.log(response);
}

main();
python
import os
import gentrace
from multiprocessing.pool import ThreadPool
from dotenv import load_dotenv

load_dotenv()


def example_response(inputs):
    return "This is a generated response from the AI"


def enable_parallelism(function, inputs, parallel_threads):
    # modify this parallelization approach as needed
    pool = ThreadPool(processes=parallel_threads)
    outputs = pool.map(function, inputs)
    return outputs


gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

# example existing pipelines
PIPELINE_SLUG = "guess-the-year"
pipeline_by_slug = gentrace.Pipeline(PIPELINE_SLUG)

PIPELINE_ID = "c10408c7-abde-5c19-b339-e8b1087c9b64"
pipeline_by_id = gentrace.Pipeline(id=PIPELINE_ID)

pipeline = pipeline_by_slug


def example_handler(pipeline_run_test_case):
    (runner, test_case) = pipeline_run_test_case
    runner.measure(
        lambda inputs: example_response(inputs),
        inputs=test_case.get("inputs"),
    )


pipeline_run_test_cases = gentrace.get_test_runners(pipeline)

enable_parallelism(example_handler, pipeline_run_test_cases, 4)

result = gentrace.submit_test_runners(pipeline, pipeline_run_test_cases)

print("Result: ", result)
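The parallel pattern above depends on two things: each handler writes only to its own runner, and the list of (runner, test case) pairs that the workers mutate is the same list later handed to the submit function. The following is a minimal sketch of that flow that runs without an API key. `StubRunner` is a hypothetical stand-in for PipelineRun and is not part of the Gentrace SDK; its `measure` signature is only assumed to mirror the call used in the example.

```python
from multiprocessing.pool import ThreadPool


class StubRunner:
    """Hypothetical stand-in for a PipelineRun; records measured outputs."""

    def __init__(self):
        self.outputs = []

    def measure(self, fn, inputs):
        # Call the step with the test case inputs and keep the result.
        result = fn(inputs)
        self.outputs.append(result)
        return result


def example_response(inputs):
    return f"response for {inputs['query']}"


def example_handler(pair):
    runner, test_case = pair
    runner.measure(example_response, inputs=test_case.get("inputs"))


# Stand-in for the list returned by get_test_runners().
pairs = [(StubRunner(), {"inputs": {"query": f"q{i}"}}) for i in range(8)]

# Run the handlers on 4 worker threads; map() blocks until all have finished.
with ThreadPool(processes=4) as pool:
    pool.map(example_handler, pairs)

# Each runner now holds its own output, in the original list order.
measured = [runner.outputs for runner, _ in pairs]
```

Because every worker touches a different runner, no locking is needed in this sketch; shared state inside the handler would change that.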
getTestRunners(): Arguments
pipeline: Pipeline class
datasetId?: string
Optional ID of the dataset to use. If not specified, the golden dataset for the pipeline will be used.
typescript
const datasetId = '550e8400-e29b-41d4-a716-446655440000';
const pipelineRunTestCases = await getTestRunners(pipeline, datasetId);
caseFilter?: (testCase: TestCase) => boolean
Optional filter function that is called for each test case. For example, you can define a function to only run test cases that have a certain name prefix.
get_test_runners(): Arguments
pipeline: Pipeline class
dataset_id: str
Optional ID of the dataset to use. If not specified, the golden dataset for the pipeline will be used.
python
dataset_id = "550e8400-e29b-41d4-a716-446655440000"
pipeline_run_test_cases = gentrace.get_test_runners(pipeline, dataset_id=dataset_id)
case_filter: Callable[[dict], bool]
Optional filter function that is called for each test case. For example, you can define a function to only run test cases that have a certain name prefix.
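As an illustration of the name-prefix filter described above, here is a small sketch of a function matching the `Callable[[dict], bool]` shape. It assumes the test case dict exposes its name under a "name" key, which this page does not confirm for Python; `name_prefix_filter` and the sample cases are hypothetical and exist only so the snippet can be checked locally.

```python
from typing import Callable, List


def name_prefix_filter(prefix: str) -> Callable[[dict], bool]:
    """Build a filter that keeps test cases whose name starts with prefix."""

    def case_filter(test_case: dict) -> bool:
        # Assumption: the test case dict exposes its name under "name".
        return str(test_case.get("name", "")).startswith(prefix)

    return case_filter


# Local check against hand-written sample cases (not real Gentrace data).
sample_cases: List[dict] = [
    {"name": "priority-login", "inputs": {"query": "a"}},
    {"name": "smoke-logout", "inputs": {"query": "b"}},
    {"name": "priority-search", "inputs": {"query": "c"}},
]
priority_only = name_prefix_filter("priority-")
kept = [case["name"] for case in sample_cases if priority_only(case)]
print(kept)
```

A filter built this way could then be passed as the `case_filter` argument listed above.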
submitTestRunners(): Arguments
pipeline: Pipeline class
pipelineRunTestCases: Array<PipelineRunTestCaseTuple>
options?: { context?: { name: string, metadata: MetadataValueObject }, caseFilter?: (testCase: TestCase) => boolean, triggerRemoteEvals?: boolean }
An optional object containing:
- context?: An object with metadata for the test run.
  - name: A string to name the test run.
  - metadata: An object with additional metadata for the test run.
- caseFilter?: A function to filter test cases.
- triggerRemoteEvals?: A boolean to trigger evaluations hosted by Gentrace (defaults to true). Setting this to false is common when creating local evaluations.
Example usage:
typescript
const options = {
  context: {
    name: "example-name",
    metadata: {
      model: {
        type: 'string',
        value: 'gpt-4'
      }
    }
  },
  caseFilter: (testCase) => testCase.name.startsWith('priority-'),
  // If you're opting to use local evaluations
  triggerRemoteEvals: false
};

const result = await submitTestRunners(pipeline, pipelineRunTestCases, options);
submit_test_runners(): Arguments
pipeline: Pipeline class
pipelineRunTestCases: List[Tuple[PipelineRun, TestCase]]
context: { "metadata": MetadataValueObject } (optional)
case_filter: Callable[[dict], bool] (optional)
result_name: str (optional)
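The Python arguments above have no usage example, so here is a hedged sketch of how the optional arguments might be assembled. The submit function is injected as a parameter so the snippet runs without the SDK; in real use you would presumably pass `gentrace.submit_test_runners`. The metadata shape is copied from the TypeScript example and is only assumed to be the same in Python. `submit_priority_cases`, `fake_submit`, and the value returned by `fake_submit` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


def submit_priority_cases(
    submit: Callable[..., Any],
    pipeline: Any,
    pipeline_run_test_cases: List[Tuple[Any, dict]],
) -> Any:
    """Submit runners using the optional arguments listed above."""
    # Metadata shape copied from the TypeScript example; assumed to be
    # the same for Python.
    context: Dict[str, Any] = {
        "metadata": {"model": {"type": "string", "value": "gpt-4"}},
    }
    return submit(
        pipeline,
        pipeline_run_test_cases,
        context=context,
        case_filter=lambda case: case.get("name", "").startswith("priority-"),
        result_name="example-name",
    )


def fake_submit(pipeline, pairs, **options):
    """Hypothetical stand-in for the real submit function (local check only)."""
    kept = [case for _, case in pairs if options["case_filter"](case)]
    return {"resultName": options["result_name"], "submitted": len(kept)}


pairs = [(None, {"name": "priority-a"}), (None, {"name": "other-b"})]
outcome = submit_priority_cases(fake_submit, "guess-the-year", pairs)
print(outcome)
```

Injecting the submit function keeps the argument-building logic testable in isolation; the real call returns the response structure described on this page, not the dict produced by the stand-in.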
The submit call returns a simple object containing the test result ID as a UUID string. Here is an example response:
json
{"resultId": "FACB6642-4725-4FAE-9323-634E72533C89"}
You can then use this ID to retrieve the test result using the getTestResult() function or check the status with the getTestResultStatus() function.
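Before passing the ID to a follow-up call, it can be worth checking that it is well formed. The sketch below uses only the Python standard library and assumes the response is available as the JSON text shown above; how the SDK actually exposes the response object in Python is not specified here.

```python
import json
import uuid

# Response body copied from the example above.
raw = '{"resultId": "FACB6642-4725-4FAE-9323-634E72533C89"}'

response = json.loads(raw)

# uuid.UUID raises ValueError if the string is not a well-formed UUID.
result_id = uuid.UUID(response["resultId"])

# str() yields the canonical lowercase, hyphenated form.
print(str(result_id))
```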