Test results - Run test
In TypeScript, the runTest()
function creates a test result and simplifies the operation by pulling test cases and submitting
the test result in a single call.
Alternatively, you can use the runTestWithDataset()
function to run a test against a specific dataset. This function accepts a dataset ID along with the pipeline slug.
In Python, the run_test()
function likewise creates a test result, pulling test cases and submitting
the test result in a single call. You can also specify a dataset ID along with the pipeline slug.
As part of this process, your test function returns a PipelineRun
class instance that captures the intermediate generative
steps associated with the test result.
Learn more about how to use this function in a guided way in the tracing docs.
Example
typescript
import { init, runTest, Pipeline } from "@gentrace/core";
import { generateAiResponse } from "../pipelines";

const PIPELINE_SLUG = "my-pipeline";

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const pipeline = new Pipeline({
  slug: PIPELINE_SLUG,
});

await runTest(PIPELINE_SLUG, async (testCase) => {
  const runner = pipeline.start();

  const outputs = await runner.measure(
    (inputs) => {
      return {
        example: generateAiResponse(inputs),
      };
    },
    [testCase.inputs],
  );

  await runner.submit();

  // 🚧 Passing the runner back from this function is very important
  return [outputs, runner];
});
You can also run a test for a specific dataset using the runTestWithDataset()
function. This allows you to execute tests on a predefined set of test cases from a particular dataset.
typescript
const result = await runTestWithDataset(
  PIPELINE_SLUG,
  DATASET_ID,
  async (testCase) => {
    const runner = pipeline.start();

    const outputs = await runner.measure(
      (inputs) => {
        return {
          example: generateAiResponse(inputs),
        };
      },
      [testCase.inputs],
    );

    await runner.submit();

    return [outputs, runner];
  },
);

console.log("Test result:", result);
python
import os

import gentrace
from dotenv import load_dotenv
from pipelines import generate_ai_response  # TODO: import your pipeline

load_dotenv()

PIPELINE_SLUG = "my-pipeline"

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

pipeline = gentrace.Pipeline(PIPELINE_SLUG)


def generate(test_case):
    runner = pipeline.start()

    output = runner.measure(
        lambda inputs: generate_ai_response(inputs),
        inputs=test_case.get("inputs"),
    )

    # 🚧 Passing the runner back from this function is very important
    return [output, runner]


result = gentrace.run_test(PIPELINE_SLUG, generate)
print("Result: ", result)
To run a test against a specific dataset, you can supply the dataset_id
keyword parameter:
python
import os

import gentrace
from dotenv import load_dotenv
from pipelines import generate_ai_response  # TODO: import your pipeline

load_dotenv()

PIPELINE_SLUG = "my-pipeline"
DATASET_ID = "your-dataset-id"  # Replace with your actual dataset ID

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

pipeline = gentrace.Pipeline(PIPELINE_SLUG)


def generate(test_case):
    runner = pipeline.start()

    output = runner.measure(
        lambda inputs: generate_ai_response(inputs),
        inputs=test_case.get("inputs"),
    )

    # 🚧 Passing the runner back from this function is very important
    return [output, runner]


result = gentrace.run_test(PIPELINE_SLUG, generate, dataset_id=DATASET_ID)
print("Result: ", result)
Arguments
pipelineSlug: string
The slug of the pipeline to run the test for.
datasetId: string
The ID of the dataset to run the test against.
This parameter is only applicable when using the runTestWithDataset()
function.
pipeline_slug: str
The slug of the pipeline to run the test for.
dataset_id: str
Optional. The ID of the dataset to run the test against.
testFunction: (testCase: TestCase) => Promise<[any, PipelineRun]>
test_function: Callable[[dict], Tuple[dict, PipelineRun]]
This function accepts a test case as a parameter. It returns a two-element array (a tuple in Python) where the first element is the
output for the test case and the second element is the PipelineRun
class instance that captures the
intermediate generative steps.
typescript
await runTest(PIPELINE_SLUG, async (testCase) => {
  const runner = pipeline.start();

  const outputs = await runner.measure(
    (inputs) => {
      return {
        example: generateAiResponse(inputs),
      };
    },
    [testCase.inputs],
  );

  await runner.submit();

  // 🚧 Passing the runner back from this function is very important
  return [outputs, runner];
});
python
import os

import gentrace

PIPELINE_SLUG = "guess-the-year"

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

pipeline = gentrace.Pipeline(
    PIPELINE_SLUG,
    openai_config={
        "api_key": os.getenv("OPENAI_KEY"),
    },
)


def create_embedding_callback(test_case):
    runner = pipeline.start()

    openai_handle = runner.get_openai()

    output = openai_handle.embeddings.create(
        input="Standard value: 42", model="text-similarity-davinci-001"
    )

    return [output, runner]


result = gentrace.run_test(PIPELINE_SLUG, create_embedding_callback)
print("Result: ", result)
context?: { name: string, metadata: MetadataValueObject }
Optional. Additional context attached to the test result, such as a custom result name and metadata.
typescript
await runTest(
  PIPELINE_SLUG,
  async (testCase) => {
    const runner = pipeline.start();

    const outputs = await runner.measure(
      (inputs) => {
        console.log("inputs", inputs);

        // Simply return inputs as outputs
        return {
          example: "<h1>Example</h1><div>This is an <strong>example</strong></div>",
        };
      },
      [testCase.inputs],
      {
        context: {
          render: {
            type: "html",
            key: "example",
          },
        },
      },
    );

    await runner.submit();

    return [outputs, runner];
  },
  {
    name: "Rendering HTML",
    metadata: {
      promptString: {
        type: "string",
        value: "What is the basic unit of life?",
      },
    },
  },
);
context?: { "metadata": MetadataValueObject }
python
result = gentrace.run_test(
    PIPELINE_SLUG,
    create_embedding_callback,
    context={
        "metadata": {
            "promptString": {
                "type": "string",
                "value": "What is the basic unit of life?",
            }
        }
    },
)
print("Result: ", result)
result_name?: str
Optional. A custom name for the created test result.
python
result = gentrace.run_test(
    PIPELINE_SLUG,
    create_embedding_callback,
    result_name="Version with embedding created",
)
print("Result: ", result)
caseFilter: (testCase: TestCase) => boolean
Optional filter function that is called for each test case. For example, you can define a function to only run test cases that have a certain name prefix.
typescript
await runTest(
  PIPELINE_SLUG,
  async (testCase) => {
    const runner = pipeline.start();

    const outputs = await runner.measure(
      (inputs) => {
        return {
          yourOutputKey: "Your output value",
        };
      },
      [testCase.inputs],
    );

    await runner.submit();

    return [outputs, runner];
  },
  (testCase) => testCase.name.startsWith("Production test case:"),
);
case_filter: Callable[[dict], bool]
Optional filter function that is called for each test case. For example, you can define a function to only run test cases that have a certain name prefix.
python
import os

import gentrace

PIPELINE_SLUG = "guess-the-year"

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

pipeline = gentrace.Pipeline(
    PIPELINE_SLUG,
    openai_config={
        "api_key": os.getenv("OPENAI_KEY"),
    },
)


def create_embedding_callback(test_case):
    runner = pipeline.start()

    openai_handle = runner.get_openai()

    output = openai_handle.embeddings.create(
        input="sample text", model="text-similarity-davinci-001"
    )

    return [output, runner]


result = gentrace.run_test(
    PIPELINE_SLUG,
    create_embedding_callback,
    case_filter=lambda x: x.get("name").startswith("Production test case:"),
)
print("Result: ", result)
Return value
The function returns a simple object containing the test result ID as a UUID string. Here's an example response structure.
json
{"resultId": "FACB6642-4725-4FAE-9323-634E72533C89"}
You can then use this ID to retrieve the test result using the getTestResult()
function or check the status
with the getTestResultStatus()
function.
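For example, here is a minimal TypeScript sketch of that follow-up flow. It assumes getTestResult() and getTestResultStatus() are exported from @gentrace/core and accept the result ID returned by runTest().
typescript
// A minimal sketch, assuming getTestResult() and getTestResultStatus()
// are exported from "@gentrace/core" and accept the result ID shown above.
import { getTestResult, getTestResultStatus } from "@gentrace/core";

const resultId = "FACB6642-4725-4FAE-9323-634E72533C89";

// Check whether the test result has finished processing
const status = await getTestResultStatus(resultId);
console.log("Status:", status);

// Fetch the full test result once it is ready
const testResult = await getTestResult(resultId);
console.log("Test result:", testResult);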
Types
🛠️ MetadataValueObject
An object with a required type field plus any number of additional keys:
type: string
{ [key: string]: any }
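For illustration, the metadata passed in the context examples above conforms to this shape; the promptString key is just an example name.
typescript
// Illustrative only: metadata is an object keyed by arbitrary names, where each
// value is a MetadataValueObject with a "type" field plus additional fields.
const metadata = {
  promptString: {
    type: "string",
    value: "What is the basic unit of life?",
  },
};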