Test results - Run test
In TypeScript, the runTest()
function creates a test result and simplifies the operation by pulling test cases and submitting
the test result in a single call.
Alternatively, you can use the runTestWithDataset()
function to run a test against a specific dataset. This function accepts a dataset ID along with the pipeline slug.
In Python, the run_test()
function likewise creates a test result, pulling test cases and submitting
the test result in a single call. You can also specify a dataset ID along with the pipeline slug.
As part of this process, your test function returns a PipelineRun
class instance that captures the intermediate generative
steps associated with the test result.
Learn more about how to use this function in a guided way in the tracing docs.
Example
typescript
import { init, runTest, Pipeline } from "@gentrace/core";
import { generateAiResponse } from "../pipelines";

const PIPELINE_SLUG = "my-pipeline";

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const pipeline = new Pipeline({
  slug: PIPELINE_SLUG,
});

await runTest(PIPELINE_SLUG, async (testCase) => {
  const runner = pipeline.start();

  const outputs = await runner.measure(
    (inputs) => {
      return {
        example: generateAiResponse(inputs),
      };
    },
    [testCase.inputs],
  );

  await runner.submit();

  // 🚧 Passing the runner back from this function is very important
  return [outputs, runner];
});
You can also run a test for a specific dataset using the runTestWithDataset()
function. This allows you to execute tests on a predefined set of test cases from a particular dataset.
typescript
const result = await runTestWithDataset(
  PIPELINE_SLUG,
  DATASET_ID,
  async (testCase) => {
    const runner = pipeline.start();

    const outputs = await runner.measure(
      (inputs) => {
        return {
          example: generateAiResponse(inputs),
        };
      },
      [testCase.inputs],
    );

    await runner.submit();

    return [outputs, runner];
  },
);

console.log("Test result:", result);
python
import os

import gentrace
from dotenv import load_dotenv
from pipelines import generate_ai_response  # TODO: import your pipeline

load_dotenv()

PIPELINE_SLUG = "my-pipeline"

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

pipeline = gentrace.Pipeline(PIPELINE_SLUG)


def generate(test_case):
    runner = pipeline.start()

    output = runner.measure(
        lambda inputs: generate_ai_response(inputs),
        inputs=test_case.get("inputs"),
    )

    # 🚧 Passing the runner back from this function is very important
    return [output, runner]


result = gentrace.run_test(PIPELINE_SLUG, generate)
print("Result: ", result)
To run a test against a specific dataset, you can supply the dataset_id
keyword parameter:
python
import os

import gentrace
from dotenv import load_dotenv
from pipelines import generate_ai_response  # TODO: import your pipeline

load_dotenv()

PIPELINE_SLUG = "my-pipeline"
DATASET_ID = "your-dataset-id"  # Replace with your actual dataset ID

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

pipeline = gentrace.Pipeline(PIPELINE_SLUG)


def generate(test_case):
    runner = pipeline.start()

    output = runner.measure(
        lambda inputs: generate_ai_response(inputs),
        inputs=test_case.get("inputs"),
    )

    # 🚧 Passing the runner back from this function is very important
    return [output, runner]


result = gentrace.run_test(PIPELINE_SLUG, generate, dataset_id=DATASET_ID)
print("Result: ", result)
Arguments
pipelineSlug: string
The slug of the pipeline to run the test for.
datasetId: string
The ID of the dataset to run the test against.
This parameter is only applicable when using the runTestWithDataset()
function.
pipeline_slug: str
The slug of the pipeline to run the test for.
dataset_id: str
Optional. The ID of the dataset to run the test against.
testFunction: (testCase: TestCase) => Promise<[any, PipelineRun]>
test_function: Callable[[dict], Tuple[dict, PipelineRun]]
This function accepts a test case as a parameter. It returns a two-element array (a tuple in Python) where the first element is the
output for the test case and the second element is the PipelineRun
class instance that captures the
intermediate generative steps.
typescript
await runTest(PIPELINE_SLUG, async (testCase) => {
  const runner = pipeline.start();

  const outputs = await runner.measure(
    (inputs) => {
      return {
        example: generateAiResponse(inputs),
      };
    },
    [testCase.inputs],
  );

  await runner.submit();

  // 🚧 Passing the runner back from this function is very important
  return [outputs, runner];
});
python
import os

import gentrace

PIPELINE_SLUG = "guess-the-year"

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

pipeline = gentrace.Pipeline(
    PIPELINE_SLUG,
    openai_config={
        "api_key": os.getenv("OPENAI_KEY"),
    },
)


def create_embedding_callback(test_case):
    runner = pipeline.start()

    openai_handle = runner.get_openai()

    output = openai_handle.embeddings.create(
        input="Standard value: 42", model="text-similarity-davinci-001"
    )

    return [output, runner]


result = gentrace.run_test(PIPELINE_SLUG, create_embedding_callback)
print("Result: ", result)
context?: { name: string, metadata: MetadataValueObject }
Optional. Additional context attached to the test result, such as a custom result name and metadata.
typescript
await runTest(
  PIPELINE_SLUG,
  async (testCase) => {
    const runner = pipeline.start();

    const outputs = await runner.measure(
      (inputs) => {
        console.log("inputs", inputs);

        // Simply return inputs as outputs
        return {
          example: "<h1>Example</h1><div>This is an <strong>example</strong></div>",
        };
      },
      [testCase.inputs],
      {
        context: {
          render: {
            type: "html",
            key: "example",
          },
        },
      },
    );

    await runner.submit();

    return [outputs, runner];
  },
  {
    name: "Rendering HTML",
    metadata: {
      promptString: {
        type: "string",
        value: "What is the basic unit of life?",
      },
    },
  },
);
context?: { "metadata": MetadataValueObject }
python
result = gentrace.run_test(
    PIPELINE_SLUG,
    create_embedding_callback,
    context={
        "metadata": {
            "promptString": {
                "type": "string",
                "value": "What is the basic unit of life?",
            }
        }
    },
)
print("Result: ", result)
result_name?: str
Optional. A custom name for the created test result.
python
result = gentrace.run_test(
    PIPELINE_SLUG,
    create_embedding_callback,
    result_name="Version with embedding created",
)
print("Result: ", result)
caseFilter: (testCase: TestCase) => boolean
Optional filter function that is called for each test case. For example, you can define a function to only run test cases that have a certain name prefix.
typescript
await runTest(
  PIPELINE_SLUG,
  async (testCase) => {
    const runner = pipeline.start();

    const outputs = await runner.measure(
      (inputs) => {
        return {
          yourOutputKey: "Your output value",
        };
      },
      [testCase.inputs],
    );

    await runner.submit();

    return [outputs, runner];
  },
  (testCase) => testCase.name.startsWith("Production test case:"),
);
case_filter: Callable[[dict], bool]
Optional filter function that is called for each test case. For example, you can define a function to only run test cases that have a certain name prefix.
python
import os

import gentrace

PIPELINE_SLUG = "guess-the-year"

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

pipeline = gentrace.Pipeline(
    PIPELINE_SLUG,
    openai_config={
        "api_key": os.getenv("OPENAI_KEY"),
    },
)


def create_embedding_callback(test_case):
    runner = pipeline.start()

    openai_handle = runner.get_openai()

    output = openai_handle.embeddings.create(
        input="sample text", model="text-similarity-davinci-001"
    )

    return [output, runner]


result = gentrace.run_test(
    PIPELINE_SLUG,
    create_embedding_callback,
    case_filter=lambda x: x.get("name").startswith("Production test case:"),
)
print("Result: ", result)
Return value
The function returns a simple object containing the test result ID as a UUID string. Here's an example response structure.
json
{"resultId": "FACB6642-4725-4FAE-9323-634E72533C89"}
You can then use this ID to retrieve the test result using the getTestResult()
function or check the status
with the getTestResultStatus()
function.
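For example, here is a minimal TypeScript sketch of that follow-up flow. It assumes getTestResult() and getTestResultStatus() are exported from @gentrace/core and accept the result ID returned by runTest().
typescript
// A minimal sketch, assuming getTestResult() and getTestResultStatus()
// are exported from "@gentrace/core" and accept the result ID shown above.
import { getTestResult, getTestResultStatus } from "@gentrace/core";

const resultId = "FACB6642-4725-4FAE-9323-634E72533C89";

// Check whether the test result has finished processing
const status = await getTestResultStatus(resultId);
console.log("Status:", status);

// Fetch the full test result once it is ready
const testResult = await getTestResult(resultId);
console.log("Test result:", testResult);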
Types
🛠️ MetadataValueObject
An object with a required type field plus any number of additional keys:
type: string
{ [key: string]: any }
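For illustration, the metadata passed in the context examples above conforms to this shape; the promptString key is just an example name.
typescript
// Illustrative only: metadata is an object keyed by arbitrary names, where each
// value is a MetadataValueObject with a "type" field plus additional fields.
const metadata = {
  promptString: {
    type: "string",
    value: "What is the basic unit of life?",
  },
};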