Version: 4.5.0


Processors are designed:

  • To transform steps from tracing and/or outputs before evaluation
  • To selectively run evaluators

They are basic JavaScript functions that transform steps and outputs into a single processed object for evaluators.

Use processors to clean up messy output data

Let's say we have a basic AI feature that composes an email with OpenAI, as shown below.

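// The client call name is missing from this copy of the page;
// openai.chat.completions.create (OpenAI Node SDK v4) is assumed here.
const emailDraftResponse = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  temperature: 0.8,
  messages: [
    {
      role: "system",
      content: `Write an email on behalf of ${sender} to ${receiver}: ${query}`,
    },
  ],
});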

When Gentrace receives this information, the data will have this clunky structure.

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1691678980,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<Email Content>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 531,
    "completion_tokens": 256,
    "total_tokens": 787
  }
}

The output contains mostly unnecessary information. We only care about the email draft content nested at choices[0].message.content. Ideally, we would pre-compute this information and store it in a way that's easy for our evaluator to access.

Create the processor

To create a processor, navigate to the new evaluator creation flow for your desired pipeline.

Creating processor

Press the add button under the processor section to open the processor creation modal. Then, define your transformation as a JavaScript function. The function will be passed:

  • An outputs object, which contains the final raw output from the pipeline
  • A steps array, which contains the full list of intermediate steps taken by the pipeline
  • An inputs object, which contains the inputs to the pipeline
  • An expectedOutputs object, which contains the expected outputs for the pipeline

For this example, we created a simple transformation to access the message content and store it on the emailDraft key on the object.
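
The transformation described above might look like the following minimal sketch. It assumes that outputs holds the raw chat completion shown earlier. The sampleOutputs object is hypothetical data, included only so the function can be tried outside Gentrace.

```javascript
// Minimal sketch of a processor that keeps only the email text.
// Assumes `outputs` is the raw OpenAI chat completion object.
function process({ outputs, steps, inputs, expectedOutputs }) {
  return {
    emailDraft: outputs.choices[0].message.content,
  };
}

// Hypothetical sample data for trying the function locally.
const sampleOutputs = {
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "<Email Content>" },
      finish_reason: "stop",
    },
  ],
};

const processed = process({
  outputs: sampleOutputs,
  steps: [],
  inputs: {},
  expectedOutputs: {},
});
console.log(processed.emailDraft); // prints <Email Content>
```

An evaluator would then read the value as processed.emailDraft rather than digging through the raw completion.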

Test and create processor

Simple processor function

Once you're done writing the function, test that it works correctly on the existing pipeline data (using the data dropdown) and create the processor.

Use processed data in evaluators

All processed data returned by the function is available to evaluators under the processed key.

Processed values in AI evaluator

Processed values in heuristic evaluator

Process interim steps


Each step must be captured using our tracing integration to be available to processors.

Let's imagine we are building a feature that drafts emails in two OpenAI calls:

  1. Writes an initial email draft
  2. Simplifies that draft

Create the evaluator

Let's create an evaluator that compares the initial draft email and simplified email on an evaluation rubric.

We will use an AI evaluator to compare the steps.

Define a processor

We first use a basic processor called "Extract Steps" to transform the two OpenAI completions from the pipeline into the variables processed.initialDraft and processed.simplification, which can be interpolated into the AI model-graded evaluation.

function process({ outputs, steps }) {
  const processedOutputs = {
    // Convert the output JSON objects returned by OpenAI to string values
    // containing only the completion text.
    initialDraft: steps[0].outputs.choices[0].message.content,
    simplification: steps[1].outputs.choices[0].message.content,
  };
  return processedOutputs;
}

The returned value from the processor is made available to the AI evaluator as the processed object.

Processor runs in Gentrace

Note that the processor runs within Gentrace, not within your own code. Here's how the processor function looks within our UI.

In-context processor within Gentrace

This means that you can use our Python or TypeScript SDKs to write your evaluation code, but the processor must be written in JavaScript.

With this specific processor, two keys will be made available to the evaluator: processed.initialDraft (the initial draft as a string) and processed.simplification (the simplified draft as a string).
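
If a run might be missing a step, indexing steps[1] directly would throw. One possible defensive variant is sketched below. It assumes that each step exposes the raw completion on outputs, as in the processor above, and that the processor runtime supports optional chaining (ES2020), which is not confirmed here. The completionText helper and the sample steps are hypothetical.

```javascript
function process({ outputs, steps }) {
  // Hypothetical helper: pull the completion text out of a traced step,
  // returning "" when the step or any nested field is missing.
  const completionText = (step) =>
    step?.outputs?.choices?.[0]?.message?.content ?? "";

  return {
    initialDraft: completionText(steps[0]),
    simplification: completionText(steps[1]),
  };
}

// Hypothetical data: only the first step was captured.
const sampleSteps = [
  {
    outputs: {
      choices: [{ message: { role: "assistant", content: "Long draft" } }],
    },
  },
];

console.log(process({ outputs: {}, steps: sampleSteps }));
// prints { initialDraft: 'Long draft', simplification: '' }
```

Falling back to an empty string keeps both keys present for the evaluator. Whether an empty value or a failed run is preferable depends on how you want missing steps to be scored.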

Define a prompt

We then interpolate the two keys (processed.initialDraft and processed.simplification) into the following prompt.

You are comparing two emails. Here is the data:
[Task]: Are these emails semantically similar?
[First Email]: {{ processed.initialDraft }}
[Second Email]: {{ processed.simplification }}
Select between one of the two options below:
(A) The first email is essentially semantically identical to the second email
(B) The first email is fundamentally semantically different from the second email
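
Gentrace performs this interpolation itself. Purely to illustrate what the substitution amounts to, the sketch below fills {{ processed.key }} placeholders from a processed object. The renderPrompt function is hypothetical and is not part of Gentrace, and the sample values are made up.

```javascript
// Hypothetical illustration only: replace {{ processed.<key> }} placeholders
// with values from the object returned by a processor.
// Placeholders with no matching key are left untouched.
function renderPrompt(template, processed) {
  return template.replace(
    /\{\{\s*processed\.(\w+)\s*\}\}/g,
    (match, key) => (key in processed ? String(processed[key]) : match)
  );
}

const template = [
  "[First Email]: {{ processed.initialDraft }}",
  "[Second Email]: {{ processed.simplification }}",
].join("\n");

const rendered = renderPrompt(template, {
  initialDraft: "Dear Sam, thank you for your time yesterday.",
  simplification: "Thanks for your time, Sam.",
});
console.log(rendered);
```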

Test the evaluator

To test that your evaluator is working correctly, you can select a previously observed pipeline output and then press "Evaluate". You will see the evaluator's response in the far-right pane.

Once you're done testing, you can press finish to create the evaluator.

Use processors to selectively run evaluators

Let's say we have a pipeline where test case inputs are written either in Spanish or English. We want to run:

  • A Spanish evaluator on Spanish inputs
  • An English evaluator on English inputs

Each test case has a language field that specifies the input language. We can then use a processor to selectively run evaluators based on that field.

{
  "language": "es",
  "query": "Hola, ¿cómo estás?"
}

Defining the processor

Create a processor using the methods described earlier. Then, define a Spanish-language processor that returns false if the input is not in Spanish.

function process({ inputs, outputs, steps }) {
  return inputs.language === 'es';
}

Filtered evaluators will not be run if the processor returns false.
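
The English evaluator would need its own processor with the mirrored check. The sketch below assumes that English test cases carry the language code "en". The source only shows "es", so adjust the value to match your data. The sample test cases are hypothetical.

```javascript
// Processor for the English evaluator.
// Assumes English test cases set inputs.language to "en".
function process({ inputs, outputs, steps }) {
  return inputs.language === "en";
}

// Hypothetical test cases for a local check.
const englishCase = { language: "en", query: "Hello, how are you?" };
const spanishCase = { language: "es", query: "Hola, ¿cómo estás?" };

console.log(process({ inputs: englishCase })); // prints true
console.log(process({ inputs: spanishCase })); // prints false
```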