Processors
Processors are designed:
- To transform steps from tracing and/or outputs before evaluation
- To selectively run evaluators
They are functions that transform `steps` and `outputs` into a single `processed` object for evaluators. These functions can be written in either JavaScript or Python.
How to use processors to clean up output data
Let's say we have a basic AI feature that composes an email with OpenAI, as shown below.
- JavaScript
- Python
```javascript
const emailDraftResponse = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  temperature: 0.8,
  messages: [
    {
      role: "system",
      content: `Write an email on behalf of ${sender} to ${receiver}: ${query}`,
    },
  ],
});
```
```python
def compose(sender, receiver, query):
    email_draft_response = openai.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": f"Write an email on behalf of {sender} to {receiver}: {query}"
            },
        ],
        model="gpt-3.5-turbo"
    )
```
When Gentrace receives this information, the data will have this clunky structure.
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1691678980,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<Email Content>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 531,
    "completion_tokens": 256,
    "total_tokens": 787
  }
}
```
The output contains mostly unnecessary information. We only care about the email draft content nested at `choices[0].message.content`. Ideally, we would pre-compute this information and store it in a way that's easy for our evaluator to access.
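For instance, after processing, the evaluator would only need to see something like the following. This is an illustrative sketch; `emailDraft` is simply the key name we choose in the processor below.

```python
# Illustrative target shape: only the field the evaluator actually cares about.
processed_outputs = {
    "emailDraft": "<Email Content>"
}
```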
Create the processor
A processor can be added to a new or existing evaluator.
Let's create a new evaluator for checking grammar. We click on the "Evaluators" section of a pipeline, click the "New evaluator" button, and then choose the "Grammar (CoT + Classification)" template to customize.
The "Customize" page for the new evaluator includes a "Processor" section. Press the +
button in this section to add a new processor.
The "Edit processor" popup that appears allow you to define the processor as either a JavaScript or Python function.
The function will be passed:
- an `outputs` object, which contains the final raw output from the pipeline
- an `inputs` object, which contains the inputs to the pipeline
- an `expectedOutputs` object, which contains the expected outputs for the pipeline
- a `steps` array, which contains the full list of intermediate steps taken by the pipeline
For this example, we created a simple transformation to access the message content and store it on the `emailDraft` key of the object.
- JavaScript
- Python
```javascript
/**
 * @param {Object} attributes - The attributes that are available during this evaluation.
 * @param {Object} attributes.outputs - Output object from the generation that needs to be evaluated.
 * @param {Object} attributes.inputs - An object of all inputs used in the test case.
 * @param {Object} attributes.expectedOutputs - The expected results.
 * @param {List} attributes.steps - A list of all steps generated by this pipeline.
 * @returns {Object | boolean} - Returning false will cause the evaluator to skip this result.
 */
function process({ inputs, expectedOutputs, outputs, steps }) {
  const processedOutputs = {
    emailDraft: outputs.choices[0].message.content
  };
  return processedOutputs;
}
```
```python
def process(attributes):
    """
    attributes - dict of attributes that are available during this evaluation.
    attributes['outputs'] - Output dict from the generation that needs to be evaluated.
    attributes['inputs'] - A dict of all inputs used in the test case.
    attributes['expectedOutputs'] - The expected results.
    attributes['steps'] - A list of all steps generated by this pipeline.
    """
    processed_outputs = {
        'emailDraft': attributes['outputs']['choices'][0]['message']['content']
    }
    return processed_outputs
```
Once you're done writing the function, test that it works correctly on the existing pipeline data (using the "Test case" dropdown and the "Process" button), and save the processor.
Use processed data in evaluators
All processed data returned by the function is available to evaluators under the `processed` key.
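For illustration, an evaluator then receives roughly the following shape. This is a sketch with hypothetical placeholder values, not real Gentrace internals:

```python
# Sketch of the attributes an evaluator sees once a processor has run.
attributes = {
    "inputs": {"sender": "Ana", "receiver": "Bo", "query": "Ask about the Q3 report"},  # test case inputs (hypothetical)
    "outputs": {"choices": [{"message": {"role": "assistant", "content": "<Email Content>"}}]},  # raw model output
    "expectedOutputs": {},
    "steps": [],
    "processed": {"emailDraft": "<Email Content>"},  # value returned by our processor
}

# Evaluators can read the cleaned-up value directly:
email_draft = attributes["processed"]["emailDraft"]
```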
Here's an example of how `processed.emailDraft` can be used within the "Prompt" of an AI evaluator, where it is interpolated with handlebars syntax (e.g. `{{ processed.emailDraft }}`).
The same `processed.emailDraft` value can also be used within a heuristic evaluator as part of an `evaluate` function.
- JavaScript
- Python
```javascript
/**
 * Evaluates the given object based on certain conditions.
 *
 * @param {Object} attributes - The attributes that are available during this evaluation.
 * @param {Object} attributes.outputs - Output object from the generation that needs to be evaluated.
 * @param {Object} attributes.inputs - An object of all inputs used in the test case.
 * @param {Object} attributes.expectedOutputs - The expected results.
 * @param {Object} attributes.processed - Output object of the processor if one exists.
 * @param {List} attributes.steps - A list of all steps generated by this pipeline.
 * @returns {string} - Returns the enum value that you specify in the evaluator options above.
 */
function evaluate({ outputs, inputs, expectedOutputs, processed, steps }) {
  if (processed.emailDraft.includes("Subject:")) {
    return "A";
  }
  return "B";
}
```
```python
def evaluate(attributes):
    """
    Evaluates the given object based on certain conditions.

    attributes - dict of attributes that are available during this evaluation.
    attributes['outputs'] - Output dict from the generation that needs to be evaluated.
    attributes['inputs'] - A dict of all inputs used in the test case.
    attributes['expectedOutputs'] - The expected results.
    attributes['processed'] - Output dict of the processor if one exists.
    attributes['steps'] - A list of all steps generated by this pipeline.
    """
    if "Subject:" in attributes['processed']['emailDraft']:
        return "A"
    return "B"
```
More information on heuristic functions and how they can be defined and tested can be found on the evaluators page.
Process interim steps
Each step must be captured using our tracing integration to be available to processors.
Let's imagine we are building a feature that drafts emails in two OpenAI calls, as sketched below:
- Drafts an initial email
- Simplifies the draft
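Here is a minimal sketch of that two-call flow. It is illustrative only and assumes the OpenAI client is already wrapped by Gentrace's tracing integration, so that each completion is recorded as a separate step:

```python
import openai

def draft_and_simplify(sender, receiver, query):
    # Step 1: draft the initial email.
    initial = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "system",
            "content": f"Write an email on behalf of {sender} to {receiver}: {query}",
        }],
    )
    initial_draft = initial.choices[0].message.content

    # Step 2: simplify the draft produced in step 1.
    simplified = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "system",
            "content": f"Simplify the following email while keeping its meaning:\n\n{initial_draft}",
        }],
    )
    return simplified.choices[0].message.content
```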
Create the evaluator
Let's create an evaluator that compares the initial draft email and simplified email on an evaluation rubric.
We will use an AI evaluator to compare the steps.
Define a processor
We first use a basic processor called "Extract Steps" to transform the two OpenAI completions from the pipeline into the variables `processed.initialDraft` and `processed.simplification`, which can be interpolated into the AI model-graded evaluation.
- JavaScript
- Python
```javascript
function process({ inputs, expectedOutputs, outputs, steps }) {
  const processedOutputs = {
    initialDraft: steps[0].outputs.choices[0].message.content,
    simplification: steps[1].outputs.choices[0].message.content
  };
  return processedOutputs;
}
```
```python
def process(attributes):
    processed_outputs = {
        'initialDraft': attributes['steps'][0]['outputs']['choices'][0]['message']['content'],
        'simplification': attributes['steps'][1]['outputs']['choices'][0]['message']['content']
    }
    return processed_outputs
```
The returned value from the processor is made available to the AI evaluator as the `processed` object.
Note that the processor runs within Gentrace, not within your own code.
Define a prompt
We then use the two keys (`processed.initialDraft` and `processed.simplification`) to interpolate into the following prompt.
```handlebars
You are comparing two emails. Here is the data:
[BEGIN DATA]
************
[Task]: Are these emails semantically similar?
************
[First Email]: {{ processed.initialDraft }}
************
[Second Email]: {{ processed.simplification }}
************
[END DATA]

Select between one of the two options below:
(A) The first email is essentially semantically identical to the second email
(B) The first email is fundamentally semantically different from the second email
```
Test the evaluator
To test that your evaluator is working correctly, you can select a previously observed pipeline output and then press "Evaluate". You will see the evaluator's response in the far-right pane.
Once you're done testing, press "Finish" to create the evaluator.
Use processors to selectively run evaluators
Let's say we have a pipeline where test case inputs are written either in Spanish or English. We want to run:
- A Spanish evaluator on Spanish inputs
- An English evaluator on English inputs.
Each test case has a `language` field that specifies the input language. We can then use a processor to selectively run evaluators based on that language.
Example
```json
{
  "language": "es",
  "query": "Hola, ¿cómo estás?"
}
```
Defining the processor
Create a processor using the methods described earlier. Then, define a Spanish-language processor that returns `false` if the input is not in Spanish.
- JavaScript
- Python
```javascript
function process({ inputs, expectedOutputs, outputs, steps }) {
  return inputs.language === 'es';
}
```
```python
def process(attributes):
    return attributes['inputs']['language'] == 'es'
```
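The English evaluator would use the mirror-image processor. This is a sketch, assuming English test cases use the hypothetical language code `en`:

```python
def process(attributes):
    # Only run the English evaluator when the test case input is English.
    return attributes['inputs']['language'] == 'en'
```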
Filtered evaluators will not be run if the processor returns `false`.