Processors
Processors are designed:
- To transform steps from tracing and/or outputs before evaluation
- To selectively run evaluators
They are functions that transform `steps` and `outputs` into a single `processed` object for evaluators. These functions can be written in either JavaScript or Python.
How to use processors to clean up output data
Let's say we have a basic AI feature that composes an email with OpenAI, as shown below.
- JavaScript
- Python
```javascript
const emailDraftResponse = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  temperature: 0.8,
  messages: [
    {
      role: "system",
      content: `Write an email on behalf of ${sender} to ${receiver}: ${query}`,
    },
  ],
});
```
```python
def compose(sender, receiver, query):
    email_draft_response = openai.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": f"Write an email on behalf of {sender} to {receiver}: {query}"
            },
        ],
        model="gpt-3.5-turbo"
    )
```
When Gentrace receives this information, the data will have this clunky structure.
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1691678980,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<Email Content>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 531,
    "completion_tokens": 256,
    "total_tokens": 787
  }
}
```
The output contains mostly unnecessary information. We only care about the email draft content nested at `choices[0].message.content`. Ideally, we would pre-compute this information and store it in a way that's easy for our evaluator to access.
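For instance, after processing, the evaluator would only need to see something like the following. This is an illustrative sketch; `emailDraft` is simply the key name we choose in the processor below.

```python
# Illustrative target shape: only the field the evaluator actually cares about.
processed_outputs = {
    "emailDraft": "<Email Content>"
}
```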
Create the processor
A processor can be added to a new or existing evaluator.
Let's create a new evaluator for checking grammar. We click on the "Evaluators" section of a pipeline, click the "New evaluator" button, and then choose the "Grammar (CoT + Classification)" template to customize.
The "Customize" page for the new evaluator includes a "Processor" section. Press the +
button in this section to add a new processor.
The "Edit processor" popup that appears allow you to define the processor as either a JavaScript or Python function.
The function will be passed:
- an `outputs` object, which contains the final raw output from the pipeline
- an `inputs` object, which contains the inputs to the pipeline
- an `expectedOutputs` object, which contains the expected outputs for the pipeline
- a `steps` array, which contains the full list of intermediate steps taken by the pipeline
For this example, we created a simple transformation to access the message content and store it on the `emailDraft` key of the object.
- JavaScript
- Python
```javascript
/**
 * @param {Object} attributes - The attributes that are available during this evaluation.
 * @param {Object} attributes.outputs - Output object from the generation that needs to be evaluated.
 * @param {Object} attributes.inputs - An object of all inputs used in the test case.
 * @param {Object} attributes.expectedOutputs - The expected results.
 * @param {List} attributes.steps - A list of all steps generated by this pipeline.
 * @returns {Object | boolean} - Returning false will cause the evaluator to skip this result.
 */
function process({ inputs, expectedOutputs, outputs, steps }) {
  const processedOutputs = {
    emailDraft: outputs.choices[0].message.content
  };
  return processedOutputs;
}
```
```python
def process(attributes):
    """
    attributes - dict of attributes that are available during this evaluation.
    attributes['outputs'] - Output dict from the generation that needs to be evaluated.
    attributes['inputs'] - A dict of all inputs used in the test case.
    attributes['expectedOutputs'] - The expected results.
    attributes['steps'] - A list of all steps generated by this pipeline.
    """
    processed_outputs = {
        'emailDraft': attributes['outputs']['choices'][0]['message']['content']
    }
    return processed_outputs
```
Once you're done writing the function, test that it works correctly on the existing pipeline data (using the "Test case" dropdown and the "Process" button), and save the processor.
Use processed data in evaluators
All processed data returned by the function is available to evaluators under the `processed` key.
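For illustration, an evaluator then receives roughly the following shape. This is a sketch with hypothetical placeholder values, not real Gentrace internals:

```python
# Sketch of the attributes an evaluator sees once a processor has run.
attributes = {
    "inputs": {"sender": "Ana", "receiver": "Bo", "query": "Ask about the Q3 report"},  # test case inputs (hypothetical)
    "outputs": {"choices": [{"message": {"role": "assistant", "content": "<Email Content>"}}]},  # raw model output
    "expectedOutputs": {},
    "steps": [],
    "processed": {"emailDraft": "<Email Content>"},  # value returned by our processor
}

# Evaluators can read the cleaned-up value directly:
email_draft = attributes["processed"]["emailDraft"]
```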
Here's an example of how `processed.emailDraft` can be used within the "Prompt" of an AI evaluator, where it is interpolated with handlebars syntax (e.g. `{{ processed.emailDraft }}`).
The same `processed.emailDraft` value can also be used within a heuristic evaluator as part of an `evaluate` function.
- JavaScript
- Python
```javascript
/**
 * Evaluates the given object based on certain conditions.
 *
 * @param {Object} attributes - The attributes that are available during this evaluation.
 * @param {Object} attributes.outputs - Output object from the generation that needs to be evaluated.
 * @param {Object} attributes.inputs - An object of all inputs used in the test case.
 * @param {Object} attributes.expectedOutputs - The expected results.
 * @param {Object} attributes.processed - Output object of the processor if one exists.
 * @param {List} attributes.steps - A list of all steps generated by this pipeline.
 * @returns {string} - Returns the enum value that you specify in the evaluator options above.
 */
function evaluate({ outputs, inputs, expectedOutputs, processed, steps }) {
  if (processed.emailDraft.includes("Subject:")) {
    return "A";
  }
  return "B";
}
```
```python
def evaluate(attributes):
    """
    Evaluates the given object based on certain conditions.

    attributes - dict of attributes that are available during this evaluation.
    attributes['outputs'] - Output dict from the generation that needs to be evaluated.
    attributes['inputs'] - A dict of all inputs used in the test case.
    attributes['expectedOutputs'] - The expected results.
    attributes['processed'] - Output dict of the processor if one exists.
    attributes['steps'] - A list of all steps generated by this pipeline.
    """
    if "Subject:" in attributes['processed']['emailDraft']:
        return "A"
    return "B"
```
More information on heuristic functions and how they can be defined and tested can be found on the evaluators page.
Process interim steps
Each step must be captured using our tracing integration to be available to processors.
Let's imagine we are building a feature that drafts emails in two OpenAI calls, as sketched below:
- Drafts an initial email
- Simplifies the draft
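Here is a minimal sketch of that two-call flow. It is illustrative only and assumes the OpenAI client is already wrapped by Gentrace's tracing integration, so that each completion is recorded as a separate step:

```python
import openai

def draft_and_simplify(sender, receiver, query):
    # Step 1: draft the initial email.
    initial = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "system",
            "content": f"Write an email on behalf of {sender} to {receiver}: {query}",
        }],
    )
    initial_draft = initial.choices[0].message.content

    # Step 2: simplify the draft produced in step 1.
    simplified = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "system",
            "content": f"Simplify the following email while keeping its meaning:\n\n{initial_draft}",
        }],
    )
    return simplified.choices[0].message.content
```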
Create the evaluator
Let's create an evaluator that compares the initial draft email and simplified email on an evaluation rubric.
We will use an AI evaluator to compare the steps.
Define a processor
We first use a basic processor called "Extract Steps" to transform the two OpenAI completions from the pipeline into the variables `processed.initialDraft` and `processed.simplification`, which can be interpolated into the AI model-graded evaluation.
- JavaScript
- Python
```javascript
function process({ inputs, expectedOutputs, outputs, steps }) {
  const processedOutputs = {
    initialDraft: steps[0].outputs.choices[0].message.content,
    simplification: steps[1].outputs.choices[0].message.content
  };
  return processedOutputs;
}
```
```python
def process(attributes):
    processed_outputs = {
        'initialDraft': attributes['steps'][0]['outputs']['choices'][0]['message']['content'],
        'simplification': attributes['steps'][1]['outputs']['choices'][0]['message']['content']
    }
    return processed_outputs
```
The returned value from the processor is made available to the AI evaluator as the `processed` object.
Note that the processor runs within Gentrace, not within your own code.
Define a prompt
We then use the two keys (`processed.initialDraft` and `processed.simplification`) to interpolate into the following prompt.
```handlebars
You are comparing two emails. Here is the data:
[BEGIN DATA]
************
[Task]: Are these emails semantically similar?
************
[First Email]: {{ processed.initialDraft }}
************
[Second Email]: {{ processed.simplification }}
************
[END DATA]

Select between one of the two options below:
(A) The first email is essentially semantically identical to the second email
(B) The first email is fundamentally semantically different from the second email
```
Test the evaluator
To test that your evaluator is working correctly, you can select a previously observed pipeline output and then press "Evaluate". You will see the evaluator's response in the far-right pane.
Once you're done testing, press "Finish" to create the evaluator.
Use processors to selectively run evaluators
Let's say we have a pipeline where test case inputs are written either in Spanish or English. We want to run:
- A Spanish evaluator on Spanish inputs
- An English evaluator on English inputs.
Each test case has a `language` field that specifies the input language. We can then use a processor to selectively run evaluators based on that language.
Example
```json
{
  "language": "es",
  "query": "Hola, ¿cómo estás?"
}
```
Defining the processor
Create a processor using the methods described earlier. Then, define a Spanish-language processor that returns `false` if the input is not in Spanish.
- JavaScript
- Python
```javascript
function process({ inputs, expectedOutputs, outputs, steps }) {
  return inputs.language === 'es';
}
```
```python
def process(attributes):
    return attributes['inputs']['language'] == 'es'
```
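The English evaluator would use the mirror-image processor. This is a sketch, assuming English test cases use the hypothetical language code `en`:

```python
def process(attributes):
    # Only run the English evaluator when the test case input is English.
    return attributes['inputs']['language'] == 'en'
```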
Filtered evaluators will not be run if the processor returns `false`.