Derivations are snippets of code that run against traces and experiments to extract data, monitor for errors, and more. Derivations can optionally run an agent to analyze the trace (a variant of LLM-as-judge that we call Agent-as-judge). They show up as columns in the Gentrace UI.

[Screenshot: Derivations example]

Structure of a derivation

Language

Write derivations in Python or JavaScript.

Return type

All derivations must return a typed value. The type is specified in the dropdown at the bottom of the derivation and must match the return type of the function. Some types can be marked as “eval”; eval derivations are averaged to compute a trace’s score.

[Screenshot: Derivation return type dropdown]
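
For example, a derivation whose return type is set to boolean and marked as eval might check that the trace completed without errors. This is a minimal sketch (the full function signature is covered in the next section); the trace.spans array and its error field are assumptions about the trace shape, not a documented schema.

function evaluate({ trace }) {
  // Assumed trace shape: each span exposes an `error` field.
  // Returns a boolean, matching a "boolean" return type marked as eval.
  return trace.spans.every((span) => !span.error);
}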

Function signature and arguments

Derivations are functions that receive the following arguments (the JavaScript signature is shown below):
  • The trace
  • (If available) The source test case from the test dataset
  • All other derivations in the same view
function evaluate({
  // The trace data to analyze
  trace,
  // The source test case from the test dataset (if available)
  testCase,
  // All other derivations in the same view, spread as individual
  // camelCase properties
  ...otherDerivations
}) {

  ...

  // The return type must match the type specified in the
  // dropdown at the bottom of the derivation.
  return ...
}
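
As a concrete sketch, the derivation below checks the trace's output against the test case's expected output, gating on another derivation in the same view. The field names trace.output and testCase.expectedOutput, and the isEnglish derivation, are illustrative assumptions, not fixed names.

function evaluate({ trace, testCase, isEnglish }) {
  // `isEnglish` stands in for another (hypothetical) derivation in the
  // same view, available here as a camelCase property.
  if (!isEnglish) return false;

  // Assumed field names; adjust to match your trace and dataset shape.
  return trace.output === testCase?.expectedOutput;
}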

LLM-as-judge (Agent-as-judge)

Derivations can use an LLM to analyze traces via callAgent() (JavaScript) or call_agent() (Python). The function accepts the agent's instructions, resources to attach (such as the trace), a JSON schema for the structured output, and, optionally, images.
// A minimal structured-output call
const { count } = await callAgent({
  instructions: "How many r's in strawberry?",
  jsonSchema: {
    type: 'object',
    properties: {
      count: { type: 'number' },
    },
    required: ['count'],
  },
});

// Analyzing a user message (assumes `userMessage` was extracted
// from the trace earlier in the derivation)
const { sentiment } = await callAgent({
  instructions: 'Analyze the sentiment of: ' + userMessage,
  jsonSchema: {
    type: 'object',
    properties: {
      sentiment: {
        type: 'string',
        enum: ['positive', 'neutral', 'negative'],
      },
      confidence: {
        type: 'number',
        minimum: 0,
        maximum: 1,
      },
    },
    required: ['sentiment', 'confidence'],
  },
});

// Analyzing a trace
const { sentiment, longestMessage } = await callAgent({
  instructions:
    'Get the sentiment of the longest user message in this trace',
  resources: [{ type: 'trace' }],
  jsonSchema: {
    type: 'object',
    properties: {
      sentiment: {
        type: 'string',
        enum: ['positive', 'neutral', 'negative'],
      },
      confidence: {
        type: 'number',
        minimum: 0,
        maximum: 1,
      },
      longestMessage: { type: 'string' },
    },
    required: ['sentiment', 'confidence', 'longestMessage'],
  },
});
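
Putting the pieces together, a derivation can call the agent and return a field from its structured output. The sketch below assumes a derivation whose return type is set to string; declaring the function async is also an assumption, and the instructions and schema are illustrative.

async function evaluate({ trace }) {
  // The JSON schema constrains the structured output the agent returns.
  const { sentiment } = await callAgent({
    instructions: 'Classify the overall sentiment of this trace',
    resources: [{ type: 'trace' }],
    jsonSchema: {
      type: 'object',
      properties: {
        sentiment: {
          type: 'string',
          enum: ['positive', 'neutral', 'negative'],
        },
      },
      required: ['sentiment'],
    },
  });

  // Must match the return type selected in the dropdown ("string" here).
  return sentiment;
}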

Running derivations

Derivations run in the context of a view and are triggered in three ways:
  • Automatically by Gentrace Chat
  • Automatically on trace ingest when sampled according to the view’s auto-run settings
  • Manually, by:
    • Pressing “Run last 10” or “Run last 100” in the top bar of the view
    • Right-clicking on a column header in the table
    • Right-clicking on a row or cell in the traces table
    • Pressing “Run” with a derivation selected
[Screenshot: Manual run]

Example derivations

Use the prompts below in Gentrace Chat to analyze your traces.

Understand agent execution

  • Summarize the entire trace as a series of steps
  • Extract the user message that triggered the agent
  • Extract the final assistant message (if present)
  • Show which tools were used, and how many times

Understand user experience

  • Extract the user's name and organization (if present in the trace)
  • Rate the user's frustration level based on the trace

Measure cost and performance

  • Show the total number of LLM calls
  • Show the number of input, output, and/or total tokens across the trace
  • Show the total number of tool calls

Monitor for errors

  • Did the agent satisfy the user's request?
  • Show the number of failed tool calls
  • Show the number of failed LLM calls

Write evaluations with LLM-as-judge

  • Compare the factualness of the assistant's response to expected output
  • Show the percentage of assertions that pass in the trace
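
For instance, the first prompt above could produce a derivation along these lines. This is a sketch, not the exact code Gentrace Chat generates; testCase.expectedOutput, the async function declaration, and the boolean eval return type are assumptions.

async function evaluate({ trace, testCase }) {
  // `testCase.expectedOutput` is an assumed dataset field name.
  const { factual } = await callAgent({
    instructions:
      'Is the assistant response in this trace factually consistent ' +
      'with this expected output? ' + (testCase?.expectedOutput ?? ''),
    resources: [{ type: 'trace' }],
    jsonSchema: {
      type: 'object',
      properties: {
        factual: { type: 'boolean' },
      },
      required: ['factual'],
    },
  });

  // Boolean return marked as eval, so it contributes to the trace's score.
  return factual;
}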