
Tracing, evaluation, and error analysis for agents

Chat with AI to debug agent traces, create smart monitoring columns, and build out tailored evaluations.

Error analysis

Find and fix AI issues

Chat with your traces

Agent trace data is huge and hard to read. Gentrace Chat, inspired by Cursor, has full context of what's on your screen, allowing you to quickly answer questions about your traces.

Learn more about Gentrace Chat

Generate custom monitoring code with AI

Generate monitoring code tailored to your use case, from simple heuristics to LLM-based analysis, that automatically runs on every trace to spot issues in your AI output.
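As a rough illustration of the "simple heuristics" end of that spectrum, a generated check might look like the following. This is a hypothetical sketch in plain TypeScript, not a Gentrace API; the `TraceOutput` shape and `flagIssues` function are made up for illustration.

```typescript
// Hypothetical shape of an agent trace's final output.
interface TraceOutput {
  text: string;
  finishReason: string;
}

// A heuristic monitor that flags common failure modes:
// empty output, truncation, or a canned refusal.
function flagIssues(output: TraceOutput): string[] {
  const issues: string[] = [];
  if (output.text.trim().length === 0) issues.push('empty-output');
  if (output.finishReason === 'length') issues.push('truncated');
  if (/i (cannot|can't) help/i.test(output.text)) issues.push('refusal');
  return issues;
}
```

An LLM-based column would replace the regex with a model call that scores the trace, but the run-on-every-trace structure stays the same.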

Never miss critical AI issues

Get notified instantly when issues arise, and receive regular quality summaries to track your AI performance.

Gentrace allows our ML engineers to work cohesively with other engineering teams, product managers, and coaches. Combining AI and human evaluation really helps us move faster and be more confident in our deployment of AI to benefit our customers and learners.

Anna X. Wang
Head of AI at Multiverse
Get started
Tracing

Best practice monitoring with easy install

Easy install

Gentrace provides a minimal tracing SDK for quickly tracing your AI agent.

npm install gentrace
yarn add gentrace
pnpm add gentrace
import { init, interaction } from 'gentrace';

init();

const haiku = interaction('haiku', () => {
  return myLlm.invoke('make a haiku');
});

haiku();

GENTRACE_API_KEY=YOUR_API_KEY npx tsx index.ts

Widespread compatibility

Gentrace works with most common agent frameworks and LLMs.
AI SDK
View Docs
Pydantic AI SDK
View Docs
OpenAI Agents
View Docs
Mastra
View Docs
Next.js
View Docs
LangGraph
Docs soon
OpenAI (Python)
Docs soon
OpenAI (JS)
Docs soon
TypeScript
Python

Built on open standards

Built on OpenTelemetry, the industry standard for observability, ensuring compatibility with any monitoring stack.

Get started
from typing import Dict

from gentrace import interaction
from opentelemetry.trace import get_current_span

# This wraps the function in an OpenTelemetry span
# for submission to Gentrace.
@interaction(name="simple_example")
def my_agent(input: str, user_id: str) -> Dict[str, str]:
    # You can access the current span using the OpenTelemetry API.
    span = get_current_span()
    span.set_attribute("user_id", user_id)
    return my_agent_inner(input)

Gentrace makes evals a team sport at Webflow. With support for multimodal outputs and running experiments, Gentrace is an essential part of our AI engineering stack. Gentrace helps us bring product and engineering teams together for last-mile tuning so we can build AI features that delight our users.

Bryant Chou
Co-founder and chief architect at Webflow
Get started
Evaluations

Capture regressions before they go live

Powerful evals, lightweight setup

Begin with lightweight evaluations that deliver immediate insights, then expand to comprehensive testing workflows as your requirements evolve.

Get started tracing your agent
// Run a "unit test" evaluation
await evalOnce('rs-in-strawberry', async () => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'How many rs in strawberry? Return only the number.' }],
  });
  const output = response.choices[0].message.content;
  if (output !== '3') {
    throw new Error(`Output is not 3: ${output}`);
  }
});
// Run a "dataset" evaluation
await evalDataset({
  data: async () => (await testCases.list()).data,
  inputSchema: z.object({ query: z.string() }),
  interaction: async (testCase) => {
    return await runMyAgent(testCase.inputs.query);
  },
});

Turn experiments into insights

Use AI to analyze results, compare performance across experiments, and integrate with your current scoring methods.

Flexible dataset management

Store test data in Gentrace or your codebase, organize it efficiently with built-in management tools, and write experiments directly in code for maximum flexibility.

Learn more about datasets
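For test data stored in the codebase, a dataset can be as simple as a typed array of cases plus a scoring function. This is a hypothetical sketch: the `TestCase` shape and `score` helper are made up for illustration, not part of the Gentrace SDK.

```typescript
// Hypothetical: test cases kept in the codebase as a plain array.
interface TestCase {
  name: string;
  inputs: { query: string };
  expected: string;
}

const cases: TestCase[] = [
  { name: 'capital-fr', inputs: { query: 'Capital of France?' }, expected: 'Paris' },
  { name: 'capital-jp', inputs: { query: 'Capital of Japan?' }, expected: 'Tokyo' },
];

// A simple exact-match scorer; swap in your own metric as needed.
function score(output: string, expected: string): number {
  return output.trim() === expected ? 1 : 0;
}
```

The same array could also be uploaded to Gentrace and fetched at eval time, as in the dataset evaluation sample above.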

Gentrace was the right product for us because it allowed us to implement our own custom evaluations, which was crucial for our unique use cases. It's dramatically improved our ability to predict the impact of even small changes in our LLM implementations.

Madeline Gilbert
Staff Machine Learning Engineer at Quizlet
Get started
Role-based access control
Self-hosted infrastructure
SOC 2 Type II
GDPR
ISO 27001
SSO and SCIM provisioning
Security

Enterprise ready

Enterprise-level security through SOC 2 Type II and ISO 27001 compliance. Choose cloud or self-hosted deployment, and connect your existing login systems with SSO/SCIM.

Ready to debug smarter and ship faster?