
Tracing, evaluation, and error analysis for agents

Chat with AI to debug agent traces, create smart monitoring columns, and build out tailored evaluations.

Error analysis

Find and fix AI issues

Chat with your traces

Agent trace data is huge and hard to read. Gentrace Chat, inspired by Cursor, has full context of what's on your screen, allowing you to quickly answer questions about your traces.

Learn more about Gentrace Chat

Generate custom monitoring code with AI

Generate monitoring code tailored to your use case, from simple heuristics to LLM-based analysis, that automatically runs on every trace to spot issues in your AI output.
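As a rough illustration of the "simple heuristics" end of that spectrum, a generated check might look like the following. This is a hypothetical sketch in plain TypeScript, not a Gentrace API; the `TraceOutput` shape and `flagIssues` function are made up for illustration.

```typescript
// Hypothetical shape of an agent trace's final output.
interface TraceOutput {
  text: string;
  finishReason: string;
}

// A heuristic monitor that flags common failure modes:
// empty output, truncation, or a canned refusal.
function flagIssues(output: TraceOutput): string[] {
  const issues: string[] = [];
  if (output.text.trim().length === 0) issues.push('empty-output');
  if (output.finishReason === 'length') issues.push('truncated');
  if (/i (cannot|can't) help/i.test(output.text)) issues.push('refusal');
  return issues;
}
```

An LLM-based column would replace the regex with a model call that scores the trace, but the run-on-every-trace structure stays the same.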

Never miss critical AI issues

Get notified instantly when issues arise, and receive regular quality summaries to track your AI performance.

Gentrace allows our ML engineers to work cohesively with other engineering teams, product managers, and coaches. Combining AI and human evaluation really helps us move faster and be more confident in our deployment of AI to benefit our customers and learners.

Anna X. Wang
Head of AI at Multiverse
Get started
Tracing

Best practice monitoring with easy install

Easy install

Gentrace provides a minimal tracing SDK for quickly tracing your AI agent.

npm install gentrace
yarn add gentrace
pnpm add gentrace
import { init, interaction } from 'gentrace';

init();

const haiku = interaction('haiku', () => {
  return myLlm.invoke('make a haiku');
});

haiku();

GENTRACE_API_KEY=YOUR_API_KEY npx tsx index.ts

Widespread compatibility

Gentrace works with most common agent frameworks and LLMs.
AI SDK
View Docs
Pydantic AI SDK
View Docs
OpenAI Agents
View Docs
Mastra
View Docs
Next.js
View Docs
LangGraph
Docs soon
OpenAI (Python)
Docs soon
OpenAI (JS)
Docs soon
TypeScript
Python

Built on open standards

Built on OpenTelemetry, the industry standard for observability, ensuring compatibility with any monitoring stack.

Get started
from typing import Dict

from gentrace import interaction
from opentelemetry.trace import get_current_span

# This wraps the function in an OpenTelemetry span
# for submission to Gentrace.
@interaction(name="simple_example")
def my_agent(input: str, user_id: str) -> Dict[str, str]:
    # You can access the current span using the OpenTelemetry API.
    span = get_current_span()
    span.set_attribute("user_id", user_id)
    return my_agent_inner(input)

Gentrace makes evals a team sport at Webflow. With support for multimodal outputs and running experiments, Gentrace is an essential part of our AI engineering stack. Gentrace helps us bring product and engineering teams together for last-mile tuning so we can build AI features that delight our users.

Bryant Chou
Co-founder and chief architect at Webflow
Get started
Evaluations

Capture regressions before they go live

Powerful evals, lightweight setup

Begin with lightweight evaluations that deliver immediate insights, then expand to comprehensive testing workflows as your requirements evolve.

Get started tracing your agent
// Run a "unit test" evaluation
await evalOnce('rs-in-strawberry', async () => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'How many rs in strawberry? Return only the number.' }],
  });
  const output = response.choices[0].message.content;
  if (output !== '3') {
    throw new Error(`Output is not 3: ${output}`);
  }
});
// Run a "dataset" evaluation
await evalDataset({
  data: async () => (await testCases.list()).data,
  inputSchema: z.object({ query: z.string() }),
  interaction: async (testCase) => {
    return await runMyAgent(testCase.inputs.query);
  },
});

Turn experiments into insights

Use AI to analyze results, compare performance across experiments, and integrate with your current scoring methods.

Flexible dataset management

Store test data in Gentrace or your codebase, organize it efficiently with built-in management tools, and write experiments directly in code for maximum flexibility.

Learn more about datasets
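For test data stored in the codebase, a dataset can be as simple as a typed array of cases plus a scoring function. This is a hypothetical sketch: the `TestCase` shape and `score` helper are made up for illustration, not part of the Gentrace SDK.

```typescript
// Hypothetical: test cases kept in the codebase as a plain array.
interface TestCase {
  name: string;
  inputs: { query: string };
  expected: string;
}

const cases: TestCase[] = [
  { name: 'capital-fr', inputs: { query: 'Capital of France?' }, expected: 'Paris' },
  { name: 'capital-jp', inputs: { query: 'Capital of Japan?' }, expected: 'Tokyo' },
];

// A simple exact-match scorer; swap in your own metric as needed.
function score(output: string, expected: string): number {
  return output.trim() === expected ? 1 : 0;
}
```

The same array could also be uploaded to Gentrace and fetched at eval time, as in the dataset evaluation sample above.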

Gentrace was the right product for us because it allowed us to implement our own custom evaluations, which was crucial for our unique use cases. It's dramatically improved our ability to predict the impact of even small changes in our LLM implementations.

Madeline Gilbert
Staff Machine Learning Engineer at Quizlet
Get started
Role-based access control
Self-hosted infrastructure
SOC 2 Type II
GDPR
ISO 27001
SSO and SCIM provisioning
Security

Enterprise ready

Enterprise-level security through SOC 2 Type II and ISO 27001 compliance. Choose cloud or self-hosted deployment, and connect your existing login systems with SSO/SCIM.

Ready to debug smarter and ship faster?