This year, we've been fortunate to help our customers at Quizlet, Webflow, Multiverse, and a Fortune 100 retailer ship high-quality LLM products. Along the way, we've gained the support of world-class engineering and product leaders as angel investors: Yuhki Yamashita (CPO at Figma), Garrett Lord (Co-founder and CEO at Handshake), Bryant Chou (Co-founder at Webflow), Tuomas Artman (Co-founder, CTO at Linear), Martin Mao (Co-founder, CEO at Chronosphere), David Cramer (Co-founder at Sentry), Ben Sigelman (Co-founder at Lightstep, OpenTelemetry), Steve Bartel (Co-founder, CEO at Gem), Cai GoGwilt (Co-founder at Ironclad), Manu Kumar (Co-founder at Carta), Linda Tong (CEO at Webflow), Cristina Cordova (COO at Linear), and more.
Tackling LLM product development challenges together sharpened our vision:
Evals are the core of a successful LLM product, but they're too hard to build and frequently aren't reliable in practice.
At Gentrace, we help teams build evals that are:
- tightly tailored to your application
- sourced from multiple stakeholders, not just engineering
- easy to create, maintain, and expand
In other words, evals that actually work.
With our funding round and the Experiments launch, Gentrace is becoming the first collaborative testing environment for LLM product and engineering teams.
Involve stakeholders to build evals that work
To take an LLM application from POC to production, teams need systematic testing. Most AI engineering teams we meet have (thankfully) moved past vibe checks, but they still end up with an eval system that's unreliable or not trusted by stakeholders.
Achieving the right level of testing for production is easier with collaboration. Our testing UI connects to your actual staging and production environments, so you can:
- Build a reliable test suite with thousands of evals, in code or the UI (a minimal code sketch follows this list)
- Keep evals fresh and relevant by involving multiple contributors
- Incorporate domain expertise from product managers and subject experts, including human grading
- Test the entire application represented in code, not just isolated prompts, snippets, or no-code flows
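To make the "in code" path concrete, here's a minimal sketch of what a code-defined eval can look like. Every name in it (EvalCase, runPipeline, grade) is an illustrative stand-in for your own application and graders, not a Gentrace API:

```typescript
// A hypothetical, minimal eval harness; runPipeline and grade are stand-ins
// for your own application and graders, not Gentrace APIs.

type EvalCase = { input: string; expected: string };

const cases: EvalCase[] = [
  { input: "Summarize: photosynthesis converts light into chemical energy.", expected: "photosynthesis" },
  { input: "Summarize: mitochondria produce most of a cell's ATP.", expected: "mitochondria" },
];

// Minimal programmatic grader: does the output mention the key term?
function grade(output: string, expected: string): number {
  return output.toLowerCase().includes(expected.toLowerCase()) ? 1 : 0;
}

// Stand-in for your real LLM application (local, staging, or production).
async function runPipeline(input: string): Promise<string> {
  return `stub output for: ${input}`;
}

async function main() {
  let passed = 0;
  for (const c of cases) {
    passed += grade(await runPipeline(c.input), c.expected);
  }
  console.log(`passed ${passed}/${cases.length}`);
}

main();
```

In practice, the pipeline call would hit your staging or production environment, and the grader could just as easily be an LLM judge or a human-review step contributed by a product manager or subject expert.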
Quizlet grew its testing 40x with Gentrace, preventing issues before they impact users.
Introducing Experiments, the first collaborative testing environment
Today, we're announcing Experiments, the first collaborative testing environment for iterating on LLM products. Now you can run test jobs from the Gentrace UI, overriding any parameter (prompt, model, top-k, reranking) in your application code, across any environment (local, staging, or production).
You can think of Experiments as knobs for last-mile tuning.
Experiments are more powerful than a prompt playground because you can:
- Test changes to your app in real time with actual outputs from your system
- Evaluate changes using your existing eval suite, ensuring consistent testing
- Access and override your AI system's config, including every parameter (prompts, models, top-k, and more), in your real application
This means that, from a UI, you can easily:
- Tune retrieval systems by adjusting top-k parameters (sketched below)
- Edit prompts and measure their end-to-end impact
- Upgrade models confidently by testing end-to-end impact first
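As a rough illustration of the override pattern, here's a sketch of an application that keeps its parameters in one config object so an experiment run can override any subset. The config shape and the withOverrides helper are hypothetical illustrations, not Gentrace SDK calls:

```typescript
// A hypothetical illustration of the override pattern: the application keeps
// sensible defaults, and an experiment run supplies a partial override.
// PipelineConfig and withOverrides are illustrative names, not Gentrace SDK calls.

interface PipelineConfig {
  model: string;
  prompt: string;
  topK: number;
}

const defaults: PipelineConfig = {
  model: "gpt-4o-mini",
  prompt: "Answer using only the retrieved passages:\n{context}\n\nQ: {question}",
  topK: 5,
};

// Merge an experiment's overrides on top of the defaults.
function withOverrides(overrides: Partial<PipelineConfig>): PipelineConfig {
  return { ...defaults, ...overrides };
}

// Example: test a retrieval and model change end-to-end without editing code.
const experimentConfig = withOverrides({ topK: 10, model: "gpt-4o" });
console.log(experimentConfig);
```

In an experiment, those override values would come from the Gentrace UI rather than being hard-coded, and the resulting outputs would be scored with your existing eval suite.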
We'd love to shape the future of Experiments with you. Reach out to join our beta program, or check out the guide.
Our customers
Thank you to our customers for working with us to make Gentrace awesome.
It's inspiring to see how your LLM products and development workflows have grown with Gentrace. We couldn't have done this without your trust and collaboration.
Every LLM product needs evals. Gentrace makes evals a team sport at Webflow. With support for multimodal outputs and running experiments, Gentrace is an essential part of our AI engineering stack.
- Bryant Chou, Co-founder and Chief Architect at Webflow
Testing LLM products for Fortune 100 companies demands a robust system and coordination across many stakeholders. Gentrace gives us the best of both worlds: it integrates seamlessly with our complex enterprise environments and provides intuitive workflows that many teams can easily adopt.
- Tim Wee, Enterprise AI engineering consultant
What's next for Gentrace
This funding is just the beginning. We're kicking off this next chapter by hiring engineers and other roles to help the world build LLM products that reflect quality and craft.
We're also excited to build out our 2025 roadmap, including:
- Threshold-based experiments: Run many experiments at once across a range of values
- Prompt auto-optimization: Auto-optimize prompts within Experiments
- Dataset LLM expand: Grow datasets by selecting examples and generating new ones with an LLM query
- Typed, custom datasets: Pull custom data sources and models into datasets
If you're interested in building Gentrace with us, we'd love to meet you.
To our customers, partners, investors, and team: thank you for believing in what we're building. This milestone is a shared one, and we're excited to keep helping teams push the boundaries of what LLM products can do.