This year, we've been fortunate to help our customers at Quizlet, Webflow, Multiverse, and a Fortune 100 retailer ship high-quality LLM products. Along the way, we've gained the support of world-class engineering and product leaders as angel investors: Yuhki Yamashita (CPO at Figma), Garrett Lord (Co-founder and CEO at Handshake), Bryant Chou (Co-founder at Webflow), Tuomas Artman (Co-founder, CTO at Linear), Martin Mao (Co-founder, CEO at Chronosphere), David Cramer (Co-founder at Sentry), Ben Sigelman (Co-founder at Lightstep, OpenTelemetry), Steve Bartel (Co-founder, CEO at Gem), Cai GoGwilt (Co-founder at Ironclad), Manu Kumar (Co-founder at Carta), Linda Tong (CEO at Webflow), Cristina Cordova (COO at Linear), and more.
Tackling LLM product development challenges together sharpened our vision:
Evals are the core of a successful LLM product, but they're too hard to build and frequently aren't reliable in practice.
At Gentrace, we help teams build evals that are:
- tightly tailored to your application
- sourced from multiple stakeholders, not just engineering
- easy to create, maintain, and expand
In other words, evals that actually work.
With our funding round and the Experiments launch, Gentrace is becoming the first collaborative testing environment for LLM product and engineering teams.
Involve stakeholders to build evals that work
To take an LLM application from POC to production, teams need systematic testing. Most AI engineering teams we meet have (thankfully) moved past vibe checks, but they still end up with an eval system that's unreliable or not trusted by stakeholders.
Achieving the right level of testing for production is easier with collaboration. Our testing UI connects to your actual staging and production environments, so you can:
- Build a reliable test suite with thousands of evals, in code or the UI (a minimal code sketch follows this list)
- Keep evals fresh and relevant by involving multiple contributors
- Incorporate domain expertise from product managers and subject experts, including human grading
- Test the entire application represented in code, not just isolated prompts, snippets, or no-code flows
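To make the "in code" path concrete, here's a minimal sketch of what a code-defined eval can look like. Every name in it (EvalCase, runPipeline, grade) is an illustrative stand-in for your own application and graders, not a Gentrace API:

```typescript
// A hypothetical, minimal eval harness; runPipeline and grade are stand-ins
// for your own application and graders, not Gentrace APIs.

type EvalCase = { input: string; expected: string };

const cases: EvalCase[] = [
  { input: "Summarize: photosynthesis converts light into chemical energy.", expected: "photosynthesis" },
  { input: "Summarize: mitochondria produce most of a cell's ATP.", expected: "mitochondria" },
];

// Minimal programmatic grader: does the output mention the key term?
function grade(output: string, expected: string): number {
  return output.toLowerCase().includes(expected.toLowerCase()) ? 1 : 0;
}

// Stand-in for your real LLM application (local, staging, or production).
async function runPipeline(input: string): Promise<string> {
  return `stub output for: ${input}`;
}

async function main() {
  let passed = 0;
  for (const c of cases) {
    passed += grade(await runPipeline(c.input), c.expected);
  }
  console.log(`passed ${passed}/${cases.length}`);
}

main();
```

In practice, the pipeline call would hit your staging or production environment, and the grader could just as easily be an LLM judge or a human-review step contributed by a product manager or subject expert.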
Quizlet grew its testing 40x with Gentrace, preventing issues before they impact users.
Introducing Experiments, the first collaborative testing environment
Today, we're announcing Experiments, the first collaborative testing environment for iterating on LLM products. Now you can run test jobs from the Gentrace UI, overriding any parameter (prompt, model, top-k, reranking) in your application code, across any environment (local, staging, or production).
You can think of Experiments as knobs for last-mile tuning.
Experiments are more powerful than a prompt playground because you can:
- Test changes to your app in real time with actual outputs from your system
- Evaluate changes using your existing eval suite, ensuring consistent testing
- Access and override your AI system's config, including every parameter (prompts, models, top-k, and more), in your real application
This means that, from a UI, you can easily:
- Tune retrieval systems by adjusting top-k parameters (sketched below)
- Edit prompts and measure their end-to-end impact
- Upgrade models confidently by testing end-to-end impact first
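As a rough illustration of the override pattern, here's a sketch of an application that keeps its parameters in one config object so an experiment run can override any subset. The config shape and the withOverrides helper are hypothetical illustrations, not Gentrace SDK calls:

```typescript
// A hypothetical illustration of the override pattern: the application keeps
// sensible defaults, and an experiment run supplies a partial override.
// PipelineConfig and withOverrides are illustrative names, not Gentrace SDK calls.

interface PipelineConfig {
  model: string;
  prompt: string;
  topK: number;
}

const defaults: PipelineConfig = {
  model: "gpt-4o-mini",
  prompt: "Answer using only the retrieved passages:\n{context}\n\nQ: {question}",
  topK: 5,
};

// Merge an experiment's overrides on top of the defaults.
function withOverrides(overrides: Partial<PipelineConfig>): PipelineConfig {
  return { ...defaults, ...overrides };
}

// Example: test a retrieval and model change end-to-end without editing code.
const experimentConfig = withOverrides({ topK: 10, model: "gpt-4o" });
console.log(experimentConfig);
```

In an experiment, those override values would come from the Gentrace UI rather than being hard-coded, and the resulting outputs would be scored with your existing eval suite.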
We'd love to shape the future of Experiments with you. Reach out to join our beta program, or check out the guide.
Our customers
Thank you to our customers for working with us to make Gentrace awesome.
It's inspiring to see how your LLM products and development workflows have grown with Gentrace. We couldn't have done this without your trust and collaboration.
Every LLM product needs evals. Gentrace makes evals a team sport at Webflow. With support for multimodal outputs and running experiments, Gentrace is an essential part of our AI engineering stack.
- Bryant Chou, Co-founder and Chief Architect at Webflow
Testing LLM products for Fortune 100 companies demands a robust system and coordination across many stakeholders. Gentrace gives us the best of both worlds: it integrates seamlessly with our complex enterprise environments and provides intuitive workflows that many teams can easily adopt.
- Tim Wee, Enterprise AI engineering consultant
What's next for Gentrace
This funding is just the beginning. We're kicking off this next chapter by hiring engineers and other roles to help the world build LLM products that reflect quality and craft.
We're also excited to build out our 2025 roadmap, including:
- Threshold-based experiments: Run many experiments at once across a range of values
- Prompt auto-optimization: Auto-optimize prompts within Experiments
- Dataset LLM expand: Grow datasets by selecting examples and generating new ones with an LLM query
- Typed, custom datasets: Pull custom data sources and models into datasets
If you're interested in building Gentrace with us, we'd love to meet you.
To our customers, partners, investors, and team: thank you for believing in what we're building. This milestone is a shared one, and we're excited to keep helping teams push the boundaries of what LLM products can do.