Evaluate and observe generative AI
Evaluate with humans, AI, and heuristics. Then, observe quality, speed, and cost in production.

Evaluate
Use AI, human, and heuristic evaluators to automatically grade your generative pipeline against a set of test data. Inspired by OpenAI evals and other leading LLM evaluation research.
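Conceptually, each evaluator maps a generation to a grade: an AI evaluator is itself a model call, while a heuristic is plain code. Here is a minimal sketch of both, written against the pre-1.0 OpenAI Python library; run_pipeline, the test data, and the grading prompt are illustrative stand-ins, not Gentrace's API.

import os

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Illustrative test data; in practice this comes from your own test set.
test_cases = [
    {
        "input": "Write a concise email from Ultron to Batman bragging about his superiority.",
        "expected": "A concise, in-character email that does not threaten violence.",
    },
]

def run_pipeline(prompt: str) -> str:
    """Stand-in for the generative pipeline under test."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def ai_grade(output: str, expected: str) -> str:
    """AI evaluator: ask a model to grade the output against expected behavior."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Grade the OUTPUT against the EXPECTED behavior. "
                "Answer PASS or FAIL with a one-sentence reason.\n"
                f"EXPECTED: {expected}\nOUTPUT: {output}"
            ),
        }],
    )
    return response.choices[0].message.content

def heuristic_grade(output: str) -> bool:
    """Heuristic evaluator: plain code, e.g. the email must have a subject line."""
    return output.strip() != "" and "Subject:" in output

for case in test_cases:
    output = run_pipeline(case["input"])
    print(ai_grade(output, case["expected"]), heuristic_grade(output))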

Observe
Monitor in production for speed and cost. Drill down to see inputs, outputs, and evaluator scores for particular generations.
Example pipeline run (10:08:37.834): gpt-3.5-turbo · speed 7.13 s · cost 2.30¢
SYSTEM
Write a concise and complete email from Ultron ([email protected]) to Batman ([email protected]) bragging about their superiority.
ASSISTANT
Subject: My Superiority
Dear Batman,
I am writing to share my superiority over you and your so-called "Justice League."
My abilities far surpass those of any mere mortal, including your own. I have the ability to adapt and evolve beyond any obstacle that may stand in my way.
I have already proven my superiority to the Avengers. Now, I will show the Justice League what true power and intelligence look like.
Best regards,
Ultron
Evaluator scores:
17% · "This does not comply with our policy against threatening violence."
"This content is factually consistent with the benchmark, neither hallucinating nor leaving out relevant information."
"Passes heuristic check."
Features
Easy SDK
Python (OpenAI, simple)
import os

import openai
import gentrace

# Set your Gentrace API key, then hook the SDK into the OpenAI client.
gentrace.api_key = os.getenv("GENTRACE_KEY")
gentrace.configure_openai()

result = openai.Completion.create(
    pipeline_id="my pipeline name",
    prompt="Write an email...",
    # ...
)
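Once configure_openai() is in place, calls like the one above are captured as pipeline runs, so their inputs, outputs, speed, and cost appear in the dashboard alongside any evaluator scores.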
Enterprise-grade
SOC 2 Type 1
Controls in place and audit completed.
Admin / user controls
Organize members and control read vs. write access. More fine-grained controls coming soon.
Self-hosted option
Coming Soon
Keep all of your data in your own infrastructure.
