Blog | Gentrace - LLM evaluation for AI teams

Building datasets for LLM product evaluations

In this post, I provide a system for building, maintaining, and scaling datasets for modern LLM product workflows.

February 14, 2025

Securing Microservices with Istio: A Self-Hosted Journey

Our Kubernetes-based architecture included several components that needed to be secured when deployed to customer environments. Learn how we improved self-hosted deployments by adopting Istio as our service mesh.

January 23, 2025

Simplifying task queues with PostgreSQL

Gentrace's task queue system processes thousands of evaluation tasks daily. Learn how we simplified our task queue architecture by using PostgreSQL.

January 9, 2025

Press Release: Gentrace Raises $8M Series A to Transform Generative AI Testing, Making LLM Development More Accessible and Reliable

Developer platform breaks down technical barriers with industry-first experimentation tool for cross-functional AI testing

December 10, 2024

Announcing our Series A

Today, we're announcing $8M in Series A funding led by Kojo Osei at Matrix Ventures, along with our new Experiments feature to reimagine LLM product testing.

December 10, 2024

Unfair advantages - a framework for building LLM-as-a-judge evaluations that reliably work

LLM-as-a-judge evaluation uses an LLM to grade an output from an AI system, augmenting or replacing manual, human evaluation.

November 12, 2024

Incident 7/29/2024: Evaluator outage

Gentrace suffered an evaluator outage that affected all customers who ran tests from Friday, July 26th at 1:42 PM PT to Monday, July 29th at ~4:30 PM PT.

Security advisory: XSS vulnerability patched

TL;DR: An XSS vulnerability has been patched. We investigated and found no known affected users, and we have no recommended actions for Gentrace users.

January 28, 2024

How to test for AI hallucination

AI hallucinations occur when an LLM returns incoherent and/or factually inaccurate information in response to a query.

January 24, 2024

How to test RAG systems

One of the most common patterns in modern LLM development is Retrieval-Augmented Generation (RAG).

January 11, 2024

2023 in review

Over the course of 2023, you helped Gentrace grow from an internal tool to the generative AI evaluation platform for a number of technology companies building with generative AI.

December 14, 2023