Quizlet’s goal: make learning faster and easier
Quizlet is a leading education technology company that provides online study tools for students and teachers.
To help students and teachers spend less time building materials and more time learning, Quizlet introduced their AI Study Era initiative, which aims to transform unstructured notes and materials into effective study tools (flashcards, study guides, practice tests) using generative AI.
However, as Quizlet developed these AI tools, they struggled with a lack of predictability. Even minor changes, such as adding a comma to a prompt, could significantly change the downstream results.
From a homegrown testing stack to Gentrace
To develop better generative AI features systematically, Quizlet first built a homegrown evaluation setup, but it proved cumbersome. It relied on multiple Google Sheets and Colab notebooks, which made testing slow and inefficient.
Quizlet then chose Gentrace to address these challenges for several reasons:
- Custom evaluations: Gentrace allowed Quizlet to implement their own custom evaluations, which was crucial for their specific use cases.
- Visualization and analysis: The platform provided easy visualization of results, allowing the team to dig into different views and quickly identify issues.
- Efficient testing: Gentrace streamlined the testing process, making it faster and more accessible for changes of any size.
Madeline Gilbert, Staff Machine Learning Engineer at Quizlet, explains:
"Gentrace was the right product for us because it allowed us to implement our own custom evaluations, which was crucial for our unique use cases. The ability to easily visualize what was going wrong and dig into the results with different types of views has been invaluable. It's dramatically improved our ability to predict the impact of even small changes in our LLM implementations."
Quizlet integrated Gentrace into their development and testing workflow:
- Pre-merge testing: Before merging any changes, the team runs Gentrace to compare the main branch with the proposed changes, quantifying the impact.
- Evaluators: Quizlet uses a combination of AI (80%) and heuristic (20%) evaluators within Gentrace.
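As a rough illustration of the pre-merge comparison described above, the sketch below runs a baseline and a candidate pipeline over the same test cases and reports the per-evaluator score change. It does not use the Gentrace SDK and is not Quizlet's implementation: the pipelines, test cases, evaluator names, and helper functions are all hypothetical stand-ins. Only simple heuristic evaluators are shown; an AI evaluator would be another callable with the same signature that asks a model to grade the output.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical types: a pipeline turns source notes into generated study text,
# and an evaluator scores one (notes, output) pair between 0.0 and 1.0.
Pipeline = Callable[[str], str]
Evaluator = Callable[[str, str], float]


def non_empty(_notes: str, output: str) -> float:
    """Heuristic evaluator: 1.0 if the pipeline produced any text."""
    return 1.0 if output.strip() else 0.0


def within_length(_notes: str, output: str) -> float:
    """Heuristic evaluator: 1.0 if the output is at most 200 characters."""
    return 1.0 if len(output) <= 200 else 0.0


def run_suite(
    pipeline: Pipeline, cases: List[str], evaluators: Dict[str, Evaluator]
) -> Dict[str, float]:
    """Average each evaluator's score over all test cases."""
    outputs = [(case, pipeline(case)) for case in cases]
    return {
        name: mean(evaluate(case, out) for case, out in outputs)
        for name, evaluate in evaluators.items()
    }


def compare(
    baseline: Pipeline,
    candidate: Pipeline,
    cases: List[str],
    evaluators: Dict[str, Evaluator],
) -> Dict[str, float]:
    """Return candidate score minus baseline score for every evaluator."""
    base = run_suite(baseline, cases, evaluators)
    cand = run_suite(candidate, cases, evaluators)
    return {name: cand[name] - base[name] for name in evaluators}


# Stand-ins for "main branch" and "proposed change"; real pipelines would call an LLM.
def main_branch(notes: str) -> str:
    return "Term: " + notes


def proposed_change(notes: str) -> str:
    # Simulated regression: very short notes yield an empty output.
    return "" if len(notes) < 5 else "Term: " + notes


CASES = ["Mitochondria are the powerhouse of the cell.", "H2O"]
EVALUATORS = {"non_empty": non_empty, "within_length": within_length}

deltas = compare(main_branch, proposed_change, CASES, EVALUATORS)
print(deltas)
```

A negative delta for an evaluator flags a regression in the proposed change before it is merged, which is the kind of signal a pre-merge comparison is meant to surface.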
More testing, better study materials
The implementation of Gentrace has led to significant improvements in Quizlet's AI development process:
- Testing time: Reduced from approximately 1 hour to 1 minute or less.
- Increased confidence: Better predictability and confidence in the impact of changes.
- Testing frequency: Increased from 2 times per month to over 20 times per week.
- Faster iteration: Significantly improved speed of iteration with testing.
With higher-frequency testing and rapid iteration, Quizlet upgraded models with confidence and shipped higher-quality outputs, allowing them to hit their deadline of launching significant improvements before the start of the 2024-2025 school year.