Threading
A thread is a back-and-forth exchange between your generative AI pipeline and an external actor (typically an end user). A prime example is the conversation that occurs between ChatGPT and a user. Typically, a thread is composed of multiple AI generations.
Within our data model, a thread is represented as a singly linked list of pipeline runs. To add a run to a thread, pass the ID of the prior pipeline run when you generate the next run.
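To make the linked-list model concrete, here is a minimal sketch in plain Python. It does not use the Gentrace SDK: the `Run` class and the `thread_for` helper are hypothetical names, and the only assumption is that each run records the ID of the run that came before it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a pipeline run; not a Gentrace SDK type."""

    run_id: str
    previous_run_id: Optional[str] = None  # None marks the first run in a thread


def thread_for(last_run_id: str, runs: Dict[str, Run]) -> List[str]:
    """Follow the previous-run links back to the start; return IDs oldest first."""
    ordered: List[str] = []
    current: Optional[str] = last_run_id
    while current is not None:
        ordered.append(current)
        current = runs[current].previous_run_id
    ordered.reverse()
    return ordered


# Three runs chained into one thread: run-1 <- run-2 <- run-3
runs = {
    "run-1": Run("run-1"),
    "run-2": Run("run-2", previous_run_id="run-1"),
    "run-3": Run("run-3", previous_run_id="run-2"),
}
thread = thread_for("run-3", runs)
```

Because each run points only at its predecessor, the whole thread can be recovered from the most recent run ID alone.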
Example
Let's say you have a chat feature for your end user. The feature initially performs a chat completion request with our SDK.
typescript
import { init } from "@gentrace/core";
import { OpenAI } from "@gentrace/openai";

init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const openai = new OpenAI({
  apiKey: process.env.OPENAI_KEY,
});

// This would normally be populated from an HTTP request
const endUserQuestion = "Hello OpenAI! How are you doing today?";

const chatCompletionResponse = await openai.chat.completions.create({
  messages: [
    {
      role: "user",
      content: endUserQuestion,
    },
  ],
  model: "gpt-3.5-turbo",
  pipelineSlug: "introduction",
});

const runId = chatCompletionResponse.pipelineRunId;

// We have omitted logic:
// - to store this run in your desired persistent storage (e.g. Postgres)
// - to return the chat response to the user
python
import gentrace
import os

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

openai = gentrace.OpenAI(api_key=os.getenv("OPENAI_KEY"))

# This would normally be populated from an HTTP request
endUserQuestion = "Hello OpenAI! How are you doing today?"

result = openai.chat.completions.create(
    pipeline_slug="introduction",
    messages=[
        {
            "role": "user",
            "content": endUserQuestion
        }
    ],
    model="gpt-3.5-turbo"
)

runId = result.pipelineRunId

# We have omitted logic:
# - to store this run in your desired persistent storage (e.g. Postgres)
# - to return the chat response to the user
Once the request completes and you receive a Gentrace run ID, store that initial pipeline run ID in persistent storage (e.g. a database).
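The storage step is left to you. As one possible approach, the sketch below keeps the latest run ID and the message history for each conversation in SQLite, using only the Python standard library. The table layout, the `conversation_id` key, and the helper names `open_store`, `save_turn`, and `load_turn` are all assumptions made for illustration; none of them are part of the Gentrace SDK.

```python
import json
import sqlite3
from typing import List, Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical store with one row per conversation."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversations ("
        "conversation_id TEXT PRIMARY KEY, "
        "last_run_id TEXT NOT NULL, "
        "messages TEXT NOT NULL)"
    )
    return conn


def save_turn(
    conn: sqlite3.Connection,
    conversation_id: str,
    run_id: str,
    messages: List[dict],
) -> None:
    """Record the most recent run ID and the full message history."""
    conn.execute(
        "INSERT OR REPLACE INTO conversations VALUES (?, ?, ?)",
        (conversation_id, run_id, json.dumps(messages)),
    )
    conn.commit()


def load_turn(
    conn: sqlite3.Connection, conversation_id: str
) -> Optional[Tuple[str, List[dict]]]:
    """Return (last run ID, prior messages), or None for an unknown conversation."""
    row = conn.execute(
        "SELECT last_run_id, messages FROM conversations WHERE conversation_id = ?",
        (conversation_id,),
    ).fetchone()
    if row is None:
        return None
    return row[0], json.loads(row[1])


conn = open_store()
save_turn(
    conn,
    "conv-1",  # hypothetical conversation key
    "67be57c5-fd7d-4bd8-9c6c-17991a4d689f",
    [{"role": "user", "content": "Hello OpenAI! How are you doing today?"}],
)
previous_run_id, prior_messages = load_turn(conn, "conv-1")
```

Storing the messages alongside the run ID means a single lookup supplies both values needed for the next request in the conversation.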
At a later point, your end user decides to respond to the supplied AI generation with a follow-up question.
typescript
import { OpenAI } from "@gentrace/openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_KEY,
});

// This would normally be populated from an HTTP request
const endUserQuestion = "Great to hear! What's the capital of Maine?";

const previousRunId = ...; // TODO: pull prior run ID from DB
const priorMessages = ...; // TODO: pull prior messages from DB

const chatCompletionResponse = await openai.chat.completions.create({
  messages: [
    ...priorMessages,
    {
      role: "user",
      content: endUserQuestion,
    },
  ],
  model: "gpt-3.5-turbo",
  pipelineSlug: "introduction",
  gentrace: {
    // By specifying the previous run ID, we associate this generation to a thread in Gentrace
    previousRunId,
  },
});

const runId = chatCompletionResponse.pipelineRunId;

// We have omitted logic:
// - to store this run in your desired persistent storage (e.g. Postgres)
// - to return the chat response to the user
python
import gentrace
import os

gentrace.init(
    api_key=os.getenv("GENTRACE_API_KEY"),
)

openai = gentrace.OpenAI(api_key=os.getenv("OPENAI_KEY"))

previousRunId = ...  # TODO: pull prior run ID from DB
priorMessages = ...  # TODO: pull prior messages array from DB

# This would normally be populated from an HTTP request
endUserQuestion = "Great to hear! What's the capital of Maine?"

result = openai.chat.completions.create(
    pipeline_slug="introduction",
    messages=[
        *priorMessages,
        {
            "role": "user",
            "content": endUserQuestion
        }
    ],
    model="gpt-3.5-turbo",
    gentrace={
        # By specifying the previous run ID, we associate this generation to a thread in Gentrace
        "previousRunId": previousRunId
    }
)

runId = result.pipelineRunId

# We have omitted logic:
# - to store this run in your desired persistent storage (e.g. Postgres)
# - to return the chat response to the user
Once the generation completes, Gentrace associates it with the previous run you specified, linking the two runs into a thread.
UI
You can view threads in the Observe → Runs view. Runs that belong to the same thread are rolled up into a single row.
While in the detailed view for the thread, navigate through the individual runs with the left/right arrow keys.
You can also navigate through individual runs in the timeline view.
Limitations
Threading is available only for our observability features.
The previousRunId parameter cannot be specified on a single step within a multi-step pipeline.
To demonstrate why, consider the incorrect code below, which makes two calls to OpenAI within a single pipeline run.
typescript
import { Pipeline } from "@gentrace/core";
import { initPlugin } from "@gentrace/openai";

const plugin = await initPlugin({
  apiKey: process.env.OPENAI_KEY,
});

const pipeline = new Pipeline({
  slug: "advanced-pipeline",
  plugins: {
    openai: plugin,
  },
});

const runner = pipeline.start();

const openai = runner.openai;

const chatCompletionResponse = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Hello! My name is Vivek." }],
  model: "gpt-3.5-turbo",
  // 🛑 This is not correct since it's assigned to a single step
  gentrace: {
    previousRunId: "67be57c5-fd7d-4bd8-9c6c-17991a4d689f",
  },
});

const chatCompletionResponse2 = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Hello! My name is Doug." }],
  model: "gpt-3.5-turbo",
});

await runner.submit();
python
import gentrace
import os

PIPELINE_SLUG = "compose"

pipeline = gentrace.Pipeline(
    PIPELINE_SLUG,
    openai_config={
        # Read the key from the environment (process.env is JavaScript, not Python)
        "api_key": os.getenv("OPENAI_KEY"),
    },
)

runner = pipeline.start()

openai = runner.get_openai()

result = openai.chat.completions.create(
    messages=[{"role": "user", "content": "Hello! My name is Vivek."}],
    model="gpt-3.5-turbo",
    stream=True,
    # 🛑 This is not correct since it's assigned to a single step
    gentrace={
        "previousRunId": "67be57c5-fd7d-4bd8-9c6c-17991a4d689f",
    }
)

result2 = openai.chat.completions.create(
    messages=[{"role": "user", "content": "Hello! My name is Doug."}],
    model="gpt-3.5-turbo",
    stream=True
)

runner.submit()
In Gentrace, a previousRunId is always associated with a single Gentrace run. In the case of our advanced SDK, multiple steps within a single runner are considered part of a single run.
Threading supports only singly linked runs. We currently do not support branching, where multiple generations reference the same run. If two runs are submitted with the same previous run ID (previousRunId), one run submission will be rejected with a 400 status code.
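If you want to catch a branching attempt before it reaches Gentrace, you could track on your side which run IDs have already been used as a previous run. The sketch below is a local model of the single-successor rule only. `ThreadIndex` and `BranchingNotSupported` are hypothetical names, and this is not how Gentrace implements the check on its servers.

```python
from typing import Dict, Optional


class BranchingNotSupported(Exception):
    """Raised when a second run tries to follow an already-linked previous run."""


class ThreadIndex:
    """Hypothetical local model: each run may have at most one successor."""

    def __init__(self) -> None:
        # Maps a previous run ID to the single run that follows it.
        self._next_run: Dict[str, str] = {}

    def submit(self, run_id: str, previous_run_id: Optional[str] = None) -> None:
        """Register a run, refusing to create a second branch from the same run."""
        if previous_run_id is None:
            return  # First run in a thread; nothing to link.
        if previous_run_id in self._next_run:
            raise BranchingNotSupported(
                f"{previous_run_id} is already followed by "
                f"{self._next_run[previous_run_id]}"
            )
        self._next_run[previous_run_id] = run_id


index = ThreadIndex()
index.submit("run-1")
index.submit("run-2", previous_run_id="run-1")

try:
    # A second run pointing at run-1 would branch the thread.
    index.submit("run-3", previous_run_id="run-1")
    rejected = False
except BranchingNotSupported:
    rejected = True
```

A check like this only helps within one process. Concurrent submissions from several workers can still collide, so the 400 response should be handled as well.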
Types
previousRunId: string (UUID)
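Since the value must be a UUID string, it can be worth validating a stored ID before sending it. The following is a minimal sketch using Python's standard `uuid` module. The helper name `is_valid_run_id` is hypothetical, and the check assumes the canonical hyphenated form shown in the examples on this page.

```python
import uuid


def is_valid_run_id(value: str) -> bool:
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, a urn: prefix, and bare hex digits;
    # comparing against the canonical string rejects those variants.
    return str(parsed) == value.lower()


ok = is_valid_run_id("67be57c5-fd7d-4bd8-9c6c-17991a4d689f")
bad = is_valid_run_id("not-a-uuid")
```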
Future work
- Thread evaluation
- Visual thread comparison
- Aggregate statistics (e.g. average run latency)
- Thread metadata
If you're interested in shaping this future work, please reach out over email.