OpenAI
This guide covers our OpenAI integration. We provide two levels of SDK for instrumenting OpenAI code.
Installation
- TypeScript
- Python
Please only use this library on the server-side. Using it on the client-side will reveal your API key.
First, install our core package.
bash
# Execute only one, depending on your package manager
npm i @gentrace/core
yarn add @gentrace/core
pnpm i @gentrace/core
bash
# Execute only one, depending on your package manager
pip install gentrace-py
poetry add gentrace-py
If you want to use our provider SDK handlers, you must install our associated plugin SDKs. These SDKs have a direct dependency on the officially supported SDK for their respective providers. We type match the official SDK whenever possible.
@gentrace/openai@v4
This section requires Gentrace's official OpenAI plugin. The plugin version matches the major version of the official OpenAI Node.JS SDK.
shell
# For OpenAI v4 (the new version)
npm install @gentrace/openai@v4
These NPM packages will only work with Node.JS versions >= 16.16.0.
This PyPI package will only work with Python versions >= 3.7.1.
Simple usage
- TypeScript
- Python
We designed our SDKs to mostly preserve the original interface to OpenAI's client library. You can simply insert the following lines of code before your OpenAI invocations.
typescript
import { init } from "@gentrace/core";
import { OpenAI } from "@gentrace/openai";

// This function globally initializes Gentrace with the appropriate
// credentials. Constructors like OpenAI() will transparently use
// these credentials to authenticate with Gentrace.
init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const openai = new OpenAI({
  apiKey: process.env.OPENAI_KEY,
});
The OpenAI class is virtually identical to its equivalent in the official SDK.
You can then execute your OpenAI functions against the openai handle directly.
typescript
async function createEmbedding() {
  const embeddingResponse = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: "Example input",
    // IMPORTANT: Supply a Gentrace Pipeline slug to track this invocation
    pipelineSlug: "create-test-embedding",
  });

  console.log("Pipeline run ID: ", embeddingResponse.pipelineRunId);
}

createEmbedding();
We designed our SDKs to mostly preserve the original interface to OpenAI's client library. You can simply insert the following two lines of code before your OpenAI invocations.
python
import os

import gentrace

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))
openai = gentrace.OpenAI(api_key=os.getenv("OPENAI_KEY"))
The gentrace.OpenAI() constructor automatically tracks invocations to OpenAI and asynchronously forwards information to our service. Our SDK will not increase request latency to OpenAI.
Here's an example usage for creating embeddings.
python
import os

import gentrace
import openai

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))
openai = gentrace.OpenAI(api_key=os.getenv("OPENAI_KEY"))

# Our SDK transparently sends this information async to our servers.
result = openai.embeddings.create(
    input="sample text",
    model="text-embedding-3-small",
    # IMPORTANT: Supply a Gentrace Pipeline slug to track this invocation
    pipeline_slug="create-sample-embedding",
)

# We modify the OpenAI SDK return value to include a PipelineRun ID. This ID is
# important to tie feedback to a particular AI generation.
print("Pipeline run ID: ", result.pipelineRunId)

# Since we send information asynchronously, we provide a function that allows you to wait
# until all requests are sent. This should be used only in development.
gentrace.flush()
You should provide a Pipeline slug as a request parameter to any method that you want to instrument. The slug associates OpenAI invocations with the matching Pipeline on our service. If you omit the slug, we will not track telemetry for that invocation.
The PipelineRun ID in the create() return value is added by the Gentrace SDK, not by OpenAI. Use it to uniquely associate feedback with AI-generated content. If you do not provide a Pipeline slug, the create() functions will not return a PipelineRun ID.
Asynchronous commands
We support instrumenting the asynchronous methods of OpenAI's SDK.
python
import asyncio
import os

import gentrace


async def main():
    gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

    # Pass the key to the constructor, as in the other examples in this guide
    openai = gentrace.AsyncOpenAI(api_key=os.getenv("OPENAI_KEY"))

    result = await openai.embeddings.create(
        input="sample text",
        model="text-embedding-3-small",
        pipeline_slug="testing-value",
    )

    gentrace.flush()

    # We still modify the OpenAI SDK return value to include a PipelineRun ID
    print("Pipeline run ID: ", result.pipelineRunId)


asyncio.run(main())
Keep these notes in mind when using async functions.
- The SDK still returns a PipelineRun ID that you can use to uniquely associate feedback to a generation.
- If you await the asynchronous invocation, the SDK does not wait for Gentrace's telemetry request to complete.
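To illustrate why awaiting an invocation does not wait on telemetry, here is a minimal sketch of the general fire-and-forget pattern: payloads go onto a queue, a background thread sends them, and a flush() call blocks until the queue drains. This is not Gentrace's implementation. The BackgroundSender class, its methods, and the slow_send function are hypothetical names used only for illustration.

```python
import queue
import threading
import time


class BackgroundSender:
    """Hypothetical helper: sends payloads on a worker thread."""

    def __init__(self, send):
        self._send = send
        self._queue = queue.Queue()
        # Daemon thread so it never blocks interpreter shutdown.
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            payload = self._queue.get()
            try:
                self._send(payload)
            except Exception:
                pass  # a real sender would log or retry here
            finally:
                self._queue.task_done()

    def enqueue(self, payload):
        # Returns immediately; the caller never waits on the network.
        self._queue.put(payload)

    def flush(self):
        # Blocks until every queued payload has been handled.
        self._queue.join()


sent = []


def slow_send(payload):
    """Stand-in for a slow telemetry request."""
    time.sleep(0.05)
    sent.append(payload)


sender = BackgroundSender(slow_send)
sender.enqueue({"pipelineRunId": "example-id"})
sender.flush()
print(sent)
```

In this sketch, a short-lived script that skips flush() could exit before the worker thread sends anything, which is the kind of situation gentrace.flush() is described as covering in development.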
Content templates for the openai.chat.completions.create() interface
The other difference between the Gentrace-instrumented SDK and the official SDK is how prompts are specified for openai.chat.completions.create() requests.
In the official version of the SDK, you specify your chat completion input as an object array with role and content key-value pairs defined.
typescript
// ❌ Official OpenAI SDK invocation
const chatCompletionResponse = await openai.chat.completions.create({
  messages: [
    {
      role: "user",
      content: "Hello Vivek!",
    },
  ],
  model: "gpt-3.5-turbo",
});
In our SDK, if part of the content is dynamically generated, you should instead create contentTemplate and contentInputs key-value pairs to separate the static and dynamic information, respectively. This helps us display the generation in our UI and track version changes internally.
We use Mustache templating with the Mustache.js library to render the final content that is sent to OpenAI.
typescript
// ✅ Gentrace-instrumented OpenAI SDK
const chatCompletionResponse = await openai.chat.completions.create({
  messages: [
    {
      role: "user",
      contentTemplate: "Hello {{ name }}!",
      contentInputs: { name: "Vivek" },
    },
  ],
  model: "gpt-3.5-turbo",
  pipelineSlug: "testing-pipeline-id",
});
Note: We still allow you to specify the original content key-value pair in the object if you want to incrementally migrate your invocations.
Consult OpenAI's Node.JS SDK documentation to learn more about the original SDK.
Content templates for the openai.chat.completions interface
The other difference between the Gentrace-instrumented SDK and the official SDK is how prompts are specified for openai.chat.completions.create() requests.
In the official version of the SDK, you specify your chat completion input as an array of dictionaries with role and content key-value pairs defined.
python
import os

from openai import OpenAI

openai = OpenAI(api_key=os.getenv("OPENAI_KEY"))

# ❌ Official OpenAI SDK invocation
result = openai.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Hello Vivek!",
        }
    ],
    model="gpt-3.5-turbo",
)
In our SDK, if part of the content is dynamically generated, you should instead create contentTemplate and contentInputs key-value pairs to separate the static and dynamic information, respectively. This helps us display the generation in our UI and track version changes internally.
We use Mustache templating with the Pystache library to render the final content that is sent to OpenAI.
python
# ✅ Gentrace-instrumented OpenAI SDK
import os

import gentrace

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))
openai = gentrace.OpenAI(api_key=os.getenv("OPENAI_KEY"))

openai.chat.completions.create(
    messages=[
        {
            "role": "user",
            "contentTemplate": "Hello {{ name }}!",
            "contentInputs": {"name": "Vivek"},
        }
    ],
    model="gpt-3.5-turbo",
    pipeline_slug="test-hello-world-templatized",
)
Note: We still allow you to specify the original content key-value pair in the dictionary if you want to incrementally migrate your invocations.
Consult OpenAI's Python SDK documentation to learn more about the original SDK.
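To make the template step concrete, the sketch below shows roughly how a contentTemplate and contentInputs pair could be rendered into the plain content string that OpenAI receives. It uses a simplified regular-expression substitution as a stand-in for Mustache rather than Pystache itself, and it only handles plain {{ name }} tags. The render_template and render_message helpers are hypothetical names and not part of the Gentrace SDK.

```python
import re

# Simplified stand-in for Mustache variable tags. Real Mustache (and Pystache)
# also supports sections, partials, and HTML escaping, which are not modeled here.
_TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, inputs: dict) -> str:
    """Replace each {{ key }} tag with the matching value from inputs.

    Missing keys render as an empty string, as they do in Mustache.
    """
    return _TAG.sub(lambda m: str(inputs.get(m.group(1), "")), template)


def render_message(message: dict) -> dict:
    """Turn a contentTemplate/contentInputs message into a role/content message."""
    if "contentTemplate" not in message:
        return dict(message)  # already uses the original content key
    content = render_template(
        message["contentTemplate"], message.get("contentInputs", {})
    )
    return {"role": message["role"], "content": content}


rendered = render_message(
    {
        "role": "user",
        "contentTemplate": "Hello {{ name }}!",
        "contentInputs": {"name": "Vivek"},
    }
)
print(rendered)  # prints {'role': 'user', 'content': 'Hello Vivek!'}
```

Keeping the template and inputs separate until this last step means the static template text stays identical across requests, which is what makes it possible to track template versions.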
Streaming
We transparently wrap OpenAI's Node streaming functionality.
typescript
// Imports and initialization truncated

async function main() {
  const streamChat = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Say this is a test' }],
    stream: true,
  });

  // This PipelineRun ID actually hasn't been created on the server yet.
  // It's created asynchronously after the final stream event is processed.
  console.log("Pipeline run ID: ", streamChat.pipelineRunId);

  for await (const part of streamChat) {
    console.log(part.choices[0]?.delta?.content || '');
  }

  // Stream data is coalesced and sent to our server.
}

main();
We support data streaming of OpenAI's API for both synchronous and asynchronous SDK methods. Our SDK instruments the start time at the moment prior to the invocation and the end time at the moment immediately after the final stream event is received for that invocation.
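As a rough illustration of the timing described above, the sketch below wraps a stream so that the start time is recorded just before the invocation and the end time just after the final event. It is an assumption about the general approach, not the SDK's code. The names timed_stream and fake_stream are hypothetical, and fake_stream stands in for a real streaming call.

```python
import time
from typing import Callable, Iterable, Iterator


def timed_stream(invoke: Callable[[], Iterable], timings: dict) -> Iterator:
    """Call invoke() and return its events, recording start and end times."""
    # Start time: the moment just before the invocation is made.
    timings["start"] = time.monotonic()
    stream = invoke()

    def events() -> Iterator:
        try:
            yield from stream
        finally:
            # End time: right after the final event has been received
            # (or when the consumer stops iterating early).
            timings["end"] = time.monotonic()

    return events()


def fake_stream() -> Iterable[str]:
    """Hypothetical stand-in for a streaming OpenAI call."""
    return iter(["Hel", "lo", "!"])


timings: dict = {}
chunks = list(timed_stream(fake_stream, timings))
elapsed = timings["end"] - timings["start"]
print(chunks, f"{elapsed:.6f}s")
```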
Synchronous
python
import os

import gentrace

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))
openai = gentrace.OpenAI(api_key=os.getenv("OPENAI_KEY"))

result = openai.chat.completions.create(
    pipeline_slug="testing-chat-completion-value",
    messages=[{"role": "user", "content": "Hello!"}],
    model="gpt-3.5-turbo",
    stream=True,
)

pipeline_run_id = None

for value in result:
    if hasattr(value, "pipelineRunId"):
        pipeline_run_id = value.pipelineRunId

print("Result: ", pipeline_run_id)

gentrace.flush()
Asynchronous
python
import asyncio
import os

import gentrace

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))
openai = gentrace.AsyncOpenAI(api_key=os.getenv("OPENAI_KEY"))


async def main():
    result = await openai.chat.completions.create(
        pipeline_slug="testing-chat-completion-value",
        messages=[{"role": "user", "content": "Hello!"}],
        model="gpt-3.5-turbo",
        stream=True,
    )

    pipeline_run_id = None

    # 👀 The async iteration is key here!
    async for value in result:
        if hasattr(value, "pipelineRunId"):
            pipeline_run_id = value.pipelineRunId

    gentrace.flush()

    print("Result: ", pipeline_run_id)


asyncio.run(main())
Keep in mind that the PipelineRun ID is included in the payload for every event that's returned from the server.
We measure the total time a generation takes from the first byte received from iterating on the stream to the last event yielded from the stream.
Before sending the streamed events to Gentrace, we coalesce the streamed payload as a single payload to improve readability.
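The coalescing step can be pictured with the following sketch, which merges a list of streamed chat deltas into one message-shaped payload. The chunk layout mirrors the choices[0].delta.content shape used in the streaming examples above, but the exact payload Gentrace builds is not documented here, so treat coalesce_chunks and its output format as hypothetical.

```python
from typing import Iterable, Optional


def coalesce_chunks(chunks: Iterable[dict]) -> dict:
    """Merge streamed chat deltas into a single role/content payload."""
    role: Optional[str] = None
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # The role usually arrives once, on the first delta.
            role = delta.get("role") or role
            if delta.get("content"):
                parts.append(delta["content"])
    return {"role": role, "content": "".join(parts)}


# Example deltas shaped like plain dictionaries for illustration.
streamed = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"delta": {"content": "This is"}}]},
    {"choices": [{"delta": {"content": " a test"}}]},
    {"choices": [{"delta": {}}]},
]

combined = coalesce_chunks(streamed)
print(combined)  # prints {'role': 'assistant', 'content': 'This is a test'}
```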
Tracking multiple invocations in one Pipeline
With the functions shown thus far, you can only track a single OpenAI invocation per Pipeline. Check out the advanced section below to learn about our methods for tracking multiple invocations across multiple providers (e.g. Pinecone vector query + OpenAI embedding call) in a single Pipeline.
Telemetry support
We automatically capture analytics from these OpenAI SDK methods. We plan to support other methods upon request.
In the list below, openai is an instance of the OpenAI class or the AsyncOpenAI class (Python only).
- openai.beta.chat.completions.parse()
- openai.chat.completions.create()
- openai.embeddings.create()
Advanced usage
The SDKs described above are designed for creating single invocations to one provider like OpenAI or Pinecone. We also provide abstractions for chaining multiple invocations together into a single pipeline.
Creating Pipeline and PipelineRuns
To declare a Pipeline, you must define the configuration (including API keys) for Gentrace and the services you intend to monitor.
typescript
import { init, Pipeline } from "@gentrace/core";
import { initPlugin as initOpenAIPlugin } from "@gentrace/openai";
// Note: initPineconePlugin is not imported in this snippet; it must come from
// Gentrace's Pinecone plugin package.

// This function globally initializes Gentrace with the appropriate
// credentials. Constructors like Pipeline() will transparently use
// these credentials to authenticate with Gentrace.
init({
  apiKey: process.env.GENTRACE_API_KEY,
});

const openaiPlugin = await initOpenAIPlugin({
  apiKey: process.env.OPENAI_KEY,
});

const pineconePlugin = await initPineconePlugin({
  apiKey: process.env.PINECONE_API_KEY,
  environment: process.env.PINECONE_ENVIRONMENT,
});

const pipeline = new Pipeline({
  slug: "searchCompanyKnowledge",
  plugins: {
    openai: openaiPlugin,
    pinecone: pineconePlugin,
  },
});
python
import gentrace
import os

pipeline = gentrace.Pipeline(
    "write-email",
    os.getenv("GENTRACE_API_KEY"),
    openai_config={
        "api_key": os.getenv("OPENAI_KEY"),
    },
    pinecone_config={
        "api_key": os.getenv("PINECONE_API_KEY"),
        "environment": os.getenv("PINECONE_ENVIRONMENT"),
    },
)
Pipeline
We designed the Pipeline class to specify the static, global configuration of a pipeline. We expect you to use this global Pipeline reference to create additional PipelineRun instances via pipeline.start(). More on that below.
To create a PipelineRun, invoke the following code. The returned runner allows you to interact with providers like OpenAI and Pinecone.
typescript
const runner = await pipeline.start();
python
runner = pipeline.start()
To access a handle on a supported provider like OpenAI or Pinecone, invoke the following code.
typescript
// For OpenAI
const openAi = runner.openai;

// For Pinecone
const pinecone = runner.pinecone;
python
# For OpenAI
open_ai = runner.get_openai()

# For Pinecone
pinecone = runner.get_pinecone()
You can then access methods for these external services on the handlers. These clients are nearly API-compatible with their equivalent official SDKs. There are a few key differences we’ll get into later when we cover each provider in detail.
Submission
Once you've invoked all the requests you need, you can submit this data to our service with the following code. This functionality asynchronously sends the PipelineRun data to our servers and returns a PipelineRun ID that you can send to your client.
typescript
const { pipelineRunId } = await runner.submit()
If you want to wait for the result of the submission request, you can invoke the following.
typescript
const { pipelineRunId } = await runner.submit({
  waitForServer: true,
});

// This will block until the request returns
python
info = runner.submit()

pipeline_run_id = info["pipelineRunId"]
If you want to wait for the result of the submission request, you can invoke the following.
python
info = runner.submit(wait_for_server=True)

# This will block until the request returns
If you want to wait for the submission request using Python's asynchronous primitives, you can invoke the following.
python
import asyncio


async def main():
    # Runner setup
    info = await runner.asubmit()
The PipelineRun ID is used to associate user feedback with the generated content. Pass this ID to your client application so that you can link user feedback to the corresponding AI-generated content. To facilitate this association, you can use the browser Feedback SDK.
Advanced SDK
@gentrace/openai@v4 or @gentrace/openai@v3
This section requires Gentrace's official OpenAI plugin. The plugin version matches the major version of the official OpenAI Node.JS SDK.
Our package provides a near type-match for the OpenAI Node.JS SDK. To get an instrumented version of the OpenAI SDK, simply invoke the following code.
typescript
const openaiPlugin = await initPlugin({
  apiKey: process.env.OPENAI_KEY,
});

const pipeline = new Pipeline({
  slug: "openai",
  plugins: {
    openai: openaiPlugin,
  },
});

const runner = pipeline.start();

const openai = runner.openai;

const embeddingResponse = await openai.embeddings.create({
  model: "text-embedding-ada-002",
  input: "What is Vivek's birthday?",
});
You can then invoke functions against the resulting handle that match the official SDK.
Note that in the Simple SDK, you had to specify a pipelineSlug for your invocations. If you're using the Pipeline object, the Pipeline slug is declared explicitly in the Pipeline object constructor. Similarly, the result of an invocation will not return a PipelineRun ID.
Our library requires the official OpenAI Python SDK to be installed. You can install their official package with pip install openai or poetry add openai.
Our package provides a near type-match for the OpenAI Python SDK. To get an instrumented version of the OpenAI SDK, simply invoke the following code.
python
runner = pipeline.start()

openai = runner.get_openai()

embedding_response = openai.embeddings.create(
    model="text-embedding-ada-002",
    input="What is Vivek's birthday?",
)
You can then invoke functions against the resulting handle (in this case, the openai variable) that match the official SDK.
Note that in the Simple SDK, you had to specify a pipeline_slug for your invocations. If you're using the Pipeline object, the Pipeline slug is declared explicitly in the Pipeline object constructor. Similarly, the result of an invocation will not return a PipelineRun ID.
Configuration
To configure Gentrace's Node.JS SDK with OpenAI, you must initialize a plugin using the initPlugin() method exported from every Gentrace plugin. Then, pass the same parameter object that you would pass to the OpenAI constructor as the first parameter to initPlugin().
typescript
import { init, Pipeline } from "@gentrace/core";
import { initPlugin } from "@gentrace/openai";

// The provided parameter object has the same types as the OpenAI constructor.
const openaiPlugin = await initPlugin({
  apiKey: process.env.OPENAI_KEY,
});

const pipeline = new Pipeline({
  slug: "searchCompanyKnowledge",
  // ... other configuration
  plugins: {
    openai: openaiPlugin,
  },
});
To configure Gentrace's Python SDK with OpenAI, pass a dictionary of OpenAI configuration parameters as the openai_config keyword parameter.
python
import gentrace
import os

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

pipeline = gentrace.Pipeline(
    slug="write-email",
    openai_config={
        "api_key": os.getenv("OPENAI_KEY"),
    },
)
The configuration options in the dictionary (e.g. the api_key parameter) are mapped onto the related OpenAI() or AsyncOpenAI() constructor parameters.
python
import os

from openai import OpenAI, AsyncOpenAI

openai = OpenAI(
    # These map to the `openai_config` param dictionary
    api_key=os.getenv("OPENAI_KEY")
)

async_openai = AsyncOpenAI(
    # These map to the `openai_config` param dictionary
    api_key=os.getenv("OPENAI_KEY")
)
Asynchronous commands
We support instrumenting the asynchronous methods of OpenAI's Python SDK in our advanced SDK. The key is to specify runner.get_openai(asynchronous=True) instead of runner.get_openai().
python
import asyncio
import os

from dotenv import load_dotenv

load_dotenv()

import gentrace

PIPELINE_SLUG = "compose"

gentrace.init(api_key=os.getenv("GENTRACE_API_KEY"))

pipeline = gentrace.Pipeline(
    PIPELINE_SLUG,
    openai_config={"api_key": os.getenv("OPENAI_KEY")},
)


async def main():
    runner = pipeline.start()

    # ✅ - key difference
    openai = runner.get_openai(asynchronous=True)

    response = await openai.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "Write a brief history of Maine.",
            },
        ],
        model="gpt-3.5-turbo",
    )

    print("LLM response:", response.choices[0].message.content)

    runner.submit()


asyncio.run(main())
Prompt templates
The only difference between the interface of the Gentrace-instrumented SDK and official SDK is how prompts are specified. Consult the section about prompt templates in the OpenAI Simple SDK section earlier in this guide for more information.
Telemetry support
We automatically capture analytics from these OpenAI SDK methods. We plan to support other methods upon request.
In the list below, openai is an instance of the OpenAI class.
- openai.beta.chat.completions.parse()
- openai.chat.completions.create()
- openai.embeddings.create()
Full example
Here's a full example of a PipelineRun invocation with multiple calls to OpenAI and Pinecone (docs here).
typescript
export async function generateKnowledgeResponse(input: string) {
  const runner = pipeline.start();

  // Near type matches for the respective clients
  const openai = runner.openai;
  const pinecone = runner.pinecone;

  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input,
  });

  const vectorQuery = // process embedding response

  // getPinecone() returns a near type match for the Pinecone client
  const vectorResponse = await pinecone.Index('main').query(vectorQuery);

  const snippets = // process vectorResponse

  const response = await openai.completions.create({
    model: 'text-davinci-003',
    temperature: 0.7,
    // We do modify OpenAI in one way, splitting prompt into template and inputs
    // this allows us to monitor metrics changes due to the template
    promptTemplate: `Context:\n\n{{ snippets }}\n\n{{ input}}`,
    promptInputs: {
      input,
      snippets,
    },
  });

  // Data is submitted asynchronously
  await runner.submit();

  return response;
}
python
async def generate_knowledge_response(input: str):
    runner = pipeline.start()

    # Near type matches for the respective clients
    openai = runner.get_openai()
    pinecone = runner.get_pinecone()

    # Create embedding using OpenAI client
    embedding_response = openai.embeddings.create(
        model='text-embedding-ada-002',
        input=input
    )

    vector_query = ...  # process embedding response

    vector_response = pinecone.Index('main').query(vector_query)

    snippets = ...  # process vector_response

    # Create completion using OpenAI client
    response = openai.completions.create(
        model='text-davinci-003',
        temperature=0.7,
        prompt_template='Context:\n\n{{ snippets }}\n\n{{ input}}',
        prompt_inputs={
            'input': input,
            'snippets': snippets
        }
    )

    runner.submit()

    return response
Structured Outputs
Gentrace's OpenAI integration supports structured outputs for chat completions. This allows you to define a specific response structure, making it easier to use the generated content. For more details, see the OpenAI documentation on structured outputs.
Structured outputs are currently in beta with OpenAI. This feature may be subject to changes or updates as OpenAI continues to develop and refine it.
To use structured outputs with the TypeScript SDK, you can specify a response_format parameter in your chat completion request. Here's an example:
typescript
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const Step = z.object({
  explanation: z.string(),
  output: z.string(),
});

const MathReasoning = z.object({
  steps: z.array(Step),
  final_answer: z.string(),
});

// Omit Gentrace pipeline initialization...
const runner = pipeline.start();

const completion = await runner.openai.beta.chat.completions.parse({
  model: "gpt-4o-2024-08-06",
  messages: [
    {
      role: "system",
      content:
        "You are a helpful math tutor. Guide the user through the solution step by step.",
    },
    { role: "user", content: "how can I solve 8x + 7 = -23" },
  ],
  response_format: zodResponseFormat(MathReasoning, "math_reasoning"),
  gentrace: {
    metadata: {
      problemType: {
        type: "string",
        value: "linear_equation",
      },
    },
  },
});
To use structured outputs with the Python SDK, you can specify a response_format parameter in your chat completion request. Here's an example:
python
from pydantic import BaseModel


class Step(BaseModel):
    explanation: str
    output: str


class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str


# Assumes `openai` is a gentrace.AsyncOpenAI instance and this runs inside an async function
result = await openai.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful math tutor. "
            "Guide the user through the solution step by step.",
        },
        {"role": "user", "content": "how can I solve 8x + 7 = -23"},
    ],
    response_format=MathReasoning,
    pipeline_slug="math-reasoning-pipeline",
    gentrace={
        "metadata": {
            "problem_type": {
                "type": "string",
                "value": "algebra"
            }
        }
    },
)
When using structured outputs with the Gentrace SDK, you can:
- View parsed information and any refusal details in the Gentrace UI, allowing you to verify the model's response against your defined structure
- Quickly identify any parsing issues or refusals
- Define evaluators or processors that validate the presence of structured data