OpenAI API Complete Guide: From Chat Completions to Assistants

The OpenAI API has become the foundation for countless AI applications, from chatbots to code assistants to creative tools. But with the rapid evolution of models—from GPT-4 to GPT-4 Turbo to the recent GPT-4o—and the introduction of the Assistants API, understanding the full landscape can be overwhelming.

In this comprehensive guide, I’ll walk you through everything you need to build production applications with the OpenAI API, including practical code examples and best practices I’ve learned from deploying these systems at scale.

What You’ll Learn

  • Understanding the GPT-4 model family and when to use each
  • Chat Completions API for conversational applications
  • Function Calling for tool use and structured outputs
  • The Assistants API for stateful AI agents
  • Vision capabilities with GPT-4o
  • Production best practices: error handling, rate limits, and cost optimization

Figure 1: The GPT-4 Model Family – Comparing capabilities, context windows, and pricing

The GPT-4 Model Family

As of September 2024, OpenAI offers several GPT-4 variants, each optimized for different use cases:

Model       | Context Window | Best For                    | Input / Output (per 1M tokens)
gpt-4o      | 128K tokens    | Best overall, vision, fast  | $5 / $15
gpt-4o-mini | 128K tokens    | Cost-effective, fast        | $0.15 / $0.60
gpt-4-turbo | 128K tokens    | Complex reasoning, legacy   | $10 / $30
gpt-4       | 8K tokens      | Legacy applications         | $30 / $60

💡 Recommendation

For most new applications, gpt-4o is the best choice. It offers the best price-to-performance ratio, includes vision capabilities, and is optimized for speed. Use gpt-4o-mini for high-volume, cost-sensitive applications where top-tier quality isn’t critical.
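
A rough sketch of how that recommendation could be centralized in a small helper, so the rest of a codebase never hard-codes model names. The task categories and mapping are illustrative assumptions, not an official guideline:

# Hypothetical model router: map a task profile to a model name
def pick_model(task: str) -> str:
    simple_tasks = {"classification", "extraction", "summarization"}
    if task in simple_tasks:
        return "gpt-4o-mini"  # high-volume, cost-sensitive work
    return "gpt-4o"           # default: best price/performance, includes vision

print(pick_model("classification"))  # -> gpt-4o-mini
print(pick_model("code-review"))     # -> gpt-4o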

Getting Started

Installation

# Install the OpenAI Python SDK
pip install openai

# Async support (AsyncOpenAI) is included out of the box;
# httpx is installed automatically as a dependency of the SDK

Authentication

from openai import OpenAI
import os

# Initialize the client
# The SDK will automatically use OPENAI_API_KEY environment variable
client = OpenAI()

# Or explicitly pass the key
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# For organization-specific billing
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    organization=os.environ.get("OPENAI_ORG_ID"),
)

Chat Completions API

The Chat Completions API is the core interface for interacting with GPT models:

Basic Usage

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant specializing in Python programming."
        },
        {
            "role": "user",
            "content": "How do I read a JSON file in Python?"
        }
    ],
    temperature=0.7,
    max_tokens=500,
)

# Extract the response
answer = response.choices[0].message.content
print(answer)

# Access usage statistics
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

Multi-Turn Conversations

from openai import OpenAI

class Conversation:
    """Manage multi-turn conversations with GPT-4o"""
    
    def __init__(self, system_prompt: str, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
        self.messages = [
            {"role": "system", "content": system_prompt}
        ]
    
    def chat(self, user_message: str) -> str:
        # Add user message to history
        self.messages.append({"role": "user", "content": user_message})
        
        # Get response
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            temperature=0.7,
        )
        
        assistant_message = response.choices[0].message.content
        
        # Add assistant response to history
        self.messages.append({"role": "assistant", "content": assistant_message})
        
        return assistant_message
    
    def clear_history(self):
        """Keep only the system prompt"""
        self.messages = [self.messages[0]]


# Usage
conv = Conversation("You are a helpful coding assistant.")
print(conv.chat("What is a decorator in Python?"))
print(conv.chat("Can you show me an example?"))  # Model remembers context

Function Calling

Function calling lets GPT models return structured requests for functions you define; your code executes them and passes the results back, which makes the models ideal for building agents and tool-using applications:

Figure 2: Function Calling Flow – How GPT models invoke external tools

import json
from openai import OpenAI

client = OpenAI()

# Define available functions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search the product database",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    },
                    "category": {
                        "type": "string",
                        "enum": ["electronics", "clothing", "books"],
                        "description": "Product category to filter by"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Actual function implementations
def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    """Mock weather API"""
    return {
        "location": location,
        "temperature": 72 if unit == "fahrenheit" else 22,
        "unit": unit,
        "conditions": "sunny"
    }

def search_database(query: str, category: str = None) -> dict:
    """Mock database search"""
    return {
        "results": [
            {"name": f"{query} Product 1", "price": 29.99},
            {"name": f"{query} Product 2", "price": 49.99},
        ],
        "total": 2
    }

# Function dispatcher
available_functions = {
    "get_weather": get_weather,
    "search_database": search_database,
}

def run_conversation(user_message: str):
    messages = [{"role": "user", "content": user_message}]
    
    # First API call
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # Let the model decide
    )
    
    response_message = response.choices[0].message
    
    # Check if the model wants to call functions
    if response_message.tool_calls:
        messages.append(response_message)
        
        # Execute each function call
        for tool_call in response_message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
            
            # Call the function
            function_response = available_functions[function_name](**function_args)
            
            # Add function response to messages
            messages.append({
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": json.dumps(function_response),
            })
        
        # Get final response with function results
        second_response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
        )
        
        return second_response.choices[0].message.content
    
    return response_message.content


# Usage
print(run_conversation("What's the weather in New York?"))
print(run_conversation("Search for laptops in electronics"))

Vision Capabilities

GPT-4o includes powerful vision capabilities, allowing you to analyze images:

from openai import OpenAI
import base64
import mimetypes

client = OpenAI()

# Method 1: URL-based image
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image? Describe it in detail."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg",
                        "detail": "high"  # or "low" for faster/cheaper processing
                    }
                }
            ]
        }
    ],
    max_tokens=500,
)

print(response.choices[0].message.content)

# Method 2: Base64-encoded image (for local files)
def analyze_local_image(image_path: str, prompt: str) -> str:
    # Derive the MIME type from the extension (image/png, image/jpeg, ...)
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    with open(image_path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        max_tokens=500,
    )
    
    return response.choices[0].message.content

# Usage
result = analyze_local_image("screenshot.png", "Extract all text from this screenshot")

The Assistants API

The Assistants API enables you to build stateful AI assistants with persistent threads, file handling, and built-in tools:

from openai import OpenAI
import time

client = OpenAI()

# Step 1: Create an Assistant
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="""You are a data analyst assistant. 
    You can analyze CSV files, generate insights, and create visualizations.
    Always explain your analysis clearly.""",
    model="gpt-4o",
    tools=[
        {"type": "code_interpreter"},  # Can execute Python code
        {"type": "file_search"},       # Can search through files
    ]
)

print(f"Created assistant: {assistant.id}")

# Step 2: Create a Thread (represents a conversation)
thread = client.beta.threads.create()
print(f"Created thread: {thread.id}")

# Step 3: Add a message to the thread
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyze the sales data and identify the top 3 products by revenue."
)

# Step 4: Run the assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Step 5: Wait for completion (polling)
def wait_for_run(thread_id: str, run_id: str) -> str:
    while True:
        run_status = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run_id
        )
        
        if run_status.status == "completed":
            # Get the assistant's response
            messages = client.beta.threads.messages.list(thread_id=thread_id)
            return messages.data[0].content[0].text.value
        elif run_status.status == "failed":
            raise Exception(f"Run failed: {run_status.last_error}")
        elif run_status.status in ["queued", "in_progress"]:
            time.sleep(1)
        else:
            raise Exception(f"Unexpected status: {run_status.status}")

response = wait_for_run(thread.id, run.id)
print(response)

# Continue the conversation in the same thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Now create a bar chart of these top products."
)

run2 = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

response2 = wait_for_run(thread.id, run2.id)
print(response2)

File Upload with Assistants

# Upload a file
file = client.files.create(
    file=open("sales_data.csv", "rb"),
    purpose="assistants"
)

# Create a message with the file
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyze this sales data file",
    attachments=[
        {
            "file_id": file.id,
            "tools": [{"type": "code_interpreter"}]
        }
    ]
)

Streaming Responses

For real-time applications, streaming provides a better user experience:

from openai import OpenAI

client = OpenAI()

# Streaming chat completions
def stream_response(user_message: str):
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
        stream=True,
    )
    
    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            print(content, end="", flush=True)
            full_response += content
    
    print()  # New line at the end
    return full_response


# Async streaming (for web applications)
async def async_stream_response(user_message: str):
    from openai import AsyncOpenAI
    
    async_client = AsyncOpenAI()
    
    stream = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
        stream=True,
    )
    
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# FastAPI example
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/stream")
async def stream_endpoint(prompt: str):
    async def generate():
        async for chunk in async_stream_response(prompt):
            yield f"data: {chunk}\n\n"
        yield "data: [DONE]\n\n"
    
    return StreamingResponse(generate(), media_type="text/event-stream")

Production Best Practices

Error Handling and Retries

from openai import OpenAI, RateLimitError, APIError, APIConnectionError
import time
from functools import wraps

def retry_with_exponential_backoff(
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
):
    """Decorator for retrying OpenAI API calls with exponential backoff"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0
            delay = base_delay
            
            while retries <= max_retries:
                try:
                    return func(*args, **kwargs)
                except RateLimitError as e:
                    retries += 1
                    if retries > max_retries:
                        raise
                    
                    # Honor the Retry-After header if the server sent one
                    retry_after = e.response.headers.get("retry-after")
                    wait_time = float(retry_after) if retry_after else delay
                    
                    print(f"Rate limited. Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                    delay = min(delay * 2, max_delay)
                    
                except APIConnectionError as e:
                    retries += 1
                    if retries > max_retries:
                        raise
                    
                    print(f"Connection error. Retrying in {delay}s...")
                    time.sleep(delay)
                    delay = min(delay * 2, max_delay)
                    
                except APIError as e:
                    # Don't retry on client errors (4xx); status_code is only
                    # present on APIStatusError subclasses
                    status = getattr(e, "status_code", None)
                    if status and 400 <= status < 500:
                        raise
                        raise
                    
                    retries += 1
                    if retries > max_retries:
                        raise
                    
                    print(f"API error. Retrying in {delay}s...")
                    time.sleep(delay)
                    delay = min(delay * 2, max_delay)
        
        return wrapper
    return decorator


# Usage
@retry_with_exponential_backoff(max_retries=3)
def safe_completion(messages: list) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    return response.choices[0].message.content

Timeout Configuration

from openai import OpenAI
import httpx

# Configure timeouts
client = OpenAI(
    timeout=httpx.Timeout(
        connect=5.0,     # Connection timeout
        read=30.0,       # Read timeout
        write=10.0,      # Write timeout
        pool=10.0,       # Pool timeout
    ),
    max_retries=2,  # Built-in retry support
)
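
For one-off overrides, such as a longer timeout for a single large generation, the SDK's with_options helper avoids rebuilding the client; a brief sketch (the 60-second value is just an example):

# Override the timeout (or max_retries) for a single call
response = client.with_options(timeout=60.0).chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Draft a detailed project plan."}],
)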

Cost Optimization

Strategy         | Impact                   | Implementation
Use gpt-4o-mini  | ~30x cheaper than gpt-4o | Simpler tasks: classification, extraction
Response caching | 50-90% reduction         | Cache responses for repeated queries
Limit max_tokens | Variable                 | Set appropriate limits per use case
Batch processing | 50% discount             | Use the Batch API for non-time-sensitive tasks (sketch below)
Compress prompts | 10-30% reduction         | Remove redundant text, use abbreviations
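
To illustrate the batch row above, here is a minimal sketch of the Batch API workflow: write one request per line as JSONL, upload the file, and create a batch job. File names, custom IDs, and prompts are placeholders:

import json
from openai import OpenAI

client = OpenAI()

# 1. Write one chat completion request per line (JSONL)
prompts = ["Summarize document A", "Summarize document B"]
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(json.dumps({
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 200,
            },
        }) + "\n")

# 2. Upload the file and create the batch (completes within 24 hours)
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)

# 3. Later: poll client.batches.retrieve(batch.id), then download output_file_id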

Implementing Semantic Caching

import hashlib
import json
from typing import Optional
import redis

class SemanticCache:
    """Cache OpenAI responses to reduce API costs"""
    
    def __init__(self, redis_client: redis.Redis, ttl: int = 3600):
        self.redis = redis_client
        self.ttl = ttl
    
    def _generate_key(self, model: str, messages: list) -> str:
        """Generate a cache key from the request"""
        content = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return f"openai:cache:{hashlib.sha256(content.encode()).hexdigest()}"
    
    def get(self, model: str, messages: list) -> Optional[str]:
        """Get cached response if available"""
        key = self._generate_key(model, messages)
        cached = self.redis.get(key)
        return cached.decode() if cached else None
    
    def set(self, model: str, messages: list, response: str):
        """Cache a response"""
        key = self._generate_key(model, messages)
        self.redis.setex(key, self.ttl, response)


# Usage with OpenAI
def cached_completion(
    client: OpenAI,
    cache: SemanticCache,
    model: str,
    messages: list
) -> str:
    # Check cache first
    cached = cache.get(model, messages)
    if cached:
        return cached
    
    # Make API call
    response = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    
    result = response.choices[0].message.content
    
    # Cache the result
    cache.set(model, messages, result)
    
    return result

Key Takeaways

  • gpt-4o is the best general-purpose model with vision support
  • gpt-4o-mini offers excellent value for high-volume applications
  • Function calling enables building powerful tool-using agents
  • The Assistants API simplifies building stateful applications
  • Always implement error handling and retries in production
  • Use caching and batching to optimize costs
  • Stream responses for better user experience


The OpenAI API continues to evolve rapidly. Stay updated with the official documentation and changelog for the latest features and improvements. With these fundamentals in place, you’re ready to build powerful AI applications.

Building something interesting with the OpenAI API? I’d love to hear about it—connect with me on LinkedIn to share your projects.

