Learn how to quickly identify and fix issues in production LLM applications using Helicone’s debugging features.

What You’ll Learn

  • Set up comprehensive request logging
  • Use filters to isolate problematic requests
  • Debug errors and unexpected outputs
  • Track prompt performance over time
  • Identify and fix latency issues

Prerequisites

  • Helicone API key (available from your Helicone dashboard)
  • An LLM application in production
  • Basic understanding of your application’s architecture

Common Debugging Scenarios

This tutorial covers:
  1. Finding and fixing errors (4XX/5XX)
  2. Debugging unexpected model outputs
  3. Identifying latency bottlenecks
  4. Tracking down cost spikes
  5. Investigating user-reported issues

Step 1: Enable Comprehensive Logging

Add headers to capture debugging context:
import { OpenAI } from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

// Add debugging context to every request
function makeRequest(userId: string, feature: string, input: string) {
  return client.chat.completions.create(
    {
      model: "gpt-4o",
      messages: [{ role: "user", content: input }],
    },
    {
      headers: {
        // Essential debugging headers
        "Helicone-User-Id": userId,
        "Helicone-Property-Feature": feature,
        "Helicone-Property-Environment": process.env.NODE_ENV,
        "Helicone-Property-Version": "v2.1.0",
        
        // Optional: Add custom request ID for correlation
        "Helicone-Request-Id": `${feature}-${Date.now()}`,
      },
    }
  );
}
Key Debugging Headers:
  • Helicone-User-Id: Identify which users experience issues
  • Helicone-Property-Feature: Isolate problems to specific features
  • Helicone-Property-Environment: Separate dev/staging/production issues
  • Helicone-Property-Version: Track which code version has problems

Scenario 1: Finding and Fixing Errors

Problem: Users reporting 500 errors

Step 1: Navigate to the Requests Dashboard

Step 2: Filter for Errors

Apply filters:
Status Code: 500 (or 4XX/5XX)
Time Range: Last 24 hours
Environment: production
Step 3: Identify Patterns

Look at the error list:
  • Are errors concentrated on a specific feature?
  • Affecting specific users?
  • Started at a specific time?
Example findings:
23 errors in "document-analysis" feature
All started after 2:30 PM
Error message: "Rate limit exceeded"
Step 4: Inspect Request Details

Click on an error to see:
  • Full request payload
  • Error response
  • Model used
  • Request headers
  • Timestamp
{
  "error": {
    "message": "Rate limit exceeded for gpt-4",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
Step 5: Fix the Issue

Based on findings:
// Add retry logic with exponential backoff
import { setTimeout } from 'timers/promises';

async function makeRequestWithRetry(
  userId: string,
  feature: string,
  input: string,
  maxRetries = 3
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await makeRequest(userId, feature, input);
    } catch (error: any) {
      if (error.status === 429 && attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        console.log(`Rate limited, retrying in ${delay}ms...`);
        await setTimeout(delay);
      } else {
        throw error;
      }
    }
  }
}
Step 6: Monitor the Fix

Set up an alert to catch future issues:
  1. Go to Settings → Alerts
  2. Create alert:
    • Metric: Error Rate
    • Threshold: > 5%
    • Time window: 10 minutes
    • Filter: Feature = “document-analysis”
  3. Add Slack/email notification

Scenario 2: Debugging Unexpected Outputs

Problem: Model generating incorrect format

Step 1: Find Problematic Requests

Filter requests:
Feature: data-extraction
Time Range: Last 7 days
Sort by: Recent first
Step 2: Review Request/Response

Click on a request to see:
// Request
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "Extract data as JSON"
    },
    {
      "role": "user",
      "content": "Name: John Doe, Age: 30"
    }
  ]
}

// Response (incorrect)
"The person's name is John Doe and they are 30 years old."

// Expected
{"name": "John Doe", "age": 30}
Step 3: Identify the Issue

The prompt is too vague. The model needs clearer instructions.
Step 4: Test Fix in Dashboard

Use Helicone’s prompt testing feature or test locally:
// Improved prompt
const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `Extract data and respond ONLY with valid JSON.
        
Format: {"name": "string", "age": number}
        
Do not include any explanation or additional text.`
      },
      {
        role: "user",
        content: "Name: John Doe, Age: 30"
      }
    ],
    temperature: 0,  // More deterministic
  },
  {
    headers: {
      "Helicone-Property-Version": "v2.2.0",  // Track new version
    }
  }
);
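Prompt instructions alone can still drift. As a belt-and-braces measure, OpenAI's Chat Completions API also accepts response_format: { type: "json_object" }, which constrains the model to emit syntactically valid JSON (note the API requires the word "JSON" to appear somewhere in your messages). A sketch of the request payload:

```typescript
// Request payload combining prompt instructions with enforced JSON output.
// Pass this as the first argument to client.chat.completions.create(...).
const extractionRequest = {
  model: "gpt-4o",
  messages: [
    {
      role: "system" as const,
      content:
        'Extract data and respond ONLY with valid JSON. Format: {"name": "string", "age": number}',
    },
    { role: "user" as const, content: "Name: John Doe, Age: 30" },
  ],
  // Constrains output to valid JSON; the prompt must still mention JSON.
  response_format: { type: "json_object" as const },
  temperature: 0, // More deterministic
};
```

This doesn't guarantee your exact schema, only well-formed JSON, so keep the explicit format instructions in the system prompt.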
Step 5: Compare Versions

After deploying, compare old vs. new:
Filter 1: Version = v2.1.0
Filter 2: Version = v2.2.0

Compare success rates, costs, latencies

Scenario 3: Identifying Latency Issues

Problem: Slow response times

Step 1: Filter Slow Requests

Latency: > 5000ms (5 seconds)
Time Range: Last 24 hours
Sort by: Latency (descending)
Step 2: Analyze Patterns

Look for:
  • Specific models (GPT-4 vs. GPT-4o-mini)
  • Request size (token count)
  • Features with long prompts
Example findings:
Feature: report-generation
Average latency: 12.3s
Token count: 8,500 tokens (very large)
Model: gpt-4o
Step 3: Optimize

// Before: Including entire document
const largePrompt = `Analyze this document:\n${fullDocument}`;

// After: Summarize or chunk first
const optimizedPrompt = `Analyze this summary:\n${summarizeDocument(fullDocument)}`;
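summarizeDocument above is left to your implementation. If summarization isn't an option, a simple chunking helper keeps each request small — a hypothetical sketch using the rough heuristic of ~4 characters per token:

```typescript
// Split a document into chunks that stay under a rough token budget.
// Assumes ~4 characters per token, a coarse heuristic for English text.
function chunkDocument(text: string, maxTokensPerChunk = 2000): string[] {
  const maxChars = maxTokensPerChunk * 4;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// Each chunk can then be analyzed in a separate request and the results merged.
```

Chunking also makes latency more predictable, since each request has a bounded prompt size.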
Step 4: Set Latency Alert

Create alert:
  • Metric: Latency
  • Threshold: P95 > 5000ms
  • Time window: 1 hour
  • Feature: report-generation

Scenario 4: Investigating Cost Spikes

Problem: Unexpected $500 charge

Step 1: View Cost Dashboard

Go to Helicone Dashboard and check:
  • Daily cost trend (when did spike occur?)
  • Cost by feature
  • Cost by user
Step 2: Filter High-Cost Requests

Date: [Date of spike]
Sort by: Cost (descending)
Findings:
Top request: $12.50 (!)  
User: user-789
Feature: document-analysis
Tokens: 125,000 (prompt) + 8,000 (completion)
Step 3: Investigate the Request

Click on expensive request:
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "[Entire 500-page PDF content]..."  // Problem!
    }
  ]
}
The user uploaded a massive document without chunking.
Step 4: Implement Safeguards

// Minimal User shape for illustration
interface User {
  tier: "free" | "paid";
  monthlySpend: number;
}

function validateInput(text: string): void {
  const estimatedTokens = Math.ceil(text.length / 4); // Rough estimate: ~4 chars per token
  const MAX_TOKENS = 50000;

  if (estimatedTokens > MAX_TOKENS) {
    throw new Error(
      `Input too large (~${estimatedTokens} tokens). Maximum: ${MAX_TOKENS}`
    );
  }
}

// Add a monthly spend limit per user
function checkCostLimit(user: User): void {
  if (user.tier === "free" && user.monthlySpend > 10) {
    throw new Error("Monthly limit reached. Upgrade to continue.");
  }
}
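To sanity-check a charge like the $12.50 request above, a rough per-request cost estimator helps. The per-million-token prices below are illustrative assumptions only — check your provider's current pricing:

```typescript
// Per-million-token prices in USD. These numbers are illustrative, not real pricing.
interface ModelPricing {
  promptPerMillion: number;
  completionPerMillion: number;
}

// Estimate the cost of a single request from its token counts.
function estimateCostUSD(
  promptTokens: number,
  completionTokens: number,
  pricing: ModelPricing
): number {
  return (
    (promptTokens / 1_000_000) * pricing.promptPerMillion +
    (completionTokens / 1_000_000) * pricing.completionPerMillion
  );
}

// Example: 125,000 prompt + 8,000 completion tokens at assumed prices.
const cost = estimateCostUSD(125_000, 8_000, {
  promptPerMillion: 30,
  completionPerMillion: 60,
});
```

Running this estimate before dispatching a request lets you reject or downgrade calls that would exceed a per-request budget.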

Scenario 5: User-Reported Issue

Problem: “User user-456 says chatbot gave wrong answer yesterday”

Step 1: Find User's Requests

Filter by:
- User ID: user-456
- Date: [Yesterday]
- Feature: chatbot
Step 2: Review Conversation

If using sessions:
Go to: Sessions
Filter: User ID = user-456
Find relevant session by timestamp
View entire conversation flow to understand context.
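Sessions are enabled per-request via headers. A hedged sketch of a small helper that builds them — Helicone-Session-Id, Helicone-Session-Path, and Helicone-Session-Name are the header names in Helicone's sessions docs at the time of writing; verify against the current docs:

```typescript
// Build session headers so related requests group into one conversation view.
function buildSessionHeaders(
  sessionId: string,
  path: string,
  name?: string
): Record<string, string> {
  const headers: Record<string, string> = {
    "Helicone-Session-Id": sessionId, // Groups requests into one session
    "Helicone-Session-Path": path,    // e.g. "/chat" or "/chat/follow-up"
  };
  if (name) headers["Helicone-Session-Name"] = name; // Human-readable label
  return headers;
}

// Usage: spread into the per-request headers alongside Helicone-User-Id, e.g.
// headers: { "Helicone-User-Id": "user-456", ...buildSessionHeaders(id, "/chat") }
```

With these headers in place, the Sessions view shows the user's whole conversation instead of isolated requests.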
Step 3: Identify the Issue

Review the specific request/response:
  • Was context missing?
  • Did model hallucinate?
  • Was there a misunderstanding?
Share findings with user:
Found the issue: The chatbot didn't have access to the
latest product pricing, which was updated yesterday morning.
We're adding a knowledge base refresh to fix this.

Advanced: Custom Request IDs

Correlate Helicone logs with your application logs:
import { randomUUID } from "node:crypto";

const appRequestId = randomUUID(); // Your app's correlation ID

// Log in your application
logger.info("Starting LLM request", { requestId: appRequestId });

// Use same ID in Helicone
await client.chat.completions.create(
  { /* ... */ },
  {
    headers: {
      "Helicone-Request-Id": appRequestId,
    }
  }
);

// Later, search Helicone by your ID
// URL: https://helicone.ai/requests?requestId=your-app-id-123

Best Practices

Add context headers: Include user ID, feature, environment, and version in every request
Use sessions for multi-step flows: Group related requests to see full context
Set up alerts early: Don’t wait for users to report issues
Compare before/after: Use version tags to measure impact of changes
Protect sensitive data: Remove sensitive information from prompts before logging, and keep API keys in environment variables or a secure vault

Debugging Checklist

When investigating an issue:
  • Filter by relevant properties (user, feature, environment)
  • Check error rates and status codes
  • Review request/response payloads
  • Look for patterns (time-based, user-based, feature-based)
  • Check related requests (sessions)
  • Compare with working requests
  • Test fix with version tracking
  • Set up alert to catch recurrence

Next Steps

Alerts

Set up proactive monitoring for errors and anomalies

Sessions

Track multi-step workflows for better context

Custom Properties

Add metadata for powerful filtering and debugging

Webhooks

Get notified immediately when issues occur