What is Distributed Tracing?

Distributed tracing lets you track complex LLM workflows where one request triggers multiple child requests. Visualize the entire execution tree, understand dependencies, and debug multi-step AI operations.

When to Use Traces

Agent Workflows

Track agents that make multiple LLM calls to plan, execute, and reflect

RAG Pipelines

Trace embedding generation, retrieval, and final generation steps

Parallel Processing

Monitor concurrent LLM calls and their relationships

Complex Chains

Debug LangChain, LlamaIndex, or custom chains

Traces vs Sessions

Sessions group related requests chronologically (like chat messages). Traces show parent-child relationships between requests (like function calls).

Feature         Sessions          Traces
Relationship    Sequential        Parent-Child
Use Case        Conversations     Workflows
Visualization   Timeline          Tree
Example         Multi-turn chat   Agent with sub-tasks

How Tracing Works

Helicone supports OpenTelemetry-style tracing with parent-child relationships:
Parent Request (Main task)
├── Child 1 (Subtask A)
│   ├── Grandchild 1 (Step A.1)
│   └── Grandchild 2 (Step A.2)
└── Child 2 (Subtask B)
    └── Grandchild 3 (Step B.1)
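
Under this scheme, a node's Helicone-Node-Id encodes its full ancestry, one colon-separated segment per level. A minimal sketch of the header values for the tree above (the UUIDs are hypothetical placeholders):

```python
import uuid

# Each node gets its own random ID; children prepend their parent's full ID.
parent = str(uuid.uuid4())
child1, child2 = str(uuid.uuid4()), str(uuid.uuid4())
step_a1 = str(uuid.uuid4())

node_ids = {
    "Parent Request": parent,                          # root: bare ID
    "Child 1":        f"{parent}:{child1}",            # one colon per level
    "Grandchild 1":   f"{parent}:{child1}:{step_a1}",
    "Child 2":        f"{parent}:{child2}",
}

# A node's depth can be read straight off its ID:
depth = {name: nid.count(":") + 1 for name, nid in node_ids.items()}
```

Because each ID contains its parent's full ID as a prefix, the tree can be reconstructed from the IDs alone.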

Using Node IDs

Create traces by setting parent-child relationships with the Helicone-Node-Id header:
Helicone-Node-Id
string
Unique identifier for this request node; ancestry is encoded as colon-separated IDs.
  • For root requests: {unique-id}
  • For child requests: {parent-id}:{child-id}
  • For deeper nesting, append one segment per level: {parent-id}:{child-id}:{grandchild-id}

Basic Trace Example

from openai import OpenAI
import uuid

client = OpenAI(
    api_key="YOUR_OPENAI_KEY",
    base_url="https://oai.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": "Bearer YOUR_HELICONE_KEY"
    }
)

# Parent request
parent_id = str(uuid.uuid4())
parent_response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Create a travel plan for Tokyo"}],
    extra_headers={
        "Helicone-Node-Id": parent_id
    }
)

# Child request 1: Research attractions
child1_id = str(uuid.uuid4())
child1_response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What are top attractions in Tokyo?"}],
    extra_headers={
        "Helicone-Node-Id": f"{parent_id}:{child1_id}"
    }
)

# Child request 2: Research restaurants
child2_id = str(uuid.uuid4())
child2_response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What are best restaurants in Tokyo?"}],
    extra_headers={
        "Helicone-Node-Id": f"{parent_id}:{child2_id}"
    }
)

# Grandchild request: Get specific restaurant details
grandchild_id = str(uuid.uuid4())
grandchild_response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me about Sukiyabashi Jiro"}],
    extra_headers={
        "Helicone-Node-Id": f"{parent_id}:{child2_id}:{grandchild_id}"
    }
)

Agent Trace Example

Track a ReAct-style agent:
import uuid
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENAI_KEY",
    base_url="https://oai.helicone.ai/v1",
    default_headers={"Helicone-Auth": "Bearer YOUR_HELICONE_KEY"}
)

def run_agent(task: str):
    agent_id = str(uuid.uuid4())
    
    # Step 1: Plan
    plan_id = str(uuid.uuid4())
    plan = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Plan how to accomplish: {task}"
        }],
        extra_headers={
            "Helicone-Node-Id": f"{agent_id}:{plan_id}",
            "Helicone-Property-Step": "planning"
        }
    )
    
    # Step 2: Execute actions
    # parse_plan is a user-defined helper that splits the plan text into actions
    actions = parse_plan(plan.choices[0].message.content)
    
    results = []
    for i, action in enumerate(actions):
        action_id = str(uuid.uuid4())
        result = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Execute action: {action}"
            }],
            extra_headers={
                "Helicone-Node-Id": f"{agent_id}:{action_id}",
                "Helicone-Property-Step": f"action_{i}"
            }
        )
        results.append(result.choices[0].message.content)
    
    # Step 3: Reflect
    reflect_id = str(uuid.uuid4())
    reflection = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Review these results and provide a final answer:\n{results}"
        }],
        extra_headers={
            "Helicone-Node-Id": f"{agent_id}:{reflect_id}",
            "Helicone-Property-Step": "reflection"
        }
    )
    
    return reflection

# Run agent
result = run_agent("Research AI safety concerns")
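
The example assumes a parse_plan helper, which is not part of any SDK. A naive version (an assumption, treating each non-empty line of the plan as one action) might look like:

```python
def parse_plan(plan_text: str) -> list[str]:
    """Naive plan parser: each non-empty line becomes one action."""
    actions = []
    for line in plan_text.splitlines():
        # Strip common list markers like "-" and "*" before keeping the line
        cleaned = line.strip().lstrip("-*• ").strip()
        if cleaned:
            actions.append(cleaned)
    return actions
```

In practice you would likely ask the model for structured output (e.g. JSON) and parse that instead.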

RAG Pipeline Trace

Trace retrieval-augmented generation:
import uuid
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENAI_KEY",
    base_url="https://oai.helicone.ai/v1",
    default_headers={"Helicone-Auth": "Bearer YOUR_HELICONE_KEY"}
)

def rag_query(query: str):
    trace_id = str(uuid.uuid4())
    
    # Step 1: Generate embedding for query
    embed_id = str(uuid.uuid4())
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
        extra_headers={
            "Helicone-Node-Id": f"{trace_id}:{embed_id}",
            "Helicone-Property-Stage": "embedding"
        }
    )
    
    # Step 2: Retrieve relevant documents (simulated)
    docs = vector_search(embedding.data[0].embedding)
    
    # Step 3: Generate answer with context
    gen_id = str(uuid.uuid4())
    answer = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using the provided context."},
            {"role": "user", "content": f"Context: {docs}\n\nQuestion: {query}"}
        ],
        extra_headers={
            "Helicone-Node-Id": f"{trace_id}:{gen_id}",
            "Helicone-Property-Stage": "generation"
        }
    )
    
    return answer

result = rag_query("What is quantum computing?")
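
The vector_search call above is a placeholder for your own retrieval layer. A minimal in-memory sketch using cosine similarity (the store shape and the explicit store argument are assumptions for illustration) could be:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# store: list of {"text": str, "embedding": list[float]} built ahead of time
def vector_search(query_embedding: list[float], store: list[dict], top_k: int = 3) -> list[str]:
    # Rank documents by similarity to the query embedding, highest first
    ranked = sorted(
        store,
        key=lambda doc: cosine_similarity(query_embedding, doc["embedding"]),
        reverse=True,
    )
    return [doc["text"] for doc in ranked[:top_k]]
```

A real pipeline would use a vector database; only the retrieved documents matter to the trace, so the retrieval step itself does not appear as a Helicone node.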

Parallel Request Tracing

Trace concurrent requests:
import asyncio
import uuid
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="YOUR_OPENAI_KEY",
    base_url="https://oai.helicone.ai/v1",
    default_headers={"Helicone-Auth": "Bearer YOUR_HELICONE_KEY"}
)

async def parallel_analysis(topic: str):
    parent_id = str(uuid.uuid4())
    
    # Create multiple parallel analysis tasks
    tasks = [
        analyze_aspect(parent_id, topic, "technical", "Technical analysis"),
        analyze_aspect(parent_id, topic, "business", "Business analysis"),
        analyze_aspect(parent_id, topic, "ethical", "Ethical analysis"),
    ]
    
    results = await asyncio.gather(*tasks)
    return results

async def analyze_aspect(parent_id: str, topic: str, aspect: str, prompt: str):
    child_id = str(uuid.uuid4())
    
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"{prompt} of {topic}"}],
        extra_headers={
            "Helicone-Node-Id": f"{parent_id}:{child_id}",
            "Helicone-Property-Aspect": aspect
        }
    )
    
    return response

# Run parallel analysis
results = asyncio.run(parallel_analysis("AI regulation"))

Custom Trace Logging

For non-OpenAI requests or custom tracing:
// Log custom traces via API
await fetch('https://api.helicone.ai/v1/trace/custom/log', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_HELICONE_KEY',
    'Content-Type': 'application/json',
    'Helicone-Node-Id': `${parentId}:${childId}`
  },
  body: JSON.stringify({
    providerRequest: {
      url: "https://api.anthropic.com/v1/messages",
      json: {
        model: "claude-3-opus-20240229",
        messages: [{ role: "user", content: "Hello" }]
      },
      meta: { 'Helicone-Auth': 'Bearer YOUR_HELICONE_KEY' }
    },
    providerResponse: {
      json: responseData,
      status: 200,
      headers: {}
    },
    timing: {
      startTime: { seconds: startTime, nanos: 0 },
      endTime: { seconds: endTime, nanos: 0 }
    },
    provider: "anthropic"
  })
});
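
The same body can be assembled in Python. This sketch only builds the JSON payload shown above (field names mirror the snippet; sending it with your HTTP client of choice is left out):

```python
import time

def build_trace_payload(provider_request: dict, response_json: dict,
                        start_seconds: int, end_seconds: int,
                        provider: str = "anthropic") -> dict:
    # Mirrors the body of the fetch() call above
    return {
        "providerRequest": provider_request,
        "providerResponse": {
            "json": response_json,
            "status": 200,
            "headers": {},
        },
        "timing": {
            "startTime": {"seconds": start_seconds, "nanos": 0},
            "endTime": {"seconds": end_seconds, "nanos": 0},
        },
        "provider": provider,
    }

payload = build_trace_payload(
    provider_request={
        "url": "https://api.anthropic.com/v1/messages",
        "json": {"model": "claude-3-opus-20240229",
                 "messages": [{"role": "user", "content": "Hello"}]},
        "meta": {"Helicone-Auth": "Bearer YOUR_HELICONE_KEY"},
    },
    response_json={"content": "Hi!"},  # hypothetical response body
    start_seconds=int(time.time()) - 2,
    end_seconds=int(time.time()),
)
```

Remember to also send the Helicone-Node-Id header with the POST, as in the JavaScript example.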

Viewing Traces

Visualize traces in the Helicone dashboard:
  1. Navigate to Requests: go to the Requests page and find a traced request.
  2. View Trace Tree: click the trace icon to see the full parent-child hierarchy.
  3. Analyze Each Node: click any node to see its request details, cost, and latency.
  4. Identify Bottlenecks: find slow or expensive operations in the trace tree.

Trace Metrics

Helicone calculates metrics across traces:
  • Total Cost: Sum of all nodes in the trace
  • Total Duration: Time from root start to last leaf completion
  • Node Count: Number of requests in the trace
  • Max Depth: Deepest level in the trace tree
  • Success Rate: Percentage of successful nodes
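
Some of these metrics can also be derived client-side from the node IDs alone. A small sketch (the per-node costs are hypothetical numbers, not Helicone output):

```python
def trace_metrics(nodes: dict[str, float]) -> dict:
    """nodes maps Helicone-Node-Id -> cost in USD for that request."""
    return {
        "node_count": len(nodes),
        "total_cost": round(sum(nodes.values()), 6),
        # Depth = number of colon-separated segments in the deepest ID
        "max_depth": max(node_id.count(":") + 1 for node_id in nodes),
    }

metrics = trace_metrics({
    "root": 0.002,
    "root:child-a": 0.001,
    "root:child-a:step-1": 0.0005,
})
```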

Best Practices

Generate unique IDs but keep them traceable:
import uuid
parent_id = str(uuid.uuid4())
child_id = str(uuid.uuid4())
node_id = f"{parent_id}:{child_id}"
Use custom properties to annotate trace nodes:
extra_headers={
    "Helicone-Node-Id": node_id,
    "Helicone-Property-Step": "planning",
    "Helicone-Property-Iteration": "1"
}
Keep traces manageable. Very deep traces (>10 levels) can be hard to visualize and debug.
Use both tracing (for workflow structure) and sessions (for conversation context):
extra_headers={
    "Helicone-Node-Id": f"{parent_id}:{child_id}",
    "Helicone-Session-Id": session_id
}
Continue tracing even if some nodes fail. This helps debug failures:
try:
    result = make_llm_call(node_id)  # your wrapped LLM call
except Exception as e:
    # Log the error but keep going so failed nodes still appear in the trace
    log_error(node_id, e)  # your error logger

Tracing Integrations

OpenTelemetry

Helicone supports OTEL trace format for compatibility with existing instrumentation

LangChain

Automatic tracing for LangChain chains and agents

LlamaIndex

Trace RAG pipelines and query engines

Custom Frameworks

Use custom trace logging API for any framework

Next Steps

Session Tracking

Learn about grouping related requests

Custom Properties

Add metadata to trace nodes

Request Logging

Understand individual request tracking

User Metrics

Track traces per user