Architecture Overview
Helicone consists of five main components working together to provide seamless LLM observability.

Core Components
1. Worker (Cloudflare Workers)
Location: worker/
Technology: TypeScript on Cloudflare’s global edge network
Purpose: Request proxy and logging
The Worker is the entry point for all LLM requests. It intercepts requests, forwards them to the appropriate provider, and logs metadata with minimal latency overhead (less than 50ms).
Responsibilities:
- Proxy requests to LLM providers (OpenAI, Anthropic, Google, etc.)
- Extract request/response metadata (tokens, latency, cost)
- Implement caching, rate limiting, and fallback logic
- Stream responses back to clients without buffering
- Asynchronously send logs to Jawn
Performance:
- Deployed to 300+ global edge locations
- Less than 50ms latency overhead on average
- Handles 10,000+ requests/second per deployment
- Automatic scaling with no cold starts
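The Worker's provider-routing step can be sketched as a simple lookup from provider name to upstream base URL. This is an illustrative sketch only: the provider table and function names below are assumptions, not Helicone's actual Worker code.

```typescript
// Sketch of the Worker's provider-routing step (illustrative; the
// provider table and function names are assumptions, not Helicone code).
const PROVIDER_BASE_URLS: Record<string, string> = {
  openai: "https://api.openai.com",
  anthropic: "https://api.anthropic.com",
  google: "https://generativelanguage.googleapis.com",
};

// Resolve the upstream URL for an incoming proxied request path.
function resolveUpstream(provider: string, path: string): string {
  const base = PROVIDER_BASE_URLS[provider];
  if (!base) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  return `${base}${path}`;
}
```

In a real Worker, the resolved URL would be passed to `fetch()` and the upstream response streamed straight back to the client.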
2. Jawn (API Server)
Location: valhalla/jawn/
Technology: Express + TypeScript with Tsoa for type-safe APIs
Purpose: Centralized API server for log collection and queries
Jawn (named after the Philadelphia slang for “thing”) is the heart of Helicone’s backend. It collects logs from Workers, serves the Web dashboard, and provides the REST API.
Responsibilities:
- Receive logs from Workers and insert into ClickHouse
- Serve REST API for dashboard queries
- Handle webhooks, exports, and integrations
- Manage prompt versioning and deployment
- Process batch operations and analytics queries
- Implement authentication and authorization
Key endpoints:
- POST /v1/request/query - Query requests with filters
- POST /v1/session/query - Retrieve session traces
- POST /v1/prompt-2025 - Create/update prompts
- GET /v1/request/{id} - Fetch individual request details
Tech stack:
- Express.js for HTTP routing
- Tsoa for OpenAPI spec generation
- Auto-generated TypeScript types for type safety
3. Web (Next.js Dashboard)
Location: web/
Technology: Next.js 14 (App Router) + React + TailwindCSS
Purpose: User-facing dashboard and UI
The Web component provides the visual interface for exploring requests, debugging sessions, managing prompts, and analyzing costs.
Key features:
- Request explorer with advanced filtering
- Session tree visualization for multi-step workflows
- Cost and usage dashboards with charts
- Prompt management UI with versioning
- Playground for testing prompts
- User settings and API key management
Tech stack:
- Next.js 14 (App Router) for SSR and routing
- React for components
- TailwindCSS for styling
- Tremor for charts and analytics
- Recharts for custom visualizations
4. Supabase (Application Database)
Location: supabase/
Technology: PostgreSQL + Supabase Auth
Purpose: User authentication and application state
Supabase stores user accounts, API keys, organization settings, prompt metadata, and configuration. It does NOT store request/response bodies—those go to ClickHouse.
Key tables:
- auth.users - User accounts and authentication
- organization - Multi-tenant org structure
- prompt_v2 - Prompt definitions and versions
- api_keys - API key management
- webhooks - Webhook configurations
- rate_limits - Custom rate limit rules
Why Supabase:
- Built-in authentication with JWTs
- Real-time subscriptions for live updates
- Row-level security (RLS) for multi-tenancy
- Easy local development with Docker
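The effect of row-level security is that every query is implicitly scoped to the caller's organization. The sketch below illustrates that idea in TypeScript; the `JwtClaims` and `OrgRow` shapes are hypothetical, not Helicone's actual schema, and in production the filtering happens inside PostgreSQL via RLS policies, not in application code.

```typescript
// Illustrative sketch of what an RLS policy like
// `organization = auth.jwt()->>'org_id'` amounts to.
// Field names here are hypothetical, not Helicone's schema.
interface JwtClaims { sub: string; org_id: string; }
interface OrgRow { id: number; organization: string; name: string; }

// Only rows belonging to the caller's organization are visible.
function visibleRows(rows: OrgRow[], claims: JwtClaims): OrgRow[] {
  return rows.filter((r) => r.organization === claims.org_id);
}
```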
5. ClickHouse (Analytics Database)
Location: clickhouse/
Technology: ClickHouse (columnar OLAP database)
Purpose: High-performance storage for request logs and analytics
ClickHouse is a columnar database optimized for analytical queries. It enables Helicone to query millions of requests in milliseconds.
Why ClickHouse:
- 100-1000x faster than PostgreSQL for analytics queries
- Handles billions of rows without performance degradation
- Efficient compression (10:1 ratio on average)
- Columnar storage perfect for aggregations
Key tables:
- request_response_versioned - All LLM requests and responses
- request_response_log - Immutable append-only log
- properties_v2 - Custom properties (user IDs, tags, etc.)
- cache_hits - Cache performance metrics
Typical query performance:
- Full-text search across millions of requests: less than 100ms
- Cost aggregations by user: less than 200ms
- Session tree reconstruction: less than 50ms
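The cost aggregation above corresponds to a `GROUP BY` over the logs table (roughly `SELECT user_id, sum(cost) ... GROUP BY user_id` in SQL). As a sketch of the computation, here is the same aggregation over in-memory rows; the field names are illustrative, not the exact request_response_versioned schema.

```typescript
// Sketch of the per-user cost aggregation ClickHouse answers in
// milliseconds. Field names are illustrative, not the real schema.
interface LogRow { user_id: string; cost_usd: number; }

// Sum cost per user, as a GROUP BY would.
function costByUser(rows: LogRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.user_id, (totals.get(row.user_id) ?? 0) + row.cost_usd);
  }
  return totals;
}
```

Columnar storage makes this fast at scale because only the `user_id` and `cost_usd` columns need to be read, not whole rows.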
Request Flow Walkthrough
Let’s trace a single LLM request through the entire Helicone system.

Client sends request
Your application sends a request to https://ai-gateway.helicone.ai using the OpenAI SDK.

Worker intercepts request
The request hits Cloudflare’s global edge network and routes to the nearest Worker. The Worker:
- Authenticates the API key
- Checks for cached responses (if caching enabled)
- Determines the target provider (OpenAI, Anthropic, etc.)
Worker proxies to provider
The Worker forwards the request to the LLM provider’s API (e.g., api.openai.com). For streaming requests, the Worker streams chunks back to the client in real-time without buffering.

Provider responds
The LLM provider (OpenAI, Anthropic, etc.) sends the response back through the Worker. The Worker:
- Calculates latency (total time, time to first token)
- Counts tokens (from response headers or body)
- Calculates cost using Helicone’s pricing database
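The token-to-cost step can be sketched as a lookup against a pricing table. The per-million-token prices and the model name below are placeholders; Helicone's real pricing database tracks current rates for each provider and model.

```typescript
// Sketch of the token-to-cost step. Prices and model name are
// placeholders, not Helicone's actual pricing data.
interface ModelPrice { inputPerMillion: number; outputPerMillion: number; }

const PRICES: Record<string, ModelPrice> = {
  "example-model": { inputPerMillion: 2.5, outputPerMillion: 10 },
};

// Cost in USD for a single request.
function calculateCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) return 0; // in this sketch, unknown models log with zero cost
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000;
}
```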
Worker logs to Jawn (async)
The Worker asynchronously sends the log payload to Jawn via HTTP POST. This happens after the response is sent to the client, adding zero latency.
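A hedged sketch of what that log payload might look like follows; the field names and endpoint path are assumptions, not Jawn's documented contract.

```typescript
// Hypothetical shape of the log payload a Worker could POST to Jawn
// after responding to the client. Fields and path are assumptions.
interface LogPayload {
  requestId: string;
  model: string;
  latencyMs: number;
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
}

function buildLogPayload(p: LogPayload): string {
  return JSON.stringify(p);
}

// In a Worker, the POST would run via ctx.waitUntil(...) so it never
// blocks the client response, e.g.:
// ctx.waitUntil(fetch("https://jawn.example/v1/log", {
//   method: "POST",
//   headers: { "content-type": "application/json" },
//   body: buildLogPayload(payload),
// }));
```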
Jawn writes to ClickHouse
Jawn receives the log and inserts it into ClickHouse’s request_response_versioned table. Custom properties (user ID, environment, etc.) are stored in the properties_v2 table.

Dashboard queries Jawn
When you open the Web dashboard, it queries Jawn’s REST API. Jawn queries ClickHouse and returns results in less than 100ms, even with millions of rows.
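As a sketch, a dashboard query against POST /v1/request/query might send a filter body like the one built below. The filter field names are assumptions about the API's shape, not its documented schema.

```typescript
// Illustrative request body for POST /v1/request/query.
// Filter field names are assumptions, not the documented schema.
function buildRequestQuery(userId: string, sinceIso: string) {
  return {
    filter: {
      user_id: { equals: userId },
      created_at: { gte: sinceIso },
    },
    limit: 100,
    sort: { created_at: "desc" },
  };
}
```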
Deployment Options
Helicone offers three deployment models:

1. Helicone Cloud (Managed)
Best for: Most teams who want zero infrastructure management
- Fully managed by Helicone
- SOC 2 compliant
- 99.9% uptime SLA
- US and EU regions available
- Automatic updates and scaling
2. Self-Hosted (Docker)
Best for: Teams with compliance requirements or who want full control

Deploy Helicone in your own infrastructure using Docker Compose. Includes:
- All five components (Web, Worker, Jawn, Supabase, ClickHouse)
- MinIO for object storage
- Pre-configured with sane defaults
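A typical bring-up looks roughly like the following. The repository URL is Helicone's public GitHub repo; the exact compose invocation is an assumption, so check the repo's self-hosting docs for the current commands.

```shell
# Sketch of a self-hosted bring-up (verify against the repo's docs;
# the compose setup and flags here are assumptions).
git clone https://github.com/Helicone/helicone.git
cd helicone
docker compose up -d   # starts Web, Worker, Jawn, Supabase, ClickHouse, MinIO
```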
3. Enterprise (Helm Chart)
Best for: Large enterprises with Kubernetes clusters

Production-ready Helm chart for Kubernetes deployments:
- High availability with automatic failover
- Horizontal scaling for all components
- Observability with Prometheus/Grafana
- Backup and disaster recovery
Performance Characteristics
Latency Overhead
- P50: less than 30ms
- P95: less than 50ms
- P99: less than 100ms
Throughput
- 10,000+ requests/second per region
- Auto-scales to millions of requests/day
- No rate limits on logging
Storage
- Unlimited request storage
- 10:1 compression ratio
- Cost: ~$0.01 per 1,000 requests
Query Speed
- Full-text search: less than 100ms
- Aggregations: less than 200ms
- Session reconstruction: less than 50ms
Data Flow & Security
What Gets Stored Where?
| Data Type | Storage Location | Retention |
|---|---|---|
| Request/response bodies | ClickHouse | Unlimited (configurable) |
| User accounts & auth | Supabase (PostgreSQL) | Permanent |
| API keys | Supabase (encrypted) | Permanent |
| Prompt templates | Supabase | Permanent with versioning |
| Analytics aggregates | ClickHouse | Unlimited |
| Custom properties | ClickHouse | Unlimited |
Security Measures
Encryption
- In transit: All traffic uses TLS 1.3
- At rest: AES-256 encryption for stored data
- API keys: Hashed with bcrypt before storage
- Secrets: Managed via secure environment variables
Compliance
- SOC 2 Type II certified
- GDPR compliant with EU region option
- CCPA compliant
- Data residency: US and EU regions available
Data Privacy
- Omit logs: Disable request/response body storage
- PII redaction: Automatically detect and redact sensitive data
- Self-hosting: Full control over your data
Learn More About Security
Read our data autonomy and security documentation
Monitoring & Observability
Helicone itself is instrumented for observability:
- Metrics: Prometheus metrics for all components
- Logs: Structured logging with correlation IDs
- Tracing: Distributed tracing across Worker → Jawn → ClickHouse
- Alerts: PagerDuty integration for incidents
Scaling Considerations
Worker scaling (automatic)
Cloudflare Workers auto-scale globally with no configuration needed. You can handle sudden traffic spikes without manual intervention.
Jawn scaling (horizontal)
Deploy multiple Jawn instances behind a load balancer. Jawn is stateless and scales horizontally.
ClickHouse scaling (vertical + sharding)
- Vertical: Increase CPU/RAM for single-node deployments
- Horizontal: Use ClickHouse clusters with sharding for billions of rows
Development & Testing
Local Development Setup
See the AGENTS.md file for detailed development guidelines.
Testing
- Worker: Vitest for unit tests
- Jawn: Jest for API tests
- Web: Jest + React Testing Library
- Integration: Python integration tests in
tests/
Technical Decisions & Trade-offs
Why Cloudflare Workers?
Pros:
- Global edge deployment (300+ locations)
- Sub-50ms cold starts
- Auto-scaling without configuration
- Generous free tier
Cons:
- Stateless (no local disk)
- CPU time limits (requires efficient code)
Why ClickHouse over PostgreSQL?
Pros:
- 100-1000x faster for analytical queries
- Columnar storage ideal for aggregations
- Built-in time-series capabilities
Cons:
- Eventually consistent (not ACID)
- Less flexible schema changes
- Requires separate database from Supabase
Why Supabase for auth?
Pros:
- Built-in JWT authentication
- Row-level security for multi-tenancy
- Real-time subscriptions
- Easy local development
Cons:
- Additional database dependency
- Tied to PostgreSQL ecosystem
Next Steps
Now that you understand Helicone’s architecture:

Self-Host Helicone
Deploy Helicone in your own infrastructure with Docker
Explore Features
Learn about sessions, prompts, and advanced features
API Reference
Integrate directly with Helicone’s REST API
Contributing Guide
Contribute to the open-source project
Questions?
For architecture-specific questions:
- Join Discord and ask in #architecture
- Open a GitHub discussion
- Email engineering@helicone.ai