Claude has become the second LLM of choice for AI agencies. Claude 3.5 Sonnet handles complex reasoning and long-context tasks better than most alternatives. Claude Haiku is one of the cheapest fast models available. And Claude 4 Opus is pushing into territory that enterprises are willing to pay for. The result: agencies are routing real workloads — and real spend — through Anthropic's API every day.

The billing problem is identical to OpenAI's: Anthropic invoices your organization as a whole, not per-client. One bill at the end of the month. Every client's usage pooled. If you're running five clients through Claude, you have no native way to know what each one actually cost.

Why Anthropic's Billing Structure Creates a Problem for Agencies

Anthropic's API is priced per million tokens — input and output separately. The invoice you receive shows total token consumption across your account. That's useful for knowing what you owe Anthropic. It tells you nothing about what any individual client is responsible for.

For agencies, this creates a compounding problem. You're likely running Claude across multiple apps — a content generation pipeline, a customer support bot, an internal research assistant. All hitting the same API key. All showing up as a single line on Anthropic's invoice. Without a tracking layer between your app and Anthropic's API, you're flying blind on per-client costs.

The margin risk: If you're billing a client a flat $600/month for AI-powered services, and their actual Anthropic token cost runs $420 — that's a 30% margin. Fine. But if usage spikes and their cost hits $590, you're at 1.7% margin on that client. You won't know until the invoice lands. By then it's already happened.

Claude Model Pricing: Haiku, Sonnet, and Opus

The cost gap between Claude models is significant. Routing to the wrong model for a given task — or letting clients run Opus when Haiku would do — can multiply costs by 10x or more. Here's a reference table for current Anthropic pricing:

Model Input (per 1M tokens) Output (per 1M tokens)
Claude 3 Haiku $0.25 $1.25
Claude 3.5 Haiku $0.80 $4.00
Claude 3.5 Sonnet $3.00 $15.00
Claude 3 Opus $15.00 $75.00
Claude 4 Opus $15.00 $75.00

To calculate a single call: (input tokens × input price / 1,000,000) + (output tokens × output price / 1,000,000). A 1,000-token prompt with a 500-token response from Claude 3.5 Sonnet costs: (1,000 × $3.00 / 1M) + (500 × $15.00 / 1M) = $0.003 + $0.0075 = $0.0105 per call.

That same call on Claude 3 Haiku: $0.00025 + $0.000625 = $0.000875. A 12x difference. If a client is running 50,000 calls a month and you're using Sonnet where Haiku would do, that's the difference between $437 and $52.50 in raw token costs for that client alone.

Why Spreadsheet Tracking Fails for Claude Usage

Most agencies start with a spreadsheet. Log calls manually, pull the monthly invoice, try to allocate costs by client. It works for a few weeks. Here's where it breaks:

1. Multiple models, no automatic attribution

Claude's model family spans a wide price range. Agencies typically use Haiku for high-volume, low-complexity tasks (classification, summarization, quick responses) and Sonnet or Opus for tasks that require deeper reasoning. Each client may be hitting different models depending on which pipeline they're using. Manual tracking can't capture this automatically — you'd need to log every API call by client and model, manually, in real time. That's not tracking, that's a second job.

2. Context windows inflate costs invisibly

Claude models support long context windows — up to 200,000 tokens on some versions. This is a feature, but it's also a cost multiplier. A client whose workflow regularly passes large documents or maintains long conversation histories will generate far more input tokens than one running short, stateless queries. Spreadsheets don't catch this drift until the invoice arrives.

3. Cross-client usage looks identical at the API level

Anthropic's API has no concept of end-clients. Every call from your codebase looks the same — just tokens in, tokens out. Without a tagging layer in your application code, there's no way to attribute costs after the fact. You can't go back and split last month's invoice by client once it's been paid.

Automated Claude API Cost Tracking

The right architecture intercepts each Anthropic API call, captures the metadata (model, token counts, timestamp, client identifier), and attributes the cost immediately. The key requirements for a working system:

  • Per-call cost capture — input tokens × model input rate + output tokens × model output rate, calculated at call time
  • Client attribution — each call tagged to a client ID before it's logged, not retroactively
  • Multi-model awareness — the tracking layer needs to know Haiku costs differently than Opus, and apply the right rate to each call
  • Markup application — raw Anthropic costs multiplied by your margin % before surfacing as client-facing costs
  • Budget thresholds — alerts when a client's monthly spend approaches or exceeds what you've priced them at

With that in place, you're not waiting for Anthropic's invoice to understand costs. You open a dashboard, select a client, and see their Claude spend for the month — broken down by model, with your markup applied, and a budget indicator showing how close they are to their ceiling.

What This Looks Like in Practice

An agency running three clients through Claude might look like this at month-end:

  • Client A (content pipeline, Haiku): 2.1M input tokens, 800K output tokens → $0.525 + $1.00 = $1.53 raw cost. With 40% markup: $2.14 — well inside their $50 AI budget
  • Client B (research assistant, Sonnet): 480K input tokens, 220K output tokens → $1.44 + $3.30 = $4.74 raw cost. With 30% markup: $6.16
  • Client C (document analysis, Opus): 150K input tokens, 90K output tokens → $2.25 + $6.75 = $9.00 raw cost. With 25% markup: $11.25 — approaching their monthly AI budget

Without per-client tracking, Client C's Opus usage is invisible until it hits your invoice. With tracking, you know mid-month that they're trending toward their budget and can either adjust the model routing or update their pricing before the cycle closes.

TokenTally for Anthropic API Cost Tracking

TokenTally tracks Claude API usage per-client alongside OpenAI, Google, and Azure — all in one place. Pre-loaded Anthropic model pricing covers Haiku, Sonnet, and Opus. Costs are calculated per-call and attributed to whichever client the API call is logged against.

  • All Claude model rates pre-loaded — Haiku, Sonnet, and Opus pricing built in and editable if Anthropic updates rates
  • Per-client cost views — see exactly what each client has spent on Claude this month, by model
  • Markup automation — your margin % applied automatically so client-facing costs are always markup-adjusted
  • Budget alerts — color-coded thresholds (green <80%, yellow 80–99%, red ≥100%) so overages surface before the invoice
  • CSV and invoice export — bill clients with itemized Anthropic costs instead of a flat AI fee

One tracking layer, all your Claude clients, real per-client cost visibility every month. If you're also running OpenAI or Gemini alongside Claude, the LLM API pricing comparison shows how Haiku, Sonnet, and Opus stack up against GPT-4o and Gemini Pro in real agency workloads. For the OpenAI-specific equivalent — model pricing, per-call math, and why org-level billing creates the same attribution problem — see the OpenAI API cost calculator guide. To implement the full tracking pipeline — from centralizing keys to automated billing reports — follow the multi-client agency setup guide. And if the underlying problem is why agencies lose money without per-client token tracking, that's a good place to start before building the solution.