You've solved the tracking problem — you know exactly how much each client costs you in AI tokens every month. Now comes the harder question: how do you actually charge them?
Most AI agencies handle this badly. They absorb API costs as overhead and bake a flat estimate into their retainer. Or they slap on a vague "AI usage" line item with no real math behind it. Both approaches bleed money. Here's how to do it right.
The Three Billing Approaches Agencies Use
When it comes to charging clients for AI costs, most agencies fall into one of three camps:
1. Flat fee with no AI tracking
"We'll handle your AI workflows for $4,000/month." The problem: you have no idea if their actual AI usage is $400 or $4,000. High-consumption clients silently destroy your margins. Low-consumption clients are your profit centers, but you don't know that either.
2. Per-token pass-through at cost
"We'll pass through your OpenAI costs at cost." You bill exactly what you paid. The problem: you treat API calls purely as a cost of delivery and earn nothing on them. You're doing real work (logging, attributing, reporting) and getting paid nothing for it. You also carry the overhead of every price change and model switch without any upside.
3. Per-token billing with markup
"Your AI usage this month was $347. We apply a 25% markup. Your total AI line item is $433.75." This is the right model. You recover your costs, earn a margin on the service component of AI delivery, and have data to back every line item on the invoice.
The key insight: AI agency billing isn't about overcharging clients — it's about charging accurately. A 25% markup on token costs means you're charging for the service of logging, attributing, reporting, and managing AI usage, not padding invoices. Clients who understand this don't push back. Clients who push back are the ones you've been subsidizing.
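The markup arithmetic in the example above is simple enough to sketch in a few lines. This is a minimal illustration, not a prescribed implementation: the function name `apply_markup` is hypothetical, and it assumes amounts are rounded half-up to the cent. With $347 of usage and a 25% markup, the exact total is $433.75 (about $434).

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def apply_markup(base_cost: Decimal, markup_pct: Decimal) -> tuple[Decimal, Decimal]:
    """Return (markup_amount, total), both rounded to the cent.

    Hypothetical helper for illustration; Decimal avoids float rounding
    surprises on money values.
    """
    markup = (base_cost * markup_pct / Decimal("100")).quantize(
        CENT, rounding=ROUND_HALF_UP
    )
    total = (base_cost + markup).quantize(CENT, rounding=ROUND_HALF_UP)
    return markup, total


markup, total = apply_markup(Decimal("347.00"), Decimal("25"))
print(f"Markup: ${markup}  Total AI line item: ${total}")
```

Using `Decimal` rather than floats is a design choice here: invoice figures need to add up exactly when a client checks them by hand.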
Why Markup Beats Flat Fees Every Time
Flat fees work when costs are predictable. AI usage isn't. A client's token consumption can swing 2–3x month-to-month based on project scope, model selection, and iteration cycles. When you bill a flat fee:
- A slow month is fine — you're profitable
- A heavy-usage month means you're paying out of pocket
- Model price changes hit your margin directly
- You can't accurately price new clients without historical data
With a per-token billing model and a set markup, you solve all of this. Usage fluctuates. Your margin per dollar of AI spend stays constant. The client sees an itemized AI cost line on their invoice. You have a defensible number you can walk through in any review.
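To make the comparison concrete, here is a rough sketch of how the two models behave when usage swings 3x. Every figure in it is a made-up assumption for illustration: a $400 AI allowance baked into the flat retainer, a 25% markup, and monthly API costs of $200, $400, and $600.

```python
from decimal import Decimal

# Hypothetical inputs, chosen only to illustrate a 3x usage swing.
AI_ALLOWANCE = Decimal("400")   # AI portion assumed inside a flat retainer
MARKUP_PCT = Decimal("25")      # markup used in the per-token model
MONTHLY_COSTS = [Decimal("200"), Decimal("400"), Decimal("600")]


def flat_fee_ai_margin(ai_allowance: Decimal, actual_cost: Decimal) -> Decimal:
    """Margin on AI under a flat fee: whatever is left of the allowance."""
    return ai_allowance - actual_cost


def markup_ai_margin(actual_cost: Decimal, markup_pct: Decimal) -> Decimal:
    """Margin on AI under per-token billing: a fixed share of actual cost."""
    return actual_cost * markup_pct / Decimal("100")


for cost in MONTHLY_COSTS:
    flat = flat_fee_ai_margin(AI_ALLOWANCE, cost)
    marked_up = markup_ai_margin(cost, MARKUP_PCT)
    print(f"API cost ${cost}: flat-fee margin ${flat}, markup margin ${marked_up}")
```

Under these assumptions the flat fee goes from a $200 gain in the slow month to a $200 loss in the heavy one, while the markup model earns 25 cents on every dollar of AI spend in all three months.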
What Clients Actually See
When billing is done right, clients see a monthly breakdown that looks something like this:
- OpenAI GPT-4o: 2.1M tokens @ $2.50/1M = $5.25
- Anthropic Claude 3.5 Sonnet: 890K tokens @ $3.00/1M = $2.67
- Google Gemini 1.5 Pro: 450K tokens @ $1.25/1M = $0.56
- Base AI cost: $8.48
- Service markup (25%): $2.12
- Total AI line item: $10.60
That's a real invoice line. The client can see exactly where the numbers come from, which models they're using most, and what the service markup covers. This level of transparency builds trust — and makes it much easier to have the conversation when a client's usage spikes and it's time to reprice.
Bonus benefit: When clients see which models are driving their costs, they often ask how to optimize. This opens a consulting conversation — and you can pass those savings to clients or keep them as margin. Either way, you're having a smarter business conversation.
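The breakdown above can be reproduced with a short script. This is a sketch under stated assumptions: the per-million rates are the article's illustrative figures (not a live price list), each model is billed at a single blended rate rather than separate input and output rates, and `build_invoice` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
MILLION = Decimal("1000000")

# (model label, tokens used, USD per 1M tokens), taken from the example above.
USAGE = [
    ("OpenAI GPT-4o", 2_100_000, Decimal("2.50")),
    ("Anthropic Claude 3.5 Sonnet", 890_000, Decimal("3.00")),
    ("Google Gemini 1.5 Pro", 450_000, Decimal("1.25")),
]


def to_cents(amount: Decimal) -> Decimal:
    """Round a dollar amount half-up to the cent."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def build_invoice(usage, markup_pct: Decimal):
    """Return (lines, base_cost, markup, total) for one client's month."""
    lines = [
        (model, to_cents(Decimal(tokens) / MILLION * rate))
        for model, tokens, rate in usage
    ]
    base = sum((cost for _, cost in lines), Decimal("0"))
    markup = to_cents(base * markup_pct / Decimal("100"))
    return lines, base, markup, base + markup


lines, base, markup, total = build_invoice(USAGE, Decimal("25"))
for model, cost in lines:
    print(f"{model}: ${cost}")
print(f"Base AI cost: ${base}")
print(f"Service markup (25%): ${markup}")
print(f"Total AI line item: ${total}")
```

Each line is rounded to the cent before summing, so the base cost always equals the sum of the lines the client sees.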
The Tools That Make This Practical
Doing per-client AI billing manually is not scalable. You need:
- Per-client token attribution — every API call tagged to the right client
- Multi-model cost aggregation — OpenAI, Anthropic, Google, and others, all converted to a common cost unit
- Configurable markup — different clients can have different margins based on contract terms
- Invoice-ready reports — per-client monthly summaries with cost + markup that drop straight onto an invoice
TokenTally handles all of this. Set a per-client markup percentage, log token usage by client and model, and export a monthly breakdown that's ready to attach to an invoice. No spreadsheets. No manual math.
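For a sense of what the four requirements above involve, here is a minimal sketch of the underlying data flow. It is not TokenTally's API or data model: the `UsageRecord` type, the client IDs, the rate table, and the per-client markups are all hypothetical, and each model is assumed to have one blended per-million rate.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
MILLION = Decimal("1000000")


@dataclass(frozen=True)
class UsageRecord:
    """One API call, tagged with the client it belongs to (hypothetical shape)."""
    client_id: str
    model: str
    tokens: int


# Hypothetical rate table (USD per 1M tokens) and per-client contract markups.
RATES_PER_MILLION = {"gpt-4o": Decimal("2.50"), "claude-3.5-sonnet": Decimal("3.00")}
CLIENT_MARKUP_PCT = {"acme": Decimal("25"), "globex": Decimal("30")}


def monthly_summary(records):
    """Aggregate cost per client across models, then apply that client's markup."""
    base = defaultdict(lambda: Decimal("0"))
    for rec in records:
        base[rec.client_id] += Decimal(rec.tokens) / MILLION * RATES_PER_MILLION[rec.model]

    summary = {}
    for client, cost in base.items():
        cost = cost.quantize(CENT, rounding=ROUND_HALF_UP)
        markup = (cost * CLIENT_MARKUP_PCT[client] / Decimal("100")).quantize(
            CENT, rounding=ROUND_HALF_UP
        )
        summary[client] = {"base": cost, "markup": markup, "total": cost + markup}
    return summary


records = [
    UsageRecord("acme", "gpt-4o", 1_000_000),
    UsageRecord("acme", "claude-3.5-sonnet", 500_000),
    UsageRecord("globex", "gpt-4o", 2_000_000),
]
for client, row in monthly_summary(records).items():
    print(client, row)
```

The important part is the `client_id` on every record: without attribution at call time, no amount of downstream math can recover which client drove which cost.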
One More Thing: Budget Alerts
Charging clients after the fact is fine. Proactively alerting them before they hit a budget cap is better. When you set a monthly AI budget for a client and track it in real time, you can ping them at 80% usage: "Hey, you're on track to hit your $500 AI cap this month. Anything we should optimize before the billing cycle closes?"
Clients love this. It shows you're paying attention and managing their costs proactively. It also gives you a natural conversation opener about repricing if they keep hitting their cap.
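The 80% check described above comes down to a single comparison. A minimal sketch, assuming month-to-date spend is already tracked per client; the function name `budget_status` and its status labels are hypothetical, and it checks spend so far rather than projecting end-of-month usage.

```python
from decimal import Decimal


def budget_status(
    spend: Decimal, budget: Decimal, alert_at: Decimal = Decimal("0.80")
) -> str:
    """Classify month-to-date spend against a client's monthly AI budget.

    Returns "ok", "alert" (at or past the alert threshold), or "over_budget".
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend / budget
    if used >= 1:
        return "over_budget"
    if used >= alert_at:
        return "alert"
    return "ok"


# With the $500 cap from the example, the alert fires once spend reaches $400.
for spend in (Decimal("250"), Decimal("400"), Decimal("520")):
    print(f"${spend} of $500: {budget_status(spend, Decimal('500'))}")
```

Running a check like this after each logged call, or on a daily schedule, is what turns the budget into a proactive message instead of a surprise on the invoice.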
Getting Started
If you're currently absorbing AI costs as overhead, the fix is straightforward: start tracking, apply a markup, and add an AI line item to your next invoice. You don't need to retroactively charge clients — just introduce the line item going forward with a clear explanation of what it covers.
Most clients understand this when you explain it clearly: "We log every AI call for your account, attribute costs to your usage, and apply a 25% service markup. Here's your first monthly breakdown."
TokenTally is free to start. Add your clients, set your markup percentages, and start generating billing-ready reports from day one.
For a complete implementation walkthrough, from API key setup to your first billing report, see the multi-client agency setup guide. If you're weighing which provider to use with clients, the LLM API pricing comparison breaks down the real cost differences between OpenAI, Anthropic, and Google. For the provider-specific cost math on OpenAI, the OpenAI API cost calculator guide shows how per-call costs stack up across GPT-4o, GPT-4o Mini, and o1, and why manual spreadsheet tracking breaks at scale. The same applies to Anthropic: the Claude API cost tracking guide covers Haiku, Sonnet, and Opus pricing with real agency examples.