You're managing five AI clients. Most use OpenAI, one uses Claude, another uses Gemini. At the end of the month, you get three separate bills. You have no idea which client drove which costs. Your margins are invisible.

This is the most common problem in AI agencies: zero visibility into token costs per client. You're billing them flat monthly fees, absorbing overages, or guessing at usage. As we cover in our piece on why AI agencies lose money without token tracking, the cost gap between clients compounds silently: your profitable clients end up subsidizing the expensive ones.

This guide walks you through the setup that fixes it: centralizing API keys, tagging usage per client, tracking costs in real time, and generating billing reports that are ready to attach to an invoice. No spreadsheets. No manual math. Once you have the data, our guide to billing clients for AI API usage walks through how to apply markup and structure the invoice line item.

The Problem: Why Agencies Bleed Money Here

Most AI agencies run like this:

  • Each client has their own OpenAI key (or they all share a dev key)
  • You get the bill at the end of the month
  • You have no way to attribute costs to specific clients
  • You bill a flat monthly fee and hope it breaks even
  • When a client's usage spikes (or yours does), nobody sees it coming

The result: some clients are profitable at their flat fee. Others are loss-makers. You're cross-subsidizing — your good clients are bankrolling the bad ones.

The data exists. You just don't see it until the bill arrives.

Step 1: Centralize Your API Keys (The Hub Model)

Stop giving each client their own OpenAI key. This is the root of the visibility problem.

Instead, use a hub model: you own all the API keys. Clients make requests through your infrastructure (your app, your proxy, or a managed service). Every request gets tagged with the client's ID.

This solves three problems at once:

  • Visibility: You own the billing relationship with OpenAI/Claude/Gemini, so you see every token used across all clients
  • Cost control: You can set per-client usage limits. A client hits their cap? You can pause, alert them, or renegotiate
  • Security: Clients never see your API keys. You can rotate keys without coordinating with them

The mechanics are simple: clients make requests to your app/API, your app adds their client ID to the request (via request header, query param, or proxy header), forwards to the AI provider, and logs the response. The token count + client ID go into your database.

How this actually looks:

  1. The client makes a request to your endpoint.
  2. Your code extracts the client_id from an auth header or JWT.
  3. You tag the outgoing request with that ID (e.g., user: "client_123" on an OpenAI call). Treat the tag as a label for the provider's own monitoring, not as your billing record. Your own log is what attributes cost.
  4. You read the token counts from the response and store one row: { client_id, model, input_tokens, output_tokens, cost, timestamp }.

Done.
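The flow above can be sketched in a few lines of Python. Treat this as a minimal sketch under assumptions, not a finished proxy: handle_request, UsageRow, fake_provider, and flat_price are hypothetical names, the provider call is a stub standing in for your real SDK call, and the price is a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class UsageRow:
    client_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost: float
    timestamp: datetime


def handle_request(
    client_id: str,
    model: str,
    prompt: str,
    call_provider: Callable[[str, str, str], dict],
    price_fn: Callable[[str, int, int], float],
    usage_log: list,
) -> str:
    """Forward one client request and append one usage row to the log."""
    # call_provider receives the client_id so it can tag the upstream request.
    response = call_provider(client_id, model, prompt)
    in_tok = response["input_tokens"]
    out_tok = response["output_tokens"]
    usage_log.append(
        UsageRow(client_id, model, in_tok, out_tok,
                 price_fn(model, in_tok, out_tok),
                 datetime.now(timezone.utc))
    )
    return response["text"]


def fake_provider(client_id: str, model: str, prompt: str) -> dict:
    # Stub: a real implementation would call the provider's API here.
    return {"text": "ok", "input_tokens": len(prompt.split()), "output_tokens": 2}


def flat_price(model: str, in_tok: int, out_tok: int) -> float:
    # Placeholder rate, not a real provider price.
    return (in_tok + out_tok) * 0.00001


log: list = []
reply = handle_request("client_123", "example-model", "hello there world",
                       fake_provider, flat_price, log)
```

In a real app the client_id would come from your auth layer, never from a field the client can set freely, and usage_log would be a database table rather than a list.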

Step 2: Tag Usage Per Client (The Namespace)

Once requests flow through your infrastructure, you need to tag them. OpenAI and Anthropic accept an end-user identifier on each request, and all three providers return token counts in the response.

OpenAI: Use the user parameter on every request. Set it to your internal client ID or a stable identifier.

Anthropic (Claude): Pass your client identifier as metadata.user_id in the request body (use an opaque ID, not a name or email). Anthropic includes usage in the response; you extract and store it.

Google (Gemini): Tag requests with your own identifier on your side, so the log row you write carries the client ID. Parse the response's usageMetadata for token counts.

The pattern is identical: every request carries the client's identity. The API response includes token usage. You extract tokens + client ID and write one row to your database.
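Since the pattern is identical, one small function can normalize all three responses. This is a sketch, assuming you have each response as a plain dict; extract_usage is a hypothetical helper, and the field names are the ones the providers' JSON responses use at the time of writing (OpenAI's Chat Completions usage object, Anthropic's usage object, Gemini's usageMetadata), so check them against the current API references before relying on them.

```python
def extract_usage(provider: str, response: dict) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from a provider response dict."""
    if provider == "openai":
        usage = response["usage"]
        return usage["prompt_tokens"], usage["completion_tokens"]
    if provider == "anthropic":
        usage = response["usage"]
        return usage["input_tokens"], usage["output_tokens"]
    if provider == "gemini":
        usage = response["usageMetadata"]
        return usage["promptTokenCount"], usage["candidatesTokenCount"]
    raise ValueError(f"unknown provider: {provider}")


# Hand-written sample payloads, trimmed to the usage fields only.
openai_resp = {"usage": {"prompt_tokens": 120, "completion_tokens": 45}}
anthropic_resp = {"usage": {"input_tokens": 80, "output_tokens": 30}}
gemini_resp = {"usageMetadata": {"promptTokenCount": 60, "candidatesTokenCount": 25}}

openai_tokens = extract_usage("openai", openai_resp)
anthropic_tokens = extract_usage("anthropic", anthropic_resp)
gemini_tokens = extract_usage("gemini", gemini_resp)
```

Keeping this mapping in one place means a provider renaming a field is a one-line fix instead of a hunt through your codebase.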

This is called a usage log: timestamp, client_id, model, input_tokens, output_tokens, cost. That's all you need.
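Here is one way the usage log could look as a table, sketched with Python's built-in sqlite3 so it runs anywhere (in production you would likely use Postgres). Everything named here is an assumption for illustration: the PRICES table holds placeholder numbers and made-up model names, not real provider prices, and token_cost and log_usage are hypothetical helpers.

```python
import sqlite3

# PLACEHOLDER prices in USD per 1M tokens: (input, output). Not real rates.
PRICES = {
    "example-small": (0.50, 1.50),
    "example-large": (5.00, 15.00),
}


def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call, pricing input and output tokens separately."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE usage_logs (
           timestamp     TEXT    NOT NULL,
           client_id     TEXT    NOT NULL,
           model         TEXT    NOT NULL,
           input_tokens  INTEGER NOT NULL,
           output_tokens INTEGER NOT NULL,
           cost          REAL    NOT NULL
       )"""
)


def log_usage(conn, timestamp, client_id, model, input_tokens, output_tokens):
    """Write one usage row and return the cost that was stored."""
    cost = token_cost(model, input_tokens, output_tokens)
    conn.execute(
        "INSERT INTO usage_logs VALUES (?, ?, ?, ?, ?, ?)",
        (timestamp, client_id, model, input_tokens, output_tokens, cost),
    )
    return cost


cost = log_usage(conn, "2025-01-15T12:00:00Z", "client_123",
                 "example-large", 200_000, 100_000)
```

Storing the cost at write time, rather than recomputing it later, means a provider price change never rewrites last month's numbers.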

Step 3: Set Up Automated Cost Tracking

You've centralized keys and you're logging usage. Now automate the aggregation.

Every night (or every hour), you need a query that groups usage by client:

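SQL (Postgres): SELECT client_id, model, SUM(input_tokens) AS input_tokens, SUM(output_tokens) AS output_tokens, SUM(cost) AS cost FROM usage_logs WHERE timestamp >= date_trunc('month', NOW()) GROUP BY client_id, model;

This is your monthly ledger, month to date. (Filtering on NOW() - INTERVAL '1 month' instead would give a rolling 30-day window that straddles two invoices.) Run it every night and store the results. The last run of the month gives you a complete picture of each client's usage by model.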
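To see the aggregation end to end, here is a small runnable sketch. It uses SQLite in place of Postgres so it needs no setup, filters on a calendar month passed as 'YYYY-MM', and assumes timestamps are stored as ISO 8601 text. The function name monthly_ledger and the sample rows are made up for illustration.

```python
import sqlite3


def monthly_ledger(conn, month: str):
    """Group one calendar month ('YYYY-MM') of usage by client and model."""
    return conn.execute(
        """SELECT client_id, model,
                  SUM(input_tokens), SUM(output_tokens), SUM(cost)
           FROM usage_logs
           WHERE substr(timestamp, 1, 7) = ?
           GROUP BY client_id, model
           ORDER BY client_id, model""",
        (month,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_logs (timestamp TEXT, client_id TEXT, model TEXT, "
    "input_tokens INTEGER, output_tokens INTEGER, cost REAL)"
)
sample_rows = [
    ("2025-01-03T09:00:00Z", "acme", "model-a", 1000, 500, 0.25),
    ("2025-01-20T14:30:00Z", "acme", "model-a", 3000, 1500, 0.75),
    ("2025-01-21T08:00:00Z", "globex", "model-b", 2000, 800, 0.5),
    # A February row, which the January ledger must leave out.
    ("2025-02-01T00:00:00Z", "acme", "model-a", 9999, 9999, 9.0),
]
conn.executemany("INSERT INTO usage_logs VALUES (?, ?, ?, ?, ?, ?)", sample_rows)

ledger = monthly_ledger(conn, "2025-01")
```

Passing the month explicitly, instead of relying on the current date, lets you re-run any past month and get the same ledger.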

The second automation is applying markup. Set a markup percentage per client (25%, 40%, whatever your contracts say). When you run the nightly aggregation, multiply cost × (1 + markup_percent / 100). That's what you charge.

Example: Client "ACME" used $500 in tokens. You applied 30% markup. Their billed cost is $650. The difference ($150) is your margin. Repeat for all clients and you have your monthly AI revenue.
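The same arithmetic as a function, sketched with Decimal so cents don't drift the way floats can. apply_markup is a hypothetical helper, and rounding half-up to the cent is an assumption; use whatever rounding your contracts specify.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_markup(cost, markup_percent):
    """Return (billed, margin) for a raw cost and a markup in percent."""
    cost = Decimal(str(cost))
    factor = (Decimal(100) + Decimal(str(markup_percent))) / Decimal(100)
    # Round the billed amount to whole cents.
    billed = (cost * factor).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    margin = billed - cost
    return billed, margin


# The ACME example: $500 in tokens at a 30% markup.
billed, margin = apply_markup("500.00", 30)
```

For the ACME numbers this gives a billed amount of 650.00 and a margin of 150.00, matching the example above.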

Step 4: Create Client-Facing Billing Reports

Now comes the accountability piece. Generate a monthly report for each client.

It should include:

  • Model breakdown (e.g., GPT-4o: 50K tokens, Claude 3: 20K tokens)
  • Cost per model (what you paid)
  • Applied markup (your service fee)
  • Total billed (cost + markup)
  • Trend (vs. last month)

Export this as a PDF or CSV. This is your invoice line item. You don't need to itemize every single API call — just the aggregate by month and model.
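A CSV version of that report takes only the standard library. This is a sketch under assumptions: build_report_csv and its column names are invented for illustration, each input row is (model, total_tokens, cost), and markup is rounded per model line.

```python
import csv
import io
from decimal import Decimal


def build_report_csv(client_id, month, model_rows, markup_percent,
                     previous_total=None):
    """Build a per-model billing report as CSV text.

    model_rows: iterable of (model, total_tokens, cost) tuples.
    previous_total: last month's billed total, for the trend row.
    """
    rate = Decimal(str(markup_percent)) / Decimal(100)
    cent = Decimal("0.01")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["client_id", "month", "model", "tokens",
                     "cost", "markup", "billed"])
    total_billed = Decimal("0")
    for model, tokens, cost in model_rows:
        cost = Decimal(str(cost)).quantize(cent)
        markup = (cost * rate).quantize(cent)
        billed = cost + markup
        total_billed += billed
        writer.writerow([client_id, month, model, tokens, cost, markup, billed])
    writer.writerow([client_id, month, "TOTAL", "", "", "", total_billed])
    if previous_total is not None:
        change = total_billed - Decimal(str(previous_total))
        writer.writerow([client_id, month, "CHANGE_VS_LAST_MONTH",
                         "", "", "", change])
    return buf.getvalue()


report = build_report_csv(
    "acme", "2025-01",
    [("model-a", 50_000, "350.00"), ("model-b", 20_000, "150.00")],
    markup_percent=30,
    previous_total="600.00",
)
```

The output has one row per model, a TOTAL row, and an optional change-versus-last-month row, which maps directly onto the bullet list above.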

Pro move: Show the breakdown to clients as part of your service. "Here's what you spent on OpenAI, Claude, and Gemini this month. Your total AI cost was $500, plus 30% markup ($150) for our service and optimization." Transparency builds trust.

Step 5: Set Up Usage Alerts

The last piece is proactive monitoring. Set a budget per client and alert when they're approaching it.

  • Client's monthly budget: $1,000
  • Current usage (mid-month): $750, on pace for roughly $1,500 by month-end
  • Alert triggers at 80% ($800), which this client will cross within a day or two
  • You ping them: "Hey, you're on track to hit your budget. Anything we should optimize?"

This does three things: it prevents bill shock, it opens a conversation about optimization, and it gives you a natural moment to discuss repricing if they consistently exceed their budget.
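The alert check itself is a few lines. A minimal sketch, assuming amounts are tracked in whole cents and that a straight-line projection is good enough for a heads-up; budget_status and projected_month_end are hypothetical names, and the 80% default mirrors the example above.

```python
def budget_status(spend_cents: int, budget_cents: int,
                  alert_percent: int = 80) -> str:
    """Classify a client's month-to-date spend against their budget."""
    if budget_cents <= 0:
        raise ValueError("budget must be positive")
    if spend_cents >= budget_cents:
        return "over_budget"
    # Integer comparison avoids float rounding at the threshold.
    if spend_cents * 100 >= budget_cents * alert_percent:
        return "alert"
    return "ok"


def projected_month_end(spend_cents: int, day_of_month: int,
                        days_in_month: int) -> int:
    """Straight-line projection of month-end spend from spend so far."""
    return spend_cents * days_in_month // day_of_month


# $750 spent by day 15 of a 30-day month, against a $1,000 budget.
status_now = budget_status(75_000, 100_000)
projection = projected_month_end(75_000, 15, 30)
status_at_800 = budget_status(80_000, 100_000)
```

Running the projection next to the threshold check lets you reach out before the 80% line is crossed, not after.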

Real Numbers: What Multi-Client Agencies Actually Spend

Here's what 10-client agencies typically see:

  • Small client: 1M tokens/month = ~$20 cost, billed at $26 (30% markup)
  • Mid client: 10M tokens/month = ~$200 cost, billed at $260
  • Large client: 50M tokens/month = ~$1,200 cost, billed at $1,560
  • Across 10 clients: ~$2,500 in actual costs, ~$3,250 billed to clients, $750/month in margin

These numbers vary wildly based on your model mix (GPT-4o vs Haiku vs Sonnet) and your markup (15% is lean, 50% is aggressive). But the structure is the same.

The agencies that get this right are making $5K-$15K/month in AI service revenue from a stable 8-12 client base. The ones that don't set this up are either breaking even or losing money.

How to Implement This in 5 Minutes (The Easy Path)

You can build all of this yourself in a Node.js app with a Postgres database. But it takes time to wire: authentication, per-client rate limiting, invoice generation, CSV exports.

Or you can use a tool that handles the infrastructure.

TokenTally does this setup for you:

  • Add your OpenAI, Anthropic, and Google API keys (encrypted, never exposed)
  • Register each client and set a markup percentage
  • Log token usage by client as you use the APIs
  • Get a real-time dashboard of per-client costs
  • Export monthly billing reports (PDF or CSV, invoice-ready)
  • Set per-client budget alerts

No code. No infrastructure. Just plug in your keys, define your clients, and start logging. The reports generate automatically.

Real example: An agency with 8 clients and $40K/month in pass-through AI costs now has full per-client visibility, bills accurately, and added $8K/month in margin just by stopping the subsidy leaks. Setup took 20 minutes.

Getting Started This Week

Don't wait for a complex build. Start today:

  1. Audit your current setup. Do you own the API keys or do clients? If clients own them, you have a visibility problem.
  2. Centralize keys. If you're not there yet, move client API calls through your infrastructure. This is the foundation.
  3. Log usage. For every AI call, write one row: client_id, model, tokens, cost, timestamp. This is all the raw data you need.
  4. Run your first aggregation query. Group by client. See what each one cost last month.
  5. Generate a test report. Show it to your biggest client. Get feedback. Bill it starting next month.

This whole sequence takes a few hours if you're building it yourself, or 5 minutes if you're using a tool like TokenTally that has the plumbing already wired.

Either way, the outcome is the same: you stop flying blind. You see which clients are profitable, which ones are loss-makers, and where your optimization opportunities are. You charge what you're worth.

The agencies that do this grow faster. They're not subsidizing unprofitable clients. They're having data-driven conversations with clients about pricing. And they're building a defensible service: "You get real-time visibility into your AI costs per model, managed budgets, and invoicing built in. Worth the premium we charge."

If you're running OpenAI, the OpenAI API cost calculator guide covers GPT-4o, GPT-4o Mini, and o1 pricing in detail. For Anthropic workloads, the Claude API cost tracking guide explains Haiku, Sonnet, and Opus attribution — including how context window sizes inflate costs invisibly.

Track AI costs per client, charge what you're worth

Real-time cost tracking, automated markup, billing-ready reports. Start free and bill accurately from day one.

Try TokenTally Free

No credit card required · Set up in under 5 minutes