Built for indie hackers

Stop getting wrecked by your AI bill.

One API. Every model. Hard budget caps that actually work. Your existing OpenAI and Ollama code keeps running — but you stop overpaying for the same tokens.

Get started — first deposit doubled Read the docs
First deposit doubled Drop-in for OpenAI & Ollama Stop & cap any time
The problem nobody's solving

You shipped an AI feature. Now your bill is the product.

Every indie hacker shipping AI has lived through this. The product works. Users love it. Then the invoice arrives and you spend the weekend arguing with your co-founder about whether to take the feature down.

"One viral tweet and my OpenAI bill jumped from $40 to $1,800.

By the time I noticed, the damage was done. There's no real "stop at $200" button — only soft alerts that fire after you're already over.

"I spent two days porting from GPT-4 to Claude when prices changed. Two weeks later, prices changed again.

Switching providers means rewriting prompts, re-testing edge cases, and praying the new model handles your tool-calling format. Most of us just eat the cost.

"I have three SDKs in my codebase. OpenAI, Anthropic, and Ollama for local dev. They all behave differently.

Different streaming formats, different error shapes, different rate-limit semantics. Half my AI code is glue.

How TokenFlow works

One API key. Every model. One predictable invoice.

Point your existing OpenAI SDK at TokenFlow. Use a model name like smart-chat or coder-pro. We pick the right model behind the scenes and charge you a single flat rate. When you hit your budget, we stop. No surprises.

Sign up

Get an API key in 30 seconds. Top up $10, get $20 of usable balance.

Change one line

Set base_url to TokenFlow. Your existing OpenAI code just works.

Pick a model alias

Use smart-chat, fast-chat, or any of our presets. Or pick a specific model.

Set a hard cap

"Stop spending after $50/day" actually stops. We refuse the call, not after.

Drop-in for the SDKs you already use

No new client library. No re-learning streaming formats. Three lines change.

# Before
client = OpenAI(api_key="sk-...")
client.chat.completions.create(
    model="gpt-4o",
    messages=[...]
)

# After
client = OpenAI(
    api_key="tk_...",
    base_url="https://api.tokenflow.dev/v1",   # ← only change
)
client.chat.completions.create(
    model="smart-chat",                       # ← cheaper, same quality
    messages=[...]
)
What you get

Everything you keep duct-taping together — built in.

🎯

Smart aliases

Use smart-chat or fast-chat. We route to whichever model gives you the best price for the same quality.

🛡️

Hard budget caps

Set monthly, daily, or per-API-key limits. When you hit the cap, we refuse the request — not days later in an email.

📊

Real usage dashboards

Per-key, per-model, per-day costs. Find the runaway feature in 30 seconds, not after rooting through invoices.

🔌

SDK compatible

Works with the OpenAI SDK, Anthropic SDK, and Ollama clients. Change the base URL — that's it.

Streaming & tools

Server-sent events, function calling, vision — everything you expect from a modern AI API. Same shape as OpenAI.

🔐

API keys per project

One key for prod, another for staging, a third for that side project. Track them separately, revoke instantly.

Ship your next feature without the bill anxiety.

First deposit doubled — top up $10, get $20 of usable balance. Cancel any time — there's nothing to cancel.

Get your API key