Features

The boring plumbing, done right.

You shouldn't have to build cost control, alerting, and SDK compatibility yourself. Here's what TokenFlow handles so you can ship features instead.

One API, every model

Use the OpenAI SDK to call any model — including ones OpenAI doesn't make. Same code. Same streaming. Same response shape. We translate behind the scenes.

🎭

OpenAI-compatible

The OpenAI Python and JS SDKs work as-is. Just change the base URL.

🦙

Ollama-compatible

If your code talks to a local Ollama, it talks to TokenFlow. /api/chat, /api/generate, /api/tags — all there.

🤖

Anthropic-compatible

Anthropic SDK calls work too. Cross-model code without three different clients.

Cost control that actually controls cost

Most platforms send you an alert email after you've spent the money. TokenFlow refuses the request before it costs you anything.

🛑

Hard caps

"Stop at $50 today" means stop. The next request returns a 402 the moment the cap is hit.

📈

Soft alerts

Get an email at 50%, 75%, 90% of your budget. Spot trouble before it's a crisis.

🔑

Per-key budgets

Cap your prod key at $500/mo and your dev key at $20/mo. Different rules for different surfaces.

🔍

Per-feature usage

Tag requests by feature. Find out which one is eating 80% of the bill in 10 seconds.

Smart aliases — pick a job, not a model

Stop writing "gpt-4o" in your code and praying it stays cheap. Use a job-shaped alias and we pick the best model for that job today.

💬

smart-chat

General-purpose chat. Best price-to-quality ratio across the major models.

fast-chat

Optimized for speed. Use this when you need a streaming response yesterday.

💻

coder-pro

Code generation, refactoring, debugging. Tuned for technical accuracy.

🧠

deep-reasoning

Hard problems that need long thinking. Slower but more accurate.

👁️

vision-pro

Image understanding. Pass images, get descriptions, classifications, or extracted data.

🧮

embed-default

Vector embeddings for search and RAG. Consistent dimensions, predictable price.

Built for solo builders, scales with you

🔄

Streaming & tools

SSE streaming, function calling, structured outputs, vision inputs. Everything modern AI workflows need.

📚

API keys you can manage

Create, name, scope, and rotate keys. Revoke any one without touching the others.

📋

Audit log

Every request logged. Filter by key, model, status, date. Export to CSV for your records.

💳

Pay how you want

Card or crypto. Top up your balance — no automatic charges, no subscription if you don't want one.

🌐

Global low-latency

Edge-routed to the nearest healthy provider. Most requests complete in under 500ms TTFT.

🛟

Automatic failover

If one model is down, we route to the next best one. Your code keeps working.

The features list is short for a reason.

Everything we build solves one of three problems: bill shock, vendor lock-in, or SDK glue. The rest is somebody else's product.

Try it free