I built Tokuin because the tooling around prompt design still feels like guesswork. We scribble system prompts in scratchpads, retry endpoints until rate limits scream, and only know what the bill looks like after the invoice arrives. Tokuin changes that. It gives you reliable token estimates, cost projections, and synthetic load tests—all packaged in a Rust CLI you can trust.
Why Another CLI?
Prompts have become product surfaces. Teams iterate on them like code, but the support tooling still feels like duct tape. I wanted something that:
- Reads prompts from files, stdin, or watch mode
- Targets multiple providers without juggling SDKs
- Surfaces pricing before you ever hit “Send”
- Scales up to load tests when you need to know how a model behaves across 1,000 requests
Tokuin is the result: a ground-up build in Rust with a modular architecture for tokenizers, providers, and output formats. It’s fast, predictable, and stays out of your way while you experiment.
Feature Highlights
- Token estimation that understands context: Supply raw text, chat transcripts, or JSON payloads and Tokuin breaks down system/user/assistant roles with optional markdown minification.
- Multi-model comparisons: Pass `--compare` with OpenAI, Anthropic, or OpenRouter models to see how token counts and costs differ (see the example after this list).
- Pricing awareness: Add `--price` or `--estimate-cost` to forecast spend per prompt, per run, or across a load test.
- Watch mode: Keep Tokuin running with `--watch` and it re-computes every time you save a file. Perfect for prompt gardening.
- Load testing (feature flag): Enable the `load-test` feature to hammer APIs with controlled concurrency, think times, retry logic, and automatic latency/cost reports.
- Format-flexible output: Emit human-friendly text, JSON for scripts, Markdown for docs, CSV for spreadsheets, or Prometheus metrics when you need dashboards.
- Provider-ready: OpenAI and OpenRouter work out of the box, Gemini is one flag away, and new providers plug into the registry without rewriting the CLI surface.[^1]
[^1]: Feature set sourced from the project README.
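To make the flags above concrete, here is a minimal sketch of what a session might look like. The subcommand name, model identifiers, and exact argument syntax are my assumptions, not documented usage; only the flags themselves come from the feature list. Check `tokuin --help` for the real interface.

```bash
# Illustrative only: the subcommand, model names, and argument shapes are assumptions.
# The flags (--estimate-cost, --compare, --format, --watch) are the ones described above.

# Estimate tokens and projected cost for a prompt file against one model
tokuin estimate prompt.md --model gpt-4o --estimate-cost

# Compare token counts and costs across providers, emitting JSON for scripts
cat system_prompt.txt | tokuin estimate --compare gpt-4o,claude-3-5-sonnet --format json

# Re-compute on every save while you iterate on the prompt
tokuin estimate prompt.md --watch
```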
Advantages for Developers
- Budget confidence: Know input/output token counts and projected cost before you ship or run a batch job.
- Faster iteration: Replace “test, wait, pray” with a tight loop; piping prompts through Tokuin becomes muscle memory.
- CI-friendly: Because it’s a single binary, you can run token checks in CI to guard against prompt bloat before merge (see the sketch after this list).
- Team alignment: Comparing multiple models is now a line of CLI flags, not a spreadsheet of stitched-together data.
- Infra visibility: Load tests surface latency cliffs, provider throttling, and cost ceilings before your users do.
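As a sketch of the CI idea, the snippet below fails a build when a prompt grows past a token budget. It leans on the JSON output mode mentioned above, but the `estimate` subcommand and the `total_tokens` field name are hypothetical; adapt them to the actual output schema.

```bash
#!/usr/bin/env bash
# Hypothetical CI guard against prompt bloat. The subcommand and the JSON field
# name are assumptions; only the JSON output mode itself is described in the post.
set -euo pipefail

BUDGET=2000
TOKENS=$(tokuin estimate prompts/system.md --model gpt-4o --format json | jq '.total_tokens')

if [ "$TOKENS" -gt "$BUDGET" ]; then
  echo "Prompt uses $TOKENS tokens, over the $BUDGET-token budget." >&2
  exit 1
fi
```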
Built for Growth
Tokuin ships with a modular core: tokenizers/, models/, providers/, and output/ live in separate crates so we can grow the ecosystem without tangling the CLI.[^1] Near-term roadmap items include:
- Provider expansion: Native Anthropic, Mistral, and Cohere support (work already outlined in `PROVIDERS_PLAN.md`).
- Scenario scripts: Define multi-turn conversations and replay them in load tests.
- Cost guardrails: Abort long runs automatically when reaching a user-defined budget ceiling.
- Prompt linting: Surface style and structure issues before they hit production pipelines.
- Plugin hooks: Let users contribute tokenizers or pricing sources without forking the repo.
From Experiment to Community Tooling
Tokuin grew out of practical late-night benchmarking, budget vetting, and the need for predictable tooling across teams. We still welcome that vibe-coding energy—if you sketch solutions with AI co-pilots or riff on ideas in flow, there’s room here to help shape how we build for LLMs.
How to Get Involved
- Run it: `cargo install tokuin` or build from source and kick the tires (a quick-start sketch follows this list).
- Share prompts: If you discover funky edge cases in tokenization or pricing, capture them as regression fixtures.
- Open a PR: Contributions are welcome whether you’re adding providers, tightening error messages, or documenting workflows. Mention any vibe-coding sessions in your PR so we can trace the creative path.
- Join the roadmap: Drop ideas in issues, especially if you’re working on multi-provider tooling—the more weird setups we test, the better Tokuin gets.
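A minimal quick-start, assuming the crate is published under the name used in this post and that load testing is a standard Cargo feature flag (the feature name comes from the feature list above):

```bash
# Install the CLI from crates.io (crate name as used in this post)
cargo install tokuin

# Or opt into the load-testing feature flag described above
cargo install tokuin --features load-test

# Sanity-check the install and browse the available flags
tokuin --help
```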
Thanks for trying it out. Tokuin exists to remove guesswork, save budgets, and let prompt engineers stay in flow. If you ship something with it, I’d love to hear the story.