
LLM token counter.

Count tokens across every major LLM — GPT-5, GPT-4o, Claude 4.x, Gemini 2.5, Llama — and see per-model cost and context-window fill. Exact for OpenAI, approximate (±5–10%) for Claude, Gemini, and Llama. Nothing uploads.

free · forever · 12 models · in-browser

no signup · no api key · nothing uploads

Example (live counter, captured): 421 chars / 62 words → 83 GPT-5 input tokens, 0.0% of the 400K context (399,917 tokens remaining). At $1.25 / 1M input and $10.00 / 1M output, that prompt costs $0.000104 for input plus $0.005 for a 500-token output — $0.005104 total.

What a token is, in one paragraph

A token is a unit of text that a language model sees as a single item. For English prose, a rough rule of thumb is 1 token ≈ 4 characters ≈ 0.75 words. The actual count depends on which tokenizer the model uses — "hello" is 1 token but "hello," is often 2, and "antidisestablishmentarianism" can be 3 or 4 depending on the model. LLM pricing and context-window limits are measured in tokens, not characters or words, which is why every AI dev needs some version of this tool.
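That rule of thumb can be sketched as a quick estimator. This is a hypothetical helper, not the tool's actual code — real counts depend entirely on the model's tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the 1 token ~ 4 chars ~ 0.75 words
    rule of thumb for English prose."""
    if not text.strip():
        return 0
    by_chars = len(text) / 4          # 1 token ~ 4 characters
    by_words = len(text.split()) / 0.75  # 1 token ~ 0.75 words
    # Average the two signals; never report 0 for non-empty text.
    return max(round((by_chars + by_words) / 2), 1)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # → 12
```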

Why counts differ across providers

Each model family uses its own byte-pair encoding (BPE) or SentencePiece vocabulary. OpenAI's GPT-4o and GPT-5 use o200k_base (200K vocabulary); GPT-4 and GPT-3.5 use cl100k_base (100K). Claude uses a proprietary BPE Anthropic hasn't open-sourced. Gemini uses Google's SentencePiece variant. Llama 3.3 uses a tiktoken-compatible but distinct BPE. Across plain English, these tokenize within 5–10% of each other. Across code, non-English languages, or data-heavy text, they can diverge 10–20%. The tool shows the actual count for OpenAI (exact) and approximate counts for everyone else.
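The effect of vocabulary size on counts can be illustrated with a toy longest-match tokenizer — a simplified stand-in for real BPE (which merges byte pairs by learned rank), with made-up vocabularies:

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Toy longest-match tokenizer: take the longest vocab entry that
    prefixes the remaining text, falling back to a single character.
    Illustrates why a larger vocabulary yields fewer, longer tokens."""
    tokens, i = [], 0
    while i < len(text):
        match = next(
            (text[i:i + n] for n in range(len(text) - i, 0, -1)
             if text[i:i + n] in vocab),
            text[i],  # unknown-character fallback
        )
        tokens.append(match)
        i += len(match)
    return tokens

small = {"hel", "lo", ","}   # smaller vocabulary: word split in two
large = {"hello", ","}       # larger vocabulary: whole word is one token
print(len(greedy_tokenize("hello,", small)))  # → 3
print(len(greedy_tokenize("hello,", large)))  # → 2
```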

How to use this for cost planning

  • Paste your typical prompt (system + user message) and set expected output tokens (500 is a common short-answer default; 2,000+ for longer generations).
  • Turn on "Compare all models" to see cost side-by-side. Claude Haiku 4.5 is often 10–50× cheaper than GPT-5 for simple tasks; Gemini 2.5 Flash is competitive for high-volume workloads.
  • Multiply by your request volume. If a prompt costs $0.003 per call and you expect 10,000 calls a month, that's $30. A single expensive model choice at scale can be the difference between profitable and not.
  • Watch the context fill bar — if you're over 70%, the model has less room to generate its response, and cutoff risk goes up.
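The arithmetic above can be sketched as follows (`request_cost` is a hypothetical helper; the prices are the GPT-5 list prices shown on this page):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one call in USD, given per-million-token list prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1e6

# GPT-5 list prices from this page: $1.25 / 1M input, $10.00 / 1M output
per_call = request_cost(83, 500, 1.25, 10.00)
monthly = per_call * 10_000  # scale by expected request volume
print(f"${per_call:.6f} per call, ${monthly:.2f} per month")
# → $0.005104 per call, $51.04 per month
```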

Also available as

The same token-counting logic ships in a Chrome extension (select text on any page → see token count), an MCP server (Claude and Cursor can call it programmatically), and an NPM package (embed it in your own tool). All linked from this page as they ship.

  • web tool — you're using it
  • chrome extension — coming soon
  • mcp server — coming soon
  • npm package — coming soon

FAQ

How accurate is the token count for Claude / Gemini / Llama?

For English text, within 5–10% of the actual count in almost all cases. Anthropic, Google, and Meta don't ship browser-compatible tokenizers, so this tool uses OpenAI's o200k_base as a proxy. For budgeting prompts against context windows and rough cost estimation, that's accurate enough. For billing reconciliation against actual API usage, use each provider's official count endpoint (Anthropic's /v1/messages/count_tokens, Google's countTokens, etc).

Why do GPT-4o and GPT-5 use the same tokenizer?

Both use o200k_base — OpenAI's 200K-vocabulary BPE encoding introduced with GPT-4o. GPT-5 inherited it. GPT-4 and GPT-3.5 still use cl100k_base (100K vocabulary). The tool picks the right encoding automatically per model.
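A minimal sketch of that per-model encoding pick. The mapping follows this page; the fallback mirrors the o200k_base proxy described for Claude / Gemini / Llama, and the model-name strings are illustrative assumptions:

```python
# Which tiktoken-style encoding to use per model (sketch, not the tool's code).
ENCODING_FOR_MODEL = {
    "gpt-5": "o200k_base",
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}

def encoding_for(model: str) -> str:
    """Unknown (non-OpenAI) models fall back to o200k_base,
    used here as an approximation proxy."""
    return ENCODING_FOR_MODEL.get(model, "o200k_base")

print(encoding_for("gpt-4"))              # → cl100k_base
print(encoding_for("claude-sonnet-4.5"))  # → o200k_base (proxy)
```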

Does anything I paste get uploaded anywhere?

No. Tokenization runs entirely in your browser using the gpt-tokenizer library. No network requests, no server, no API key required. The only thing persisted is your last-used text and model in localStorage (cleared when you clear browser data).

How do I read the cost table?

Each model card shows: (1) input cost — the price for sending your pasted text as a prompt, (2) output cost — the estimated price if the model responds with the number of output tokens you specified (default 500), (3) total. Prices are each provider's public list pricing in USD; we refresh them when providers change their pricing.

What's the context-window fill bar?

Each model has a hard cap on how much text it can "see" in one request (128K for GPT-4o, 200K for Claude Sonnet 4.6, 1M–2M for Gemini 2.5 Pro, etc.). The fill bar shows what percentage of that your pasted text consumes. Green = comfortable (<70%), yellow = getting tight (70–90%), red = close to the limit (>90%). For long inputs, this tells you which models can handle them without truncation.
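The fill bar's math is simple; here is a sketch (hypothetical helper, with the thresholds as described above):

```python
def context_fill(input_tokens: int, context_window: int) -> tuple[float, str]:
    """Percentage of the context window consumed, plus the band:
    green < 70%, yellow 70-90%, red > 90%."""
    pct = 100 * input_tokens / context_window
    band = "green" if pct < 70 else "yellow" if pct <= 90 else "red"
    return pct, band

print(context_fill(83, 400_000))       # tiny prompt → green
print(context_fill(150_000, 200_000))  # 75% full → yellow
```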

Why do identical strings tokenize to different counts across models?

Each tokenizer family has its own vocabulary. GPT-4o's o200k_base includes more whole-word tokens for common patterns (URLs, emojis, code, non-English languages) than GPT-3.5's cl100k_base, so it counts fewer tokens for the same text. Claude and Gemini have their own splits. Across English prose the counts are within a few percent; across heavy code / non-English / structured data they can diverge 10–20%.

Can you add model X?

Yes — email hello@briskly.tools with the model, its context window, and its public pricing. We add new models as they're released.

Building on Claude? See also the freelance rate calculator if you're pricing client AI work, or the invoice generator for billing them.