LLM token counter.
Count tokens across every major LLM. GPT-5.5, GPT-5.4, GPT-5, GPT-4o, Claude 4.x, Gemini 3.x, Llama 4, and see per-model cost and context-window fill. Exact for OpenAI, approximate (within 5 to 10%) for Claude, Gemini, and Llama. Nothing uploads.
no signup · no api key · nothing uploads
Chars
421
Words
62
GPT-5.4 tokens
83
GPT-5.4
OpenAI
83
input tokens
Input
$0.000208
Output (500)
$0.0075
Total
$0.007707
$2.50 / 1M input · $15.00 / 1M output
What a token is, in one paragraph
A token is a unit of text that a language model sees as a single item. For English prose, a rough rule of thumb is 1 token ≈ 4 characters ≈ 0.75 words. The actual count depends on which tokenizer the model uses, "hello" is 1 token but "hello," is often 2, and "antidisestablishmentarianism" can be 3 or 4 depending on the model. LLM pricing and context-window limits are measured in tokens, not characters or words, which is why every AI dev needs some version of this tool.
Why counts differ across providers
Each model family uses its own byte-pair encoding (BPE) or SentencePiece vocabulary. OpenAI's GPT-4o, GPT-5, GPT-5.4, GPT-5.5, and the o-series reasoning models all use o200k_base (200K vocabulary); only legacy GPT-4 and GPT-3.5 still use cl100k_base (100K). Claude uses a proprietary BPE that Anthropic hasn't open-sourced (their official /v1/messages/count_tokens API is the authoritative source). Gemini uses Google's SentencePiece variant. Llama 3 and Llama 4 use a tiktoken-style but distinct BPE. Across plain English these tokenize within 5-10% of each other; across code, non-English, or data-heavy text they can diverge 10-20%. The tool shows the actual count for OpenAI (exact) and approximate counts for everyone else.
How to count GPT-5.5 tokens
GPT-5.5 (and every OpenAI model since GPT-4o, including GPT-5.4, GPT-5, GPT-4o, and the o-series) uses o200k_base, a 200K-vocabulary BPE tokenizer. The tool above uses the official open-source tiktoken library to count tokens exactly the way OpenAI's API would. Pick GPT-5.5 from the model dropdown and paste your text.
How to count Claude tokens
Anthropic uses a proprietary BPE that hasn't been open-sourced. The tool above approximates Claude tokens using o200k_base as a proxy, which is within 5-10% for English prose and tighter for code. For exact billing reconciliation, call Anthropic's /v1/messages/count_tokens endpoint with your API key. For prompt budgeting and context-window math, the proxy is accurate enough.
How to count Gemini tokens
Gemini uses Google's SentencePiece variant, also not open-sourced for in-browser use. Same proxy story as Claude: the tool gives an approximate count via o200k_base, accurate within 5-10% for most English text. For exact counts, call countTokens on the Gemini API. The 2M-token context window on Gemini 2.5 Pro means the proxy error is bounded in practice.
How to count Llama tokens
Llama 3 and Llama 4 use a tiktoken-style BPE that's distinct from o200k_base, larger vocabulary, different merges. Counts are approximate via the proxy. For exact counts, use the tokenizers Python library with the Llama tokenizer file from Hugging Face. For browser-only budgeting, the proxy is within ~10% on English.
How to use this for cost planning
- Paste your typical prompt (system + user message) and set expected output tokens (500 is a common short-answer default; 2,000+ for longer generations).
- Turn on "Compare all models" to see cost side-by-side. Claude Haiku 4.5 and GPT-5.4 nano are often 10-50× cheaper than GPT-5.4 or Claude Opus 4.7 for simple tasks; Gemini 2.5 Flash-Lite and Llama 4 Scout are competitive for high-volume, low-complexity work.
- Multiply by your request volume. If a prompt costs $0.003 per call and you expect 10,000 calls a month, that's $30. A single expensive model choice at scale can be the difference between profitable and not.
- Watch the context fill bar , if you're over 70%, the model has less room to generate its response, and cutoff risk goes up.
How many tokens is 1000 words?
Roughly 1,300 to 1,400 tokens for English prose, the rule of thumb is 1 token equals about 0.75 words. The exact ratio varies: dense academic prose runs higher (more multi-token words); conversational text runs lower (more whole-word tokens). For code and non-English text, the ratio shifts meaningfully, paste a sample to see your specific number.
What does 1 million tokens cost?
Depends on the model, that's the unit pricing tier every provider lists. As of May 2026: GPT-5.5 is $5 input / $30 output per 1M tokens. Claude Opus 4.7 is $5/$25. Gemini 3.1 Pro is $2/$12. GPT-5.4 (mid-tier) is $2.50/$15. Cheapest at scale: Gemini 3 Flash at $0.075 / $0.30. For monthly projections at scale, use the AI cost calculator.
How to budget AI API spend before you write code
Three numbers: typical prompt size in tokens (paste a sample here), typical output size, expected monthly request count. Multiply the prompt-cost shown above by your monthly volume; that's the input bill. Multiply expected output tokens by output price; that's the output bill. Sum them, then add 30% buffer for retries and longer-than-expected responses. For complex workloads with prompt caching, use the AI cost calculator instead, it factors cache hit rate.
Also available as
The same token-counting logic ships in a Chrome extension (select text on any page → see token count), an MCP server (Claude and Cursor can call it programmatically), and an NPM package (embed it in your own tool). All linked from this page as they ship.
- · web tool, you're using it
- · chrome extension, built, in submission to the web store
- · mcp server, on npm as
briskly-mcp-llm-token-counter - · primer guide, /guides/mcp-server-primer
FAQ
How accurate is the token count for Claude / Gemini / Llama?
For English text, within 5-10% of the actual count in almost all cases. Anthropic, Google, and Meta don't ship browser-compatible tokenizers, so this tool uses OpenAI's o200k_base as a proxy. For budgeting prompts against context windows and rough cost estimation, that's accurate enough. For billing reconciliation against actual API usage, use each provider's official count endpoint (Anthropic's /v1/messages/count_tokens, Google's countTokens, etc).
Why do GPT-4o, GPT-5, GPT-5.4, and GPT-5.5 all use the same tokenizer?
They all use o200k_base. OpenAI's 200K-vocabulary BPE encoding introduced with GPT-4o. Every model OpenAI has shipped since then (GPT-4o, GPT-5, GPT-5.4, GPT-5.5, o1, o4-mini) inherited it. Only GPT-4 and GPT-3.5 still use the older cl100k_base (100K vocabulary). The tool picks the right encoding automatically per model.
Does anything I paste get uploaded anywhere?
No. Tokenization runs entirely in your browser using the gpt-tokenizer library. No network requests, no server, no API key required. The only thing persisted is your last-used text and model in localStorage (cleared when you clear browser data).
How do I read the cost table?
Each model card shows: (1) input cost, the price for sending your pasted text as a prompt, (2) output cost, the estimated price if the model responds with the number of output tokens you specified (default 500), (3) total. Prices are per provider's public list pricing in USD; update frequency is on us (we refresh when providers change their pricing).
What's the context-window fill bar?
Each model has a hard cap on how much text it can 'see' in one request (272K for GPT-5.5 and GPT-5.4, 400K for GPT-5, 128K for GPT-4o, 200K for Claude Sonnet 4.6, 1M for Claude Opus 4.7 and Llama 4, 1M for Gemini 3 Flash, 2M for Gemini 2.5 Pro, 1.05M for GPT-5.5 Pro and GPT-5.4 Pro). The fill bar shows what percentage of that your pasted text consumes. Green = comfortable (below 70%), yellow = getting tight (70 to 90%), red = close to the limit (above 90%). For long inputs, this tells you which models can handle it without truncation.
Why do identical strings tokenize to different counts across models?
Each tokenizer family has its own vocabulary. GPT-4o's o200k_base includes more whole-word tokens for common patterns (URLs, emojis, code, non-English languages) than GPT-3.5's cl100k_base, so it counts fewer tokens for the same text. Claude and Gemini have their own splits. Across English prose the counts are within a few percent; across heavy code / non-English / structured data they can diverge 10-20%.
Can you add model X?
Yes, email hello@briskly.tools with the model, its context window, and its public pricing. We add new models as they're released.
Building on Claude? See also the freelance rate calculator if you're pricing client AI work, or the invoice generator for billing them.