
LLM token counter.

Count tokens across every major LLM — GPT-5, GPT-4o, Claude 4.x, Gemini 2.5, Llama — and see per-model cost and context-window fill. Exact for OpenAI, approximate (±5–10%) for Claude, Gemini, and Llama. Nothing uploads.

free · forever · 12 models · in-browser

no signup · no api key · nothing uploads

Example (live counter, captured): 421 chars / 62 words → 83 GPT-5 input tokens, 0.0% of the 400K context (399,917 tokens remaining). At $1.25 / 1M input and $10.00 / 1M output, that prompt costs $0.000104 for input plus $0.005 for a 500-token output — $0.005104 total.

What a token is, in one paragraph

A token is a unit of text that a language model sees as a single item. For English prose, a rough rule of thumb is 1 token ≈ 4 characters ≈ 0.75 words. The actual count depends on which tokenizer the model uses — "hello" is 1 token but "hello," is often 2, and "antidisestablishmentarianism" can be 3 or 4 depending on the model. LLM pricing and context-window limits are measured in tokens, not characters or words, which is why every AI dev needs some version of this tool.
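That rule of thumb can be sketched as a quick estimator. This is a hypothetical helper, not the tool's actual code — real counts depend entirely on the model's tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the 1 token ~ 4 chars ~ 0.75 words
    rule of thumb for English prose."""
    if not text.strip():
        return 0
    by_chars = len(text) / 4          # 1 token ~ 4 characters
    by_words = len(text.split()) / 0.75  # 1 token ~ 0.75 words
    # Average the two signals; never report 0 for non-empty text.
    return max(round((by_chars + by_words) / 2), 1)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # → 12
```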

Why counts differ across providers

Each model family uses its own byte-pair encoding (BPE) or SentencePiece vocabulary. OpenAI's GPT-4o and GPT-5 use o200k_base (200K vocabulary); GPT-4 and GPT-3.5 use cl100k_base (100K). Claude uses a proprietary BPE Anthropic hasn't open-sourced. Gemini uses Google's SentencePiece variant. Llama 3.3 uses a tiktoken-compatible but distinct BPE. Across plain English, these tokenize within 5–10% of each other. Across code, non-English languages, or data-heavy text, they can diverge 10–20%. The tool shows the actual count for OpenAI (exact) and approximate counts for everyone else.
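The effect of vocabulary size on counts can be illustrated with a toy longest-match tokenizer — a simplified stand-in for real BPE (which merges byte pairs by learned rank), with made-up vocabularies:

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Toy longest-match tokenizer: take the longest vocab entry that
    prefixes the remaining text, falling back to a single character.
    Illustrates why a larger vocabulary yields fewer, longer tokens."""
    tokens, i = [], 0
    while i < len(text):
        match = next(
            (text[i:i + n] for n in range(len(text) - i, 0, -1)
             if text[i:i + n] in vocab),
            text[i],  # unknown-character fallback
        )
        tokens.append(match)
        i += len(match)
    return tokens

small = {"hel", "lo", ","}   # smaller vocabulary: word split in two
large = {"hello", ","}       # larger vocabulary: whole word is one token
print(len(greedy_tokenize("hello,", small)))  # → 3
print(len(greedy_tokenize("hello,", large)))  # → 2
```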

How to use this for cost planning

  • Paste your typical prompt (system + user message) and set expected output tokens (500 is a common short-answer default; 2,000+ for longer generations).
  • Turn on "Compare all models" to see cost side-by-side. Claude Haiku 4.5 is often 10–50× cheaper than GPT-5 for simple tasks; Gemini 2.5 Flash is competitive for high-volume workloads.
  • Multiply by your request volume. If a prompt costs $0.003 per call and you expect 10,000 calls a month, that's $30. A single expensive model choice at scale can be the difference between profitable and not.
  • Watch the context fill bar — if you're over 70%, the model has less room to generate its response, and cutoff risk goes up.
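The arithmetic above can be sketched as follows (`request_cost` is a hypothetical helper; the prices are the GPT-5 list prices shown on this page):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one call in USD, given per-million-token list prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1e6

# GPT-5 list prices from this page: $1.25 / 1M input, $10.00 / 1M output
per_call = request_cost(83, 500, 1.25, 10.00)
monthly = per_call * 10_000  # scale by expected request volume
print(f"${per_call:.6f} per call, ${monthly:.2f} per month")
# → $0.005104 per call, $51.04 per month
```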

Also available as

The same token-counting logic ships in a Chrome extension (select text on any page → see token count), an MCP server (Claude and Cursor can call it programmatically), and an NPM package (embed it in your own tool). All linked from this page as they ship.

  • web tool — you're using it
  • chrome extension — coming soon
  • mcp server — coming soon
  • npm package — coming soon

FAQ

How accurate is the token count for Claude / Gemini / Llama?

For English text, within 5–10% of the actual count in almost all cases. Anthropic, Google, and Meta don't ship browser-compatible tokenizers, so this tool uses OpenAI's o200k_base as a proxy. For budgeting prompts against context windows and rough cost estimation, that's accurate enough. For billing reconciliation against actual API usage, use each provider's official count endpoint (Anthropic's /v1/messages/count_tokens, Google's countTokens, etc).

Why do GPT-4o and GPT-5 use the same tokenizer?

Both use o200k_base — OpenAI's 200K-vocabulary BPE encoding introduced with GPT-4o. GPT-5 inherited it. GPT-4 and GPT-3.5 still use cl100k_base (100K vocabulary). The tool picks the right encoding automatically per model.
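A minimal sketch of that per-model encoding pick. The mapping follows this page; the fallback mirrors the o200k_base proxy described for Claude / Gemini / Llama, and the model-name strings are illustrative assumptions:

```python
# Which tiktoken-style encoding to use per model (sketch, not the tool's code).
ENCODING_FOR_MODEL = {
    "gpt-5": "o200k_base",
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}

def encoding_for(model: str) -> str:
    """Unknown (non-OpenAI) models fall back to o200k_base,
    used here as an approximation proxy."""
    return ENCODING_FOR_MODEL.get(model, "o200k_base")

print(encoding_for("gpt-4"))              # → cl100k_base
print(encoding_for("claude-sonnet-4.5"))  # → o200k_base (proxy)
```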

Does anything I paste get uploaded anywhere?

No. Tokenization runs entirely in your browser using the gpt-tokenizer library. No network requests, no server, no API key required. The only thing persisted is your last-used text and model in localStorage (cleared when you clear browser data).

How do I read the cost table?

Each model card shows: (1) input cost — the price for sending your pasted text as a prompt, (2) output cost — the estimated price if the model responds with the number of output tokens you specified (default 500), (3) total. Prices are each provider's public list pricing in USD; we refresh them when providers change their pricing.

What's the context-window fill bar?

Each model has a hard cap on how much text it can "see" in one request (128K for GPT-4o, 200K for Claude Sonnet 4.6, 1M–2M for Gemini 2.5 Pro, etc.). The fill bar shows what percentage of that your pasted text consumes. Green = comfortable (<70%), yellow = getting tight (70–90%), red = close to the limit (>90%). For long inputs, this tells you which models can handle them without truncation.
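The fill bar's math is simple; here is a sketch (hypothetical helper, with the thresholds as described above):

```python
def context_fill(input_tokens: int, context_window: int) -> tuple[float, str]:
    """Percentage of the context window consumed, plus the band:
    green < 70%, yellow 70-90%, red > 90%."""
    pct = 100 * input_tokens / context_window
    band = "green" if pct < 70 else "yellow" if pct <= 90 else "red"
    return pct, band

print(context_fill(83, 400_000))       # tiny prompt → green
print(context_fill(150_000, 200_000))  # 75% full → yellow
```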

Why do identical strings tokenize to different counts across models?

Each tokenizer family has its own vocabulary. GPT-4o's o200k_base includes more whole-word tokens for common patterns (URLs, emojis, code, non-English languages) than GPT-3.5's cl100k_base, so it counts fewer tokens for the same text. Claude and Gemini have their own splits. Across English prose the counts are within a few percent; across heavy code / non-English / structured data they can diverge 10–20%.

Can you add model X?

Yes — email hello@briskly.tools with the model, its context window, and its public pricing. We add new models as they're released.

Building on Claude? See also the freelance rate calculator if you're pricing client AI work, or the invoice generator for billing them.