AI cost calculator.
Project monthly API spend across Claude, GPT-5, Gemini, and Llama. Set request volume, prompt size, output length, and cache hit rate. See the cheapest model, the cheapest flagship, and your annual bill in one table.
Default scenario:

- Total monthly requests: 10,000
- Total monthly tokens: 25.0M
- Cheapest fit: Gemini 3 Flash at $3.00/month
| Model | Vendor · tier | Input | Output | Monthly | Annual | Per 1K req |
|---|---|---|---|---|---|---|
| Gemini 3 Flash (cheapest) | Google · fast | $1.50 | $1.50 | $3.00 | $36.00 | $0.300 |
| Llama 4 Scout | Meta · open | $1.60 | $1.50 | $3.10 | $37.20 | $0.310 |
| Llama 4 Maverick (cheapest flagship) | Meta · flagship | $5.40 | $4.25 | $9.65 | $116 | $0.965 |
| Claude Haiku 4.5 | Anthropic · fast | $5.00 | $6.25 | $11.25 | $135 | $1.13 |
| GPT-5 mini | OpenAI · fast | $5.00 | $10.00 | $15.00 | $180 | $1.50 |
| Gemini 2.5 Pro | Google · flagship | $25.00 | $25.00 | $50.00 | $600 | $5.00 |
| Gemini 3.1 Pro | Google · flagship | $40.00 | $60.00 | $100 | $1,200 | $10.00 |
| GPT-5.4 | OpenAI · mid | $50.00 | $75.00 | $125 | $1,500 | $12.50 |
| Claude Sonnet 4.6 | Anthropic · mid | $60.00 | $75.00 | $135 | $1,620 | $13.50 |
| Claude Opus 4.7 | Anthropic · flagship | $100 | $125 | $225 | $2,700 | $22.50 |
| GPT-5.5 | OpenAI · flagship | $100 | $150 | $250 | $3,000 | $25.00 |
Take-aways

- Cheapest model overall: Gemini 3 Flash at $3.00/month ($36.00/yr).
- Cheapest flagship-tier: Llama 4 Maverick at $9.65/month, a 3.2× premium for higher quality.
- Calculated on standard list pricing with no batch or volume discounts. Anthropic and OpenAI offer roughly 50% off for batch jobs; hosted Llama varies by provider.
How to read the table
Eleven models, sorted from cheapest to priciest at the volume you set. The row marked "cheapest" is the cheapest overall; the row marked "cheapest flagship" is the cheapest flagship-tier model (use that one when quality matters, the cheapest overall when it doesn't). Annual is monthly times twelve, no compounding.
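Each row's math can be sketched in a few lines. The 20M-input / 5M-output split below is an assumption chosen to reproduce the Gemini 3 Flash row from its $0.075/M input, $0.30/M output list price; the page only states 25.0M total tokens.

```python
def monthly_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Token cost at list price: (tokens / 1M) * price per million."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# Gemini 3 Flash at $0.075/M input, $0.30/M output, assuming the 25.0M
# monthly tokens split 20M input / 5M output (an assumed split)
monthly = monthly_cost(20e6, 5e6, 0.075, 0.30)
annual = monthly * 12                          # monthly times twelve, no compounding
per_1k_req = monthly / 10_000 * 1_000          # at 10,000 requests/month
print(monthly, annual, round(per_1k_req, 3))   # 3.0 36.0 0.3
```

The same formula, with different per-million prices, generates every other row in the table.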
How much does ChatGPT cost per month?
Consumer ChatGPT Plus is a flat $20/month with no per-token billing. GPT-5.5 via the API depends entirely on your volume; the calculator above shows the math. The break-even between Plus and the API comes down to daily request volume: for occasional users Plus is cheaper, for bulk usage the API wins.
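A quick sketch of that break-even, using the $5/M input, $30/M output GPT-5.5 list price quoted elsewhere on this page and an assumed per-request token shape:

```python
def breakeven_requests(flat_monthly, in_tokens, out_tokens, in_price_m, out_price_m):
    """Monthly API requests at which per-token billing equals a flat subscription."""
    per_request = (in_tokens / 1e6) * in_price_m + (out_tokens / 1e6) * out_price_m
    return flat_monthly / per_request

# $20/mo flat vs GPT-5.5 API, assuming a 1,500-in / 300-out token request
# (the request shape is illustrative, not from the page)
print(round(breakeven_requests(20, 1_500, 300, 5, 30)))   # 1212 requests/month
```

Under those assumptions, anything past roughly 1,200 requests a month favors the flat subscription; heavier prompts pull the break-even lower.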
How much does Claude cost per month?
Claude Pro is $20/month for consumers. Via the API, Opus 4.7 is the priciest at $5/$25 per million tokens; Sonnet 4.6 at $3/$15 is the value pick for most production workloads. See the full Claude pricing breakdown for the per-workload math.
How to calculate AI API cost for a chatbot
Three numbers: daily-active users, requests per user per day, average tokens per request. Multiply user count by request count by 30 to get monthly requests. Plug into the calculator, pick the workload preset closest to your shape (probably "customer support chatbot"), then read the cheapest row.
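The three steps above can be sketched as follows; every input here is an illustrative assumption, not a recommendation:

```python
# Three-number chatbot estimate (all inputs are assumptions)
daily_active_users = 500
requests_per_user_per_day = 4
monthly_requests = daily_active_users * requests_per_user_per_day * 30

avg_input_tokens, avg_output_tokens = 1_200, 250
monthly_input_tokens = monthly_requests * avg_input_tokens
monthly_output_tokens = monthly_requests * avg_output_tokens

print(monthly_requests)        # 60000 requests/month
print(monthly_input_tokens)    # 72000000 input tokens
print(monthly_output_tokens)   # 15000000 output tokens
```

Those token totals are what you plug into the calculator's volume fields.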
When prompt caching changes the math
Prompt caching is most useful when the same input prefix (system prompt, document context, codebase) repeats across many requests. RAG, agentic coding, and chatbots with long system prompts often see 60-80% input cost cuts. One-shot chat and stateless API calls see no benefit. Read the prompt caching guide for which providers cache what.
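A minimal sketch of the blended input rate, assuming a 90%-off cached-read price and an illustrative 70% hit rate:

```python
def effective_input_price(base_per_m, cache_hit_rate, cached_discount=0.90):
    """Blend fresh and cached input: cache hits pay (1 - discount) * base."""
    cached_per_m = base_per_m * (1 - cached_discount)
    return (1 - cache_hit_rate) * base_per_m + cache_hit_rate * cached_per_m

# 70% hit rate with 90%-off cached reads cuts input cost by 63%,
# inside the 60-80% range above (rates here are illustrative)
base = 3.00   # $/M fresh input (assumed)
eff = effective_input_price(base, 0.70)
print(round(eff, 2), round(1 - eff / base, 2))   # 1.11 0.63
```

At a 0% hit rate the formula collapses to the list price, which is why one-shot chat sees no benefit.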
Use it from inside Claude or Cursor
Same calculator also ships as an MCP server, so Claude Desktop, Cursor, or any MCP-compatible client can call estimate_ai_cost and cheapest_ai_model programmatically. Useful for AI agents that need to budget their own token spend or pick a cost-optimized model.
- npm package: briskly-mcp-ai-cost-calculator
- three tools: estimate_ai_cost, cheapest_ai_model, list_cost_models
- same math as this page, deterministic outputs
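For the curious, MCP tool calls travel as JSON-RPC `tools/call` requests. Here is a hedged sketch of what a client might send; the argument names are assumptions for illustration, not the package's documented schema, so check the npm README for the real parameters:

```python
import json

# JSON-RPC "tools/call" request an MCP client sends under the hood;
# the "arguments" keys below are illustrative assumptions
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "estimate_ai_cost",
        "arguments": {
            "monthly_requests": 10_000,
            "input_tokens_per_request": 2_000,
            "output_tokens_per_request": 500,
        },
    },
}
print(json.dumps(request, indent=2))
```

Clients like Claude Desktop and Cursor build this envelope for you; you only ever see the tool name and its arguments.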
When this is the wrong tool
For a single prompt rather than monthly volume, use the LLM token counter instead; it shows per-prompt cost across the same models. If you don't know which model fits your use case, run the AI model picker first, then come back here for the cost projection.
FAQ
How much does the OpenAI API cost per month?
Depends on volume and model. GPT-5.5 (the new flagship) is $5/M input, $30/M output. GPT-5.4 (mid-tier) is $2.50/$15. GPT-5 mini is $0.25/$2. For a chatbot doing 50K requests a month at 1.5K input tokens and 300 output tokens, GPT-5.4 lands around $410/month, GPT-5 mini around $50. Run the calculator above with your actual numbers.
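Recomputing that chatbot example directly from the per-million list prices quoted above:

```python
def monthly_bill(requests, in_tokens, out_tokens, in_price_m, out_price_m):
    """Requests/month times per-request token cost at list price."""
    return requests * ((in_tokens / 1e6) * in_price_m + (out_tokens / 1e6) * out_price_m)

# 50K requests/month at 1.5K input / 300 output tokens
gpt_5_4  = monthly_bill(50_000, 1_500, 300, 2.50, 15)   # mid-tier at $2.50/$15
gpt_mini = monthly_bill(50_000, 1_500, 300, 0.25, 2)    # fast tier at $0.25/$2
print(round(gpt_5_4, 2), round(gpt_mini, 2))            # 412.5 48.75
```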
How much does the Claude API cost per month?
Claude Opus 4.7 is $5/$25 per million tokens. Sonnet 4.6 is $3/$15. Haiku 4.5 is $0.25/$1.25. With prompt caching enabled (which Anthropic offers at 90% off cached reads), high-cache-rate workloads can drop input cost by 60-80%. The calculator above includes a cache-hit-rate slider so you can see the impact.
Which AI API is cheapest?
Gemini 3 Flash at $0.075 input / $0.30 output per million tokens, then Llama 4 Scout at roughly the same on hosted providers. Among flagship-tier models, Gemini 2.5 Pro ($1.25/$5) is cheapest, then Gemini 3.1 Pro ($2/$12). Claude Opus 4.7 and GPT-5.5 are the most expensive flagships at $5 input each.
How do I estimate AI cost for my use case?
Three numbers: monthly request count, average input tokens per request, average output tokens per request. Use one of the workload presets in the calculator as a starting point if you're not sure (chatbot, RAG, agentic coding, content writing, summarizer, high-volume API). Adjust from there.
Should I include prompt caching in my estimate?
Only if you're going to use it. Prompt caching gives you 80-90% off the input price for tokens you've sent before, useful when system prompts or long stable prefixes repeat across requests. RAG and agentic-coding workloads benefit most. Set the cache-hit-rate slider to your realistic rate (60% is common for RAG, 70% for agentic coding, 0% for one-shot chat).
What's a typical AI API budget for a small SaaS?
Wildly variable. A typical pattern: a small SaaS with 1K daily-active users, each making 5 LLM-backed actions per day, ends up at 150K monthly requests. At 2K input / 500 output average, that's around $1,900/month on GPT-5.4 and $225/month on GPT-5 mini. Caching aggressively or moving high-volume calls to a fast tier can cut the bill 70-90%.
How does cost scale with users?
Roughly linearly with request volume, since pricing is per-token. Doubling your users typically doubles your bill (sometimes a bit less, since system-prompt cache-hit rates go up at scale). Use the calculator to model 2x, 5x, 10x scenarios by changing the monthly-requests field.
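A minimal scaling sketch, with assumed baseline numbers:

```python
# Per-token billing scales linearly: k times the requests is roughly
# k times the bill (caching at scale can shave this slightly)
base_requests = 150_000   # current monthly volume (assumed)
base_bill = 1_000.00      # current monthly bill in dollars (assumed)

for k in (2, 5, 10):
    print(f"{k}x: {k * base_requests:,} requests, about ${k * base_bill:,.0f}/month")
```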
Are there hidden fees beyond per-token pricing?
Not on the major providers' standard tiers: Anthropic, OpenAI, and Google charge per token only. Batch jobs and committed-volume tiers change the rate, but as discounts, not surcharges. Hosted Llama (Together, Fireworks, Groq, etc.) varies by provider; the calculator uses median hosted prices. Cache writes do cost extra on Claude (roughly 1.25× the fresh input rate, e.g. ~$6.25/M vs $5/M for Opus 4.7); the calculator factors this into the cached input rate.