· 23 bots · client-side

Block AI crawlers.

Generate a robots.txt that blocks the known AI training and citation crawlers. 23 bots covered: OpenAI's GPTBot, Anthropic's ClaudeBot, Google-Extended, PerplexityBot, Apple, Meta, Bytespider, Common Crawl, Cohere, and more. Pick what to block, copy the output, and drop it at the root of your domain.

free · forever · 3 presets · per-bot toggles · in-browser

· pick which bots to block

16 of 23 blocked

· OpenAI

· Anthropic

· Google

· Perplexity

· Apple

· Meta

· Amazon

· ByteDance

· Common Crawl

· Cohere

· Diffbot

· ImageSift

· Webz.io

· You.com

· DuckDuckGo

· your robots.txt

65 lines

# robots.txt — AI crawler blocking rules
# Generated by https://briskly.tools/tools/block-ai-crawlers
# 16 of 23 known AI crawlers blocked.

# Note: robots.txt is voluntary. Major crawlers (GPTBot, ClaudeBot, Google-Extended,
# CCBot, Applebot-Extended) honor it. Some (Bytespider, Perplexity in particular)
# have been reported to ignore it. For stronger control, use Cloudflare AI bot
# blocking, server-side user-agent filtering, or rate limiting on top of this file.

# ----- Amazon -----
User-agent: Amazonbot
Disallow: /

# ----- Anthropic -----
User-agent: ClaudeBot
User-agent: anthropic-ai
Disallow: /

# ----- Apple -----
User-agent: Applebot-Extended
Disallow: /

# ----- ByteDance -----
User-agent: Bytespider
Disallow: /

# ----- Cohere -----
User-agent: cohere-ai
User-agent: cohere-training-data-crawler
Disallow: /

# ----- Common Crawl -----
User-agent: CCBot
Disallow: /

# ----- Diffbot -----
User-agent: Diffbot
Disallow: /

# ----- Google -----
User-agent: Google-Extended
User-agent: Google-CloudVertexBot
Disallow: /

# ----- ImageSift -----
User-agent: ImagesiftBot
Disallow: /

# ----- Meta -----
User-agent: Meta-ExternalAgent
User-agent: FacebookBot
Disallow: /

# ----- OpenAI -----
User-agent: GPTBot
Disallow: /

# ----- Webz.io -----
User-agent: Omgilibot
Disallow: /

# Allow everything else (regular search engines, etc.)
User-agent: *
Allow: /
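Upload this file to the root of your domain so it's accessible at yoursite.com/robots.txt. On most static-site hosts (Vercel, Netlify, Cloudflare Pages), a file placed at /public/robots.txt in your repo, or in your framework's equivalent static directory, is served from the domain root automatically.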
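Before uploading, you can sanity-check the rules by running them through a parser. The sketch below uses Python's standard-library urllib.robotparser on a shortened copy of the file above. The is_allowed helper and the example.com URL are illustrative placeholders, and this parser's matching may differ in detail from what any individual crawler implements.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the generated file, for illustration only.
ROBOTS_TXT = """\
# ----- Anthropic -----
User-agent: ClaudeBot
User-agent: anthropic-ai
Disallow: /

# ----- OpenAI -----
User-agent: GPTBot
Disallow: /

# Allow everything else (regular search engines, etc.)
User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "anthropic-ai", "Googlebot", "ChatGPT-User"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent) else "blocked"
        print(f"{agent}: {verdict}")
```

With this shortened file, the two listed vendors' bots should come back blocked, while agents that only match the User-agent: * group stay allowed.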

Two kinds of AI bots, two different decisions

The bots on this page split into two camps. The first are training crawlers: GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider. Their job is to scrape your content into a dataset that's used to train future LLMs. The choice with these is binary: opt in (do nothing) and your content becomes training data; opt out (block them) and it doesn't.

The second camp is citation crawlers: ChatGPT-User, Claude-Web, Perplexity-User, OAI-SearchBot. Their job is to fetch your page at the moment a user asks an AI about you, so the answer can cite your URL. Blocking these costs you the referral traffic that AI answers can send when they cite specific sources.

The default preset ("Block AI training") blocks the first camp and leaves the second alone. That's the recommended stance for most sites: control your training data footprint, keep your citation traffic. The strict preset blocks both. Custom toggles let you pick per-bot.
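To make the preset logic concrete, here is a minimal sketch of how a preset could map to robots.txt output. It is not the tool's actual code: the preset keys, merge_presets, and build_robots_txt are hypothetical names, and only the bot names are taken from the two camps described above.

```python
# Bot names come from the two camps described above. Everything else here
# (preset keys, function names) is hypothetical, not the tool's real code.
TRAINING_BOTS = {
    "OpenAI": ["GPTBot"],
    "Anthropic": ["ClaudeBot"],
    "Google": ["Google-Extended"],
    "Common Crawl": ["CCBot"],
    "ByteDance": ["Bytespider"],
}
CITATION_BOTS = {
    "OpenAI": ["ChatGPT-User", "OAI-SearchBot"],
    "Anthropic": ["Claude-Web"],
    "Perplexity": ["Perplexity-User"],
}


def merge_presets(*groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Combine vendor -> bots mappings without mutating the inputs."""
    merged: dict[str, list[str]] = {}
    for group in groups:
        for vendor, bots in group.items():
            merged.setdefault(vendor, []).extend(bots)
    return merged


PRESETS = {
    "training": TRAINING_BOTS,                         # block training only
    "strict": merge_presets(TRAINING_BOTS, CITATION_BOTS),  # block both camps
}


def build_robots_txt(blocked: dict[str, list[str]]) -> str:
    """Emit one Disallow group per vendor, then an allow-all fallback."""
    lines: list[str] = []
    for vendor in sorted(blocked):
        lines.append(f"# ----- {vendor} -----")
        lines.extend(f"User-agent: {bot}" for bot in blocked[vendor])
        lines.append("Disallow: /")
        lines.append("")
    lines += [
        "# Allow everything else (regular search engines, etc.)",
        "User-agent: *",
        "Allow: /",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(PRESETS["training"]))
```

Grouping several User-agent lines above a single Disallow mirrors the layout of the generated file, which keeps the output short when one vendor runs several bots.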

Where to put the file

  • Vercel / Next.js / Astro / SvelteKit: drop into /public/robots.txt in your repo (SvelteKit uses /static/robots.txt instead). Deploys to your domain root automatically.
  • Netlify / Cloudflare Pages: same as above (/public/robots.txt or framework-equivalent static directory).
  • WordPress: use an SEO plugin (Yoast, Rank Math, All in One SEO) that supports custom robots.txt editing, or upload via FTP to the site root.
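  • Custom Node / Express / Django / Rails: serve as a static asset from the root path. Rails serves public/robots.txt out of the box; Express needs its static middleware pointed at the directory that holds the file; Django needs an explicit route or a web-server rule, because its static files live under /static/ by default.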
  • Verify it works: after deploying, visit yoursite.com/robots.txt in a browser. You should see the file contents as plain text. If it 404s, the file isn't at the right path.
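The verification step can also be scripted. The sketch below, using only Python's standard library, fetches /robots.txt and reports anything that looks off. The function names are hypothetical, and the text/plain check is an assumption based on the "plain text" expectation above rather than a rule every crawler enforces.

```python
import urllib.error
import urllib.request


def check_robots_response(status: int, content_type: str, body: str) -> list[str]:
    """Return a list of problems found; an empty list means it looks fine."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if not content_type.lower().startswith("text/plain"):
        problems.append(
            f"expected text/plain, got {content_type or 'no Content-Type'}"
        )
    if "user-agent:" not in body.lower():
        problems.append("body contains no User-agent line")
    return problems


def fetch_and_check(site: str) -> list[str]:
    """Fetch <site>/robots.txt and run the checks above on the response."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return check_robots_response(
                resp.status, resp.headers.get("Content-Type", ""), body
            )
    except urllib.error.HTTPError as err:   # e.g. a 404: file not at the root
        return [f"expected HTTP 200, got {err.code}"]
    except urllib.error.URLError as err:    # DNS failure, refused connection
        return [f"could not reach {url}: {err.reason}"]


if __name__ == "__main__":
    # Placeholder domain; replace with your own site.
    print(fetch_and_check("https://example.com") or "robots.txt looks fine")
```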

FAQ

Will blocking AI bots hurt my Google search ranking?

No. The AI training bots (GPTBot, Google-Extended, ClaudeBot, CCBot) are separate from Googlebot, Bingbot, and other regular search crawlers. Google-Extended specifically is a pseudo-bot: it doesn't crawl your site, it just tells Google whether to USE the content Googlebot already fetched for Gemini training. Blocking Google-Extended doesn't change your Google Search ranking. Same with Applebot-Extended (separate from Applebot for Spotlight/Siri). Regular search crawlers stay allowed via the User-agent: * Allow: / fallback the tool includes by default.

What's the difference between training bots and citation bots?

Training bots (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, Meta-ExternalAgent, CCBot, Bytespider) scrape the open web to build the dataset future LLM versions are trained on. Citation bots (ChatGPT-User, Claude-Web, Perplexity-User, OAI-SearchBot) fetch your site at the moment a user asks about you, so the AI's answer can cite your URL and send you traffic. Most sites want to block training (preserve content control) but allow citation (keep referral traffic). That's the 'Block AI training' preset.

Does robots.txt actually work? Can't bots just ignore it?

Major bots from major companies (OpenAI, Anthropic, Google, Apple, Meta, Common Crawl) honor robots.txt as a matter of public policy and have published documentation on the user-agent strings to use. Some bots are reported to ignore it: Perplexity has been called out by 404 Media and others for fetching blocked content via Perplexity-User even when PerplexityBot is disallowed; Bytespider has a history of slow compliance. For stronger control on top of robots.txt, Cloudflare offers AI bot blocking at the firewall layer, you can filter by user-agent on the server and return a 403 to known AI bots, or you can rate-limit aggressive crawlers. robots.txt is the polite baseline; harder controls layer on top.

What about bots that aren't on this list?

The list covers the 23 most-active and most-discussed AI crawlers in 2026. New bots appear regularly (new model labs, new scraping startups). The tool defaults to an 'Allow all other crawlers' fallback, meaning your robots.txt only blocks what's listed; everything else can crawl unless explicitly blocked. Watch your Dark Visitors logs and add specific User-agent entries if you see new ones. The list updates as new bots become known; check back periodically for new entries.
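If you have raw access logs, a small script can surface crawler-like user agents that are not on your list yet. The sketch below assumes the common "combined" log format, where the user agent is the last quoted field on each line. The KNOWN tuple, the bot-hint pattern, and the sample agent NewLabCrawler are all illustrative, not real data.

```python
import re
from collections import Counter

# Agents you already handle. Extend this with the bots from your robots.txt.
KNOWN = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Googlebot", "Bingbot")

# Assumption: combined log format, user agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')
# Heuristic only: many crawlers name themselves with one of these words.
BOT_HINT = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def unknown_bot_agents(log_lines, known=KNOWN) -> Counter:
    """Count crawler-like user agents that match none of the known names."""
    known_lower = [name.lower() for name in known]
    counts: Counter = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if not match:
            continue
        agent = match.group(1)
        if not BOT_HINT.search(agent):
            continue  # probably a regular browser
        if any(name in agent.lower() for name in known_lower):
            continue  # already on the list
        counts[agent] += 1
    return counts


if __name__ == "__main__":
    # Fictitious sample lines, for illustration only.
    sample = [
        '1.2.3.4 - - [01/Jan/2026:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
        '5.6.7.8 - - [01/Jan/2026:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "NewLabCrawler/0.1"',
        '9.9.9.9 - - [01/Jan/2026:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"',
    ]
    for agent, hits in unknown_bot_agents(sample).most_common():
        print(hits, agent)
```

The keyword heuristic will miss crawlers that disguise themselves as browsers, so treat the output as a starting point for review rather than a complete inventory.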

Will this work on any website host?

Yes. robots.txt is the universal web crawler standard, supported by every host. Drop the generated file at the root of your domain so it's accessible at /robots.txt. For Vercel, Netlify, Cloudflare Pages, and most static-site hosts, that means /public/robots.txt in your repo. For WordPress, use a robots.txt SEO plugin or your hosting control panel. For server-rendered apps, serve it as a static asset from the root path.

Why include explanatory comments by default?

Two reasons. First, comments make the file maintainable: in six months when you revisit it, you'll know why each block exists. Second, comments are a positive signal: if an AI company audits why their bot is blocked, the comment 'Block AI training' is clearer than just a User-agent line, and may inform future bot-naming or opt-out conventions. The comments add maybe 50 lines to a 100-line file; turn them off if you prefer a minimal output.

I want stronger blocking than robots.txt. What else can I do?

robots.txt is a request, not enforcement. For real enforcement: (1) Cloudflare's 'AI bot blocking' feature blocks known AI crawlers at the edge regardless of robots.txt. (2) Server-side user-agent filtering: drop or 403 any request whose User-agent matches a known AI bot. (3) Rate limiting: AI scrapers are typically aggressive, so strict rate limits hurt them more than they hurt human visitors. (4) Authentication: if your content is genuinely sensitive, gate it behind a login. (5) Watermarking: embed canary phrases that only your content has, then search for them in chatbot outputs to detect training-data inclusion.
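As an illustration of option (2), here is a minimal sketch of user-agent filtering written as WSGI middleware in Python, using no third-party packages. The BlockAIBots class, the short BLOCKED_AGENTS tuple, and demo_app are hypothetical; a real deployment would use its own bot list and might do this at the web server or CDN instead.

```python
# Hypothetical short list; use the bots you block in robots.txt.
BLOCKED_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")


class BlockAIBots:
    """WSGI middleware that answers 403 when the User-Agent names a listed bot."""

    def __init__(self, app, blocked=BLOCKED_AGENTS):
        self.app = app
        self.blocked = tuple(name.lower() for name in blocked)

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in agent for name in self.blocked):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Not a listed bot: hand the request to the wrapped application.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for your real application."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


app = BlockAIBots(demo_app)

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Local test server on port 8000; try: curl -A GPTBot localhost:8000
    make_server("", 8000, app).serve_forever()
```

User-agent strings are trivially spoofed, so this only stops crawlers that identify themselves honestly; it complements rate limiting and edge blocking rather than replacing them.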

Companion tools: the AI Slop Detector (paste URL, get visual AI-template score) and the AI Tell Killer (paste-in prompts that stop AI writing tells).