Is My Website Blocking AI Bots?

Quick answer

A single robots.txt line can keep a good page out of an AI answer. The common mistake is blocking the search bots while trying to block training bots.

If your question is "is my website blocking AI bots," check the bots that power live AI search first: OAI-SearchBot for ChatGPT search, Claude-SearchBot for Claude, PerplexityBot for Perplexity, Googlebot for Google AI Overviews, and Applebot for Apple Intelligence. Disallowing one of these tokens in robots.txt tells that crawler not to fetch your pages, which can drop you from that engine's answers. For Googlebot, the same rule also affects regular Google Search.

Use the free AI visibility checker for a fast read on the exact tokens. Check again after a redesign, CMS migration, CDN change, WAF rule change, or agency SEO update. Robots.txt often changes without a clean handoff.

Check the right tokens

AI search and AI training use different controls. Mixing them up leads to bad SEO calls.

Allow OAI-SearchBot if you want ChatGPT search to crawl pages for answers.
Allow Claude-SearchBot if you want Claude to find and cite your site in search answers.
Allow PerplexityBot if you want Perplexity to surface and link to your pages.
Allow Googlebot if you want Google Search, including AI Overviews, to crawl your pages. Google AI Overviews ride the normal Search index. There is no separate opt-out crawler.
Allow Applebot if you want Apple search surfaces and Apple Intelligence to reach your pages.
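
The allow list above can be spot-checked offline. The sketch below uses Python's standard urllib.robotparser to ask whether each search token may fetch a given path. SAMPLE_ROBOTS and the /pricing path are made-up inputs, and Python's parser only approximates how each vendor's crawler reads the file, so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Search tokens named in this article.
SEARCH_TOKENS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Googlebot", "Applebot"]

# Hypothetical robots.txt, used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def search_bot_access(robots_txt, path="/"):
    """Return {token: bool} saying whether each search token may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, path) for token in SEARCH_TOKENS}


if __name__ == "__main__":
    for token, allowed in search_bot_access(SAMPLE_ROBOTS, "/pricing").items():
        print(f"{token}: {'allowed' if allowed else 'BLOCKED'}")
```

With this sample, OAI-SearchBot is blocked by its own group while the other tokens fall through to the wildcard group, which only closes /admin/.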

Then check the training controls separately. GPTBot, ClaudeBot, CCBot, Google-Extended, and Applebot-Extended govern training or opt-out use, not live search. Blocking them does not affect live AI-search visibility. Google-Extended and Applebot-Extended are robots.txt-only control tokens: no crawler sends them as a user agent.
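
One way to keep the two groups apart is to generate the training opt-out as its own block. The sketch below is illustrative only: build_training_optout is a hypothetical helper, and the output assumes you want every training token disallowed site-wide and everything else allowed.

```python
from urllib.robotparser import RobotFileParser

# Training and opt-out tokens named in this article.
TRAINING_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "Applebot-Extended"]


def build_training_optout(tokens=TRAINING_TOKENS):
    """Return robots.txt text that disallows the given tokens and allows the rest."""
    lines = [f"User-agent: {token}" for token in tokens]  # one shared group
    lines.append("Disallow: /")
    lines.append("")
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_training_optout()
    print(text)

    # Round-trip check: a search token stays allowed, a training token does not.
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    print("OAI-SearchBot allowed:", parser.can_fetch("OAI-SearchBot", "/"))
    print("GPTBot allowed:", parser.can_fetch("GPTBot", "/"))
```

Listing several User-agent lines above one Disallow keeps the opt-out in a single group, which makes it harder to add a search token to it by accident.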

Robots.txt is your stated crawl policy. It is not proof of what happened on your server. Perplexity-User and Bytespider have been reported to ignore robots.txt, so use logs as the evidence for requests. Do not claim a bot obeyed or broke a rule unless the logs show it.
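
To see what actually hit the server, count requests per bot in the access log. The sketch below assumes a combined-style log where the user agent is the last quoted field. The sample lines, IP addresses, and simplified user-agent strings are invented for illustration, and matching on a token substring is a rough heuristic, since a user agent can be spoofed.

```python
import re
from collections import Counter

# Tokens that show up as request user agents. Google-Extended and
# Applebot-Extended are left out because no crawler sends them.
LOG_TOKENS = [
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Perplexity-User",
    "Googlebot", "Applebot", "GPTBot", "ClaudeBot", "CCBot", "Bytespider",
]

# Made-up log lines with simplified user agents.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "ExampleAgent (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:10:00:05 +0000] "GET /robots.txt HTTP/1.1" 200 210 "-" "ExampleAgent (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:10:00:09 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def count_bot_hits(log_lines):
    """Count requests whose user agent contains one of LOG_TOKENS."""
    hits = Counter()
    for line in log_lines:
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        user_agent = quoted[-1].lower()  # last quoted field in combined format
        for token in LOG_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits


if __name__ == "__main__":
    for token, count in count_bot_hits(SAMPLE_LOG).items():
        print(token, count)
```

Comparing these counts with your robots.txt rules is what lets you say whether a bot requested a path you disallowed.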

Make the page readable

Permission is only the first check. The page also needs to be reachable and readable.

Open /robots.txt and look for a broad User-agent: * block that disallows key folders.
Check for bot-specific blocks above or below the wildcard group.
Make sure the page returns a normal 200 response without a login wall, country block, broken TLS, or WAF challenge.
Put the main answer, title, price, address, or product facts in the initial HTML when possible.
Keep canonical tags, noindex tags, and sitemap URLs clean so crawlers do not get mixed signals.

Only Googlebot documents JavaScript rendering in detail. Other AI crawler docs do not promise the same rendering path. Treat client-side-only content as a risk, not as a known failure for every bot. If the core copy appears only after JavaScript runs, move the main text into server-rendered HTML or static markup.
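
A rough way to test readability without a browser is to scan the raw HTML the server returns. The sketch below uses Python's standard html.parser to look for a robots noindex meta tag and to confirm that key phrases appear outside script and style tags. The scan_initial_html helper and both sample pages are hypothetical, and the check says nothing about how any particular crawler renders a page.

```python
from html.parser import HTMLParser


class InitialHtmlScan(HTMLParser):
    """Collect visible text and detect a robots noindex meta tag."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.chunks = []
        self._hidden_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.chunks.append(data)


def scan_initial_html(html, required_phrases):
    """Report a noindex flag and any required phrases missing from the raw HTML text."""
    scan = InitialHtmlScan()
    scan.feed(html)
    scan.close()
    text = " ".join(" ".join(scan.chunks).split()).lower()
    missing = [phrase for phrase in required_phrases if phrase.lower() not in text]
    return {"noindex": scan.noindex, "missing": missing}


if __name__ == "__main__":
    client_only = (
        '<html><head><meta name="robots" content="noindex, follow"></head>'
        '<body><div id="app"></div><script>render("Price: $49")</script></body></html>'
    )
    print(scan_initial_html(client_only, ["Price: $49"]))
```

A phrase that appears only inside a script tag is reported as missing, which is the client-side-only risk described above.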

Do not confuse this with email blocking

Your website can be visible to AI search while your email still lands in spam. These are separate checks.

Mailbox providers such as Gmail and Outlook judge mail by authentication, domain and IP reputation, complaint rates, list quality, content, and sending patterns. Start with DNS. SPF is a TXT record that lists hosts allowed to send for your domain. Keep one SPF record, avoid +all, and remember the SPF 10-DNS-lookup limit from RFC 7208. Many teams use ~all while they find every real sender, then move toward -all once the sender list is correct.
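
The SPF rules above can be linted from the TXT strings alone. The sketch below is a simplified check, not a full RFC 7208 evaluator: check_spf is a hypothetical helper, the records are passed in as plain strings instead of being fetched from DNS, and the lookup count covers only the top-level record, so nested includes can push the real total higher.

```python
import re

# Mechanisms that trigger a DNS lookup; the redirect modifier counts too.
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def check_spf(txt_records):
    """Return a list of problems found in a domain's TXT records (empty if none)."""
    spf = [r for r in txt_records if r.lower().split()[:1] == ["v=spf1"]]
    if len(spf) != 1:
        return [f"expected exactly one SPF record, found {len(spf)}"]

    problems = []
    lookups = 0
    for term in spf[0].split()[1:]:
        term = term.lower()
        if term.startswith("redirect="):
            lookups += 1
            continue
        name = re.split(r"[:/]", term.lstrip("+-~?"), maxsplit=1)[0]
        if name in LOOKUP_MECHANISMS:
            lookups += 1
        # A bare "all" defaults to the "+" qualifier, so it is as permissive as "+all".
        if name == "all" and term[0] not in "-~?":
            problems.append("+all lets any host pass SPF")
    if lookups > 10:
        problems.append(f"{lookups} top-level DNS lookups exceeds the limit of 10")
    return problems


if __name__ == "__main__":
    print(check_spf(["v=spf1 include:_spf.example.com ~all"]))
    print(check_spf(["v=spf1 +all"]))
```

An empty list here only means the top-level record looks sane. Confirm the full lookup count with a resolver-based tool before moving to -all.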

DKIM signs each message at the sending service with a private key and names a selector that tells receivers which public key to fetch. The public key lives in DNS, as described in RFC 6376. If the selector's DNS record is missing, stale, or not the one the sender actually signs with, receivers cannot verify the signature. DMARC, defined in RFC 7489, checks whether SPF or DKIM passes and aligns with the visible From domain. Use p=none to collect rua reports, then move to quarantine or reject after the reports show your real mail passes.
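
The DMARC rollout described above can be read straight off the record's tags. The sketch below splits a DMARC TXT value into tags and suggests the next step. parse_dmarc and dmarc_next_step are hypothetical helpers, the sample record and mailbox are made up, and the parsing is deliberately loose compared with the grammar in RFC 7489.

```python
def parse_dmarc(record):
    """Split a DMARC TXT value into a {tag: value} dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_next_step(record):
    """Suggest a next step based on the p= and rua= tags."""
    tags = parse_dmarc(record)
    if tags.get("v") != "DMARC1":
        return "Not a DMARC record: it should start with v=DMARC1."
    policy = tags.get("p", "").lower()
    if policy == "none":
        if "rua" not in tags:
            return "p=none without rua: add a rua address so reports arrive."
        return "Monitoring: review rua reports, then move to quarantine or reject."
    if policy in ("quarantine", "reject"):
        return f"Enforcing with p={policy}."
    return "Missing or unknown p= tag."


if __name__ == "__main__":
    print(dmarc_next_step("v=DMARC1; p=none; rua=mailto:dmarc@example.com"))
```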

MX records tell the world where to deliver mail for the domain. Blocklists can also hurt inbox placement, especially after a compromised account or a bad list import. If you need the email side checked, the free domain scorecard reads SPF, DKIM, DMARC, MX, and blocklist signals and shows the highest-impact fix first.

What to fix first

Fix the rule that blocks important pages before polishing small SEO details.

If AI search bots are disallowed, decide which engines you want to appear in and allow those user-agent tokens.
If training bots are blocked, leave them blocked if that matches your policy. They do not control live AI-search visibility.
If the page is client-side-only, add server-rendered copy for the core answer.
If email is failing, fix SPF, DKIM, and DMARC before chasing copy tweaks or warmup myths.
Recheck after DNS, CDN, CMS, WAF, or robots.txt changes.

For source material, use RFC 7208 for SPF, RFC 6376 for DKIM, RFC 7489 for DMARC, Google and Microsoft sender guidelines, and crawler docs from OpenAI, Anthropic, Perplexity, Google, Apple, and Common Crawl. When a vendor does not document a behavior, say that plainly.

Common questions

Does blocking GPTBot remove my site from ChatGPT search?

No. GPTBot is OpenAI's training crawler. ChatGPT search relies on OAI-SearchBot instead. If you want ChatGPT search visibility, do not disallow OAI-SearchBot.

Does Google have a special AI Overviews crawler?

No. Google AI Overviews use the normal Google Search index. Robots.txt rules for Googlebot affect Google Search surfaces, including AI Overviews.

Can robots.txt prove an AI bot stayed away?

No. Robots.txt states your crawl policy. Your server logs show requests. Some bots and fetchers have been reported to ignore robots.txt, so do not treat the file as proof of behavior.

Should I block training bots but allow search bots?

That is a common setup. Allow the search bots you want in AI answers, then disallow GPTBot, ClaudeBot, CCBot, Google-Extended, and Applebot-Extended to opt out of training or related usage.

Why mention SPF, DKIM, and DMARC on a crawler page?

Many teams check website visibility and email reach during the same domain audit. Robots.txt affects crawling. SPF, DKIM, DMARC, MX records, and blocklists affect inbox placement.