How to Get Indexed by Perplexity


Perplexity has to reach the page first

A useful page can miss every Perplexity answer if the crawler gets a 403, sees a blank app shell, or is blocked by one line in robots.txt.

Start with access. Perplexity uses PerplexityBot to surface and link websites in Perplexity search results. If you want a page to be eligible, allow PerplexityBot in robots.txt and make sure your CDN, WAF, rate limiter, and bot filter do not block its requests. Perplexity publishes IP ranges for its bots, so use those official ranges when you build allow rules.

Check /robots.txt for a rule that disallows PerplexityBot or a broad rule that catches all bots.
Return a clean 200 status for the page. Avoid login walls, country blocks, broken redirects, soft 404s, and accidental noindex.
Keep the canonical URL stable. Do not point the canonical tag at a weaker duplicate.
Link to the page from your site and include it in a sitemap, so crawlers can discover it without guessing.
Run a real fetch from outside your network. A browser view from your desk does not prove a crawler can read the page.
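
The robots.txt check in the list above can be sketched offline with Python's standard `urllib.robotparser`. This is a minimal sketch, not a full audit: the `is_allowed` helper and the `SAMPLE_ROBOTS` text are illustrative assumptions, and Python's parser may resolve edge cases differently from any given crawler.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical policy: block every bot by default, then allow PerplexityBot.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /

User-agent: PerplexityBot
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.com/guide"  # placeholder URL
    for agent in ("PerplexityBot", "SomeOtherBot"):
        print(agent, is_allowed(SAMPLE_ROBOTS, agent, page))
```

Paste your own robots.txt text in place of the sample to see whether a broad `User-agent: *` rule catches PerplexityBot. It only tests the stated policy, so it does not replace a real fetch from outside your network.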

If you want a quick pass on AI crawler access, use the free AI visibility checker. It helps when robots.txt, headers, and CDN rules disagree.

Allow the right AI search crawlers

Training bots and answer bots are different. If you block the search bot, that engine loses the normal path to read and cite your page.

The crawlers that decide whether you appear in AI answers are OAI-SearchBot for ChatGPT search, Claude-SearchBot for Claude, PerplexityBot for Perplexity, Googlebot for Google AI Overviews, and Applebot for Apple Intelligence. Disallowing one of them in robots.txt tells that engine not to read your pages for its answers. Google AI Overviews ride the normal Google Search index, so there is no separate crawler to opt out of them.

By contrast, GPTBot, ClaudeBot, CCBot, Google-Extended, and Applebot-Extended are training or opt-out controls. Blocking them does not affect live AI-search visibility. Google-Extended and Applebot-Extended are robots-only control tokens with no separate crawl user-agent.

Keep the file plain. A short allow policy for answer visibility is easier to audit than a copied block list. If you block training, keep that in its own section and review vendor docs before changing search crawler rules.

Allow PerplexityBot for Perplexity citations.
Allow OAI-SearchBot if you want ChatGPT search visibility.
Allow Claude-SearchBot if you want Claude search visibility.
Allow Googlebot if you want Google Search and AI Overview eligibility.
Allow Applebot if you want Apple search and Apple Intelligence visibility.
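
One way to keep the file plain is to generate it from two short lists. The sketch below is an assumption about how you might lay it out, not a vendor-recommended format: `build_robots_txt` is a hypothetical helper, the sitemap URL is a placeholder, and the crawler names are the ones listed in this guide.

```python
# Names taken from this guide; check vendor docs before relying on them.
SEARCH_CRAWLERS = [
    "PerplexityBot",
    "OAI-SearchBot",
    "Claude-SearchBot",
    "Googlebot",
    "Applebot",
]
TRAINING_TOKENS = [
    "GPTBot",
    "ClaudeBot",
    "CCBot",
    "Google-Extended",
    "Applebot-Extended",
]


def build_robots_txt(allowed, blocked=(), sitemap_url=None):
    """Build robots.txt text with search and training rules in separate sections."""
    lines = ["# Search and answer crawlers"]
    for agent in allowed:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    if blocked:
        lines.append("# Training controls (optional, kept separate)")
        for agent in blocked:
            lines += [f"User-agent: {agent}", "Disallow: /", ""]
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).strip() + "\n"


if __name__ == "__main__":
    print(build_robots_txt(SEARCH_CRAWLERS, TRAINING_TOKENS,
                           "https://example.com/sitemap.xml"))
```

An explicit `Allow: /` group changes nothing for a crawler that was never blocked, but it makes the intent easy to audit. Leave `blocked` empty if you do not want to block training.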

Robots.txt is a site's stated policy, not proof of behavior. Perplexity-User and Bytespider are reported to ignore it, so do not use robots.txt alone to claim what a bot did. Use server logs, verified IP ranges, and response codes when you need evidence.
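
To turn logs into evidence, you can match the user agent and then confirm the source IP against published ranges. The sketch below assumes a combined-format access log. The `ASSUMED_RANGES` value is a documentation placeholder (192.0.2.0/24), not a real Perplexity range, and `classify` and the sample log lines are illustrative. Replace the ranges with the ones Perplexity publishes.

```python
import ipaddress
import re

# Placeholder CIDR from the documentation address space. NOT a real
# Perplexity range: swap in the officially published ranges.
ASSUMED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]

# Combined log format: ip, ident, user, [time], "request", status, bytes,
# "referer", "user agent".
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def classify(line, ranges=ASSUMED_RANGES, token="PerplexityBot"):
    """Return path, status, and IP verification for a matching log line, else None."""
    match = LOG_PATTERN.match(line)
    if match is None or token.lower() not in match["agent"].lower():
        return None
    try:
        ip = ipaddress.ip_address(match["ip"])
    except ValueError:
        return None
    return {
        "path": match["path"],
        "status": int(match["status"]),
        "verified": any(ip in network for network in ranges),
    }


# Illustrative log lines; the user agent string is a simplified sample.
SAMPLE_OK = ('192.0.2.10 - - [10/Oct/2025:13:55:36 +0000] "GET /guide HTTP/1.1" '
             '200 5120 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"')
SAMPLE_SPOOF = ('203.0.113.5 - - [10/Oct/2025:13:56:01 +0000] "GET /guide HTTP/1.1" '
                '403 310 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"')

if __name__ == "__main__":
    for sample in (SAMPLE_OK, SAMPLE_SPOOF):
        print(classify(sample))
```

A request that claims the PerplexityBot user agent but comes from outside the published ranges shows up with `verified` set to False, which is the case you should not count as a real crawl.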

Make the page easy to extract

Perplexity can only cite a page if the answer is in text it can read, split, and quote with confidence.

Write the answer on the page, not inside an image, PDF, hidden widget, or script that fills the page after a click. Of the crawlers above, Googlebot is the one that documents JavaScript rendering. For the other AI crawlers, client-side-only content is an undocumented risk. Server-render the main answer when you can, and make sure the raw HTML includes the facts a crawler needs.

Answer the target question near the top in plain language.
Use specific headings, short sections, and named steps.
Define terms once. Do not assume the answer engine knows your product names or acronyms.
Show dates for time-sensitive claims and update the page when vendor rules change.
Match structured data to visible text. Do not add schema that says more than the page says.
Remove thin boilerplate above the answer. Crawlers and readers both need the useful part fast.
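
A rough way to test whether the answer lives in the raw HTML is to strip scripts and styles and search what is left. The sketch below uses Python's standard `html.parser`. The `raw_html_contains` helper and both sample pages are illustrative assumptions, and it says nothing about how any specific crawler extracts text.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is present in the HTML without running JavaScript."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def raw_html_contains(html: str, phrase: str) -> bool:
    """Return True if phrase appears in the page text outside scripts and styles."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return phrase.lower() in text.lower()


# Hypothetical pages: one server-rendered, one client-side app shell.
SERVER_RENDERED = ("<html><body><h1>How to get indexed</h1>"
                   "<p>Allow PerplexityBot in robots.txt.</p></body></html>")
APP_SHELL = ('<html><body><div id="root"></div>'
             '<script>render("Allow PerplexityBot in robots.txt.")</script>'
             "</body></html>")

if __name__ == "__main__":
    print(raw_html_contains(SERVER_RENDERED, "allow perplexitybot"))
    print(raw_html_contains(APP_SHELL, "allow perplexitybot"))
```

Feed it the body of a plain HTTP fetch of your page and a sentence from your main answer. If the phrase is only found inside a script, the page depends on client-side rendering.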

For this query, the useful answer is direct: allow PerplexityBot, remove crawl and WAF blocks, publish server-readable text, link the page internally, and make the content worth citing. Perplexity may still choose another source. Indexing and citation are never guaranteed.

Keep email trust clean too

Perplexity indexing and email delivery are separate systems, but the same domain often carries your site, sales mail, alerts, and password resets.

Mailbox providers such as Gmail and Outlook route mail to spam when authentication fails, the sender has weak reputation, recipients complain, links look risky, volume spikes fast, or the message resembles abuse. The core DNS checks are SPF, DKIM, and DMARC.

SPF is a TXT record that lists which servers may send mail for a domain. Keep one SPF record, stay within the limit of 10 DNS lookups, use ~all while tuning, and move toward -all only when every sender is covered.
DKIM signs mail with a private key. Receivers verify it with the public key published under a selector such as selector._domainkey.example.com. Each real sending service should sign with a domain that aligns with your visible From domain.
DMARC checks whether mail passes SPF or DKIM with a domain that aligns with the visible From domain. Start with p=none and rua reports, then move to p=quarantine and p=reject after legitimate senders pass.
MX records route inbound mail. Bad or stale MX records can break replies, password resets, and report delivery.
Blocklists and reputation systems reflect recent sending behavior. Authentication helps, but it does not erase spam complaints or bad list hygiene.
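
Two of the checks above can be sketched on record text alone, without a DNS query. This is a simplified sketch: `count_spf_lookups` counts only the lookup-causing terms in the top-level record (include, a, mx, ptr, exists, and redirect) and does not follow nested includes, so the real total can be higher. Both function names and the sample records are illustrative.

```python
LOOKUP_MECHANISMS = ("include", "a", "mx", "ptr", "exists")


def count_spf_lookups(record: str) -> int:
    """Count top-level SPF terms that trigger a DNS lookup (nested includes not followed)."""
    count = 0
    for term in record.split()[1:]:  # skip the leading "v=spf1"
        term = term.lstrip("+-~?").lower()
        if term.startswith("redirect="):
            count += 1
            continue
        name = term.split(":", 1)[0].split("/", 1)[0]
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count


def parse_dmarc(record: str) -> dict:
    """Split a DMARC TXT record into a tag dictionary such as {'p': 'none'}."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


# Hypothetical records for example.com.
SAMPLE_SPF = "v=spf1 include:_spf.example.com a mx ip4:192.0.2.0/24 ~all"
SAMPLE_DMARC = "v=DMARC1; p=none; rua=mailto:dmarc@example.com"

if __name__ == "__main__":
    print(count_spf_lookups(SAMPLE_SPF))
    print(parse_dmarc(SAMPLE_DMARC).get("p"))
```

Use it as a first pass on the record you publish. A full check still needs to resolve each include and confirm that real mail passes with an aligned domain.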

Use the free domain scorecard to check SPF, DKIM, DMARC, MX, and blocklist basics. If you have DMARC aggregate reports and need to see which services are passing or failing, read them with the free DMARC report reader.

Publish the page like a source

A page worth citing gives an answer engine a clear reason to trust one passage over another.

Use names, numbers, and sources where they matter. Say which crawler controls live AI answers and which controls training. Say what you checked. If a vendor changes a bot name or policy, update the page and the date. Avoid claims that sound wider than your evidence, such as saying a bot obeyed or ignored a rule because robots.txt said so.

Keep one primary URL for the guide.
Add internal links from related guides, especially from your main article hub.
Use descriptive anchor text, such as AI visibility checker, robots.txt audit, or DMARC report reader.
Make the page fast and readable without cookie banners covering the first screen.
Watch logs after publishing for PerplexityBot fetches, status codes, and blocked requests.

The practical test is simple. If a crawler can fetch the page, read the main answer in HTML, understand the date, and follow internal links to related proof, you have done the work that makes Perplexity indexing possible.

Official references

Use vendor and standards docs for crawler and mail-auth changes, because bot names and sender rules change. Start with Perplexity crawler docs, OpenAI crawler docs, Anthropic crawler docs, Google AI features in Search, Applebot docs, and Common Crawl CCBot docs. For email authentication, use RFC 7208 for SPF, RFC 6376 for DKIM, RFC 7489 for DMARC, Google sender guidelines, and Microsoft Outlook sender requirements.

FAQ

How do I get Perplexity to index my site?

Allow PerplexityBot in robots.txt, allow its official IP ranges through your WAF, return a normal 200 response, avoid noindex, publish text in the HTML, and link the page from your site. Then give Perplexity time to crawl. There is no submit button that guarantees a citation.

Should I block GPTBot if I want Perplexity citations?

GPTBot is an OpenAI training crawler. Blocking GPTBot does not control Perplexity. For Perplexity visibility, focus on PerplexityBot. For ChatGPT search visibility, focus on OAI-SearchBot.

Can I block AI training and still appear in AI search?

Often, yes. Keep live search crawlers allowed, such as PerplexityBot, OAI-SearchBot, Claude-SearchBot, Googlebot, and Applebot. Use separate training controls such as GPTBot, ClaudeBot, CCBot, Google-Extended, and Applebot-Extended according to each vendor's docs.

Does robots.txt prove that Perplexity obeyed my rule?

No. Robots.txt states your site policy. It is not a behavior log. Use server logs, verified IP ranges, user agents, and response codes when you need to know what happened.

Do SPF, DKIM, and DMARC affect Perplexity indexing?

They do not directly decide whether Perplexity indexes a web page. They do affect whether mail from your domain reaches inboxes. Gmail, Outlook, and other mailbox providers use authentication, alignment, reputation, complaints, and content signals when they decide inbox or spam placement.