robots.txt
Definition
robots.txt is a plain text file at the root of a website (yourdomain.com/robots.txt) that instructs web crawlers which pages they can and cannot access. For AI search visibility, robots.txt configuration is arguably the single most impactful technical factor: blocking an AI crawler makes your entire site invisible to that AI engine.
The file uses a simple syntax: User-agent directives specify which crawler the rules apply to, followed by Allow or Disallow rules for specific URL paths. A blanket block (`User-agent: GPTBot` followed by `Disallow: /`) prevents the named crawler from accessing any page on the site.
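As a sketch, a robots.txt using that syntax to blanket-block OpenAI's crawler while leaving the site open to everyone else would look like this:

```
# Deny GPTBot access to every path on the site
User-agent: GPTBot
Disallow: /

# All other crawlers may access everything
User-agent: *
Allow: /
```

Note that an empty `Disallow:` line (no path) also means "allow everything" — the trailing `/` is what makes the block total.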
The AI crawlers you need to know about: GPTBot (OpenAI, general knowledge indexing), OAI-SearchBot (ChatGPT Search real-time browsing), ChatGPT-User (ChatGPT browsing for individual users), PerplexityBot (Perplexity's real-time search), ClaudeBot (Anthropic's Claude), Google-Extended (controls Gemini training data, separate from Googlebot), and Bingbot (used by both Bing and indirectly by Perplexity).
During 2023-2024, many publishers reflexively blocked AI crawlers due to concerns about training data scraping. This created a widespread problem: sites ranking well on Google but invisible to AI search engines. Checking robots.txt takes 30 seconds (visit yourdomain.com/robots.txt), and removing unnecessary blocks takes under 5 minutes.
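That 30-second check can also be scripted. The sketch below uses Python's standard-library `urllib.robotparser` to parse a robots.txt body and report which AI crawlers it blocks; the sample `ROBOTS_TXT` string and the `blocked_crawlers` helper are illustrative, not a real site's file — in practice you would fetch yourdomain.com/robots.txt first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice, fetch yourdomain.com/robots.txt
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# The AI user-agents discussed above
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
               "PerplexityBot", "ClaudeBot", "Google-Extended"]

def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawler names denied access to the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, path)]

print(blocked_crawlers(ROBOTS_TXT))  # the sample file blocks only GPTBot
```

Any name this prints is an AI engine your site is currently invisible to.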
Best practice for most sites: allow all AI crawlers access to public content pages. If you have concerns about specific content being used for training, block Google-Extended (training only) while keeping GPTBot (citation) and PerplexityBot (real-time search) allowed. This preserves AI search visibility while limiting training data usage.
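A minimal robots.txt implementing that split — training-only crawler blocked, citation and real-time search crawlers allowed — might look like this (a sketch; adjust paths to your site's structure):

```
# Block Gemini training-data collection only (does not affect Googlebot)
User-agent: Google-Extended
Disallow: /

# Keep citation and real-time search crawlers allowed
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Everyone else: full access to public content
User-agent: *
Allow: /
```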
Check Your AI Search Visibility
TurboAudit audits 250+ signals across 7 dimensions — including robots.txt — in about 2 minutes. Free to start.
Get Started Free