How AI bots really crawl your site (what we learned in May 2026)

Six different AI bots crawled our site in May. They do not behave like Google. Here is exactly which page types each one picks and what that means for your AI visibility.
1. Methodology and the caveat we owe you
We pulled IIS basic logs and Cloudflare logs from trendos.io for a few days in mid-May 2026. We filtered out attacker probes (anything hitting .env, wp-*, xmlrpc, phpunit, and similar paths) and our own office requests. What was left: 6 named AI/LLM platforms, plus a handful of suspiciously thorough "human" visitors.
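The filtering step above can be sketched roughly as follows. This is a minimal illustration, not our production pipeline: it assumes each log line has already been parsed into a dict with `ip`, `path`, and `user_agent` keys, and the user-agent substrings in `AI_BOT_TOKENS` are assumptions you should verify against each operator's published documentation.

```python
import re

# Probe paths named in the methodology: .env, wp-*, xmlrpc, phpunit.
PROBE_PATTERN = re.compile(r"(\.env|/wp-|xmlrpc|phpunit)", re.IGNORECASE)

# Assumed user-agent substrings mapped to the labels used in this post.
# Check each operator's documentation before relying on these.
AI_BOT_TOKENS = {
    "ChatGPT-User": "ChatGPT-User",
    "GPTBot": "OpenAI GPTBot",
    "meta-external": "Meta AI",
    "Applebot": "Applebot",
    "Perplexity": "Perplexity",
    "Amazonbot": "Amazon Bot",
}


def classify(record, office_ips):
    """Return a bot label, or None if the record should be dropped."""
    if record["ip"] in office_ips:
        return None  # our own office traffic
    if PROBE_PATTERN.search(record["path"]):
        return None  # attacker probe
    user_agent = record["user_agent"].lower()
    for token, label in AI_BOT_TOKENS.items():
        if token.lower() in user_agent:
            return label
    return None  # not a named AI bot
```

Substring matching on the user agent is easy to spoof, so a stricter version would also check the source IP against the ranges each operator publishes.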
This is a few days of data from one B2B SaaS site. We are not claiming statistical significance. We are claiming this is the most honest look at AI bot behavior most teams will get before they have to make decisions about robots.txt, content priorities, and AI visibility tooling. Treat it as a directional read, not a benchmark.
To keep this useful without handing competitors a recipe, we share ratios and percentages, not raw traffic. The pattern is what matters. We will run this monthly. If you want the longitudinal version for your own brand, that is what our AI Visibility tracker is for.
2. The 6 bots and what they fetched

| Bot | Share of AI bot traffic | Behavior pattern | Page categories fetched |
|---|---|---|---|
| ChatGPT-User (OpenAI live answer) | ~46% | High-volume live-answer fetch on user demand | Homepage, dated listicle, category landing, feature pages, product vertical |
| Meta AI | ~37% | Live-answer fetch, also grabs visual assets | Homepage, feature pages, static logo asset |
| OpenAI GPTBot (training) | ~6% | Sparse, scheduled, narrow page set | Feature pages only |
| Applebot | ~6% | Polite, robots.txt-first | A small canonical set |
| Perplexity | ~3% | Sparse, live-answer fetch | One feature page, one vertical page |
| Amazon Bot | ~3% | Sparse index pulls | Canonical pages |

Two patterns jump out. First, the two live-answer crawlers from OpenAI and Meta together took roughly four out of every five AI bot hits on our site. If your brand is not showing up in those two surfaces today, you are missing the majority of the AI referral funnel. Second, the page sets these bots pulled were almost completely different from each other. There was almost no overlap. Each bot is composing its own answer from its own slice of your site.
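One way to put a number on "almost no overlap" is a pairwise Jaccard score over the set of URLs each bot fetched. The sketch below assumes you already have a mapping from bot name to a set of fetched paths; the function names are ours, for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """URLs fetched by both bots, as a fraction of URLs fetched by either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(pages_by_bot):
    """Map each pair of bot names (sorted) to its Jaccard overlap score."""
    return {
        (x, y): jaccard(pages_by_bot[x], pages_by_bot[y])
        for x, y in combinations(sorted(pages_by_bot), 2)
    }
```

A score of 0.0 means two bots shared no URLs at all, and 1.0 means they fetched exactly the same set.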
A small detail with a big implication: in this sample, ChatGPT-User fetched roughly 16x more pages than Perplexity, and the top engine pulled around 6x more pages than the bottom-quartile engines. AI bot traffic is not evenly distributed. Optimizing for ChatGPT first is not a stylistic choice. It is where the volume is.
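If you want the same ratios for your own logs, the arithmetic is short. This sketch assumes a dict of hit counts per bot. We are not publishing our raw counts, so any numbers you feed it are your own.

```python
def traffic_shares(hits_by_bot):
    """Convert raw hit counts into each bot's fraction of total AI bot traffic."""
    total = sum(hits_by_bot.values())
    if total == 0:
        return {}
    return {bot: hits / total for bot, hits in hits_by_bot.items()}


def volume_ratio(hits_by_bot, bot, baseline):
    """How many times more hits `bot` made than `baseline`."""
    return hits_by_bot[bot] / hits_by_bot[baseline]
```

Reporting shares and ratios rather than counts is what lets us describe the pattern without exposing raw traffic.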
3. Indexing crawlers vs answer crawlers (the split that matters)
Group the 6 bots into two columns and the strategy gets clearer.
Indexing and training crawlers
GPTBot, Applebot, Amazon Bot. These show up on a schedule, hit a few canonical URLs, and leave. They are filling a corpus that gets used later when a user asks a question. They do not care about urgency.
Live-answer crawlers
ChatGPT-User, Meta AI, Perplexity. These hit the site because a real human just asked a question in the chat window. The model decided your URL was worth a fresh read before composing the answer. This is the high-intent traffic. Every fetch here represents a real user, in a real session, asking about your topic, right now.
Indexing crawlers govern whether you exist in the model's memory. Answer crawlers govern whether you show up at decision time. Both matter. The work to win each is different.
The split is uneven. In our sample, live-answer crawlers accounted for roughly 86% of total AI bot traffic. Indexing crawlers were the remaining 14%. If you only had budget to optimize for one, optimize for the answer crawlers. Indexing is a slower lever, and you cannot control what the model remembers. Answer crawlers reward you the same week you ship the change.
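The grouping can be reproduced from the table in section 2. The sketch below uses the rounded percentages from that table, which sum to 101, so the group totals are approximate.

```python
# Rounded shares (percent) copied from the table in section 2.
# They sum to 101 because each row was rounded independently.
SHARE_PCT = {
    "ChatGPT-User": 46,
    "Meta AI": 37,
    "OpenAI GPTBot": 6,
    "Applebot": 6,
    "Perplexity": 3,
    "Amazon Bot": 3,
}

# Grouping as described in section 3.
LIVE_ANSWER = {"ChatGPT-User", "Meta AI", "Perplexity"}


def split_by_kind(share_pct, live_answer):
    """Return (live-answer total, indexing/training total) in percent."""
    live = sum(v for bot, v in share_pct.items() if bot in live_answer)
    indexing = sum(v for bot, v in share_pct.items() if bot not in live_answer)
    return live, indexing


live, indexing = split_by_kind(SHARE_PCT, LIVE_ANSWER)
print(live, indexing)  # 86 15
```

The rounded inputs give 86 and 15 rather than 86 and 14. A one-point gap like that is consistent with rounding in the table.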
4. Which page types AI bots prefer (and which they completely ignore)
We sorted every fetched URL by directory and matched the pattern.
What AI bots pulled across the window:
- The homepage (/)
- Feature pages with proper-noun slugs (/features/chatgpt-visibility-tracker, /features/gemini-visibility-tracker, /features/real-time-competitor-alerts, /features/ad-intelligence, /features/update-frequency)
- A dated listicle blog post (/blog/ecommerce-competitive-intelligence-tools-2026)
- A category-keyword landing page (/landing/ecommerce-competitive-intelligence)
- A comparison-style URL at the root (/ecommerce-competitive-intelligence)
- A vertical page (/d2c-competitors)
- Static brand assets (/static/logo.webp, grabbed by Meta AI for visual answer cards)
What no AI bot touched, across all 6 platforms, in the entire window:
- Index pages (e.g., /personas/)
- Any /solutions/* page
- Static contact and legal pages (e.g., /contact, /privacy)
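Sorting fetched URLs by directory, as described above, could look something like this. It is a sketch under simple assumptions: the input is a list of request paths or URLs, and the bucket label `(root-level page)` is our own naming for single-segment paths.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_level_section(url):
    """Bucket a URL by its first path segment."""
    path = urlsplit(url).path or "/"
    segments = [s for s in path.split("/") if s]
    if not segments:
        return "/"  # the homepage
    if len(segments) == 1 and not path.endswith("/"):
        return "(root-level page)"  # e.g. /d2c-competitors
    return f"/{segments[0]}/"


def hits_by_section(urls):
    """Count fetched URLs per top-level section."""
    return Counter(top_level_section(u) for u in urls)
```

Because `Counter` returns 0 for missing keys, a section no bot touched (such as `/solutions/` in our window) shows up as a zero rather than an error when you look it up.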
This is the finding we would have lost money betting against. We spent serious time building out persona pages and solution pages. They are well written. They are linked from the nav. Not one AI bot fetched any of them. Meanwhile, every feature page with a competitor or product name in the slug got pulled.
The lesson is uncomfortable but clean. AI models pull pages that look like answers to questions a user just typed. "How does Trendos track ChatGPT visibility?" maps to /features/chatgpt-visibility-tracker. "Solutions for marketers" does not map to anything a real human types into a chat window. Persona and solution pages are sales artifacts. AI bots are not your sales team.
5. The Applebot anomaly
One pattern was unique enough to call out. Applebot was the only AI bot in the dataset that checked /robots.txt before crawling. The other 5 went straight to content.
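A simple way to check this in your own logs is to look at each bot's first request in the window. The sketch below assumes a chronologically ordered list of `(bot, path)` pairs, which is our own simplification of the log format.

```python
def robots_first_bots(requests):
    """Return the bots whose first request in the window was /robots.txt.

    `requests` is an iterable of (bot, path) pairs in chronological order.
    """
    first_path = {}
    for bot, path in requests:
        first_path.setdefault(bot, path)  # keep only the earliest path per bot
    return {bot for bot, path in first_path.items() if path == "/robots.txt"}
```

One limit of this check: a crawler that fetched and cached robots.txt before the log window started will look like it skipped the file, so a short window can undercount.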
This is consistent with Apple's broader posture (slow, deferential, careful about legal exposure), but it has a practical implication. If you want to control how a specific AI sees your site through robots.txt, Applebot is the only one in this sample that will read your instructions. GPTBot is supposed to. In this window, it did not. Meta AI is not supposed to and did not.
If your AI strategy depends on robots.txt, you are writing notes to one bot out of six. Plan accordingly.
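For the bots that do read it, you can at least confirm what your robots.txt says to each user agent. This sketch uses Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT` are hypothetical and purely illustrative, not a recommendation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
ROBOTS_TXT = """\
User-agent: Applebot
Disallow: /landing/

User-agent: *
Allow: /
"""


def allowed(robots_txt, user_agent, path):
    """Return True if the given rules permit `user_agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

This only tells you what the file instructs. Whether a given bot fetches the file and follows it is a separate question, which is the point of this section.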
6. What this means for your AI visibility strategy
Three plays we would bet money on, even off this small a sample.
Play 1: Name the thing in your URL
Feature pages with specific, named slugs outperform abstract category pages. /features/chatgpt-visibility-tracker got pulled. /solutions/marketing did not. If your roadmap calls for a page about Shopify pricing intelligence, the URL is /features/shopify-pricing-intelligence, not /solutions/retail. Make the slug match the question a real user types.
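If you generate URLs from page titles, a small helper keeps the named thing in the slug. This is a minimal sketch. The `/features/` prefix follows the examples above, and the function names are ours.

```python
import re


def slugify(name):
    """Lowercase the name and join its alphanumeric runs with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


def feature_url(name):
    """Build a feature-page path from a human-readable feature name."""
    return f"/features/{slugify(name)}"


print(feature_url("Shopify pricing intelligence"))
# /features/shopify-pricing-intelligence
```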
Play 2: Date your listicles
Putting 2026 in your listicle URL is not gimmicky. It tells the model the content is current. A live-answer crawler fielding "best X tools right now" is choosing between a dozen candidate URLs. The one with this year in the slug wins the tiebreaker. Roll the year over with a 301 when it changes.
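The yearly rollover can be prepared as a simple old-path to new-path mapping that your server or CDN then serves as a 301. The sketch below assumes the year is the last hyphen-separated token in the slug, as in our listicle URL. How you register the redirect depends on your stack and is not shown.

```python
import re


def rollover_redirect(path, new_year):
    """Return (old_path, new_path) for a 301, or None if nothing changes."""
    match = re.search(r"-(20\d{2})$", path)
    if not match or int(match.group(1)) == new_year:
        return None  # no trailing year, or already current
    return path, f"{path[:match.start()]}-{new_year}"
```

For example, `rollover_redirect("/blog/ecommerce-competitive-intelligence-tools-2026", 2027)` returns the 2026 path paired with the same slug ending in `-2027`.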
Play 3: Treat your homepage as an answer source
Both top live-answer crawlers in our sample fetched the root. The first 500 words of your homepage should describe what your product does, in plain sentences a model can quote back to a user without editing. Mission statements and hero taglines do not survive that round trip. Direct product descriptions do.
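To see what those first 500 words actually are, you can strip the markup from your homepage HTML and read the result. This sketch uses only the standard library and works on the raw HTML you pass in. It does not execute JavaScript, so client-rendered text will not appear. It also makes no claim about how any particular bot parses a page.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def first_words(html, limit=500):
    """Return the first `limit` words of text found in `html`."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).split()
    return " ".join(words[:limit])
```

If the output opens with navigation labels and a tagline rather than a plain description of the product, that is the part to rewrite.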
A fourth one we are less certain about but want to flag: there is almost zero overlap between which URLs each AI bot pulled. Winning ChatGPT and winning Perplexity are not the same project. You may need different content for each surface. We will report back when we have more data.
7. Want to see this for your own site?
We built this report because we built the tooling for ourselves. Customers kept asking for the same picture on their domain. So yes. The Trendos AI Visibility tracker watches which AI bots hit which pages on your site, names the gaps against your competitors, and gives you the next 3 pages to ship to close them.
The takeaway
AI bots are not vacuum cleaners. They are precision tools that pull a small set of pages per session, with sharply different preferences across operators. Optimize the pages that look like answers, drop the ones that look like sales artifacts, and accept that ChatGPT-first is where the AI volume lives.
Want a free read on your own site?
Grab 10 minutes with us at calendly.com/meet-trendos. We will pull your last 30 days of AI bot traffic before the call: your data, your URL list, your gaps against your named competitors. No deck. No "discovery questions". If it is not useful, the coffee was free.