
llms.txt: The Complete Guide to Creating, Testing & Deploying Your AI Sitemap

Complete guide to llms.txt — the new standard for telling AI engines what to index. Create, test, and deploy your llms.txt file with templates and CMS guides.

Jean-Jacques Pierre · 16 min read
Published: March 6, 2026 · Last updated: March 6, 2026

robots.txt tells Google where NOT to go. llms.txt tells AI engines where TO go. Big difference.

For over 30 years, webmasters have used robots.txt to manage crawler access. Then came sitemap.xml to help search engines discover pages. But neither of these files was designed for the new wave of AI-powered search engines — the ones that don't just index your pages, but actually read and understand them.

ChatGPT, Perplexity, Gemini, Claude, Grok — these engines process content differently from Googlebot. They don't care about your canonical tags or your XML namespace declarations. What they need is a clean, structured summary of what your site is about, what your most important pages are, and how they relate to each other. That is exactly what llms.txt provides.

The concept was proposed in September 2024 by Jeremy Howard, co-founder of fast.ai and one of the most influential figures in practical AI. His premise was simple: if we have a standard way to tell traditional crawlers what to do (robots.txt), we should have a standard way to tell AI engines what to understand. The result is a plain Markdown file, hosted at your site root, that gives language models a structured overview of your entire website in a format they can consume in a single pass.

This guide covers everything you need to know about llms.txt: what it is, how to create one, how to deploy it on every major platform, how to test it, and the mistakes that will make your file useless. By the end, you will have a production-ready llms.txt file for your site.

What Is llms.txt?

llms.txt is a plain-text Markdown file placed at the root of your website (https://yoursite.com/llms.txt) that provides a structured summary of your site for large language models and AI crawlers. It was created by Jeremy Howard in 2024 as a response to a growing problem: AI engines were struggling to understand websites that were optimized for traditional search but not for machine comprehension.

The idea behind llms.txt is simple: provide a curated, machine-friendly overview of a website's most important content in a format that language models can parse efficiently in a single context window. Unlike sitemaps that list every URL, llms.txt prioritizes comprehension over completeness.

The file uses standard Markdown syntax — headings, bullet points, links, and short descriptions — so that any LLM can parse it without special tooling. There is no XML, no JSON, no custom schema. Just clean, structured text that both humans and machines can read.

The purpose of llms.txt is not to replace your sitemap or robots file. It serves a different function entirely. While robots.txt controls access and sitemap.xml aids discovery, llms.txt enables comprehension. It answers the question that AI engines are always trying to resolve: "What is this website actually about, and which pages matter most?"

Think of it this way. Your website might have 500 pages, but only 20 of them truly define what your business does. A sitemap lists all 500. Your llms.txt highlights the 20 that matter, with context about what each one contains and why it's important. This is incredibly valuable for an LLM that has limited context window space and needs to quickly determine whether your site is a credible source on a given topic.

Adoption is accelerating. Companies like Anthropic, Cloudflare, Stripe, and dozens of major publishers have already deployed llms.txt files. AI search engines are increasingly checking for this file when crawling a new domain. While it is not yet an official IETF or W3C standard, it is following the same path as robots.txt — widespread community adoption first, formal standardization later.

llms.txt vs robots.txt vs sitemap.xml

These three files serve complementary roles in how search engines and AI models interact with your website. Understanding the difference is key to a complete technical SEO strategy in 2026.

| Feature | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Purpose | Control crawler access | Help engines discover pages | Help AI engines understand your site |
| Primary audience | Traditional crawlers (Googlebot, Bingbot) | Traditional crawlers | AI engines (GPTBot, PerplexityBot, ClaudeBot) |
| Format | Plain text (custom syntax) | XML | Markdown |
| Required? | Strongly recommended | Strongly recommended | Not required (but increasingly expected) |
| What it tells bots | "Do not crawl these paths" | "Here are all my pages and when they changed" | "Here is what my site is about and which pages matter most" |
| Includes descriptions? | No | No | Yes — context for every listed page |

The key takeaway: you need all three. robots.txt handles access control, sitemap.xml handles discovery, and llms.txt handles comprehension. Skipping any one of them leaves a gap in how search engines — traditional or AI-powered — interact with your site.

Anatomy of a Perfect llms.txt File

A well-structured llms.txt file follows a consistent Markdown format that AI models can parse predictably. Below is a complete, annotated example that you can use as a starting template for any website.

# Acme Corporation

> Acme Corporation is a B2B SaaS company that provides project management
> software for remote teams. Founded in 2019, we serve over 10,000 companies
> across 45 countries. Our platform combines task management, time tracking,
> and team communication in a single workspace.

## About

- [About Us](https://acme.com/about): Our mission, founding story, and the
  team behind the product. Acme was founded to solve the communication
  breakdown that remote teams face when using fragmented tool stacks.
- [Careers](https://acme.com/careers): Open positions across engineering,
  design, and go-to-market. We are a remote-first company with 120 employees.

## Core Pages

- [Product Overview](https://acme.com/product): Complete feature breakdown
  of the Acme platform including task boards, Gantt charts, time tracking,
  team chat, and file sharing.
- [Pricing](https://acme.com/pricing): Three plans — Starter ($12/user/mo),
  Team ($24/user/mo), and Enterprise (custom). All plans include a 14-day
  free trial.
- [Integrations](https://acme.com/integrations): 80+ integrations including
  Slack, GitHub, Jira, Google Workspace, and Salesforce.
- [Security](https://acme.com/security): SOC 2 Type II certified. Details
  on encryption, data residency, SSO, and compliance certifications.

## Key Content

- [Remote Team Playbook](https://acme.com/blog/remote-team-playbook): A
  comprehensive guide to managing distributed teams, covering async
  communication, meeting cadence, and performance tracking.
- [Project Management Guide](https://acme.com/blog/pm-guide): How to set
  up effective project workflows using Agile, Scrum, or Kanban methodology.
- [Case Studies](https://acme.com/case-studies): Real results from
  customers including how Stripe reduced meeting time by 40% using Acme.

## Contact

- Email: hello@acme.com
- Twitter: https://twitter.com/acmecorp
- LinkedIn: https://linkedin.com/company/acmecorp
- Support: https://acme.com/support

Let's break down each section and why it matters.

The heading (# Acme Corporation) — This is your brand name. AI models use this as the primary identifier for your entity. It should match your Organization schema and your Google Business Profile name exactly. Consistency across all machine-readable signals is critical for entity recognition.

The description block (> blockquote) — This is the most important paragraph in your entire file. It is the first thing an AI model reads, and it often becomes the basis for how that model describes your business in generated answers. Write it as if it were the answer to "What is [Your Company]?" Include your industry, founding year, scale (customers, employees, revenue if public), and your core value proposition. Keep it to 2-4 sentences.

The ## About section — Links to pages that establish your identity and credibility. AI models use these to build a trust profile. Include your about page, team page, and careers page if applicable. The descriptions after each link should explain what the page contains, not just repeat the page title.

The ## Core Pages section — Your most commercially important pages. Product pages, pricing, features, integrations — the pages that define what you sell and how much it costs. AI engines reference these when users ask comparison or purchasing questions. Include specific details (number of integrations, plan prices, certifications) directly in the descriptions.

The ## Key Content section — Your best, most authoritative content pieces. Blog posts, guides, case studies, whitepapers — the content that establishes you as an expert in your space. AI models use these to determine whether to cite you as a source on a given topic. Prioritize quality over quantity: list 5-10 of your absolute best pieces, not your entire blog archive.

The ## Contact section — Trust signals. Email, social profiles, support page. AI models use this to verify that your site represents a real, contactable business. It also helps models provide accurate contact information when users ask how to reach you.

How to Create Your llms.txt (Step by Step)

Creating an effective llms.txt file is not about dumping your sitemap into Markdown. It requires strategic thinking about which pages matter most and how to describe them for an AI audience. Follow these five steps.

Step 1: List Your Essential Pages

Open a blank document and list every page on your site that you would want an AI engine to know about. Start broad, then ruthlessly cut. A good llms.txt file typically includes 15-30 pages for a small to medium site, and 30-60 for larger sites.

Start with these categories:

  • Identity pages: Homepage, About, Team, Careers, Contact
  • Commercial pages: Product/Service pages, Pricing, Features, Integrations, Case Studies
  • Trust pages: Security, Privacy Policy, Terms of Service, Certifications
  • Content pages: Your top 5-10 blog posts or guides by traffic and quality

Do not include every blog post, every product variation, or every support article. Include the pages that a smart human would want to read to fully understand your business in 15 minutes. That is the standard: if a venture capitalist, a journalist, or a potential enterprise customer wanted to understand your business quickly, which pages would you send them? Those are your llms.txt pages.

A practical method: go to your Google Analytics or Google Search Console, sort pages by traffic, and pick the top 20. Then add any strategically important pages that might not get high traffic but are critical to understanding your business (like your security page or a key case study).

Step 2: Write a Description for Each Page

This is where most people fail. They list URLs without context and wonder why AI engines don't cite them. The descriptions are what make llms.txt powerful. Without them, you are giving AI models a list of links — which they can already get from your sitemap.

For each page, write a 1-2 sentence description that answers: "What will someone learn or find on this page?" Include specific numbers, facts, and differentiators. Compare these two approaches:

Bad: [Pricing](https://acme.com/pricing): Our pricing page.

Good: [Pricing](https://acme.com/pricing): Three plans — Starter ($12/user/mo), Team ($24/user/mo), and Enterprise (custom). All plans include a 14-day free trial and unlimited projects.

The good description gives an AI model actual facts it can use in a generated answer. When a user asks ChatGPT "How much does Acme cost?", the model can pull the pricing directly from your llms.txt description without even needing to crawl the pricing page. That is the power of well-written descriptions.

Write each description from the perspective of answering a question. What would a user be asking when they need this page? Answer that question in the description.

Step 3: Structure in Markdown with Standard Sections

Use the section structure from the template above. The standard sections that AI models expect are:

  • # Brand Name — H1 heading with your exact brand name
  • > Description blockquote — 2-4 sentence company overview
  • ## About — Identity and credibility pages
  • ## Core Pages — Main product/service/commercial pages
  • ## Key Content — Best content pieces and resources
  • ## Contact — Contact info and social profiles

You can add additional sections if they make sense for your site. An e-commerce store might add ## Top Products. A SaaS company might add ## Documentation. A publisher might add ## Topics to list their editorial verticals. The key is that every section should use an ## H2 heading and contain Markdown links with descriptions.

Keep the Markdown clean. No HTML tags, no embedded images, no tables. Pure Markdown that any parser can handle. AI models are excellent at parsing standard Markdown but can stumble on complex or non-standard formatting.

Step 4: Add Trust Signals

Your ## Contact section should include every public touchpoint for your business. AI models cross-reference these signals with other data sources to verify that your site represents a legitimate entity.

  • Business email (not a personal Gmail)
  • Social media profiles (Twitter/X, LinkedIn, GitHub if applicable)
  • Physical address or region (if you have one)
  • Support or help desk URL
  • Phone number (if public-facing)

The more verifiable contact points you include, the stronger the trust signal. This is especially important for newer or smaller businesses that may not yet have strong backlink profiles or established domain authority.

Step 5: Host at Root and Verify Access

Your llms.txt file must be accessible at https://yoursite.com/llms.txt. Not in a subdirectory. Not behind authentication. Not returning a redirect. A direct 200 OK response with the file contents.

After deploying, verify access by opening the URL in your browser. You should see the raw Markdown text. Then check that your robots.txt does not block access to /llms.txt. It sounds obvious, but we have seen sites that deploy a perfect llms.txt file and then block it with a blanket Disallow: /*.txt$ rule in robots.txt.
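If you do have a pattern rule like that, an explicit Allow line is the safe fix. A minimal robots.txt sketch (adjust the user agents and rules to your own policy; most major AI crawlers, including GPTBot and ClaudeBot, honor Allow directives, though wildcard precedence varies by crawler, so retest after changing):

User-agent: *
Allow: /llms.txt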

Also verify the content type. The server should return text/plain or text/markdown. If your hosting platform returns text/html with HTML wrapping, AI crawlers may not parse the file correctly.
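A quick way to check the content type from the command line, using curl's write-out variable:

curl -s -o /dev/null -w '%{content_type}\n' https://yoursite.com/llms.txt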

Setup by Platform

The deployment process varies depending on your tech stack. Here is how to get llms.txt live on the four most common platforms.

WordPress

WordPress does not natively support serving arbitrary text files from the site root, but there are two clean approaches.

Option A: FTP/File Manager upload. Upload your llms.txt file directly to the root directory of your WordPress installation (the same directory where wp-config.php lives) using FTP, SFTP, or your hosting provider's file manager. This is the simplest method and works on every WordPress host. The file will be served directly by the web server (Apache or Nginx) without going through WordPress at all.

Option B: Plugin. Several SEO plugins now support llms.txt management. Rank Math and Yoast have added experimental support. You can also use a simple custom plugin or a code snippet in your functions.php that intercepts the /llms.txt route and serves your content:

// Add to functions.php or a custom plugin
add_action('init', function () {
    // Compare only the path so a query string (e.g. /llms.txt?v=2) still matches
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    if ($path === '/llms.txt') {
        $file = ABSPATH . 'llms.txt';
        if (is_readable($file)) {
            header('Content-Type: text/plain; charset=utf-8');
            readfile($file);
            exit;
        }
        // Fall through to normal WordPress routing if the file is missing
    }
});

If you use a caching plugin like WP Super Cache or W3 Total Cache, make sure /llms.txt is either excluded from caching or that the cache is cleared after you update the file.

Next.js

Next.js gives you two excellent options for serving llms.txt.

Option A: Static file in /public. Place your llms.txt file in the public/ directory of your Next.js project. Next.js serves everything in public/ at the site root automatically. Your file will be available at https://yoursite.com/llms.txt with zero configuration. This is the simplest approach and works perfectly for files you update manually.

Option B: Dynamic route handler. If you want to generate your llms.txt dynamically (for example, pulling page descriptions from a CMS), create an API route:

// app/llms.txt/route.ts
import { NextResponse } from 'next/server'

export async function GET() {
  const content = `# Your Brand Name

> Your company description here.

## Core Pages

- [Product](/product): Description of your product page.
- [Pricing](/pricing): Description of your pricing page.

## Key Content

- [Blog Post Title](/blog/post-slug): Description of the post.
`

  return new NextResponse(content, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Cache-Control': 'public, max-age=86400, s-maxage=86400',
    },
  })
}

The dynamic approach is what we use at Rankeo. Our llms.txt Generator builds the content from your site data and serves it through a route handler. If you host on Vercel, both approaches deploy seamlessly.

Shopify

Shopify does not allow you to place arbitrary files at the site root directly. You need a workaround.

The recommended approach: Create a new page in Shopify with the handle llms-txt (which gives you /pages/llms-txt), then set up a URL redirect from /llms.txt to /pages/llms-txt. Go to Online Store → Navigation → URL Redirects and add the redirect.

The downside is that this serves the content as HTML within your Shopify theme, not as plain text. To work around this, you can create a custom page template called page.llms-txt.liquid that outputs only the raw text without any theme wrapping:

{%- layout none -%}
{%- comment -%} page.llms-txt.liquid {%- endcomment -%}
{{ page.content | strip_html }}

Assign this template to your llms-txt page. The result will be a clean text response without headers, footers, or navigation. Not a perfect text/plain content type, but functional enough for AI crawlers to parse. Alternatively, if you use a Shopify app that supports custom routes or a reverse proxy like Cloudflare, you can serve the file as true plain text at the root path.

Webflow

Webflow offers limited options for serving raw text files, but there are workable solutions.

Option A: Reverse proxy with Cloudflare Workers. Webflow does not allow direct file uploads to the root, but if your DNS runs through Cloudflare you can use a Worker to intercept requests to /llms.txt, return your Markdown content as plain text, and pass every other request through to Webflow. This is the cleanest solution for Webflow sites.

Here is a Cloudflare Worker that serves your llms.txt:

export default {
  async fetch(request) {
    const url = new URL(request.url)
    if (url.pathname === '/llms.txt') {
      const llmsTxt = `# Your Brand Name

> Your description here.

## Core Pages
- [Home](https://yoursite.com): Main landing page.
- [Services](https://yoursite.com/services): What we offer.
`
      return new Response(llmsTxt, {
        headers: { 'Content-Type': 'text/plain; charset=utf-8' },
      })
    }
    // Pass everything else to Webflow
    return fetch(request)
  },
}

Option B: Subdomain approach. Host your llms.txt on a separate subdomain or service (like a simple Vercel or Netlify static site) and use a DNS-level redirect or proxy to serve it at the root path. This adds complexity but gives you full control over the file format and content type.

Webflow has indicated they plan to add native support for llms.txt in a future update. Until then, the Cloudflare Worker approach is the most reliable option.

How to Test Your llms.txt

Deploying your llms.txt file is only half the job. You need to verify that it is accessible, parseable, and actually being consumed by AI crawlers. Here are three tests you should run after every deployment or update.

Test 1: Direct Access

Open your browser and navigate to https://yoursite.com/llms.txt. You should see your raw Markdown content rendered as plain text. Check for these specific things:

  • HTTP status code: Must be 200 OK. Not 301, not 302, not 404. Use your browser's developer tools (Network tab) or a tool like curl -I https://yoursite.com/llms.txt to verify the status code.
  • Content type: Should be text/plain or text/markdown. If you see text/html, your file is being wrapped in HTML by your server or CMS, which can confuse parsers.
  • No HTML wrapping: The response should be raw text, not your file content embedded inside a full HTML page with headers and footers.
  • Encoding: UTF-8. Check that special characters (accented letters, em dashes, smart quotes) render correctly.
  • All links working: Click every link in your llms.txt manually. Broken links are the most common issue we see, especially after site redesigns.

You can also test with curl from the command line to see exactly what a bot would receive:

curl -s -D - https://yoursite.com/llms.txt | head -20

This shows you the response headers and the first 20 lines of content. Verify the Content-Type header and check that the body starts with your Markdown content, not with <!DOCTYPE html>.
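To automate the link check, here is a small shell sketch (assuming curl and standard GNU tools) that extracts every URL from your llms.txt and prints the HTTP status code for each. Any line that prints something other than 200 needs fixing:

# Extract every URL from llms.txt and print its status code
curl -s https://yoursite.com/llms.txt \
  | grep -oE 'https?://[^ )]+' \
  | sort -u \
  | while read -r url; do
      echo "$(curl -s -o /dev/null -w '%{http_code}' "$url")  $url"
    done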

Test 2: The ChatGPT Test

This is the test that convinces people llms.txt actually works. Open ChatGPT (with browsing enabled) and type:

Read the file at https://yoursite.com/llms.txt and give me a summary
of what this company does, their main products, and their pricing.

If your llms.txt is well-written, ChatGPT will respond with an accurate, detailed summary that pulls directly from your descriptions. It will cite your specific pages, mention your pricing tiers, and describe your product in the terms you chose. This is the "wow" moment — you realize you are directly influencing how AI engines describe your business.

Try the same test with Perplexity, which aggressively uses llms.txt when available. Ask it a question that your llms.txt specifically answers, like "What does [Your Company] do?" or "How much does [Your Product] cost?". If the response matches your llms.txt descriptions, your file is working.

If the AI gives a vague or inaccurate response, your descriptions need work. Go back to Step 2 and rewrite them with more specific facts and details.

Test 3: Server Logs

Check your server access logs for requests to /llms.txt. Look for these user agents:

  • GPTBot — OpenAI's crawler (ChatGPT, GPT-based products)
  • PerplexityBot — Perplexity's crawler
  • ClaudeBot — Anthropic's crawler (Claude)
  • Google-Extended — Google's AI training crawler
  • Applebot-Extended — Apple's AI features crawler
  • cohere-ai — Cohere's crawler

If you use Cloudflare, Vercel Analytics, or any server-level logging tool, filter your logs for the /llms.txt path and examine the user agent strings. You should start seeing crawl activity within 1-2 weeks of deployment. If you see no activity after a month, double-check that your robots.txt is not blocking these bots.
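If you have shell access to raw logs, a hedged example for a standard Nginx setup (assuming the default access log path; adjust for your server):

# Show every request for /llms.txt from a known AI crawler
grep '/llms.txt' /var/log/nginx/access.log \
  | grep -iE 'GPTBot|PerplexityBot|ClaudeBot|Google-Extended|Applebot-Extended|cohere-ai'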

You can also use the Rankeo Authority Checker to verify whether AI crawlers have indexed your llms.txt and how it affects your overall AI visibility score.

7 Mistakes That Make Your llms.txt Useless

After reviewing hundreds of llms.txt files across different industries, these are the seven most common mistakes that render the file ineffective.

1. Empty or too short (< 100 words). Some sites deploy an llms.txt that is just a brand name and three links. This gives AI models almost no useful signal. If your file contains fewer than 100 words, it is not providing enough context to influence how AI engines understand your site. A minimum of 200 words with descriptions is the threshold for a functional file.

2. No page descriptions (just URLs). Listing URLs without descriptions is the single most common mistake. A list of links is just a poor man's sitemap. The descriptions are what make llms.txt valuable — they give AI models the context they need to understand each page without crawling it. Every link should have a 1-2 sentence description that answers "What will someone find on this page?"

3. HTML format instead of Markdown. Your llms.txt should be pure Markdown. Not HTML. Not rich text. Not a JSON file renamed to .txt. AI models are optimized for Markdown parsing. If your file contains <h1> tags, <a href> links, or <div> wrappers, some parsers will handle it gracefully but others will choke on the tags and extract garbled content. Stick to standard Markdown syntax: # for headings, [text](url) for links, - for lists.

4. Blocked by robots.txt or Cloudflare. This is painfully common. Sites deploy a perfectly crafted llms.txt and then block AI crawlers in robots.txt with rules like User-agent: GPTBot / Disallow: / or have Cloudflare's bot protection set to block all AI bots. If you want AI engines to read your llms.txt, you must allow them access. Review your robots.txt and your CDN's bot management settings to ensure the major AI crawlers are not blocked.

5. Broken URLs. Every link in your llms.txt should return a 200 OK status. Links to pages that return 404, 301 redirect chains, or 500 errors undermine the credibility of your entire file. AI models may treat a file with broken links as unreliable and give it less weight. Run a link check after every update. You can use the Rankeo Schema Validator to catch broken URLs and formatting issues.

6. Not updated when site changes. Your llms.txt should reflect the current state of your site. If you redesigned your site six months ago, launched three new products, and moved your blog to a new URL structure, but your llms.txt still references the old pages, you are actively sending AI engines to dead ends. Treat llms.txt like a living document. Add it to your deployment checklist alongside your sitemap.

7. Too long (> 5,000 tokens). The opposite problem of being too short. Some sites try to include every single page, every blog post, every product variant. This defeats the purpose. AI crawlers have token limits on how much of a file they process in a single pass. If your llms.txt exceeds 5,000 tokens (roughly 3,500 words), the crawler may truncate it or skip it entirely. Be selective. Include your 20-30 most important pages, not your entire site inventory.
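A rough size check from the command line: one token is roughly 0.75 words, so staying under about 3,500 words keeps you safely under 5,000 tokens.

# Word count of the live file; aim for under ~3,500 words
curl -s https://yoursite.com/llms.txt | wc -w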

Generate Your llms.txt in 60 Seconds

Rankeo's llms.txt Generator analyzes your site, identifies your key pages, writes AI-optimized descriptions, and exports a production-ready file. No manual work required.
