How to Create an llms.txt File (Step by Step)

A hands-on guide on how to create llms.txt: required sections, root hosting, formatting rules, and validation steps so AI crawlers and LLMs can read your site.

Tags: llms.txt, AI crawlers, GEO optimization, AI search, technical SEO
June 2, 2026
10 min read
Diyanshu Patel, Co-Founder at GrowReddit

Founder at GrowReddit. Helps brands dominate Reddit through authentic community engagement and strategic marketing campaigns.

Key Takeaways: Learning how to create llms.txt comes down to four moves: write a clear heading and summary, list your most important pages in titled sections with short descriptions, host the file as plain text at your domain root, and validate that crawlers can actually fetch it. The file is a curated, human-readable map of your best content, written in plain Markdown rather than code, so AI models can find and cite the right pages. It is a companion to robots.txt, not a replacement, and it works best alongside strong off-site signals such as Reddit discussions. Keep it short, accurate, and updated. Below is a step-by-step build with a sample structure, hosting rules, and a validation checklist.


What sections go in an llms.txt file?

An llms.txt file needs a single top-level heading with your brand name, a short summary describing what you do, and one or more titled link sections pointing to your most valuable pages. The format is plain Markdown: a heading, a summary line, optional context, then grouped lists of links with one-line descriptions.

The convention follows a loose but consistent shape. Rather than paste raw markup, here is each part explained field by field so you can assemble it in any text editor.

| Section | Purpose | Required | Example contents |
| --- | --- | --- | --- |
| Brand heading | The single top-level title naming your company or product | Yes | "Acme Analytics" |
| Summary blockquote | One or two sentences stating what you do and who you serve | Recommended | "Acme is a product analytics platform for B2B SaaS teams." |
| Context paragraph | Plain notes that help a model interpret your links | Optional | Positioning, key differentiators, ideal customer |
| Core links section | A titled list of your most important pages with descriptions | Yes | Docs, pricing, key guides, API reference |
| Secondary links section | Lower-priority but useful pages | Optional | Blog, changelog, integrations |
| Optional section | Pages a model may skip if context is tight | Optional | Legal, careers, press |

The single most important habit: every link gets a short, factual description. A model reading "Pricing: plans, seat limits, and annual discounts for SaaS teams" understands far more than a bare URL. Treat each description like a sentence you would be happy to see quoted in an AI answer. This is the same extraction-first writing principle we cover in our guide on what AI assistants look for in brand content.

How is llms.txt different from robots.txt and sitemaps?

The llms.txt file curates and summarizes your best content for comprehension, while robots.txt controls crawler access and a sitemap lists every URL for indexing. They solve three different problems and should coexist.

Think of it as a division of labor. Robots.txt is the bouncer deciding who gets in. The XML sitemap is the full directory of every room. The llms.txt file is the concierge handing a model a curated short list of where the good stuff is, with notes on what each page contains.

  • robots.txt governs which paths user agents may fetch and which they must avoid.
  • sitemap.xml enumerates all indexable URLs with metadata like last-modified dates for search engines.
  • llms.txt is a human-readable, curated index that helps language models understand and cite your most relevant pages.

If you want a deeper conceptual breakdown of the format and why it emerged, our companion explainer on what llms.txt is and why it matters for AI crawlers covers the standard itself. This guide stays focused on building and shipping the file.

How do you write the file step by step?

Build the file in five passes, top to bottom, in any plain text editor. The goal is a clean Markdown document that reads naturally aloud and accurately maps your site.

  1. Add the brand heading. Open with a single top-level Markdown heading (a line starting with "# ") containing only your company or product name. There should be exactly one of these.
  2. Write the summary. Directly under the heading, add a one-line blockquote summary (a line starting with ">"): what you do, for whom, in plain language. This is the first thing a model reads, so make it precise.
  3. List your core pages. Create a second-level section heading (a line starting with "## ") titled something like "Core" or "Documentation," and add bullet links to your highest-value pages. Use the Markdown link format with the page title as the anchor text, followed by a colon and a short description, for example: "- [Pricing](https://yourdomain.com/pricing): plans, seat limits, and annual discounts."
  4. Group secondary content. Add additional titled sections for supporting material such as guides, integrations, or your blog. Keep descriptions to one factual line each.
  5. Mark optional material. Add a final "Optional" section for pages a model can safely skip when context is limited, such as legal or careers pages.

A worked example helps. For a fictional SaaS called Acme Analytics, the file would open with the heading "Acme Analytics," followed by the summary "Product analytics for B2B SaaS teams that need self-serve funnels and retention reports." A Core section would link the docs home, the API reference, and the pricing page, each with a one-line description. A Guides section would link your best tutorials. An Optional section would point to terms and the careers page.
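
The five passes and the Acme example above can be sketched as a small Python helper that renders the file from structured data. This is a hypothetical illustration, not an official tool. The Link and LlmsTxt names, the acme.example URLs, and the page descriptions are all placeholders. The output follows the convention described in this guide: one "# " heading, a "> " summary, "## " sections, and "- [Title](URL): description" entries.

```python
# Hypothetical helper: assembles llms.txt text from structured data.
# Names (Link, LlmsTxt) and all acme.example URLs are placeholders.
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str        # anchor text, usually the page title
    url: str          # absolute URL of the page
    description: str  # one factual line a model could quote


@dataclass
class LlmsTxt:
    name: str                 # brand heading; rendered as the single "# " line
    summary: str              # rendered as a "> " blockquote under the heading
    context: str = ""         # optional notes that help a model read the links
    sections: dict[str, list[Link]] = field(default_factory=dict)

    def render(self) -> str:
        lines = [f"# {self.name}", "", f"> {self.summary}"]
        if self.context:
            lines += ["", self.context]
        # Dicts keep insertion order, so sections render in the order given;
        # list "Optional" last.
        for heading, links in self.sections.items():
            lines += ["", f"## {heading}", ""]
            lines += [f"- [{link.title}]({link.url}): {link.description}" for link in links]
        return "\n".join(lines) + "\n"


acme = LlmsTxt(
    name="Acme Analytics",
    summary="Product analytics for B2B SaaS teams that need self-serve funnels and retention reports.",
    sections={
        "Core": [
            Link("Docs", "https://acme.example/docs", "Setup, tracking, and core concepts"),
            Link("API reference", "https://acme.example/docs/api", "Endpoints, authentication, and request formats"),
            Link("Pricing", "https://acme.example/pricing", "Plans, seat limits, and annual discounts for SaaS teams"),
        ],
        "Guides": [
            Link("Funnel tutorial", "https://acme.example/guides/funnels", "Building a self-serve signup funnel report"),
        ],
        "Optional": [
            Link("Terms", "https://acme.example/terms", "Terms of service"),
            Link("Careers", "https://acme.example/careers", "Open roles at Acme"),
        ],
    },
)

if __name__ == "__main__":
    print(acme.render(), end="")
```

Keeping the content as data makes the one-line-per-link rule hard to break. It also lets you regenerate the file whenever pages change, instead of hand-editing it.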

Keep the whole file lean. A focused file of fifteen to forty curated links is usually more useful than a sprawling dump of every URL, because curation is the entire point. If a page would not help a model answer a real customer question, leave it out.
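
A small lint script can catch structural slips in a draft before you publish it. This is a hypothetical sketch that checks only this guide's conventions: a single "# " heading on the first line, a ">" summary next, and "- [Title](URL): description" entries. It treats the fifteen-to-forty range as a soft guideline rather than a spec rule. Every line starting with "- " counts as a link entry, so plain bullets in a context paragraph would be flagged too.

```python
# Hypothetical lint sketch for an llms.txt draft, encoding this guide's conventions.
import re

# "- [Title](URL)" optionally followed by ": description"
LINK_LINE = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>\S.*))?$")


def lint_llms_txt(text: str, min_links: int = 15, max_links: int = 40) -> list[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems = []
    lines = text.splitlines()
    content = [line for line in lines if line.strip()]  # ignore blank lines

    headings = [line for line in content if line.startswith("# ")]
    if len(headings) != 1:
        problems.append(f"expected exactly one '# ' heading, found {len(headings)}")
    if not content or not content[0].startswith("# "):
        problems.append("the file should open with the '# ' brand heading")
    if len(content) < 2 or not content[1].startswith(">"):
        problems.append("missing '>' summary blockquote under the heading")

    link_count = 0
    for number, line in enumerate(lines, start=1):
        if not line.startswith("- "):
            continue
        link_count += 1
        match = LINK_LINE.match(line)
        if match is None:
            problems.append(f"line {number}: not in '- [Title](URL): description' form")
        elif not match.group("desc"):
            problems.append(f"line {number}: link has no description")

    if not min_links <= link_count <= max_links:
        problems.append(f"{link_count} links found; this guide suggests {min_links}-{max_links}")
    return problems
```

Run it against the file you are about to ship, and treat anything it prints as a prompt to review rather than a hard failure.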

Should you also create an llms-full.txt file?

Create llms-full.txt only if you have substantial documentation you want models to ingest directly. Where llms.txt is a curated index of links, llms-full.txt concatenates the full text of those pages into one large plain-text document so a model can read the content without following links.
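
If you do build llms-full.txt, here is a minimal sketch, assuming your pages already exist as local Markdown files. It concatenates the pages under the same heading and summary as llms.txt. Several choices are assumptions, not part of any standard: the "## filename" separators, the function names, and the rough four-characters-per-token size estimate.

```python
# Sketch: concatenate local Markdown pages into one llms-full.txt body.
# Assumes pages are .md files on disk; names and layout are hypothetical.
from pathlib import Path


def build_llms_full(name: str, summary: str, pages: list[Path]) -> str:
    parts = [f"# {name}", "", f"> {summary}"]
    for page in pages:
        body = page.read_text(encoding="utf-8").strip()
        # Label each page so a model can tell where one document ends
        # and the next begins.
        parts += ["", f"## {page.stem}", "", body]
    return "\n".join(parts) + "\n"


def approx_tokens(text: str) -> int:
    # Very rough heuristic (about 4 characters per token for English text);
    # use your model's tokenizer for a real count.
    return len(text) // 4
```

For example, pass sorted(Path("docs").glob("*.md")) as pages and write the result next to llms.txt. Checking approx_tokens on the output gives an early warning when the file is growing large enough to strain a model's context window.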

The trade-off is maintenance. A full-text file goes stale fast and can grow large enough to strain context windows. For most B2B and SaaS sites, the lighter llms.txt is the right starting point. Add the full version later if your docs are deep and you see AI assistants struggling to retrieve specifics. Start curated, expand only when there is a clear comprehension gap.

How do you host llms.txt correctly?

Host the file as plain text at your domain root so it resolves at yourdomain.com/llms.txt, returns an HTTP 200 status, and is served with a plain-text content type (text/plain). The root path is a convention, so placement matters as much as content.

Hosting rules differ slightly by stack, but the requirements are constant.

| Platform | Where to put the file | Resulting URL |
| --- | --- | --- |
| Static site or plain server | The web root, alongside robots.txt | yourdomain.com/llms.txt |
| Next.js | The public folder | yourdomain.com/llms.txt |
| WordPress | The site root directory via FTP or a file plugin | yourdomain.com/llms.txt |
| Webflow or hosted builders | Custom code or a hosted file feature, mapped to the root | yourdomain.com/llms.txt |

Three things break llms.txt files in practice, and all are avoidable. First, serving the file with an HTML content type instead of plain text, which happens when a framework wraps it in a page route. Second, placing it in a subfolder so it never resolves at the canonical root. Third, accidentally blocking the path in robots.txt or behind a login wall. Confirm the file is publicly fetchable, unauthenticated, and returns a clean 200.
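
Before deploying, you can mimic plain static hosting on your own machine. Serve the folder that will become your web root (for example, a Next.js public folder) and confirm that /llms.txt comes back as a 200 with a plain-text content type. This sketch uses only Python's built-in http.server, and the function name is hypothetical. It checks a local copy only, so it cannot catch framework routing, CDN, redirect, or login issues on your real domain.

```python
# Sketch: serve a folder locally the way a plain static host would, then
# fetch /llms.txt and report its status and Content-Type.
import functools
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def preview_llms_txt(web_root: str) -> tuple[int, str]:
    """Return (status, content_type) for /llms.txt served from web_root.

    A missing file produces a 404, which urlopen raises as HTTPError.
    """
    handler = functools.partial(QuietHandler, directory=web_root)
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/llms.txt", timeout=5) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    finally:
        server.shutdown()
        server.server_close()
```

A static server picks the content type from the .txt extension. That is why a plain file in the web root usually behaves well, while a framework page route wrapping the same text may not.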

How do you test that AI crawlers can read it?

Validate with four core checks: load the URL in a browser, fetch it from the command line to confirm a 200 status and a plain-text content type, verify that robots.txt does not block the path, and test comprehension by asking an AI assistant a question your file should answer. Then keep the file fresh as your site changes.

Run this checklist after every meaningful edit:

  • Browser check. Visit yourdomain.com/llms.txt directly. It should display as raw text, not render as a styled webpage.
  • Status and type check. Fetch the URL with a command-line tool and confirm the response is HTTP 200 with a plain-text content type, not an HTML or redirect response.
  • Access check. Open your robots.txt and confirm no rule disallows the llms.txt path for the AI user agents you care about, such as GPTBot, ClaudeBot, and PerplexityBot.
  • Comprehension check. Ask an AI assistant a question your file is designed to answer, like "what does Acme Analytics do and where is its API reference?" and see whether the answer reflects your summary and links.
  • Freshness check. Re-open the file after major site changes and prune dead links or outdated descriptions.
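
For the status and type check, here is a minimal Python sketch of the command-line fetch, using only the standard library. The function names, the User-Agent string, and the decision to accept only text/plain are assumptions. The separate evaluate function keeps the pass/fail rules easy to test without a network.

```python
# Sketch of the status-and-type check with Python's standard library.
# The User-Agent string and the accepted-type set are assumptions; adjust them.
import urllib.error
import urllib.request

ACCEPTED_TYPES = {"text/plain"}


def evaluate(requested_url: str, final_url: str, status: int, content_type: str) -> list[str]:
    """Turn one response's key facts into a list of problems (empty means pass)."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # urlopen follows redirects silently, so compare the final URL to the request.
    if final_url != requested_url:
        problems.append(f"request was redirected to {final_url}")
    # Drop parameters such as "; charset=utf-8" before comparing the media type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"content type is {media_type or 'missing'}, expected text/plain")
    return problems


def check_llms_txt(url: str) -> list[str]:
    """Fetch url once; HTTP errors are reported, other network errors propagate."""
    request = urllib.request.Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return evaluate(url, response.geturl(), response.status,
                            response.headers.get("Content-Type", ""))
    except urllib.error.HTTPError as error:
        return [f"expected HTTP 200, got {error.code}"]
```

Calling check_llms_txt("https://yourdomain.com/llms.txt") returns an empty list when the file passes. Otherwise it returns one line per problem found.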

Crawler access is the step teams most often miss, because a perfect file is useless if the AI bots that read it are blocked at the door. Which user agents to allow, and how to manage their access without inviting scrapers you do not want, is its own topic. We cover it in detail in our breakdown of AI crawler access for GPTBot, ClaudeBot, and PerplexityBot. Pair that with this file and you have both the map and the open gate.
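
For the access check, Python's standard-library robots.txt parser gives a quick first answer for the user agents named above. This is a sketch with a hypothetical function name. Note that urllib.robotparser applies the first matching rule with simple prefix matching, so its verdict may differ from how a given crawler handles wildcards or Allow/Disallow precedence.

```python
# Sketch: list which AI user agents a robots.txt body would block from a URL.
from urllib.robotparser import RobotFileParser

AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_agents(robots_txt: str, url: str, agents: tuple[str, ...] = AI_AGENTS) -> list[str]:
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Illustrative robots.txt that shuts out GPTBot while allowing everyone else.
example_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_agents(example_robots, "https://yourdomain.com/llms.txt"))
```

With the example file above, the script prints ['GPTBot']. To test your live file instead, RobotFileParser also offers set_url() and read(), which fetch robots.txt from your domain before you call can_fetch().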

How does llms.txt fit into a wider AI visibility strategy?

An llms.txt file improves on-site comprehension, but on-site signals alone rarely get a brand cited in AI answers. The file makes your own pages legible; off-site discussion is what teaches models to recommend you in the first place.

Language models weight third-party, opinionated, community sources heavily when they answer "what is the best tool for X" style questions. That is why a curated llms.txt works best alongside earned presence in places models trust, especially Reddit. Our analysis of Reddit's role in AI search visibility explains why those threads carry disproportionate weight, and our walkthrough of how Reddit content becomes ChatGPT citations shows the mechanism end to end. For a structured program, the Reddit LLM visibility guide ties the on-site and off-site halves together.

A realistic sequence for a SaaS team looks like this: ship a clean llms.txt, confirm crawler access, then invest in genuine community presence so models have both a clear map of your site and strong external reasons to cite you. The file is necessary, not sufficient.

What does GrowReddit do for you?

If you would rather not assemble, host, and maintain this yourself, GrowReddit runs it as a managed service. We build and validate your llms.txt and crawler configuration, then drive the off-site Reddit and AI-visibility work that actually gets your brand cited in ChatGPT, Claude, and Perplexity answers. See our Reddit marketing and AI visibility services and pricing for what is included, browse case studies for proof, and book a strategy call when you want a done-for-you plan tailored to your stack. No software to learn, no setup on your end.

