llms.txt: a proposed website standard to make LLM work cheaper

2026-06-07

The llms.txt and optional llms-full.txt files are part of a proposed standard aimed at making LLM website scraping and analysis cheaper and more accurate: fewer tokens spent and a smaller context required.

Instead of distilling the essential information from an HTML/JS-heavy webpage, a bot would fetch a plain-text page that states that information outright. It sounds good, but the reality is less rosy: adoption is limited, and so is the impact.

The llms.txt file structure

The llms.txt file is a text file that uses Markdown formatting for its contents. It should be placed in the root folder of the website, so that it is reachable at website.com/llms.txt. The general template looks like this:

# Title

Site description

## Section name

- [Link title](https://link_url): Key section URL
- [Link title](https://link_url): Key section URL
- [Link title](https://link_url): Key section URL

The purpose is to present the site title, a description, and links to the key pages of your website, such as pricing, case studies, and contact.
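If you would rather build the file from data you already have than write it by hand, the template above can be rendered with a few lines of Python. This is only a minimal sketch: the function name `render_llms_txt`, the shape of the `sections` argument, and the example.com URLs are assumptions made for illustration, not part of the proposed standard.

```python
def render_llms_txt(title, description, sections):
    """Build llms.txt text that follows the template shown above.

    `sections` maps a section name to a list of
    (link_title, url, note) tuples. This structure is hypothetical.
    """
    lines = [f"# {title}", "", description, ""]
    for name, links in sections.items():
        lines.append(f"## {name}")
        lines.append("")
        for link_title, url, note in links:
            lines.append(f"- [{link_title}]({url}): {note}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


example = render_llms_txt(
    "Example Co",
    "Example Co sells widgets.",
    {"Key pages": [("Pricing", "https://example.com/pricing", "Plans and prices")]},
)
print(example)
```

The output can be saved as llms.txt in the website root and then edited by hand where needed.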

The easiest way to generate such a file is to give your website URL to Gemini (or another capable LLM), ask it to generate one, and then fine-tune the result as needed.

The llms-full.txt file is optional and should contain the full contents of your website in a single file.

You can read more about the proposed standard on llmstxt.org.

Adoption of the standard

On the website side, there are two trackers, llmstxt.site and directory.llmstxt.cloud, that list websites using one or both files. You can take a look at their contents.

On the LLM side, adoption is questionable. There was some traffic tracking of llms.txt files, and it turned out that only 10% of LLM traffic interacted with the file.

The llms.txt standard is portrayed as the big thing for GEO (Generative Engine Optimization), which is website optimization for LLMs. In reality, Google, ChatGPT, Perplexity, and Claude don't really take this file into account.

There are two reasons. First, these companies already have pipelines for parsing HTML and can parse and index a webpage quite easily. Second, the file is controlled by the website, so its contents may not represent the website accurately. Like meta tags, it can be abused, and so it's going to be ignored when it comes to SEO and GEO.

We also already have standards that help with website discovery and indexing. Sitemaps list pages and the website structure, while robots.txt controls access, and LLM crawlers also tend to respect it.
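To make the robots.txt part concrete, here is a small sketch of how a well-behaved crawler could check access rules with Python's standard `urllib.robotparser` module. The bot name `ExampleLLMBot`, the rules, and the URLs are made up for this example and do not describe any real crawler.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt; the bot name and paths are illustrative only.
robots_txt = """\
User-agent: ExampleLLMBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() takes the file contents as a list of lines, so no network call is made.
parser.parse(robots_txt.splitlines())

allowed_public = parser.can_fetch("ExampleLLMBot", "https://example.com/pricing")
allowed_private = parser.can_fetch("ExampleLLMBot", "https://example.com/private/report")
print(allowed_public, allowed_private)
```

Whether a given crawler actually performs this check is up to its operator; robots.txt is a convention, not an enforcement mechanism.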

There are some cases where llms.txt makes sense, namely documentation and code-heavy websites. It can help Claude Code and similar tools quickly grasp a library's interface and capabilities, so they can write working code or give a correct answer based on the documentation.
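As a rough illustration of why the format suits tooling, the sketch below parses an llms.txt file into sections and links that a tool could then fetch selectively. It is an assumption about how such a consumer might work, not a description of how Claude Code or any other tool actually reads the file; `parse_llms_txt`, the regular expression, and the sample content are all hypothetical.

```python
import re

# Matches lines such as "- [Title](https://url): optional note".
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text):
    """Return {section name: [(title, url, note), ...]} for an llms.txt string."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return sections


sample = """\
# Example Library

Docs for a hypothetical library.

## API reference

- [Client](https://example.com/docs/client.md): Client class and methods
- [Errors](https://example.com/docs/errors.md): Exception types
"""

docs = parse_llms_txt(sample)
for section, links in docs.items():
    print(section, [url for _, url, _ in links])
```

With the links grouped by section, a tool can load only the pages relevant to the current task instead of crawling the whole documentation site.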
