Meet llms.txt: The Treasure Map Helping AI Understand and Cite Your Website Correctly

The way people visit websites has changed dramatically since ChatGPT became popular. Among the hardest hit are specialized websites such as Stack Overflow, where developers exchange coding tips and solutions. The site's traffic has reportedly fallen by a whopping 90 percent since April of 2020. While Stack Overflow is a community project, other, more lucrative endeavours rightly feel threatened by the rise of AI as a search engine. If users get a website's content from a chatbot instead of clicking through from search results, the site simply gets no clicks. Join us and see how llms.txt is set to help reverse this trend and keep your website indexed correctly.

What Even Is LLMs.txt?

llms.txt, sometimes written LLMs.txt, is a proposed web standard: a file meant to help large language models navigate and use website content. Its general purpose is similar to that of sitemap.xml, which lists a site's pages for search engines. But llms.txt goes beyond the scope of sitemaps. Like robots.txt, it is a plain-text file at a well-known location that speaks to automated visitors, but instead of telling crawlers what to avoid, it gives AIs a curated and prioritized map of the site's key pages that ought to be read first. You can think of it as a “treasure map” placed at the root directory of your website, with the treasure being your blog posts and other website content.

The idea of llms.txt was proposed by AI researcher Jeremy Howard in September of 2024. Jeremy identified the following barriers that keep large language models from interpreting web content:

·      Context Window Limitations – An LLM can only take in a limited amount of text at once, and most websites are too large to fit in their entirety. Models need concise text, not the usual branched code of a website.

·      HTML Noise – HTML pages carry various features that are intended for humans, not AIs, including nav bars, headers, footers, and other visual elements that an AI has no use for.

·      Scattered Documentation – Specialized websites often place API docs, help articles, and legal policies in various subdomains and/or formats, making the task of piecing them together difficult for an LLM.

The natural solution to such scattered information was to format it into a structured and clean llms.txt that would be easy for large language models to understand, thus giving the AI a better overview and reference for a website.
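To make the "HTML noise" barrier concrete, here is a minimal Python sketch of what a tool has to do when no clean text version of a page exists: parse the HTML and throw away the parts meant for human navigation. The tag list and the `strip_noise` helper are assumptions made for illustration; they are not part of the llms.txt proposal.

```python
from html.parser import HTMLParser

# Tags treated as "noise" here. This set is an illustrative assumption,
# not something defined by the llms.txt proposal.
NOISE_TAGS = {"nav", "header", "footer", "script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text while skipping anything inside a noise tag."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # greater than 0 while inside a noise element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.noise_depth == 0 and text:
            self.chunks.append(text)


def strip_noise(html: str) -> str:
    """Return the page text with navigation, headers, and footers removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = (
    "<html><body><nav><a href='/'>Home</a></nav>"
    "<h1>Title</h1><p>Body text.</p>"
    "<footer>Copyright</footer></body></html>"
)
print(strip_noise(page))
```

A curated llms.txt spares an AI this guesswork, because the site owner has already decided which text matters.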

The Code

We will not bore you with too many details, but you ought to know that the llms.txt file is written in Markdown. Its structure includes a top-level heading with the name of the site or project, a blockquote with a short summary, descriptive paragraphs, and sections of links to the key pages, all of which help an AI interpret the content. Those who really want to make sure that all of their content can be interpreted and/or cited in AI search results can go a step further with an llms-full.txt, which simply concatenates the Markdown-converted pages into a single document.
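For the curious, the layout described above can be sketched with a small Python helper that assembles an llms.txt from a title, a summary, and sections of links. The `build_llms_txt` function, the site name, and the example.com URLs are all hypothetical; treat this as a rough illustration of the proposed format rather than an official generator.

```python
def build_llms_txt(title, summary, sections):
    """Assemble llms.txt text: heading, blockquote summary, link sections.

    `sections` maps a section heading to a list of (name, url, note) tuples.
    This helper is a hypothetical illustration, not an official tool.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder site data, invented for this example.
content = build_llms_txt(
    "Example Site",
    "A hypothetical site used to illustrate the llms.txt layout.",
    {
        "Docs": [
            ("Getting started", "https://example.com/start.md", "Read this first"),
            ("API reference", "https://example.com/api.md", ""),
        ],
    },
)
print(content)
```

Writing the returned text to a file named llms.txt in the site's root directory is all that remains.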

How It Will Cite

Many AI tools that actively crawl the internet for knowledge are now keen to cite their sources. Why is that? Well, users are aware of chatbot hallucinations that result in misleading answers. Having an AI cite its sources, therefore, gives them peace of mind.

Here is where your website will come into play. The Google Search of tomorrow will be an AI that returns a concise answer with cited sources. These sources will be various websites, including yours. But how will you be able to get to the top of the source list?

Rising To the Top

The first step in the race to the top is to create your llms.txt. Any programmer of average skill can prepare the file, but we recommend that you consult SEO specialists or even AI-powered SEO tools for their insights.

Once you have an llms.txt file prepared and placed in your root directory, consider updating it on a regular basis. It will take a while for the AI agent search engines of the future to find your site as they crawl the internet, but regular activity may help them take notice of it. A lively site is, after all, a site that stays on top of the latest news and is a citation-worthy reference. But if you really want to stay on top of the latest trends and keep your website among the best, follow our news updates and experiment with AI tools. The future is here, and there is no reason why you should not make your life easier with its artificial intelligence tools.
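One way to make regular updates easier is to list everything your llms.txt currently points to, so that stale or missing entries stand out. The Python sketch below does this under the assumption that links follow the "- [name](url): note" pattern; the `list_links` helper and the sample text are hypothetical.

```python
import re

# Matches a link entry such as "- [Guide](https://example.com/guide.md): note".
# The trailing ": note" part is optional.
LINK_RE = re.compile(r"^\s*-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def list_links(llms_txt: str):
    """Return (section, name, url, note) tuples for every link entry."""
    section = None
    results = []
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()  # remember the current section heading
            continue
        match = LINK_RE.match(line)
        if match:
            name, url, note = match.group(1), match.group(2), match.group(3) or ""
            results.append((section, name, url, note))
    return results


# Sample file content, invented for this example.
sample = """# Example Site

> A hypothetical site used to illustrate the review step.

## Docs

- [Getting started](https://example.com/start.md): Read this first
- [API reference](https://example.com/api.md)
"""

for section, name, url, note in list_links(sample):
    print(section, "|", name, "|", url)
```

Comparing this list against your newest pages each time you publish is a simple way to keep the file current.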

Where to Begin? 

If you’re exploring llms.txt and want to dive deeper, check out advanced AI workflow tools shaping the future of LLM operations. Explore Agenta, Orquesta LLM Ops, and Vellum AI on ToolPilot to optimise and manage your AI experiences with confidence.