c388e1bd Improved llms.txt and llms-full.txt content for AI consumers (#28678)
`/llms.txt` and `/llms-full.txt` are a machine-readable view of a Ghost
site's public content for AI/LLM tooling, sitting alongside the
per-post/page `.md` representations. This change reworks **what those
two files contain** so they're genuinely useful to the tools that read
them, rather than technically complete files that nothing ingests. It's
a content-design change only; the feature stays behind the `llmsTxt`
flag and nothing here changes whether, or to whom, the files are served.
The design follows from research into how these files are actually
consumed. A few principles did the deciding, and each one is why the
resulting behaviour is what it is:
**Curated, not exhaustive.** Listing every URL on a site is already the
sitemap's job, and an `llms.txt` that mirrors the sitemap is large,
costly to generate, and — from the evidence we reviewed — not something
consumers actually ingest. The value is a curated index a model can read
in a single pass. So `llms.txt` is bounded to a ~50KB budget (pages,
then newest posts until full) and links to the sitemap for the complete
archive rather than trying to be it.
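As a rough illustration of the "pages, then newest posts until full" rule, here is a minimal TypeScript sketch. The `Entry` shape, `buildIndex`, `INDEX_BUDGET_BYTES`, and the footer wording are hypothetical and not Ghost's actual service code. Sizes are measured in UTF-8 bytes via `TextEncoder`, and the 50KB default simply mirrors the figure above.

```typescript
// Hypothetical, simplified entry shape; not Ghost's actual data model.
interface Entry {
  title: string;
  url: string;
  publishedAt: Date;
  excerpt: string;
}

// Measure size in UTF-8 bytes, since a character count undercounts non-ASCII text.
const encoder = new TextEncoder();
const byteLength = (s: string): number => encoder.encode(s).length;

// Assumed default mirroring the ~50KB figure above.
const INDEX_BUDGET_BYTES = 50 * 1024;

function entryLine(e: Entry): string {
  return `- [${e.title}](${e.url}): ${e.excerpt}\n`;
}

// Pages first, then posts newest-first, stopping at the first entry that would
// push the file past the budget. Room for the sitemap footer is always reserved.
function buildIndex(
  header: string,
  pages: Entry[],
  posts: Entry[],
  sitemapUrl: string,
  budget: number = INDEX_BUDGET_BYTES,
): string {
  const footer = `\nFor the complete archive, see the sitemap: ${sitemapUrl}\n`;
  let out = header;
  let used = byteLength(header) + byteLength(footer);

  const newestFirst = [...posts].sort(
    (a, b) => b.publishedAt.getTime() - a.publishedAt.getTime(),
  );

  for (const entry of [...pages, ...newestFirst]) {
    const line = entryLine(entry);
    const size = byteLength(line);
    if (used + size > budget) break;
    out += line;
    used += size;
  }
  return out + footer;
}
```

Stopping at the first entry that does not fit (rather than skipping it and trying smaller ones) keeps the index a clean "newest N" prefix, which is easier for a reader to reason about; everything older is reachable through the sitemap link.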
**Bounded by size, not by post count.** A count-based cap can't protect
against a site whose posts are 40,000-word essays. Both files are
bounded by a byte budget so the output stays ingestible regardless of
individual post length. When a file is deliberately cut short, a
truncation note says so explicitly, so a consumer doesn't wrongly infer
that the site has only N posts.
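A byte cap plus a truncation note for `llms-full.txt` might look like the following minimal sketch. `FullEntry`, `renderSection`, `truncationNote`, `buildFull`, and the note's wording are all hypothetical, and entries are assumed to arrive already ordered.

```typescript
// Hypothetical, simplified shape of one item in llms-full.txt.
interface FullEntry {
  title: string;
  url: string;
  markdown: string;
}

const encoder = new TextEncoder();
const byteLength = (s: string): number => encoder.encode(s).length;

function renderSection(e: FullEntry): string {
  return `## ${e.title}\n\nSource: ${e.url}\n\n${e.markdown}\n\n`;
}

// Illustrative wording for the note that marks a deliberate cut.
function truncationNote(included: number, total: number): string {
  return `\n> This file was truncated to stay within its size budget: it contains ${included} of ${total} items, so the site has more content than shown here.\n`;
}

// The cap is in bytes, so a handful of very long posts cannot blow past it.
// Assumes the header itself fits comfortably within the budget.
function buildFull(header: string, entries: FullEntry[], budget: number): string {
  const sections = entries.map(renderSection);
  const total =
    byteLength(header) + sections.reduce((sum, s) => sum + byteLength(s), 0);
  if (total <= budget) return header + sections.join(""); // everything fits, no note

  // Reserve room for the note using the largest counts it can show: `included`
  // never exceeds `entries.length`, so the real note is never longer than this.
  const limit = budget - byteLength(truncationNote(entries.length, entries.length));
  let out = header;
  let used = byteLength(header);
  let included = 0;
  for (const s of sections) {
    const size = byteLength(s);
    if (used + size > limit) break;
    out += s;
    used += size;
    included++;
  }
  return out + truncationNote(included, entries.length);
}
```

The note is only emitted when something was actually dropped. Reserving its space up front means the finished file, note included, still respects the budget.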
**Discoverability belongs in the file, not in headers.** We looked at
`Link` / `X-Llms-Txt`-style discovery headers and found nothing reliably
reads them. So the path to clean Markdown is stated inside both files,
where a consumer reading the index will actually see it — and every
index entry already links to the `.md` form, which both lands the reader
on clean Markdown and demonstrates the "append `.md`" convention by
example.
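The "every entry links to the `.md` form" idea, plus an in-file discoverability note, can be sketched as below. The helper names and the note's wording are hypothetical. So is the URL mapping: this sketch assumes a trailing-slash URL such as `/my-post/` maps to `/my-post.md`, which should be checked against the real routing.

```typescript
// Hypothetical helper. Assumption: "/my-post/" maps to "/my-post.md"; verify
// against the real routing before relying on this exact form.
function markdownUrl(url: string): string {
  const u = new URL(url);
  u.pathname = u.pathname.replace(/\/$/, "") + ".md";
  return u.toString();
}

// Escape square brackets so a title like "[Guide] Setup" can't break link syntax.
function escapeLinkText(text: string): string {
  return text.replace(/[[\]]/g, "\\$&");
}

// Each index entry points at the Markdown form, which both delivers clean
// Markdown and shows the "append .md" convention by example.
function indexEntry(title: string, url: string, excerpt: string): string {
  return `- [${escapeLinkText(title)}](${markdownUrl(url)}): ${excerpt}`;
}

// Illustrative discoverability note, placed near the top of both files.
const MARKDOWN_NOTE =
  "> Any post or page on this site is available as clean Markdown: append `.md` to its URL.";

function indexHeader(siteTitle: string): string {
  return `# ${siteTitle}\n\n${MARKDOWN_NOTE}\n`;
}
```

For example, `indexEntry("Hello", "https://example.com/hello-world/", "Hi")` yields `- [Hello](https://example.com/hello-world.md): Hi`.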
**Reflect the real public surface, and respect gating the same way the
site does.** A members-only or paid post's existence and excerpt are
already public, so the files should reflect that: those posts now appear
in the index with their public excerpt, and in `llms-full.txt` their
bodies are cut down to the excerpt plus a notice. This mirrors exactly how
the rest of the public site treats gated content, and it's achieved by
browsing published content and letting the Content API's existing gating
strip the bodies — not by re-implementing visibility rules in this
service.
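To make the "let the Content API do the gating" point concrete, here is a minimal sketch of the rendering side only. `BrowsedPost` and `renderForFull` are hypothetical, and the sketch assumes that, for a post the anonymous reader cannot access, the API has already returned an empty body while leaving the public excerpt intact. The function never decides what to strip; it only labels a body that gating has already removed.

```typescript
// Hypothetical, simplified view of a post after browsing published content.
// Assumption: gating has already emptied `body` for restricted posts, while
// the public `excerpt` is still present.
interface BrowsedPost {
  title: string;
  url: string;
  visibility: string; // e.g. "public", "members", "paid"
  excerpt: string;
  body: string; // Markdown body; empty when gating stripped it
}

// No access rules are evaluated here: whatever body the API returned is used
// as-is, and a notice is added only when a restricted post arrives without one.
function renderForFull(post: BrowsedPost): string {
  const heading = `## ${post.title}\n\n`;
  const strippedByGating =
    post.visibility !== "public" && post.body.trim() === "";
  if (!strippedByGating) {
    return `${heading}${post.body}\n`;
  }
  return (
    `${heading}${post.excerpt}\n\n` +
    `> The full text of this post is restricted to members; read it at ${post.url}\n`
  );
}
```

Because the same public excerpt is what the site shows to anonymous visitors, the index entry and the `llms-full.txt` section for a gated post expose nothing beyond what is already public.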