Skip to content

2. Structural Formatting

Structural Formatting is the practice of organizing your content using machine-readable formats and semantic markup so that AI systems can efficiently parse, categorize, and extract information.

AI systems don’t “read” pages like humans do. They process structured data far more reliably than free-form text. Proper structure means your content is more likely to be correctly interpreted and cited, rather than misunderstood or overlooked.

Structure content with proper headings (h1-h6), lists, tables, and semantic elements. Avoid using visual formatting (bold, font size) as a substitute for structural hierarchy.

Add Schema.org markup to your pages. At minimum, include:

  • Organization or Person for your identity
  • Article or WebPage for content pages
  • FAQPage for Q&A content

Create a /llms.txt file at your domain root following the llms.txt standard. This gives AI systems a concise, machine-friendly summary of your site.

Use a clear information architecture: broad categories → specific topics → detailed content. Mirror this in your URL structure and heading hierarchy.

When presenting comparisons, features, or specifications, use proper HTML/Markdown tables rather than prose descriptions.

Don’t inject every entity on every page. Common LLMO setups put Organization, Person, Service[], Book[], MusicGroup, and FAQPage into a shared layout, which means your 404 page, your privacy page, and every blog post all carry the same payload. AI systems read this as semantic noise — entities are advertised on pages that have nothing to do with them.

The cleaner pattern is a two-tier scope:

TierEntitiesWhere
Site-wideOrganization, WebSite, Person (founder/author)Shared layout <head>
Page-relevantService[], Book[], MusicGroup, FAQPage, ItemList, BreadcrumbList, ArticleThe page that the entities are about

A homepage with services and FAQ gets Service[] + FAQPage. A book catalog page gets Book[]. A music project page gets MusicGroup. The 404 page and the privacy page get only the site-wide tier.

Use stable @id values (https://example.com/#org, #founder, #website) so per-page entities can reference site-wide ones without re-declaring them.

A common silent failure: you write <script slot="head" type="application/ld+json"> in a page, but the layout has no matching <slot name="head" /> declaration. The framework drops the script with no warning. Your structured data never reaches users — and you don’t know.

Make output verification part of your build:

Terminal window
# Count JSON-LD blocks per page in the dist output
for p in dist/*/index.html dist/index.html; do
n=$(grep -oE '<script type="application/ld\+json">' "$p" | wc -l)
echo "$p: $n block(s)"
done
# Validate that each block parses
python3 -c "
import re, json, sys
with open('dist/index.html') as f: html = f.read()
for m in re.finditer(r'<script type=\"application/ld\+json\">(.+?)</script>', html, re.DOTALL):
try: json.loads(m.group(1))
except: sys.exit('INVALID JSON-LD: ' + m.group(1)[:200])
print('OK')
"

For a stronger guarantee, use the Schema.org Validator and Google’s Rich Results Test on representative pages. CI can call these via API.

❌ Unstructured:

We offer three plans. The basic plan costs $10 and includes 5 users. The pro plan costs $25 and includes 20 users. The enterprise plan is custom priced with unlimited users.

✅ Structured:

PlanPriceUsers
Basic$10/mo5
Pro$25/mo20
EnterpriseCustomUnlimited
  • Pages use proper heading hierarchy (h1 → h2 → h3)
  • JSON-LD structured data is present on key pages
  • An llms.txt file exists at the domain root
  • Content uses lists and tables where appropriate
  • URL structure reflects content hierarchy
  • Site-wide layout emits only Organization / WebSite / Person; page-relevant entities are scoped to their pages
  • Each entity has a stable @id for cross-page reference
  • Build pipeline verifies JSON-LD blocks emit and parse on every page (no silent drops)
  • Representative pages pass Google Rich Results Test and Schema.org Validator