
Research Areas — 5 Focus Domains

The Open LLMO Research Initiative organizes its work into five research areas. Each area runs independently but eventually feeds into the metric set defined by the LLMOFramework Score.

| Area | Core question |
| --- | --- |
| 1. AI Citation Analysis | What content do LLMs cite, and under what conditions? |
| 2. Grounding Visibility | How do we make AI grounding sources visible? |
| 3. LLM Retrieval Optimization | How should documents be optimized for the LLM retrieval layer? |
| 4. AI-native Documentation | What document formats do LLMs handle best? |
| 5. Agent-oriented Information Architecture | What information structures are easiest for AI agents to navigate? |

1. AI Citation Analysis

This area analyzes which content LLMs (ChatGPT, Claude, Gemini, Perplexity) cite for a given topic. Observations cover citation frequency, the structural features of cited documents, and the retrieval path that led to each citation.

  • How much do cited domains overlap across LLMs for the same topic?
  • Can we identify the structural features (heading hierarchy, tables, statistical density, external link count) of cited documents?
  • Can we build a post-hoc checklist for making content more likely to be cited?
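
The first question above, cross-LLM overlap of cited domains, can be made concrete with a set-similarity measure. The following is a minimal sketch, assuming Jaccard similarity is an acceptable overlap measure and that citations are available as plain URL lists per model. The function names, model labels, and URLs are hypothetical placeholders, not part of llmo-checker or any collected data.

```python
from itertools import combinations
from urllib.parse import urlparse


def cited_domains(urls):
    """Reduce a list of cited URLs to a set of lowercase hostnames."""
    domains = set()
    for url in urls:
        host = urlparse(url).hostname
        if host:
            # Treat www.example.com and example.com as the same domain.
            domains.add(host.lower().removeprefix("www."))
    return domains


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(citations_by_model):
    """Return {(model_a, model_b): jaccard} for every pair of models."""
    domains = {m: cited_domains(urls) for m, urls in citations_by_model.items()}
    return {
        (a, b): jaccard(domains[a], domains[b])
        for a, b in combinations(sorted(domains), 2)
    }


# Hypothetical observations for a single topic (placeholder URLs, not real data).
observations = {
    "model_a": ["https://www.example.com/guide", "https://docs.example.org/faq"],
    "model_b": ["https://example.com/other", "https://blog.example.net/post"],
    "model_c": ["https://docs.example.org/faq"],
}

for pair, score in pairwise_overlap(observations).items():
    print(pair, round(score, 2))
```

Comparing hostnames rather than full URLs is a deliberate simplification here. A real study would probably also need to decide how to treat subdomains and redirects.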

Data collection for AI citation observation is underway. Phase 1 plan: ship Citation Visibility as a metric in the OSS llmo-checker.


2. Grounding Visibility

This area covers visualization of the grounding behind AI responses: what an LLM relied on to produce an answer, and whether that source can be traced back to a verifiable primary reference.

  • Can a standard reverse-lookup method from AI response to source document be defined?
  • Does making grounding “visible” on a site (explicit sources, data references, citation formatting) correlate with higher AI citation rates?
  • Is hallucination correlated with weak grounding?
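
One possible starting point for the reverse-lookup question above is lexical matching between a response and candidate source documents. The sketch below assumes word n-gram containment as the matching signal. This is only one candidate technique, not a method the initiative has adopted, and the function names and sample texts are hypothetical.

```python
import re


def shingles(text, n=3):
    """Return the set of lowercase word n-grams in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(response, source, n=3):
    """Fraction of the response's n-grams that also occur in the source."""
    resp = shingles(response, n)
    if not resp:
        return 0.0
    return len(resp & shingles(source, n)) / len(resp)


def rank_sources(response, sources, n=3):
    """Rank candidate sources (id -> text) by containment, best first."""
    scored = [(containment(response, text, n), sid) for sid, text in sources.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical candidate documents and response (illustrative text only).
sources = {
    "doc_a": "Heading hierarchy and tables make a page easier to cite.",
    "doc_b": "Chunk size affects retrieval accuracy in different ways.",
}
response = "Tables make a page easier to cite, according to the source."

for score, source_id in rank_sources(response, sources):
    print(source_id, round(score, 2))
```

Lexical containment only catches near-verbatim reuse. Paraphrased grounding would need a semantic comparison, which this sketch does not attempt.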

This area is already partially addressed by Citation Signals, the fifth component of the LLMO Framework. Phase 1 plan: quantify it as a Grounding Stability metric.


3. LLM Retrieval Optimization

This area covers document-side optimization for the LLM retrieval layer (RAG, embedding retrieval, web search plugins, and similar). Topics include chunking strategy, semantic structure, document length, and heading design.

  • How does the relationship between chunk size and retrieval accuracy vary across topics?
  • What is the retrieval efficiency gap between Markdown, HTML, and JSON-LD?
  • How does internal link density contribute to context expansion in AI search?
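
To illustrate what a chunking comparison might contrast, here is a minimal sketch of two common strategies: splitting a Markdown document at its headings, and splitting into fixed-size word windows with overlap. The function names, default sizes, and sample document are assumptions chosen for illustration, not the design of the planned experiment.

```python
import re

# ATX headings only (# through ######). Lines that look like headings inside
# fenced code blocks are not special-cased in this sketch.
HEADING = re.compile(r"^#{1,6}\s+\S")


def chunk_by_heading(markdown):
    """Start a new chunk at every heading line."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def chunk_by_words(text, size=50, overlap=10):
    """Fixed-size word windows; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical sample document.
sample_doc = "# Title\nIntro text.\n\n## Section A\nBody of A.\n\n## Section B\nBody of B.\n"

print(len(chunk_by_heading(sample_doc)))   # 3 heading-aligned chunks
print(chunk_by_words("a b c d e f g h i j", size=4, overlap=1))
```

Heading-based chunks keep each section's text together with its title, while fixed windows give uniform sizes. Measuring which of the two retrieves better for a given topic would be the actual experiment.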

llmoframework.com itself serves as an implementation reference. Phase 1 plan: publish a chunking comparison experiment.


4. AI-native Documentation

This area researches document formats that LLMs can read and write well, including llms.txt, Markdown conventions, and the optimal form of AI-targeted metadata.

  • Which LLMs and crawlers actually consult llms.txt?
  • Where is the optimal balance between retrieval efficiency and expressive power for Markdown versus HTML?
  • Does AI-targeted structured metadata (JSON-LD, etc.) affect citation rates?
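
One way to approach the first question above is to count requests for /llms.txt in a site's own access logs, grouped by crawler user agent. The sketch below assumes logs in Combined Log Format and a small, assumed list of user-agent substrings. The sample log lines are fabricated placeholders, and the function is hypothetical rather than part of llms.txt-validator.

```python
import re
from collections import Counter

# Combined Log Format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumed user-agent substrings; extend to match the crawlers being observed.
CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_llms_txt_hits(lines, tokens=CRAWLER_TOKENS):
    """Count requests for /llms.txt per crawler token; others go to 'other'."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # Ignore unparseable lines and requests for any other path.
        if not match or match.group("path").split("?")[0] != "/llms.txt":
            continue
        agent = match.group("agent").lower()
        label = next((t for t in tokens if t.lower() in agent), "other")
        counts[label] += 1
    return counts


# Fabricated placeholder log lines (documentation IP range, not real traffic).
sample_log = [
    '192.0.2.1 - - [01/Jan/2025:00:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.2 - - [01/Jan/2025:00:00:05 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.3 - - [01/Jan/2025:00:00:09 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "curl/8.0"',
]

print(dict(count_llms_txt_hits(sample_log)))
```

User-agent strings can be spoofed, so counts like these indicate who claims to fetch the file, not verified crawler identity.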

llms.txt implementation and effect measurement are ongoing. Phase 1 plan: publish the llms.txt-validator OSS tool.


5. Agent-oriented Information Architecture


This area researches information architecture for AI agents (Claude Code, Cursor, autonomous agents, and others). Topics include MCP (Model Context Protocol) exposure, API documentation design, and discoverability.

  • Do sites that expose MCP servers have an advantage in AI search visibility?
  • Are agent-readable API docs (OpenAPI + natural language) more discoverable than plain API references?
  • Can we establish methods for observing autonomous agent exploration behavior?

Experiments on the impact of MCP exposure on search visibility are underway. Phase 1 plan: propose a preliminary Agent Visibility metric.


| Area | Phase 1 planned deliverable |
| --- | --- |
| AI Citation Analysis | Citation Visibility metric in llmo-checker |
| Grounding Visibility | Grounding Stability metric + evaluation dataset |
| LLM Retrieval Optimization | Chunking comparison experiment report |
| AI-native Documentation | llms.txt-validator OSS |
| Agent-oriented IA | Preliminary Agent Visibility metric |

Progress on each area is published in the Changelog and GitHub Issues.