📖 In This Issue
Featured Snippets (News & Resources)
Cover Story: Stop Treating Schema Like a Database: How AI Really Uses Structured Data
Learn This: Semantic Similarity
📰 Featured Snippets (News & Resources)
Kevin Indig released a user behavior study that shows how consumers search for high-stakes purchases in AI Mode.
Glenn Gabe writes about Google testing a jump from AI Overviews into AI Mode over at SER.
Thomas McKinlay over at “Science Says” tells us that AI-driven traffic is still mostly hype. Not surprising for most of us tracking it internally.
John Collison and Elad Gil recently interviewed Google CEO Sundar Pichai about the future of AI at Google and how it will impact search.
Stop Treating Schema Like a Database: How AI Really Uses Structured Data
Structured data is one of the few SEO inputs that feels like it should transfer cleanly into the LLM era. It’s explicit. It’s standardized. It’s machine-readable on purpose.
But it doesn’t transfer cleanly.
The same JSON-LD can end up in front of very different “machines” with very different incentives and failure modes. Search engines treat structured data like a claim you’re making about a page. LLM experiences often treat it like extra context at best, and like noise at worst. That difference matters when you’re the one responsible for rollout quality, governance, and the blast radius of mistakes.
So the practical question for in-house teams isn’t “does schema matter for AI?” It’s “where does it matter, how does it get used, and what breaks when we scale it?”
This piece maps the overlap and the gap: what structured data does for search engines, what it does indirectly (and sometimes directly) for LLM experiences, and what assumptions teams should stop inheriting.
Schema.org: not just “for SEO,” but invented by search engines
Schema.org is the defining authority on structured data standards for the open web. It’s not a random community standard that search engines later adopted. It was launched as a joint initiative by major search engines to create a common vocabulary for structured data markup.
That origin story matters because it explains the default mental model most SEOs still operate with: “If I mark it up, the engine will understand it.” In classical search, that model is directionally right. In LLM surfaces, it’s often incomplete.
How search engines actually process structured data today
Modern search engines tend to process structured data in two distinct ways.
First, structured data is used as a reliable source of information for rich results. This is the bread-and-butter use case: product details, review stars, breadcrumbs, FAQs, events, recipes: enhanced SERP presentation driven by markup. Google is unusually explicit here: it uses structured data to understand content and show it with a richer appearance, and it publishes feature-specific requirements and policies tied to eligibility.
Second, structured data can help map relationships between content and entities, supporting the engine’s ability to connect “this page” to “this thing” in a broader system (knowledge graph, entity understanding, disambiguation). This is the part SEOs tend to over-romanticize, because it feels like building a canonical database. The reality is more conditional: structured data can help, but the engine decides what to trust and when to use it. Even Google support docs regularly reinforce that eligibility does not guarantee inclusion.
In other words, schema in search is a negotiated contract. You provide explicit claims. The engine decides whether it trusts those claims enough to show enhancements and connect entities.
Why LLMs don’t “store schema” the way search engines do
LLMs work differently than search engines in the parts that matter here: storage, retrieval, and how meaning gets represented.
A search engine can index fields and treat structured data as a semi-structured input it can validate against known rules. An LLM is not indexing your JSON-LD into a neat table. In many LLM experiences, the model is either (a) generating from internal parameters, (b) retrieving documents, or (c) grounding on live web/search results and then generating with those inputs in context.
That’s why “we added schema, why didn’t ChatGPT/Bing/AI Overviews reward us?” is the wrong diagnostic frame. You’re assuming the system is taking your markup as a canonical fact source. LLM surfaces simply aren’t built that way.
The two ways structured data can still influence LLM experiences
Structured data can matter for LLM visibility, but the influence is more indirect than most teams assume, and it comes with different failure modes.
SERP grounding and the data the LLM is allowed to see
Many AI answer experiences lean on web search results as part of grounding, which means the model is heavily influenced by what appears in the results page and the documents it retrieves. Microsoft is explicit that its search experiences (including Copilot) involve discovering and surfacing content across search experiences, and it also offers web search mechanisms via Bing APIs in Copilot/agent contexts.
Here’s the structured data connection: if your markup earns rich-result enhancements, you’ve changed the inputs that appear on a results page. Review stars, product attributes, breadcrumbs, and other enhancements can become part of the “surface area” the model (or the retrieval system feeding it) uses for context.
You don’t need to claim the LLM is “reading JSON-LD” for this to be true. If the LLM is grounded on search results, and your structured data changes what those results contain or how they’re interpreted, you’ve influenced the grounding layer.
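To make the grounding layer concrete, here is a minimal sketch (hypothetical names and data, not any vendor's actual pipeline) of how a SERP-grounded answer assembles retrieved snippets into the context the model generates from. The point: the model never reads your JSON-LD directly, but markup-driven enhancements change what lands in these snippets.

```python
# Hypothetical sketch of SERP-grounded generation. The snippet text
# for the first result includes review stars and stock status that,
# on a real results page, would typically come from rich-result
# enhancements earned by structured data.
serp_results = [
    {"title": "Acme Widget", "snippet": "Acme Widget. 4.8 stars (1,203 reviews). In stock."},
    {"title": "Widget buying guide", "snippet": "How to choose a widget for home use."},
]

def build_grounding_context(results):
    """Concatenate retrieved snippets into the context block an LLM sees."""
    return "\n".join(f"- {r['title']}: {r['snippet']}" for r in results)

context = build_grounding_context(serp_results)
prompt = (
    "Answer using only these sources:\n"
    f"{context}\n\n"
    "Q: Is the Acme Widget in stock?"
)
print(prompt)
```

If the rich-result enhancement disappears, the "4.8 stars" detail never reaches the prompt, and the model has nothing to cite for it. That is the sense in which schema influences the grounding layer rather than the model itself.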
Bing has even started exposing AI citation performance reporting for publishers, explicitly tying visibility to being used as references in AI answers and to “grounding query phrases.” That’s not schema-specific reporting, but it’s a strong signal that “classic search visibility inputs” and “AI answer inputs” overlap operationally.
The upside: traditional structured data work can continue to pay dividends in AI contexts because the pipeline still runs through search.
The risk: you can accidentally optimize the wrong layer. You might chase markup coverage while your actual content is thin, inconsistent, or not competitive, so you win eligibility but not selection.
As an extra layer of on-page context during content extraction
The second influence is on-page. When an LLM reads a web page, it typically extracts text from the HTML (sometimes with simplification) and turns it into embeddings: numeric representations used for retrieval and similarity. If your JSON-LD contains descriptive text fields (like names, descriptions, offers, attributes, relationships), that content may get “seen” again during extraction.
This is the “second chance” effect: the same core facts appear in the visible page copy and in a structured block. When systems build representations from what they can extract, repetition and consistency can reinforce meaning.
But there’s a catch: this only helps when the markup is consistent with visible content and doesn’t introduce contradictions. If your JSON-LD says a product is in stock and the page copy says “sold out,” you’ve created ambiguity. Search engines may ignore the enhancement; retrieval systems may embed conflicting statements; the LLM may pick whichever is closer to the prompt. None of those outcomes are what an in-house team wants under load.
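A consistency check like the one described above is easy to automate before markup ships. Here is a toy sketch (hypothetical product, hypothetical helper; real QA would cover many more fields) that flags the in-stock/sold-out contradiction:

```python
import json

# Hypothetical example: catch a contradiction between a JSON-LD
# availability claim and the visible page copy before it ships.
jsonld = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "offers": {
    "@type": "Offer",
    "availability": "https://schema.org/InStock"
  }
}
""")

visible_copy = "Example Widget is currently sold out."

def availability_conflict(markup: dict, page_text: str) -> bool:
    """True when markup claims InStock but the page says sold out."""
    availability = markup.get("offers", {}).get("availability", "")
    says_in_stock = availability.endswith("InStock")
    says_sold_out = "sold out" in page_text.lower()
    return says_in_stock and says_sold_out

print(availability_conflict(jsonld, visible_copy))  # True -> ambiguity risk
```

The exact heuristics matter less than the habit: treat every structured-data claim as something the rendered page must be able to back up.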
Why understanding these differences gives you an edge
If structured data can influence both search engines and LLM experiences, why obsess over the differences?
Because “same input” does not mean “same behavior.”
Search engines have tight validation, known feature requirements, and clear incentives around user trust and SERP integrity. They can choose not to show rich results even when your markup is valid.
LLM experiences have different constraints. They may be grounded on search results (where schema impacts presentation). They may retrieve and embed page content (where schema can reinforce or confuse meaning). They may cite sources (where being selected as a reference is the real prize). And they may change quickly as product teams tune systems.
A deep understanding of those differences gives you leverage over teams still stuck in acronym wars. “SEO and GEO are the same thing” is the kind of statement that sounds decisive and leads to brittle strategy. Infrastructure work survives system changes. Folk theories don’t.
What you need to know:
Structured data still matters. But its value is no longer a single straight line from “add JSON-LD” to “get benefit.”
In search, schema is a negotiated contract: you provide explicit claims; the engine decides whether it trusts them enough to show enhancements and connect entities.
In LLM surfaces, schema is more like a routing and clarity layer. It can improve what the model picks up from SERP grounding by shaping rich-result inputs. It can reinforce on-page meaning when content is extracted and embedded. But it won’t be treated as a canonical database the way some teams assume, and it can introduce new ambiguity when it drifts from visible content.
The win for in-house teams isn’t chasing a new acronym war. It’s building structured data that is consistent, verifiable, and aligned with what a user can actually see, because consistency is what survives system changes.
Treat schema as infrastructure. Do it to reduce ambiguity and increase trust across systems, not because you expect any one platform to “reward” you forever.
Learn This:
Semantic Similarity: A measure of how closely related two pieces of text are in meaning, typically computed by comparing their embeddings with a metric like cosine similarity.
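A tiny illustration of the idea, using made-up 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions): related phrases point in similar directions, so their cosine similarity is close to 1, while unrelated phrases score much lower.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for illustration only.
running_shoes = [0.9, 0.1, 0.2]
jogging_sneakers = [0.85, 0.15, 0.25]
tax_software = [0.05, 0.9, 0.1]

print(cosine_similarity(running_shoes, jogging_sneakers))  # near 1.0
print(cosine_similarity(running_shoes, tax_software))      # much lower
```

This is the machinery behind the “extraction and embedding” step discussed in the cover story: retrieval systems rank candidate passages by scores like these.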
One more thing: AI is only as good as its operator, and if you are reading this newsletter, you are better than most!
Till next time,
Joe Hall
PS: Let me know what you think of this issue, or anything else here: [email protected]

