📖 In This Issue
Featured Snippets (News & Resources)
Cover Story: Creating Content For LLMs: To Chunk, Or Not To Chunk?
Operator of Interest: Jenny Halasz
Learn This: How LLMs Actually Generate Text
📰 Featured Snippets (News & Resources)
Microsoft is likely to abandon OpenAI as financial fears grow. It will likely develop its own AI models, which could have a massive impact on Bing and the rest of its AI-dependent suite.
Profound provides a detailed report of the types of social media content often cited by ChatGPT. Perfect for learning more about building brand citations and visibility in LLMs.
Creating Content For LLMs: To Chunk, Or Not To Chunk?
If you’ve been in SEO long enough, you’ve seen this movie.
A new system shows up. A new “optimization” trend forms around it. Then Google (or someone adjacent) says: please stop doing that. And half the industry hears “it doesn’t work,” while the other half hears “it works so well they’re begging us not to.”
That’s basically where chunking sits right now.
Googler Danny Sullivan has explicitly pushed back on the idea of turning content into "bite-sized chunks" as a tactic for LLM-driven surfaces.
And to be fair, he’s warning against a real failure mode: teams creating two versions of content (one for humans, one for machines), then eating the operational cost when the system shifts.
But there’s a second reality that doesn’t go away just because we wish it would:
AI search systems increasingly select, extract, and recombine passages, not pages. That means the unit of “being understood” is often smaller than the webpage you publish.
So the actual question isn’t “should we chunk?” It’s: should we make our content resilient when it gets pulled apart?
What is chunking (really)?
Chunking is one of those terms that got overloaded.
1) Chunking as a system operation (what RAG does)
In Retrieval Augmented Generation (RAG) pipelines, chunking is the act of breaking content into retrievable components so the system can fetch only the most relevant passages for a query.
This isn’t hypothetical. Google has discussed passage-level behavior for years (often described as “passage indexing” / passage ranking).
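To make the system operation concrete, here's a minimal sketch of what a chunking step in a RAG pipeline might look like. This is a toy fixed-size splitter with overlap, written for illustration only; real pipelines typically use smarter semantic or structure-aware splitters, and the parameter values here are arbitrary assumptions.

```python
def chunk_text(text, max_words=120, overlap=20):
    """Split text into overlapping word-window chunks.

    A toy illustration of the chunking step in a RAG pipeline:
    each chunk becomes an independently retrievable passage.
    Overlap keeps a little shared context between neighbors.
    """
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # last window already covered the tail
    return chunks

# A 300-word passage becomes three overlapping ~120-word chunks.
print(len(chunk_text("word " * 300)))  # 3
```

Notice what the system does NOT do: it never asks whether each window is a complete thought. That's exactly why the content-side structure discussed below matters.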
2) Chunking as a content practice (what you control)
In the SEO community, “chunking” usually means structuring content so passages and statements perform better when retrieved, without turning the page into a chopped-up mess.
But as Mike King reminds us, chunking isn't "chopping paragraphs into smaller paragraphs and using more headings and hoping for the best."
That distinction matters, because most chunking advice fails by confusing “format” with meaning.
So, does chunking work? Here's the honest answer: yes, sometimes; and it can also absolutely waste your time.
The “yes” case: why it can work
A paragraph that covers two ideas is often a worse candidate than two paragraphs that each cover one idea. That’s not “writing for robots.” That’s just making each passage a clean unit of meaning.
Modern systems evaluate relevance at the passage level, and chunking lets you intentionally shape content where relevance is actually computed.
The “no” case: where teams get burned
Sullivan’s warning is basically about incentives: when people optimize for a perceived system quirk, they create fragile work that doesn’t age well.
And you can see the failure mode in the wild:
Pages that read like an exploded FAQ.
Chunks that are missing context, so they're easy to retrieve and easy to misinterpret.
Layouts that prioritize extraction over persuasion.
Chunking helps when it produces clarity.
Chunking hurts when it produces fragmented language.
Chunking plus content structure is the key
The practical move is to stop thinking “How do I break this up?” and start thinking:
How do I structure this so each passage survives retrieval without losing meaning?
Because even if models get bigger context windows, the system still has constraints: cost, latency, routing, summarization, and selection. Structure remains important because it creates units of meaning.
That matches what we see in AI search interfaces: your page is less a destination and more a dataset.
So you need structure that serves both:
Humans (clarity, trust, action)
Machines (extractability, coherence, unambiguous claims)
Not two versions. One version that survives both.
Best practices for content structure (so you’re extractable and credible)
The goal is not more chunks; the goal is better units of meaning.
1) One idea per paragraph (but keep the context inside the paragraph)
Short, focused paragraphs reduce the chance the model extracts the wrong part, or skips you because your answer is buried.
A good paragraph can be quoted out of context and still be correct.
2) Use a consecutive heading hierarchy (H1 → H2 → H3)
Think of headings as retrieval scaffolding. LLMs and readers both use hierarchy to understand relationships between ideas.
If everything is an H2 (or worse), everything looks flat.
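"Consecutive" simply means no skipped levels on the way down. As a sketch, here's a hypothetical QA helper that flags level skips (e.g. an H3 directly under an H1) given the heading levels as they appear on a page; the function name and approach are mine, not from any tool mentioned in this issue.

```python
def find_heading_skips(levels):
    """Return indices where a heading jumps more than one
    level deeper than the heading before it, e.g. H1 -> H3.

    `levels` is a list of ints in page order,
    e.g. [1, 2, 3, 2] for H1 > H2 > H3 > H2.
    """
    skips = []
    for i in range(1, len(levels)):
        if levels[i] > levels[i - 1] + 1:  # deeper by more than one step
            skips.append(i)
    return skips

# H1 straight to H3 skips H2; flagged at index 1.
print(find_heading_skips([1, 3, 2, 3]))  # [1]
```

Note that jumping back UP levels (H3 back to H2) is fine; only skipping downward breaks the scaffolding.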
3) Frontload the definition / takeaway
Don’t hide the lead. Put the definition, constraint, or key recommendation early, then expand with nuance.
This is good writing.
4) Use formats that reduce ambiguity
Lists, steps, tables, "common mistakes," "when to do X vs Y." These formats reduce ambiguity.
If you want to be cited, make it easy to quote without distortion.
5) Use explicit labels
Phrases like "Key takeaway," "Common mistake," and "In summary" can help models identify what a passage is doing.
If your brand voice hates those phrases, fine. You can still label the section with different clear, direct language.
6) Reduce retrieval noise (DOM clutter is a content problem now)
Pop-ups, modal CTAs, and disjointed carousels can pollute what the model sees, even if users close them.
7) Don’t treat schema as a substitute for structure
Schema still matters. But it’s not a magic bullet.
Google’s own guidance continues to emphasize structured data as useful, especially when it matches visible content.
Don't forget this: structure and clarity come first; structured markup reinforces, it doesn't rescue.
Carolyn Shelby drives this point home: "This is why poorly structured content – even if it's keyword-rich and marked up with schema – can fail to show up in AI summaries, while a clear, well-formatted blog post without a single line of JSON-LD might get cited or paraphrased directly."
A defensible policy you can hand to your team
Here’s a simple rule that won’t age badly:
Optimize for meaning that survives extraction. Not for “LLM tricks.”
Guardrails:
No “LLM version” vs “human version.” That’s debt. (And AI multiplies existing quality or existing debt.)
Any chunk you create must be independently accurate, with its own context and constraints.
If structuring the page makes it worse to read, you’re not optimizing, you’re trading trust for a gamble.
Takeaway: chunking isn’t the strategy. It’s the symptom.
Chunking is what systems do to content. Your job is to make sure your meaning doesn’t break when that happens.
So yes, structure for passage-level content is good for the same reason you do most good SEO infrastructure work:
it reduces ambiguity,
it improves comprehension,
it scales without drama,
and it still makes sense six months later.
That’s the only kind of optimization worth defending.
👤 Operator of Interest: Jenny Halasz

Jenny Halasz
Known for: Author of AI-Powered Content Marketing and SEO
Works at: New Media Advisors
Follow: LinkedIn
Learn This: How LLMs Actually Generate Text
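At its core, an LLM generates text one token at a time: score every candidate next token, turn the scores into probabilities, sample one, append it, and repeat. Here's a deliberately tiny sketch of that autoregressive loop using a made-up bigram "model"; the vocabulary, scores, and function names are all invented for illustration, and a real LLM replaces the lookup table with a neural network scoring the entire context.

```python
import math
import random

# Toy "model": for each token, scores for possible next tokens.
# These tokens and counts are made up purely for illustration.
BIGRAM_SCORES = {
    "<s>":     {"the": 3.0, "a": 1.0},
    "the":     {"model": 2.0, "token": 1.0},
    "a":       {"token": 2.0, "model": 1.0},
    "model":   {"writes": 3.0},
    "token":   {"appears": 3.0},
    "writes":  {"</s>": 1.0},
    "appears": {"</s>": 1.0},
}

def sample_next(scores, temperature=1.0):
    """Softmax-style sampling: higher scores are more likely;
    lower temperature makes the choice more deterministic."""
    toks = list(scores)
    weights = [math.exp(scores[t] / temperature) for t in toks]
    return random.choices(toks, weights=weights)[0]

def generate(max_tokens=10, temperature=1.0):
    """The autoregressive loop: sample a token, append, repeat
    until the end marker (or a length limit) is reached."""
    out, cur = [], "<s>"
    for _ in range(max_tokens):
        cur = sample_next(BIGRAM_SCORES[cur], temperature)
        if cur == "</s>":
            break
        out.append(cur)
    return " ".join(out)

random.seed(0)
print(generate())
```

The key intuition for content creators: the model only ever sees probabilities over next tokens given context. Clear, unambiguous passages shift those probabilities toward faithful reproduction of your meaning.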
One more thing: AI is only as good as its operator, and if you are reading this newsletter, you are better than most!
Till next time,
Joe Hall
PS: Let me know what you think of this issue, or anything else here: [email protected]

