The Engine Log

The Recommendation Gap

The last two issues taught you to be cited. I need to take it further, because citation is no longer the win. An AI engine can footnote you as a source and still recommend your competitor in the sentence your buyer actually reads. That distance — between being cited and being recommended — is the Recommendation Gap, and it is the part almost no one is optimizing. The research says the brands that close it are not the biggest ones.

Linara Bozieva · 15 min read
Watercolor illustration: the Ravenopus stands alone in a bright spotlight on a stage while a machine voice speaks its name, and below the stage a long dim row of equally real books sits in a footer marked "Sources" — present, cited, but unspoken.

Two issues ago I named the unit: you no longer rank a page, you supply a quotable sentence, built as a Citable Unit that answers in its first line and survives being lifted out of context. Last issue I aimed that idea at the most authoritative content operation in marketing and showed the Authority Leak — how a domain can hold enormous authority and still be passed over for the quote.

Both of those issues were about one thing: getting cited. I need to take it further now, because I have come to think being cited is the easier half of the problem, and the half everyone is rushing to solve. There is a second gap above it that almost no one is working on, and it is the one that actually decides whose product gets bought.

Here is the uncomfortable version. An AI engine can pull your page, cite it in its sources, and then recommend your competitor in the paragraph your buyer reads. You will be in the footnotes and absent from the answer. You did the work to be retrievable, and the recommendation went to someone else anyway. I have watched it happen, and once you see it you cannot unsee it: the citation list is not the answer. The answer is the answer.

Citation and recommendation are two different decisions

When a generative engine answers "what's the best tool for X," it does not perform one act. It performs two, and they have different inputs.

The first act is retrieval: out of everything it could pull, which sources does it gather? This is the citation decision, and it is the one AEO optimizes — be liftable, be structured, be the self-contained claim, and you get pulled into the source set. Earn the footnote.

The second act is synthesis: out of everything it gathered, what does it actually say? Which brands get named in the recommendation, in what order, with what framing — hero, fine-option, cautionary aside, or not at all? This is a separate decision with separate inputs, and it is the one almost no one optimizes, because until very recently we did not have language for it.

You can win the first and lose the second on the same query at the same moment. The engine can gather your page, find it accurate, list it among eleven sources at the bottom, and still build its actual recommendation around two other names it found more convincing. Retrieval got you into the room. Synthesis decided who got spoken. Most brands working on AI visibility are optimizing the room and ignoring the sentence.

That gap has a shape, and it deserves a name.

The Recommendation Gap

The Recommendation Gap is the distance between being cited by an AI engine and being recommended by it — when a generative engine lists you among its sources but does not name, describe, or recommend you inside the synthesized answer the user actually reads.

Notice what the gap is not. It is not failing to be retrieved; you were retrieved — that is what makes it a gap rather than a leak. It is not a ranking problem or even, strictly, a citation problem; the citation is right there in the list. The gap is one layer higher than the one I wrote about last time. The Authority Leak was about not being quoted. The Recommendation Gap is about being quoted and still not being recommended — cited as a source feeding an answer that sends the sale somewhere else. It is the difference between the model knowing your page exists and the model telling your buyer to use you. The first is table stakes now. The second is the whole game.
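The definition can be written as a check on a single answer. What follows is a minimal sketch, not the logic of geo_audit.py: the `EngineAnswer` record, the `classify` function, and the brand and domain names are all hypothetical, and "named in the text" is used as a crude stand-in for "recommended" (a brand named only as a cautionary aside would be miscounted).

```python
import re
from dataclasses import dataclass, field


@dataclass
class EngineAnswer:
    """Hypothetical record of one engine's reply to one buyer question."""
    text: str  # the synthesized answer the buyer actually reads
    cited_domains: list[str] = field(default_factory=list)  # the source list


def is_named(brand: str, text: str) -> bool:
    """True if the brand appears as a whole word in the text (case-insensitive)."""
    pattern = rf"\b{re.escape(brand)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def classify(answer: EngineAnswer, brand: str, domain: str) -> str:
    """Place a brand in one of four states for a single answer."""
    cited = domain in answer.cited_domains
    named = is_named(brand, answer.text)
    if cited and named:
        return "cited and named"
    if cited:
        return "recommendation gap"  # in the sources, absent from the sentence
    if named:
        return "named without citation"
    return "absent"


# Hypothetical answer: two sources cited, one brand spoken.
answer = EngineAnswer(
    text="For sprint planning, most teams should start with Bolt.",
    cited_domains=["acme.example", "bolt.example"],
)
print(classify(answer, "Acme", "acme.example"))  # recommendation gap
print(classify(answer, "Bolt", "bolt.example"))  # cited and named
```

The case that matters is the second branch: the domain is in the source list and the brand is missing from the text.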

And here is why this is its own discipline and not a footnote to the last two issues.

AEO and GEO are not the same sport

There are two crafts here, and conflating them is why so much "AI search" work underperforms.

AEO — Answer Engine Optimization — is the work of getting cited. It is extractive: structure the page so the engine can retrieve and quote it, lead with the liftable answer, mark it with schema, make each passage stand alone. Everything I have written so far lives here. AEO gets you quoted.

GEO — Generative Engine Optimization — is the work of getting synthesized into the recommendation. The term is not mine; it comes from peer-reviewed research out of Princeton, presented at KDD 2024 — the first academic study to define and measure the discipline. GEO is generative, not extractive: it optimizes for inclusion and favorable framing inside the text the model generates, whether or not you are formally cited. GEO gets you recommended.

They overlap — being cited is often the on-ramp to being synthesized — but they are different decisions optimized by different moves, and the gap between them is exactly the Recommendation Gap. AEO closes the distance between ranking and citation. GEO closes the distance between citation and recommendation. If you only do the first, you build a brand that AI engines reliably cite and reliably decline to recommend, which is a strange and expensive place to end up.

So the real question is the one the research set out to answer: what actually moves the second decision?

What actually moves the recommendation

This is the part that should reorganize a content budget, and it is the part I find genuinely surprising, because it is not what a decade of SEO trained any of us to expect.

The Princeton/KDD study built a benchmark of real queries and ran nine candidate optimization methods through generative engines to see which ones measurably increased a source's visibility inside the generated answer — not its rank, not its citation count, its presence in the synthesized text. The methods that worked were not authority levers or keyword levers. They were content-substance levers:

  • Citing credible sources inside your own content was the single largest move — up to a 115 percent lift in generative visibility, and tellingly, its biggest gains went to lower-ranked content.
  • Adding statistics — concrete, quantified claims — lifted visibility by roughly 41 percent.
  • Adding quotations from credible voices lifted it by roughly 28 percent.
  • Across the methods, the average improvement landed near 40 percent.
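A note on reading these figures: a lift is a relative change, (after - before) / before, so a 115 percent lift means the visibility score more than doubled, not that it rose by 115 points. A small sketch of the arithmetic follows; the input numbers are illustrative only and are not taken from the study, and `relative_lift` is a hypothetical helper.

```python
def relative_lift(before: float, after: float) -> float:
    """Relative change in a visibility score, expressed as a percentage."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100.0


# Illustrative numbers only: a score of 10.0 that rises to 21.5.
print(round(relative_lift(10.0, 21.5)))  # 115
```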

And the thing that did not work is as important as the things that did: keyword stuffing and similar density tricks — the reflexes of the old game — did little to nothing. The generative engine is not counting your keywords. It is evaluating whether your passage reads like something it can stand behind: sourced, specific, quotable, authoritative in substance rather than in domain size.

Read that back slowly, because it inverts the strategy of the last fifteen years. You do not earn the recommendation by being the biggest voice in the room. You earn it by being the best-sourced sentence in the room. The model is rewarding how you write, not how large you are.

Why this is the underdog's channel

Here is the line in the research I cannot stop thinking about: the largest lever — citing credible sources — delivered its biggest gain to lower-ranked, lower-visibility content.

Sit with what that means. The single most powerful move in generative optimization is one that helps the challenger more than the incumbent. The engine is weighing the construction of the passage, not the authority of the domain behind it, so a two-person company that writes in well-sourced, statistic-dense, quotable claims can be synthesized into the recommendation on the same query where a competitor with a hundred times its backlinks is merely cited — or left out of the sentence entirely.

For fifteen years, distribution rewarded accumulation: outspend, out-publish, out-link the category until you were too big to skip. That game protected incumbents by design, because the asset was size and size compounds. GEO does not work that way. It is the first major distribution channel in years where the small, rigorous, well-sourced challenger has a structural advantage — where being smaller is not a tax you pay but, on the largest measured lever, an edge you hold. I find that genuinely exciting, and not abstractly. It is the reason a company like mine can write its way into answers that a category giant is paying an agency to merely get cited in.

The teardown

A method you only ever describe is a claim. So I built the instrument and ran it.

geo_audit.py is a measurement tool I wrote to make the Recommendation Gap visible. It takes the questions a real buyer types, runs them across the generative engines those buyers use — Perplexity, ChatGPT, Gemini, Claude — and scores, in the answers themselves, two things the old tools cannot see: share of voice (how often a brand is named inside the recommendation across the whole question set) and prominence (how early and how fully it appears when it is named). It scores the brand and its competitors against the same answers, so the comparison is exact. What comes back is not a metaphor for the gap — it is a measurement of it: how much of the recommendation each brand actually holds, and how that share moves as the question sharpens.
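To make the two scores concrete, here is one way they could be computed from a list of answer texts. This is a sketch under stated assumptions, not the code of geo_audit.py: the function names are hypothetical, "named" is approximated by a whole-word match, and the prominence formula (an equal blend of how early the first mention falls and what fraction of sentences mention the brand) is an assumption chosen for illustration.

```python
import re


def first_mention(brand: str, text: str):
    """Character offset of the first whole-word mention, or None if absent."""
    match = re.search(rf"\b{re.escape(brand)}\b", text, flags=re.IGNORECASE)
    return match.start() if match else None


def share_of_voice(brand: str, answers: list[str]) -> float:
    """Fraction of answers in the question set that name the brand."""
    if not answers:
        return 0.0
    named = sum(1 for text in answers if first_mention(brand, text) is not None)
    return named / len(answers)


def prominence(brand: str, text: str) -> float:
    """Assumed score in [0, 1]: average of earliness and fullness."""
    position = first_mention(brand, text)
    if position is None:
        return 0.0
    earliness = 1.0 - position / len(text)  # 1.0 when the brand opens the answer
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    mentioning = sum(1 for s in sentences if first_mention(brand, s) is not None)
    fullness = mentioning / len(sentences)  # share of sentences naming the brand
    return 0.5 * earliness + 0.5 * fullness


# Hypothetical answers from two engines to the same question.
answers = [
    "Bolt is the usual pick for sprints. Bolt also handles backlogs.",
    "Acme is fine for notes. Bolt is stronger for sprints.",
]
print(share_of_voice("Acme", answers))  # 0.5
print(share_of_voice("Bolt", answers))  # 1.0
print(prominence("Acme", answers[1]))   # 0.75
```

Scoring every brand against the same list of answers is what keeps the comparison like for like.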

I picked two of the hardest possible cases for my own argument — not fading incumbents, but beloved, modern, AI-forward products nobody thinks of as vulnerable. Start with Notion. It is the productivity world's darling, the workspace a generation of operators swears by, the tool that promised to replace all your scattered apps with a single home for everything. If the Recommendation Gap only showed up on dying brands, Notion is exactly where it should fail to appear.

It appears anyway, and steeply, and the way it appears is the lesson. Notion won by being horizontal: one workspace for docs, tasks, notes, and databases alike. But a generative engine answering a specific job does not reward breadth; it reaches for whoever owns that exact job. So the recommendation migrates as the question sharpens. Ask for "one app to organize your work and your life" and Notion is the answer, correctly. Ask for "a tool for agile project management," "for spreadsheets and data analysis," or "a CRM for a small sales team" and Notion doesn't just slip, it disappears: zero of four engines name it, and the recommendation is handed whole to Jira, to a spreadsheet, to a dedicated CRM. Notion is perfectly capable of every one of those jobs; the engines simply don't recommend it for them. It keeps the recommendation only where it still owns the job outright (documentation, networked notes) and loses it everywhere a specialist has written more precisely about the exact task.

And this is not Notion's peculiar problem. Run the identical gradient on Canva (different category, different product, more than two hundred million users) and the curve is the same shape, only gentler. Canva is the answer for "an easy free tool to make social graphics," correctly, and slides toward a specialist (a CapCut, a Figma, an Affinity) as the job narrows to editing a Reel, designing an app interface, or replacing Photoshop, while holding firm where it genuinely is the right call: decks, logos, print, marketing images. Two unrelated generalists, in two unrelated categories, tracing the same curve. That is what makes this a pattern and not one brand's bad week: the breadth that built the moat is exactly what the synthesis layer discounts the moment intent gets specific.

Here are the receipts, captured 2026-07-01, across 23 buyer questions per brand arranged as a gradient — from generic ("easiest tool to make social graphics") to specific-job ("edit a Reel / design an app interface / a Photoshop alternative") — run across all four generative engines (Perplexity, ChatGPT, Gemini, Claude) with geo_audit.py. I ran it on two independent generalists — Notion first, then Canva — to check this is a pattern, not one brand's bad luck. It is a pattern.

The headline is the gradient, and it is the same shape for both brands. Notion holds a 100% share of voice on the generic questions and collapses to 38% on the specific-job questions. Canva, gentler, slides from 100% to 70%. The sharper the question, the less the generalist is named — and the cleaner the hand-off to whatever specialist owns that job. The table is the whole argument.

Query tier                   | Notion share of voice | Canva share of voice
Generic (the fair baseline)  | 100%                  | 100%
Semi-specific                | 90%                   | 95%
Specific job                 | 38%                   | 70%
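Read as a gradient, the number to watch is the drop in percentage points between the generic tier and the specific-job tier. A short sketch of that arithmetic, using only the figures in the table; the dictionary layout and the `gradient_drop` helper are hypothetical.

```python
# Share of voice by query tier, copied from the table (percent).
share_of_voice = {
    "Notion": {"generic": 100, "semi_specific": 90, "specific_job": 38},
    "Canva": {"generic": 100, "semi_specific": 95, "specific_job": 70},
}


def gradient_drop(tiers: dict[str, int]) -> int:
    """Percentage points lost between the generic and specific-job tiers."""
    return tiers["generic"] - tiers["specific_job"]


drops = {brand: gradient_drop(tiers) for brand, tiers in share_of_voice.items()}
print(drops)  # {'Notion': 62, 'Canva': 30}
```

Notion loses 62 points across the gradient and Canva loses 30: the same direction, at roughly half the slope.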

Where the recommendation goes when it leaves the generalist: for Notion, to Jira (agile/sprints), Asana/ClickUp (team tasks at scale), a spreadsheet or Airtable (data), a dedicated CRM; for Canva, to CapCut/Descript (video), Figma (interfaces), Affinity/Photopea (photo editing). Both hold the jobs they genuinely own — Notion on documentation and networked notes; Canva on decks, logos, print and marketing images.

A few exhibits from the audit (point-in-time; generative outputs vary by user, session, and time; the API answer proxies the consumer app):

  • The fair baseline: "one app to organize your work and your life" → Notion named by all four engines. On the generic job, the generalist rightly wins.
  • The migration: "best tool for agile project management" → Notion named by zero of four engines; every one reaches for Jira. The generalist disappears the instant the job has a specialist.
  • Capability isn't recommendation: Notion ships spreadsheet-style databases and Canva ships video — yet Notion scores zero of four on "spreadsheets and data analysis" and Canva one of four on "edit a short-form video for Reels." Being able to do the job is not being recommended for it.

Two caveats stated up front, because a teardown that hides its limits is a hit piece, not a diagnostic. Generative answers vary by user, session, and time, so this is a point-in-time snapshot, not a constant. And an API query is a faithful proxy for the consumer apps but not a perfect mirror of them — which is why the honest unit here is the delta across re-runs and across competitors measured in the same pass, not any single absolute number.
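One way to work with the delta rather than the absolute number is to take the gap between a brand and a competitor within each pass, then look at how that gap holds up across re-runs. A minimal sketch; the run values below are hypothetical and are not audit results, and `deltas` is an illustrative helper.

```python
from statistics import mean


def deltas(runs: list[dict[str, int]], brand: str, competitor: str) -> list[int]:
    """Per-run gap in share of voice, each measured within the same pass."""
    return [run[brand] - run[competitor] for run in runs]


# Hypothetical re-runs of one question set; values are share of voice in percent.
runs = [
    {"generalist": 40, "specialist": 85},
    {"generalist": 35, "specialist": 90},
    {"generalist": 45, "specialist": 80},
]
gaps = deltas(runs, "generalist", "specialist")
print(gaps)                   # [-45, -55, -35]
print(mean(gaps))             # -45
print(max(gaps) - min(gaps))  # 20
```

If the sign of the gap is stable across re-runs while its size moves, the direction is the finding and the size is the noise.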

Where this reaches its limit

The clean version of this argument overshoots, so let me pull it back to what the evidence supports.

Citation is not worthless — it is the on-ramp. Being retrieved and cited is usually how you become eligible to be synthesized in the first place, so GEO does not replace AEO; it sits one floor above it. The honest claim is not "stop trying to be cited." It is "being cited is necessary and no longer sufficient, and the sufficiency layer — the recommendation — is the one almost no one is measuring or working." Everything I wrote in the last two issues still holds. This issue is the floor above it, not a correction to it.

Two more limits. The foundational GEO research is from 2024 and was run against the engines of its moment; the models have moved since, the exact percentages will have drifted, and the right way to use those numbers is as direction and rank-order, not as guarantees — cite-sources beats statistics beats quotations beats keyword tricks, even if the precise lift on your content this quarter differs. And my own audit, like last issue's, is a snapshot read through an API proxy; it is directional evidence with the exhibits attached so you can check my read, not a benchmark to subscribe to.

What survives all of that hedging is the structural point, and it is not fragile: an AI engine decides what to cite and, separately, what to recommend; those are two different decisions; which brands win the recommendation is measurable, and it is not the same as which ones get cited; and the levers that close the gap reward substance over size in a way that, uniquely, favors the smaller and better-sourced. The percentages will age. The gap will not.

The recommendation is the prize now, not the citation and not the rank. Being cited gets your name into the sources. Being recommended gets your product into the sentence. The distance between the two is the Recommendation Gap, and right now it is sitting open and unmeasured on most of the brands in your category — including, very possibly, the ones quietly winning the answers you thought you owned.


You cannot close a gap you cannot see, and there is no rank tracker for this one — the recommendation happens inside generated prose, in a layer the old tools never read. The honest way to measure it is to ask the engines directly: put the questions your buyers actually type to ChatGPT, Perplexity, Gemini, and Google's AI Mode, systematically, and score not just whether you are cited but whether you are recommended — and who gets recommended instead. The 72-Hour Growth Diagnostic now runs exactly that audit on your domain: your share of voice inside the answers your category's buyers are reading today, the competitors capturing the recommendation you are only cited in, and the highest-leverage rewrites to move from footnote to sentence. Not a score to subscribe to; a map of the gap and the work to close it. The output is the proof.

Linara Bozieva, Founder, Ravenopus


In one paragraph, and a few common questions

The Recommendation Gap is the distance between being cited by an AI engine and being recommended by it — when a generative engine lists you among its sources but does not name or recommend you inside the synthesized answer the buyer reads. An engine makes two separate decisions when it answers: which sources to retrieve (citation, the work of AEO) and what to actually say (recommendation, the work of GEO — Generative Engine Optimization, a discipline defined in Princeton research at KDD 2024). You can win the first and lose the second. What closes the gap is not the old SEO playbook: the research found that citing credible sources (up to a 115 percent visibility lift, concentrated in lower-ranked content), adding statistics (about 41 percent), and adding quotations (about 28 percent) move generative visibility, while keyword density does little to nothing — which means the engine rewards how you write over how big you are, and the largest lever favors the smaller, better-sourced challenger. The gap lives inside the generated prose, so you measure it by querying the engines directly and scoring share of voice and prominence in the answers themselves, not by reading a rank tracker. Citation earns the footnote; structure and sourcing earn the sentence.

How is GEO different from AEO? AEO gets you cited — retrieved and quoted as a source. GEO gets you recommended — synthesized into the generated answer, named and framed favorably. They overlap, but optimizing only for citation builds a brand that engines reliably cite and reliably decline to recommend.

Why does a cited brand still lose the recommendation? Because retrieval and synthesis are different decisions. The engine can gather your page, find it accurate, list it in its sources, and still build its actual recommendation around names it found more convincing — better-sourced, more specific, more quotable in substance.

How do you close the Recommendation Gap? Write the levers the research rewards: cite credible sources inside your own content, lead with statistics and concrete numbers, include quotable expert claims, and make each passage self-contained — substance over keyword density. Then measure it directly, by engine, against your competitors.

Why is this the underdog's channel? Because the largest measured lever — citing sources — helps lower-ranked content most. The model weighs construction over domain size, so a small, rigorous, well-sourced challenger can be recommended on the same query where a much larger incumbent is only cited.
