Structured Data AI Pickup Validator
Syntax check is table stakes. Foglift grades each schema for whether ChatGPT, Claude, and Perplexity will actually pick it up: unnamed nested entities, over-nesting, weak entity disambiguation, sparse citation metadata. The pitfalls Google's Rich Results Test silently passes.
What this tool actually checks
Standard structured data testers (including Google's Rich Results Test) answer one question: is the JSON-LD syntactically valid, and does it have the fields needed for Google rich results? That is necessary, not sufficient. AI engines tokenize JSON-LD differently. They can ingest schema that Google flags and skip schema that Google passes. This tool layers an AI Pickup Score on top of the syntax check, so you see both: the legacy SEO verdict and the AI ingestion verdict.
The 5 AI Pickup dimensions
Identity
Does the schema have a name or headline AI engines can quote in citations? Sounds obvious. We see it missing constantly on Article schemas where the developer set the headline only in page metadata.
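A minimal passing shape puts the headline in the JSON-LD itself, not just in the page's `<title>` tag. The headline and URL below are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How AI Engines Parse JSON-LD",
  "url": "https://example.com/how-ai-engines-parse-json-ld"
}
```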
Entity disambiguation
url plus sameAs (Wikipedia, Wikidata, social URLs) plus a stable @id. AI engines need to reconcile this entity to known knowledge. Without it, you are a string, not an entity.
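A sketch of a well-disambiguated Organization, assuming your site has Wikipedia, Wikidata, and social profiles to point at (all URLs and the Wikidata Q-id below are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://example.com/#organization",
  "name": "Example Corp",
  "url": "https://example.com/",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Example_Corp",
    "https://www.wikidata.org/wiki/Q0000000",
    "https://www.linkedin.com/company/example-corp"
  ]
}
```

The stable `@id` also lets other schemas on the site reference this entity instead of redefining it.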
Freshness (content types)
Recent dateModified or datePublished. Zyppy / Digital Bloom IQ, 2025: content updated within 30 days gets 3.2x more AI citations. AI engines treat stale content as less citable.
Nested-entity hygiene
The pattern most testers miss. Nested Person, Organization, Brand, Product, Place, and LocalBusiness entities need name or @id. {"@type": "Person"} alone is invisible to AI engines.
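The fix is a one-line addition: a name (or a resolvable @id) makes the nested entity extractable. Names and URLs below are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example Headline",
  "author": {
    "@type": "Person",
    "@id": "https://example.com/authors/jane#person",
    "name": "Jane Author"
  }
}
```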
Citation/content richness
Type-aware: Article gets points for citation, mentions, and about. FAQ gets points for 3+ Q&A pairs. Product gets points for aggregateRating, review, offers, brand. Organization gets points for description, sameAs, logo, contact.
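As one type-aware example, a Product shape carrying the trust fields listed above (product name, brand, and numbers are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "brand": { "@type": "Brand", "name": "ExampleBrand" },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "128"
  },
  "offers": {
    "@type": "Offer",
    "price": "49.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```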
Risk flags (separate)
Beyond the score: deeply nested schemas (depth >5), over-stuffed keyword lists (>20), descriptions longer than 600 chars (truncated in citation panels), string-only authors. These do not subtract from the score but are surfaced as warnings.
How AI engines actually use structured data
ChatGPT, Claude, Perplexity, and Google AI Overviews each parse JSON-LD on ingestion. The parsing is lossy. They look for a small set of high-signal patterns:
- Entity reconciliation. sameAs pointing to Wikipedia or Wikidata is the strongest signal. It connects your schema to the AI's training-time knowledge graph.
- Q&A extraction. FAQPage schemas are the single highest-cited type in AI answer panels because they pre-format question-answer pairs that match the prompt-response shape.
- Trust signals. Product schemas with aggregateRating and reviewCount get surfaced in comparison answers. Without them, you are not in the comparison.
- Authority chains. Article.citation referencing CreativeWork or ScholarlyArticle gives AI engines a verifiable source path. Most blogs ignore this field. AI engines reward it.
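To illustrate the Q&A shape those engines extract, a minimal FAQPage with two pairs (questions and answers below are placeholder content):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is structured data?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Machine-readable markup, usually JSON-LD, that describes the entities on a page."
      }
    },
    {
      "@type": "Question",
      "name": "Why does it matter for AI engines?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Pre-formatted question-answer pairs match the prompt-response shape of AI answers."
      }
    }
  ]
}
```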
Frequently Asked Questions
How is this different from Google's Rich Results Test?
Google's tester answers a binary question: will Google render rich results from this schema? Foglift answers a different question: will AI engines pick up this schema when they crawl the page? The two checks overlap on syntax (valid JSON, required fields present), but diverge on what they reward. Google's tester does not flag unnamed nested entities, over-nesting, or sparse citation metadata. Foglift does, because those are the patterns that AI engines silently deprioritize during ingestion. Use both. Google's tester is necessary for traditional rich results. Foglift's is necessary for AI citations.
What does the AI Pickup Score actually measure?
Five dimensions, 20 points each. Identity (does the schema have a name AI can quote). Entity disambiguation (url + sameAs + @id, so AI can reconcile to known knowledge). Freshness, for content types (recent dateModified). Nested-entity hygiene (do nested Person, Organization, Brand entities have name or @id). Citation richness (type-aware: Article expects citation/mentions/about; FAQ expects 3+ Q&As; Product expects rating/review/offers/brand; Organization expects description/sameAs/logo/contact). The score is local to this tool, not the same as the Foglift Website Audit's AI Readiness Score, which evaluates the whole page across additional signals.
Why is over-nesting a problem if the JSON is valid?
AI engines do not parse JSON-LD the same way Google's structured data parser does. Schemas nested more than five levels deep get partially extracted: deeper branches are dropped or summarized. The same applies to over-stuffed fields: keywords lists with more than 20 entries, descriptions longer than 600 characters (citation panels truncate around 200 to 400 chars), and FAQs with more than 25 questions (typically only the first 10 to 15 are ingested). All of these are syntactically valid. Google's tester passes them. AI engines silently deprioritize them.
What does an unnamed nested entity look like in practice?
A common shape: an Article with author set to {"@type": "Person"} and no name field. Syntactically valid. Google accepts it. AI engines, however, cannot extract a Person entity that has no label, so the authorship signal is silently lost. Same pattern with brand under Product, publisher under Article, location under Event. Foglift's tester walks the schema tree, finds these unnamed Person, Organization, Brand, Product, Place, and LocalBusiness nodes, and counts them against your AI Pickup Score.
Which schema types should I add first if I'm starting from zero?
Three to start. Organization (or LocalBusiness) on the homepage with name, url, sameAs, logo, description. Article or BlogPosting on every blog post with headline, author as a Person object (not a string), datePublished, dateModified, and a citation field if you can populate it. FAQPage on any page with question/answer content (this is the highest-cited type in AI answer panels). Then layer in Product, BreadcrumbList, and HowTo as relevant.
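Putting the blog-post recommendation together, a BlogPosting sketch with an object author, both dates, and a citation entry (all names, dates, and URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Example Post Title",
  "author": { "@type": "Person", "name": "Jane Author" },
  "datePublished": "2025-01-10",
  "dateModified": "2025-02-01",
  "citation": [
    {
      "@type": "CreativeWork",
      "name": "Example Source Study",
      "url": "https://example.org/source-study"
    }
  ]
}
```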
Can I have multiple schemas on one page?
Yes, and you should. A typical post combines Article, Organization, BreadcrumbList, and FAQPage. Each can live in its own application/ld+json script tag, or be combined under @graph. The AI Pickup Score grades each schema independently and then averages them, so one weak schema pulls the page-level score down even if the others pass. Fix the lowest-scoring schemas first.
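A sketch of the @graph form, with the BlogPosting referencing the Organization by @id instead of redefining it (all names and URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#organization",
      "name": "Example Corp"
    },
    {
      "@type": "BlogPosting",
      "@id": "https://example.com/post#article",
      "headline": "Example Post Title",
      "publisher": { "@id": "https://example.com/#organization" }
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        {
          "@type": "ListItem",
          "position": 1,
          "name": "Blog",
          "item": "https://example.com/blog"
        }
      ]
    }
  ]
}
```

The @id cross-reference keeps one canonical Organization node per page, which also helps the entity-reconciliation check above.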