
Content Gap Audit: Skill Build & Methodology (v1, shipped)

A compliance-aware Surfer SEO alternative, built and tested 2026-05-14 → 16. Session 46. This page is the full write-up: the thinking, the decisions, what was built, where it lives, and how to use it.

00 TL;DR

The question: Surfer SEO scrapes top-ranking pages and tells a writer what entities/topics their draft is missing. Can we do the same with our own stack instead of paying for Surfer?

The answer: Yes, and better. Surfer is content-agnostic; it has no idea Brainzyme® copy has a banned-term gate. We built a pipeline that does the same job and routes risky terms intelligently. It costs ~$0.30–0.50 per article vs Surfer's £59–99/month, so the per-article spend only catches up with the subscription at well over 100 articles a month.

What shipped: a new skill, str-content-gap-audit, plus a head-to-head test proving it beats a real Surfer pass on compliance, topic coverage, and article structure.
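
As a rough check on the cost comparison above, the arithmetic can be sketched in a few lines. The per-article and subscription ranges are the ones quoted in this section; the exchange rate is an assumption (about 1.25 USD per GBP) and the function names are illustrative.

```python
# Back-of-envelope cost comparison.
# USD_PER_GBP is an assumed exchange rate, not a figure from this write-up.
USD_PER_GBP = 1.25

PIPELINE_USD_PER_ARTICLE = (0.30, 0.50)  # range quoted above
SURFER_GBP_PER_MONTH = (59, 99)          # range quoted above


def monthly_pipeline_cost_gbp(articles: int, usd_per_article: float) -> float:
    """Pipeline spend for one month, converted to GBP at the assumed rate."""
    return articles * usd_per_article / USD_PER_GBP


def breakeven_articles(surfer_gbp: float, usd_per_article: float) -> float:
    """Articles per month at which pipeline spend equals the Surfer subscription."""
    return surfer_gbp * USD_PER_GBP / usd_per_article


if __name__ == "__main__":
    # Least favourable case for the pipeline: cheapest Surfer tier, priciest article.
    worst_case = breakeven_articles(SURFER_GBP_PER_MONTH[0], PIPELINE_USD_PER_ARTICLE[1])
    ten = monthly_pipeline_cost_gbp(10, PIPELINE_USD_PER_ARTICLE[1])
    print(f"10 articles/month: about £{ten:.2f}")
    print(f"Spend matches Surfer's cheapest tier near {worst_case:.1f} articles/month")
```

Even in the least favourable case the subscription only wins at volumes far above a typical editorial calendar, whatever plausible exchange rate is plugged in.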

01 The Journey: what we were thinking

This started as a question and turned into a build. The thinking evolved at each step, and two AI reviewers (Codex 5.5 and Gemini 2.5 Pro) materially changed the design.

02 Key Decisions

| Decision | What we chose | Why |
| --- | --- | --- |
| Separate skill or merge into str-seo-audit? | Separate skill | Different job: str-seo-audit is technical/site diagnosis; str-content-gap-audit is content/page generation. Bundled by orchestrator, not by skill. |
| Compliance flag design | 4-value handling_strategy, not a binary ban | Gemini observer: a binary flag conflates “Brainzyme can't claim X” (true) with “the article can't mention X” (false, SEO-harmful). |
| Entity extraction | Map-reduce (per page, then merge) | Codex: single-shot loses per-page attribution and lets one long page dominate. Map-reduce gives an auditable competitor count per entity. |
| Metrics | Computed in code | Codex: never trust an LLM's word count. Word/heading counts are len(re.findall(...)). |
| Cross-tie / discoverability | Related Skills blocks + orchestrator phase | You raised that workflows don't always call the right tool. Solved with explicit cross-links in both SKILL.md files + an auto-run phase in mkt-content-pipeline, not a new Context-Matrix column (too invasive for a 40-row table). |
| Stack | DataForSEO + Firecrawl + Gemini 2.5 Pro | ~$0.30–0.50/article vs Surfer £59–99/mo. All three already in the connections registry with service contracts. |

The compliance trade-off, resolved. The binary strip was an over-correction. An SEO article can say “many nootropic formulas rely on caffeine”; it just cannot say “Brainzyme® contains caffeine.” The handling strategy is what holds that line: reframe the term, position Brainzyme as the alternative, never imply the brand claim.
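
To make the map-reduce decision concrete, here is a minimal sketch of the bookkeeping a reduce step needs in order to keep per-page attribution and an auditable competitor count. It is pure Python for illustration only. The record shape, the `merge_entities` name, and the 1/rank weighting are all assumptions, not the skill's actual code.

```python
from collections import defaultdict


def merge_entities(pages: list[dict]) -> list[dict]:
    """Merge per-page entity lists (the MAP output) into one ranked list.

    Each page is assumed to look like:
        {"rank": 1, "url": "...", "entities": {"caffeine": 4, "ashwagandha": 2}}
    where the values are mention counts on that page.
    """
    merged = defaultdict(lambda: {"pages": set(), "mentions": 0, "score": 0.0})
    for page in pages:
        for name, mentions in page["entities"].items():
            key = name.strip().lower()            # naive dedupe on case and whitespace
            entry = merged[key]
            entry["pages"].add(page["url"])       # per-page attribution is preserved
            entry["mentions"] += mentions
            entry["score"] += 1.0 / page["rank"]  # assumed weighting: better rank counts more
    ranked = [
        {
            "entity": key,
            "competitor_count": len(value["pages"]),
            "mentions": value["mentions"],
            "score": round(value["score"], 3),
        }
        for key, value in merged.items()
    ]
    # Sort by breadth first, so one long page cannot dominate on raw mentions.
    return sorted(ranked, key=lambda e: (-e["competitor_count"], -e["score"]))


if __name__ == "__main__":
    demo = [
        {"rank": 1, "url": "a", "entities": {"Caffeine": 10, "L-theanine": 1}},
        {"rank": 2, "url": "b", "entities": {"caffeine": 1, "ashwagandha": 2}},
        {"rank": 3, "url": "c", "entities": {"ashwagandha": 1}},
    ]
    for row in merge_entities(demo):
        print(row)
```

Sorting on competitor count before score is the design point: an entity mentioned ten times on one page ranks below one that appears on several pages.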

03 A/B/C Test: Surfer vs Our Pipeline

One article — “Buying Nootropics UK?” — three versions, same metric battery.

| Metric | Tab 1: Original | Tab 2: Surfer | Tab 3: Ours | Winner |
| --- | --- | --- | --- | --- |
| Brainzyme-banned terms (lower better) | 46 | 42 | 0 | Ours |
| Compliant entity coverage (of 21) | 8 | 14 | 20 | Ours |
| Structural sections present (of 8) | 4 | 5 | 8 | Ours |
| Cost | manual | £59–99/mo | ~$0.30–0.50 | Ours |

The headline finding. Surfer's rewrite kept all 42 banned-for-Brainzyme terms that the draft already had, and added two more. It cannot know about them; the banned-copy gate and the 4-class lexicon are proprietary infrastructure Surfer has no access to. That is the moat.

Honest caveats. (1) n=1: this is one article, a data point, not a proof. Validate across several before trusting the pipeline as a system. (2) The prototype's binary strip drove banned terms to a literal 0; the v1 skill's handling-strategy will instead show a small non-zero count of editorially-reframed mentions, which is correct, not a regression. (3) Gemini's independent review rated our article the better read too (“not even close”) on structure and scannability, not just metrics.

04 What We Built: the str-content-gap-audit skill

A self-contained pipeline. Given a draft article and a target keyword, it returns a gap report: what the top-ranking competitors cover that the draft is missing, and how to add it without breaching the banned-copy gate.

The 8-stage pipeline

| # | Stage | Tool | What it does |
| --- | --- | --- | --- |
| 1 | SERP fetch | DataForSEO REST | Top-10 organic + People Also Ask |
| 2 | Intent classification | heuristic regex | Labels each result editorial / shop / product / forum. Picks editorial scrape targets. Flags mixed-intent SERPs. |
| 3 | Competitor scrape | Firecrawl REST | Top-N pages → clean markdown |
| 4 | Metrics | code (regex) | Word / heading / image counts, computed in code and never asked of an LLM |
| 5 | Extraction: MAP | Gemini 2.5 Pro | Per-page entity list with placement + mention count |
| 6 | Extraction: REDUCE | Gemini 2.5 Pro | Merge, dedupe, cluster, rank-weight across pages |
| 7 | Handling-strategy routing | deterministic map | Each term → include / reframe / define / avoid |
| 8 | Gap report | code | JSON + human-readable markdown |
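
Stage 4 is plain regex counting. A minimal sketch, assuming the scraped pages arrive as markdown (as stage 3 describes); the `page_metrics` name and the exact patterns are illustrative assumptions rather than the script's real ones.

```python
import re

IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")           # ![alt](src)
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")           # [text](url)
HEADING = re.compile(r"^#{1,6}[ \t]+\S.*$", re.MULTILINE)
WORD = re.compile(r"\w+(?:['’-]\w+)*")                # keeps don't and well-known whole


def page_metrics(markdown: str) -> dict:
    """Count words, headings, and images in a markdown page using regexes only."""
    image_count = len(IMAGE.findall(markdown))
    # Strip image syntax and link URLs first so they do not inflate the word count.
    text = IMAGE.sub(" ", markdown)
    text = LINK.sub(r"\1", text)
    return {
        "word_count": len(WORD.findall(text)),
        "heading_count": len(HEADING.findall(text)),
        "image_count": image_count,
    }


if __name__ == "__main__":
    sample = "# Title\n\nSome [link text](http://x.com) here.\n\n## Sub\n\n![alt](img.png)\n"
    print(page_metrics(sample))
```

Because the counts come from `len(...)` on regex matches, the same page always yields the same numbers, which is the point of keeping this stage away from the LLM.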

The handling strategy — the part Surfer cannot do

| Strategy | Meaning | Example terms |
| --- | --- | --- |
| include | Safe, on-brand; add freely | ashwagandha, ginseng, vitamin B12, adaptogens |
| reframe_as_alternative | Use the term editorially; position Brainzyme as the alternative; never imply the brand contains it | caffeine, lion's mane |
| define_and_differentiate | Name it to explain legal/category status; do not target it | piracetam, modafinil, smart drugs |
| avoid | Never appears, even editorially (ASA hard ban) | neurodivergent |

Grey-listed terms (reframe / define) carry needs_review: true — a human compliance owner signs off the actual wording against the canonical-messaging hybrid before publish. The skill recommends; it does not self-authorise grey-area copy.
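
Since stage 7 is described as a deterministic map, it can be sketched as a plain dictionary lookup. The four strategy names and the example terms come from the table above; the lexicon structure, the `route_term` helper, and the choice to flag unknown terms for review are assumptions, and the real lexicon is not reproduced here.

```python
# Illustrative lexicon only: terms taken from the examples table above.
LEXICON = {
    "ashwagandha": "include",
    "ginseng": "include",
    "vitamin b12": "include",
    "adaptogens": "include",
    "caffeine": "reframe_as_alternative",
    "lion's mane": "reframe_as_alternative",
    "piracetam": "define_and_differentiate",
    "modafinil": "define_and_differentiate",
    "smart drugs": "define_and_differentiate",
    "neurodivergent": "avoid",
}

# Grey-listed strategies need human sign-off on the final wording.
GREY_LISTED = {"reframe_as_alternative", "define_and_differentiate"}


def route_term(term: str) -> dict:
    """Return the handling strategy for a term and whether a human must review it."""
    key = term.strip().lower().replace("’", "'")
    strategy = LEXICON.get(key)
    if strategy is None:
        # Assumption: unknown terms are never auto-approved; a human decides.
        return {"term": term, "handling_strategy": None, "needs_review": True}
    return {
        "term": term,
        "handling_strategy": strategy,
        "needs_review": strategy in GREY_LISTED,
    }


if __name__ == "__main__":
    for candidate in ["Ginseng", "Caffeine", "modafinil", "neurodivergent"]:
        print(route_term(candidate))
```

A lookup table keeps the routing auditable: the same term always gets the same strategy, and the only judgement call left is the wording a reviewer signs off.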

05 Where The Tools Live

| What | Path |
| --- | --- |
| The skill folder | F:/Agentic-OS/.claude/skills/str-content-gap-audit/ |
| Skill definition (triggers, steps, cross-links) | …/SKILL.md |
| The pipeline script | …/scripts/content_gap_audit.py |
| Methodology + Surfer comparison | …/references/methodology.md |
| API auth setup | …/references/api-setup.md |
| Prototype artifacts (A/B/C report, 3 article versions, gap audits) | F:/Agentic-OS/.tmp-drive-pull/ (scratch, not committed) |
| Service contracts (auth, failure modes) | F:/Agentic-OS/reference/services/{dataforseo,firecrawl,gemini}.md |

Registered in: AGENTS.md (Skill Registry + Context Matrix), README.md, context/learnings.md, reference/tool-map.md — tool-map drift validator passes (41 skills in sync).

06 URLs

| What | URL |
| --- | --- |
| This dashboard | apps.nutritionalproducts.org/content-gap-audit/ |
| Command Centre home | apps.nutritionalproducts.org |
| On-Page Audit dashboard (has the new “Decisions Needed” tab, Tab 11) | apps.nutritionalproducts.org/onpage-audit/ |
| SEO Master Sheet (session 46) | docs.google.com/…/1y3pgPgdpVxO14 |

Related work this session: the Decisions tab. The On-Page Audit dashboard gained a Tab 11, “Decisions Needed”: 12 decision blocks (schema deploy path, P0/P1 compliance fixes, legal-approved copy fields) with a Send to Claude button that pushes your answers straight to F:/Agentic-OS/inbox/onpage-audit/. End-to-end verified.

07 How To Use It

Just ask Claude

The skill triggers on natural phrasing. Any of these route to it:

“content gap audit” · “what is my article missing” · “compare my draft to competitors” · “surfer alternative” · “why is my blog losing to competitors” · “audit this draft against top-ranking pages”

Or run it directly

python F:/Agentic-OS/.claude/skills/str-content-gap-audit/scripts/content_gap_audit.py \
  --keyword "buying nootropics uk" \
  --draft path/to/draft.md \
  --market uk \
  --top-n 5 \
  --out-dir projects/str-content-gap-audit/my-article/

Re-running? Pass --serp-json and/or --scraped-dir to reuse saved artifacts and skip the paid API calls.
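
The reuse behaviour can be sketched as a load-or-fetch wrapper. The flag names are the ones shown above; the parser wiring, the defaults, the `serp.json` filename, the `load_serp` helper, and the `fetch` callable standing in for the paid DataForSEO call are assumptions about how such a script might be organised, not a copy of content_gap_audit.py.

```python
import argparse
import json
from pathlib import Path
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """CLI with the flags shown above; defaults here are assumed, not confirmed."""
    parser = argparse.ArgumentParser(description="Content gap audit (sketch)")
    parser.add_argument("--keyword", required=True)
    parser.add_argument("--draft", required=True)
    parser.add_argument("--market", default="uk")
    parser.add_argument("--top-n", type=int, default=5)
    parser.add_argument("--out-dir", required=True)
    parser.add_argument("--serp-json", help="saved SERP response to reuse")
    parser.add_argument("--scraped-dir", help="directory of saved scraped pages to reuse")
    return parser


def load_serp(args: argparse.Namespace, fetch: Callable[[str, str], dict]) -> dict:
    """Reuse a saved SERP file when one is given; otherwise pay for a fetch and save it."""
    if args.serp_json:
        return json.loads(Path(args.serp_json).read_text(encoding="utf-8"))
    serp = fetch(args.keyword, args.market)  # the only line that costs money
    out_dir = Path(args.out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical artifact name, written so the next run can pass --serp-json.
    (out_dir / "serp.json").write_text(json.dumps(serp), encoding="utf-8")
    return serp
```

The same pattern would apply to `--scraped-dir`: check for the saved artifact first, and only call the paid service when it is absent.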

What you get back

gap-audit.md (human view) + gap-audit.json (machine view): ranked entities to add, semantic clusters, missing structural sections, People Also Ask questions, a competitor word-count benchmark, and the handling strategy for every risky term.

The workflow

  1. Run the audit on your draft + target keyword.
  2. Check the mixed-intent warning — if the SERP is mostly shop pages, the signal is noisier; treat recommendations as directional.
  3. Apply the include entities and missing_sections freely.
  4. For reframe / define terms — apply the strategy, then a human signs off the wording (these carry needs_review).
  5. Run the rewrite through check_copy_against_canonical.py before publish.
In the content pipeline. If you run the full mkt-content-pipeline orchestrator, this audit now runs automatically as Phase 3.5 for any SEO article — you don't have to invoke it by hand.
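
The mixed-intent warning in step 2 of the workflow comes from the pipeline's heuristic intent classifier. A sketch of how such a heuristic might work is below. The four labels are the ones the pipeline uses; the URL patterns, the 50% threshold, and the function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical URL heuristics, checked in order; the script's real patterns may differ.
RULES = [
    ("forum", re.compile(r"reddit\.com|quora\.com|/forum|/thread", re.IGNORECASE)),
    ("product", re.compile(r"/products?/|/p/|/dp/", re.IGNORECASE)),
    ("shop", re.compile(r"/shop|/collections?/|/category/|/store", re.IGNORECASE)),
]


def classify(url: str) -> str:
    """Label one SERP result; anything that matches no rule is treated as editorial."""
    for label, pattern in RULES:
        if pattern.search(url):
            return label
    return "editorial"


def summarise_serp(urls: list[str], min_editorial_share: float = 0.5) -> dict:
    """Pick editorial scrape targets and flag the SERP when they are a minority."""
    labels = [classify(u) for u in urls]
    editorial = [u for u, label in zip(urls, labels) if label == "editorial"]
    share = len(editorial) / len(urls) if urls else 0.0
    return {
        "counts": dict(Counter(labels)),
        "scrape_targets": editorial,
        "mixed_intent": share < min_editorial_share,  # assumed threshold
    }
```

When `mixed_intent` is true there are few editorial pages to learn from, which is why the recommendations should be read as directional.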

08 What Skills Are

A skill is a self-contained capability folder under .claude/skills/. Each has a SKILL.md — YAML frontmatter (name + trigger phrases) and a body (steps, dependencies, references). When you describe a task, Claude matches it against every skill's trigger phrases and loads the matching one. Skills make Claude do a job the same reliable way every time, instead of improvising.

This skill's category

str- = Strategy. Siblings: str-seo-audit (technical SEO), str-ai-seo, str-programmatic-seo, str-schema-markup, str-campaign-strategy, str-trending-research.

The cross-tie

You flagged that workflows don't always reach for the right tool. Three fixes wired in:

1. Related Skills blocks in both str-seo-audit and str-content-gap-audit — each names the other with a routing rule.
2. Orchestrator phase: mkt-content-pipeline calls the audit automatically (Phase 3.5).
3. Registry rows — AGENTS.md + tool-map keep the index honest.

The routing rule, in one line: site fault (indexing, meta, speed) → str-seo-audit. Thin / shallow page content → str-content-gap-audit. Run both for the full picture.

09 What's Not Done: v1.1 candidates

| Item | Why it matters | Priority |
| --- | --- | --- |
| Gemini 503 retry-once | One transient API failure in the test run. The pipeline degraded gracefully (skipped the page), but a retry would be cleaner. | low |
| From-scratch mode (no draft yet) | Currently requires --draft. A competitor-analysis-only mode would suit briefing a writer before the first draft exists. | medium |
| Validate beyond n=1 | The head-to-head is one article. Run it on 5–10 more before trusting the pipeline as a system. | medium |
| Live content score | Surfer scores as you type. This is a batch audit; re-run after each rewrite to see movement. | backlog |
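
The retry-once item could look something like the wrapper below. It is a sketch under assumptions: `TransientAPIError` is a hypothetical stand-in for however the script surfaces a Gemini 503, and `call` is any zero-argument function that performs the request. The real client and its exception types are not shown here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical marker for a retryable failure such as an HTTP 503."""


def retry_once(call: Callable[[], T], delay_seconds: float = 2.0) -> T:
    """Run call(); on a transient failure, wait briefly and try exactly one more time."""
    try:
        return call()
    except TransientAPIError:
        time.sleep(delay_seconds)
        # A second failure propagates, so the caller can still skip the page.
        return call()
```

Capping the retry at one attempt keeps the current graceful-degradation behaviour intact: a page that fails twice is still skipped rather than stalling the run.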