The question: Surfer SEO scrapes top-ranking pages and tells a writer what entities/topics their draft is missing. Can we do the same with our own stack instead of paying for Surfer?
The answer: Yes, and for us it works better. Surfer is content-agnostic: it has no idea Brainzyme® copy has a banned-term gate. We built a pipeline that does the same job and routes each risky term to an explicit handling strategy. It costs ~$0.30–0.50 per article vs Surfer's £59–99/month, so even the cheapest Surfer plan pays for well over 100 pipeline runs a month.
What shipped: a new skill, str-content-gap-audit, plus a head-to-head test in which it beat a real Surfer pass on compliance, topic coverage, and article structure (on one article so far).
This started as a question and turned into a build. The thinking evolved at each step, and two AI reviewers (Codex 5.5 and Gemini 2.5 Pro) materially changed the design.
Shipped: str-content-gap-audit, with every Codex MEDIUM finding and Gemini's handling-strategy design baked in. Tested end-to-end, fixed one regex bug, then registered it and cross-tied it into the skill system.
| Decision | What we chose | Why |
|---|---|---|
| Separate skill or merge into str-seo-audit? | Separate skill | Different job: str-seo-audit is technical/site diagnosis; str-content-gap-audit is page-level content gap analysis. The two are bundled by the orchestrator, not merged into one skill. |
| Compliance flag design | 4-value handling_strategy, not a binary ban | Gemini observer: a binary flag conflates “Brainzyme can't claim X” (true) with “the article can't mention X” (false, SEO-harmful). |
| Entity extraction | Map-reduce (per page, then merge) | Codex: single-shot loses per-page attribution and lets one long page dominate. Map-reduce gives an auditable competitor count per entity. |
| Metrics | Computed in code | Codex: never trust an LLM's word count. Word/heading counts are len(re.findall(...)). |
| Cross-tie / discoverability | Related Skills blocks + orchestrator phase | You raised that workflows don't always call the right tool. Solved with explicit cross-links in both SKILL.md files plus an auto-run phase in mkt-content-pipeline, not a new Context-Matrix column (too invasive for a 40-row table). |
| Stack | DataForSEO + Firecrawl + Gemini 2.5 Pro | ~$0.30–0.50/article vs Surfer £59–99/mo. All three already in the connections registry with service contracts. |
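The "computed in code" rule for metrics looks roughly like this. A minimal sketch, assuming competitor pages arrive as markdown (as Firecrawl returns them); the regexes and the page_metrics name are illustrative, not taken from content_gap_audit.py.

```python
import re

# Illustrative patterns, not copied from content_gap_audit.py.
WORD_RE = re.compile(r"\b[\w'’-]+\b")                   # words, incl. "Lion's" and "L-theanine"
HEADING_RE = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)  # ATX markdown headings
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")           # ![alt](url) images


def page_metrics(markdown: str) -> dict[str, int]:
    """Count words, headings and images in one scraped page, with no LLM involved."""
    # Drop image syntax before counting words so alt text and URLs don't inflate the total.
    prose = IMAGE_RE.sub(" ", markdown)
    return {
        "words": len(WORD_RE.findall(prose)),
        "headings": len(HEADING_RE.findall(markdown)),
        "images": len(IMAGE_RE.findall(markdown)),
    }
```

Because the numbers come from regexes, the same page always yields the same counts, which is what makes the competitor word-count benchmark trustworthy.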
One article — “Buying Nootropics UK?” — three versions, same metric battery.
| Metric | Tab 1 Original | Tab 2 Surfer | Tab 3 Ours | Winner |
|---|---|---|---|---|
| Brainzyme-banned terms (lower better) | 46 | 42 | 0 | Ours |
| Compliant entity coverage (of 21) | 8 | 14 | 20 | Ours |
| Structural sections present (of 8) | 4 | 5 | 8 | Ours |
| Cost | manual | £59–99/mo | ~$0.30–0.50 | Ours |
str-content-gap-audit skill
A self-contained pipeline. Given a draft article and a target keyword, it returns a gap report: what the top-ranking competitors cover that the draft is missing, and how to add it without breaching the banned-copy gate.
| # | Stage | Tool | What it does |
|---|---|---|---|
| 1 | SERP fetch | DataForSEO REST | Top-10 organic + People Also Asked |
| 2 | Intent classification | heuristic regex | Labels each result editorial / shop / product / forum. Picks editorial scrape targets. Flags mixed-intent SERPs. |
| 3 | Competitor scrape | Firecrawl REST | Top-N pages → clean markdown |
| 4 | Metrics | code (regex) | Word / heading / image counts — computed, never asked of an LLM |
| 5 | Extraction — MAP | Gemini 2.5 Pro | Per-page entity list with placement + mention count |
| 6 | Extraction — REDUCE | Gemini 2.5 Pro | Merge, dedupe, cluster, rank-weight across pages |
| 7 | Handling-strategy routing | deterministic map | Each term → include / reframe / define / avoid |
| 8 | Gap report | code | JSON + human-readable markdown |
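Stages 5 and 6 keep per-page attribution, which is what makes the competitor count auditable. The merge below is a deliberately simplified, deterministic stand-in for the Gemini REDUCE call (no synonym clustering), assuming MAP returns a plain list of entity names per SERP rank; the function name and the weighting formula are illustrative.

```python
from collections import defaultdict


def reduce_entities(pages: dict[int, list[str]], top_n: int = 10) -> list[dict]:
    """Merge per-page entity lists (SERP rank -> entities) into one ranked list."""
    seen_on: defaultdict[str, set[int]] = defaultdict(set)
    for rank, entities in pages.items():
        for entity in entities:
            # Naive dedupe by case-folding; the real REDUCE step also clusters synonyms.
            seen_on[entity.strip().casefold()].add(rank)

    merged = [
        {
            "entity": entity,
            # One vote per page, so a long page repeating a term cannot dominate.
            "competitor_count": len(ranks),
            "ranks": sorted(ranks),
            # Illustrative rank weight: rank 1 contributes most, rank top_n least.
            "weight": round(sum(top_n + 1 - r for r in ranks) / top_n, 2),
        }
        for entity, ranks in seen_on.items()
    ]
    merged.sort(key=lambda e: (-e["competitor_count"], -e["weight"], e["entity"]))
    return merged
```

Keeping the ranks list on every merged entity means anyone reviewing the report can check which competitors actually mention a term.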
| Strategy | Meaning | Example terms |
|---|---|---|
| include | Safe, on-brand — add freely | ashwagandha, ginseng, vitamin B12, adaptogens |
| reframe_as_alternative | Use the term editorially; position Brainzyme as the alternative; never imply the brand contains it | caffeine, lion's mane |
| define_and_differentiate | Name it to explain legal/category status; do not target it | piracetam, modafinil, smart drugs |
| avoid | Never appears, even editorially — ASA hard ban | neurodivergent |
Grey-listed terms (reframe / define) carry needs_review: true — a human
compliance owner signs off the actual wording against the canonical-messaging hybrid before
publish. The skill recommends; it does not self-authorise grey-area copy.
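The routing stage is a lookup, not a model call. A minimal sketch, assuming a flat term-to-strategy table seeded with the example terms above; the ROUTING dict and route function are illustrative, and treating unlisted terms as include is an assumption, not documented behaviour of the skill.

```python
from typing import Literal

Strategy = Literal["include", "reframe_as_alternative", "define_and_differentiate", "avoid"]

# Illustrative table built from the example terms above, not the skill's real list.
ROUTING: dict[str, Strategy] = {
    "caffeine": "reframe_as_alternative",
    "lion's mane": "reframe_as_alternative",
    "piracetam": "define_and_differentiate",
    "modafinil": "define_and_differentiate",
    "smart drugs": "define_and_differentiate",
    "neurodivergent": "avoid",
}
GREY: set[str] = {"reframe_as_alternative", "define_and_differentiate"}


def route(term: str) -> dict:
    """Deterministically map one entity to a handling strategy."""
    # Assumption: terms not on the risk list are treated as include.
    strategy: Strategy = ROUTING.get(term.strip().casefold(), "include")
    return {
        "term": term,
        "handling_strategy": strategy,
        # Only grey-listed terms need a human compliance sign-off before publish.
        "needs_review": strategy in GREY,
    }
```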
| What | Path |
|---|---|
| The skill folder | F:/Agentic-OS/.claude/skills/str-content-gap-audit/ |
| Skill definition (triggers, steps, cross-links) | …/SKILL.md |
| The pipeline script | …/scripts/content_gap_audit.py |
| Methodology + Surfer comparison | …/references/methodology.md |
| API auth setup | …/references/api-setup.md |
| Prototype artifacts — A/B/C report, 3 article versions, gap audits | F:/Agentic-OS/.tmp-drive-pull/ (scratch — not committed) |
| Service contracts (auth, failure modes) | F:/Agentic-OS/reference/services/{dataforseo,firecrawl,gemini}.md |
Registered in: AGENTS.md (Skill Registry + Context Matrix), README.md,
context/learnings.md, reference/tool-map.md — tool-map drift
validator passes (41 skills in sync).
| What | URL |
|---|---|
| This dashboard | apps.nutritionalproducts.org/content-gap-audit/ |
| Command Centre home | apps.nutritionalproducts.org |
| On-Page Audit dashboard (has the new “Decisions Needed” tab — Tab 11) | apps.nutritionalproducts.org/onpage-audit/ |
| SEO Master Sheet (session 46) | docs.google.com/…/1y3pgPgdpVxO14 |
F:/Agentic-OS/inbox/onpage-audit/. End-to-end verified.
The skill triggers on natural phrasing, matched against the trigger phrases in its SKILL.md frontmatter. To run the pipeline directly:
```bash
python F:/Agentic-OS/.claude/skills/str-content-gap-audit/scripts/content_gap_audit.py \
  --keyword "buying nootropics uk" \
  --draft path/to/draft.md \
  --market uk \
  --top-n 5 \
  --out-dir projects/str-content-gap-audit/my-article/
```
Re-running? Pass --serp-json and/or --scraped-dir to reuse saved
artifacts and skip the paid API calls.
Output is gap-audit.md (human view) plus gap-audit.json (machine view), covering ranked
entities to add, semantic clusters, missing structural sections, People-Also-Asked questions,
a competitor word-count benchmark, and the handling strategy for every risky term.
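The machine view is easy to triage in code. A minimal sketch, assuming gap-audit.json has an entities list whose items carry entity, handling_strategy and needs_review fields, plus a top-level missing_sections list; that schema and the triage function are hypothetical, so check the real file before relying on it.

```python
def triage(report: dict) -> dict[str, list[str]]:
    """Split a gap report into what a writer can add now vs. what needs sign-off."""
    buckets: dict[str, list[str]] = {"add_now": [], "needs_sign_off": [], "do_not_use": []}
    for item in report.get("entities", []):
        if item["handling_strategy"] == "avoid":
            buckets["do_not_use"].append(item["entity"])
        elif item.get("needs_review"):
            # Grey-listed (reframe / define) terms wait for the compliance owner.
            buckets["needs_sign_off"].append(item["entity"])
        else:
            buckets["add_now"].append(item["entity"])
    # Missing structural sections are safe to add, like include entities.
    buckets["add_now"].extend(report.get("missing_sections", []))
    return buckets
```

For example, loading gap-audit.json with json.load and passing the result to triage gives a writer three lists: add now, get signed off, never use.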
To act on it:
1. Add include entities and missing_sections freely.
2. For reframe / define terms, apply the strategy, then a human signs off the wording (these carry needs_review).
3. Run check_copy_against_canonical.py before publish.
Inside the mkt-content-pipeline orchestrator, this audit now runs automatically as Phase 3.5 for any SEO article, so you don't have to invoke it by hand.
A skill is a self-contained capability folder under
.claude/skills/. Each has a SKILL.md — YAML frontmatter
(name + trigger phrases) and a body (steps, dependencies, references). When you describe a task,
Claude matches it against every skill's trigger phrases and loads the matching one. Skills make
Claude do a job the same reliable way every time, instead of improvising.
str- = Strategy. Siblings: str-seo-audit (technical SEO),
str-ai-seo, str-programmatic-seo, str-schema-markup,
str-campaign-strategy, str-trending-research.
You flagged that workflows don't always reach for the right tool. Three fixes wired in:
1. Related Skills blocks in both str-seo-audit and
str-content-gap-audit — each names the other with a routing rule.
2. Orchestrator phase — mkt-content-pipeline calls the
audit automatically (Phase 3.5).
3. Registry rows — AGENTS.md + tool-map keep the index honest.
Routing rule: technical / site-level issues → str-seo-audit. Thin / shallow page content → str-content-gap-audit. Run both for the full picture.
| Item | Why it matters | Priority |
|---|---|---|
| Gemini 503 retry-once | One transient API failure in the test run — the pipeline degraded gracefully (skipped the page), but a retry would be cleaner. | low |
| From-scratch mode (no draft yet) | Currently requires --draft. A competitor-analysis-only mode would suit briefing a writer before the first draft exists. | medium |
| Validate beyond n=1 | The head-to-head is one article. Run it on 5–10 more before trusting the pipeline as a system. | medium |
| Live content score | Surfer scores as you type. This is a batch audit; re-run after each rewrite to see movement. | backlog |
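The retry-once item is small. A minimal sketch, assuming the Gemini client's errors can be mapped onto an exception carrying the HTTP status; TransientAPIError, RETRYABLE and call_with_one_retry are hypothetical names, not part of any Gemini SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical wrapper: translate your HTTP client's error into this."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


RETRYABLE = {503}  # the failure seen in the test run; widen deliberately if needed


def call_with_one_retry(fn: Callable[[], T], delay_s: float = 2.0) -> T:
    """Call fn; on a retryable status, wait once and make exactly one more attempt."""
    try:
        return fn()
    except TransientAPIError as err:
        if err.status not in RETRYABLE:
            raise
        time.sleep(delay_s)
        # A second failure propagates, so the caller can still skip the page as it does today.
        return fn()
```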