---
name: content-researcher
description: Research, outline, draft, and optimize SEO web content (service pages, blog posts, product pages) against Cora SEO reports. Create new content. Entity, LSI, and keyword density optimization. Generate entity test blocks (hidden divs).
---

# Content Research & Creation Skill

Write and optimize SEO web content — service pages, blog posts, product pages, landing pages. Covers the full pipeline: competitor research, outline, drafting, and quantitative optimization against a Cora SEO report (XLSX).

---

## Invocation

Use this skill when the user asks to write, research, outline, draft, or optimize web content. Common triggers:

- "Write a service page about [topic]"
- "Let's work on the [topic] page"
- "Create content about [topic] for [company]"
- "I have a Cora report for [keyword]"
- "Optimize this page against the Cora report"
- "Help me build an outline for [topic]"
- "Research [topic] and write an article"
- Any mention of writing web pages, blog posts, or SEO content for a website

**Routing logic — ask two questions up front:**

1. "Do you have a Cora report (XLSX) for this keyword?"
2. "Do you have existing content to optimize?" (could be a URL to a live page, pasted text, or a file path)

| Cora report? | Existing content? | Start at |
|--------------|-------------------|----------|
| No | No | Phase 1, Step 1 (full research → draft workflow) |
| Yes | No | Phase 1, Step 1 (research → outline using Cora targets → draft → optimize) |
| Yes | Yes | Phase 2, Step 6 (load Cora, optimize existing content) |
| No | Yes | Ask user to generate the Cora report first — optimization without Cora targets is guesswork |

**Existing content from a URL:** If the user provides a URL to a live page (e.g. their WordPress site), **always use the BS4 competitor scraper** to pull the content — never `web_fetch`.
The `web_fetch` tool runs content through an AI summarization layer that loses heading structure, drops sections, and can hallucinate product details. The scraper returns the actual HTML heading hierarchy and verbatim text.

```bash
cd {skill_dir}/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py "URL" --output-dir ./working/competitor_content/
```

Read the output file, then use the scraped heading structure and body text to build `./working/draft.md`. Preserve the original text verbatim — do not paraphrase or summarize product descriptions, specifications, or technical details. Only restructure headings and add entity/LSI terms where needed for optimization. The user does NOT need to paste or save the content manually.

---

## Phase 1: Research & First Draft

### Step 1 — Topic Input

Collect from the user:

- **Required:** Topic or keyword
- **Optional:** Competitor URLs to examine, industry context, pasted research they've already done, target audience
- **For service pages:** Company name, what services/capabilities they actually offer, what they do NOT offer. This prevents writing claims about capabilities the company doesn't have. Ask explicitly: "Is this a service page? If so, what does the company offer and what should I avoid mentioning?"

For informational/educational articles, company details are less critical — the content is about the topic, not the company. For service pages, company context is mandatory before drafting.

If the user provides their own research (pasted text, notes, URLs), use that as the primary input. Do not redo research the user has already done.

### Step 2 — Competitor Research

Research what competitors are publishing on this topic. Three modes depending on user input:

**Mode A — Claude researches (default):** Use `web_search` to find the top competitor content for the topic.
Use the BS4 competitor scraper (not `web_fetch`) to read the most relevant 5-10 results — this preserves accurate heading structure and verbatim text. Focus on:

- What subtopics they cover
- How they structure their content (H2/H3 breakdown)
- What angles or claims they make
- What they leave out (gaps)

**Mode B — User provides URLs:** If the user gives specific URLs, use the competitor scraper to bulk-fetch them:

```bash
cd {skill_dir}/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py URL1 URL2 URL3 --output-dir ./working/competitor_content/
```

Then read the output files and analyze them.

**Mode C — User provides research:** If the user pastes in research, notes, or analysis, skip scraping and work from what they gave you.

**Output:** Write a research summary covering:

1. Common themes across competitors (what everyone covers)
2. Content structure patterns (how they organize it)
3. Key entities, terms, and concepts mentioned repeatedly
4. Gaps — what competitors miss or cover poorly
5. Potential unique angles

Save the research summary to `./working/research_summary.md`.

### Step 3 — Build Outline

Using the research summary, build a structured outline:

1. **Generate fan-out queries** — Before structuring the outline, generate 10-15 search queries you would use to thoroughly research this topic. These are the natural "next searches" someone would run after the primary keyword — questions, comparisons, material/process specifics, use-case queries. Examples for "cnc swiss screw machining":
   - "what is swiss screw machining"
   - "swiss screw machining vs cnc turning"
   - "swiss machining tolerances"
   - "what materials can be swiss machined"
   - "swiss screw machining for medical devices"
   - "when to use swiss machining vs conventional lathe"

   These queries represent the search cluster around the topic. The more of them the content answers, the more authoritative it becomes across related searches.

2. **Cover the common ground** — Include the themes that all/most competitors address. Missing these makes content look incomplete.
3. **Identify 1-2 unique angles** — Find something competitors are NOT covering well. This is the content's differentiator.
4. **Shape H3 headings from fan-out queries** — Map the strongest fan-out queries to H3 headings. Headings that match real search patterns give the content more surface area across the query cluster. A heading like "What Materials Can Be Swiss Machined?" is better than "Materials" because it mirrors how people actually search.
5. **Structure for scanning** — Use clear H2 sections with H3 subsections. Each H2 should address one major subtopic.
6. **Include notes on each section** — Brief description of what goes in each section and why.

Consult `references/content_frameworks.md` for structural templates (how-to, listicle, comparison, etc.) and select the best fit for the topic.

**IMPORTANT: YOU NEED A CORA REPORT BEFORE building the outline.** The Cora report provides:

- Heading count targets (H2, H3 counts) that shape the outline structure
- Entity lists that inform heading names (pack entity terms into H2/H3 headings)
- Word count targets that determine section depth
- Structure targets (entities per heading level, variations per heading level) that guide how keyword-rich headings should be

If the user has not yet provided the Cora XLSX, **ask for it before proceeding with the outline.** Research can happen without Cora, but the outline should not be built without it.

Save the outline to `./working/outline.md`.

### Step 4 — HUMAN REVIEW (STOP AND WAIT)

**Present the outline to the user and ask:**

> "Here's the outline based on the research. Review it and let me know:
> 1. Any sections to add, remove, or reorder?
> 2. Are the unique angles worth pursuing?
> 3. Any specific points or data you want included?
> 4. Anything else before I draft?"

**Do NOT proceed until the user responds.** This is a critical gate.
Incorporate all feedback before moving on.

### Step 5 — Write First Draft

Write the full content based on the approved outline:

- Follow the structure exactly as approved
- Consult `references/brand_guidelines.md` for voice and tone guidance
- Write in clear, scannable paragraphs (max 4 sentences per paragraph)
- Use subheadings every 2-4 paragraphs
- Include lists, examples, and concrete details where appropriate
- Aim for the word count the user specified

**Fan-Out Query (FOQ) Section:** After the main content, write a separate FOQ section using the fan-out queries from the outline. This section is **excluded** from word count and heading count targets — it lives outside the core article.

- Each FOQ is an H3 heading phrased as a question
- Answer in 2-3 sentences max, self-contained
- **Restate the question in the answer** — this is the format LLMs and featured snippets prefer for citation: "How does X work? X works by..."
- The user may style these as accordions, FAQ schema, or hidden divs
- Mark the section clearly (e.g. with an HTML comment such as `<!-- FOQ SECTION -->`) so it's easy to separate from the main content

Save the draft to `./working/draft.md`.

Tell the user: "First draft is ready. If you have a Cora report for this keyword, provide the XLSX path and I'll optimize against it. Otherwise, let me know what changes you'd like."

---

## Phase 2: Cora Optimization

This phase begins when the user provides a Cora XLSX report. The draft may come from Phase 1, or the user may provide an existing draft to optimize.
### Step 6 — Load Cora Report

Parse the Cora XLSX and display a summary of targets:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet summary
```

Show the user:

- Search term and keyword variations
- Entity count and deficit count
- LSI keyword count and deficit count
- Word count target (cluster target, not raw average)
- Density targets (variation, entity, LSI)
- Key optimization rules that will be applied

### Step 7 — Entity Optimization

Run the entity optimizer against the draft:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python entity_optimizer.py "{draft_path}" "{cora_xlsx_path}" --top-n 30
```

Review the output and apply the top recommendations:

- Focus on entities with high relevance AND high remaining deficit
- Add entities naturally — they must fit the context of the section
- Prioritize adding entities to H2 and H3 headings first (these are primary optimization targets)
- Do NOT force entities where they don't make sense — readability always wins
- H1: exactly 1, always. Do not add a second H1.
- H5, H6: ignore completely
- H4: only add if most competitors have them

After applying entity changes, save the updated draft.

### Step 8 — LSI Keyword Optimization

Run the LSI optimizer:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python lsi_optimizer.py "{draft_path}" "{cora_xlsx_path}" --min-correlation 0.2 --top-n 50
```

Apply LSI keyword recommendations:

- Focus on keywords with strongest correlation (highest absolute value = most ranking impact)
- Many LSI keywords are common phrases that may already appear naturally
- Add missing keywords in body text, not just headings
- Some LSI keywords overlap with entities — count these once, benefit twice

After applying LSI changes, save the updated draft.
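The "strongest correlation first" ordering in Step 8 can be sketched as a small filter-and-sort. This is a minimal sketch: the row shape below is an assumption for illustration, not the actual output format of `lsi_optimizer.py`.

```python
# Hypothetical row shape for LSI optimizer output (illustrative only)
lsi_rows = [
    {"keyword": "swiss lathe", "correlation": -0.41, "deficit": 3},
    {"keyword": "tight tolerances", "correlation": 0.22, "deficit": 1},
    {"keyword": "bar stock", "correlation": -0.18, "deficit": 2},
]

MIN_CORRELATION = 0.2  # mirrors the --min-correlation flag in the command above

# Keep keywords whose correlation magnitude clears the floor,
# then rank by magnitude: highest absolute value = most ranking impact.
ranked = sorted(
    (r for r in lsi_rows if abs(r["correlation"]) >= MIN_CORRELATION),
    key=lambda r: abs(r["correlation"]),
    reverse=True,
)

for r in ranked:
    print(r["keyword"], r["correlation"], r["deficit"])
```

Here "bar stock" is dropped (|−0.18| is below the 0.2 floor) and "swiss lathe" outranks "tight tolerances" despite the opposite signs, since magnitude, not direction, sets the priority.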
### Step 9 — Structure & Density Check

Check the overall structure against Cora targets:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet structure --format json
cd {skill_dir}/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet densities --format json
```

Verify and adjust:

- **Heading counts:** Compare H1, H2, H3, H4 counts against Page 1 Average targets. Add or consolidate headings as needed.
- **Entities per heading level:** Check that each heading level has enough entity mentions vs. the Structure sheet targets.
- **Variations in headings:** Ensure keyword variations appear in H2/H3 headings at target levels.
- **Density targets:** Check variation density, entity density, and LSI density against the Strategic Overview percentages.
- **Word count:** Compare against the cluster target (NOT the raw average). If below target, identify which sections could be expanded.

**Important density note:** Adding content to meet one target changes the denominator for ALL density calculations. After significant word count changes, re-check densities. Usually 1-2 optimization passes are sufficient.

### Step 10 — Keyword Density Check (Optional)

If a quick keyword density check is useful:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python seo_optimizer.py "{draft_path}" --cora-xlsx "{cora_xlsx_path}"
```

Key rules:

- Exact match keyword density: 2% minimum, no upper limit
- Variations capture exact match — hitting variation density targets covers exact match
- Do NOT flag keyword stuffing. There is no practical upper limit that hurts rankings.

### Step 11 — Meta Title, Meta Description, and URL Slug

Generate meta tags and add them as an HTML comment block at the top of the draft file.

**Meta title format:** Pack keyword variations into a pipe-separated title tag.
Google reads far more than the ~60 characters it displays — a long title tag with variations gives the page more surface area across related searches. Titles can run up to 500 characters, but that length is optional.

Format: `Exact Search Term | Variation 1 | Variation 2 | ... | Company Name`

Use the keyword variations from the Cora report. Only include variations that have a page1_avg > 0 (competitors actually use them). Put the highest-value variations first.

**Meta description:** Write a keyword-rich summary (~350-500 characters) that hits the primary keyword, key variations, materials, sizes, and company name. This is not just a copy of the intro paragraph — it should be independently optimized.

**URL slug:** Short, keyword-focused. Example: `/custom-spun-hemispheres`

Add to the top of the draft file:

```html
<!--
META TITLE: ...
META DESCRIPTION: ...
URL SLUG: ...
-->
```

### Step 12 — Image & Diagram Placement

Read through the draft markdown file and identify where visuals would enhance the content. For each recommendation, specify:

- **Location:** After which heading or paragraph
- **Type:** Photo, diagram, chart, infographic, screenshot, illustration
- **Description:** What the visual should show
- **Rationale:** Why it adds value at that point (breaks up text, illustrates a process, makes data tangible, etc.)

Common placement triggers:

- Sections describing a process or workflow (diagram)
- Sections with comparative data (chart or table)
- Long text-only stretches (break up with a relevant image)
- Technical concepts that benefit from visual explanation (diagram)
- Before/after scenarios (side-by-side images)

### Step 13 — HUMAN REVIEW (STOP AND WAIT)

**Present the final draft, optimization summary, and image suggestions to the user:**

> "Here's the optimized draft.
> Summary of changes:
> - [X] entities added across [Y] sections
> - [X] LSI keywords incorporated
> - Word count: [current] (target: [target])
> - Variation density: [current]% (target: [target]%)
> - Entity density: [current]% (target: [target]%)
> - [X] image/diagram placements suggested
>
> Review the draft. What needs adjusting?"

**Do NOT finalize until the user approves.**

### Step 14 — HTML Export

After the user approves the draft, convert the markdown to plain HTML for WordPress. Save as `./working/draft.html` (or `draft_normal.html`, `draft_storybrand.html` if multiple versions exist).

Rules:

- **Plain HTML only** — no classes, no divs, no wrappers. Just standard content tags (`<h2>`, `<h3>`, `<p>`, `<ul>`, `<li>`, etc.)
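The "plain HTML only" rule can be enforced mechanically after conversion. A standard-library sketch, assuming an allow-list of content tags (the exact set is an assumption, not the skill's actual implementation): disallowed tags like `<div>` are dropped while their text is kept, and all attributes except link `href`s are stripped.

```python
from html.parser import HTMLParser

# Assumed allow-list of "plain" content tags; adjust to match the real rules.
ALLOWED = {"h1", "h2", "h3", "h4", "p", "ul", "ol", "li",
           "strong", "em", "a", "table", "tr", "th", "td"}

class PlainHTML(HTMLParser):
    """Re-emit HTML keeping only allowed tags, with attributes stripped."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the wrapper tag, keep its inner content
        if tag == "a":
            href = dict(attrs).get("href")
            self.out.append(f'<a href="{href}">' if href else "<a>")
        else:
            self.out.append(f"<{tag}>")  # no classes, no ids

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

def strip_to_plain(html: str) -> str:
    parser = PlainHTML()
    parser.feed(html)
    return "".join(parser.out)

print(strip_to_plain('<div class="wrap"><h2 id="x">Title</h2><p>Body</p></div>'))
# <h2>Title</h2><p>Body</p>
```

Run over the exported draft, this guarantees WordPress receives nothing but bare semantic tags, whatever the markdown converter emitted.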
### Step T7 — Run Validation Script (Programmatic)
Run `test_block_validate.py` for a deterministic before/after comparison:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python test_block_validate.py "{content_path}" {cwd}/working/test_block.md "{cora_xlsx_path}" --format json -o {cwd}/working/validation_report.json
```
This produces a report showing every metric before and after, with targets and status:
- Word count, distinct entities, entity density %, variation density %, LSI density %
- Heading counts (H2, H3), entities/variations in headings
- List of all new 0->1 entities introduced
- All numbers are from the same counting code — no mixing of data sources
Present the validation report to the user. Flag any metric that dropped below target after the test block was added.
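The regression flagging described above can be sketched programmatically. The JSON schema below (a `metrics` list with `name`/`before`/`after`/`target` keys) is a hypothetical shape for illustration; the real keys produced by `test_block_validate.py` may differ.

```python
import json

def flag_regressions(report: dict) -> list[str]:
    """Return metrics that met target before the test block but miss it after."""
    flags = []
    for m in report["metrics"]:
        # Regression: target was met (or exceeded) before, missed after.
        if m["after"] < m["target"] <= m["before"]:
            flags.append(f'{m["name"]}: {m["before"]} -> {m["after"]} (target {m["target"]})')
    return flags

# Hypothetical validation_report.json contents
report = {"metrics": [
    {"name": "entity_density_pct", "before": 2.1, "after": 1.7, "target": 2.0},
    {"name": "word_count", "before": 1450, "after": 1520, "target": 1400},
]}

print(flag_regressions(report))
# ['entity_density_pct: 2.1 -> 1.7 (target 2.0)']
```

Only the first metric is flagged: entity density dropped below target when the test block diluted the denominator, while word count stayed above its target.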
---
## Optimization Rules
These override any data from the Cora report:
| Rule | Detail |
|------|--------|
| H1 count | Exactly 1, always |
| H2, H3 | Primary optimization targets — focus entity/variation additions here |
| H4 | Low priority — only add if most competitors have them |
| H5, H6 | Ignore completely |
| Word count | Target the nearest competitive cluster, not the raw average. Up to ~1,500 words is always acceptable even if the target is lower. |
| Exact match density | 2% minimum, no upper limit |
| Keyword stuffing | Do NOT flag or warn about keyword stuffing |
| Variations include exact match | Optimizing variation density inherently covers exact match |
| Density is interdependent | Adding content changes ALL density calculations — re-check after big changes |
| Optimization passes | 1-2 passes is typically sufficient |
| Competitor names | NEVER use competitor company names as entities or LSI keywords. Do not mention competitors by name in content. |
| Measurement entities | Ignore measurements (dimensions, tolerances, etc.) as entities — skip these in entity optimization |
| Organization entities | Organizations like ISO, ANSI, ASTM are fine — keep these as entities |
| Entity correlation filter | Only entities with Best of Both <= -0.19 are included. Best of Both is the lower of Spearman's or Pearson's correlation to ranking position (1=top, 100=bottom), so more negative = stronger ranking signal. This filter is applied in `cora_parser.py` and affects all downstream consumers. To disable, set `entity_correlation_threshold` to `None` in `OPTIMIZATION_RULES`. Added 2026-03-20 — revert if entity coverage feels too thin. |
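The entity correlation filter in the last row reduces to a few lines. A minimal sketch of the stated rule, not the actual `cora_parser.py` code:

```python
ENTITY_CORRELATION_THRESHOLD = -0.19  # mirrors OPTIMIZATION_RULES; None disables the filter

def best_of_both(spearman: float, pearson: float) -> float:
    # Correlations are measured against ranking position (1 = top, 100 = bottom),
    # so the lower (more negative) of the two is the stronger ranking signal.
    return min(spearman, pearson)

def keep_entity(spearman: float, pearson: float,
                threshold=ENTITY_CORRELATION_THRESHOLD) -> bool:
    if threshold is None:
        return True  # filter disabled
    return best_of_both(spearman, pearson) <= threshold

print(keep_entity(-0.25, -0.10))  # True: -0.25 clears the -0.19 bar
print(keep_entity(-0.15, -0.10))  # False: neither correlation is strong enough
```

Note the filter keys off the stronger of the two correlations, so an entity survives even if only one of Spearman's or Pearson's crosses the threshold.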
---
## Scripts Reference
All scripts are in `{skill_dir}/scripts/`. Run them with `uv run --with openpyxl python` (or `--with requests,beautifulsoup4` for the scraper).
### cora_parser.py
Foundation module. Reads a Cora XLSX and extracts structured data.
```
uv run --with openpyxl python cora_parser.py