---
name: content-researcher
description: Research, outline, draft, and optimize SEO web content (service pages, blog posts, product pages) against Cora SEO reports. Create new content. Entity, LSI, and keyword density optimization. Generate entity test blocks (hidden divs).
---

# Content Research & Creation Skill

Write and optimize SEO web content — service pages, blog posts, product pages, landing pages. Covers the full pipeline: competitor research, outline, drafting, and quantitative optimization against a Cora SEO report (XLSX).

---

## Invocation

Use this skill when the user asks to write, research, outline, draft, or optimize web content. Common triggers:

- "Write a service page about [topic]"
- "Let's work on the [topic] page"
- "Create content about [topic] for [company]"
- "I have a Cora report for [keyword]"
- "Optimize this page against the Cora report"
- "Help me build an outline for [topic]"
- "Research [topic] and write an article"
- Any mention of writing web pages, blog posts, or SEO content for a website

**Routing logic — ask two questions up front:**

1. "Do you have a Cora report (XLSX) for this keyword?"
2. "Do you have existing content to optimize?" (could be a URL to a live page, pasted text, or a file path)

| Cora report? | Existing content? | Start at |
|--------------|-------------------|----------|
| No | No | Phase 1, Step 1 (full research → draft workflow) |
| Yes | No | Phase 1, Step 1 (research → outline using Cora targets → draft → optimize) |
| Yes | Yes | Phase 2, Step 6 (load Cora, optimize existing content) |
| No | Yes | Ask user to generate the Cora report first — optimization without Cora targets is guesswork |

**Existing content from a URL:** If the user provides a URL to a live page (e.g. their WordPress site), **always use the BS4 competitor scraper** to pull the content — never `web_fetch`.
The `web_fetch` tool runs content through an AI summarization layer that loses heading structure, drops sections, and can hallucinate product details. The scraper returns the actual HTML heading hierarchy and verbatim text.

```bash
cd {skill_dir}/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py "URL" --output-dir ./working/competitor_content/
```

Read the output file, then use the scraped heading structure and body text to build `./working/draft.md`. Preserve the original text verbatim — do not paraphrase or summarize product descriptions, specifications, or technical details. Only restructure headings and add entity/LSI terms where needed for optimization. The user does NOT need to paste or save the content manually.

---

## Phase 1: Research & First Draft

### Step 1 — Topic Input

Collect from the user:

- **Required:** Topic or keyword
- **Optional:** Competitor URLs to examine, industry context, pasted research they've already done, target audience
- **For service pages:** Company name, what services/capabilities they actually offer, what they do NOT offer. This prevents writing claims about capabilities the company doesn't have. Ask explicitly: "Is this a service page? If so, what does the company offer and what should I avoid mentioning?"

For informational/educational articles, company details are less critical — the content is about the topic, not the company. For service pages, company context is mandatory before drafting.

If the user provides their own research (pasted text, notes, URLs), use that as the primary input. Do not redo research the user has already done.

### Step 2 — Competitor Research

Research what competitors are publishing on this topic. Three modes depending on user input:

**Mode A — Claude researches (default):** Use `web_search` to find the top competitor content for the topic.
Use the BS4 competitor scraper (not `web_fetch`) to read the most relevant 5-10 results — this preserves accurate heading structure and verbatim text. Focus on:

- What subtopics they cover
- How they structure their content (H2/H3 breakdown)
- What angles or claims they make
- What they leave out (gaps)

**Mode B — User provides URLs:** If the user gives specific URLs, use the competitor scraper to bulk-fetch them:

```bash
cd {skill_dir}/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py URL1 URL2 URL3 --output-dir ./working/competitor_content/
```

Then read the output files and analyze them.

**Mode C — User provides research:** If the user pastes in research, notes, or analysis, skip scraping and work from what they gave you.

**Output:** Write a research summary covering:

1. Common themes across competitors (what everyone covers)
2. Content structure patterns (how they organize it)
3. Key entities, terms, and concepts mentioned repeatedly
4. Gaps — what competitors miss or cover poorly
5. Potential unique angles

Save the research summary to `./working/research_summary.md`.

### Step 3 — Build Outline

Using the research summary, build a structured outline:

1. **Generate fan-out queries** — Before structuring the outline, generate 10-15 search queries you would use to thoroughly research this topic. These are the natural "next searches" someone would run after the primary keyword — questions, comparisons, material/process specifics, use-case queries. Examples for "cnc swiss screw machining":
   - "what is swiss screw machining"
   - "swiss screw machining vs cnc turning"
   - "swiss machining tolerances"
   - "what materials can be swiss machined"
   - "swiss screw machining for medical devices"
   - "when to use swiss machining vs conventional lathe"

   These queries represent the search cluster around the topic. The more of them the content answers, the more authoritative it becomes across related searches.

2. **Cover the common ground** — Include the themes that all/most competitors address. Missing these makes content look incomplete.
3. **Identify 1-2 unique angles** — Find something competitors are NOT covering well. This is the content's differentiator.
4. **Shape H3 headings from fan-out queries** — Map the strongest fan-out queries to H3 headings. Headings that match real search patterns give the content more surface area across the query cluster. A heading like "What Materials Can Be Swiss Machined?" is better than "Materials" because it mirrors how people actually search.
5. **Structure for scanning** — Use clear H2 sections with H3 subsections. Each H2 should address one major subtopic.
6. **Include notes on each section** — Brief description of what goes in each section and why.

Consult `references/content_frameworks.md` for structural templates (how-to, listicle, comparison, etc.) and select the best fit for the topic.

**IMPORTANT: YOU NEED A CORA REPORT BEFORE building the outline.** The Cora report provides:

- Heading count targets (H2, H3 counts) that shape the outline structure
- Entity lists that inform heading names (pack entity terms into H2/H3 headings)
- Word count targets that determine section depth
- Structure targets (entities per heading level, variations per heading level) that guide how keyword-rich headings should be

If the user has not yet provided the Cora XLSX, **ask for it before proceeding with the outline.** Research can happen without Cora, but the outline should not be built without it.

Save the outline to `./working/outline.md`.

### Step 4 — HUMAN REVIEW (STOP AND WAIT)

**Present the outline to the user and ask:**

> "Here's the outline based on the research. Review it and let me know:
> 1. Any sections to add, remove, or reorder?
> 2. Are the unique angles worth pursuing?
> 3. Any specific points or data you want included?
> 4. Anything else before I draft?"

**Do NOT proceed until the user responds.** This is a critical gate.
Incorporate all feedback before moving on.

### Step 5 — Write First Draft

Write the full content based on the approved outline:

- Follow the structure exactly as approved
- Consult `references/brand_guidelines.md` for voice and tone guidance
- Write in clear, scannable paragraphs (max 4 sentences per paragraph)
- Use subheadings every 2-4 paragraphs
- Include lists, examples, and concrete details where appropriate
- Aim for the word count the user specified

**Fan-Out Query (FOQ) Section:** After the main content, write a separate FOQ section using the fan-out queries from the outline. This section is **excluded** from word count and heading count targets — it lives outside the core article.

- Each FOQ is an H3 heading phrased as a question
- Answer in 2-3 sentences max, self-contained
- **Restate the question in the answer** — this is the format LLMs and featured snippets prefer for citation: "How does X work? X works by..."
- The user may style these as accordions, FAQ schema, or hidden divs
- Mark the section clearly (e.g. with an HTML comment such as `<!-- FOQ SECTION -->`) so it's easy to separate from the main content

Save the draft to `./working/draft.md`.

Tell the user: "First draft is ready. If you have a Cora report for this keyword, provide the XLSX path and I'll optimize against it. Otherwise, let me know what changes you'd like."

---

## Phase 2: Cora Optimization

This phase begins when the user provides a Cora XLSX report. The draft may come from Phase 1, or the user may provide an existing draft to optimize.
### Step 6 — Load Cora Report

Parse the Cora XLSX and display a summary of targets:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet summary
```

Show the user:

- Search term and keyword variations
- Entity count and deficit count
- LSI keyword count and deficit count
- Word count target (cluster target, not raw average)
- Density targets (variation, entity, LSI)
- Key optimization rules that will be applied

### Step 7 — Entity Optimization

Run the entity optimizer against the draft:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python entity_optimizer.py "{draft_path}" "{cora_xlsx_path}" --top-n 30
```

Review the output and apply the top recommendations:

- Focus on entities with high relevance AND high remaining deficit
- Add entities naturally — they must fit the context of the section
- Prioritize adding entities to H2 and H3 headings first (these are primary optimization targets)
- Do NOT force entities where they don't make sense — readability always wins
- H1: exactly 1, always. Do not add a second H1.
- H5, H6: ignore completely
- H4: only add if most competitors have them

After applying entity changes, save the updated draft.

### Step 8 — LSI Keyword Optimization

Run the LSI optimizer:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python lsi_optimizer.py "{draft_path}" "{cora_xlsx_path}" --min-correlation 0.2 --top-n 50
```

Apply LSI keyword recommendations:

- Focus on keywords with strongest correlation (highest absolute value = most ranking impact)
- Many LSI keywords are common phrases that may already appear naturally
- Add missing keywords in body text, not just headings
- Some LSI keywords overlap with entities — count these once, benefit twice

After applying LSI changes, save the updated draft.
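The "strongest correlation first" ordering in Step 8 can be sketched as a small filter-and-sort. This is a minimal sketch: the row shape below is an assumption for illustration, not the actual output format of `lsi_optimizer.py`.

```python
# Hypothetical row shape for LSI optimizer output (illustrative only)
lsi_rows = [
    {"keyword": "swiss lathe", "correlation": -0.41, "deficit": 3},
    {"keyword": "tight tolerances", "correlation": 0.22, "deficit": 1},
    {"keyword": "bar stock", "correlation": -0.18, "deficit": 2},
]

MIN_CORRELATION = 0.2  # mirrors the --min-correlation flag in the command above

# Keep keywords whose correlation magnitude clears the floor,
# then rank by magnitude: highest absolute value = most ranking impact.
ranked = sorted(
    (r for r in lsi_rows if abs(r["correlation"]) >= MIN_CORRELATION),
    key=lambda r: abs(r["correlation"]),
    reverse=True,
)

for r in ranked:
    print(r["keyword"], r["correlation"], r["deficit"])
```

Here "bar stock" is dropped (|−0.18| is below the 0.2 floor) and "swiss lathe" outranks "tight tolerances" despite the opposite signs, since magnitude, not direction, sets the priority.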
### Step 9 — Structure & Density Check

Check the overall structure against Cora targets:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet structure --format json
cd {skill_dir}/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet densities --format json
```

Verify and adjust:

- **Heading counts:** Compare H1, H2, H3, H4 counts against Page 1 Average targets. Add or consolidate headings as needed.
- **Entities per heading level:** Check that each heading level has enough entity mentions vs. the Structure sheet targets.
- **Variations in headings:** Ensure keyword variations appear in H2/H3 headings at target levels.
- **Density targets:** Check variation density, entity density, and LSI density against the Strategic Overview percentages.
- **Word count:** Compare against the cluster target (NOT the raw average). If below target, identify which sections could be expanded.

**Important density note:** Adding content to meet one target changes the denominator for ALL density calculations. After significant word count changes, re-check densities. Usually 1-2 optimization passes are sufficient.

### Step 10 — Keyword Density Check (Optional)

If a quick keyword density check is useful:

```bash
cd {skill_dir}/scripts && uv run --with openpyxl python seo_optimizer.py "{draft_path}" --cora-xlsx "{cora_xlsx_path}"
```

Key rules:

- Exact match keyword density: 2% minimum, no upper limit
- Variations capture exact match — hitting variation density targets covers exact match
- Do NOT flag keyword stuffing. There is no practical upper limit that hurts rankings.

### Step 11 — Meta Title, Meta Description, and URL Slug

Generate meta tags and add them as an HTML comment block at the top of the draft file.

**Meta title format:** Pack keyword variations into a pipe-separated title tag.
Google reads far more than the ~60 characters it displays — a long title tag with variations gives the page more surface area across related searches. Titles can run up to 500 characters, but that length is optional.

Format: `Exact Search Term | Variation 1 | Variation 2 | ... | Company Name`

Use the keyword variations from the Cora report. Only include variations that have a page1_avg > 0 (competitors actually use them). Put the highest-value variations first.

**Meta description:** Write a keyword-rich summary (~350-500 characters) that hits the primary keyword, key variations, materials, sizes, and company name. This is not just a copy of the intro paragraph — it should be independently optimized.

**URL slug:** Short, keyword-focused. Example: `/custom-spun-hemispheres`

Add to the top of the draft file:

```html
<!--
META TITLE: ...
META DESCRIPTION: ...
URL SLUG: ...
-->
```

### Step 12 — Image & Diagram Placement

Read through the draft markdown file and identify where visuals would enhance the content. For each recommendation, specify:

- **Location:** After which heading or paragraph
- **Type:** Photo, diagram, chart, infographic, screenshot, illustration
- **Description:** What the visual should show
- **Rationale:** Why it adds value at that point (breaks up text, illustrates a process, makes data tangible, etc.)

Common placement triggers:

- Sections describing a process or workflow (diagram)
- Sections with comparative data (chart or table)
- Long text-only stretches (break up with a relevant image)
- Technical concepts that benefit from visual explanation (diagram)
- Before/after scenarios (side-by-side images)

### Step 13 — HUMAN REVIEW (STOP AND WAIT)

**Present the final draft, optimization summary, and image suggestions to the user:**

> "Here's the optimized draft.
> Summary of changes:
> - [X] entities added across [Y] sections
> - [X] LSI keywords incorporated
> - Word count: [current] (target: [target])
> - Variation density: [current]% (target: [target]%)
> - Entity density: [current]% (target: [target]%)
> - [X] image/diagram placements suggested
>
> Review the draft. What needs adjusting?"

**Do NOT finalize until the user approves.**

### Step 14 — HTML Export

After the user approves the draft, convert the markdown to plain HTML for WordPress. Save as `./working/draft.html` (or `draft_normal.html`, `draft_storybrand.html` if multiple versions exist).

Rules:

- **Plain HTML only** — no classes, no divs, no wrappers. Just standard content tags (`<h2>`, `<h3>`, `<p>`, `<ul>`, `<li>`, etc.)
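The "plain HTML only" rule can be enforced mechanically after conversion. A standard-library sketch, assuming an allow-list of content tags (the exact set is an assumption, not the skill's actual implementation): disallowed tags like `<div>` are dropped while their text is kept, and all attributes except link `href`s are stripped.

```python
from html.parser import HTMLParser

# Assumed allow-list of "plain" content tags; adjust to match the real rules.
ALLOWED = {"h1", "h2", "h3", "h4", "p", "ul", "ol", "li",
           "strong", "em", "a", "table", "tr", "th", "td"}

class PlainHTML(HTMLParser):
    """Re-emit HTML keeping only allowed tags, with attributes stripped."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the wrapper tag, keep its inner content
        if tag == "a":
            href = dict(attrs).get("href")
            self.out.append(f'<a href="{href}">' if href else "<a>")
        else:
            self.out.append(f"<{tag}>")  # no classes, no ids

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

def strip_to_plain(html: str) -> str:
    parser = PlainHTML()
    parser.feed(html)
    return "".join(parser.out)

print(strip_to_plain('<div class="wrap"><h2 id="x">Title</h2><p>Body</p></div>'))
# <h2>Title</h2><p>Body</p>
```

Run over the exported draft, this guarantees WordPress receives nothing but bare semantic tags, whatever the markdown converter emitted.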
### Step T7 — Run Validation Script (Programmatic)
Run `test_block_validate.py` for a deterministic before/after comparison:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python test_block_validate.py "{content_path}" {cwd}/working/test_block.md "{cora_xlsx_path}" --format json -o {cwd}/working/validation_report.json
```
This produces a report showing every metric before and after, with targets and status:
- Word count, distinct entities, entity density %, variation density %, LSI density %
- Heading counts (H2, H3), entities/variations in headings
- List of all new 0->1 entities introduced
- All numbers are from the same counting code — no mixing of data sources
Present the validation report to the user. Flag any metric that dropped below target after the test block was added.
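The regression flagging described above can be sketched programmatically. The JSON schema below (a `metrics` list with `name`/`before`/`after`/`target` keys) is a hypothetical shape for illustration; the real keys produced by `test_block_validate.py` may differ.

```python
import json

def flag_regressions(report: dict) -> list[str]:
    """Return metrics that met target before the test block but miss it after."""
    flags = []
    for m in report["metrics"]:
        # Regression: target was met (or exceeded) before, missed after.
        if m["after"] < m["target"] <= m["before"]:
            flags.append(f'{m["name"]}: {m["before"]} -> {m["after"]} (target {m["target"]})')
    return flags

# Hypothetical validation_report.json contents
report = {"metrics": [
    {"name": "entity_density_pct", "before": 2.1, "after": 1.7, "target": 2.0},
    {"name": "word_count", "before": 1450, "after": 1520, "target": 1400},
]}

print(flag_regressions(report))
# ['entity_density_pct: 2.1 -> 1.7 (target 2.0)']
```

Only the first metric is flagged: entity density dropped below target when the test block diluted the denominator, while word count stayed above its target.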
---
## Optimization Rules
These override any data from the Cora report:
| Rule | Detail |
|------|--------|
| H1 count | Exactly 1, always |
| H2, H3 | Primary optimization targets — focus entity/variation additions here |
| H4 | Low priority — only add if most competitors have them |
| H5, H6 | Ignore completely |
| Word count | Target the nearest competitive cluster, not the raw average. Up to ~1,500 words is always acceptable even if the target is lower. |
| Exact match density | 2% minimum, no upper limit |
| Keyword stuffing | Do NOT flag or warn about keyword stuffing |
| Variations include exact match | Optimizing variation density inherently covers exact match |
| Density is interdependent | Adding content changes ALL density calculations — re-check after big changes |
| Optimization passes | 1-2 passes is typically sufficient |
| Competitor names | NEVER use competitor company names as entities or LSI keywords. Do not mention competitors by name in content. |
| Measurement entities | Ignore measurements (dimensions, tolerances, etc.) as entities — skip these in entity optimization |
| Organization entities | Organizations like ISO, ANSI, ASTM are fine — keep these as entities |
| Entity correlation filter | Only entities with Best of Both <= -0.19 are included. Best of Both is the lower of Spearman's or Pearson's correlation to ranking position (1=top, 100=bottom), so more negative = stronger ranking signal. This filter is applied in `cora_parser.py` and affects all downstream consumers. To disable, set `entity_correlation_threshold` to `None` in `OPTIMIZATION_RULES`. Added 2026-03-20 — revert if entity coverage feels too thin. |
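The entity correlation filter in the last row reduces to a few lines. A minimal sketch of the stated rule, not the actual `cora_parser.py` code:

```python
ENTITY_CORRELATION_THRESHOLD = -0.19  # mirrors OPTIMIZATION_RULES; None disables the filter

def best_of_both(spearman: float, pearson: float) -> float:
    # Correlations are measured against ranking position (1 = top, 100 = bottom),
    # so the lower (more negative) of the two is the stronger ranking signal.
    return min(spearman, pearson)

def keep_entity(spearman: float, pearson: float,
                threshold=ENTITY_CORRELATION_THRESHOLD) -> bool:
    if threshold is None:
        return True  # filter disabled
    return best_of_both(spearman, pearson) <= threshold

print(keep_entity(-0.25, -0.10))  # True: -0.25 clears the -0.19 bar
print(keep_entity(-0.15, -0.10))  # False: neither correlation is strong enough
```

Note the filter keys off the stronger of the two correlations, so an entity survives even if only one of Spearman's or Pearson's crosses the threshold.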
---
## Scripts Reference
All scripts are in `{skill_dir}/scripts/`. Run them with `uv run --with openpyxl python` (or `--with requests,beautifulsoup4` for the scraper).
### cora_parser.py
Foundation module. Reads a Cora XLSX and extracts structured data.
```
uv run --with openpyxl python cora_parser.py