CheddahBot/skills/content_outline.md

212 lines
9.0 KiB
Markdown

# Content Outline -- Autonomous Pipeline
You are an autonomous content outline builder. You will receive task context (client name, keyword, target URL) appended below. Your job is to parse the Cora report, research the topic, and produce TWO output files: a clean editable outline and a Cora data reference file.
You MUST produce exactly 2 output files in the current working directory. No subdirectories.
## Step 1: Parse the Cora Report
The task will have a Cora .xlsx attached. Download or locate it, then run the Cora parser scripts to extract structured data.
### 1a. Summary + Structure Targets
```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet summary --format json
```
From the summary, extract:
- **Word count target** (use `word_count_cluster_target` if available, otherwise `word_count_goal`)
- **Keyword variations** list
- **Entity count target** (`distinct_entities_target`)
- **Density targets** (variation, entity, LSI)
### 1b. Structure Targets
```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet structure --format json
```
Extract heading count targets: H1, H2, H3, H4 counts.
### 1c. Keyword Variations
```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet variations --format json
```
Extract each variation with its page1_max and page1_avg. These are the keyword family -- hitting these targets is the top priority for the draft.
### 1d. Entities
```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet entities --format json
```
Entities are already filtered by correlation (Best of Both <= -0.19) in the parser. From the results, note:
- Total relevant entities (the ones that passed the filter)
- Which ones have 0 current mentions (coverage gaps)
- Max count and deficit for each
### 1e. LSI Keywords
```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet lsi --format json
```
Extract LSI keywords with their correlation and deficit values.
## Step 2: Research
### 2a. Fetch Current Page (if IMSURL provided)
If a target URL is provided AND it is not `seotoollab.com/blank.html`, use the BS4 scraper to get the actual page content -- do NOT use WebFetch (it runs through AI summarization and loses heading structure):
```bash
cd .claude/skills/content-researcher/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py "{imsurl}" --output-dir ./working/
```
Read the output file to understand:
- Current heading structure
- Current word count
- What content exists already
- Current style and tone
If no IMSURL is provided, or if the URL is `seotoollab.com/blank.html` (used as a placeholder for Cora when the real page doesn't exist yet), this is a new page -- skip this step.
### 2b. Competitor Research
Use WebSearch to find the top 5-10 competitor pages for the keyword. Use the BS4 scraper to pull the best 3-5:
```bash
cd .claude/skills/content-researcher/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py "URL1" "URL2" "URL3" --output-dir ./working/competitor_content/
```
Read the scraped files. Focus on:
- What subtopics they cover
- How they structure content (H2/H3 patterns)
- Common themes everyone covers
- Gaps -- what they miss or cover poorly
### 2c. Fan-Out Queries
Generate 10-15 search queries representing the topic cluster -- the natural "next searches" someone would run after the primary keyword. These become H3 heading candidates.
Example for "cnc swiss screw machining":
- "what is swiss screw machining"
- "swiss screw machining vs cnc turning"
- "swiss machining tolerances"
- "what materials can be swiss machined"
## Step 3: Build the Outline
Using the Cora targets, competitor research, and fan-out queries, build a structured outline.
### Structure Rules
- **H1**: Exactly 1. Contains the exact-match keyword.
- **H2 count**: Match the Cora structure target (typically 4-8).
- **H3 count**: Match the Cora structure target (typically 8-15).
- **H4**: Only add if Cora shows competitors using them. Low priority.
- **H5/H6**: Ignore completely.
### Heading Content Rules
- Pack keyword variations into H2 and H3 headings where natural. The Structure sheet shows targets for variations-in-headings per level.
- Pack relevant entities into headings where natural. The Structure sheet shows targets for entities-in-headings per level.
- Shape H3 headings from fan-out queries where possible -- headings that match real search patterns give more surface area.
### Section Briefs
For each H2 section, include:
- A 1-2 sentence description of what goes in this section
- **Word count target for this section** -- these MUST add up to the total Cora word count target. Be explicit. Do not round loosely.
- Which keyword variations to use in this section and target count per variation
- Key entities to mention in this section
- Key points or angles to cover
### Word Count Discipline -- CRITICAL
The total word count across all sections MUST equal the Cora word count target. Break it down per section and show the math:
```
Section 1: ~150 words
Section 2: ~200 words
...
Total: ~1,200 words (Cora target: 1,200)
```
Do NOT pad sections. Do NOT exceed the target by more than 10%. The draft stage will follow these per-section targets strictly, so get them right here.
### Fan-Out Query Section
After the main outline, include a separate FOQ section:
- Each FOQ is an H3 heading phrased as a question
- These are EXCLUDED from the main word count and heading count targets
- Mark clearly: `<!-- FOQ SECTION - excluded from word count -->`
- Will be answered in 2-3 sentences each during the draft stage
### Variation Placement Map
At the end of the outline, include a summary table showing:
- Each keyword variation
- Target count (from Cora page1_avg or strategic target)
- Which sections it should appear in
Example:
```
| Variation | Target | Sections |
|-----------|--------|----------|
| screw machine | 8 | H1, Intro, Section 2, Section 4 |
| swiss screw | 4 | Section 1, Section 3 |
| cnc screw machining | 3 | H1, Section 2 |
```
## Step 4: Build the Cora Data File
This is the reference file for the draft stage. Include ALL the raw Cora data the writer needs:
### Contents of the Data File
1. **Cora Summary** -- word count target, density targets, distinct entity target
2. **Keyword Variations** -- full list with page1_max, page1_avg, and current count
3. **Entity List** -- all relevant entities (passed correlation filter) with:
- Name, type, relevance, correlation (Best of Both), max count, current count, deficit
- Grouped: "Missing (0 mentions)" first, then "Present but below target", then "At or above target"
4. **LSI Keywords** -- top 50 by |correlation|, with deficit values
5. **Structure Targets** -- heading counts (H1-H4), entities per heading level, variations per heading level
6. **Competitive Benchmarks** -- word counts and key metrics of top 5 competitors (from Cora results sheet)
### Entity Rules (include these in the data file header)
- **Never remove entity mentions** -- only add. Removing entities can damage variation counts since variations often contain entity words.
- **Coverage first**: get at least 1 mention of every relevant entity before chasing deficit targets.
- **Correlation is the gatekeeper**: only entities with Best of Both <= -0.19 are included (already filtered by parser).
- **Review outlier max values**: before running the task, review the Cora xlsx for outlier entity max counts (e.g. a competitor named "Copper Mountain" inflating the "copper" entity to 300). Manually fix these in the xlsx before the pipeline runs.
- **Variations take priority over entity deficit counts** -- hit variation targets first, then fill entity gaps.
## Step 5: Self-Verification
Before finishing, verify:
- [ ] Outline heading counts match Cora structure targets (H1=1, H2, H3 counts)
- [ ] Every section has an explicit word count target
- [ ] Section word counts add up to the Cora total (within 10%)
- [ ] Keyword variations are mapped to specific sections
- [ ] Fan-out queries are separated and excluded from word count
- [ ] Data file contains all entity, LSI, variation, and structure data
- [ ] Data file includes the entity rules header
- [ ] No local file paths in either output file
## Output Files
You MUST write exactly 2 files to the current working directory. Use the **keyword** from the task context in the filenames.
Example -- if the keyword is "fuel treatment":
| File | Format | Contents |
|------|--------|----------|
| `fuel treatment - Outline.md` | Markdown | Clean editable outline with heading structure, section briefs, per-section word counts, variation placement map, FOQ section |
| `fuel treatment - Cora Data.md` | Markdown | All Cora-derived data: variations, entities, LSI keywords, structure targets, competitive benchmarks, entity rules |
Do NOT create any other files. Do NOT create subdirectories.