# Content Outline -- Autonomous Pipeline

You are an autonomous content outline builder. You will receive task context (client name, keyword, target URL) appended below. Your job is to parse the Cora report, research the topic, and produce ONE output file: a clean editable outline with a reference data section at the bottom.

You MUST produce exactly 1 output file in the current working directory. No subdirectories.

## Step 1: Parse the Cora Report

The task will have a Cora .xlsx attached. Download or locate it, then run the Cora parser scripts to extract structured data.

### 1a. Summary + Structure Targets

```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet summary --format json
```

From the summary, extract:
- **Word count target** (use `word_count_cluster_target` if available, otherwise `word_count_goal`)
- **Keyword variations** list
- **Entity count target** (`distinct_entities_target`)
- **Density targets** (variation, entity, LSI)

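
The word count fallback above can be sketched as a small helper. This is a minimal sketch, assuming the parser's summary JSON is a flat object that exposes `word_count_cluster_target` and `word_count_goal` as top-level keys; the function name and the sample payloads are hypothetical.

```python
def pick_word_count_target(summary: dict) -> int:
    """Return the cluster target when present, otherwise the general goal."""
    # Assumption: the summary JSON is a flat object with these two keys.
    cluster = summary.get("word_count_cluster_target")
    if cluster:  # None, 0, and "" all count as "not available"
        return int(cluster)
    return int(summary["word_count_goal"])


# Hypothetical summary payloads, for illustration only.
with_cluster = {"word_count_cluster_target": 1800, "word_count_goal": 1450}
without_cluster = {"word_count_cluster_target": None, "word_count_goal": 1450}

print(pick_word_count_target(with_cluster))     # 1800
print(pick_word_count_target(without_cluster))  # 1450
```

Treating `0` and an empty string as "not available" is a design choice of this sketch; adjust it if the parser reports a missing cluster target differently.
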
### 1b. Structure Targets

```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet structure --format json
```

Extract heading count targets: H1, H2, H3, H4 counts.

### 1c. Keyword Variations

```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet variations --format json
```

Extract each variation with its page1_max and page1_avg. These are the keyword family -- hitting these targets is the top priority for the draft.

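
One way to organize the extracted variations is to separate the ones with a non-zero `page1_avg` (which need a target later on) from the rest. The sketch below assumes each row is a dict with `variation`, `page1_max`, and `page1_avg` keys; the `variation` key name, the function name, and the sample numbers are hypothetical.

```python
def split_variations(variations):
    """Separate variations that need a target from those that do not."""
    # Assumption: each row is a dict with "variation", "page1_max", "page1_avg".
    def avg(row):
        return row.get("page1_avg") or 0

    # Highest page-one average first, so the most important terms lead the list.
    needs_target = sorted((r for r in variations if avg(r) > 0), key=avg, reverse=True)
    optional = [r for r in variations if avg(r) <= 0]
    return needs_target, optional


# Made-up rows, for illustration only.
rows = [
    {"variation": "ac drive repair", "page1_max": 15, "page1_avg": 9},
    {"variation": "drive repair", "page1_max": 40, "page1_avg": 25},
    {"variation": "ac drive fix", "page1_max": 1, "page1_avg": 0},
]
needs_target, optional = split_variations(rows)
print([r["variation"] for r in needs_target])  # ['drive repair', 'ac drive repair']
print([r["variation"] for r in optional])      # ['ac drive fix']
```
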
### 1d. Entities

```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet entities --format json
```

Entities are already filtered by correlation (Best of Both <= -0.199) in the parser. From the results, note:
- Total relevant entities (the ones that passed the filter)
- Which ones have 0 current mentions (coverage gaps)
- Max count and deficit for each

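
The three notes above could be collected in one pass over the parser output. This is only a sketch: the key names `entity`, `current_count`, `max_count`, and `deficit` are assumptions about the JSON shape, not confirmed field names, and the sample rows are made up.

```python
def summarize_entities(entities):
    """Collect the total, the coverage gaps, and per-entity max/deficit."""
    # Assumption: rows look like
    # {"entity": str, "current_count": int, "max_count": int, "deficit": int}.
    gaps = [e["entity"] for e in entities if not e.get("current_count")]
    targets = {e["entity"]: (e.get("max_count"), e.get("deficit")) for e in entities}
    return {
        "total_relevant": len(entities),
        "coverage_gaps": gaps,
        "max_and_deficit": targets,
    }


# Made-up rows, for illustration only.
sample = [
    {"entity": "vfd", "current_count": 0, "max_count": 12, "deficit": 12},
    {"entity": "inverter", "current_count": 3, "max_count": 8, "deficit": 5},
]
report = summarize_entities(sample)
print(report["total_relevant"])  # 2
print(report["coverage_gaps"])   # ['vfd']
```
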
### 1e. LSI Keywords

```bash
cd .claude/skills/content-researcher/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet lsi --format json
```

Extract LSI keywords with their correlation and deficit values.

## Step 2: Research

### 2a. Fetch Current Page (if IMSURL provided)

If a target URL is provided AND it is not `seotoollab.com/blank.html`, use the BS4 scraper to get the actual page content -- do NOT use WebFetch (it runs through AI summarization and loses heading structure):

```bash
cd .claude/skills/content-researcher/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py "{imsurl}" --output-dir ./working/
```

Read the output file to understand:
- Current heading structure
- Current word count
- What content exists already
- Current style and tone

If no IMSURL is provided, or if the URL is `seotoollab.com/blank.html` (used as a placeholder for Cora when the real page doesn't exist yet), this is a new page -- skip this step.

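
The skip condition for 2a can be expressed as a single predicate. The sketch below is an illustration under assumptions: the function name is hypothetical, and it treats a missing scheme and a leading `www.` as equivalent forms of the placeholder, which the instructions above do not spell out.

```python
from typing import Optional
from urllib.parse import urlparse

PLACEHOLDER_HOST = "seotoollab.com"
PLACEHOLDER_PATH = "/blank.html"


def is_new_page(imsurl: Optional[str]) -> bool:
    """True when there is no existing page to scrape (step 2a is skipped)."""
    if not imsurl or not imsurl.strip():
        return True
    url = imsurl.strip()
    if "://" not in url:
        # urlparse only fills in the hostname when a scheme is present.
        url = "https://" + url
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parsed.path.rstrip("/").lower()
    return host == PLACEHOLDER_HOST and path == PLACEHOLDER_PATH


print(is_new_page(None))                                 # True
print(is_new_page("seotoollab.com/blank.html"))          # True
print(is_new_page("https://example.com/ac-drive-repair"))  # False
```
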
### 2b. Competitor Research

Use WebSearch to find the top 5-10 competitor pages for the keyword. Use the BS4 scraper to pull the best 3-5:

```bash
cd .claude/skills/content-researcher/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py "URL1" "URL2" "URL3" --output-dir ./working/competitor_content/
```

Read the scraped files. Focus on:
- What subtopics they cover
- How they structure content (H2/H3 patterns)
- Common themes everyone covers
- Gaps -- what they miss or cover poorly

### 2c. Fan-Out Queries

Generate 10-15 search queries representing the topic cluster -- the natural "next searches" someone would run after the primary keyword. These become H3 heading candidates.

## Step 3: Build the Output File

The output file has two parts separated by a clear divider. The top is the editable outline. The bottom is reference data for the draft stage.

### PART 1: The Outline (top of file)

This is the part the human will read and edit. Keep it **clean and scannable**.

Format:

```
# [Keyword] -- Content Outline

**Client:** [name]
**Keyword:** [keyword]
**Word Count Target:** [number]

---

## H1: [Heading text]

## H2: [Section heading]
~[word count] words
[1-2 sentence description of what goes here and key points to cover]

### H3: [Sub-section heading]

## H2: [Next section heading]
~[word count] words
[1-2 sentence description]

...

### Word Count Total
[section-by-section breakdown adding up to Cora target]

---
<!-- FOQ SECTION - excluded from word count -->

### [Question as heading]?
### [Question as heading]?
...
```

Rules for the outline:
- **Headings only** -- no variation counts, no entity lists, no Cora numbers in this section
- Each H2 gets a word count target and a brief description (1-2 sentences max)
- H3s are just the heading text, no description needed
- Section word counts MUST add up to the Cora total (within 10%)
- Fan-out queries go after a `<!-- FOQ SECTION -->` marker, excluded from word count
- The human should be able to read this on their phone and rearrange sections easily

### Structure Rules

- **H1**: Exactly 1. Contains the exact-match keyword.
- **H2 count**: Match the Cora structure target.
- **H3 count**: Match the Cora structure target.
- **H4**: Only add if Cora shows competitors using them. Low priority.
- **H5/H6**: Ignore completely.

### Heading Content Rules

- Pack keyword variations into H2 and H3 headings where natural.
- Pack relevant entities into headings where natural.
- Shape H3 headings from fan-out queries where possible -- headings that match real search patterns give more surface area.

### Word Count Discipline -- CRITICAL

Do NOT pad sections. Do NOT exceed the Cora target by more than 10%. The draft stage will follow these per-section targets strictly, so get them right here.

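
The 10% rule lends itself to a quick arithmetic check before the outline is written out. This is a minimal sketch with hypothetical names and made-up section targets; it uses integer arithmetic so that a total sitting exactly on the 10% boundary is not rejected by floating-point rounding.

```python
def check_word_budget(section_targets, cora_target, tolerance_pct=10):
    """Return (total, difference, ok) for per-section targets vs. the Cora total."""
    total = sum(section_targets.values())
    difference = total - cora_target
    # Integer form of: abs(difference) / cora_target <= tolerance_pct / 100
    ok = abs(difference) * 100 <= tolerance_pct * cora_target
    return total, difference, ok


# Made-up per-H2 targets, for illustration only.
sections = {"Intro": 150, "How it works": 500, "Common faults": 450, "Choosing a shop": 350}
print(check_word_budget(sections, cora_target=1500))  # (1450, -50, True)
print(check_word_budget(sections, cora_target=1200))  # (1450, 250, False)
```
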
### PART 2: Writer's Reference (bottom of file)

After the outline, add a clear divider and the data the draft writer needs. Keep this section compact.

```
---
# Writer's Reference -- DO NOT EDIT ABOVE THIS LINE
---
```

Include these sections:

**1. Variation Placement Map**

Table showing each keyword variation with page1_avg > 0, its target count, and which outline sections it belongs in:

```
| Variation | Target | Sections |
|-----------|--------|----------|
| ac drive repair | 9 | H1, Section 2, Section 4 |
| drive repair | 25 | Section 2, Section 3, Section 4 |
```

Only include variations with page1_avg > 0. Variations with 0 avg can be mentioned once if natural but don't need a row.

**2. Entity Checklist**

Just the entity names grouped by priority. No correlation scores, no deficit numbers -- the draft writer doesn't need them:

```
Must mention (1+ times each):
- variable frequency drive, vfd, inverter, frequency, ac drives, ...

Brand names (use in brands section):
- allen bradley

Low priority (mention if natural):
- plc, automation
```

Flag any outlier entities with a note: "servo -- competitor catalog inflates this, use 2-3x max"

**3. Top 20 LSI Terms**

Just the terms, no tables. The draft writer should weave these in naturally:

```
drive repair, test, inverter, solutions, torque, motor, power, energy, brands, equipment, ...
```

**4. Entity Rules**

- Never remove entity mentions -- only add. Removing entities can damage variation counts.
- Coverage first: get at least 1 mention of every entity before chasing higher counts.
- Variations take priority over entity deficit counts.

## Step 4: Self-Verification

Before finishing, verify:

- [ ] Outline heading counts match Cora structure targets (H1=1, H2, H3 counts)
- [ ] Every H2 section has an explicit word count target
- [ ] Section word counts add up to the Cora total (within 10%)
- [ ] Fan-out queries are separated with `<!-- FOQ SECTION -->` marker
- [ ] Writer's Reference has variation map, entity checklist, and LSI terms
- [ ] Outline section is clean -- no Cora numbers, no variation counts, no entity tables
- [ ] No local file paths anywhere in the output

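
The heading-count item on this checklist can be verified mechanically. The sketch below is one possible approach, not part of the pipeline: it assumes outline headings carry the `H1:` / `H2:` / `H3:` labels from the Part 1 template and that everything from the FOQ marker onward should be ignored. All names and the sample outline are hypothetical.

```python
import re
from collections import Counter

FOQ_MARKER = "<!-- FOQ SECTION"
# Matches lines such as "## H2: Section heading" and captures the label.
LABELLED_HEADING = re.compile(r"^#{1,6}\s+(H[1-4]):", re.MULTILINE)


def count_outline_headings(outline_md):
    """Count labelled headings that appear above the FOQ marker."""
    body = outline_md.split(FOQ_MARKER, 1)[0]
    return Counter(LABELLED_HEADING.findall(body))


def heading_mismatches(outline_md, targets):
    """Return {level: (found, wanted)} for every level that misses its target."""
    counts = count_outline_headings(outline_md)
    return {
        level: (counts.get(level, 0), wanted)
        for level, wanted in targets.items()
        if counts.get(level, 0) != wanted
    }


sample = """# ac drive repair -- Content Outline

## H1: AC Drive Repair

## H2: How AC Drive Repair Works
~300 words

### H3: Common VFD Faults

## H2: Choosing a Repair Shop
~250 words

---
<!-- FOQ SECTION - excluded from word count -->

### How long does a drive repair take?
"""
print(heading_mismatches(sample, {"H1": 1, "H2": 2, "H3": 3}))  # {'H3': (1, 3)}
```

An empty dict means every counted level matches its target.
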
## Output Files

You MUST write exactly 1 file to the current working directory. Use the **keyword** from the task context in the filename.

Example -- if the keyword is "fuel treatment":

| File | Format | Contents |
|------|--------|----------|
| `fuel treatment - Outline.md` | Markdown | Clean outline on top, writer's reference data on bottom |

Do NOT create any other files. Do NOT create subdirectories.

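
The filename pattern from the example can be built as follows. This is a small sketch with a hypothetical function name; it does not handle keywords containing path separators or other characters that are invalid in filenames, since the instructions above do not say how those should be treated.

```python
from pathlib import Path


def outline_path(keyword, directory=Path(".")):
    """Build '<keyword> - Outline.md' directly inside the given directory."""
    return directory / f"{keyword.strip()} - Outline.md"


print(outline_path("fuel treatment").name)  # fuel treatment - Outline.md
```
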