# CheddahBot Task Pipeline Flows: Complete Reference

## ClickUp Statuses Used

These are the ClickUp task statuses that CheddahBot reads and writes:

| Status | Set By | Meaning |
|--------|--------|---------|
| `to do` | Human (or default) | Task is waiting to be picked up |
| `automation underway` | CheddahBot | Bot is actively working on this task |
| `running cora` | CheddahBot (AutoCora) | Cora report is being generated by external worker |
| `outline review` | CheddahBot (Content) | Phase 1 outline is ready for human review |
| `outline approved` | Human | Human reviewed the outline, ready for Phase 2 |
| `pr needs review` | CheddahBot (Press Release) | Press release pipeline finished, PRs ready for human review |
| `internal review` | CheddahBot (Content/OPT) | Content/OPT pipeline finished, deliverables ready for human review |
| `complete` | CheddahBot (Link Building) | Pipeline fully done |
| `error` | CheddahBot | Something failed, needs attention |
| `in progress` | (configured but not used in automation) | - |

**What CheddahBot polls for:** `["to do", "outline approved"]` (config.yaml line 45)

---

## ClickUp Custom Fields Used

| Field Name | Type | Used By | What It Holds |
|------------|------|---------|---------------|
| `Work Category` | Dropdown | All pipelines | Determines which pipeline runs: "Press Release", "Link Building", "On Page Optimization", "Content Creation" |
| `PR Topic` | Text | Press Release | Press release topic/keyword (e.g., "Peek Plastic"); required |
| `Customer` | Text | Press Release | Client/company name; required |
| `Keyword` | Text | Link Building, Content, OPT | Target SEO keyword |
| `IMSURL` | Text | All pipelines | Target page URL (money site); required for Press Release |
| `SocialURL` | Text | Press Release | Branded/social URL for the PR |
| `LB Method` | Dropdown | Link Building | "Cora Backlinks" or other methods |
| `CustomAnchors` | Text | Link Building | Custom anchor text overrides |
| `BrandedPlusRatio` | Number | Link Building | Ratio for branded anchors (default 0.7) |
| `CLIFlags` | Text | Link Building, Content, OPT | Extra flags passed to tools (e.g., "service") |
| `CoraFile` | Text | Link Building | Path to Cora xlsx file |

**Tags:** Tasks are tagged with month in `mmmyy` format (e.g., `feb26`, `mar26`).
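
To make the tag convention concrete, here is a minimal Python sketch that derives the `mmmyy` tag from a date. The helper name `month_tag` is hypothetical and not taken from the CheddahBot code. The month abbreviations are hard-coded so the result does not depend on the system locale.

```python
from datetime import date

# Lowercase three-letter month abbreviations, hard-coded to avoid locale differences.
_MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
           "jul", "aug", "sep", "oct", "nov", "dec"]


def month_tag(d: date) -> str:
    """Return the `mmmyy` tag for a date (hypothetical helper, not CheddahBot's own)."""
    return f"{_MONTHS[d.month - 1]}{d.year % 100:02d}"
```

For example, `month_tag(date(2026, 2, 10))` yields `feb26`, matching the example tags above.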

---

## Background Threads

CheddahBot runs the daemon threads listed below. All start at boot and run until shutdown.

| Thread | Interval | What It Does |
|--------|----------|-------------|
| **poll** | 60 seconds | Runs cron-scheduled tasks from the database |
| **heartbeat** | 30 minutes | Reads HEARTBEAT.md checklist, takes action if needed |
| **clickup** | 20 minutes | Polls ClickUp for tasks to auto-execute (only Press Releases currently) |
| **folder_watch** | 40 minutes | Scans `//PennQnap1/SHARE1/cora-inbox` for .xlsx files → triggers Link Building |
| **autocora** | 5 minutes | Submits Cora jobs for today's tasks + polls for results |
| **content_watch** | 40 minutes | Scans `//PennQnap1/SHARE1/content-cora-inbox` for .xlsx files → triggers Content/OPT Phase 1 |
| **cora_distribute** | 40 minutes | Scans `//PennQnap1/SHARE1/Cora-For-Human` for .xlsx files → distributes to pipeline inboxes |
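
The general shape of such a loop can be sketched in Python as follows. This is an illustration of the pattern only, assuming a `threading.Event` is used for shutdown. The function name `start_loop` and its parameters are hypothetical and not taken from the CheddahBot source.

```python
import threading


def start_loop(name, interval_seconds, action, stop_event):
    """Run `action` repeatedly on a daemon thread until `stop_event` is set.

    Hypothetical helper: the real CheddahBot loops may be structured differently.
    """
    def run():
        while not stop_event.is_set():
            action()
            # wait() sleeps for the interval but returns early on shutdown.
            stop_event.wait(interval_seconds)

    thread = threading.Thread(target=run, name=name, daemon=True)
    thread.start()
    return thread
```

With this shape, the `clickup` row would correspond to something like `start_loop("clickup", 20 * 60, poll_clickup, stop)`, where `poll_clickup` is a placeholder. The sketch does not catch exceptions, so an error raised by `action` would end that thread.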

---

## Pipeline 1: PRESS RELEASE

**Work Category:** "Press Release"
**auto_execute:** TRUE (the only pipeline that runs automatically from ClickUp polling)
**Tool:** `write_press_releases`

### Flow

```
CLICKUP POLL (every 20 min)
│
├─ Finds task with Work Category = "Press Release", status = "to do", due within 3 weeks
│
▼
CHECK LOCAL DB
│   Key: clickup:task:{id}:state
│   If state = "executing" or "completed" or "failed" → SKIP (already handled)
│
▼
SET STATUS → "automation underway"
│   ClickUp API: PUT /task/{id} status
│   Local DB: state = "executing"
│
▼
STEP 1: Generate 7 Headlines (chat brain - GPT-4o-mini)
│   Uses configured chat model
│   Saves to: data/generated/press_releases/{company}/{slug}_headlines.txt
│
▼
STEP 2: AI Judge Picks Best 2 (chat brain)
│   Filters out rule-violating headlines (colons, superlatives, etc.)
│   Falls back to first 2 if judge returns < 2
│
▼
STEP 3: Write 2 Full Press Releases (execution brain - Claude Code CLI)
│   For each winning headline:
│     - Claude writes full 575-800 word PR
│     - Validates anchor phrase
│     - Saves .txt and .docx
│     - Uploads .docx to ClickUp as attachment
│
▼
STEP 4: Generate JSON-LD Schemas (execution brain - Sonnet)
│   For each PR:
│     - Generates NewsArticle schema
│     - Saves .json file
│
▼
SET STATUS → "internal review"
│   ClickUp API: comment with results + PUT status
│   Local DB: state = "completed"
│
▼
DONE: Human reviews in ClickUp
```
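
The CHECK LOCAL DB step above can be sketched in Python as follows. A plain `dict` stands in for the local KV store, and the names `HANDLED_STATES`, `state_key`, and `should_skip` are hypothetical. Only the key pattern and the three state values come from the flow itself.

```python
# States that the flow treats as "already handled".
HANDLED_STATES = {"executing", "completed", "failed"}


def state_key(task_id: str) -> str:
    """Build the local DB key for a ClickUp task's state."""
    return f"clickup:task:{task_id}:state"


def should_skip(kv: dict, task_id: str) -> bool:
    """Return True if the local DB already records this task as handled.

    `kv` is a plain dict standing in for the real KV store (an assumption).
    """
    entry = kv.get(state_key(task_id))
    return entry is not None and entry.get("state") in HANDLED_STATES
```

Under this logic a task whose entry says "executing" is skipped on every later poll, which is why a crash mid-step leaves it stuck until someone resets the entry.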

### ClickUp Fields Read

- `PR Topic` → press release topic/keyword (required)
- `Customer` → company name in PR (required)
- `IMSURL` → target URL for anchor link (required)
- `SocialURL` → branded URL (optional)
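
A pre-flight check for the required fields could look like the Python sketch below. It assumes the custom fields have already been read into a `dict` keyed by field name. The function name `missing_required` is hypothetical, and whether `write_press_releases` validates fields this way is not stated in this document.

```python
# Required custom fields for the Press Release pipeline, per the list above.
REQUIRED_FIELDS = ("PR Topic", "Customer", "IMSURL")


def missing_required(fields: dict) -> list:
    """Return the names of required fields that are absent or blank."""
    return [name for name in REQUIRED_FIELDS
            if not str(fields.get(name) or "").strip()]
```

An empty result means the task has everything the pipeline needs. `SocialURL` is optional, so it is not checked.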

### What Can Go Wrong

- **BUG: Crash mid-step → stuck forever.** DB says "executing", never retries. Manual reset needed.
- **BUG: DB says "completed" but ClickUp API failed → out of sync.** DB written before API call.
- **BUG: Attachment upload fails silently.** Task marked complete, files missing from ClickUp.
- Headline generation returns empty → tool exits with error, task marked "failed"
- Schema JSON invalid → warning logged but task still completes

---

## Pipeline 2: LINK BUILDING (Cora Backlinks)

**Work Category:** "Link Building"
**auto_execute:** FALSE (triggered by folder watcher, not ClickUp polling)
**Tool:** `run_cora_backlinks`

### Full Lifecycle (4 stages)

```
STAGE A: AUTOCORA SUBMITS CORA JOB
══════════════════════════════════

AUTOCORA LOOP (every 5 min)
│
├─ Calls submit_autocora_jobs(target_date = today)
│   Finds tasks: Work Category in ["Link Building", "On Page Optimization", "Content Creation"]
│                status = "to do"
│                due date = TODAY (exact 24h window) ← ★ BUG: misses overdue tasks
│
├─ Groups tasks by Keyword (case-insensitive)
│   If same keyword across multiple tasks → one job covers all
│
├─ For each keyword group:
│   Check local DB: autocora:job:{keyword_lower}
│   If already submitted → SKIP
│
▼
WRITE JOB FILE
│   Path: //PennQnap1/SHARE1/AutoCora/jobs/{job-id}.json
│   Content: {"keyword": "...", "url": "IMSURL", "task_ids": ["id1", "id2"]}
│   Local DB: autocora:job:{keyword} = {status: "submitted", job_id: "..."}
│
▼
SET ALL TASK STATUSES → "automation underway"


STAGE B: EXTERNAL WORKER RUNS CORA (not CheddahBot code)
═════════════════════════════════════════════════════════

Worker on another machine:
│   Watches //PennQnap1/SHARE1/AutoCora/jobs/
│   Picks up .json, runs Cora SEO tool
│   Writes .xlsx report to Z:/cora-inbox/ ← auto-deposited
│   Writes //PennQnap1/SHARE1/AutoCora/results/{job-id}.result = "SUCCESS" or "FAILURE: reason"


STAGE C: AUTOCORA POLLS FOR RESULTS
════════════════════════════════════

AUTOCORA LOOP (every 5 min)
│
├─ Scans local DB for autocora:job:* with status = "submitted"
│   For each: checks if results/{job-id}.result exists
│
├─ If SUCCESS:
│   Local DB: status = "completed"
│   ClickUp: all task_ids → status = "running cora"
│   ClickUp: comment "Cora report completed for keyword: ..."
│
├─ If FAILURE:
│   Local DB: status = "failed"
│   ClickUp: all task_ids → status = "error"
│   ClickUp: comment with failure reason
│
└─ If no result file yet: skip, check again in 5 min


STAGE D: FOLDER WATCHER TRIGGERS LINK BUILDING
═══════════════════════════════════════════════

FOLDER WATCHER (every 60 min)
│
├─ Scans Z:/cora-inbox/ for .xlsx files
│   Skips: ~$ temp files, already-completed files (via local DB)
│
├─ For each new .xlsx:
│   Normalize filename: "anti-vibration-rubber-mounts.xlsx" → "anti vibration rubber mounts"
│
▼
MATCH TO CLICKUP TASK
│   Queries all tasks in space with Work Category = "Link Building"
│   Fuzzy matches Keyword field against normalized filename:
│     - Exact match
│     - Substring match (either direction)
│     - >80% word overlap
│
├─ NO MATCH → local DB: status = "unmatched", notification sent, retry next scan
│
├─ MATCH FOUND but IMSURL empty → local DB: status = "blocked", ClickUp → "error"
│
▼
SET STATUS → "automation underway"
│
▼
STEP 1: Ingest CORA Report (Big-Link-Man subprocess)
│   Runs: E:/dev/Big-Link-Man/.venv/Scripts/python.exe main.py ingest-cora -f {xlsx} -n {keyword} ...
│   BLM parses xlsx, creates project, writes job file
│   Timeout: 30 minutes
│   ClickUp: comment "CORA report ingested. Project ID: ..."
│
▼
STEP 2: Generate Content Batch (Big-Link-Man subprocess)
│   Runs: python main.py generate-batch -j {job_file} --continue-on-error
│   BLM generates content for each prospect
│   Moves job file to jobs/done/
│
▼
SET STATUS → "complete"
│   ClickUp: comment with results
│   Move .xlsx to Z:/cora-inbox/processed/
│   Local DB: linkbuilding:watched:{filename} = {status: "completed"}
│
▼
DONE
```
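
The keyword grouping and job-file content from Stage A can be sketched in Python as follows. The task dict shape (`id`, `keyword`, `imsurl`) and both function names are hypothetical. Taking the URL from the first task in a group is also an assumption, since this document does not say which task's `IMSURL` is used when several tasks share a keyword.

```python
import json
from collections import defaultdict


def group_by_keyword(tasks):
    """Group task dicts by lowercased Keyword so one job can cover all of them."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task["keyword"].lower()].append(task)
    return dict(groups)


def build_job_payload(keyword, tasks):
    """Return the JSON text for one job file, using the keys shown in Stage A."""
    return json.dumps({
        "keyword": keyword,
        "url": tasks[0]["imsurl"],  # assumption: the first task's IMSURL is used
        "task_ids": [t["id"] for t in tasks],
    })
```

Two tasks with keywords "Rubber Mounts" and "rubber mounts" land in the same group and produce a single payload listing both task IDs.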

### ClickUp Fields Read

- `Keyword` → matches against .xlsx filename + used as project name
- `IMSURL` → money site URL (required)
- `LB Method` → must be "Cora Backlinks" or empty
- `CustomAnchors`, `BrandedPlusRatio`, `CLIFlags` → passed to BLM
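
The filename normalization and the three matching rules from Stage D (exact, substring in either direction, more than 80% word overlap) can be sketched in Python as follows. The function names are hypothetical. The overlap formula, shared words divided by the size of the larger word set, is an assumption, because this document does not define how overlap is measured.

```python
from pathlib import PurePath


def normalize_filename(filename: str) -> str:
    """Drop the extension, turn hyphens into spaces, lowercase, collapse whitespace."""
    stem = PurePath(filename).stem
    return " ".join(stem.replace("-", " ").lower().split())


def keyword_matches(keyword: str, normalized: str) -> bool:
    """Apply the exact, substring, and word-overlap rules in that order."""
    kw = " ".join(keyword.lower().split())
    if not kw or not normalized:
        return False
    # Exact match, or substring match in either direction.
    if kw == normalized or kw in normalized or normalized in kw:
        return True
    # Word overlap: shared words relative to the larger word set (assumed formula).
    kw_words, file_words = set(kw.split()), set(normalized.split())
    overlap = len(kw_words & file_words) / max(len(kw_words), len(file_words))
    return overlap > 0.8
```

Because the substring rule works in both directions, a short keyword such as "rubber mounts" matches a longer filename that contains it.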

### What Can Go Wrong

- **BUG: AutoCora only checks today's tasks.** Due date missed = never gets a Cora report.
- **BUG: Crash mid-step → stuck "executing".** Same as PR pipeline.
- No ClickUp task with matching Keyword → file sits unmatched, notification sent
- IMSURL empty → blocked, ClickUp set to "error"
- BLM subprocess timeout (30 min) or crash → task fails
- Network share offline → can't write job file or read results

### Retry Behavior

- "processing", "blocked", "unmatched" .xlsx files → retried on next scan (KV entry deleted)
- "completed", "failed" → never retried
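
The retry rules above can be sketched in Python as follows. A plain `dict` stands in for the KV store, and the function name `prepare_scan` is hypothetical. Skipping entries with an unrecognized status is an assumption, as this document only describes the five listed statuses.

```python
# Statuses whose KV entry is deleted so the file is picked up again.
RETRYABLE = {"processing", "blocked", "unmatched"}


def prepare_scan(kv: dict, filename: str) -> bool:
    """Return True if the watcher should process this file on the current scan."""
    key = f"linkbuilding:watched:{filename}"
    entry = kv.get(key)
    if entry is None:
        return True  # never seen before
    if entry.get("status") in RETRYABLE:
        del kv[key]  # forget the earlier attempt so the file looks new
        return True
    return False  # "completed" or "failed": never retried
```

Deleting the entry, rather than updating it, mirrors the "KV entry deleted" note in the first bullet.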

---

## Pipeline 3: CONTENT CREATION

**Work Category:** "Content Creation"
**auto_execute:** FALSE (triggered by content folder watcher)
**Tool:** `create_content` (two-phase)

### Flow

```
STAGE A: AUTOCORA SUBMITS CORA JOB (same as Link Building Stage A)
══════════════════════════════════════════════════════════════════
Same AutoCora loop, same BUG with today-only filtering.
Worker generates .xlsx → deposits in Z:/content-cora-inbox/


STAGE B: CONTENT WATCHER TRIGGERS PHASE 1
══════════════════════════════════════════

CONTENT WATCHER (every 60 min)
│
├─ Scans Z:/content-cora-inbox/ for .xlsx files
│   Same skip/retry logic as link building watcher
│
├─ Normalize filename, fuzzy match to ClickUp task
│   Matches: Work Category in ["Content Creation", "On Page Optimization"]
│
├─ NO MATCH → "unmatched", notification
│
▼
PHASE 1: Research + Outline (execution brain - Claude Code CLI)
│
│   ★ BUG: Does NOT set "automation underway" status (link building watcher does)
│
│   Build prompt based on content type:
│     - If IMSURL present → "optimize existing page" (scrape it, analyze, outline improvements)
│     - If IMSURL empty → "new content" (competitor research, outline from scratch)
│     - If Cora .xlsx found → "use this Cora report for keyword targets and entities"
│     - If CLIFlags contains "service" → includes service page template
│
│   Claude Code runs: web searches, scrapes competitors, reads Cora report
│   Generates outline with entity recommendations
│
▼
SAVE OUTLINE
│   Path: Z:/content-outlines/{keyword-slug}/outline.md
│   Local DB: clickup:task:{id}:state = {state: "outline_review", outline_path: "..."}
│
▼
SET STATUS → "outline review"
│   ClickUp: comment "Outline ready for review"
│
│   ★ BUG: .xlsx NOT moved to processed/ (link building watcher moves files)
│
▼
WAITING FOR HUMAN
│   Human opens outline at Z:/content-outlines/{slug}/outline.md
│   Human edits/approves
│   Human moves ClickUp task to "outline approved"


STAGE C: CLICKUP POLL TRIGGERS PHASE 2
═══════════════════════════════════════

CLICKUP POLL (every 20 min)
│
├─ Finds task with status = "outline approved" (in poll_statuses list)
│
├─ Check local DB: clickup:task:{id}:state
│   Sees state = "outline_review" → this means Phase 2 is ready
│   ★ BUG: If DB was wiped, no entry → runs Phase 1 AGAIN, overwrites outline
│
▼
PHASE 2: Write Full Content (execution brain - Claude Code CLI)
│
│   Reads outline from path stored in local DB (outline_path)
│   ★ BUG: If outline file was deleted → Phase 2 fails every time, no recovery
│
│   Claude Code writes full content using the approved outline
│   Includes entity optimization, keyword density targets from Cora
│
▼
SAVE FINAL CONTENT
│   Path: Z:/content-outlines/{keyword-slug}/final-content.md
│   Local DB: state = "completed"
│
▼
SET STATUS → "internal review"
│   ClickUp: comment with content path
│
▼
DONE: Human reviews final content
```
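
The Stage C decision between Phase 1 and Phase 2 can be sketched in Python as follows. A plain `dict` stands in for the local DB, and the function name `choose_phase` is hypothetical. Returning `None` for any other recorded state is an assumption. The sketch reproduces the DB-wipe behaviour described in the flow and does not fix it.

```python
def choose_phase(kv: dict, task_id: str, clickup_status: str):
    """Return 1, 2, or None (skip) for a polled Content/OPT task."""
    entry = kv.get(f"clickup:task:{task_id}:state")
    if entry is None:
        # No local record: the task looks new, so Phase 1 runs.
        # After a DB wipe this is what overwrites an approved outline.
        return 1
    if clickup_status == "outline approved" and entry.get("state") == "outline_review":
        return 2
    return None  # assumption: every other recorded state is skipped
```

The first branch shows why the local entry matters: without it, an "outline approved" task cannot be told apart from a brand-new one.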

### ClickUp Fields Read

- `Keyword` → target keyword, used for Cora matching and content generation
- `IMSURL` → if present = optimization, if empty = new content
- `CLIFlags` → hints like "service" for service page template
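
How these fields steer the Phase 1 prompt can be sketched in Python as follows. The function name `content_mode` and the returned dict shape are hypothetical. The two mode strings and the "service" check come from the Phase 1 description.

```python
def content_mode(fields: dict) -> dict:
    """Derive the Phase 1 prompt choices from a task's custom fields."""
    imsurl = (fields.get("IMSURL") or "").strip()
    cli_flags = fields.get("CLIFlags") or ""
    return {
        # IMSURL present means an existing page is optimized; empty means new content.
        "mode": "optimize existing page" if imsurl else "new content",
        # The service page template is included when CLIFlags contains "service".
        "service_template": "service" in cli_flags,
    }
```

A task with an `IMSURL` and `CLIFlags` set to "service" would therefore get the optimization prompt with the service page template included.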

### What Can Go Wrong

- **BUG: AutoCora only checks today → Cora report never generated for overdue tasks**
- **BUG: DB wipe → Phase 2 reruns Phase 1, destroys approved outline**
- **BUG: Outline file deleted → Phase 2 permanently fails**
- **BUG: No "automation underway" set during Phase 1 from watcher**
- **BUG: .xlsx not moved to processed/**
- Network share offline → can't save outline or read it back

---

## Pipeline 4: ON PAGE OPTIMIZATION

**Work Category:** "On Page Optimization"
**auto_execute:** FALSE
**Tool:** `create_content` (same as Content Creation)

### Flow

Identical to Content Creation except:
- Phase 1 prompt says "optimize existing page at {IMSURL}" instead of "create new content"
- Phase 1 scrapes the existing page first, then builds optimization outline
- IMSURL is always present (it's the page being optimized)

Same bugs apply.

---

## The Local DB (KV Store): What It Tracks

| Key Pattern | What It Stores | Read By | Actually Needed? |
|---|---|---|---|
| `clickup:task:{id}:state` | Full task execution state (status, timestamps, outline_path, errors) | ClickUp poll dedup check, Phase 2 detection | **PARTIALLY**: outline_path is needed for Phase 2, but dedup could use ClickUp status instead |
| `autocora:job:{keyword}` | Job submission tracking (job_id, status, task_ids) | AutoCora result poller | **YES**: maps keyword to job_id for result file lookup |
| `linkbuilding:watched:{filename}` | File processing state (processing/completed/failed/unmatched/blocked) | Folder watcher scan | **YES**: prevents re-processing files |
| `content:watched:{filename}` | Same as above for content files | Content watcher scan | **YES**: prevents re-processing |
| `pipeline:status` | Current step text for UI ("Step 2/4: Judging...") | Gradio UI polling | **NO**: just a display string, could be in-memory |
| `linkbuilding:status` | Same for link building UI | Gradio UI polling | **NO**: same |
| `system:loop:*:last_run` (x6) | Timestamp of last loop run | Dashboard API | **NO**: informational only, never used in logic |
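
As an illustration of how the `autocora:job:{keyword}` row is used, the Python sketch below does the prefix scan that the AutoCora result poller performs when it looks for submitted jobs. A plain `dict` stands in for the KV store, and the function name `submitted_jobs` is hypothetical.

```python
JOB_PREFIX = "autocora:job:"


def submitted_jobs(kv: dict) -> dict:
    """Map keyword -> job entry for every AutoCora job still marked "submitted"."""
    return {
        key[len(JOB_PREFIX):]: entry
        for key, entry in kv.items()
        if key.startswith(JOB_PREFIX) and entry.get("status") == "submitted"
    }
```

Each returned entry carries the `job_id` needed to look up the matching result file.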

---

## Summary of All Bugs

| # | Bug | Severity | Pipelines Affected |
|---|-----|----------|-------------------|
| 1 | AutoCora only submits for today's due date | HIGH | Link Building, Content, OPT |
| 2 | DB wipe → Phase 2 reruns Phase 1 | HIGH | Content, OPT |
| 3 | Stuck "executing" after crash, no recovery | HIGH | All 4 |
| 4 | Content watcher missing "automation underway" | MEDIUM | Content, OPT |
| 5 | Content watcher doesn't move .xlsx to processed/ | MEDIUM | Content, OPT |
| 6 | KV written before ClickUp API → out of sync | MEDIUM | All 4 |
| 7 | Silent attachment upload failures | MEDIUM | Press Release |
| 8 | Phase 2 fails permanently if outline file gone | LOW | Content, OPT |