# Link Building Agent Plan
## Context
CheddahBot needs a link building agent that orchestrates the external Big-Link-Man CLI tool (`E:/dev/Big-Link-Man/`). The current workflow is manual: run Cora on another machine → get .xlsx → manually run `main.py ingest-cora` → manually run `main.py generate-batch`. This agent automates the last two steps (`ingest-cora` and `generate-batch`), triggered by folder watching, ClickUp tasks, or chat commands. It must be extensible to future link building methods (MCP server path, ingest-simple, etc.).
## Decisions Made
- **Watch folder**: `Z:/cora-inbox` (network drive, accessible from the Cora machine)
- **File→task matching**: Fuzzy match .xlsx filename stem against ClickUp task's `Keyword` custom field
- **New ClickUp field "LB Method"**: Dropdown with initial option "Cora Backlinks" (more added later)
- **Dashboard**: API endpoint + NotificationBus events only (no frontend work — separate project)
- **Sidecar files**: Not needed — all metadata comes from the matching ClickUp task
- **Tool naming**: Orchestrator pattern — `run_link_building` is a thin dispatcher that reads `LB Method` and routes to the specific pipeline tool (e.g., `run_cora_backlinks`). Future link building methods get their own tools and slot into the orchestrator.
## Files to Create
### 1. `cheddahbot/tools/linkbuilding.py` — Main tool module
Four `@tool`-decorated functions + private helpers:
**`run_link_building(lb_method="", xlsx_path="", project_name="", money_site_url="", branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- **Orchestrator/dispatcher** — reads `lb_method` (from ClickUp "LB Method" field or chat) and routes to the correct pipeline tool
- If `lb_method` is "Cora Backlinks" or empty (default): calls `run_cora_backlinks()`
- Future: if `lb_method` is "MCP Link Building": calls `run_mcp_link_building()` (not yet implemented)
- Passes all other args through to the sub-tool
- This is what the ClickUp skill_map always routes to
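
A minimal sketch of the dispatcher idea, assuming a simple registry dict keyed by the `LB Method` value. The `PIPELINES` registry and the stubbed `run_cora_backlinks` body are illustrative placeholders, not the final tool code:

```python
from typing import Callable, Dict


def run_cora_backlinks(**kwargs) -> str:
    # Placeholder standing in for the real Cora pipeline tool.
    return f"cora pipeline for {kwargs.get('xlsx_path', '')}"


# Hypothetical registry: LB Method value -> pipeline callable.
PIPELINES: Dict[str, Callable[..., str]] = {
    "Cora Backlinks": run_cora_backlinks,
}
DEFAULT_METHOD = "Cora Backlinks"


def run_link_building(lb_method: str = "", **kwargs) -> str:
    # An empty lb_method falls back to the default pipeline.
    method = lb_method.strip() or DEFAULT_METHOD
    pipeline = PIPELINES.get(method)
    if pipeline is None:
        return f"Unknown or not yet implemented LB Method: {method!r}"
    # All remaining arguments pass straight through to the sub-tool.
    return pipeline(**kwargs)
```

With this shape, a future method would only need a new tool function and one new registry entry.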
**`run_cora_backlinks(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- The actual Cora pipeline — runs ingest-cora → generate-batch
- Step 1: Build CLI args, call `_run_blm_command(["ingest-cora", ...])`, parse stdout for job file path
- Step 2: Call `_run_blm_command(["generate-batch", "-j", job_file, "--continue-on-error"])`
- Updates KV store state and posts ClickUp comments at each step (following press_release.py pattern)
- Returns `## ClickUp Sync` in output to signal scheduler that sync was handled internally
- Can also be called directly from chat for explicit Cora runs
**`blm_ingest_cora(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- Standalone ingest — runs ingest-cora only, returns project ID and job file path
- For cases where user wants to ingest but not generate yet
**`blm_generate_batch(job_file, continue_on_error=True, debug=False, ctx=None)`**
- Standalone generate — runs generate-batch only on an existing job file
- For re-running generation or running a manually-created job
**Private helpers:**
- `_run_blm_command(args, timeout=1800)` — subprocess wrapper, runs `uv run python main.py <args>` from BLM_DIR, injects `-u`/`-p` from `BLM_USERNAME`/`BLM_PASSWORD` env vars
- `_parse_ingest_output(stdout)` — regex extract project_id + job_file path
- `_parse_generate_output(stdout)` — extract completion stats
- `_build_ingest_args(...)` — construct CLI argument list from tool params
- `_set_status(ctx, message)` — write pipeline status to KV store (for UI polling)
- `_sync_clickup(ctx, task_id, step, message)` — post comment + update state
**Critical: always pass `-m` flag** to ingest-cora to prevent interactive stdin prompt from blocking the subprocess.
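
One possible shape for the subprocess wrapper, as a sketch only. Splitting command construction into a separate `build_blm_command` helper is an assumption made here for testability, and appending `-u`/`-p` after the other arguments is also assumed; the CLI reference should confirm where those flags belong:

```python
import os
import subprocess
from typing import List, Mapping, Optional, Sequence

BLM_DIR = "E:/dev/Big-Link-Man"


def build_blm_command(
    args: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Build the `uv run python main.py <args>` command list."""
    env = os.environ if env is None else env
    cmd = ["uv", "run", "python", "main.py", *args]
    # Inject credentials from env vars when present (flag position is assumed).
    if env.get("BLM_USERNAME"):
        cmd += ["-u", env["BLM_USERNAME"]]
    if env.get("BLM_PASSWORD"):
        cmd += ["-p", env["BLM_PASSWORD"]]
    return cmd


def run_blm_command(
    args: Sequence[str], timeout: int = 1800, blm_dir: str = BLM_DIR
) -> subprocess.CompletedProcess:
    """Run the CLI from the Big-Link-Man directory and capture its output."""
    return subprocess.run(
        build_blm_command(args),
        cwd=blm_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Passing the command as a list (no `shell=True`) avoids quoting problems with paths that contain spaces.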
### 2. `skills/linkbuilding.md` — Skill file
YAML frontmatter linking to `[run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder]` tools and `[link_builder, default]` agents. Markdown body describes when to use, default flags, workflow steps.
### 3. `tests/test_linkbuilding.py` — Test suite (~40 tests)
All tests mock `subprocess.run` — never call Big-Link-Man. Categories:
- Output parser unit tests (`_parse_ingest_output`, `_parse_generate_output`)
- CLI arg builder tests (all flag combinations, missing required params)
- Full pipeline integration (happy path, ingest failure, generate failure)
- ClickUp state machine (executing → completed, executing → failed)
- Folder watcher scan logic (new files, skip processed, missing ClickUp match)
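
As a rough illustration of the mocking approach, the sketch below patches `subprocess.run` so no real CLI is invoked. The `run_blm` function is a hypothetical stand-in for the real wrapper, and the stdout string is made up for the test rather than taken from Big-Link-Man:

```python
import subprocess
from typing import Sequence
from unittest import mock


def run_blm(args: Sequence[str]) -> str:
    """Hypothetical stand-in for a tool that shells out to Big-Link-Man."""
    result = subprocess.run(
        ["uv", "run", "python", "main.py", *args],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "command failed")
    return result.stdout


def test_happy_path_returns_stdout():
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="ok", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert run_blm(["ingest-cora", "-m"]) == "ok"
        run.assert_called_once()


def test_failure_raises_with_stderr():
    fake = subprocess.CompletedProcess(args=[], returncode=1, stdout="", stderr="boom")
    with mock.patch("subprocess.run", return_value=fake):
        try:
            run_blm(["ingest-cora", "-m"])
        except RuntimeError as exc:
            assert "boom" in str(exc)
        else:
            raise AssertionError("expected RuntimeError")
```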
## Files to Modify
### 4. `cheddahbot/config.py` — Add LinkBuildingConfig
```python
from dataclasses import dataclass


@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""  # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7
```
Add `link_building: LinkBuildingConfig` field to `Config` dataclass. Add YAML loading block in `load_config()` (same pattern as memory/scheduler/shell). Add env var override for `BLM_DIR`.
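
A sketch of how the loading block might look, assuming `load_config()` already has the parsed YAML as a dict. The `load_link_building` helper name is hypothetical, and ignoring unknown keys is a design choice made here, not something the existing loader is known to do:

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""  # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7


def load_link_building(
    raw: Mapping[str, Any], env: Optional[Mapping[str, str]] = None
) -> LinkBuildingConfig:
    """Build the config from parsed YAML, then apply the BLM_DIR override."""
    env = os.environ if env is None else env
    section = raw.get("link_building") or {}
    known = {f.name for f in fields(LinkBuildingConfig)}
    # Unknown keys are dropped so a typo in config.yaml cannot crash startup.
    cfg = LinkBuildingConfig(**{k: v for k, v in section.items() if k in known})
    if env.get("BLM_DIR"):
        cfg.blm_dir = env["BLM_DIR"]
    return cfg
```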
### 5. `config.yaml` — Three additions
**New top-level section:**
```yaml
link_building:
  blm_dir: "E:/dev/Big-Link-Man"
  watch_folder: "Z:/cora-inbox"
  watch_interval_minutes: 60
  default_branded_plus_ratio: 0.7
```
**New skill_map entry under clickup:**
```yaml
"Link Building":
  tool: "run_link_building"
  auto_execute: false  # Cora Backlinks triggered by folder watcher, not scheduler
  complete_status: "complete"  # Override: use "complete" instead of "internal review"
  error_status: "internal review"  # On failure, move to internal review
  field_mapping:
    lb_method: "LB Method"
    project_name: "task_name"
    money_site_url: "IMSURL"
    custom_anchors: "CustomAnchors"
    branded_plus_ratio: "BrandedPlusRatio"
    cli_flags: "CLIFlags"
    xlsx_path: "CoraFile"
```
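
To illustrate how the `field_mapping` above could turn a ClickUp task into tool arguments, here is a small sketch. The task shape (a `name` plus a flat `custom_fields` dict) and the `map_task_to_args` helper are simplifying assumptions; real ClickUp payloads are more deeply nested:

```python
from typing import Any, Dict, Mapping

FIELD_MAPPING = {
    "lb_method": "LB Method",
    "project_name": "task_name",
    "money_site_url": "IMSURL",
    "custom_anchors": "CustomAnchors",
    "branded_plus_ratio": "BrandedPlusRatio",
    "cli_flags": "CLIFlags",
    "xlsx_path": "CoraFile",
}


def map_task_to_args(
    task: Mapping[str, Any], field_mapping: Mapping[str, str] = FIELD_MAPPING
) -> Dict[str, Any]:
    """Translate a (simplified) task dict into tool keyword arguments."""
    custom = task.get("custom_fields", {})
    args: Dict[str, Any] = {}
    for param, source in field_mapping.items():
        # "task_name" is a special source meaning the task's own name.
        value = task.get("name") if source == "task_name" else custom.get(source)
        if value in (None, ""):
            continue  # leave unset so the tool default applies
        args[param] = value
    # BrandedPlusRatio is a Short Text field, so convert it to a float.
    if "branded_plus_ratio" in args:
        args["branded_plus_ratio"] = float(args["branded_plus_ratio"])
    return args
```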
**New agent:**
```yaml
- name: link_builder
  display_name: Link Builder
  tools: [run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder, delegate_task, remember, search_memory]
  memory_scope: ""
```
### 6. `cheddahbot/scheduler.py` — Add folder watcher (4th daemon thread)
**New thread `_folder_watch_loop`** alongside existing poll, heartbeat, and ClickUp threads:
- Starts if `config.link_building.watch_folder` is non-empty
- Runs every `watch_interval_minutes` (default 60)
- `_scan_watch_folder()` globs `*.xlsx` in watch folder
- For each file, checks KV store `linkbuilding:watched:{filename}` — skip if already processed
- **Fuzzy-matches filename stem against ClickUp tasks** with `LB Method = "Cora Backlinks"` and status "to do":
  - Queries ClickUp for Link Building tasks
  - Compares normalized filename stem against each task's `Keyword` custom field
  - If match found: extracts money_site_url from IMSURL field, cli_flags from CLIFlags field, etc.
  - If no match: logs warning, marks as "unmatched" in KV store, sends notification asking user to create/link a ClickUp task
- On match: executes `run_link_building` tool with args from the ClickUp task fields
- On completion: moves .xlsx to `Z:/cora-inbox/processed/` subfolder, updates KV state
- On failure: updates KV state with error, notifies via NotificationBus
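
The fuzzy-matching step could be sketched with the standard library's `difflib`. The normalization rule, the 0.85 threshold, and the simplified task dicts (with a `keyword` key) are all assumptions for illustration and would need tuning against real filenames:

```python
import re
from difflib import SequenceMatcher
from pathlib import PurePath
from typing import Iterable, Mapping, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to a single space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def match_file_to_task(
    filename: str, tasks: Iterable[Mapping[str, str]], threshold: float = 0.85
) -> Optional[Mapping[str, str]]:
    """Return the task whose keyword best matches the filename stem, if any."""
    stem = normalize(PurePath(filename).stem)
    best, best_score = None, 0.0
    for task in tasks:
        keyword = normalize(task.get("keyword", ""))
        if not keyword:
            continue  # tasks without a Keyword can never match
        score = SequenceMatcher(None, stem, keyword).ratio()
        if score > best_score:
            best, best_score = task, score
    return best if best_score >= threshold else None
```

Returning `None` below the threshold maps onto the "unmatched" branch above, where the watcher logs a warning and notifies the user.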
**File handling after pipeline:**
- On success: .xlsx moved from `Z:/cora-inbox/` → `Z:/cora-inbox/processed/`
- On failure: .xlsx stays in `Z:/cora-inbox/` (KV store marks it as failed so watcher doesn't retry automatically; user can reset KV entry to retry)
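
A minimal sketch of the scan-and-move behavior, using a plain dict in place of the KV store and a callable in place of the pipeline. The `process_watch_folder` name and the stored status strings are hypothetical:

```python
import shutil
from pathlib import Path
from typing import Callable, MutableMapping


def process_watch_folder(
    watch_folder: Path,
    kv: MutableMapping[str, str],
    pipeline: Callable[[Path], None],
) -> None:
    """Run the pipeline once per new .xlsx and record the outcome in the KV store."""
    processed_dir = watch_folder / "processed"
    # The glob is non-recursive, so files already in processed/ are not rescanned.
    for xlsx in sorted(watch_folder.glob("*.xlsx")):
        key = f"linkbuilding:watched:{xlsx.name}"
        if key in kv:
            continue  # already completed or failed; no automatic retry
        try:
            pipeline(xlsx)
        except Exception as exc:
            # The file stays in place; deleting the KV entry allows a retry.
            kv[key] = f"failed: {exc}"
            continue
        processed_dir.mkdir(exist_ok=True)
        shutil.move(str(xlsx), str(processed_dir / xlsx.name))
        kv[key] = "completed"
```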
**Also adds `scan_cora_folder` tool** (can live in linkbuilding.py):
- Chat-invocable utility for the agent to check what's in the watch folder
- Returns list of unprocessed .xlsx files with ClickUp match status
- Internal agent tool, not a dashboard concern
### 7. `cheddahbot/clickup.py` — Add field creation method
Add `create_custom_field(list_id, name, field_type, type_config=None)` method that calls `POST /list/{list_id}/field`. Used by the setup tool to auto-create custom fields across lists.
### 8. `cheddahbot/__main__.py` — Add API endpoint
Add before Gradio mount:
```python
@fastapi_app.get("/api/linkbuilding/status")
async def linkbuilding_status():
    """Return link building status for dashboard consumption."""
    # Returns:
    # {
    #     "pending_cora_runs": [
    #         {"keyword": "precision cnc machining", "url": "https://...", "client": "Chapter 2", "task_id": "abc123"},
    #         ...
    #     ],
    #     "in_progress": [...],  # Currently executing pipelines
    #     "completed": [...],    # Recently completed (last 7 days)
    #     "failed": [...]        # Failed tasks needing attention
    # }
```
The `pending_cora_runs` section is the key dashboard data. It queries ClickUp for "to do" tasks with Work Category="Link Building" and LB Method="Cora Backlinks" and returns each task's `Keyword` field and `IMSURL` (as a copyable URL), so the user can see exactly which Cora reports need to be run.
Also push link building events to NotificationBus (category="linkbuilding") at each pipeline step for future real-time dashboard support.
No other `__main__.py` changes needed — agent wiring is automatic from config.yaml.
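
One way the response body might be assembled, sketched as a pure function so it can be tested without FastAPI or ClickUp. The flat task dicts and the status-to-bucket mapping are assumptions; in particular, the real endpoint may read in-progress and completed runs from the KV store instead, and the 7-day filter is omitted here:

```python
from typing import Any, Dict, Iterable, List, Mapping

# Assumed mapping from ClickUp status to response bucket.
STATUS_BUCKETS = {
    "to do": "pending_cora_runs",
    "in progress": "in_progress",
    "complete": "completed",
    "internal review": "failed",
}


def build_status_payload(
    tasks: Iterable[Mapping[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group simplified task dicts into the buckets the dashboard expects."""
    payload: Dict[str, List[Dict[str, Any]]] = {
        bucket: [] for bucket in STATUS_BUCKETS.values()
    }
    for task in tasks:
        bucket = STATUS_BUCKETS.get(task.get("status", ""))
        if bucket is None:
            continue  # statuses outside the lifecycle are ignored
        payload[bucket].append(
            {
                "keyword": task.get("keyword", ""),
                "url": task.get("url", ""),
                "client": task.get("client", ""),
                "task_id": task.get("task_id", ""),
            }
        )
    return payload
```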
## ClickUp Custom Fields (Auto-Created)
New custom fields to be created programmatically:
| Field | Type | Purpose |
|-------|------|---------|
| `LB Method` | Dropdown | Link building subtype. Initial option: "Cora Backlinks" |
| `Keyword` | Short Text | Target keyword (used for file matching) |
| `CoraFile` | Short Text | Path to .xlsx file (optional, set by agent after file match) |
| `CustomAnchors` | Short Text | Comma-separated anchor text overrides |
| `BrandedPlusRatio` | Short Text | Override for `-bp` flag (e.g., "0.7") |
| `CLIFlags` | Short Text | Raw additional CLI flags (e.g., "-r 5 -t 0.3") |
Fields that already exist and will be reused: `Client`, `IMSURL`, `Work Category` (add "Link Building" option).
### Auto-creation approach
- Add `create_custom_field(list_id, name, field_type, type_config=None)` method to `cheddahbot/clickup.py`, which calls `POST /list/{list_id}/field`
- Add a `setup_linkbuilding_fields` tool (category="linkbuilding") that:
  1. Gets all list IDs in the space
  2. For each list, checks if fields already exist (via `get_custom_fields`)
  3. Creates missing fields via the new API method
  4. For `LB Method` dropdown, creates with `type_config` containing "Cora Backlinks" option
  5. For `Work Category`, adds "Link Building" option if missing
- This tool runs once during initial setup, or can be re-run if new lists are added
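
The "create only what is missing" check can be sketched as a pure function, which keeps the setup tool safe to re-run. The `missing_fields` helper and the case-insensitive name comparison are assumptions; the type labels are copied from the table above rather than from ClickUp's API type names:

```python
from typing import Dict, Iterable, Mapping

# Field names and types as listed in the table above.
REQUIRED_FIELDS = {
    "LB Method": "Dropdown",
    "Keyword": "Short Text",
    "CoraFile": "Short Text",
    "CustomAnchors": "Short Text",
    "BrandedPlusRatio": "Short Text",
    "CLIFlags": "Short Text",
}


def missing_fields(
    existing_names: Iterable[str],
    required: Mapping[str, str] = REQUIRED_FIELDS,
) -> Dict[str, str]:
    """Return the required fields that a list does not have yet."""
    existing = {name.strip().lower() for name in existing_names}
    return {
        name: field_type
        for name, field_type in required.items()
        if name.lower() not in existing
    }
```

Running the check per list means a second invocation finds nothing to create for lists that were already set up.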
## Data Flow & Status Lifecycle
### Primary Trigger: Folder Watcher (Cora Backlinks)
The folder watcher is the main trigger for Cora Backlinks. The ClickUp scheduler does NOT auto-execute these — it can't, because the .xlsx doesn't exist until the user runs Cora.
```
1. ClickUp task created:
   Work Category="Link Building", LB Method="Cora Backlinks", status="to do"
   Fields filled: Client, IMSURL, Keyword, CLIFlags, BrandedPlusRatio, etc.
   → Appears on dashboard as "needs Cora run"

2. User runs Cora manually, drops .xlsx in Z:/cora-inbox

3. Folder watcher (_scan_watch_folder, runs every 60 min):
   → Finds precision-cnc-machining.xlsx
   → Fuzzy matches "precision cnc machining" against Keyword field on ClickUp "to do" Link Building tasks
   → Match found → extracts metadata from ClickUp task (IMSURL, CLIFlags, etc.)
   → Sets CoraFile field on the ClickUp task to the file path
   → Moves task to "in progress"
   → Posts comment: "Starting Cora Backlinks pipeline..."

4. Pipeline runs:
   → Step 1: ingest-cora → comment: "CORA report ingested. Job file: jobs/xxx.json"
   → Step 2: generate-batch → comment: "Content generation complete. X articles across Y tiers."

5. On success:
   → Move task to "complete"
   → Post summary comment with stats
   → Move .xlsx to Z:/cora-inbox/processed/

6. On failure:
   → Move task to "internal review"
   → Post error comment with details
   → .xlsx stays in Z:/cora-inbox (can retry)
```
### Secondary Trigger: Chat
```
User: "Run link building for Z:/cora-inbox/precision-cnc-machining.xlsx"
  → Chat brain calls run_cora_backlinks (or run_link_building with explicit lb_method)
  → Tool auto-looks up matching ClickUp task via Keyword field (if exists)
  → Same pipeline + ClickUp sync as above
  → If no ClickUp match: runs pipeline without ClickUp tracking, returns results to chat only
```
### Future Trigger: ClickUp Scheduler (other LB Methods)
Future link building methods (MCP, etc.) that don't need a .xlsx CAN be auto-executed by the ClickUp scheduler. The `run_link_building` orchestrator checks `lb_method`:
- "Cora Backlinks" → requires xlsx_path, skips if empty (folder watcher handles these)
- Future methods → can execute directly from ClickUp task data
### ClickUp Skill Map Note
The skill_map entry for "Link Building" exists primarily for **field mapping reference** (so the folder watcher and chat know which ClickUp fields map to which tool params). The ClickUp scheduler will discover these tasks but `run_link_building` will skip Cora Backlinks that have no xlsx_path — they're waiting for the folder watcher.
## Implementation Order
1. **Config** — Add `LinkBuildingConfig` to config.py, add `link_building:` section to config.yaml, add `link_builder` agent to config.yaml
2. **Core tools** — Create `cheddahbot/tools/linkbuilding.py` with `_run_blm_command`, parsers, `run_link_building` orchestrator, and `run_cora_backlinks` pipeline
3. **Standalone tools** — Add `blm_ingest_cora` and `blm_generate_batch`
4. **Tests** — Create `tests/test_linkbuilding.py`, verify with `uv run pytest tests/test_linkbuilding.py -v`
5. **ClickUp field creation** — Add `create_custom_field` to clickup.py, add `setup_linkbuilding_fields` tool
6. **ClickUp integration** — Add skill_map entry, add ClickUp state tracking to tools
7. **Folder watcher** — Add `_folder_watch_loop` to scheduler.py, add `scan_cora_folder` tool
8. **API endpoint** — Add `/api/linkbuilding/status` to `__main__.py`
9. **Skill file** — Create `skills/linkbuilding.md`
10. **ClickUp setup** — Run `setup_linkbuilding_fields` to auto-create custom fields across all lists
11. **Full test run**: `uv run pytest -v --no-cov`
## Verification
1. **Unit tests**: `uv run pytest tests/test_linkbuilding.py -v` — all pass with mocked subprocess
2. **Full suite**: `uv run pytest -v --no-cov` — no regressions
3. **Lint**: `uv run ruff check .` + `uv run ruff format .`
4. **Manual e2e**: Drop a real .xlsx in Z:/cora-inbox, verify ingest-cora runs, job JSON created, generate-batch runs
5. **ClickUp e2e**: Create a Link Building task in ClickUp with proper fields, wait for scheduler poll, verify execution
6. **Chat e2e**: Ask CheddahBot to "run link building for [keyword]" via chat UI
7. **API check**: Hit `http://localhost:7860/api/linkbuilding/status` and verify data returned
## Key Reference Files
- `cheddahbot/tools/press_release.py` — Reference pattern for multi-step pipeline tool
- `cheddahbot/scheduler.py:55-76` — Where to add 4th daemon thread
- `cheddahbot/config.py:108-200` — load_config() pattern for new config sections
- `E:/dev/Big-Link-Man/docs/CLI_COMMAND_REFERENCE.md` — Full CLI reference
- `E:/dev/Big-Link-Man/src/cli/commands.py` — Exact output formats to parse