# Link Building Agent Plan

## Context

CheddahBot needs a link building agent that orchestrates the external Big-Link-Man CLI tool (`E:/dev/Big-Link-Man/`). The current workflow is manual: run Cora on another machine → get .xlsx → manually run `main.py ingest-cora` → manually run `main.py generate-batch`. This agent automates the two `main.py` steps (`ingest-cora` and `generate-batch`), triggered by folder watching, ClickUp tasks, or chat commands. It must be expandable for future link building methods (MCP server path, ingest-simple, etc.).

## Decisions Made

- **Watch folder**: `Z:/cora-inbox` (network drive, accessible from the Cora machine)
- **File→task matching**: Fuzzy match .xlsx filename stem against ClickUp task's `Keyword` custom field
- **New ClickUp field "LB Method"**: Dropdown with initial option "Cora Backlinks" (more added later)
- **Dashboard**: API endpoint + NotificationBus events only (no frontend work — separate project)
- **Sidecar files**: Not needed — all metadata comes from the matching ClickUp task
- **Tool naming**: Orchestrator pattern — `run_link_building` is a thin dispatcher that reads `LB Method` and routes to the specific pipeline tool (e.g., `run_cora_backlinks`). Future link building methods get their own tools and slot into the orchestrator.

## Files to Create

### 1. `cheddahbot/tools/linkbuilding.py` — Main tool module

Four `@tool`-decorated functions + private helpers:

**`run_link_building(lb_method="", xlsx_path="", project_name="", money_site_url="", branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- **Orchestrator/dispatcher** — reads `lb_method` (from ClickUp "LB Method" field or chat) and routes to the correct pipeline tool
- If `lb_method` is "Cora Backlinks" or empty (default): calls `run_cora_backlinks()`
- Future: if `lb_method` is "MCP Link Building": calls `run_mcp_link_building()` (not yet implemented)
- Passes all other args through to the sub-tool
- This is what the ClickUp skill_map always routes to

**`run_cora_backlinks(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- The actual Cora pipeline — runs ingest-cora → generate-batch
- Step 1: Build CLI args, call `_run_blm_command(["ingest-cora", ...])`, parse stdout for job file path
- Step 2: Call `_run_blm_command(["generate-batch", "-j", job_file, "--continue-on-error"])`
- Updates KV store state and posts ClickUp comments at each step (following press_release.py pattern)
- Returns `## ClickUp Sync` in output to signal scheduler that sync was handled internally
- Can also be called directly from chat for explicit Cora runs

**`blm_ingest_cora(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- Standalone ingest — runs ingest-cora only, returns project ID and job file path
- For cases where user wants to ingest but not generate yet

**`blm_generate_batch(job_file, continue_on_error=True, debug=False, ctx=None)`**
- Standalone generate — runs generate-batch only on an existing job file
- For re-running generation or running a manually-created job

**Private helpers:**
- `_run_blm_command(args, timeout=1800)` — subprocess wrapper, runs `uv run python main.py` followed by the given `args` from BLM_DIR, injects `-u`/`-p` from `BLM_USERNAME`/`BLM_PASSWORD` env vars
- `_parse_ingest_output(stdout)` — regex extract project_id + job_file path
- `_parse_generate_output(stdout)` — extract completion stats
- `_build_ingest_args(...)` — construct CLI argument list from tool params
- `_set_status(ctx, message)` — write pipeline status to KV store (for UI polling)
- `_sync_clickup(ctx, task_id, step, message)` — post comment + update state

**Critical: always pass `-m` flag** to ingest-cora to prevent interactive stdin prompt from blocking the subprocess.

### 2. `skills/linkbuilding.md` — Skill file

YAML frontmatter linking to `[run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder]` tools and `[link_builder, default]` agents. Markdown body describes when to use, default flags, workflow steps.

### 3. `tests/test_linkbuilding.py` — Test suite (~40 tests)

All tests mock `subprocess.run` — never call Big-Link-Man. Categories:
- Output parser unit tests (`_parse_ingest_output`, `_parse_generate_output`)
- CLI arg builder tests (all flag combinations, missing required params)
- Full pipeline integration (happy path, ingest failure, generate failure)
- ClickUp state machine (executing → completed, executing → failed)
- Folder watcher scan logic (new files, skip processed, missing ClickUp match)

## Files to Modify

### 4. `cheddahbot/config.py` — Add LinkBuildingConfig

```python
from dataclasses import dataclass


@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""  # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7
```

Add `link_building: LinkBuildingConfig` field to `Config` dataclass. Add YAML loading block in `load_config()` (same pattern as memory/scheduler/shell). Add env var override for `BLM_DIR`.

### 5. `config.yaml` — Three additions

**New top-level section:**

```yaml
link_building:
  blm_dir: "E:/dev/Big-Link-Man"
  watch_folder: "Z:/cora-inbox"
  watch_interval_minutes: 60
  default_branded_plus_ratio: 0.7
```

**New skill_map entry under clickup:**

```yaml
"Link Building":
  tool: "run_link_building"
  auto_execute: false  # Cora Backlinks triggered by folder watcher, not scheduler
  complete_status: "complete"  # Override: use "complete" instead of "internal review"
  error_status: "internal review"  # On failure, move to internal review
  field_mapping:
    lb_method: "LB Method"
    project_name: "task_name"
    money_site_url: "IMSURL"
    custom_anchors: "CustomAnchors"
    branded_plus_ratio: "BrandedPlusRatio"
    cli_flags: "CLIFlags"
    xlsx_path: "CoraFile"
```

**New agent:**

```yaml
- name: link_builder
  display_name: Link Builder
  tools: [run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder, delegate_task, remember, search_memory]
  memory_scope: ""
```

### 6. `cheddahbot/scheduler.py` — Add folder watcher (4th daemon thread)

**New thread `_folder_watch_loop`** alongside existing poll, heartbeat, and ClickUp threads:
- Starts if `config.link_building.watch_folder` is non-empty
- Runs every `watch_interval_minutes` (default 60)
- `_scan_watch_folder()` globs `*.xlsx` in watch folder
- For each file, checks KV store `linkbuilding:watched:{filename}` — skip if already processed
- **Fuzzy-matches filename stem against ClickUp tasks** with `LB Method = "Cora Backlinks"` and status "to do":
  - Queries ClickUp for Link Building tasks
  - Compares normalized filename stem against each task's `Keyword` custom field
  - If match found: extracts money_site_url from IMSURL field, cli_flags from CLIFlags field, etc.
  - If no match: logs warning, marks as "unmatched" in KV store, sends notification asking user to create/link a ClickUp task
- On match: executes `run_link_building` tool with args from the ClickUp task fields
- On completion: moves .xlsx to `Z:/cora-inbox/processed/` subfolder, updates KV state
- On failure: updates KV state with error, notifies via NotificationBus

**File handling after pipeline:**
- On success: .xlsx moved from `Z:/cora-inbox/` → `Z:/cora-inbox/processed/`
- On failure: .xlsx stays in `Z:/cora-inbox/` (KV store marks it as failed so watcher doesn't retry automatically; user can reset KV entry to retry)

**Also adds `scan_cora_folder` tool** (can live in linkbuilding.py):
- Chat-invocable utility for the agent to check what's in the watch folder
- Returns list of unprocessed .xlsx files with ClickUp match status
- Internal agent tool, not a dashboard concern

### 7. `cheddahbot/clickup.py` — Add field creation method

Add `create_custom_field(list_id, name, field_type, type_config=None)` method that calls `POST /list/{list_id}/field`. Used by the setup tool to auto-create custom fields across lists.

### 8. `cheddahbot/__main__.py` — Add API endpoint

Add before Gradio mount:

```python
@fastapi_app.get("/api/linkbuilding/status")
async def linkbuilding_status():
    """Return link building status for dashboard consumption."""
    # Returns:
    # {
    #   "pending_cora_runs": [
    #     {"keyword": "precision cnc machining", "url": "https://...", "client": "Chapter 2", "task_id": "abc123"},
    #     ...
    #   ],
    #   "in_progress": [...],  # Currently executing pipelines
    #   "completed": [...],    # Recently completed (last 7 days)
    #   "failed": [...]        # Failed tasks needing attention
    # }
```

The `pending_cora_runs` section is the key dashboard data: queries ClickUp for "to do" tasks with Work Category="Link Building" and LB Method="Cora Backlinks", returns each task's `Keyword` field and `IMSURL` (copiable URL) so the user can see exactly which Cora reports need to be run.
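As a rough illustration of the `pending_cora_runs` filter, here is a minimal sketch. It assumes each ClickUp task has already been flattened into a dict with `id`, `status`, and a `custom_fields` mapping of field name to value; that shape, the helper name `build_pending_cora_runs`, and the sample URL are hypothetical and not taken from the existing `clickup.py` client.

```python
from __future__ import annotations

from typing import Any


def build_pending_cora_runs(tasks: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Select "to do" Cora Backlinks tasks and shape them for the dashboard."""
    pending = []
    for task in tasks:
        # Assumed shape: custom field values keyed by field name.
        fields = task.get("custom_fields", {})
        if task.get("status") != "to do":
            continue
        if fields.get("Work Category") != "Link Building":
            continue
        if fields.get("LB Method") != "Cora Backlinks":
            continue
        pending.append(
            {
                "keyword": fields.get("Keyword", ""),
                "url": fields.get("IMSURL", ""),
                "client": fields.get("Client", ""),
                "task_id": task.get("id", ""),
            }
        )
    return pending


# Hypothetical sample data: only the first task should be reported as pending.
sample_tasks = [
    {
        "id": "abc123",
        "status": "to do",
        "custom_fields": {
            "Work Category": "Link Building",
            "LB Method": "Cora Backlinks",
            "Keyword": "precision cnc machining",
            "IMSURL": "https://example.com/cnc",
            "Client": "Chapter 2",
        },
    },
    {
        "id": "def456",
        "status": "complete",
        "custom_fields": {"Work Category": "Link Building", "LB Method": "Cora Backlinks"},
    },
    {
        "id": "ghi789",
        "status": "to do",
        "custom_fields": {"Work Category": "Press Release"},
    },
]

pending_runs = build_pending_cora_runs(sample_tasks)
```

Keeping the filter as a pure function over already-fetched tasks would let the endpoint and the tests share it without touching the ClickUp API.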
Also push link building events to NotificationBus (category="linkbuilding") at each pipeline step for future real-time dashboard support.

No other `__main__.py` changes needed — agent wiring is automatic from config.yaml.

## ClickUp Custom Fields (Auto-Created)

New custom fields to be created programmatically:

| Field | Type | Purpose |
|-------|------|---------|
| `LB Method` | Dropdown | Link building subtype. Initial option: "Cora Backlinks" |
| `Keyword` | Short Text | Target keyword (used for file matching) |
| `CoraFile` | Short Text | Path to .xlsx file (optional, set by agent after file match) |
| `CustomAnchors` | Short Text | Comma-separated anchor text overrides |
| `BrandedPlusRatio` | Short Text | Override for `-bp` flag (e.g., "0.7") |
| `CLIFlags` | Short Text | Raw additional CLI flags (e.g., "-r 5 -t 0.3") |

Fields that already exist and will be reused: `Client`, `IMSURL`, `Work Category` (add "Link Building" option).

### Auto-creation approach

- Add `create_custom_field(list_id, name, field_type, type_config=None)` method to `cheddahbot/clickup.py` — calls `POST /list/{list_id}/field`
- Add a `setup_linkbuilding_fields` tool (category="linkbuilding") that:
  1. Gets all list IDs in the space
  2. For each list, checks if fields already exist (via `get_custom_fields`)
  3. Creates missing fields via the new API method
  4. For `LB Method` dropdown, creates with `type_config` containing "Cora Backlinks" option
  5. For `Work Category`, adds "Link Building" option if missing
- This tool runs once during initial setup, or can be re-run if new lists are added
- Also add "Link Building" as an option to the existing `Work Category` dropdown if not present

## Data Flow & Status Lifecycle

### Primary Trigger: Folder Watcher (Cora Backlinks)

The folder watcher is the main trigger for Cora Backlinks. The ClickUp scheduler does NOT auto-execute these — it can't, because the .xlsx doesn't exist until the user runs Cora.

```
1. ClickUp task created: Work Category="Link Building", LB Method="Cora Backlinks", status="to do"
   Fields filled: Client, IMSURL, Keyword, CLIFlags, BrandedPlusRatio, etc.
   → Appears on dashboard as "needs Cora run"

2. User runs Cora manually, drops .xlsx in Z:/cora-inbox

3. Folder watcher (_scan_watch_folder, runs every 60 min):
   → Finds precision-cnc-machining.xlsx
   → Fuzzy matches "precision cnc machining" against Keyword field on ClickUp "to do" Link Building tasks
   → Match found → extracts metadata from ClickUp task (IMSURL, CLIFlags, etc.)
   → Sets CoraFile field on the ClickUp task to the file path
   → Moves task to "in progress"
   → Posts comment: "Starting Cora Backlinks pipeline..."

4. Pipeline runs:
   → Step 1: ingest-cora → comment: "CORA report ingested. Job file: jobs/xxx.json"
   → Step 2: generate-batch → comment: "Content generation complete. X articles across Y tiers."

5. On success:
   → Move task to "complete"
   → Post summary comment with stats
   → Move .xlsx to Z:/cora-inbox/processed/

6. On failure:
   → Move task to "internal review"
   → Post error comment with details
   → .xlsx stays in Z:/cora-inbox (can retry)
```

### Secondary Trigger: Chat

```
User: "Run link building for Z:/cora-inbox/precision-cnc-machining.xlsx"
→ Chat brain calls run_cora_backlinks (or run_link_building with explicit lb_method)
→ Tool auto-looks up matching ClickUp task via Keyword field (if exists)
→ Same pipeline + ClickUp sync as above
→ If no ClickUp match: runs pipeline without ClickUp tracking, returns results to chat only
```

### Future Trigger: ClickUp Scheduler (other LB Methods)

Future link building methods (MCP, etc.) that don't need a .xlsx CAN be auto-executed by the ClickUp scheduler.
The `run_link_building` orchestrator checks `lb_method`:
- "Cora Backlinks" → requires xlsx_path, skips if empty (folder watcher handles these)
- Future methods → can execute directly from ClickUp task data

### ClickUp Skill Map Note

The skill_map entry for "Link Building" exists primarily for **field mapping reference** (so the folder watcher and chat know which ClickUp fields map to which tool params). The ClickUp scheduler will discover these tasks but `run_link_building` will skip Cora Backlinks that have no xlsx_path — they're waiting for the folder watcher.

## Implementation Order

1. **Config** — Add `LinkBuildingConfig` to config.py, add `link_building:` section to config.yaml, add `link_builder` agent to config.yaml
2. **Core tools** — Create `cheddahbot/tools/linkbuilding.py` with `_run_blm_command`, parsers, `run_link_building` orchestrator, and `run_cora_backlinks` pipeline
3. **Standalone tools** — Add `blm_ingest_cora` and `blm_generate_batch`
4. **Tests** — Create `tests/test_linkbuilding.py`, verify with `uv run pytest tests/test_linkbuilding.py -v`
5. **ClickUp field creation** — Add `create_custom_field` to clickup.py, add `setup_linkbuilding_fields` tool
6. **ClickUp integration** — Add skill_map entry, add ClickUp state tracking to tools
7. **Folder watcher** — Add `_folder_watch_loop` to scheduler.py, add `scan_cora_folder` tool
8. **API endpoint** — Add `/api/linkbuilding/status` to `__main__.py`
9. **Skill file** — Create `skills/linkbuilding.md`
10. **ClickUp setup** — Run `setup_linkbuilding_fields` to auto-create custom fields across all lists
11. **Full test run** — `uv run pytest -v --no-cov`

## Verification

1. **Unit tests**: `uv run pytest tests/test_linkbuilding.py -v` — all pass with mocked subprocess
2. **Full suite**: `uv run pytest -v --no-cov` — no regressions
3. **Lint**: `uv run ruff check .` + `uv run ruff format .`
4. **Manual e2e**: Drop a real .xlsx in Z:/cora-inbox, verify ingest-cora runs, job JSON created, generate-batch runs
5. **ClickUp e2e**: Create a Link Building task in ClickUp with proper fields, wait for scheduler poll, verify execution
6. **Chat e2e**: Ask CheddahBot to "run link building for [keyword]" via chat UI
7. **API check**: Hit `http://localhost:7860/api/linkbuilding/status` and verify data returned

## Key Reference Files

- `cheddahbot/tools/press_release.py` — Reference pattern for multi-step pipeline tool
- `cheddahbot/scheduler.py:55-76` — Where to add 4th daemon thread
- `cheddahbot/config.py:108-200` — load_config() pattern for new config sections
- `E:/dev/Big-Link-Man/docs/CLI_COMMAND_REFERENCE.md` — Full CLI reference
- `E:/dev/Big-Link-Man/src/cli/commands.py` — Exact output formats to parse
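
The fuzzy file→task match that the folder watcher and chat trigger both rely on is not specified beyond "normalized filename stem against `Keyword`". One possible sketch uses the standard library's `difflib.SequenceMatcher`. The helper names, the normalization rule, and the 0.85 threshold are all assumptions chosen for illustration, not decisions recorded in this plan.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher
from pathlib import Path
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_file_to_task(
    xlsx_path: str,
    keywords_by_task: dict[str, str],
    threshold: float = 0.85,  # assumed cutoff; tune against real filenames
) -> Optional[str]:
    """Return the task ID whose Keyword best matches the filename stem, or None."""
    stem = normalize(Path(xlsx_path).stem)
    best_id: Optional[str] = None
    best_score = 0.0
    for task_id, keyword in keywords_by_task.items():
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = SequenceMatcher(None, stem, normalize(keyword)).ratio()
        if score > best_score:
            best_id, best_score = task_id, score
    return best_id if best_score >= threshold else None


# Hypothetical task IDs mapped to their Keyword field values.
keywords = {
    "abc123": "precision cnc machining",
    "def456": "cnc turning services",
}

matched = match_file_to_task("Z:/cora-inbox/precision-cnc-machining.xlsx", keywords)
unmatched = match_file_to_task("Z:/cora-inbox/unrelated-report.xlsx", keywords)
```

Returning `None` below the threshold maps onto the "unmatched" branch of the watcher, where the file is flagged and the user is asked to create or link a ClickUp task.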