# Link Building Agent Plan

## Context

CheddahBot needs a link building agent that orchestrates the external Big-Link-Man CLI tool (`E:/dev/Big-Link-Man/`). The current workflow is manual: (1) run Cora on another machine to get an .xlsx report, (2) manually run `main.py ingest-cora`, (3) manually run `main.py generate-batch`. This agent automates steps 2 and 3, triggered by folder watching, ClickUp tasks, or chat commands. It must be extensible to future link building methods (MCP server path, ingest-simple, etc.).

## Decisions Made

- **Watch folder**: `Z:/cora-inbox` (network drive, accessible from the Cora machine)
- **File→task matching**: Fuzzy match .xlsx filename stem against ClickUp task's `Keyword` custom field
- **New ClickUp field "LB Method"**: Dropdown with initial option "Cora Backlinks" (more added later)
- **Dashboard**: API endpoint + NotificationBus events only (no frontend work — separate project)
- **Sidecar files**: Not needed — all metadata comes from the matching ClickUp task
- **Tool naming**: Orchestrator pattern — `run_link_building` is a thin dispatcher that reads `LB Method` and routes to the specific pipeline tool (e.g., `run_cora_backlinks`). Future link building methods get their own tools and slot into the orchestrator.

## Files to Create

### 1. `cheddahbot/tools/linkbuilding.py` — Main tool module

Four `@tool`-decorated functions + private helpers:

**`run_link_building(lb_method="", xlsx_path="", project_name="", money_site_url="", branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- **Orchestrator/dispatcher** — reads `lb_method` (from ClickUp "LB Method" field or chat) and routes to the correct pipeline tool
- If `lb_method` is "Cora Backlinks" or empty (default): calls `run_cora_backlinks()`
- Future: if `lb_method` is "MCP Link Building": calls `run_mcp_link_building()` (not yet implemented)
- Passes all other args through to the sub-tool
- This is what the ClickUp skill_map always routes to
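
The routing described above can be sketched as a small lookup table. This is a minimal illustration, not the planned implementation: the `_PIPELINES` registry, the stub body of `run_cora_backlinks`, and the returned strings are assumptions, and the `@tool` decorator is left out.

```python
from typing import Callable

DEFAULT_LB_METHOD = "Cora Backlinks"


def run_cora_backlinks(xlsx_path: str = "", **kwargs) -> str:
    """Stand-in for the real pipeline tool (hypothetical body)."""
    return f"cora pipeline started for {xlsx_path}"


# Hypothetical registry: future methods register their pipeline tool here.
_PIPELINES: dict[str, Callable[..., str]] = {
    "Cora Backlinks": run_cora_backlinks,
}


def run_link_building(lb_method: str = "", xlsx_path: str = "", **kwargs) -> str:
    """Route to the pipeline tool selected by lb_method."""
    method = lb_method.strip() or DEFAULT_LB_METHOD
    pipeline = _PIPELINES.get(method)
    if pipeline is None:
        return f"Unknown LB Method: {method!r}"
    # Cora Backlinks needs the report file; without it the task is left
    # for the folder watcher to pick up.
    if method == "Cora Backlinks" and not xlsx_path:
        return "Skipped: Cora Backlinks requires xlsx_path"
    return pipeline(xlsx_path=xlsx_path, **kwargs)
```

With a registry like this, adding a new method would mean adding one entry rather than another `if` branch in the dispatcher.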

**`run_cora_backlinks(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- The actual Cora pipeline — runs ingest-cora → generate-batch
- Step 1: Build CLI args, call `_run_blm_command(["ingest-cora", ...])`, parse stdout for job file path
- Step 2: Call `_run_blm_command(["generate-batch", "-j", job_file, "--continue-on-error"])`
- Updates KV store state and posts ClickUp comments at each step (following press_release.py pattern)
- Returns `## ClickUp Sync` in output to signal scheduler that sync was handled internally
- Can also be called directly from chat for explicit Cora runs

**`blm_ingest_cora(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- Standalone ingest — runs ingest-cora only, returns project ID and job file path
- For cases where user wants to ingest but not generate yet

**`blm_generate_batch(job_file, continue_on_error=True, debug=False, ctx=None)`**
- Standalone generate — runs generate-batch only on an existing job file
- For re-running generation or running a manually-created job

**Private helpers:**
- `_run_blm_command(args, timeout=1800)` — subprocess wrapper, runs `uv run python main.py <args>` from BLM_DIR, injects `-u`/`-p` from `BLM_USERNAME`/`BLM_PASSWORD` env vars
- `_parse_ingest_output(stdout)` — regex extract project_id + job_file path
- `_parse_generate_output(stdout)` — extract completion stats
- `_build_ingest_args(...)` — construct CLI argument list from tool params
- `_set_status(ctx, message)` — write pipeline status to KV store (for UI polling)
- `_sync_clickup(ctx, task_id, step, message)` — post comment + update state

**Critical: always pass `-m` flag** to ingest-cora to prevent interactive stdin prompt from blocking the subprocess.

### 2. `skills/linkbuilding.md` — Skill file

YAML frontmatter linking to `[run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder]` tools and `[link_builder, default]` agents. Markdown body describes when to use, default flags, workflow steps.

### 3. `tests/test_linkbuilding.py` — Test suite (~40 tests)

All tests mock `subprocess.run` — never call Big-Link-Man. Categories:
- Output parser unit tests (`_parse_ingest_output`, `_parse_generate_output`)
- CLI arg builder tests (all flag combinations, missing required params)
- Full pipeline integration (happy path, ingest failure, generate failure)
- ClickUp state machine (executing → completed, executing → failed)
- Folder watcher scan logic (new files, skip processed, missing ClickUp match)
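
As an illustration of the mocking approach, the sketch below patches `subprocess.run` so no real process starts. `run_cli` is a hypothetical stand-in for `_run_blm_command`, and the error handling shown is an assumption about how failures would surface.

```python
import subprocess
from unittest.mock import patch


def run_cli(args: list[str]) -> str:
    """Hypothetical stand-in for _run_blm_command: returns stdout or raises."""
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "command failed")
    return result.stdout


def test_run_cli_happy_path():
    fake = subprocess.CompletedProcess(args=["x"], returncode=0, stdout="ok\n", stderr="")
    with patch("subprocess.run", return_value=fake) as mock_run:
        assert run_cli(["generate-batch"]) == "ok\n"
    mock_run.assert_called_once()


def test_run_cli_failure():
    fake = subprocess.CompletedProcess(args=["x"], returncode=1, stdout="", stderr="boom")
    with patch("subprocess.run", return_value=fake):
        try:
            run_cli(["generate-batch"])
        except RuntimeError as exc:
            assert str(exc) == "boom"
        else:
            raise AssertionError("expected RuntimeError")
```

In the real suite, `pytest.raises` would be the more idiomatic way to assert the failure case.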

## Files to Modify

### 4. `cheddahbot/config.py` — Add LinkBuildingConfig

```python
@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""                    # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7
```

Add `link_building: LinkBuildingConfig` field to `Config` dataclass. Add YAML loading block in `load_config()` (same pattern as memory/scheduler/shell). Add env var override for `BLM_DIR`.
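
A minimal sketch of the loading block, assuming `load_config()` already holds the parsed YAML as a dict. The `load_link_building` helper and its `env` parameter are hypothetical; the existing memory/scheduler/shell blocks may be structured differently.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, fields


@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""  # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7


def load_link_building(raw: dict, env: dict[str, str] | None = None) -> LinkBuildingConfig:
    """Build LinkBuildingConfig from the parsed YAML dict, then apply BLM_DIR."""
    env = os.environ if env is None else env
    section = raw.get("link_building") or {}
    # Ignore unknown keys so a typo in config.yaml does not crash startup.
    known = {f.name for f in fields(LinkBuildingConfig)}
    cfg = LinkBuildingConfig(**{k: v for k, v in section.items() if k in known})
    # The environment variable wins over the YAML value.
    if env.get("BLM_DIR"):
        cfg.blm_dir = env["BLM_DIR"]
    return cfg
```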

### 5. `config.yaml` — Three additions

**New top-level section:**
```yaml
link_building:
  blm_dir: "E:/dev/Big-Link-Man"
  watch_folder: "Z:/cora-inbox"
  watch_interval_minutes: 60
  default_branded_plus_ratio: 0.7
```

**New skill_map entry under clickup:**
```yaml
"Link Building":
  tool: "run_link_building"
  auto_execute: false           # Cora Backlinks triggered by folder watcher, not scheduler
  complete_status: "complete"   # Override: use "complete" instead of "internal review"
  error_status: "internal review"  # On failure, move to internal review
  field_mapping:
    lb_method: "LB Method"
    project_name: "task_name"
    money_site_url: "IMSURL"
    custom_anchors: "CustomAnchors"
    branded_plus_ratio: "BrandedPlusRatio"
    cli_flags: "CLIFlags"
    xlsx_path: "CoraFile"
```

**New agent:**
```yaml
- name: link_builder
  display_name: Link Builder
  tools: [run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder, delegate_task, remember, search_memory]
  memory_scope: ""
```

### 6. `cheddahbot/scheduler.py` — Add folder watcher (4th daemon thread)

**New thread `_folder_watch_loop`** alongside existing poll, heartbeat, and ClickUp threads:
- Starts if `config.link_building.watch_folder` is non-empty
- Runs every `watch_interval_minutes` (default 60)
- `_scan_watch_folder()` globs `*.xlsx` in watch folder
- For each file, checks KV store `linkbuilding:watched:{filename}` — skip if already processed
- **Fuzzy-matches filename stem against ClickUp tasks** with `LB Method = "Cora Backlinks"` and status "to do":
  - Queries ClickUp for Link Building tasks
  - Compares normalized filename stem against each task's `Keyword` custom field
  - If match found: extracts money_site_url from IMSURL field, cli_flags from CLIFlags field, etc.
  - If no match: logs warning, marks as "unmatched" in KV store, sends notification asking user to create/link a ClickUp task
- On match: executes `run_link_building` tool with args from the ClickUp task fields
- On completion: moves .xlsx to `Z:/cora-inbox/processed/` subfolder, updates KV state
- On failure: updates KV state with error, notifies via NotificationBus
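
The filename-to-task matching could look roughly like this. It uses `difflib.SequenceMatcher` from the standard library; the 0.85 threshold, the `normalize` rules, and the flat task dict with a `Keyword` key are assumptions for illustration, not decisions made in this plan.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher
from pathlib import Path


def normalize(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run into a single space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_task(xlsx_path: str, tasks: list[dict], threshold: float = 0.85) -> dict | None:
    """Return the task whose Keyword best matches the filename stem, if any."""
    stem = normalize(Path(xlsx_path).stem)
    best, best_score = None, 0.0
    for task in tasks:
        keyword = normalize(task.get("Keyword", ""))
        score = SequenceMatcher(None, stem, keyword).ratio()
        if score > best_score:
            best, best_score = task, score
    # Below the threshold, treat the file as unmatched rather than guess.
    return best if best_score >= threshold else None
```

A file named `precision-cnc-machining.xlsx` normalizes to the same string as the keyword "Precision CNC Machining", so the two match exactly; the threshold only matters for near misses such as pluralization or a trailing date.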

**File handling after pipeline:**
- On success: .xlsx moved from `Z:/cora-inbox/` → `Z:/cora-inbox/processed/`
- On failure: .xlsx stays in `Z:/cora-inbox/` (KV store marks it as failed so watcher doesn't retry automatically; user can reset KV entry to retry)
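
One way to express the file handling and the skip check, assuming the KV store behaves like a dict. The value shape (`{"status": ...}`) and the helper names are hypothetical; only the `linkbuilding:watched:{filename}` key format comes from this plan. How "unmatched" entries should be treated on the next scan is left open here.

```python
from __future__ import annotations

import shutil
from pathlib import Path

DONE_STATUSES = {"completed", "failed"}  # neither is retried automatically


def kv_key(xlsx: Path) -> str:
    return f"linkbuilding:watched:{xlsx.name}"


def should_process(xlsx: Path, kv: dict) -> bool:
    """Skip files whose KV entry says they already completed or failed."""
    entry = kv.get(kv_key(xlsx)) or {}
    return entry.get("status") not in DONE_STATUSES


def finalize_file(xlsx: Path, success: bool, kv: dict, error: str = "") -> Path:
    """Move the report on success, or leave it in place and record the failure."""
    if not success:
        kv[kv_key(xlsx)] = {"status": "failed", "error": error}
        return xlsx
    processed_dir = xlsx.parent / "processed"
    processed_dir.mkdir(exist_ok=True)
    target = processed_dir / xlsx.name
    shutil.move(str(xlsx), str(target))
    kv[kv_key(xlsx)] = {"status": "completed"}
    return target
```

Under this sketch, retrying a failed file means deleting its KV entry, after which the next scan picks it up again.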

**Also adds `scan_cora_folder` tool** (can live in linkbuilding.py):
- Chat-invocable utility for the agent to check what's in the watch folder
- Returns list of unprocessed .xlsx files with ClickUp match status
- Internal agent tool, not a dashboard concern

### 7. `cheddahbot/clickup.py` — Add field creation method

Add `create_custom_field(list_id, name, field_type, type_config=None)` method that calls `POST /list/{list_id}/field`. Used by the setup tool to auto-create custom fields across lists.

### 8. `cheddahbot/__main__.py` — Add API endpoint

Add before Gradio mount:
```python
@fastapi_app.get("/api/linkbuilding/status")
async def linkbuilding_status():
    """Return link building status for dashboard consumption."""
    # Returns:
    # {
    #   "pending_cora_runs": [
    #     {"keyword": "precision cnc machining", "url": "https://...", "client": "Chapter 2", "task_id": "abc123"},
    #     ...
    #   ],
    #   "in_progress": [...],     # Currently executing pipelines
    #   "completed": [...],       # Recently completed (last 7 days)
    #   "failed": [...]           # Failed tasks needing attention
    # }
    # Placeholder body: populate each list from ClickUp and the KV store.
    return {"pending_cora_runs": [], "in_progress": [], "completed": [], "failed": []}
```

The `pending_cora_runs` section is the key dashboard data. It queries ClickUp for "to do" tasks with Work Category="Link Building" and LB Method="Cora Backlinks" and returns each task's `Keyword` field and `IMSURL` (as a copyable URL), so the user can see exactly which Cora reports need to be run.
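
The filter behind `pending_cora_runs` might be sketched as follows. The task dict shape here (a flat `custom_fields` mapping keyed by field name) is a simplifying assumption; the actual structure returned by the ClickUp client in `cheddahbot/clickup.py` may differ.

```python
def pending_cora_runs(tasks: list[dict]) -> list[dict]:
    """Select "to do" Cora Backlinks tasks and keep only the dashboard fields."""
    pending = []
    for task in tasks:
        custom = task.get("custom_fields", {})
        if (
            task.get("status") == "to do"
            and custom.get("Work Category") == "Link Building"
            and custom.get("LB Method") == "Cora Backlinks"
        ):
            pending.append(
                {
                    "keyword": custom.get("Keyword", ""),
                    "url": custom.get("IMSURL", ""),
                    "client": custom.get("Client", ""),
                    "task_id": task.get("id", ""),
                }
            )
    return pending
```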

Also push link building events to NotificationBus (category="linkbuilding") at each pipeline step for future real-time dashboard support.

No other `__main__.py` changes needed — agent wiring is automatic from config.yaml.

## ClickUp Custom Fields (Auto-Created)

New custom fields to be created programmatically:

| Field | Type | Purpose |
|-------|------|---------|
| `LB Method` | Dropdown | Link building subtype. Initial option: "Cora Backlinks" |
| `Keyword` | Short Text | Target keyword (used for file matching) |
| `CoraFile` | Short Text | Path to .xlsx file (optional, set by agent after file match) |
| `CustomAnchors` | Short Text | Comma-separated anchor text overrides |
| `BrandedPlusRatio` | Short Text | Override for `-bp` flag (e.g., "0.7") |
| `CLIFlags` | Short Text | Raw additional CLI flags (e.g., "-r 5 -t 0.3") |

Fields that already exist and will be reused: `Client`, `IMSURL`, `Work Category` (add "Link Building" option).
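
Because `BrandedPlusRatio`, `CLIFlags`, and `CustomAnchors` are all Short Text fields, their values need converting before they reach the tool parameters. A minimal sketch, assuming the ratio must fall between 0 and 1 and that invalid values fall back to the default (both are assumptions; the helper names are hypothetical):

```python
import shlex


def parse_ratio(raw: str, default: float = 0.7) -> float:
    """Convert the BrandedPlusRatio text field to a float, falling back to default."""
    try:
        value = float(raw.strip())
    except (ValueError, AttributeError):
        return default
    # Assumption: a ratio outside [0, 1] is a data-entry mistake.
    return value if 0.0 <= value <= 1.0 else default


def parse_cli_flags(raw: str) -> list[str]:
    """Split the CLIFlags text field into argv tokens (no shell involved)."""
    return shlex.split(raw or "")


def parse_anchors(raw: str) -> list[str]:
    """Split CustomAnchors on commas, dropping blanks."""
    return [a.strip() for a in (raw or "").split(",") if a.strip()]
```

For example, `parse_cli_flags("-r 5 -t 0.3")` yields `["-r", "5", "-t", "0.3"]`, which can be appended directly to the argument list passed to the subprocess.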

### Auto-creation approach

- Add `create_custom_field(list_id, name, field_type, type_config=None)` method to `cheddahbot/clickup.py`, which calls `POST /list/{list_id}/field`
- Add a `setup_linkbuilding_fields` tool (category="linkbuilding") that:
  1. Gets all list IDs in the space
  2. For each list, checks if fields already exist (via `get_custom_fields`)
  3. Creates missing fields via the new API method
  4. For `LB Method` dropdown, creates with `type_config` containing "Cora Backlinks" option
  5. For `Work Category`, adds "Link Building" option if missing
- This tool runs once during initial setup and can be re-run when new lists are added
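
The check-then-create loop can be sketched as below. The field type strings (`drop_down`, `short_text`), the `type_config` shape, and the return value of `get_custom_fields` (a list of dicts with a `name` key) are assumptions to verify against the ClickUp API and the existing client. Adding the "Link Building" option to `Work Category` is not covered here.

```python
from __future__ import annotations

# Field types and type_config shapes are assumptions, not confirmed API values.
REQUIRED_FIELDS: dict[str, tuple[str, dict | None]] = {
    "LB Method": ("drop_down", {"options": [{"name": "Cora Backlinks"}]}),
    "Keyword": ("short_text", None),
    "CoraFile": ("short_text", None),
    "CustomAnchors": ("short_text", None),
    "BrandedPlusRatio": ("short_text", None),
    "CLIFlags": ("short_text", None),
}


def setup_linkbuilding_fields(client, list_ids: list[str]) -> dict[str, list[str]]:
    """Create whichever required fields each list lacks; return what was created."""
    created: dict[str, list[str]] = {}
    for list_id in list_ids:
        existing = {f["name"] for f in client.get_custom_fields(list_id)}
        for name, (field_type, type_config) in REQUIRED_FIELDS.items():
            if name in existing:
                continue  # already present, so re-running is safe
            client.create_custom_field(list_id, name, field_type, type_config)
            created.setdefault(list_id, []).append(name)
    return created
```

Because existing fields are skipped, a second run against the same lists creates nothing, which is what makes the tool safe to re-run after new lists are added.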

## Data Flow & Status Lifecycle

### Primary Trigger: Folder Watcher (Cora Backlinks)

The folder watcher is the main trigger for Cora Backlinks. The ClickUp scheduler does NOT auto-execute these — it can't, because the .xlsx doesn't exist until the user runs Cora.

```
1. ClickUp task created:
   Work Category="Link Building", LB Method="Cora Backlinks", status="to do"
   Fields filled: Client, IMSURL, Keyword, CLIFlags, BrandedPlusRatio, etc.
   → Appears on dashboard as "needs Cora run"

2. User runs Cora manually, drops .xlsx in Z:/cora-inbox

3. Folder watcher (_scan_watch_folder, runs every 60 min):
   → Finds precision-cnc-machining.xlsx
   → Fuzzy matches "precision cnc machining" against Keyword field on ClickUp "to do" Link Building tasks
   → Match found → extracts metadata from ClickUp task (IMSURL, CLIFlags, etc.)
   → Sets CoraFile field on the ClickUp task to the file path
   → Moves task to "in progress"
   → Posts comment: "Starting Cora Backlinks pipeline..."

4. Pipeline runs:
   → Step 1: ingest-cora → comment: "CORA report ingested. Job file: jobs/xxx.json"
   → Step 2: generate-batch → comment: "Content generation complete. X articles across Y tiers."

5. On success:
   → Move task to "complete"
   → Post summary comment with stats
   → Move .xlsx to Z:/cora-inbox/processed/

6. On failure:
   → Move task to "internal review"
   → Post error comment with details
   → .xlsx stays in Z:/cora-inbox (KV entry marked failed; reset it to retry)
```

### Secondary Trigger: Chat

```
User: "Run link building for Z:/cora-inbox/precision-cnc-machining.xlsx"
  → Chat brain calls run_cora_backlinks (or run_link_building with explicit lb_method)
  → Tool auto-looks up matching ClickUp task via Keyword field (if exists)
  → Same pipeline + ClickUp sync as above
  → If no ClickUp match: runs pipeline without ClickUp tracking, returns results to chat only
```

### Future Trigger: ClickUp Scheduler (other LB Methods)

Future link building methods (MCP, etc.) that don't need a .xlsx CAN be auto-executed by the ClickUp scheduler. The `run_link_building` orchestrator checks `lb_method`:
- "Cora Backlinks" → requires xlsx_path, skips if empty (folder watcher handles these)
- Future methods → can execute directly from ClickUp task data

### ClickUp Skill Map Note

The skill_map entry for "Link Building" exists primarily for **field mapping reference** (so the folder watcher and chat know which ClickUp fields map to which tool params). The ClickUp scheduler will discover these tasks but `run_link_building` will skip Cora Backlinks that have no xlsx_path — they're waiting for the folder watcher.

## Implementation Order

1. **Config** — Add `LinkBuildingConfig` to config.py, add `link_building:` section to config.yaml, add `link_builder` agent to config.yaml
2. **Core tools** — Create `cheddahbot/tools/linkbuilding.py` with `_run_blm_command`, parsers, `run_link_building` orchestrator, and `run_cora_backlinks` pipeline
3. **Standalone tools** — Add `blm_ingest_cora` and `blm_generate_batch`
4. **Tests** — Create `tests/test_linkbuilding.py`, verify with `uv run pytest tests/test_linkbuilding.py -v`
5. **ClickUp field creation** — Add `create_custom_field` to clickup.py, add `setup_linkbuilding_fields` tool
6. **ClickUp integration** — Add skill_map entry, add ClickUp state tracking to tools
7. **Folder watcher** — Add `_folder_watch_loop` to scheduler.py, add `scan_cora_folder` tool
8. **API endpoint** — Add `/api/linkbuilding/status` to `__main__.py`
9. **Skill file** — Create `skills/linkbuilding.md`
10. **ClickUp setup** — Run `setup_linkbuilding_fields` to auto-create custom fields across all lists
11. **Full test run** — `uv run pytest -v --no-cov`

## Verification

1. **Unit tests**: `uv run pytest tests/test_linkbuilding.py -v` — all pass with mocked subprocess
2. **Full suite**: `uv run pytest -v --no-cov` — no regressions
3. **Lint**: `uv run ruff check .` + `uv run ruff format .`
4. **Manual e2e**: Drop a real .xlsx in Z:/cora-inbox, verify ingest-cora runs, job JSON created, generate-batch runs
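5. **ClickUp e2e**: Create a Link Building task in ClickUp with the proper fields, drop a matching .xlsx in the watch folder, wait for the folder watcher scan, and verify the task moves through "in progress" to "complete" with comments posted at each step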
6. **Chat e2e**: Ask CheddahBot to "run link building for [keyword]" via chat UI
7. **API check**: Hit `http://localhost:7860/api/linkbuilding/status` and verify data returned

## Key Reference Files

- `cheddahbot/tools/press_release.py` — Reference pattern for multi-step pipeline tool
- `cheddahbot/scheduler.py:55-76` — Where to add 4th daemon thread
- `cheddahbot/config.py:108-200` — load_config() pattern for new config sections
- `E:/dev/Big-Link-Man/docs/CLI_COMMAND_REFERENCE.md` — Full CLI reference
- `E:/dev/Big-Link-Man/src/cli/commands.py` — Exact output formats to parse