# Link Building Agent Plan

## Context

CheddahBot needs a link building agent that orchestrates the external Big-Link-Man CLI tool (`E:/dev/Big-Link-Man/`). The current workflow is manual: run Cora on another machine → get .xlsx → manually run `main.py ingest-cora` → manually run `main.py generate-batch`. This agent automates the two manual CLI steps (`ingest-cora` and `generate-batch`), triggered by folder watching, ClickUp tasks, or chat commands. It must be extensible to future link building methods (MCP server path, ingest-simple, etc.).

## Decisions Made

- **Watch folder**: `Z:/cora-inbox` (network drive, accessible from the Cora machine)
- **File→task matching**: Fuzzy-match the .xlsx filename stem against the ClickUp task's `Keyword` custom field
- **New ClickUp field "LB Method"**: Dropdown with initial option "Cora Backlinks" (more added later)
- **Dashboard**: API endpoint + NotificationBus events only (no frontend work — separate project)
- **Sidecar files**: Not needed — all metadata comes from the matching ClickUp task
- **Tool naming**: Orchestrator pattern — `run_link_building` is a thin dispatcher that reads `LB Method` and routes to the specific pipeline tool (e.g., `run_cora_backlinks`). Future link building methods get their own tools and slot into the orchestrator.

## Files to Create

### 1. `cheddahbot/tools/linkbuilding.py` — Main tool module

Four `@tool`-decorated functions + private helpers:

**`run_link_building(lb_method="", xlsx_path="", project_name="", money_site_url="", branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- **Orchestrator/dispatcher** — reads `lb_method` (from ClickUp "LB Method" field or chat) and routes to the correct pipeline tool
- If `lb_method` is "Cora Backlinks" or empty (default): calls `run_cora_backlinks()`
- Future: if `lb_method` is "MCP Link Building": calls `run_mcp_link_building()` (not yet implemented)
- Passes all other args through to the sub-tool
- The ClickUp skill_map always routes to this tool

**`run_cora_backlinks(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- The actual Cora pipeline — runs ingest-cora → generate-batch
- Step 1: Build CLI args, call `_run_blm_command(["ingest-cora", ...])`, parse stdout for the job file path
- Step 2: Call `_run_blm_command(["generate-batch", "-j", job_file, "--continue-on-error"])`
- Updates KV store state and posts ClickUp comments at each step (following the press_release.py pattern)
- Includes a `## ClickUp Sync` section in its output to signal to the scheduler that sync was handled internally
- Can also be called directly from chat for explicit Cora runs

**`blm_ingest_cora(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- Standalone ingest — runs ingest-cora only, returns project ID and job file path
- For cases where the user wants to ingest but not generate yet

**`blm_generate_batch(job_file, continue_on_error=True, debug=False, ctx=None)`**
- Standalone generate — runs generate-batch only on an existing job file
- For re-running generation or running a manually created job

**Private helpers:**
- `_run_blm_command(args, timeout=1800)` — subprocess wrapper, runs `uv run python main.py <args>` from BLM_DIR, injects `-u`/`-p` from `BLM_USERNAME`/`BLM_PASSWORD` env vars
- `_parse_ingest_output(stdout)` — regex-extracts project_id + job_file path
- `_parse_generate_output(stdout)` — extracts completion stats
- `_build_ingest_args(...)` — constructs the CLI argument list from tool params
- `_set_status(ctx, message)` — writes pipeline status to the KV store (for UI polling)
- `_sync_clickup(ctx, task_id, step, message)` — posts a comment + updates state

**Critical: always pass the `-m` flag** to ingest-cora so that an interactive stdin prompt cannot block the subprocess.

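
The subprocess wrapper described above could look roughly like the following. This is a minimal sketch, not the final implementation: `build_blm_command` is a hypothetical helper split out so that command construction can be tested without spawning a process, and the choice to append `-u`/`-p` after the subcommand's own arguments (and to omit them when the env vars are unset) is an assumption.

```python
import os
import subprocess

BLM_DIR = "E:/dev/Big-Link-Man"  # from the plan; in practice read from config


def build_blm_command(args, env=None):
    """Build the full argv for a Big-Link-Man call (hypothetical helper)."""
    env = os.environ if env is None else env
    cmd = ["uv", "run", "python", "main.py", *args]
    # Assumption: credentials go after the subcommand's own arguments.
    username = env.get("BLM_USERNAME")
    password = env.get("BLM_PASSWORD")
    if username:
        cmd += ["-u", username]
    if password:
        cmd += ["-p", password]
    return cmd


def _run_blm_command(args, timeout=1800, cwd=BLM_DIR):
    """Run main.py with the given args and return the CompletedProcess."""
    return subprocess.run(
        build_blm_command(args),
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,
        stdin=subprocess.DEVNULL,  # never wait on an interactive prompt
    )
```

Closing stdin is an extra safeguard alongside the `-m` flag; it is an addition in this sketch, not something the plan specifies.
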
### 2. `skills/linkbuilding.md` — Skill file

YAML frontmatter links the skill to the tools `[run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder]` and the agents `[link_builder, default]`. The Markdown body describes when to use the skill, the default flags, and the workflow steps.

### 3. `tests/test_linkbuilding.py` — Test suite (~40 tests)

All tests mock `subprocess.run` — never call Big-Link-Man. Categories:
- Output parser unit tests (`_parse_ingest_output`, `_parse_generate_output`)
- CLI arg builder tests (all flag combinations, missing required params)
- Full pipeline integration (happy path, ingest failure, generate failure)
- ClickUp state machine (executing → completed, executing → failed)
- Folder watcher scan logic (new files, skip processed, missing ClickUp match)

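
To illustrate the mocking approach, here is a small sketch of a generate-failure and a generate-success test. `run_generate` is a hypothetical stand-in for the tool under test, included only so the example is self-contained; the real tests would import from `cheddahbot/tools/linkbuilding.py` instead.

```python
import subprocess
from unittest.mock import patch


def run_generate(job_file):
    """Stand-in for the tool under test (hypothetical, for illustration)."""
    result = subprocess.run(
        ["uv", "run", "python", "main.py", "generate-batch", "-j", job_file,
         "--continue-on-error"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def test_generate_failure_is_reported():
    fake = subprocess.CompletedProcess(args=[], returncode=1, stdout="", stderr="boom")
    with patch("subprocess.run", return_value=fake) as mock_run:
        assert run_generate("jobs/example.json") is False
    # The mock was called instead of the real CLI.
    argv = mock_run.call_args.args[0]
    assert "generate-batch" in argv


def test_generate_success():
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="done", stderr="")
    with patch("subprocess.run", return_value=fake):
        assert run_generate("jobs/example.json") is True
```

Returning a prebuilt `CompletedProcess` keeps each test focused on how the tool reacts to exit codes and output, without touching Big-Link-Man.
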
## Files to Modify

### 4. `cheddahbot/config.py` — Add LinkBuildingConfig

```python
@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""  # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7
```

Add `link_building: LinkBuildingConfig` field to `Config` dataclass. Add YAML loading block in `load_config()` (same pattern as memory/scheduler/shell). Add env var override for `BLM_DIR`.

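
The loading step could be sketched as follows. The existing `load_config()` pattern is not shown in this plan, so `load_link_building` and its handling of unknown keys are assumptions made for illustration; only the dataclass fields and the `BLM_DIR` override come from the plan.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""  # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7


def load_link_building(raw, env=None):
    """Build LinkBuildingConfig from the parsed YAML dict (hypothetical helper)."""
    env = os.environ if env is None else env
    section = (raw or {}).get("link_building") or {}
    known = {f.name for f in fields(LinkBuildingConfig)}
    # Assumption: ignore unknown keys so a typo in config.yaml does not crash startup.
    cfg = LinkBuildingConfig(**{k: v for k, v in section.items() if k in known})
    # The env var wins over YAML, as the plan requires for BLM_DIR.
    if env.get("BLM_DIR"):
        cfg.blm_dir = env["BLM_DIR"]
    return cfg
```

With no `link_building:` section at all, the defaults apply and the watcher stays disabled because `watch_folder` is empty.
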
### 5. `config.yaml` — Three additions

**New top-level section:**
```yaml
link_building:
  blm_dir: "E:/dev/Big-Link-Man"
  watch_folder: "Z:/cora-inbox"
  watch_interval_minutes: 60
  default_branded_plus_ratio: 0.7
```

**New skill_map entry under clickup:**
```yaml
"Link Building":
  tool: "run_link_building"
  auto_execute: false  # Cora Backlinks triggered by folder watcher, not scheduler
  complete_status: "complete"  # Override: use "complete" instead of "internal review"
  error_status: "internal review"  # On failure, move to internal review
  field_mapping:
    lb_method: "LB Method"
    project_name: "task_name"
    money_site_url: "IMSURL"
    custom_anchors: "CustomAnchors"
    branded_plus_ratio: "BrandedPlusRatio"
    cli_flags: "CLIFlags"
    xlsx_path: "CoraFile"
```

**New agent:**
```yaml
- name: link_builder
  display_name: Link Builder
  tools: [run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder, delegate_task, remember, search_memory]
  memory_scope: ""
```

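
As an illustration of how the `field_mapping` above might be turned into tool arguments, here is a small sketch. The task dictionary shape (`name` plus a `custom_fields` mapping keyed by field name) and the `build_tool_args` helper are hypothetical; the existing scheduler may already resolve mappings differently.

```python
FIELD_MAPPING = {
    "lb_method": "LB Method",
    "project_name": "task_name",
    "money_site_url": "IMSURL",
    "custom_anchors": "CustomAnchors",
    "branded_plus_ratio": "BrandedPlusRatio",
    "cli_flags": "CLIFlags",
    "xlsx_path": "CoraFile",
}


def build_tool_args(task, field_mapping):
    """Map ClickUp task data onto tool parameter names, skipping blanks."""
    args = {}
    for param, source in field_mapping.items():
        if source == "task_name":
            # Special case from the config: the task's own name, not a custom field.
            value = task.get("name", "")
        else:
            value = task.get("custom_fields", {}).get(source)
        if value not in (None, ""):
            args[param] = value
    return args
```

Blank fields are dropped rather than passed as empty strings, so the tool's own defaults (for example `branded_plus_ratio=0.7`) still apply.
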
### 6. `cheddahbot/scheduler.py` — Add folder watcher (4th daemon thread)

**New thread `_folder_watch_loop`** alongside existing poll, heartbeat, and ClickUp threads:
- Starts if `config.link_building.watch_folder` is non-empty
- Runs every `watch_interval_minutes` (default 60)
- `_scan_watch_folder()` globs `*.xlsx` in watch folder
- For each file, checks KV store `linkbuilding:watched:{filename}` — skip if already processed
- **Fuzzy-matches filename stem against ClickUp tasks** with `LB Method = "Cora Backlinks"` and status "to do":
  - Queries ClickUp for Link Building tasks
  - Compares normalized filename stem against each task's `Keyword` custom field
  - If match found: extracts money_site_url from IMSURL field, cli_flags from CLIFlags field, etc.
  - If no match: logs warning, marks as "unmatched" in KV store, sends notification asking user to create/link a ClickUp task
- On match: executes `run_link_building` tool with args from the ClickUp task fields
- On completion: moves .xlsx to `Z:/cora-inbox/processed/` subfolder, updates KV state
- On failure: updates KV state with error, notifies via NotificationBus

**File handling after pipeline:**
- On success: .xlsx moved from `Z:/cora-inbox/` → `Z:/cora-inbox/processed/`
- On failure: .xlsx stays in `Z:/cora-inbox/` (KV store marks it as failed so watcher doesn't retry automatically; user can reset KV entry to retry)

**Also adds `scan_cora_folder` tool** (can live in linkbuilding.py):
- Chat-invocable utility for the agent to check what's in the watch folder
- Returns list of unprocessed .xlsx files with ClickUp match status
- Internal agent tool, not a dashboard concern

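
The scan and matching logic could be sketched as below. The plan does not fix a fuzzy-matching algorithm, so the use of `difflib.SequenceMatcher`, the 0.85 threshold, and the task shape (a dict with a `keyword` key) are all assumptions; the KV store is modeled as a plain dict.

```python
import re
from difflib import SequenceMatcher
from pathlib import Path


def normalize(text):
    """Lowercase and collapse punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_task(filename, tasks, threshold=0.85):
    """Return the task whose keyword best matches the filename stem, or None."""
    stem = normalize(Path(filename).stem)
    best, best_score = None, 0.0
    for task in tasks:
        score = SequenceMatcher(None, stem, normalize(task["keyword"])).ratio()
        if score > best_score:
            best, best_score = task, score
    return best if best_score >= threshold else None


def scan(filenames, kv, tasks):
    """Split unprocessed .xlsx names into (matched, unmatched)."""
    matched, unmatched = [], []
    for name in sorted(filenames):
        if f"linkbuilding:watched:{name}" in kv:
            continue  # already handled on an earlier pass
        task = match_task(name, tasks)
        if task is None:
            unmatched.append(name)
        else:
            matched.append((name, task))
    return matched, unmatched
```

A fairly high threshold is chosen here on the assumption that running the pipeline against the wrong task is worse than asking the user to link a file manually.
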
### 7. `cheddahbot/clickup.py` — Add field creation method

Add `create_custom_field(list_id, name, field_type, type_config=None)` method that calls `POST /list/{list_id}/field`. Used by the `setup_linkbuilding_fields` tool to auto-create custom fields across lists.

### 8. `cheddahbot/__main__.py` — Add API endpoint

Add before Gradio mount:
```python
@fastapi_app.get("/api/linkbuilding/status")
async def linkbuilding_status():
    """Return link building status for dashboard consumption."""
    # Returns:
    # {
    #   "pending_cora_runs": [
    #     {"keyword": "precision cnc machining", "url": "https://...", "client": "Chapter 2", "task_id": "abc123"},
    #     ...
    #   ],
    #   "in_progress": [...],  # Currently executing pipelines
    #   "completed": [...],    # Recently completed (last 7 days)
    #   "failed": [...]        # Failed tasks needing attention
    # }
```

The `pending_cora_runs` section is the key dashboard data: it queries ClickUp for "to do" tasks with Work Category="Link Building" and LB Method="Cora Backlinks", and returns each task's `Keyword` field and `IMSURL` (a copyable URL) so the user can see exactly which Cora reports need to be run.

Also push link building events to NotificationBus (category="linkbuilding") at each pipeline step for future real-time dashboard support.

No other `__main__.py` changes needed — agent wiring is automatic from config.yaml.

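
One way to assemble the response body is sketched below. The record shape (a `state` key, a `finished_at` timestamp on completed records) and the `build_status` helper are hypothetical; only the four bucket names, the pending-run keys, and the 7-day window come from the plan.

```python
from datetime import datetime, timedelta, timezone


def build_status(tasks, now=None):
    """Group task records into the four buckets of the status payload."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=7)
    payload = {"pending_cora_runs": [], "in_progress": [], "completed": [], "failed": []}
    for t in tasks:
        state = t["state"]
        if state == "to do":
            # Only expose the keys the dashboard needs to list pending Cora runs.
            payload["pending_cora_runs"].append(
                {k: t[k] for k in ("keyword", "url", "client", "task_id")}
            )
        elif state == "executing":
            payload["in_progress"].append(t)
        elif state == "completed" and t["finished_at"] >= cutoff:
            payload["completed"].append(t)
        elif state == "failed":
            payload["failed"].append(t)
    return payload
```

Keeping this as a pure function would let the endpoint stay a thin wrapper and make the grouping rules easy to unit-test.
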
## ClickUp Custom Fields (Auto-Created)

New custom fields to be created programmatically:

| Field | Type | Purpose |
|-------|------|---------|
| `LB Method` | Dropdown | Link building subtype. Initial option: "Cora Backlinks" |
| `Keyword` | Short Text | Target keyword (used for file matching) |
| `CoraFile` | Short Text | Path to .xlsx file (optional, set by agent after file match) |
| `CustomAnchors` | Short Text | Comma-separated anchor text overrides |
| `BrandedPlusRatio` | Short Text | Override for `-bp` flag (e.g., "0.7") |
| `CLIFlags` | Short Text | Raw additional CLI flags (e.g., "-r 5 -t 0.3") |

Fields that already exist and will be reused: `Client`, `IMSURL`, `Work Category` (add "Link Building" option).

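
Because several of these fields are Short Text, their values arrive as strings and need parsing before they reach the CLI. A minimal sketch follows; the helper names are hypothetical, and treating values outside 0 to 1 as invalid for the ratio is an assumption.

```python
import shlex


def parse_ratio(raw, default=0.7):
    """Parse BrandedPlusRatio; fall back to default on blank or invalid input."""
    try:
        value = float(str(raw).strip())
    except (TypeError, ValueError):
        return default
    # Assumption: a ratio outside [0, 1] is a data-entry mistake.
    return value if 0.0 <= value <= 1.0 else default


def parse_cli_flags(raw):
    """Split the raw CLIFlags string into argv tokens, honoring quotes."""
    return shlex.split(raw or "")


def parse_anchors(raw):
    """Split comma-separated CustomAnchors, dropping empty entries."""
    return [a.strip() for a in (raw or "").split(",") if a.strip()]
```

Using `shlex.split` instead of `str.split` keeps quoted values with spaces intact when they are appended to the subprocess argument list.
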
### Auto-creation approach

- Add `create_custom_field(list_id, name, field_type, type_config=None)` method to `cheddahbot/clickup.py` — calls `POST /list/{list_id}/field`
- Add a `setup_linkbuilding_fields` tool (category="linkbuilding") that:
  1. Gets all list IDs in the space
  2. For each list, checks if fields already exist (via `get_custom_fields`)
  3. Creates missing fields via the new API method
  4. For `LB Method` dropdown, creates with `type_config` containing "Cora Backlinks" option
  5. For `Work Category`, adds "Link Building" option if missing
- This tool runs once during initial setup, or can be re-run if new lists are added

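
The "check, then create only what is missing" step is what makes the setup tool safe to re-run. A sketch of that planning step is shown below; `plan_field_creation` is a hypothetical helper, and the type strings (`drop_down`, `short_text`) are assumed ClickUp type identifiers that should be checked against the API reference.

```python
# Field name -> assumed ClickUp field type identifier.
REQUIRED_FIELDS = {
    "LB Method": "drop_down",
    "Keyword": "short_text",
    "CoraFile": "short_text",
    "CustomAnchors": "short_text",
    "BrandedPlusRatio": "short_text",
    "CLIFlags": "short_text",
}


def plan_field_creation(existing_by_list, required=REQUIRED_FIELDS):
    """Return {list_id: [field names to create]} for lists missing any field.

    `existing_by_list` maps each list ID to the field names it already has.
    """
    plan = {}
    for list_id, existing in existing_by_list.items():
        have = {name.lower() for name in existing}
        # Case-insensitive comparison avoids creating near-duplicate fields.
        missing = [name for name in required if name.lower() not in have]
        if missing:
            plan[list_id] = missing
    return plan
```

Lists that already have every field are absent from the result, so a second run after a successful setup would make no API calls.
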
## Data Flow & Status Lifecycle

### Primary Trigger: Folder Watcher (Cora Backlinks)

The folder watcher is the main trigger for Cora Backlinks. The ClickUp scheduler does NOT auto-execute these — it can't, because the .xlsx doesn't exist until the user runs Cora.

```
1. ClickUp task created:
   Work Category="Link Building", LB Method="Cora Backlinks", status="to do"
   Fields filled: Client, IMSURL, Keyword, CLIFlags, BrandedPlusRatio, etc.
   → Appears on dashboard as "needs Cora run"

2. User runs Cora manually, drops .xlsx in Z:/cora-inbox

3. Folder watcher (_scan_watch_folder, runs every 60 min):
   → Finds precision-cnc-machining.xlsx
   → Fuzzy matches "precision cnc machining" against Keyword field on ClickUp "to do" Link Building tasks
   → Match found → extracts metadata from ClickUp task (IMSURL, CLIFlags, etc.)
   → Sets CoraFile field on the ClickUp task to the file path
   → Moves task to "in progress"
   → Posts comment: "Starting Cora Backlinks pipeline..."

4. Pipeline runs:
   → Step 1: ingest-cora → comment: "CORA report ingested. Job file: jobs/xxx.json"
   → Step 2: generate-batch → comment: "Content generation complete. X articles across Y tiers."

5. On success:
   → Move task to "complete"
   → Post summary comment with stats
   → Move .xlsx to Z:/cora-inbox/processed/

6. On failure:
   → Move task to "internal review"
   → Post error comment with details
   → .xlsx stays in Z:/cora-inbox (can retry)
```

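
Steps 5 and 6 of the flow above, on the file-handling side, could be sketched like this. The KV store is modeled as a plain dict and the stored value shape is hypothetical; only the key format `linkbuilding:watched:{filename}` and the `processed/` subfolder come from the plan.

```python
import shutil
from pathlib import Path


def finish_file(xlsx_path, success, kv, error=""):
    """Record the outcome in the KV store and move the file on success."""
    path = Path(xlsx_path)
    key = f"linkbuilding:watched:{path.name}"
    if not success:
        # The file stays in place; the entry stops automatic retries until reset.
        kv[key] = {"status": "failed", "error": error}
        return path
    processed = path.parent / "processed"
    processed.mkdir(exist_ok=True)
    target = processed / path.name
    shutil.move(str(path), str(target))
    kv[key] = {"status": "completed", "path": str(target)}
    return target
```

Deleting the KV entry for a failed file is then enough to make the watcher pick it up again on its next pass.
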
### Secondary Trigger: Chat

```
User: "Run link building for Z:/cora-inbox/precision-cnc-machining.xlsx"
→ Chat brain calls run_cora_backlinks (or run_link_building with explicit lb_method)
→ Tool auto-looks up matching ClickUp task via Keyword field (if exists)
→ Same pipeline + ClickUp sync as above
→ If no ClickUp match: runs pipeline without ClickUp tracking, returns results to chat only
```

### Future Trigger: ClickUp Scheduler (other LB Methods)

Future link building methods (MCP, etc.) that don't need a .xlsx CAN be auto-executed by the ClickUp scheduler. The `run_link_building` orchestrator checks `lb_method`:
- "Cora Backlinks" → requires xlsx_path, skips if empty (folder watcher handles these)
- Future methods → can execute directly from ClickUp task data

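
The routing rules in this section can be sketched as a small dispatch table. The `PIPELINES` and `REQUIRES_XLSX` names, the placeholder `run_cora_backlinks` body, and the returned message strings are hypothetical; the real tools take the full parameter list given in section 1.

```python
def run_cora_backlinks(xlsx_path, project_name, money_site_url, **kwargs):
    """Placeholder for the real pipeline tool."""
    return f"cora pipeline: {xlsx_path}"


# New methods register here; the dispatcher itself does not change.
PIPELINES = {"Cora Backlinks": run_cora_backlinks}
# Methods that cannot run until the folder watcher supplies a file.
REQUIRES_XLSX = {"Cora Backlinks"}


def run_link_building(lb_method="", xlsx_path="", project_name="",
                      money_site_url="", **kwargs):
    """Route to the pipeline for lb_method; empty means Cora Backlinks."""
    method = lb_method or "Cora Backlinks"
    pipeline = PIPELINES.get(method)
    if pipeline is None:
        return f"Unknown LB Method: {method!r}"
    if method in REQUIRES_XLSX and not xlsx_path:
        return "Skipped: Cora Backlinks needs an .xlsx; waiting for the folder watcher."
    return pipeline(xlsx_path=xlsx_path, project_name=project_name,
                    money_site_url=money_site_url, **kwargs)
```

With this shape, a future method that works from ClickUp data alone would be added to `PIPELINES` but not to `REQUIRES_XLSX`, which is what would let the scheduler execute it directly.
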
### ClickUp Skill Map Note

The skill_map entry for "Link Building" exists primarily for **field mapping reference** (so the folder watcher and chat know which ClickUp fields map to which tool params). The ClickUp scheduler will discover these tasks, but `run_link_building` will skip Cora Backlinks tasks that have no xlsx_path — they're waiting for the folder watcher.

## Implementation Order

1. **Config** — Add `LinkBuildingConfig` to config.py, add `link_building:` section to config.yaml, add `link_builder` agent to config.yaml
2. **Core tools** — Create `cheddahbot/tools/linkbuilding.py` with `_run_blm_command`, parsers, `run_link_building` orchestrator, and `run_cora_backlinks` pipeline
3. **Standalone tools** — Add `blm_ingest_cora` and `blm_generate_batch`
4. **Tests** — Create `tests/test_linkbuilding.py`, verify with `uv run pytest tests/test_linkbuilding.py -v`
5. **ClickUp field creation** — Add `create_custom_field` to clickup.py, add `setup_linkbuilding_fields` tool
6. **ClickUp integration** — Add skill_map entry, add ClickUp state tracking to tools
7. **Folder watcher** — Add `_folder_watch_loop` to scheduler.py, add `scan_cora_folder` tool
8. **API endpoint** — Add `/api/linkbuilding/status` to `__main__.py`
9. **Skill file** — Create `skills/linkbuilding.md`
10. **ClickUp setup** — Run `setup_linkbuilding_fields` to auto-create custom fields across all lists
11. **Full test run** — `uv run pytest -v --no-cov`

## Verification

1. **Unit tests**: `uv run pytest tests/test_linkbuilding.py -v` — all pass with mocked subprocess
2. **Full suite**: `uv run pytest -v --no-cov` — no regressions
3. **Lint**: `uv run ruff check .` + `uv run ruff format .`
4. **Manual e2e**: Drop a real .xlsx in Z:/cora-inbox, verify ingest-cora runs, job JSON created, generate-batch runs
5. **ClickUp e2e**: Create a Link Building task in ClickUp with proper fields, wait for scheduler poll, verify execution (per the skill map note, a Cora Backlinks task is skipped until it has an xlsx_path)
6. **Chat e2e**: Ask CheddahBot to "run link building for [keyword]" via chat UI
7. **API check**: Hit `http://localhost:7860/api/linkbuilding/status` and verify data returned

## Key Reference Files

- `cheddahbot/tools/press_release.py` — Reference pattern for multi-step pipeline tool
- `cheddahbot/scheduler.py:55-76` — Where to add 4th daemon thread
- `cheddahbot/config.py:108-200` — load_config() pattern for new config sections
- `E:/dev/Big-Link-Man/docs/CLI_COMMAND_REFERENCE.md` — Full CLI reference
- `E:/dev/Big-Link-Man/src/cli/commands.py` — Exact output formats to parse