Link Building Agent Plan

Context

CheddahBot needs a link building agent that orchestrates the external Big-Link-Man CLI tool (E:/dev/Big-Link-Man/). The current workflow is manual: run Cora on another machine → get .xlsx → manually run main.py ingest-cora → manually run main.py generate-batch. This agent automates the two manual CLI steps (ingest-cora and generate-batch), triggered by folder watching, ClickUp tasks, or chat commands. It must be extensible to future link building methods (MCP server path, ingest-simple, etc.).

Decisions Made

Watch folder: Z:/cora-inbox (network drive, accessible from the Cora machine)
File→task matching: Fuzzy match .xlsx filename stem against ClickUp task's Keyword custom field
New ClickUp field "LB Method": Dropdown with initial option "Cora Backlinks" (more added later)
Dashboard: API endpoint + NotificationBus events only (no frontend work — separate project)
Sidecar files: Not needed — all metadata comes from the matching ClickUp task
Tool naming: Orchestrator pattern — run_link_building is a thin dispatcher that reads LB Method and routes to the specific pipeline tool (e.g., run_cora_backlinks). Future link building methods get their own tools and slot into the orchestrator.

Files to Create

1. `cheddahbot/tools/linkbuilding.py` — Main tool module

Four @tool-decorated functions + private helpers:

run_link_building(lb_method="", xlsx_path="", project_name="", money_site_url="", branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)

Orchestrator/dispatcher — reads lb_method (from ClickUp "LB Method" field or chat) and routes to the correct pipeline tool
If lb_method is "Cora Backlinks" or empty (default): calls run_cora_backlinks()
Future: if lb_method is "MCP Link Building": calls run_mcp_link_building() (not yet implemented)
Passes all other args through to the sub-tool
This is what the ClickUp skill_map always routes to
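The dispatcher behaviour described above could be sketched as a small registry lookup. This is a minimal illustration, not the actual tool: the `PIPELINES` registry, the stand-in `run_cora_backlinks` body, and the returned message strings are all assumptions made for the example.

```python
from typing import Callable

def run_cora_backlinks(xlsx_path: str, project_name: str, money_site_url: str, **kwargs) -> str:
    """Stand-in for the real pipeline tool (hypothetical body, illustration only)."""
    return f"cora pipeline: {project_name} ({xlsx_path})"

# Hypothetical registry: LB Method label -> pipeline tool.
# Future methods would add one entry here.
PIPELINES: dict[str, Callable[..., str]] = {
    "Cora Backlinks": run_cora_backlinks,
}
DEFAULT_METHOD = "Cora Backlinks"

def run_link_building(lb_method: str = "", xlsx_path: str = "", project_name: str = "",
                      money_site_url: str = "", **kwargs) -> str:
    # An empty lb_method falls back to the default pipeline.
    method = lb_method.strip() or DEFAULT_METHOD
    pipeline = PIPELINES.get(method)
    if pipeline is None:
        return f"Unknown LB Method: {method!r}"
    # Cora runs cannot start without a report file.
    if method == "Cora Backlinks" and not xlsx_path:
        return "Skipped: Cora Backlinks needs an xlsx_path."
    # Pass every other argument through to the sub-tool unchanged.
    return pipeline(xlsx_path=xlsx_path, project_name=project_name,
                    money_site_url=money_site_url, **kwargs)
```

Keeping the routing in a dictionary means a new method only needs a new tool function plus one registry entry, which matches the "slot into the orchestrator" goal.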

run_cora_backlinks(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)

The actual Cora pipeline — runs ingest-cora → generate-batch
Step 1: Build CLI args, call _run_blm_command(["ingest-cora", ...]), parse stdout for job file path
Step 2: Call _run_blm_command(["generate-batch", "-j", job_file, "--continue-on-error"])
Updates KV store state and posts ClickUp comments at each step (following press_release.py pattern)
Returns ## ClickUp Sync in output to signal scheduler that sync was handled internally
Can also be called directly from chat for explicit Cora runs
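A rough sketch of the two-step flow, with the CLI runner and the stdout parser passed in as callables so the control flow can be read (and tested) without Big-Link-Man installed. `CommandResult`, `cora_pipeline`, and the error strings are hypothetical names for this example; only the subcommands, `-j`, `--continue-on-error`, and the `## ClickUp Sync` marker come from the plan.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class CommandResult:
    """Hypothetical result shape for one Big-Link-Man CLI call."""
    returncode: int
    stdout: str

def cora_pipeline(
    ingest_args: Sequence[str],
    run_cmd: Callable[[list[str]], CommandResult],
    parse_job_file: Callable[[str], Optional[str]],
) -> str:
    # Step 1: ingest the Cora report, then find the job file path in stdout.
    ingest = run_cmd(["ingest-cora", *ingest_args])
    if ingest.returncode != 0:
        return "Error: ingest-cora failed"
    job_file = parse_job_file(ingest.stdout)
    if not job_file:
        return "Error: could not find job file in ingest-cora output"

    # Step 2: generate content for the job produced by step 1.
    generate = run_cmd(["generate-batch", "-j", job_file, "--continue-on-error"])
    if generate.returncode != 0:
        return f"Error: generate-batch failed for {job_file}"

    # The marker heading tells the scheduler that sync was handled internally.
    return f"Pipeline complete for {job_file}\n\n## ClickUp Sync\nHandled internally."
```

Stopping after a failed ingest avoids calling generate-batch with a job file that does not exist.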

blm_ingest_cora(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)

Standalone ingest — runs ingest-cora only, returns project ID and job file path
For cases where user wants to ingest but not generate yet

blm_generate_batch(job_file, continue_on_error=True, debug=False, ctx=None)

Standalone generate — runs generate-batch only on an existing job file
For re-running generation or running a manually-created job

Private helpers:

_run_blm_command(args, timeout=1800) — subprocess wrapper, runs uv run python main.py <args> from BLM_DIR, injects -u/-p from BLM_USERNAME/BLM_PASSWORD env vars
_parse_ingest_output(stdout) — regex extract project_id + job_file path
_parse_generate_output(stdout) — extract completion stats
_build_ingest_args(...) — construct CLI argument list from tool params
_set_status(ctx, message) — write pipeline status to KV store (for UI polling)
_sync_clickup(ctx, task_id, step, message) — post comment + update state

Critical: always pass -m flag to ingest-cora to prevent interactive stdin prompt from blocking the subprocess.

2. `skills/linkbuilding.md` — Skill file

YAML frontmatter linking to [run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder] tools and [link_builder, default] agents. Markdown body describes when to use, default flags, workflow steps.

3. `tests/test_linkbuilding.py` — Test suite (~40 tests)

All tests mock subprocess.run — never call Big-Link-Man. Categories:

Output parser unit tests (_parse_ingest_output, _parse_generate_output)
CLI arg builder tests (all flag combinations, missing required params)
Full pipeline integration (happy path, ingest failure, generate failure)
ClickUp state machine (executing → completed, executing → failed)
Folder watcher scan logic (new files, skip processed, missing ClickUp match)
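A test in this suite might look roughly like the following, which patches `subprocess.run` so no real CLI call is made. `run_generate` is a tiny hypothetical stand-in for the code under test, not a function from the plan.

```python
import subprocess
from unittest.mock import patch

def run_generate(job_file: str) -> bool:
    """Hypothetical stand-in for the code under test: True when the CLI exits with 0."""
    proc = subprocess.run(
        ["uv", "run", "python", "main.py", "generate-batch", "-j", job_file],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0

def test_generate_failure_is_reported():
    # Simulate a failing CLI call without touching Big-Link-Man.
    fake = subprocess.CompletedProcess(args=[], returncode=1, stdout="", stderr="boom")
    with patch("subprocess.run", return_value=fake) as mock_run:
        assert run_generate("jobs/x.json") is False
    # The wrapper should have invoked the CLI exactly once, with the right subcommand.
    mock_run.assert_called_once()
    assert "generate-batch" in mock_run.call_args.args[0]
```

Patching at the `subprocess.run` level works here because the stand-in looks the function up on the module at call time; if the real module imports `run` directly, the patch target would need to be that module's own name instead.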

Files to Modify

4. `cheddahbot/config.py` — Add LinkBuildingConfig

@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""                    # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7

Add link_building: LinkBuildingConfig field to Config dataclass. Add YAML loading block in load_config() (same pattern as memory/scheduler/shell). Add env var override for BLM_DIR.
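The loading step could be sketched as below, assuming the YAML file has already been parsed into a plain dict. The function name `load_link_building` and the choice to ignore unknown keys are assumptions; the real code should follow the existing load_config() pattern.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional

@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""  # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7

def load_link_building(raw: dict, env: Optional[Mapping[str, str]] = None) -> LinkBuildingConfig:
    """Build the config from parsed YAML, then apply the BLM_DIR env override."""
    env = os.environ if env is None else env
    section = raw.get("link_building") or {}
    # Ignore unknown keys so a typo in config.yaml does not crash startup (assumption).
    known = {f.name for f in fields(LinkBuildingConfig)}
    cfg = LinkBuildingConfig(**{k: v for k, v in section.items() if k in known})
    # The environment variable wins over the YAML value when it is set.
    if env.get("BLM_DIR"):
        cfg.blm_dir = env["BLM_DIR"]
    return cfg
```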

5. `config.yaml` — Three additions

New top-level section:

link_building:
  blm_dir: "E:/dev/Big-Link-Man"
  watch_folder: "Z:/cora-inbox"
  watch_interval_minutes: 60
  default_branded_plus_ratio: 0.7

New skill_map entry under clickup:

"Link Building":
  tool: "run_link_building"
  auto_execute: false           # Cora Backlinks triggered by folder watcher, not scheduler
  complete_status: "complete"   # Override: use "complete" instead of "internal review"
  error_status: "internal review"  # On failure, move to internal review
  field_mapping:
    lb_method: "LB Method"
    project_name: "task_name"
    money_site_url: "IMSURL"
    custom_anchors: "CustomAnchors"
    branded_plus_ratio: "BrandedPlusRatio"
    cli_flags: "CLIFlags"
    xlsx_path: "CoraFile"

New agent:

- name: link_builder
  display_name: Link Builder
  tools: [run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder, delegate_task, remember, search_memory]
  memory_scope: ""

6. `cheddahbot/scheduler.py` — Add folder watcher (4th daemon thread)

New thread _folder_watch_loop alongside existing poll, heartbeat, and ClickUp threads:

Starts if config.link_building.watch_folder is non-empty
Runs every watch_interval_minutes (default 60)
_scan_watch_folder() globs *.xlsx in watch folder
For each file, checks KV store linkbuilding:watched:{filename} — skip if already processed
Fuzzy-matches filename stem against ClickUp tasks with LB Method = "Cora Backlinks" and status "to do":
- Queries ClickUp for Link Building tasks
- Compares normalized filename stem against each task's Keyword custom field
- If match found: extracts money_site_url from IMSURL field, cli_flags from CLIFlags field, etc.
- If no match: logs warning, marks as "unmatched" in KV store, sends notification asking user to create/link a ClickUp task
On match: executes run_link_building tool with args from the ClickUp task fields
On completion: moves .xlsx to Z:/cora-inbox/processed/ subfolder, updates KV state
On failure: updates KV state with error, notifies via NotificationBus
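The filename-to-task matching could be sketched with the standard library's `difflib`. The task dict shape (`{"id": ..., "keyword": ...}`), the normalization rule, and the 0.85 threshold are assumptions chosen for this example, not decisions recorded in the plan.

```python
import re
from difflib import SequenceMatcher
from pathlib import Path
from typing import Optional

def normalize(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run into a single space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())

def match_task(xlsx_path: str, tasks: list[dict], threshold: float = 0.85) -> Optional[dict]:
    """Return the task whose keyword best matches the filename stem, or None."""
    stem = normalize(Path(xlsx_path).stem)
    best, best_score = None, 0.0
    for task in tasks:
        score = SequenceMatcher(None, stem, normalize(task.get("keyword", ""))).ratio()
        if score > best_score:
            best, best_score = task, score
    # Below the threshold the file is treated as unmatched rather than guessed.
    return best if best_score >= threshold else None
```

With this normalization, `precision-cnc-machining.xlsx` and a Keyword of "Precision CNC Machining" both reduce to the same string, so small differences in punctuation or casing do not block a match.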

File handling after pipeline:

On success: .xlsx moved from Z:/cora-inbox/ → Z:/cora-inbox/processed/
On failure: .xlsx stays in Z:/cora-inbox/ (KV store marks it as failed so watcher doesn't retry automatically; user can reset KV entry to retry)
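The post-pipeline file handling might look like this sketch. A plain dict stands in for the KV store, and the stored values "completed" and "failed" are assumptions; only the `linkbuilding:watched:{filename}` key format comes from the plan.

```python
import shutil
from pathlib import Path

def finish_file(xlsx: Path, success: bool, kv: dict) -> Path:
    """Record the outcome in the KV store and move the file on success."""
    key = f"linkbuilding:watched:{xlsx.name}"
    if not success:
        # Leave the file in place; the failed marker stops automatic retries.
        kv[key] = "failed"
        return xlsx
    processed = xlsx.parent / "processed"
    processed.mkdir(exist_ok=True)
    dest = processed / xlsx.name
    shutil.move(str(xlsx), str(dest))
    kv[key] = "completed"
    return dest
```

Because the watcher checks the KV entry before doing anything else, deleting the "failed" entry is enough to make the next scan pick the file up again.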

Also adds scan_cora_folder tool (can live in linkbuilding.py):

Chat-invocable utility for the agent to check what's in the watch folder
Returns list of unprocessed .xlsx files with ClickUp match status
Internal agent tool, not a dashboard concern

7. `cheddahbot/clickup.py` — Add field creation method

Add create_custom_field(list_id, name, field_type, type_config=None) method that calls POST /list/{list_id}/field. Used by the setup tool to auto-create custom fields across lists.

8. `cheddahbot/__main__.py` — Add API endpoint

Add before Gradio mount:

@fastapi_app.get("/api/linkbuilding/status")
async def linkbuilding_status():
    """Return link building status for dashboard consumption."""
    # Returns:
    # {
    #   "pending_cora_runs": [
    #     {"keyword": "precision cnc machining", "url": "https://...", "client": "Chapter 2", "task_id": "abc123"},
    #     ...
    #   ],
    #   "in_progress": [...],     # Currently executing pipelines
    #   "completed": [...],       # Recently completed (last 7 days)
    #   "failed": [...]           # Failed tasks needing attention
    # }

The pending_cora_runs section is the key dashboard data. It queries ClickUp for "to do" tasks with Work Category="Link Building" and LB Method="Cora Backlinks", and returns each task's Keyword field and IMSURL (as a copyable URL), so the user can see exactly which Cora reports still need to be run.
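The filtering behind pending_cora_runs could be sketched as a pure function over already-fetched tasks. The input shape here (a `custom_fields` dict keyed by field name) is a simplification assumed for the example; the real ClickUp client may expose custom fields differently.

```python
from typing import Any

def build_pending_cora_runs(tasks: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Pick the "to do" Cora Backlinks tasks and shape them for the dashboard."""
    pending = []
    for task in tasks:
        fields = task.get("custom_fields", {})  # assumed: {field name: value}
        if (
            task.get("status") == "to do"
            and fields.get("Work Category") == "Link Building"
            and fields.get("LB Method") == "Cora Backlinks"
        ):
            pending.append({
                "keyword": fields.get("Keyword", ""),
                "url": fields.get("IMSURL", ""),
                "client": fields.get("Client", ""),
                "task_id": task["id"],
            })
    return pending
```

Keeping this logic separate from the FastAPI handler means it can be unit-tested with plain dicts.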

Also push link building events to NotificationBus (category="linkbuilding") at each pipeline step for future real-time dashboard support.

No other __main__.py changes needed — agent wiring is automatic from config.yaml.

ClickUp Custom Fields (Auto-Created)

New custom fields to be created programmatically:

| Field | Type | Purpose |
|---|---|---|
| `LB Method` | Dropdown | Link building subtype. Initial option: "Cora Backlinks" |
| `Keyword` | Short Text | Target keyword (used for file matching) |
| `CoraFile` | Short Text | Path to .xlsx file (optional, set by agent after file match) |
| `CustomAnchors` | Short Text | Comma-separated anchor text overrides |
| `BrandedPlusRatio` | Short Text | Override for `-bp` flag (e.g., "0.7") |
| `CLIFlags` | Short Text | Raw additional CLI flags (e.g., "-r 5 -t 0.3") |

Fields that already exist and will be reused: Client, IMSURL, Work Category (add "Link Building" option).

Auto-creation approach

Add create_custom_field(list_id, name, field_type, type_config=None) method to cheddahbot/clickup.py — calls POST /list/{list_id}/field
Add a setup_linkbuilding_fields tool (category="linkbuilding") that:
1. Gets all list IDs in the space
2. For each list, checks if fields already exist (via get_custom_fields)
3. Creates missing fields via the new API method
4. For LB Method dropdown, creates with type_config containing "Cora Backlinks" option
5. For Work Category, adds "Link Building" option if missing
This tool runs once during initial setup, or can be re-run if new lists are added
Also add "Link Building" as an option to the existing Work Category dropdown if not present
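The "create only what is missing" check that makes the setup tool safe to re-run could be sketched as follows. The `REQUIRED_FIELDS` table and `missing_fields` helper are hypothetical names for this example, and the type labels are the human-readable ones from the field table, not ClickUp API identifiers.

```python
# New fields the setup tool is responsible for, with their display types.
REQUIRED_FIELDS: dict[str, str] = {
    "LB Method": "Dropdown",
    "Keyword": "Short Text",
    "CoraFile": "Short Text",
    "CustomAnchors": "Short Text",
    "BrandedPlusRatio": "Short Text",
    "CLIFlags": "Short Text",
}

def missing_fields(existing_names: set[str]) -> dict[str, str]:
    """Return the required fields that a list does not have yet."""
    return {
        name: field_type
        for name, field_type in REQUIRED_FIELDS.items()
        if name not in existing_names
    }
```

For example, a list that already has `Keyword` and `Client` would get the other five fields created, and a second run against the same list would create nothing.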

Data Flow & Status Lifecycle

Primary Trigger: Folder Watcher (Cora Backlinks)

The folder watcher is the main trigger for Cora Backlinks. The ClickUp scheduler does NOT auto-execute these — it can't, because the .xlsx doesn't exist until the user runs Cora.

1. ClickUp task created:
   Work Category="Link Building", LB Method="Cora Backlinks", status="to do"
   Fields filled: Client, IMSURL, Keyword, CLIFlags, BrandedPlusRatio, etc.
   → Appears on dashboard as "needs Cora run"

2. User runs Cora manually, drops .xlsx in Z:/cora-inbox

3. Folder watcher (_scan_watch_folder, runs every 60 min):
   → Finds precision-cnc-machining.xlsx
   → Fuzzy matches "precision cnc machining" against Keyword field on ClickUp "to do" Link Building tasks
   → Match found → extracts metadata from ClickUp task (IMSURL, CLIFlags, etc.)
   → Sets CoraFile field on the ClickUp task to the file path
   → Moves task to "in progress"
   → Posts comment: "Starting Cora Backlinks pipeline..."

4. Pipeline runs:
   → Step 1: ingest-cora → comment: "CORA report ingested. Job file: jobs/xxx.json"
   → Step 2: generate-batch → comment: "Content generation complete. X articles across Y tiers."

5. On success:
   → Move task to "complete"
   → Post summary comment with stats
   → Move .xlsx to Z:/cora-inbox/processed/

6. On failure:
   → Move task to "internal review"
   → Post error comment with details
   → .xlsx stays in Z:/cora-inbox (KV store marks it as failed; reset the KV entry to retry)

Secondary Trigger: Chat

User: "Run link building for Z:/cora-inbox/precision-cnc-machining.xlsx"
  → Chat brain calls run_cora_backlinks (or run_link_building with explicit lb_method)
  → Tool auto-looks up matching ClickUp task via Keyword field (if exists)
  → Same pipeline + ClickUp sync as above
  → If no ClickUp match: runs pipeline without ClickUp tracking, returns results to chat only

Future Trigger: ClickUp Scheduler (other LB Methods)

Future link building methods (MCP, etc.) that don't need a .xlsx CAN be auto-executed by the ClickUp scheduler. The run_link_building orchestrator checks lb_method:

"Cora Backlinks" → requires xlsx_path, skips if empty (folder watcher handles these)
Future methods → can execute directly from ClickUp task data

ClickUp Skill Map Note

The skill_map entry for "Link Building" exists primarily for field mapping reference (so the folder watcher and chat know which ClickUp fields map to which tool params). The ClickUp scheduler will discover these tasks but run_link_building will skip Cora Backlinks that have no xlsx_path — they're waiting for the folder watcher.

Implementation Order

Config — Add LinkBuildingConfig to config.py, add link_building: section to config.yaml, add link_builder agent to config.yaml
Core tools — Create cheddahbot/tools/linkbuilding.py with _run_blm_command, parsers, run_link_building orchestrator, and run_cora_backlinks pipeline
Standalone tools — Add blm_ingest_cora and blm_generate_batch
Tests — Create tests/test_linkbuilding.py, verify with uv run pytest tests/test_linkbuilding.py -v
ClickUp field creation — Add create_custom_field to clickup.py, add setup_linkbuilding_fields tool
ClickUp integration — Add skill_map entry, add ClickUp state tracking to tools
Folder watcher — Add _folder_watch_loop to scheduler.py, add scan_cora_folder tool
API endpoint — Add /api/linkbuilding/status to __main__.py
Skill file — Create skills/linkbuilding.md
ClickUp setup — Run setup_linkbuilding_fields to auto-create custom fields across all lists
Full test run — uv run pytest -v --no-cov

Verification

Unit tests: uv run pytest tests/test_linkbuilding.py -v — all pass with mocked subprocess
Full suite: uv run pytest -v --no-cov — no regressions
Lint: uv run ruff check . + uv run ruff format .
Manual e2e: Drop a real .xlsx in Z:/cora-inbox, verify ingest-cora runs, job JSON created, generate-batch runs
ClickUp e2e: Create a Link Building task in ClickUp with proper fields, drop the matching .xlsx in the watch folder, wait for the next folder watcher scan, verify execution and ClickUp status updates
Chat e2e: Ask CheddahBot to "run link building for [keyword]" via chat UI
API check: Hit http://localhost:7860/api/linkbuilding/status and verify data returned

Key Reference Files

cheddahbot/tools/press_release.py — Reference pattern for multi-step pipeline tool
cheddahbot/scheduler.py:55-76 — Where to add 4th daemon thread
cheddahbot/config.py:108-200 — load_config() pattern for new config sections
E:/dev/Big-Link-Man/docs/CLI_COMMAND_REFERENCE.md — Full CLI reference
E:/dev/Big-Link-Man/src/cli/commands.py — Exact output formats to parse
