Standalone package wrapping Big-Link-Man (BLM) for Paperclip. Extracted from cheddahbot/tools/linkbuilding.py and related modules, with task-system coupling, folder watching, and AutoCora queue logic stripped out.
Public API:
- Deps, BLMConfig, LLMCheck (injection types)
- normalize_for_match, fuzzy_keyword_match, filename_stem_to_keyword
- list_inbox_xlsx, find_xlsx_for_keyword, find_all_xlsx_for_keyword
- blm_ingest_cora, blm_generate_batch, run_cora_backlinks (pipelines)
- PipelineResult, IngestResult, GenerateResult (return types)
89 tests, 96% coverage.
Linkman-Paperclip-Wrap
A standalone Python package wrapping the Big-Link-Man (BLM) CLI for use by
Paperclip agents. Extracted from CheddahBot (cheddahbot/tools/linkbuilding.py)
and simplified for consumption by external callers.
What it does
Given a task keyword, the package can:
- Find a matching CORA .xlsx in an inbox folder (e.g. Cora-For-Humans/) using fuzzy keyword matching with singular/plural awareness.
- Invoke Big-Link-Man to run ingest-cora and generate-batch on that xlsx, producing the backlink content.
- Return a structured result the caller can use to update task state.
No folder watching, no task-system coupling, no notifications. The caller owns task state and polling cadence; this package is pure work.
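The matching step can be pictured roughly as follows. This is a minimal sketch, not the package's implementation: the helper names (sketch_normalize, sketch_keywords_match, sketch_find_xlsx) and the normalization rules (lowercasing, treating hyphens and underscores as spaces) are assumptions made for illustration. The real behavior lives in matching.py and inbox.py.

```python
import re
from pathlib import Path
from typing import Callable, Optional


def sketch_normalize(text: str) -> str:
    """Lowercase and collapse separators (assumed rules, for illustration only)."""
    text = re.sub(r"[-_]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def sketch_keywords_match(
    a: str, b: str, plural_check: Callable[[str, str], bool]
) -> bool:
    """Fast path on normalized equality, then defer to the plural checker."""
    na, nb = sketch_normalize(a), sketch_normalize(b)
    if na == nb:
        return True
    return plural_check(na, nb)


def sketch_find_xlsx(
    inbox: Path, keyword: str, plural_check: Callable[[str, str], bool]
) -> Optional[Path]:
    """Return the first .xlsx whose filename stem matches the keyword, else None."""
    for path in sorted(inbox.glob("*.xlsx")):
        if sketch_keywords_match(path.stem, keyword, plural_check):
            return path
    return None
```

With a checker that always returns False, only the fast path applies, so "Precision Shafts" would match precision-shafts.xlsx but "precision shaft" would not.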
Package layout
src/link_building_workflow/
deps.py -- Deps, BLMConfig, LLMCheck types
matching.py -- Keyword normalization and fuzzy matching
inbox.py -- Inbox folder scanning (list / find-by-keyword)
blm.py -- BLM subprocess wrapper and stdout parsers
pipeline.py -- run_cora_backlinks, blm_ingest_cora, blm_generate_batch
__init__.py -- Public API re-exports
Installation
uv add git+https://git.peninsulaindustries.com/bryanb/Linkman-Paperclip-Wrap.git
Big-Link-Man itself is a separate dependency the caller provides. Install it
on the same host and point BLMConfig.blm_dir at the checkout.
Public API
All imports available from the top level:
from link_building_workflow import (
    # Dependency types
    Deps, BLMConfig, LLMCheck,
    # Matching primitives
    normalize_for_match, fuzzy_keyword_match, filename_stem_to_keyword,
    # Inbox scanning
    InboxMatch, list_inbox_xlsx, find_xlsx_for_keyword, find_all_xlsx_for_keyword,
    # Pipeline entry points
    PipelineResult, run_cora_backlinks, blm_ingest_cora, blm_generate_batch,
    # Low-level BLM (if you need to run a custom BLM command)
    IngestResult, GenerateResult, build_ingest_args,
    parse_ingest_output, parse_generate_output, run_blm_command,
)
Typical usage (Paperclip)
The caller decides when a task is eligible to run (all required task fields filled in, xlsx present in the inbox). This package provides the primitives to check the xlsx gate and to execute the work.
from link_building_workflow import (
    Deps, BLMConfig, find_xlsx_for_keyword, run_cora_backlinks,
)

deps = Deps(
    blm=BLMConfig(
        blm_dir="/opt/big-link-man",
        username="your-blm-user",
        password="your-blm-pass",
        timeout_seconds=1800,
    ),
    llm_check=your_plural_checker,  # Callable[[str, str], bool]
)

def try_run_link_building(task):
    # Caller gates 1-4: task-field checks (LB Method, Keyword, IMSURL, ...)
    if not (task.keyword and task.imsurl):
        return "blocked: missing task fields"

    # Gate 5: does a matching xlsx exist yet?
    match = find_xlsx_for_keyword(
        "/data/Cora-For-Humans",
        task.keyword,
        deps.llm_check,
    )
    if match is None:
        return "blocked: no xlsx in Cora-For-Humans"

    # Execute
    result = run_cora_backlinks(
        xlsx_path=str(match.path),
        project_name=task.keyword,
        money_site_url=task.imsurl,
        custom_anchors=task.custom_anchors or "",
        cli_flags=task.cli_flags or "",
        branded_plus_ratio=task.branded_plus_ratio,  # None -> BLMConfig default
        deps=deps,
    )
    if result.ok:
        # result.summary is a multi-line human-readable string
        # result.ingest.project_id, result.generate.job_moved_to, etc.
        return f"done: {result.summary}"
    else:
        # result.step tells you where it stopped: "ingest" or "generate"
        # result.error has the details
        return f"failed at {result.step}: {result.error}"
The LLMCheck callable
Used when the fast-path string equality fails during fuzzy matching. Should
return True iff two keywords are the same modulo plural form ("shaft" vs
"shafts", "company" vs "companies"). Return False for any other kind of
difference. Implementations should cache -- the workflow may call this
repeatedly with the same pair while scanning an inbox.
Example implementation (the one CheddahBot uses):
import os

import httpx

# The key must be defined before the first call; reading it from the
# environment here is an assumption about how it is supplied.
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]

_cache = {}

def openrouter_plural_check(a: str, b: str) -> bool:
    # Order-insensitive cache key: (a, b) and (b, a) share one entry.
    key = (a, b) if a <= b else (b, a)
    if key in _cache:
        return _cache[key]
    resp = httpx.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENROUTER_API_KEY}"},
        json={
            "model": "anthropic/claude-haiku-4.5",
            "max_tokens": 5,
            "messages": [
                {"role": "system", "content":
                    "Reply with only 'YES' or 'NO'. YES iff the two keywords "
                    "are identical except for singular/plural form."},
                {"role": "user", "content": f'A: "{a}"\nB: "{b}"'},
            ],
        },
        timeout=15,
    )
    result = "YES" in resp.json()["choices"][0]["message"]["content"].upper()
    _cache[key] = result
    return result
Tests may pass lambda a, b: False for the fast-path-only case, or any
deterministic fake.
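Since the workflow may call the checker repeatedly with the same pair, one way to add caching to any checker is an order-insensitive wrapper. The following is a minimal sketch: with_symmetric_cache and naive_plural_fake are hypothetical names, not part of this package, and the fake only recognizes a trailing "s", so it is suitable for tests rather than production matching.

```python
from typing import Callable, Dict, Tuple

PluralCheck = Callable[[str, str], bool]


def with_symmetric_cache(check: PluralCheck) -> PluralCheck:
    """Wrap a checker so (a, b) and (b, a) share one cached answer."""
    cache: Dict[Tuple[str, str], bool] = {}

    def cached(a: str, b: str) -> bool:
        # Sort the pair so argument order does not create a second entry.
        key = (a, b) if a <= b else (b, a)
        if key not in cache:
            cache[key] = check(a, b)
        return cache[key]

    return cached


def naive_plural_fake(a: str, b: str) -> bool:
    """Deterministic test fake: True only when one word is the other plus 's'."""
    return a + "s" == b or b + "s" == a
```

A test could then pass with_symmetric_cache(naive_plural_fake) wherever an LLMCheck callable is expected, keeping the suite free of network calls.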
The PipelineResult dataclass
Every pipeline entry point returns the same shape:
| field | meaning |
|---|---|
| ok | True if the pipeline completed the phase it was asked to do |
| step | "ingest" / "generate" / "complete" (on success) or where it failed |
| ingest | IngestResult if ingest ran, else None |
| generate | GenerateResult if generate ran, else None |
| error | Human-readable error message (empty on success) |
| summary | Multi-line human-readable summary, safe to post as a comment |
| project_name | The BLM project name |
| job_file | Path to the final job file (post-move on success) |
| log_lines | Progress messages captured during the run |
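A caller might translate these fields into a task update along the following lines. This is only a sketch: SketchResult is a stand-in carrying a subset of the fields listed above (it is not the real PipelineResult class), and to_task_update with its "done" / "failed" status strings is a hypothetical helper for an unspecified task system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SketchResult:
    """Stand-in with a subset of the documented fields; not the real class."""
    ok: bool
    step: str
    error: str = ""
    summary: str = ""
    log_lines: List[str] = field(default_factory=list)


def to_task_update(result: SketchResult) -> Tuple[str, str]:
    """Map a result to a (status, comment) pair for a hypothetical task system."""
    if result.ok:
        return "done", result.summary
    comment = f"Failed at {result.step}: {result.error}"
    # Attach the last few progress messages to make the failure easier to triage.
    tail = "\n".join(result.log_lines[-5:])
    if tail:
        comment += f"\n\nLast log lines:\n{tail}"
    return "failed", comment
```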
What this package does NOT do
- Does not watch folders. No threads, no polling loops.
- Does not know about ClickUp, Linear, or any task system. The caller owns task state and decides what status transitions mean.
- Does not sync with shared-folder job queues (the old AutoCora queue).
- Does not manage the Cora tool itself. It only consumes xlsx files that Cora has already produced.
- Does not pick up where BLM leaves off. When BLM finishes generate-batch, the job is done from this package's perspective.
These were deliberate drops during extraction. CheddahBot had folder-watch threads, ClickUp auto-matching, AutoCora queue submission, and a multi-inbox distribution loop. Paperclip owns that scheduling logic in its own code.
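Because scheduling is left to the caller, a polling loop on the caller's side might look like the sketch below. poll_tasks, its parameters, and the reliance on a "blocked:" prefix (borrowed from the return strings in the usage example) are illustrative assumptions, not Paperclip's actual scheduler.

```python
import time
from typing import Callable, Dict, Iterable


def poll_tasks(
    pending: Iterable[str],
    try_run: Callable[[str], str],
    interval_seconds: float = 60.0,
    max_rounds: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Retry blocked tasks on a fixed cadence; return the last outcome per task."""
    outcomes: Dict[str, str] = {}
    remaining = list(pending)
    for round_number in range(max_rounds):
        still_blocked = []
        for task_id in remaining:
            outcomes[task_id] = try_run(task_id)
            if outcomes[task_id].startswith("blocked:"):
                still_blocked.append(task_id)
        remaining = still_blocked
        if not remaining:
            break
        # Wait before the next round, but not after the final one.
        if round_number < max_rounds - 1:
            sleep(interval_seconds)
    return outcomes
```

Injecting sleep keeps the loop testable without real delays; tasks that finish or fail are not retried, while blocked ones are re-checked until max_rounds is reached.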
Development
Requires Python 3.11+ and uv.
uv sync # install dev + test deps
uv run pytest # run the test suite (89 tests, ~96% coverage)
uv run ruff check . # lint
Provenance
Extracted from the CheddahBot repo, specifically:
- cheddahbot/tools/linkbuilding.py -- pipeline logic and fuzzy matching
- cheddahbot/tools/autocora.py -- only the fuzzy-match helpers were kept; the shared-folder job queue and result polling were dropped
- cheddahbot/scheduler.py -- folder-watch loops were dropped; their matching logic was converted to a synchronous find_xlsx_for_keyword call
The BLM invocation parameters, stdout parsing regexes, and default ratios match CheddahBot's production behavior exactly.