
Big Link Man - Content Automation & Syndication Platform

AI-powered content generation and multi-tier link building system with cloud deployment.

Quick Start

# Install dependencies
uv pip install -r requirements.txt

# Setup environment
cp env.example .env
# Edit .env with your credentials

# Initialize database
uv run python scripts/init_db.py

# Create first admin user
uv run python scripts/create_first_admin.py

# Run CLI
uv run python main.py --help

Environment Configuration

Required environment variables in .env:

DATABASE_URL=sqlite:///./content_automation.db
OPENROUTER_API_KEY=your_key_here
BUNNY_ACCOUNT_API_KEY=your_bunny_key_here

See env.example for full configuration options.

Database Management

Initialize Database

uv run python scripts/init_db.py

Reset Database (drops all data)

uv run python scripts/init_db.py reset

Create First Admin

uv run python scripts/create_first_admin.py

Database Migrations

# Story 3.1 - Site deployments
uv run python scripts/migrate_story_3.1_sqlite.py

# Story 3.2 - Anchor text
uv run python scripts/migrate_add_anchor_text.py

# Story 3.3 - Template fields
uv run python scripts/migrate_add_template_fields.py

# Story 3.4 - Site pages
uv run python scripts/migrate_add_site_pages.py

# Story 4.1 - Deployment fields
uv run python scripts/migrate_add_deployment_fields.py

# Backfill site pages after migration
uv run python scripts/backfill_site_pages.py

User Management

Add User

uv run python main.py add-user \
  --username newuser \
  --password password123 \
  --role Admin \
  --admin-user admin \
  --admin-password adminpass

List Users

uv run python main.py list-users \
  --admin-user admin \
  --admin-password adminpass

Delete User

uv run python main.py delete-user \
  --username olduser \
  --admin-user admin \
  --admin-password adminpass

Site Management

Provision New Site

uv run python main.py provision-site \
  --name "My Site" \
  --domain www.example.com \
  --storage-name my-storage-zone \
  --region DE \
  --admin-user admin \
  --admin-password adminpass

Regions: DE, NY, LA, SG, SYD

Attach Domain to Existing Storage

uv run python main.py attach-domain \
  --name "Another Site" \
  --domain www.another.com \
  --storage-name my-storage-zone \
  --admin-user admin \
  --admin-password adminpass

Sync Existing Bunny.net Sites

# Dry run
uv run python main.py sync-sites \
  --admin-user admin \
  --dry-run

# Actually import
uv run python main.py sync-sites \
  --admin-user admin

List Sites

uv run python main.py list-sites \
  --admin-user admin \
  --admin-password adminpass

Get Site Details

uv run python main.py get-site \
  --domain www.example.com \
  --admin-user admin \
  --admin-password adminpass

Remove Site

uv run python main.py remove-site \
  --domain www.example.com \
  --admin-user admin \
  --admin-password adminpass

S3 Bucket Management

The platform supports AWS S3 buckets as storage providers alongside Bunny.net. S3 buckets can be discovered, registered, and managed through the system.

Prerequisites

Set AWS credentials in .env:

AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1  # Optional, defaults to us-east-1

Discover and Register S3 Buckets

Interactive Mode (select buckets manually):

uv run python main.py discover-s3-buckets

Or run the script directly:

uv run python scripts/discover_s3_buckets.py

Auto-Import Mode (import all unregistered buckets automatically):

uv run python scripts/discover_s3_buckets.py --auto-import-all

Auto-import mode will:

  • Discover all S3 buckets in your AWS account
  • Skip buckets already registered in the database
  • Skip buckets in the exclusion list
  • Register remaining buckets as bucket-only sites (no custom domain)

Bucket Exclusion List

To prevent certain buckets from being auto-imported (e.g., buckets manually added with FQDNs), add them to s3_bucket_exclusions.txt:

# S3 Bucket Exclusion List
# One bucket name per line (comments start with #)

learningeducationtech.com
theteacher.best
airconditionerfixer.com

The discovery script automatically loads and respects this exclusion list. Excluded buckets are marked as [EXCLUDED] in the display and are skipped during both interactive and auto-import operations.
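A minimal parser for this file format (one bucket name per line, # comments, blank lines ignored) might look like this; it is a sketch, not the script's actual loader:

```python
def load_exclusions(path="s3_bucket_exclusions.txt"):
    """Load bucket names to exclude, ignoring comments and blank lines."""
    names = set()
    with open(path) as fh:
        for line in fh:
            # Drop trailing comments, then surrounding whitespace
            line = line.split("#", 1)[0].strip()
            if line:
                names.add(line)
    return names
```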

List S3 Sites with FQDNs

To see which S3 buckets have custom domains (and should be excluded):

uv run python scripts/list_s3_fqdn_sites.py

This script lists all S3 sites with s3_custom_domain set and outputs bucket names that should be added to the exclusion list.

S3 Site Types

S3 sites can be registered in two ways:

  1. Bucket-only sites: No custom domain, accessed via S3 website endpoint

    • Created via auto-import or interactive discovery
    • Uses bucket name as site identifier
    • URL format: https://bucket-name.s3.region.amazonaws.com/
  2. FQDN sites: Manually added with custom domains

    • Created manually with s3_custom_domain set
    • Should be added to exclusion list to prevent re-import
    • URL format: https://custom-domain.com/
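The two URL formats above can be captured in a small helper (hypothetical name; shown only to illustrate the branching between FQDN and bucket-only sites):

```python
def site_url(bucket, region=None, custom_domain=None):
    """Build the public URL for an S3-backed site.

    FQDN sites use their custom domain; bucket-only sites fall
    back to the regional S3 endpoint described above.
    """
    if custom_domain:
        return f"https://{custom_domain}/"
    region = region or "us-east-1"  # assumed default, matching AWS_REGION
    return f"https://{bucket}.s3.{region}.amazonaws.com/"
```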

S3 Storage Features

  • Multi-region support: Automatically detects bucket region
  • Public read access: Buckets configured for public read-only access
  • Bucket policy: Applied automatically for public read access
  • Region mapping: AWS regions mapped to short codes (US, EU, SG, etc.)
  • Duplicate prevention: Checks existing registrations before importing
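The region mapping might be implemented as a simple lookup table; the entries below are illustrative assumptions, and the authoritative table lives in the discovery script:

```python
# Hypothetical AWS-region-to-short-code table
REGION_SHORT_CODES = {
    "us-east-1": "US",
    "us-west-2": "US",
    "eu-west-1": "EU",
    "eu-central-1": "EU",
    "ap-southeast-1": "SG",
}

def short_code(aws_region):
    """Map an AWS region to its short code, defaulting to US."""
    return REGION_SHORT_CODES.get(aws_region, "US")
```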

Helper Scripts

List S3 FQDN sites:

uv run python scripts/list_s3_fqdn_sites.py

Delete sites by ID:

# Edit scripts/delete_sites.py to set site_ids, then:
uv run python scripts/delete_sites.py

Check sites around specific IDs:

# Edit scripts/list_sites_by_id.py to set target_ids, then:
uv run python scripts/list_sites_by_id.py

Project Management

Ingest CORA Report

uv run python main.py ingest-cora \
  --file shaft_machining.xlsx \
  --name "Shaft Machining Project" \
  --custom-anchors "shaft repair,engine parts" \
  --username admin \
  --password adminpass

List Projects

uv run python main.py list-projects \
  --username admin \
  --password adminpass

Content Generation

Create Job Configuration

# Tier 1 only
uv run python create_job_config.py 1 tier1 15

# Multi-tier
uv run python create_job_config.py 1 multi 15 50 100

Generate Content Batch

uv run python main.py generate-batch \
  --job-file jobs/project_1_tier1_15articles.json \
  --username admin \
  --password adminpass

With options:

uv run python main.py generate-batch \
  --job-file jobs/my_job.json \
  --username admin \
  --password adminpass \
  --debug \
  --continue-on-error \
  --model gpt-4o-mini

Available models: gpt-4o-mini, claude-sonnet-4.5, or any model available on OpenRouter.

Note: If your job file contains a models config, it will override the --model flag and use different models for title, outline, and content generation stages.

Deployment

Deploy Batch

# Automatic deployment (runs after generation)
uv run python main.py generate-batch \
  --job-file jobs/my_job.json \
  --username admin \
  --password adminpass

# Manual deployment
uv run python main.py deploy-batch \
  --batch-id 123 \
  --admin-user admin \
  --admin-password adminpass

Dry Run Deployment

uv run python main.py deploy-batch \
  --batch-id 123 \
  --dry-run

Verify Deployment

# Check all URLs
uv run python main.py verify-deployment --batch-id 123

# Check random sample
uv run python main.py verify-deployment \
  --batch-id 123 \
  --sample 10 \
  --timeout 10

Export Article URLs

# Tier 1 only
uv run python main.py get-links \
  --project-id 123 \
  --tier 1

# Tier 2 and above
uv run python main.py get-links \
  --project-id 123 \
  --tier 2+

# With anchor text and destinations
uv run python main.py get-links \
  --project-id 123 \
  --tier 2+ \
  --with-anchor-text \
  --with-destination-url

Output is CSV written to stdout. Redirect to a file to save it:

uv run python main.py get-links \
  --project-id 123 \
  --tier 1 > tier1_urls.csv
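Because the export is plain CSV with a header row, it can be consumed with the standard library; the column names below ("url", "tier") are assumptions, so check the actual header produced by your version of get-links:

```python
import csv

def read_exported_links(path="tier1_urls.csv"):
    """Read a get-links CSV export into a list of row dicts."""
    with open(path, newline="") as fh:
        # DictReader uses the first row as field names
        return list(csv.DictReader(fh))
```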

Utility Scripts

Add robots.txt to All Buckets

Add a standardized robots.txt file to all storage buckets (both S3 and Bunny) that blocks SEO tools and bad bots while allowing legitimate search engines and AI crawlers:

# Preview what would be done (recommended first)
uv run python scripts/add_robots_txt_to_buckets.py --dry-run

# Upload to all buckets
uv run python scripts/add_robots_txt_to_buckets.py

# Only process S3 buckets
uv run python scripts/add_robots_txt_to_buckets.py --provider s3

# Only process Bunny storage zones
uv run python scripts/add_robots_txt_to_buckets.py --provider bunny

robots.txt behavior:

  • Allows: Google, Bing, Yahoo, DuckDuckGo, Baidu, Yandex
  • Allows: GPTBot, Claude, Common Crawl, Perplexity, ByteDance AI
  • Blocks: Ahrefs, Semrush, Moz, and other SEO tools
  • Blocks: HTTrack, Wget, and other scrapers/bad bots

The script is idempotent (safe to run multiple times) and will overwrite existing robots.txt files. It continues processing remaining buckets if one fails and reports all failures at the end.
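An illustrative fragment of the kind of robots.txt the script uploads is shown below; the exact user-agent lists are defined in scripts/add_robots_txt_to_buckets.py, so treat this only as a sketch of the allow/block pattern:

```text
# Block SEO tools and scrapers
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

# Allow search engines and AI crawlers
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Allow: /

# Everyone else
User-agent: *
Allow: /
```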

Check Last Generated Content

uv run python check_last_gen.py

List All Users (Direct DB Access)

uv run python scripts/list_users.py

Add Admin (Direct DB Access)

uv run python scripts/add_admin_direct.py

Check Migration Status

uv run python scripts/check_migration.py

Add Tier to Projects

uv run python scripts/add_tier_to_projects.py

Testing

Run All Tests

uv run pytest

Run Unit Tests

uv run pytest tests/unit/ -v

Run Integration Tests

uv run pytest tests/integration/ -v

Run Specific Test File

uv run pytest tests/unit/test_url_generator.py -v

Run Story 3.1 Tests

uv run pytest tests/unit/test_url_generator.py \
               tests/unit/test_site_provisioning.py \
               tests/unit/test_site_assignment.py \
               tests/unit/test_job_config_extensions.py \
               tests/integration/test_story_3_1_integration.py \
               -v

Run with Coverage

uv run pytest --cov=src --cov-report=html

System Information

Show Configuration

uv run python main.py config

Health Check

uv run python main.py health

List Available Models

uv run python main.py models

Directory Structure

Big-Link-Man/
├── main.py                 # CLI entry point
├── src/                    # Source code
│   ├── api/               # FastAPI endpoints
│   ├── auth/              # Authentication
│   ├── cli/               # CLI commands
│   ├── core/              # Configuration
│   ├── database/          # Models, repositories
│   ├── deployment/        # Cloud deployment
│   ├── generation/        # Content generation
│   ├── ingestion/         # CORA parsing
│   ├── interlinking/      # Link injection
│   └── templating/        # HTML templates
├── scripts/               # Database & utility scripts
├── tests/                 # Test suite
│   ├── unit/
│   └── integration/
├── jobs/                  # Job configuration files
├── docs/                  # Documentation
└── deployment_logs/       # Deployed URL logs

Job Configuration Format

Example job config (jobs/example.json):

{
  "job_name": "Multi-Tier Launch",
  "project_id": 1,
  "description": "Site build with 165 articles",
  "models": {
    "title": "openai/gpt-4o-mini",
    "outline": "anthropic/claude-3.5-sonnet",
    "content": "anthropic/claude-3.5-sonnet"
  },
  "tiers": [
    {
      "tier": 1,
      "article_count": 15,
      "validation_attempts": 3
    },
    {
      "tier": 2,
      "article_count": 50,
      "validation_attempts": 2
    }
  ],
  "failure_config": {
    "max_consecutive_failures": 10,
    "skip_on_failure": true
  },
  "interlinking": {
    "links_per_article_min": 2,
    "links_per_article_max": 4,
    "include_home_link": true
  },
  "deployment_targets": ["www.primary.com"],
  "tier1_preferred_sites": ["www.premium.com"],
  "auto_create_sites": true
}

Per-Stage Model Configuration

You can specify different AI models for each generation stage (title, outline, content):

{
  "models": {
    "title": "openai/gpt-4o-mini",
    "outline": "anthropic/claude-3.5-sonnet",
    "content": "openai/gpt-4o"
  }
}

Available models:

  • openai/gpt-4o-mini - Fast and cost-effective
  • openai/gpt-4o - Higher quality, more expensive
  • anthropic/claude-3.5-sonnet - Excellent for long-form content

If models is not specified in the job file, all stages use the model from the --model CLI flag (default: gpt-4o-mini).
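The override order described above (per-stage job-file entry, then --model flag, then default) can be sketched as a small resolver; the function name is hypothetical:

```python
DEFAULT_MODEL = "gpt-4o-mini"

def resolve_model(stage, job_config, cli_model=None):
    """Pick the model for a generation stage.

    A per-stage entry in the job file's "models" block wins;
    otherwise the --model CLI flag applies; otherwise the default.
    """
    models = job_config.get("models") or {}
    return models.get(stage) or cli_model or DEFAULT_MODEL
```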

Common Workflows

Initial Setup

uv pip install -r requirements.txt
cp env.example .env
# Edit .env
uv run python scripts/init_db.py
uv run python scripts/create_first_admin.py
uv run python main.py sync-sites --admin-user admin

New Project Workflow

# 1. Ingest CORA report
uv run python main.py ingest-cora \
  --file project.xlsx \
  --name "My Project" \
  --username admin \
  --password adminpass

# 2. Create job config
uv run python create_job_config.py 1 multi 15 50 100

# 3. Generate content (auto-deploys)
uv run python main.py generate-batch \
  --job-file jobs/project_1_multi_3tiers_165articles.json \
  --username admin \
  --password adminpass

# 4. Verify deployment
uv run python main.py verify-deployment --batch-id 1

# 5. Export URLs for link building
uv run python main.py get-links \
  --project-id 1 \
  --tier 1 > tier1_urls.csv

Re-deploy After Changes

uv run python main.py deploy-batch \
  --batch-id 123 \
  --admin-user admin \
  --admin-password adminpass

Troubleshooting

Database locked

# Stop any running processes, then:
uv run python scripts/init_db.py reset

Missing dependencies

uv pip install -r requirements.txt --force-reinstall

AI API errors

Check OPENROUTER_API_KEY in .env

Bunny.net authentication failed

Check BUNNY_ACCOUNT_API_KEY in .env

Storage upload failed

Verify storage_zone_password in database (set during site provisioning)

Documentation

  • CLI Command Reference: docs/CLI_COMMAND_REFERENCE.md - Comprehensive documentation for all CLI commands
  • Product Requirements: docs/prd.md
  • Architecture: docs/architecture/
  • Implementation Summaries: STORY_*.md files
  • Quick Start Guides: *_QUICKSTART.md files

Regenerating CLI Documentation

To regenerate the CLI command reference after adding or modifying commands:

uv run python scripts/generate_cli_docs.py

This will update docs/CLI_COMMAND_REFERENCE.md with all current commands and their options.

License

All rights reserved.