# Big Link Man - Content Automation & Syndication Platform

AI-powered content generation and multi-tier link building system with cloud deployment.

## Quick Start

```bash
# Install dependencies
uv pip install -r requirements.txt

# Setup environment
cp env.example .env
# Edit .env with your credentials

# Initialize database
uv run python scripts/init_db.py

# Create first admin user
uv run python scripts/create_first_admin.py

# Run CLI
uv run python main.py --help
```

## Environment Configuration

Required environment variables in `.env`:

```bash
DATABASE_URL=sqlite:///./content_automation.db
OPENROUTER_API_KEY=your_key_here
BUNNY_ACCOUNT_API_KEY=your_bunny_key_here
```

See `env.example` for full configuration options.

## Database Management

### Initialize Database

```bash
uv run python scripts/init_db.py
```

### Reset Database (drops all data)

```bash
uv run python scripts/init_db.py reset
```

### Create First Admin

```bash
uv run python scripts/create_first_admin.py
```

### Database Migrations

```bash
# Story 3.1 - Site deployments
uv run python scripts/migrate_story_3.1_sqlite.py

# Story 3.2 - Anchor text
uv run python scripts/migrate_add_anchor_text.py

# Story 3.3 - Template fields
uv run python scripts/migrate_add_template_fields.py

# Story 3.4 - Site pages
uv run python scripts/migrate_add_site_pages.py

# Story 4.1 - Deployment fields
uv run python scripts/migrate_add_deployment_fields.py

# Backfill site pages after migration
uv run python scripts/backfill_site_pages.py
```

## User Management

### Add User

```bash
uv run python main.py add-user \
  --username newuser \
  --password password123 \
  --role Admin \
  --admin-user admin \
  --admin-password adminpass
```

### List Users

```bash
uv run python main.py list-users \
  --admin-user admin \
  --admin-password adminpass
```

### Delete User

```bash
uv run python main.py delete-user \
  --username olduser \
  --admin-user admin \
  --admin-password adminpass
```

## Site Management

### Provision New Site

```bash
uv run python main.py provision-site \
  --name "My Site" \
  --domain www.example.com \
  --storage-name my-storage-zone \
  --region DE \
  --admin-user admin \
  --admin-password adminpass
```

Regions: `DE`, `NY`, `LA`, `SG`, `SYD`

### Attach Domain to Existing Storage

```bash
uv run python main.py attach-domain \
  --name "Another Site" \
  --domain www.another.com \
  --storage-name my-storage-zone \
  --admin-user admin \
  --admin-password adminpass
```

### Sync Existing Bunny.net Sites

```bash
# Dry run
uv run python main.py sync-sites \
  --admin-user admin \
  --dry-run

# Actually import
uv run python main.py sync-sites \
  --admin-user admin
```

### List Sites

```bash
uv run python main.py list-sites \
  --admin-user admin \
  --admin-password adminpass
```

### Get Site Details

```bash
uv run python main.py get-site \
  --domain www.example.com \
  --admin-user admin \
  --admin-password adminpass
```

### Remove Site

```bash
uv run python main.py remove-site \
  --domain www.example.com \
  --admin-user admin \
  --admin-password adminpass
```

## S3 Bucket Management

The platform supports AWS S3 buckets as storage providers alongside Bunny.net. S3 buckets can be discovered, registered, and managed through the system.
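Conceptually, bucket discovery boils down to set filtering: register every bucket found in the AWS account that is neither already in the database nor on the exclusion list. A minimal sketch of that logic, assuming plain lists of bucket-name strings (function names here are illustrative, not the actual discovery script's API):

```python
def load_exclusions(text):
    """Parse exclusion-list content: one bucket name per line;
    blank lines and lines starting with '#' are ignored."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def buckets_to_import(discovered, registered, excluded):
    """Keep discovered buckets that are neither already registered
    nor on the exclusion list, preserving discovery order."""
    skip = set(registered) | set(excluded)
    return [b for b in discovered if b not in skip]


exclusions = load_exclusions(
    "# S3 Bucket Exclusion List\n"
    "theteacher.best\n"
    "\n"
    "airconditionerfixer.com\n"
)
new = buckets_to_import(
    discovered=["bucket-a", "theteacher.best", "bucket-b"],
    registered=["bucket-b"],
    excluded=exclusions,
)
# new contains only "bucket-a": the excluded and already-registered
# buckets are skipped.
```

The real script does the same filtering against the database and `s3_bucket_exclusions.txt` before registering anything.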
### Prerequisites

Set AWS credentials in `.env`:

```bash
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1  # Optional, defaults to us-east-1
```

### Discover and Register S3 Buckets

**Interactive Mode** (select buckets manually):

```bash
uv run python main.py discover-s3-buckets
```

Or run the script directly:

```bash
uv run python scripts/discover_s3_buckets.py
```

**Auto-Import Mode** (import all unregistered buckets automatically):

```bash
uv run python scripts/discover_s3_buckets.py --auto-import-all
```

Auto-import mode will:

- Discover all S3 buckets in your AWS account
- Skip buckets already registered in the database
- Skip buckets in the exclusion list
- Register remaining buckets as bucket-only sites (no custom domain)

### Bucket Exclusion List

To prevent certain buckets from being auto-imported (e.g., buckets manually added with FQDNs), add them to `s3_bucket_exclusions.txt`:

```
# S3 Bucket Exclusion List
# One bucket name per line (comments start with #)
learningeducationtech.com
theteacher.best
airconditionerfixer.com
```

The discovery script automatically loads and respects this exclusion list. Excluded buckets are marked as `[EXCLUDED]` in the display and are skipped during both interactive and auto-import operations.

### List S3 Sites with FQDNs

To see which S3 buckets have custom domains (and should be excluded):

```bash
uv run python scripts/list_s3_fqdn_sites.py
```

This script lists all S3 sites with `s3_custom_domain` set and outputs bucket names that should be added to the exclusion list.

### S3 Site Types

S3 sites can be registered in two ways:

1. **Bucket-only sites**: No custom domain, accessed via S3 website endpoint
   - Created via auto-import or interactive discovery
   - Uses bucket name as site identifier
   - URL format: `https://bucket-name.s3.region.amazonaws.com/`
2. **FQDN sites**: Manually added with custom domains
   - Created manually with `s3_custom_domain` set
   - Should be added to exclusion list to prevent re-import
   - URL format: `https://custom-domain.com/`

### S3 Storage Features

- **Multi-region support**: Automatically detects bucket region
- **Public read access**: Buckets configured for public read-only access
- **Bucket policy**: Applied automatically for public read access
- **Region mapping**: AWS regions mapped to short codes (US, EU, SG, etc.)
- **Duplicate prevention**: Checks existing registrations before importing

### Helper Scripts

**List S3 FQDN sites**:

```bash
uv run python scripts/list_s3_fqdn_sites.py
```

**Delete sites by ID**:

```bash
# Edit scripts/delete_sites.py to set site_ids, then:
uv run python scripts/delete_sites.py
```

**Check sites around specific IDs**:

```bash
# Edit scripts/list_sites_by_id.py to set target_ids, then:
uv run python scripts/list_sites_by_id.py
```

## Project Management

### Ingest CORA Report

```bash
uv run python main.py ingest-cora \
  --file shaft_machining.xlsx \
  --name "Shaft Machining Project" \
  --custom-anchors "shaft repair,engine parts" \
  --username admin \
  --password adminpass
```

### List Projects

```bash
uv run python main.py list-projects \
  --username admin \
  --password adminpass
```

## Content Generation

### Create Job Configuration

```bash
# Tier 1 only
uv run python create_job_config.py 1 tier1 15

# Multi-tier
uv run python create_job_config.py 1 multi 15 50 100
```

### Generate Content Batch

```bash
uv run python main.py generate-batch \
  --job-file jobs/project_1_tier1_15articles.json \
  --username admin \
  --password adminpass
```

With options:

```bash
uv run python main.py generate-batch \
  --job-file jobs/my_job.json \
  --username admin \
  --password adminpass \
  --debug \
  --continue-on-error \
  --model gpt-4o-mini
```

Available models: `gpt-4o-mini`, `claude-sonnet-4.5`, or any model available on OpenRouter.

**Note:** If your job file contains a `models` config, it will override the `--model` flag and use different models for the title, outline, and content generation stages.

## Deployment

### Deploy Batch

```bash
# Automatic deployment (runs after generation)
uv run python main.py generate-batch \
  --job-file jobs/my_job.json \
  --username admin \
  --password adminpass

# Manual deployment
uv run python main.py deploy-batch \
  --batch-id 123 \
  --admin-user admin \
  --admin-password adminpass
```

### Dry Run Deployment

```bash
uv run python main.py deploy-batch \
  --batch-id 123 \
  --dry-run
```

### Verify Deployment

```bash
# Check all URLs
uv run python main.py verify-deployment --batch-id 123

# Check a random sample
uv run python main.py verify-deployment \
  --batch-id 123 \
  --sample 10 \
  --timeout 10
```

## Link Export

### Export Article URLs

```bash
# Tier 1 only
uv run python main.py get-links \
  --project-id 123 \
  --tier 1

# Tier 2 and above
uv run python main.py get-links \
  --project-id 123 \
  --tier 2+

# With anchor text and destinations
uv run python main.py get-links \
  --project-id 123 \
  --tier 2+ \
  --with-anchor-text \
  --with-destination-url
```

Output is CSV format to stdout.
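Because the export is plain CSV, downstream tooling can consume it directly with Python's `csv` module. A minimal sketch — note the column names (`url`, `anchor_text`, `destination_url`) are assumptions for illustration; check the actual header row your CLI version emits:

```python
import csv
import io

# Assumed shape of the get-links export; the real header row may differ.
sample = io.StringIO(
    "url,anchor_text,destination_url\n"
    "https://example.com/post-1.html,shaft repair,https://target.com/\n"
    "https://example.com/post-2.html,engine parts,https://target.com/\n"
)

# DictReader keys each row by the header names, so downstream code
# survives column reordering.
rows = list(csv.DictReader(sample))
urls = [row["url"] for row in rows]
```

The same pattern works on a saved file by swapping `io.StringIO` for `open("tier1_urls.csv", newline="")`.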
Redirect to save:

```bash
uv run python main.py get-links \
  --project-id 123 \
  --tier 1 > tier1_urls.csv
```

## Utility Scripts

### Add robots.txt to All Buckets

Add a standardized robots.txt file to all storage buckets (both S3 and Bunny) that blocks SEO tools and bad bots while allowing legitimate search engines and AI crawlers:

```bash
# Preview what would be done (recommended first)
uv run python scripts/add_robots_txt_to_buckets.py --dry-run

# Upload to all buckets
uv run python scripts/add_robots_txt_to_buckets.py

# Only process S3 buckets
uv run python scripts/add_robots_txt_to_buckets.py --provider s3

# Only process Bunny storage zones
uv run python scripts/add_robots_txt_to_buckets.py --provider bunny
```

**robots.txt behavior:**

- Allows: Google, Bing, Yahoo, DuckDuckGo, Baidu, Yandex
- Allows: GPTBot, Claude, Common Crawl, Perplexity, ByteDance AI
- Blocks: Ahrefs, Semrush, Moz, and other SEO tools
- Blocks: HTTrack, Wget, and other scrapers/bad bots

The script is idempotent (safe to run multiple times) and will overwrite existing robots.txt files. It continues processing remaining buckets if one fails and reports all failures at the end.

### Update Index Pages and Sitemaps

Automatically generate or update `index.html` and `sitemap.xml` files for all storage buckets (both S3 and Bunny). The script:

- Lists all HTML files in each bucket's root directory
- Extracts titles from `