# Big Link Man - Content Automation & Syndication Platform

AI-powered content generation and multi-tier link building system with cloud deployment.

## Quick Start

```bash
# Install dependencies
uv pip install -r requirements.txt

# Setup environment
cp env.example .env
# Edit .env with your credentials

# Initialize database
uv run python scripts/init_db.py

# Create first admin user
uv run python scripts/create_first_admin.py

# Run CLI
uv run python main.py --help
```

## Environment Configuration

Required environment variables in `.env`:

```bash
DATABASE_URL=sqlite:///./content_automation.db
OPENROUTER_API_KEY=your_key_here
BUNNY_ACCOUNT_API_KEY=your_bunny_key_here
```

See `env.example` for full configuration options.

## Database Management

### Initialize Database

```bash
uv run python scripts/init_db.py
```

### Reset Database (drops all data)

```bash
uv run python scripts/init_db.py reset
```

### Create First Admin

```bash
uv run python scripts/create_first_admin.py
```

### Database Migrations

```bash
# Story 3.1 - Site deployments
uv run python scripts/migrate_story_3.1_sqlite.py

# Story 3.2 - Anchor text
uv run python scripts/migrate_add_anchor_text.py

# Story 3.3 - Template fields
uv run python scripts/migrate_add_template_fields.py

# Story 3.4 - Site pages
uv run python scripts/migrate_add_site_pages.py

# Story 4.1 - Deployment fields
uv run python scripts/migrate_add_deployment_fields.py

# Backfill site pages after migration
uv run python scripts/backfill_site_pages.py
```

## User Management

### Add User

```bash
uv run python main.py add-user \
  --username newuser \
  --password password123 \
  --role Admin \
  --admin-user admin \
  --admin-password adminpass
```

### List Users

```bash
uv run python main.py list-users \
  --admin-user admin \
  --admin-password adminpass
```

### Delete User

```bash
uv run python main.py delete-user \
  --username olduser \
  --admin-user admin \
  --admin-password adminpass
```

## Site Management

### Provision New Site

```bash
uv run python main.py provision-site \
  --name "My Site" \
  --domain www.example.com \
  --storage-name my-storage-zone \
  --region DE \
  --admin-user admin \
  --admin-password adminpass
```

Regions: `DE`, `NY`, `LA`, `SG`, `SYD`

### Attach Domain to Existing Storage

```bash
uv run python main.py attach-domain \
  --name "Another Site" \
  --domain www.another.com \
  --storage-name my-storage-zone \
  --admin-user admin \
  --admin-password adminpass
```

### Sync Existing Bunny.net Sites

```bash
# Dry run
uv run python main.py sync-sites \
  --admin-user admin \
  --dry-run

# Actually import
uv run python main.py sync-sites \
  --admin-user admin
```

### List Sites

```bash
uv run python main.py list-sites \
  --admin-user admin \
  --admin-password adminpass
```

### Get Site Details

```bash
uv run python main.py get-site \
  --domain www.example.com \
  --admin-user admin \
  --admin-password adminpass
```

### Remove Site

```bash
uv run python main.py remove-site \
  --domain www.example.com \
  --admin-user admin \
  --admin-password adminpass
```

## S3 Bucket Management

The platform supports AWS S3 buckets as storage providers alongside Bunny.net. S3 buckets can be discovered, registered, and managed through the system.
### Prerequisites

Set AWS credentials in `.env`:

```bash
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1  # Optional, defaults to us-east-1
```

### Discover and Register S3 Buckets

**Interactive Mode** (select buckets manually):

```bash
uv run python main.py discover-s3-buckets
```

Or run the script directly:

```bash
uv run python scripts/discover_s3_buckets.py
```

**Auto-Import Mode** (import all unregistered buckets automatically):

```bash
uv run python scripts/discover_s3_buckets.py --auto-import-all
```

Auto-import mode will:

- Discover all S3 buckets in your AWS account
- Skip buckets already registered in the database
- Skip buckets in the exclusion list
- Register remaining buckets as bucket-only sites (no custom domain)

### Bucket Exclusion List

To prevent certain buckets from being auto-imported (e.g., buckets manually added with FQDNs), add them to `s3_bucket_exclusions.txt`:

```
# S3 Bucket Exclusion List
# One bucket name per line (comments start with #)
learningeducationtech.com
theteacher.best
airconditionerfixer.com
```

The discovery script automatically loads and respects this exclusion list. Excluded buckets are marked as `[EXCLUDED]` in the display and are skipped during both interactive and auto-import operations.

### List S3 Sites with FQDNs

To see which S3 buckets have custom domains (and should be excluded):

```bash
uv run python scripts/list_s3_fqdn_sites.py
```

This script lists all S3 sites with `s3_custom_domain` set and outputs bucket names that should be added to the exclusion list.

### S3 Site Types

S3 sites can be registered in two ways:

1. **Bucket-only sites**: No custom domain, accessed via the S3 website endpoint
   - Created via auto-import or interactive discovery
   - Uses the bucket name as the site identifier
   - URL format: `https://bucket-name.s3.region.amazonaws.com/`
2. **FQDN sites**: Manually added with custom domains
   - Created manually with `s3_custom_domain` set
   - Should be added to the exclusion list to prevent re-import
   - URL format: `https://custom-domain.com/`

### S3 Storage Features

- **Multi-region support**: Automatically detects the bucket region
- **Public read access**: Buckets are configured for public read-only access
- **Bucket policy**: Applied automatically for public read access
- **Region mapping**: AWS regions are mapped to short codes (US, EU, SG, etc.)
- **Duplicate prevention**: Checks existing registrations before importing

### Helper Scripts

**List S3 FQDN sites**:

```bash
uv run python scripts/list_s3_fqdn_sites.py
```

**Delete sites by ID**:

```bash
# Edit scripts/delete_sites.py to set site_ids, then:
uv run python scripts/delete_sites.py
```

**Check sites around specific IDs**:

```bash
# Edit scripts/list_sites_by_id.py to set target_ids, then:
uv run python scripts/list_sites_by_id.py
```

## Project Management

### Ingest CORA Report

```bash
uv run python main.py ingest-cora \
  --file shaft_machining.xlsx \
  --name "Shaft Machining Project" \
  --custom-anchors "shaft repair,engine parts" \
  --username admin \
  --password adminpass
```

### List Projects

```bash
uv run python main.py list-projects \
  --username admin \
  --password adminpass
```

## Content Generation

### Create Job Configuration

```bash
# Tier 1 only
uv run python create_job_config.py 1 tier1 15

# Multi-tier
uv run python create_job_config.py 1 multi 15 50 100
```

### Generate Content Batch

```bash
uv run python main.py generate-batch \
  --job-file jobs/project_1_tier1_15articles.json \
  --username admin \
  --password adminpass
```

With options:

```bash
uv run python main.py generate-batch \
  --job-file jobs/my_job.json \
  --username admin \
  --password adminpass \
  --debug \
  --continue-on-error \
  --model gpt-4o-mini
```

Available models: `gpt-4o-mini`, `claude-sonnet-4.5`, or any model available on OpenRouter.

**Note:** If your job file contains a `models` config, it will override the `--model` flag and use different models for the title, outline, and content generation stages.

## Deployment

### Deploy Batch

```bash
# Automatic deployment (runs after generation)
uv run python main.py generate-batch \
  --job-file jobs/my_job.json \
  --username admin \
  --password adminpass

# Manual deployment
uv run python main.py deploy-batch \
  --batch-id 123 \
  --admin-user admin \
  --admin-password adminpass
```

### Dry Run Deployment

```bash
uv run python main.py deploy-batch \
  --batch-id 123 \
  --dry-run
```

### Verify Deployment

```bash
# Check all URLs
uv run python main.py verify-deployment --batch-id 123

# Check a random sample
uv run python main.py verify-deployment \
  --batch-id 123 \
  --sample 10 \
  --timeout 10
```

## Link Export

### Export Article URLs

```bash
# Tier 1 only
uv run python main.py get-links \
  --project-id 123 \
  --tier 1

# Tier 2 and above
uv run python main.py get-links \
  --project-id 123 \
  --tier 2+

# With anchor text and destinations
uv run python main.py get-links \
  --project-id 123 \
  --tier 2+ \
  --with-anchor-text \
  --with-destination-url
```

Output is CSV format to stdout.
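As an illustration only (the exact column names and ordering are defined by the tool, not guaranteed here), a run with `--with-anchor-text` and `--with-destination-url` produces rows along these lines:

```
url,anchor_text,destination_url
https://www.example.com/articles/shaft-repair.html,shaft repair,https://www.primary.com/
https://www.example.com/articles/engine-parts.html,engine parts,https://www.primary.com/
```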
Redirect to save:

```bash
uv run python main.py get-links \
  --project-id 123 \
  --tier 1 > tier1_urls.csv
```

## Utility Scripts

### Add robots.txt to All Buckets

Add a standardized robots.txt file to all storage buckets (both S3 and Bunny) that blocks SEO tools and bad bots while allowing legitimate search engines and AI crawlers:

```bash
# Preview what would be done (recommended first)
uv run python scripts/add_robots_txt_to_buckets.py --dry-run

# Upload to all buckets
uv run python scripts/add_robots_txt_to_buckets.py

# Only process S3 buckets
uv run python scripts/add_robots_txt_to_buckets.py --provider s3

# Only process Bunny storage zones
uv run python scripts/add_robots_txt_to_buckets.py --provider bunny
```

**robots.txt behavior:**

- Allows: Google, Bing, Yahoo, DuckDuckGo, Baidu, Yandex
- Allows: GPTBot, Claude, Common Crawl, Perplexity, ByteDance AI
- Blocks: Ahrefs, Semrush, Moz, and other SEO tools
- Blocks: HTTrack, Wget, and other scrapers/bad bots

The script is idempotent (safe to run multiple times) and will overwrite existing robots.txt files. It continues processing remaining buckets if one fails and reports all failures at the end.
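The allow/block mechanics rely on standard robots.txt semantics: an empty `Disallow:` permits everything for that agent, while `Disallow: /` blocks the whole site. A truncated illustration of the shape (agent names shown are examples; see the script for the actual list it uploads):

```
# Search engines: allowed
User-agent: Googlebot
Disallow:

# SEO tools: blocked
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /
```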
### Check Last Generated Content

```bash
uv run python check_last_gen.py
```

### List All Users (Direct DB Access)

```bash
uv run python scripts/list_users.py
```

### Add Admin (Direct DB Access)

```bash
uv run python scripts/add_admin_direct.py
```

### Check Migration Status

```bash
uv run python scripts/check_migration.py
```

### Add Tier to Projects

```bash
uv run python scripts/add_tier_to_projects.py
```

## Testing

### Run All Tests

```bash
uv run pytest
```

### Run Unit Tests

```bash
uv run pytest tests/unit/ -v
```

### Run Integration Tests

```bash
uv run pytest tests/integration/ -v
```

### Run Specific Test File

```bash
uv run pytest tests/unit/test_url_generator.py -v
```

### Run Story 3.1 Tests

```bash
uv run pytest tests/unit/test_url_generator.py \
  tests/unit/test_site_provisioning.py \
  tests/unit/test_site_assignment.py \
  tests/unit/test_job_config_extensions.py \
  tests/integration/test_story_3_1_integration.py \
  -v
```

### Run with Coverage

```bash
uv run pytest --cov=src --cov-report=html
```

## System Information

### Show Configuration

```bash
uv run python main.py config
```

### Health Check

```bash
uv run python main.py health
```

### List Available Models

```bash
uv run python main.py models
```

## Directory Structure

```
Big-Link-Man/
├── main.py              # CLI entry point
├── src/                 # Source code
│   ├── api/             # FastAPI endpoints
│   ├── auth/            # Authentication
│   ├── cli/             # CLI commands
│   ├── core/            # Configuration
│   ├── database/        # Models, repositories
│   ├── deployment/      # Cloud deployment
│   ├── generation/      # Content generation
│   ├── ingestion/       # CORA parsing
│   ├── interlinking/    # Link injection
│   └── templating/      # HTML templates
├── scripts/             # Database & utility scripts
├── tests/               # Test suite
│   ├── unit/
│   └── integration/
├── jobs/                # Job configuration files
├── docs/                # Documentation
└── deployment_logs/     # Deployed URL logs
```

## Job Configuration Format

Example job config (`jobs/example.json`):

```json
{
  "job_name": "Multi-Tier Launch",
  "project_id": 1,
  "description": "Site build with 165 articles",
  "models": {
    "title": "openai/gpt-4o-mini",
    "outline": "anthropic/claude-3.5-sonnet",
    "content": "anthropic/claude-3.5-sonnet"
  },
  "tiers": [
    { "tier": 1, "article_count": 15, "validation_attempts": 3 },
    { "tier": 2, "article_count": 50, "validation_attempts": 2 }
  ],
  "failure_config": {
    "max_consecutive_failures": 10,
    "skip_on_failure": true
  },
  "interlinking": {
    "links_per_article_min": 2,
    "links_per_article_max": 4,
    "include_home_link": true
  },
  "deployment_targets": ["www.primary.com"],
  "tier1_preferred_sites": ["www.premium.com"],
  "auto_create_sites": true
}
```

### Per-Stage Model Configuration

You can specify different AI models for each generation stage (title, outline, content):

```json
{
  "models": {
    "title": "openai/gpt-4o-mini",
    "outline": "anthropic/claude-3.5-sonnet",
    "content": "openai/gpt-4o"
  }
}
```

**Available models:**

- `openai/gpt-4o-mini` - Fast and cost-effective
- `openai/gpt-4o` - Higher quality, more expensive
- `anthropic/claude-3.5-sonnet` - Excellent for long-form content

If `models` is not specified in the job file, all stages use the model from the `--model` CLI flag (default: `gpt-4o-mini`).

## Common Workflows

### Initial Setup

```bash
uv pip install -r requirements.txt
cp env.example .env  # Edit .env
uv run python scripts/init_db.py
uv run python scripts/create_first_admin.py
uv run python main.py sync-sites --admin-user admin
```

### New Project Workflow

```bash
# 1. Ingest CORA report
uv run python main.py ingest-cora \
  --file project.xlsx \
  --name "My Project" \
  --username admin \
  --password adminpass

# 2. Create job config
uv run python create_job_config.py 1 multi 15 50 100

# 3. Generate content (auto-deploys)
uv run python main.py generate-batch \
  --job-file jobs/project_1_multi_3tiers_165articles.json \
  --username admin \
  --password adminpass

# 4. Verify deployment
uv run python main.py verify-deployment --batch-id 1

# 5. Export URLs for link building
uv run python main.py get-links \
  --project-id 1 \
  --tier 1 > tier1_urls.csv
```

### Re-deploy After Changes

```bash
uv run python main.py deploy-batch \
  --batch-id 123 \
  --admin-user admin \
  --admin-password adminpass
```

## Troubleshooting

### Database locked

```bash
# Stop any running processes, then:
uv run python scripts/init_db.py reset
```

### Missing dependencies

```bash
uv pip install -r requirements.txt --force-reinstall
```

### AI API errors

Check `OPENROUTER_API_KEY` in `.env`.

### Bunny.net authentication failed

Check `BUNNY_ACCOUNT_API_KEY` in `.env`.

### Storage upload failed

Verify `storage_zone_password` in the database (set during site provisioning).

## Documentation

- **CLI Command Reference**: `docs/CLI_COMMAND_REFERENCE.md` - Comprehensive documentation for all CLI commands
- Product Requirements: `docs/prd.md`
- Architecture: `docs/architecture/`
- Implementation Summaries: `STORY_*.md` files
- Quick Start Guides: `*_QUICKSTART.md` files

### Regenerating CLI Documentation

To regenerate the CLI command reference after adding or modifying commands:

```bash
uv run python scripts/generate_cli_docs.py
```

This will update `docs/CLI_COMMAND_REFERENCE.md` with all current commands and their options.

## License

All rights reserved.