# Job Configuration Schema

This document defines the complete schema for job configuration files used in the Big-Link-Man content automation platform. All job files use JSON format and define batch content generation parameters.

## Root Structure

```jsonc
{
  "jobs": [
    {
      // Job object (see Job Object section below)
    }
  ]
}
```

### Root Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `jobs` | `Array` | Yes | Array of job definitions to process |

## Job Object

Each job object defines a complete content generation batch for a specific project.

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `project_id` | `integer` | The project ID to generate content for |
| `tiers` | `Object` | Dictionary of tier configurations (see Tier Configuration section) |

### Optional Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `models` | `Object` | Uses CLI default | AI models to use for each generation stage (title, outline, content) |
| `deployment_targets` | `Array` | `null` | Array of site `custom_hostname` values for tier1 deployment assignment (Story 2.5) |
| `tier1_preferred_sites` | `Array` | `null` | Array of hostnames for tier1 site assignment priority (Story 3.1) |
| `auto_create_sites` | `boolean` | `false` | Whether to auto-create sites when the site pool is insufficient (Story 3.1) |
| `create_sites_for_keywords` | `Array` | `null` | Array of keyword site creation configs (Story 3.1) |
| `tiered_link_count_range` | `Object` | `null` (falls back to `{"min": 2, "max": 4}`) | Configuration for tiered link counts (Story 3.2) |

## Tier Configuration

Each tier in the `tiers` object defines content generation parameters for that specific tier level.
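The way a tier entry combines with its defaults can be sketched in Python. This is a minimal illustration under assumptions: the helper name `resolve_tier`, the use of `ValueError`, and the rule that explicit values override defaults key by key are not taken from the platform's code. The default values are copied from the Tier Defaults section of this schema, and the checks mirror the tier-level validation rules.

```python
from typing import Any

# Per-tier defaults, copied from the "Tier Defaults" section of this schema.
TIER_DEFAULTS: dict[str, dict[str, int]] = {
    "tier1": {"min_word_count": 2000, "max_word_count": 2500,
              "min_h2_tags": 3, "max_h2_tags": 5,
              "min_h3_tags": 5, "max_h3_tags": 10},
    "tier2": {"min_word_count": 1500, "max_word_count": 2000,
              "min_h2_tags": 2, "max_h2_tags": 4,
              "min_h3_tags": 3, "max_h3_tags": 8},
    "tier3": {"min_word_count": 1000, "max_word_count": 1500,
              "min_h2_tags": 2, "max_h2_tags": 3,
              "min_h3_tags": 2, "max_h3_tags": 6},
}

# (min, max) pairs that must satisfy min <= max after merging.
RANGE_PAIRS = [("min_word_count", "max_word_count"),
               ("min_h2_tags", "max_h2_tags"),
               ("min_h3_tags", "max_h3_tags")]


def resolve_tier(tier_name: str, tier_config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: merge a tier entry with its defaults, then validate."""
    if tier_name not in TIER_DEFAULTS:
        raise ValueError(f"Unknown tier: {tier_name!r}")
    count = tier_config.get("count")
    # `count` is required and must be a positive integer (bools are rejected explicitly).
    if not isinstance(count, int) or isinstance(count, bool) or count < 1:
        raise ValueError(f"{tier_name}: 'count' must be a positive integer")
    # Values given in the job file win over the tier defaults, key by key.
    resolved = {**TIER_DEFAULTS[tier_name], **tier_config}
    for low, high in RANGE_PAIRS:
        if resolved[low] > resolved[high]:
            raise ValueError(f"{tier_name}: '{low}' must be <= '{high}'")
    return resolved
```

In this sketch, overriding only one side of a pair can fail validation: for example, `"min_word_count": 3000` on tier1 without a matching `max_word_count` conflicts with the default maximum of 2500.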
### Tier Keys

- `tier1` - Premium content (highest quality)
- `tier2` - Standard content (medium quality)
- `tier3` - Supporting content (basic quality)

### Tier Fields

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `count` | `integer` | Yes | - | Number of articles to generate for this tier |
| `min_word_count` | `integer` | No | See Tier Defaults | Minimum word count for articles |
| `max_word_count` | `integer` | No | See Tier Defaults | Maximum word count for articles |
| `min_h2_tags` | `integer` | No | See Tier Defaults | Minimum number of H2 headings |
| `max_h2_tags` | `integer` | No | See Tier Defaults | Maximum number of H2 headings |
| `min_h3_tags` | `integer` | No | See Tier Defaults | Minimum number of H3 subheadings |
| `max_h3_tags` | `integer` | No | See Tier Defaults | Maximum number of H3 subheadings |

### Tier Defaults

#### Tier 1 Defaults

```json
{
  "min_word_count": 2000,
  "max_word_count": 2500,
  "min_h2_tags": 3,
  "max_h2_tags": 5,
  "min_h3_tags": 5,
  "max_h3_tags": 10
}
```

#### Tier 2 Defaults

```json
{
  "min_word_count": 1500,
  "max_word_count": 2000,
  "min_h2_tags": 2,
  "max_h2_tags": 4,
  "min_h3_tags": 3,
  "max_h3_tags": 8
}
```

#### Tier 3 Defaults

```json
{
  "min_word_count": 1000,
  "max_word_count": 1500,
  "min_h2_tags": 2,
  "max_h2_tags": 3,
  "min_h3_tags": 2,
  "max_h3_tags": 6
}
```

## Deployment Target Assignment (Story 2.5)

### `deployment_targets`

- **Type**: `Array` (optional)
- **Purpose**: Assigns tier1 articles to specific sites in list order, one article per target
- **Behavior**:
  - Only affects tier1 articles
  - Articles 0 through N-1 are assigned to the N deployment targets, in order
  - Articles N and beyond get `site_deployment_id = null`
  - If not specified, all articles get `site_deployment_id = null`

### Example

```json
{
  "deployment_targets": [
    "www.domain1.com",
    "www.domain2.com",
    "www.domain3.com"
  ]
}
```

**Assignment Result:**

- Article 0 → www.domain1.com
- Article 1 → www.domain2.com
- Article 2 → www.domain3.com
- Articles 3+ → null (no assignment)

## Site Assignment (Story 3.1)

### `tier1_preferred_sites`

- **Type**: `Array` (optional)
- **Purpose**: Preferred sites for tier1 article assignment
- **Behavior**: Used in priority order before random selection
- **Validation**: All hostnames must exist in the database

### `auto_create_sites`

- **Type**: `boolean` (optional, default: `false`)
- **Purpose**: Auto-create sites when the available site pool is insufficient
- **Behavior**: Creates generic sites using the project keyword as a prefix

### `create_sites_for_keywords`

- **Type**: `Array` (optional)
- **Purpose**: Pre-create sites for specific keywords before assignment
- **Structure**: Each object must have `keyword` (string) and `count` (integer)

#### Keyword Site Creation Object

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `keyword` | `string` | Yes | Keyword to create sites for |
| `count` | `integer` | Yes | Number of sites to create for this keyword |

### Example

```json
{
  "tier1_preferred_sites": [
    "www.premium-site1.com",
    "site123.b-cdn.net"
  ],
  "auto_create_sites": true,
  "create_sites_for_keywords": [
    { "keyword": "engine repair", "count": 3 },
    { "keyword": "car maintenance", "count": 2 }
  ]
}
```

## AI Model Configuration

### `models`

- **Type**: `Object` (optional)
- **Purpose**: Specifies the AI model to use for each generation stage
- **Behavior**: Allows different models for title, outline, and content generation
- **Note**: If not specified, all stages use the model from the CLI `--model` flag (default: `gpt-4o-mini`)

#### Models Object Fields

| Field | Type | Description |
|-------|------|-------------|
| `title` | `string` | Model to use for title generation |
| `outline` | `string` | Model to use for outline generation |
| `content` | `string` | Model to use for content generation |

### Available Models (from `master.config.json`)

- `anthropic/claude-sonnet-4.5` (Claude Sonnet 4.5)
- `anthropic/claude-3.5-sonnet` (Claude 3.5 Sonnet)
- `openai/gpt-4o` (GPT-4o)
- `openai/gpt-4o-mini` (GPT-4o mini)
- `meta-llama/llama-3.1-70b-instruct` (Llama 3.1 70B)
- `meta-llama/llama-3.1-8b-instruct` (Llama 3.1 8B)
- `google/gemini-2.5-flash` (Gemini 2.5 Flash)

### Example

```json
{
  "models": {
    "title": "openai/gpt-4o-mini",
    "outline": "openai/gpt-4o",
    "content": "anthropic/claude-3.5-sonnet"
  }
}
```

### Implementation Status

**Implemented** - The `models` field is fully functional. Different models can be specified for the title, outline, and content generation stages. If a job file contains a `models` configuration and you also pass the `--model` CLI flag, the system warns that the CLI flag is being ignored in favor of the job config.

## Tiered Link Configuration (Story 3.2)

### `tiered_link_count_range`

- **Type**: `Object` (optional)
- **Purpose**: Configures how many tiered links to generate per article
- **Default**: `{"min": 2, "max": 4}` if not specified

#### Tiered Link Range Object

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `min` | `integer` | Yes | Minimum number of tiered links (must be >= 1) |
| `max` | `integer` | Yes | Maximum number of tiered links (must be >= `min`) |

### Example

```json
{
  "tiered_link_count_range": {
    "min": 3,
    "max": 5
  }
}
```

## Complete Example

```json
{
  "jobs": [
    {
      "project_id": 1,
      "models": {
        "title": "anthropic/claude-3.5-sonnet",
        "outline": "anthropic/claude-3.5-sonnet",
        "content": "openai/gpt-4o"
      },
      "deployment_targets": [
        "www.primary-domain.com",
        "www.secondary-domain.com"
      ],
      "tier1_preferred_sites": [
        "www.premium-site1.com",
        "site123.b-cdn.net"
      ],
      "auto_create_sites": true,
      "create_sites_for_keywords": [
        { "keyword": "engine repair", "count": 3 },
        { "keyword": "car maintenance", "count": 2 }
      ],
      "tiered_link_count_range": {
        "min": 3,
        "max": 5
      },
      "tiers": {
        "tier1": {
          "count": 10,
          "min_word_count": 2000,
          "max_word_count": 2500,
          "min_h2_tags": 3,
          "max_h2_tags": 5,
          "min_h3_tags": 5,
          "max_h3_tags": 10
        },
        "tier2": {
          "count": 50,
          "min_word_count": 1500,
          "max_word_count": 2000
        },
        "tier3": {
          "count": 100
        }
      }
    }
  ]
}
```

## Validation Rules

### Job Level Validation

- `project_id` must be a positive integer
- `tiers` must be an object with at least one tier
- `models` must be an object with `title`, `outline`, and `content` fields (if specified)
- `deployment_targets` must be an array of strings (if specified)
- `tier1_preferred_sites` must be an array of strings (if specified)
- `auto_create_sites` must be a boolean (if specified)
- `create_sites_for_keywords` must be an array of objects with `keyword` and `count` fields (if specified)
- `tiered_link_count_range` must have `min` >= 1 and `max` >= `min` (if specified)

### Tier Level Validation

- `count` must be a positive integer
- `min_word_count` must be <= `max_word_count`
- `min_h2_tags` must be <= `max_h2_tags`
- `min_h3_tags` must be <= `max_h3_tags`

### Site Assignment Validation

- All hostnames in `deployment_targets` must exist in the database
- All hostnames in `tier1_preferred_sites` must exist in the database
- Keywords in `create_sites_for_keywords` must be non-empty strings
- Count values in `create_sites_for_keywords` must be positive integers

## Usage

### CLI Command

```bash
uv run python main.py generate-batch --job-file jobs/example.json --username admin --password secret
```

### Command Options

- `--job-file, -j`: Path to the job JSON file (required)
- `--username, -u`: Username for authentication
- `--password, -p`: Password for authentication
- `--debug`: Save AI responses to `debug_output/`
- `--continue-on-error`: Continue processing if article generation fails
- `--model, -m`: AI model to use (default: `gpt-4o-mini`). Overridden by the job file's `models` config if present.
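The precedence between the `--model` flag and a job's `models` config can be illustrated with a short Python sketch. It is a hypothetical helper, not the platform's code: the name `resolve_stage_models`, the warning text, and the convention that `cli_model` is `None` when the flag was not passed explicitly are assumptions. The stage names and the `gpt-4o-mini` default come from this schema.

```python
import warnings
from typing import Optional

# Stage names as listed in the Models Object Fields table.
STAGES = ("title", "outline", "content")
# Documented default for the --model CLI flag.
CLI_DEFAULT_MODEL = "gpt-4o-mini"


def resolve_stage_models(job: dict, cli_model: Optional[str] = None) -> dict:
    """Hypothetical helper: choose the model for each generation stage.

    `cli_model` is None when --model was not passed explicitly (an assumption).
    """
    job_models = job.get("models")
    if job_models:
        # The job-level config wins; warn if the CLI flag was also given.
        if cli_model is not None:
            # Hypothetical wording for the documented warning.
            warnings.warn("Job file defines 'models'; ignoring the --model CLI flag")
        # Validation is assumed to have ensured all three stage keys are present.
        return {stage: job_models[stage] for stage in STAGES}
    # No job-level models: every stage uses the CLI model, or its default.
    model = cli_model if cli_model is not None else CLI_DEFAULT_MODEL
    return {stage: model for stage in STAGES}
```

Returning one model per stage keeps callers uniform: the generation code can always look up `models["outline"]`, whether the value came from the job file or from the CLI.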
## Implementation History

### Story 2.2: Basic Content Generation

- Added `project_id` and `tiers` fields
- Added tier configuration with word count and heading constraints
- Added tier defaults for common configurations

### Story 2.3: AI Content Generation

- **Implemented**: Per-stage model selection via the job config `models` field
- **Implemented**: Dynamic model switching in `AIClient` with the `override_model` parameter
- **Implemented**: CLI warning when a job contains `models` and the `--model` flag is also used
- **Behavior**: The job file `models` config takes precedence over the CLI `--model` flag

### Story 2.5: Deployment Target Assignment

- Added `deployment_targets` field for tier1 site assignment
- Implemented round-robin assignment logic
- Added validation for deployment target hostnames

### Story 3.1: URL Generation and Site Assignment

- Added `tier1_preferred_sites` for priority-based assignment
- Added `auto_create_sites` for on-demand site creation
- Added `create_sites_for_keywords` for pre-creation of keyword sites
- Extended site assignment beyond deployment targets

### Story 3.2: Tiered Link Finding

- Added `tiered_link_count_range` for configurable link counts
- Integrated with the tiered link generation system
- Added validation for link count ranges

## Future Extensions

The schema is designed to be extensible for future features:

- **Story 3.3**: Content interlinking injection
- **Story 4.x**: Cloud deployment and handoff
- **Future**: Advanced site matching, cost tracking, analytics

## Error Handling

### Common Validation Errors

- `"Job missing 'project_id'"` - Required field missing
- `"Job missing 'tiers'"` - Required field missing
- `"'deployment_targets' must be an array"` - Wrong data type
- `"Deployment targets not found in database: invalid.com"` - Invalid hostname
- `"'tiered_link_count_range' min must be >= 1"` - Invalid range value

### Graceful Degradation

- Missing optional fields fall back to the defaults documented above
- Invalid hostnames produce clear error messages
- Insufficient sites trigger auto-creation (if `auto_create_sites` is enabled) or a clear error
- With `--continue-on-error`, failed articles are logged and batch processing continues
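The validation rules and the error messages listed above can be connected with a small Python sketch. It is an illustration under assumptions: `validate_job` is a hypothetical name, and `known_hostnames` stands in for the real database lookup. Only the messages quoted under Common Validation Errors come from this document. The two other strings (for empty `tiers` and for `max` < `min`) are placeholder wording for rules the schema states.

```python
from typing import Any, Iterable


def validate_job(job: dict[str, Any], known_hostnames: Iterable[str]) -> list[str]:
    """Hypothetical validator that collects job-level errors as strings.

    `known_hostnames` stands in for the database lookup of existing sites.
    """
    errors: list[str] = []
    known = set(known_hostnames)

    if "project_id" not in job:
        errors.append("Job missing 'project_id'")
    if "tiers" not in job:
        errors.append("Job missing 'tiers'")
    elif not isinstance(job["tiers"], dict) or not job["tiers"]:
        # Placeholder wording: the rule is documented, this message is not.
        errors.append("'tiers' must be an object with at least one tier")

    targets = job.get("deployment_targets")
    if targets is not None:
        if not isinstance(targets, list):
            errors.append("'deployment_targets' must be an array")
        else:
            # Every target hostname must already exist in the site database.
            missing = [host for host in targets if host not in known]
            if missing:
                errors.append("Deployment targets not found in database: "
                              + ", ".join(map(str, missing)))

    link_range = job.get("tiered_link_count_range")
    # Only object values are checked here; a type error for non-objects is omitted.
    if isinstance(link_range, dict):
        low, high = link_range.get("min"), link_range.get("max")
        if not isinstance(low, int) or low < 1:
            errors.append("'tiered_link_count_range' min must be >= 1")
        elif not isinstance(high, int) or high < low:
            # Placeholder wording for the documented max >= min rule.
            errors.append("'tiered_link_count_range' max must be >= min")

    return errors
```

Collecting errors into a list, rather than raising on the first one, lets a single run report every problem in a job file. This document does not say which approach the platform itself takes.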