Job Configuration Schema

This document defines the complete schema for job configuration files used in the Big-Link-Man content automation platform. Job files are written in JSON and define batch content generation parameters.

Root Structure

{
  "jobs": [
    {
      // Job object (see Job Object section below)
    }
  ]
}

Root Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| jobs | Array<Job> | Yes | Array of job definitions to process |
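
As a minimal sketch of how the root structure might be read, the snippet below parses job-file text and returns the list under the jobs key. The function name load_jobs and its error message are illustrative assumptions, not part of the platform's code.

```python
import json


def load_jobs(text: str) -> list:
    """Parse job-file text and return the list stored under the root "jobs" key.

    Hypothetical helper: the name and the error message are assumptions.
    """
    data = json.loads(text)
    if not isinstance(data, dict) or not isinstance(data.get("jobs"), list):
        raise ValueError("Job file must be an object with a 'jobs' array")
    return data["jobs"]


example = '{"jobs": [{"project_id": 1, "tiers": {"tier1": {"count": 5}}}]}'
jobs = load_jobs(example)
print(len(jobs), jobs[0]["project_id"])
```

Note that json.loads rejects comments, so the // placeholder shown in the root structure above has to be replaced by a real job object before a file will parse.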

Job Object

Each job object defines a complete content generation batch for a specific project.

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| project_id | integer | The project ID to generate content for |
| tiers | Object | Dictionary of tier configurations (see Tier Configuration section) |

Optional Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| models | Object | Uses CLI default | AI models to use for each generation stage (title, outline, content) |
| deployment_targets | Array<string> | null | Array of site custom_hostnames for tier1 deployment assignment (Story 2.5) |
| tier1_preferred_sites | Array<string> | null | Array of hostnames for tier1 site assignment priority (Story 3.1) |
| auto_create_sites | boolean | false | Whether to auto-create sites when pool is insufficient (Story 3.1) |
| create_sites_for_keywords | Array<Object> | null | Array of keyword site creation configs (Story 3.1) |
| tiered_link_count_range | Object | null | Configuration for tiered link counts (Story 3.2) |

Tier Configuration

Each tier in the tiers object defines content generation parameters for that specific tier level.

Tier Keys

  • tier1 - Premium content (highest quality)
  • tier2 - Standard content (medium quality)
  • tier3 - Supporting content (basic quality)

Tier Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| count | integer | Yes | - | Number of articles to generate for this tier |
| min_word_count | integer | No | See defaults | Minimum word count for articles |
| max_word_count | integer | No | See defaults | Maximum word count for articles |
| min_h2_tags | integer | No | See defaults | Minimum number of H2 headings |
| max_h2_tags | integer | No | See defaults | Maximum number of H2 headings |
| min_h3_tags | integer | No | See defaults | Minimum number of H3 subheadings |
| max_h3_tags | integer | No | See defaults | Maximum number of H3 subheadings |

Tier Defaults

Tier 1 Defaults

{
  "min_word_count": 2000,
  "max_word_count": 2500,
  "min_h2_tags": 3,
  "max_h2_tags": 5,
  "min_h3_tags": 5,
  "max_h3_tags": 10
}

Tier 2 Defaults

{
  "min_word_count": 1500,
  "max_word_count": 2000,
  "min_h2_tags": 2,
  "max_h2_tags": 4,
  "min_h3_tags": 3,
  "max_h3_tags": 8
}

Tier 3 Defaults

{
  "min_word_count": 1000,
  "max_word_count": 1500,
  "min_h2_tags": 2,
  "max_h2_tags": 3,
  "min_h3_tags": 2,
  "max_h3_tags": 6
}
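
One way to picture how these defaults combine with a job file is a simple overlay: explicit tier fields win, and anything omitted falls back to the tier's default. The sketch below assumes that merge behaviour; the names TIER_DEFAULTS and resolve_tier are hypothetical.

```python
# Default values copied from the three tables above.
TIER_DEFAULTS = {
    "tier1": {"min_word_count": 2000, "max_word_count": 2500,
              "min_h2_tags": 3, "max_h2_tags": 5,
              "min_h3_tags": 5, "max_h3_tags": 10},
    "tier2": {"min_word_count": 1500, "max_word_count": 2000,
              "min_h2_tags": 2, "max_h2_tags": 4,
              "min_h3_tags": 3, "max_h3_tags": 8},
    "tier3": {"min_word_count": 1000, "max_word_count": 1500,
              "min_h2_tags": 2, "max_h2_tags": 3,
              "min_h3_tags": 2, "max_h3_tags": 6},
}


def resolve_tier(tier_name: str, tier_config: dict) -> dict:
    """Overlay the job's explicit tier fields on that tier's defaults.

    Hypothetical helper: assumes a plain key-by-key merge.
    """
    if tier_name not in TIER_DEFAULTS:
        raise KeyError(f"Unknown tier: {tier_name}")
    return {**TIER_DEFAULTS[tier_name], **tier_config}


resolved = resolve_tier("tier2", {"count": 50, "min_word_count": 1600})
print(resolved)
```

Here only min_word_count is overridden, so the resolved tier2 keeps the default max_word_count of 2000 and the default heading limits.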

Deployment Target Assignment (Story 2.5)

deployment_targets

  • Type: Array<string> (optional)
  • Purpose: Assigns tier1 articles to specific sites in list order, one article per target (assignment does not wrap around)
  • Behavior:
    • Only affects tier1 articles
    • With N deployment targets, articles 0 through N-1 are each assigned to the target at the same position in the list
    • Articles N and beyond get site_deployment_id = null
    • If not specified, all articles get site_deployment_id = null

Example

{
  "deployment_targets": [
    "www.domain1.com",
    "www.domain2.com", 
    "www.domain3.com"
  ]
}

Assignment Result:

  • Article 0 → www.domain1.com
  • Article 1 → www.domain2.com
  • Article 2 → www.domain3.com
  • Articles 3+ → null (no assignment)
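
The assignment result above can be reproduced with a few lines of Python. This is a sketch of the documented behaviour only: the function name is hypothetical, and it returns hostnames for readability, whereas the platform stores a site_deployment_id.

```python
from typing import List, Optional


def assign_deployment_targets(
    article_count: int, deployment_targets: Optional[List[str]]
) -> List[Optional[str]]:
    """Return one entry per tier1 article: a hostname or None.

    Article i gets target i while targets remain; later articles get None.
    """
    targets = deployment_targets or []
    return [targets[i] if i < len(targets) else None for i in range(article_count)]


assignments = assign_deployment_targets(
    5, ["www.domain1.com", "www.domain2.com", "www.domain3.com"]
)
for index, host in enumerate(assignments):
    print(f"Article {index} -> {host}")
```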

Site Assignment (Story 3.1)

tier1_preferred_sites

  • Type: Array<string> (optional)
  • Purpose: Preferred sites for tier1 article assignment
  • Behavior: Sites are tried in list order before falling back to random selection
  • Validation: All hostnames must exist in the database

auto_create_sites

  • Type: boolean (optional, default: false)
  • Purpose: Auto-create sites when available pool is insufficient
  • Behavior: Creates generic sites using the project keyword as a prefix

create_sites_for_keywords

  • Type: Array<Object> (optional)
  • Purpose: Pre-create sites for specific keywords before assignment
  • Structure: Each object must have keyword (string) and count (integer)

Keyword Site Creation Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| keyword | string | Yes | Keyword to create sites for |
| count | integer | Yes | Number of sites to create for this keyword |

Example

{
  "tier1_preferred_sites": [
    "www.premium-site1.com",
    "site123.b-cdn.net"
  ],
  "auto_create_sites": true,
  "create_sites_for_keywords": [
    {
      "keyword": "engine repair",
      "count": 3
    },
    {
      "keyword": "car maintenance", 
      "count": 2
    }
  ]
}
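
The interplay of these three fields can be sketched roughly as follows. This is an illustration under several assumptions that the schema does not confirm: one site per tier1 article, a random pick from the remaining pool once preferred sites run out, and a made-up naming scheme (keyword slug plus a counter under example.com) for auto-created sites. The function pick_tier1_sites and its error message are hypothetical.

```python
import random
from typing import List, Optional


def pick_tier1_sites(
    article_count: int,
    preferred: List[str],
    pool: List[str],
    auto_create: bool = False,
    keyword: str = "site",
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick one site per article: preferred first, then random, then auto-created."""
    rng = rng or random.Random()
    chosen: List[str] = []
    # 1. Preferred sites, in the order given in the job file.
    for host in preferred:
        if len(chosen) == article_count:
            break
        if host not in chosen:
            chosen.append(host)
    # 2. Random selection from whatever is left in the pool.
    remaining = [host for host in pool if host not in chosen]
    rng.shuffle(remaining)
    while len(chosen) < article_count and remaining:
        chosen.append(remaining.pop())
    # 3. Shortfall: create placeholder sites if allowed, otherwise fail.
    shortfall = article_count - len(chosen)
    if shortfall > 0:
        if not auto_create:
            raise ValueError(f"Not enough sites: need {shortfall} more")
        slug = keyword.lower().replace(" ", "-")  # assumed naming scheme
        chosen.extend(f"{slug}-{i + 1}.example.com" for i in range(shortfall))
    return chosen


sites = pick_tier1_sites(
    4,
    preferred=["www.premium-site1.com", "site123.b-cdn.net"],
    pool=["a.example.com"],
    auto_create=True,
    keyword="engine repair",
)
print(sites)
```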

AI Model Configuration

models

  • Type: Object (optional)
  • Purpose: Specifies AI models to use for each generation stage
  • Behavior: Allows different models for title, outline, and content generation
  • Note: If not specified, all stages use the model from CLI --model flag (default: gpt-4o-mini)

Models Object Fields

| Field | Type | Description |
| --- | --- | --- |
| title | string | Model to use for title generation |
| outline | string | Model to use for outline generation |
| content | string | Model to use for content generation |

Available Models (from master.config.json)

  • anthropic/claude-sonnet-4.5 (Claude Sonnet 4.5)
  • anthropic/claude-3.5-sonnet (Claude 3.5 Sonnet)
  • openai/gpt-4o (GPT-4 Optimized)
  • openai/gpt-4o-mini (GPT-4 Mini)
  • meta-llama/llama-3.1-70b-instruct (Llama 3.1 70B)
  • meta-llama/llama-3.1-8b-instruct (Llama 3.1 8B)
  • google/gemini-2.5-flash (Gemini 2.5 Flash)

Example

{
  "models": {
    "title": "openai/gpt-4o-mini",
    "outline": "openai/gpt-4o",
    "content": "anthropic/claude-3.5-sonnet"
  }
}

Implementation Status

Implemented - The models field is fully functional. Different models can be specified for title, outline, and content generation stages. If a job file contains a models configuration and you also use the --model CLI flag, the system will warn you that the CLI flag is being ignored in favor of the job config.
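
The precedence described above (job file models first, then the --model flag, then gpt-4o-mini) could be expressed as in the sketch below. The function resolve_models and the warning text are assumptions for illustration; the platform's actual AIClient code is not shown in this document.

```python
import warnings
from typing import Optional

DEFAULT_MODEL = "gpt-4o-mini"  # documented default of the --model flag
STAGES = ("title", "outline", "content")


def resolve_models(job: dict, cli_model: Optional[str] = None) -> dict:
    """Return the model to use for each generation stage.

    Hypothetical helper: job-file models win over the CLI flag.
    """
    job_models = job.get("models")
    if job_models:
        if cli_model is not None:
            # Assumed wording; the real warning message may differ.
            warnings.warn("--model flag ignored: job file defines 'models'")
        return {stage: job_models[stage] for stage in STAGES}
    fallback = cli_model or DEFAULT_MODEL
    return {stage: fallback for stage in STAGES}


job = {"models": {"title": "openai/gpt-4o-mini",
                  "outline": "openai/gpt-4o",
                  "content": "anthropic/claude-3.5-sonnet"}}
print(resolve_models(job, cli_model="openai/gpt-4o"))  # job config wins, warning emitted
print(resolve_models({}))  # every stage falls back to the default model
```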

Tiered Link Configuration (Story 3.2)

tiered_link_count_range

  • Type: Object (optional)
  • Purpose: Configures how many tiered links to generate per article
  • Default: {"min": 2, "max": 4} if not specified

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| min | integer | Yes | Minimum number of tiered links (must be >= 1) |
| max | integer | Yes | Maximum number of tiered links (must be >= min) |

Example

{
  "tiered_link_count_range": {
    "min": 3,
    "max": 5
  }
}
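
A minimal sketch of how a per-article link count might be drawn from this range. It assumes a uniform random pick between min and max inclusive, which the schema does not state explicitly; pick_link_count is a hypothetical name, and only the first error message is taken from this document.

```python
import random
from typing import Optional

DEFAULT_RANGE = {"min": 2, "max": 4}  # documented default


def pick_link_count(job: dict, rng: Optional[random.Random] = None) -> int:
    """Validate the configured range and pick a link count within it."""
    link_range = job.get("tiered_link_count_range") or DEFAULT_RANGE
    low, high = link_range["min"], link_range["max"]
    if low < 1:
        raise ValueError("'tiered_link_count_range' min must be >= 1")
    if high < low:
        # Assumed wording for the second rule.
        raise ValueError("'tiered_link_count_range' max must be >= min")
    rng = rng or random.Random()
    return rng.randint(low, high)  # inclusive on both ends


count = pick_link_count({"tiered_link_count_range": {"min": 3, "max": 5}})
print(count)
```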

Complete Example

{
  "jobs": [
    {
      "project_id": 1,
      "models": {
        "title": "anthropic/claude-3.5-sonnet",
        "outline": "anthropic/claude-3.5-sonnet",
        "content": "openai/gpt-4o"
      },
      "deployment_targets": [
        "www.primary-domain.com",
        "www.secondary-domain.com"
      ],
      "tier1_preferred_sites": [
        "www.premium-site1.com",
        "site123.b-cdn.net"
      ],
      "auto_create_sites": true,
      "create_sites_for_keywords": [
        {
          "keyword": "engine repair",
          "count": 3
        },
        {
          "keyword": "car maintenance",
          "count": 2
        }
      ],
      "tiered_link_count_range": {
        "min": 3,
        "max": 5
      },
      "tiers": {
        "tier1": {
          "count": 10,
          "min_word_count": 2000,
          "max_word_count": 2500,
          "min_h2_tags": 3,
          "max_h2_tags": 5,
          "min_h3_tags": 5,
          "max_h3_tags": 10
        },
        "tier2": {
          "count": 50,
          "min_word_count": 1500,
          "max_word_count": 2000
        },
        "tier3": {
          "count": 100
        }
      }
    }
  ]
}

Validation Rules

Job Level Validation

  • project_id must be a positive integer
  • tiers must be an object with at least one tier
  • models must be an object with title, outline, and content fields (if specified)
  • deployment_targets must be an array of strings (if specified)
  • tier1_preferred_sites must be an array of strings (if specified)
  • auto_create_sites must be a boolean (if specified)
  • create_sites_for_keywords must be an array of objects with keyword and count fields (if specified)
  • tiered_link_count_range must have min >= 1 and max >= min (if specified)

Tier Level Validation

  • count must be a positive integer
  • min_word_count must be <= max_word_count
  • min_h2_tags must be <= max_h2_tags
  • min_h3_tags must be <= max_h3_tags
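
The job-level and tier-level rules above can be sketched as a small validator that collects error strings. The messages "Job missing 'project_id'", "Job missing 'tiers'" and "'deployment_targets' must be an array" come from the Error Handling section; every other message and both function names are assumptions. The sketch only compares min/max pairs that are both set explicitly in the tier.

```python
def _is_positive_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_tier(name: str, tier: dict) -> list:
    """Check one tier: count plus each explicitly set min/max pair."""
    errors = []
    if not isinstance(tier, dict):
        return [f"{name}: tier must be an object"]
    if not _is_positive_int(tier.get("count")):
        errors.append(f"{name}: 'count' must be a positive integer")
    pairs = (("min_word_count", "max_word_count"),
             ("min_h2_tags", "max_h2_tags"),
             ("min_h3_tags", "max_h3_tags"))
    for low_key, high_key in pairs:
        low, high = tier.get(low_key), tier.get(high_key)
        if low is not None and high is not None and low > high:
            errors.append(f"{name}: '{low_key}' must be <= '{high_key}'")
    return errors


def validate_job(job: dict) -> list:
    """Return a list of validation errors for one job (empty if valid)."""
    errors = []
    if "project_id" not in job:
        errors.append("Job missing 'project_id'")
    elif not _is_positive_int(job["project_id"]):
        errors.append("'project_id' must be a positive integer")
    if "tiers" not in job:
        errors.append("Job missing 'tiers'")
    elif not isinstance(job["tiers"], dict) or not job["tiers"]:
        errors.append("'tiers' must be an object with at least one tier")
    else:
        for name, tier in job["tiers"].items():
            errors.extend(validate_tier(name, tier))
    targets = job.get("deployment_targets")
    if targets is not None and not isinstance(targets, list):
        errors.append("'deployment_targets' must be an array")
    return errors


bad_job = {"project_id": 1,
           "tiers": {"tier1": {"count": 10, "min_h2_tags": 6, "max_h2_tags": 5}}}
print(validate_job(bad_job))
```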

Site Assignment Validation

  • All hostnames in deployment_targets must exist in the database
  • All hostnames in tier1_preferred_sites must exist in the database
  • Keywords in create_sites_for_keywords must be non-empty strings
  • Count values in create_sites_for_keywords must be positive integers
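
To illustrate the site assignment checks, the sketch below replaces the database lookup with a plain set of known hostnames. The "Deployment targets not found in database" message matches the Error Handling section; the other messages, the function name check_site_fields and the in-memory set are assumptions. It also assumes the structural checks have already passed, so each create_sites_for_keywords entry is an object.

```python
def check_site_fields(job: dict, known_hostnames: set) -> list:
    """Return site-related validation errors for one job.

    known_hostnames stands in for a database query in this sketch.
    """
    errors = []
    missing = [h for h in job.get("deployment_targets") or []
               if h not in known_hostnames]
    if missing:
        errors.append("Deployment targets not found in database: " + ", ".join(missing))
    missing = [h for h in job.get("tier1_preferred_sites") or []
               if h not in known_hostnames]
    if missing:
        # Assumed wording, modelled on the deployment target message.
        errors.append("Preferred sites not found in database: " + ", ".join(missing))
    for entry in job.get("create_sites_for_keywords") or []:
        keyword = entry.get("keyword")
        if not isinstance(keyword, str) or not keyword.strip():
            errors.append("'keyword' must be a non-empty string")
        count = entry.get("count")
        if not isinstance(count, int) or isinstance(count, bool) or count < 1:
            errors.append("'count' must be a positive integer")
    return errors


site_errors = check_site_fields(
    {"deployment_targets": ["www.domain1.com", "invalid.com"]},
    known_hostnames={"www.domain1.com"},
)
print(site_errors)
```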

Usage

CLI Command

uv run python main.py generate-batch --job-file jobs/example.json --username admin --password secret

Command Options

  • --job-file, -j: Path to job JSON file (required)
  • --username, -u: Username for authentication
  • --password, -p: Password for authentication
  • --debug: Save AI responses to debug_output/
  • --continue-on-error: Continue processing if article generation fails
  • --model, -m: AI model to use (default: gpt-4o-mini). Overridden by job file models config if present.

Implementation History

Story 2.2: Basic Content Generation

  • Added project_id and tiers fields
  • Added tier configuration with word count and heading constraints
  • Added tier defaults for common configurations

Story 2.3: AI Content Generation

  • Implemented: Per-stage model selection via job config models field
  • Implemented: Dynamic model switching in AIClient with override_model parameter
  • Implemented: CLI warning when job contains models but --model flag is used
  • Behavior: Job file models config takes precedence over CLI --model flag

Story 2.5: Deployment Target Assignment

  • Added deployment_targets field for tier1 site assignment
  • Implemented round-robin assignment logic
  • Added validation for deployment target hostnames

Story 3.1: URL Generation and Site Assignment

  • Added tier1_preferred_sites for priority-based assignment
  • Added auto_create_sites for on-demand site creation
  • Added create_sites_for_keywords for pre-creation of keyword sites
  • Extended site assignment beyond deployment targets

Story 3.2: Tiered Link Finding

  • Added tiered_link_count_range for configurable link counts
  • Integrated with tiered link generation system
  • Added validation for link count ranges

Future Extensions

The schema is designed to be extensible for future features:

  • Story 3.3: Content interlinking injection
  • Story 4.x: Cloud deployment and handoff
  • Future: Advanced site matching, cost tracking, analytics

Error Handling

Common Validation Errors

  • "Job missing 'project_id'" - Required field missing
  • "Job missing 'tiers'" - Required field missing
  • "'deployment_targets' must be an array" - Wrong data type
  • "Deployment targets not found in database: invalid.com" - Invalid hostname
  • "'tiered_link_count_range' min must be >= 1" - Invalid range value

Graceful Degradation

  • Missing optional fields fall back to their documented defaults
  • Invalid hostnames produce an error message that names the hostnames not found
  • Insufficient sites trigger auto-creation (if enabled) or an error
  • Failed articles are logged but don't stop batch processing (with --continue-on-error)
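
The --continue-on-error behaviour can be pictured as a loop that logs each failure and either carries on or re-raises. This is a generic sketch, not the platform's batch processor: run_batch, the generate callback and the flaky stand-in are all hypothetical.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("batch")


def run_batch(
    article_ids: Iterable[int],
    generate: Callable[[int], None],
    continue_on_error: bool = False,
) -> Tuple[List[int], List[int]]:
    """Generate each article; return (succeeded, failed) ID lists."""
    succeeded: List[int] = []
    failed: List[int] = []
    for article_id in article_ids:
        try:
            generate(article_id)
            succeeded.append(article_id)
        except Exception as exc:
            logger.error("Article %s failed: %s", article_id, exc)
            failed.append(article_id)
            if not continue_on_error:
                raise  # stop the batch at the first failure
    return succeeded, failed


def flaky(article_id: int) -> None:
    # Stand-in generator that fails for one article.
    if article_id == 2:
        raise RuntimeError("model timeout")


result = run_batch([1, 2, 3], flaky, continue_on_error=True)
print(result)
```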