# Job Configuration Schema

This document defines the complete schema for job configuration files used in the Big-Link-Man content automation platform. All job files are in JSON format and define batch content generation parameters.

## Root Structure

```json
{
  "jobs": [
    {
      // Job object (see Job Object section below)
    }
  ]
}
```

### Root Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `jobs` | `Array<Job>` | Yes | Array of job definitions to process |

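The root structure above can be checked with a few lines of Python. This is a minimal sketch, not the project's parser: `parse_jobs` and its error message are hypothetical names chosen for illustration.

```python
import json


def parse_jobs(text: str) -> list:
    """Parse job-file text and return the list under the top-level 'jobs' key.

    Hypothetical helper: the name and the error message are assumptions,
    not taken from the project's job config parser.
    """
    data = json.loads(text)
    jobs = data.get("jobs") if isinstance(data, dict) else None
    if not isinstance(jobs, list):
        raise ValueError("Job file must contain a top-level 'jobs' array")
    return jobs


if __name__ == "__main__":
    sample = '{"jobs": [{"project_id": 1, "tiers": {"tier1": {"count": 10}}}]}'
    print(len(parse_jobs(sample)))
```

Note that the standard `json` module rejects comments, so the `// Job object` placeholder shown in the root structure has to be replaced with real fields before a file will parse.
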
## Job Object

Each job object defines a complete content generation batch for a specific project.

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `project_id` | `integer` | The project ID to generate content for |
| `tiers` | `Object` | Dictionary of tier configurations (see Tier Configuration section) |

### Optional Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `models` | `Object` | Uses CLI default | AI models to use for each generation stage (Story 2.3 - planned) |
| `deployment_targets` | `Array<string>` | `null` | Array of site custom_hostnames for tier1 deployment assignment (Story 2.5) |
| `tier1_preferred_sites` | `Array<string>` | `null` | Array of hostnames for tier1 site assignment priority (Story 3.1) |
| `auto_create_sites` | `boolean` | `false` | Whether to auto-create sites when pool is insufficient (Story 3.1) |
| `create_sites_for_keywords` | `Array<Object>` | `null` | Array of keyword site creation configs (Story 3.1) |
| `tiered_link_count_range` | `Object` | `null` (falls back to `{"min": 2, "max": 4}`) | Configuration for tiered link counts (Story 3.2) |

## Tier Configuration

Each tier in the `tiers` object defines content generation parameters for that specific tier level.

### Tier Keys
- `tier1` - Premium content (highest quality)
- `tier2` - Standard content (medium quality)
- `tier3` - Supporting content (basic quality)

### Tier Fields

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `count` | `integer` | Yes | - | Number of articles to generate for this tier |
| `min_word_count` | `integer` | No | See defaults | Minimum word count for articles |
| `max_word_count` | `integer` | No | See defaults | Maximum word count for articles |
| `min_h2_tags` | `integer` | No | See defaults | Minimum number of H2 headings |
| `max_h2_tags` | `integer` | No | See defaults | Maximum number of H2 headings |
| `min_h3_tags` | `integer` | No | See defaults | Minimum number of H3 subheadings |
| `max_h3_tags` | `integer` | No | See defaults | Maximum number of H3 subheadings |

### Tier Defaults

#### Tier 1 Defaults
```json
{
  "min_word_count": 2000,
  "max_word_count": 2500,
  "min_h2_tags": 3,
  "max_h2_tags": 5,
  "min_h3_tags": 5,
  "max_h3_tags": 10
}
```

#### Tier 2 Defaults
```json
{
  "min_word_count": 1500,
  "max_word_count": 2000,
  "min_h2_tags": 2,
  "max_h2_tags": 4,
  "min_h3_tags": 3,
  "max_h3_tags": 8
}
```

#### Tier 3 Defaults
```json
{
  "min_word_count": 1000,
  "max_word_count": 1500,
  "min_h2_tags": 2,
  "max_h2_tags": 3,
  "min_h3_tags": 2,
  "max_h3_tags": 6
}
```

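One way to picture how these defaults combine with a tier's explicit settings is a per-field merge, sketched below. It assumes that each omitted field falls back to its tier default independently; `TIER_DEFAULTS` and `resolve_tier` are hypothetical names, not identifiers from the codebase.

```python
# Default values copied from the Tier Defaults tables above.
TIER_DEFAULTS = {
    "tier1": {"min_word_count": 2000, "max_word_count": 2500,
              "min_h2_tags": 3, "max_h2_tags": 5,
              "min_h3_tags": 5, "max_h3_tags": 10},
    "tier2": {"min_word_count": 1500, "max_word_count": 2000,
              "min_h2_tags": 2, "max_h2_tags": 4,
              "min_h3_tags": 3, "max_h3_tags": 8},
    "tier3": {"min_word_count": 1000, "max_word_count": 1500,
              "min_h2_tags": 2, "max_h2_tags": 3,
              "min_h3_tags": 2, "max_h3_tags": 6},
}


def resolve_tier(tier_name: str, tier_config: dict) -> dict:
    """Return the tier config with missing optional fields filled in.

    Assumption: fields fall back one by one, so a tier may override
    only its word counts and still inherit the heading defaults.
    """
    if "count" not in tier_config:
        raise ValueError(f"{tier_name}: 'count' is required")
    # Explicit values win over defaults because they are unpacked last.
    return {**TIER_DEFAULTS.get(tier_name, {}), **tier_config}


if __name__ == "__main__":
    print(resolve_tier("tier3", {"count": 100}))
```
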
## Deployment Target Assignment (Story 2.5)

### `deployment_targets`
- **Type**: `Array<string>` (optional)
- **Purpose**: Assigns tier1 articles to specific sites in list order, one article per target (no wrap-around)
- **Behavior**:
  - Only affects tier1 articles
  - Articles 0 through N-1 are assigned to the N deployment targets, in order
  - Articles N and beyond get `site_deployment_id = null`
  - If not specified, all articles get `site_deployment_id = null`

### Example
```json
{
  "deployment_targets": [
    "www.domain1.com",
    "www.domain2.com",
    "www.domain3.com"
  ]
}
```

**Assignment Result:**
- Article 0 → www.domain1.com
- Article 1 → www.domain2.com
- Article 2 → www.domain3.com
- Articles 3+ → null (no assignment)

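The assignment result above follows a simple positional rule, sketched here. The function name is hypothetical, and the sketch returns hostnames for readability, whereas the document describes the stored value as `site_deployment_id`.

```python
from typing import List, Optional


def assign_deployment_targets(
    article_count: int, deployment_targets: Optional[List[str]]
) -> List[Optional[str]]:
    """Map tier1 article indexes to deployment targets by position.

    Article i gets target i while targets remain; later articles get None.
    Illustrative only: hostnames stand in for site_deployment_id values.
    """
    targets = deployment_targets or []
    return [
        targets[index] if index < len(targets) else None
        for index in range(article_count)
    ]


if __name__ == "__main__":
    hosts = ["www.domain1.com", "www.domain2.com", "www.domain3.com"]
    for index, host in enumerate(assign_deployment_targets(5, hosts)):
        print(index, host)
```
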
## Site Assignment (Story 3.1)

### `tier1_preferred_sites`
- **Type**: `Array<string>` (optional)
- **Purpose**: Preferred sites for tier1 article assignment
- **Behavior**: Used in priority order before random selection
- **Validation**: All hostnames must exist in the database

### `auto_create_sites`
- **Type**: `boolean` (optional, default: `false`)
- **Purpose**: Auto-create sites when the available pool is insufficient
- **Behavior**: Creates generic sites using the project keyword as prefix

### `create_sites_for_keywords`
- **Type**: `Array<Object>` (optional)
- **Purpose**: Pre-create sites for specific keywords before assignment
- **Structure**: Each object must have `keyword` (string) and `count` (integer)

#### Keyword Site Creation Object
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `keyword` | `string` | Yes | Keyword to create sites for |
| `count` | `integer` | Yes | Number of sites to create for this keyword |

### Example
```json
{
  "tier1_preferred_sites": [
    "www.premium-site1.com",
    "site123.b-cdn.net"
  ],
  "auto_create_sites": true,
  "create_sites_for_keywords": [
    {
      "keyword": "engine repair",
      "count": 3
    },
    {
      "keyword": "car maintenance",
      "count": 2
    }
  ]
}
```

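The "priority order before random selection" behavior of `tier1_preferred_sites` could look roughly like the sketch below. Everything here is an assumption made for illustration: the function name, the idea of a flat `pool` list, and raising an error on a shortfall (the real system may instead auto-create sites when `auto_create_sites` is enabled).

```python
import random
from typing import List, Optional


def pick_sites(
    preferred: List[str],
    pool: List[str],
    needed: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Choose `needed` sites: preferred ones first, then random pool sites.

    Hypothetical sketch; auto-creation of missing sites is not modelled.
    """
    if rng is None:
        rng = random.Random()
    # Preferred hostnames are consumed in the order they were listed.
    chosen = list(preferred[:needed])
    remaining = [site for site in pool if site not in chosen]
    shortfall = needed - len(chosen)
    if shortfall > len(remaining):
        raise ValueError("Not enough sites available")
    # Fill whatever is still missing with a random draw from the pool.
    chosen += rng.sample(remaining, shortfall)
    return chosen


if __name__ == "__main__":
    pool = ["www.premium-site1.com", "site123.b-cdn.net", "a.example", "b.example"]
    print(pick_sites(["www.premium-site1.com", "site123.b-cdn.net"], pool, 3))
```
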
## AI Model Configuration (Story 2.3 - Not Yet Implemented)

### `models`
- **Type**: `Object` (optional)
- **Purpose**: Specifies AI models to use for each generation stage
- **Behavior**: Allows different models for title, outline, and content generation
- **Note**: Currently not parsed by job config - uses CLI `--model` flag instead

#### Models Object Fields
| Field | Type | Description |
|-------|------|-------------|
| `title` | `string` | Model to use for title generation |
| `outline` | `string` | Model to use for outline generation |
| `content` | `string` | Model to use for content generation |

### Available Models (from master.config.json)
- `anthropic/claude-sonnet-4.5` (Claude Sonnet 4.5)
- `anthropic/claude-3.5-sonnet` (Claude 3.5 Sonnet)
- `openai/gpt-4o` (GPT-4 Optimized)
- `openai/gpt-4o-mini` (GPT-4 Mini)
- `meta-llama/llama-3.1-70b-instruct` (Llama 3.1 70B)
- `meta-llama/llama-3.1-8b-instruct` (Llama 3.1 8B)
- `google/gemini-2.5-flash` (Gemini 2.5 Flash)

### Example
```json
{
  "models": {
    "title": "openai/gpt-4o-mini",
    "outline": "openai/gpt-4o",
    "content": "anthropic/claude-3.5-sonnet"
  }
}
```

### Implementation Status
This field is defined in the JSON schema but **not yet implemented** in the job config parser (`src/generation/job_config.py`). Currently, all stages use the same model specified via CLI `--model` flag.

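For orientation, per-stage selection with a fallback to the CLI model might eventually look like the sketch below. This is speculative: the parser does not read `models` today, and `resolve_models` is a hypothetical helper rather than planned API.

```python
from typing import Dict

# Stage names taken from the Models Object Fields table.
STAGES = ("title", "outline", "content")


def resolve_models(job: dict, cli_model: str) -> Dict[str, str]:
    """Pick a model per stage, falling back to the CLI --model value.

    Hypothetical: illustrates the planned behavior, not current code.
    """
    models = job.get("models") or {}
    return {stage: models.get(stage, cli_model) for stage in STAGES}


if __name__ == "__main__":
    job = {"models": {"title": "openai/gpt-4o-mini", "outline": "openai/gpt-4o"}}
    print(resolve_models(job, "gpt-4o-mini"))
```

With no `models` key, every stage resolves to the CLI value, which matches the current behavior described above.
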
## Tiered Link Configuration (Story 3.2)

### `tiered_link_count_range`
- **Type**: `Object` (optional)
- **Purpose**: Configures how many tiered links to generate per article
- **Default**: `{"min": 2, "max": 4}` if not specified

#### Tiered Link Range Object
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `min` | `integer` | Yes | Minimum number of tiered links (must be >= 1) |
| `max` | `integer` | Yes | Maximum number of tiered links (must be >= min) |

### Example
```json
{
  "tiered_link_count_range": {
    "min": 3,
    "max": 5
  }
}
```

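The default and the two range constraints can be expressed together as in the sketch below. The helper names are hypothetical, the second error message is invented for symmetry with the documented one, and drawing the per-article count uniformly at random is an assumption about how the range is consumed.

```python
import random
from typing import Optional, Tuple

# Default documented for tiered_link_count_range.
DEFAULT_LINK_RANGE = {"min": 2, "max": 4}


def resolve_link_range(job: dict) -> Tuple[int, int]:
    """Return (min, max) for a job, applying the default and the checks."""
    link_range = job.get("tiered_link_count_range") or DEFAULT_LINK_RANGE
    low, high = link_range["min"], link_range["max"]
    if low < 1:
        raise ValueError("'tiered_link_count_range' min must be >= 1")
    if high < low:
        # Message wording is an assumption, not taken from the codebase.
        raise ValueError("'tiered_link_count_range' max must be >= min")
    return low, high


def pick_link_count(job: dict, rng: Optional[random.Random] = None) -> int:
    """Draw a link count inside the range (uniform draw is an assumption)."""
    low, high = resolve_link_range(job)
    if rng is None:
        rng = random.Random()
    return rng.randint(low, high)  # randint includes both endpoints


if __name__ == "__main__":
    print(pick_link_count({"tiered_link_count_range": {"min": 3, "max": 5}}))
```
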
## Complete Example

```json
{
  "jobs": [
    {
      "project_id": 1,
      "models": {
        "title": "anthropic/claude-3.5-sonnet",
        "outline": "anthropic/claude-3.5-sonnet",
        "content": "openai/gpt-4o"
      },
      "deployment_targets": [
        "www.primary-domain.com",
        "www.secondary-domain.com"
      ],
      "tier1_preferred_sites": [
        "www.premium-site1.com",
        "site123.b-cdn.net"
      ],
      "auto_create_sites": true,
      "create_sites_for_keywords": [
        {
          "keyword": "engine repair",
          "count": 3
        },
        {
          "keyword": "car maintenance",
          "count": 2
        }
      ],
      "tiered_link_count_range": {
        "min": 3,
        "max": 5
      },
      "tiers": {
        "tier1": {
          "count": 10,
          "min_word_count": 2000,
          "max_word_count": 2500,
          "min_h2_tags": 3,
          "max_h2_tags": 5,
          "min_h3_tags": 5,
          "max_h3_tags": 10
        },
        "tier2": {
          "count": 50,
          "min_word_count": 1500,
          "max_word_count": 2000
        },
        "tier3": {
          "count": 100
        }
      }
    }
  ]
}
```

## Validation Rules

### Job Level Validation
- `project_id` must be a positive integer
- `tiers` must be an object with at least one tier
- `models` must be an object with `title`, `outline`, and `content` fields (if specified) - **NOT YET VALIDATED**
- `deployment_targets` must be an array of strings (if specified)
- `tier1_preferred_sites` must be an array of strings (if specified)
- `auto_create_sites` must be a boolean (if specified)
- `create_sites_for_keywords` must be an array of objects with `keyword` and `count` fields (if specified)
- `tiered_link_count_range` must have `min` >= 1 and `max` >= `min` (if specified)

### Tier Level Validation
- `count` must be a positive integer
- `min_word_count` must be <= `max_word_count`
- `min_h2_tags` must be <= `max_h2_tags`
- `min_h3_tags` must be <= `max_h3_tags`

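The tier-level rules translate into a short check like the one below. It is a sketch under assumptions: `validate_tier` and its messages are hypothetical, and it compares a min/max pair only when both values are present, so a real check would probably apply the tier defaults first.

```python
from typing import List

# Each (min, max) pair that must satisfy min <= max.
RANGE_PAIRS = (
    ("min_word_count", "max_word_count"),
    ("min_h2_tags", "max_h2_tags"),
    ("min_h3_tags", "max_h3_tags"),
)


def validate_tier(name: str, tier: dict) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    count = tier.get("count")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        errors.append(f"{name}: 'count' must be a positive integer")
    for low_key, high_key in RANGE_PAIRS:
        low, high = tier.get(low_key), tier.get(high_key)
        # Only compare pairs where both ends were given explicitly.
        if low is not None and high is not None and low > high:
            errors.append(f"{name}: '{low_key}' must be <= '{high_key}'")
    return errors


if __name__ == "__main__":
    print(validate_tier("tier1", {"count": 0, "min_h2_tags": 6, "max_h2_tags": 5}))
```
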
### Site Assignment Validation
- All hostnames in `deployment_targets` must exist in the database
- All hostnames in `tier1_preferred_sites` must exist in the database
- Keywords in `create_sites_for_keywords` must be non-empty strings
- Count values in `create_sites_for_keywords` must be positive integers

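The hostname existence check can be sketched as a set lookup. The database is stood in for by a plain set of known hostnames, the function names are hypothetical, and joining several missing hostnames with a comma is an assumption (the document only shows the message for a single hostname).

```python
from typing import Iterable, List, Set


def find_missing_hostnames(hostnames: Iterable[str], known: Set[str]) -> List[str]:
    """Return hostnames absent from `known`, preserving input order."""
    return [host for host in hostnames if host not in known]


def check_deployment_targets(job: dict, known: Set[str]) -> None:
    """Raise if any deployment target is unknown.

    `known` stands in for a database query; this is not the real lookup.
    """
    missing = find_missing_hostnames(job.get("deployment_targets") or [], known)
    if missing:
        raise ValueError(
            "Deployment targets not found in database: " + ", ".join(missing)
        )


if __name__ == "__main__":
    known_sites = {"www.domain1.com", "www.domain2.com"}
    check_deployment_targets({"deployment_targets": ["www.domain1.com"]}, known_sites)
    print("all deployment targets found")
```
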
## Usage

### CLI Command
```bash
uv run python main.py generate-batch --job-file jobs/example.json --username admin --password secret
```

### Command Options
- `--job-file, -j`: Path to job JSON file (required)
- `--username, -u`: Username for authentication
- `--password, -p`: Password for authentication
- `--debug`: Save AI responses to `debug_output/`
- `--continue-on-error`: Continue processing if article generation fails
- `--model, -m`: AI model to use (default: gpt-4o-mini)

## Implementation History

### Story 2.2: Basic Content Generation
- Added `project_id` and `tiers` fields
- Added tier configuration with word count and heading constraints
- Added tier defaults for common configurations

### Story 2.3: AI Content Generation (Partial)
- **Implemented**: Database fields for tracking models (title_model, outline_model, content_model)
- **Not Implemented**: Job config `models` field - currently uses CLI `--model` flag
- **Planned**: Per-stage model selection from job configuration

### Story 2.5: Deployment Target Assignment
- Added `deployment_targets` field for tier1 site assignment
- Implemented round-robin assignment logic
- Added validation for deployment target hostnames

### Story 3.1: URL Generation and Site Assignment
- Added `tier1_preferred_sites` for priority-based assignment
- Added `auto_create_sites` for on-demand site creation
- Added `create_sites_for_keywords` for pre-creation of keyword sites
- Extended site assignment beyond deployment targets

### Story 3.2: Tiered Link Finding
- Added `tiered_link_count_range` for configurable link counts
- Integrated with tiered link generation system
- Added validation for link count ranges

## Future Extensions

The schema is designed to be extensible for future features:

- **Story 3.3**: Content interlinking injection
- **Story 4.x**: Cloud deployment and handoff
- **Future**: Advanced site matching, cost tracking, analytics

## Error Handling

### Common Validation Errors
- `"Job missing 'project_id'"` - Required field missing
- `"Job missing 'tiers'"` - Required field missing
- `"'deployment_targets' must be an array"` - Wrong data type
- `"Deployment targets not found in database: invalid.com"` - Invalid hostname
- `"'tiered_link_count_range' min must be >= 1"` - Invalid range value

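The first three messages above correspond to structural checks that need no database access. The sketch below reuses the documented message strings, but the function name, the use of `ValueError`, and the order of the checks are assumptions.

```python
def validate_job_structure(job: dict) -> None:
    """Raise ValueError with one of the documented messages on bad input.

    Hypothetical helper: only the message strings come from the list above.
    """
    if "project_id" not in job:
        raise ValueError("Job missing 'project_id'")
    if "tiers" not in job:
        raise ValueError("Job missing 'tiers'")
    targets = job.get("deployment_targets")
    # deployment_targets is optional, so None (absent) is accepted.
    if targets is not None and not isinstance(targets, list):
        raise ValueError("'deployment_targets' must be an array")


if __name__ == "__main__":
    try:
        validate_job_structure({"project_id": 1})
    except ValueError as exc:
        print(exc)
```
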
### Graceful Degradation
- Missing optional fields use sensible defaults
- Invalid hostnames cause clear error messages
- Insufficient sites trigger auto-creation (if enabled) or clear errors
- Failed articles are logged but don't stop batch processing (with `--continue-on-error`)
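
The last point, logging a failed article and carrying on, is a common batch-loop pattern. The sketch below shows the general shape only: `run_batch` and the `generate` callable are hypothetical stand-ins, not the project's batch processor.

```python
import logging
from typing import Any, Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)


def run_batch(
    articles: Iterable[Any],
    generate: Callable[[Any], Any],
    continue_on_error: bool = False,
) -> Tuple[List[Any], List[Any]]:
    """Run `generate` for each article and return (results, failed_inputs).

    With continue_on_error=False the first failure is logged and re-raised,
    which stops the batch; with True the loop moves on to the next article.
    """
    completed, failed = [], []
    for article in articles:
        try:
            completed.append(generate(article))
        except Exception:
            logger.exception("Article generation failed: %r", article)
            failed.append(article)
            if not continue_on_error:
                raise
    return completed, failed


if __name__ == "__main__":
    results, failures = run_batch(["a", "b"], str.upper, continue_on_error=True)
    print(results, failures)
```
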