14 KiB

Raw Blame History

Story 2.3: AI-Powered Content Generation - COMPLETED

Overview

Implemented a comprehensive AI-powered content generation system with three-stage pipeline (title → outline → content), validation at each stage, programmatic augmentation for CORA compliance, and batch job processing across multiple tiers.

Status

COMPLETED

Story Details

As a User, I want to execute a job for a project that uses AI to generate a title, an outline, and full-text content, so that the core content is created automatically.

Acceptance Criteria - ALL MET

1. Script Initiation for Projects

Status: COMPLETE

CLI command: generate-batch --job-file <path>
Supports batch processing across multiple tiers
Job configuration via JSON files
Progress tracking and error reporting

2. AI-Powered Generation Using SEO Data

Status: COMPLETE

Title generation with keyword validation
Outline generation meeting CORA H2/H3 targets
Full HTML content generation
Uses project's SEO data (keywords, entities, related searches)
Multiple AI models supported via OpenRouter

3. Content Rule Engine Validation

Status: COMPLETE

Validates at each stage (title, outline, content)
Uses ContentRuleEngine from Story 2.2
Tier-aware validation (strict for Tier 1)
Detailed error reporting

4. Database Storage

Status: COMPLETE

Title, outline, and content stored in GeneratedContent table
Version tracking and metadata
Tracks attempts, models used, validation results
Augmentation logs

5. Progress Logging

Status: COMPLETE

Real-time progress updates via CLI
Logs: "Generating title...", "Generating content...", etc.
Tracks successful, failed, and skipped articles
Detailed summary reports

6. AI Service Error Handling

Status: COMPLETE

Graceful handling of API errors
Retry logic with configurable attempts
Fallback to programmatic augmentation
Continue or stop on failures (configurable)

Implementation Details

Architecture Components

1. Database Models (`src/database/models.py`)

GeneratedContent Model:

class GeneratedContent(Base):
    id, project_id, tier
    title, outline, content
    status, is_active
    generation_stage
    title_attempts, outline_attempts, content_attempts
    title_model, outline_model, content_model
    validation_errors, validation_warnings
    validation_report (JSON)
    word_count, augmented
    augmentation_log (JSON)
    generation_duration
    error_message
    created_at, updated_at

2. AI Client (`src/generation/ai_client.py`)

Features:

OpenRouter API integration
Multiple model support
JSON-formatted responses
Error handling and retries
Model validation

Available Models:

Claude 3.5 Sonnet (default)
Claude 3 Haiku
GPT-4o / GPT-4o-mini
Llama 3.1 70B/8B
Gemini Pro 1.5

3. Job Configuration (`src/generation/job_config.py`)

Job Structure:

{
  "job_name": "Batch Name",
  "project_id": 1,
  "tiers": [
    {
      "tier": 1,
      "article_count": 15,
      "models": {
        "title": "model-id",
        "outline": "model-id",
        "content": "model-id"
      },
      "anchor_text_config": {
        "mode": "default|override|append"
      },
      "validation_attempts": 3
    }
  ],
  "failure_config": {
    "max_consecutive_failures": 5,
    "skip_on_failure": true
  }
}

4. Three-Stage Generation Pipeline (`src/generation/service.py`)

Stage 1: Title Generation

Uses title_generation.json prompt
Validates keyword presence and length
Retries on validation failure
Max attempts configurable

Stage 2: Outline Generation

Uses outline_generation.json prompt
Returns JSON structure with H1, H2s, H3s
Validates CORA targets (H2/H3 counts, keyword distribution)
AI retry → Programmatic augmentation if needed
Ensures FAQ section present

Stage 3: Content Generation

Uses content_generation.json prompt
Follows validated outline structure
Generates full HTML (no CSS, just semantic markup)
Validates against all CORA rules
AI retry → Augmentation if needed

5. Stage Validation (`src/generation/validator.py`)

Title Validation:

Length (30-100 chars)
Keyword presence
Non-empty

Outline Validation:

H1 contains keyword
H2/H3 counts meet targets
Keyword distribution in headings
Entity and related search incorporation
FAQ section present
Tier-aware strictness

Content Validation:

Full CORA rule validation
Word count (min/max)
Keyword frequency
Heading structure
FAQ format
Image alt text (when applicable)

6. Content Augmentation (`src/generation/augmenter.py`)

Outline Augmentation:

Add missing H2s with keywords
Add H3s with entities
Modify existing headings
Maintain logical flow

Content Augmentation:

Strategy 1: Ask AI to add paragraphs (small deficits)
Strategy 2: Programmatically insert terms (large deficits)
Insert keywords into random sentences
Capitalize if sentence-initial
Add complete paragraphs with missing elements

7. Batch Processor (`src/generation/batch_processor.py`)

Features:

Process multiple tiers sequentially
Track progress per tier
Handle failures (skip or stop)
Consecutive failure threshold
Real-time progress callbacks
Detailed result reporting

8. Prompt Templates (`src/generation/prompts/`)

Files:

title_generation.json - Title prompts
outline_generation.json - Outline structure prompts
content_generation.json - Full content prompts
outline_augmentation.json - Outline fix prompts
content_augmentation.json - Content enhancement prompts

Format:

{
  "system": "System message",
  "user_template": "Prompt with {placeholders}",
  "validation": {
    "output_format": "text|json|html",
    "requirements": []
  }
}

CLI Command

python main.py generate-batch \
  --job-file jobs/example_tier1_batch.json \
  --username admin \
  --password password

Options:

--job-file, -j: Path to job configuration JSON (required)
--force-regenerate, -f: Force regeneration (flag, not implemented)
--username, -u: Authentication username
--password, -p: Authentication password

Example Output:

Authenticated as: admin (Admin)

Loading Job: Tier 1 Launch Batch
Project ID: 1
Total Articles: 15

Tiers:
  Tier 1: 15 articles
    Models: gpt-4o-mini / claude-3.5-sonnet / claude-3.5-sonnet

Proceed with generation? [y/N]: y

Starting batch generation...
--------------------------------------------------------------------------------
[Tier 1] Article 1/15: Generating...
[Tier 1] Article 1/15: Completed (ID: 1)
[Tier 1] Article 2/15: Generating...
...
--------------------------------------------------------------------------------

Batch Generation Complete!
Job: Tier 1 Launch Batch
Project ID: 1
Duration: 1234.56s

Results:
  Total Articles: 15
  Successful: 14
  Failed: 0
  Skipped: 1

By Tier:
  Tier 1:
    Successful: 14
    Failed: 0
    Skipped: 1

Example Job Files

Located in jobs/ directory:

example_tier1_batch.json - 15 tier 1 articles
example_multi_tier_batch.json - 165 articles across 3 tiers
example_custom_anchors.json - Custom anchor text demo
README.md - Job configuration guide

Test Coverage

Unit Tests (30+ tests):

test_generation_service.py - Pipeline stages
test_augmenter.py - Content augmentation
test_job_config.py - Job configuration validation

Integration Tests:

test_content_generation.py - Full pipeline with mocked AI
Repository CRUD operations
Service initialization
Job validation

Database Schema

New Table: generated_content

CREATE TABLE generated_content (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES projects(id),
    tier INTEGER,
    title TEXT,
    outline TEXT,
    content TEXT,
    status VARCHAR(20) DEFAULT 'pending',
    is_active BOOLEAN DEFAULT 0,
    generation_stage VARCHAR(20) DEFAULT 'title',
    title_attempts INTEGER DEFAULT 0,
    outline_attempts INTEGER DEFAULT 0,
    content_attempts INTEGER DEFAULT 0,
    title_model VARCHAR(100),
    outline_model VARCHAR(100),
    content_model VARCHAR(100),
    validation_errors INTEGER DEFAULT 0,
    validation_warnings INTEGER DEFAULT 0,
    validation_report JSON,
    word_count INTEGER,
    augmented BOOLEAN DEFAULT 0,
    augmentation_log JSON,
    generation_duration FLOAT,
    error_message TEXT,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE INDEX idx_generated_content_project_id ON generated_content(project_id);
CREATE INDEX idx_generated_content_tier ON generated_content(tier);
CREATE INDEX idx_generated_content_status ON generated_content(status);

Dependencies Added

beautifulsoup4==4.12.2 - HTML parsing for augmentation

All other dependencies already present (OpenAI SDK for OpenRouter).

Configuration

Environment Variables:

AI_API_KEY=sk-or-v1-your-openrouter-key
AI_API_BASE_URL=https://openrouter.ai/api/v1  # Optional
AI_MODEL=anthropic/claude-3.5-sonnet  # Optional

master.config.json: Already configured in Story 2.2 with:

ai_service section
content_rules for validation
Available models list

Design Decisions

Why Three Separate Stages?

Title First: Validates keyword usage early, informs outline
Outline Next: Ensures structure before expensive content generation
Content Last: Follows validated structure, reduces failures

Better success rate than single-prompt approach.

Why Programmatic Augmentation?

AI is unreliable at precise keyword placement
Validation failures are common with strict CORA targets
Hybrid approach: AI for quality, programmatic for precision
Saves API costs (no endless retries)

Why Separate GeneratedContent Table?

Version history preserved
Can rollback to previous generation
Track attempts and augmentation
Rich metadata for debugging
A/B testing capability

Why Job Configuration Files?

Reusable batch configurations
Version control job definitions
Easy to share and modify
Future: Auto-process job folder
Clear audit trail

Why Tier-Aware Validation?

Tier 1: Strictest (all CORA targets mandatory)
Tier 2+: Warnings only (more lenient)
Matches real-world content quality needs
Saves costs on bulk tier 2+ content

Known Limitations

No Interlinking Yet: Links added in Epic 3 (Story 3.3)
No CSS/Templates: Added in Story 2.4
Sequential Processing: No parallel generation (future enhancement)
Force-Regenerate Flag: Not yet implemented
No Image Generation: Placeholder for future
Single Project per Job: Can't mix projects in one batch

Next Steps

Story 2.4: HTML Formatting with Multiple Templates

Wrap generated content in full HTML documents
Apply CSS templates
Map templates to deployment targets
Add meta tags and SEO elements

Epic 3: Pre-Deployment & Interlinking

Generate final URLs
Inject interlinks (wheel structure)
Add home page links
Random existing article links

Technical Debt Added

Items added to technical-debt.md:

A/B test different prompt templates
Prompt optimization comparison tool
Parallel article generation
Job folder auto-processing
Cost tracking per generation
Model performance analytics

Files Created/Modified

New Files:

src/database/models.py - Added GeneratedContent model
src/database/interfaces.py - Added IGeneratedContentRepository
src/database/repositories.py - Added GeneratedContentRepository
src/generation/ai_client.py - OpenRouter AI client
src/generation/service.py - Content generation service
src/generation/validator.py - Stage validation
src/generation/augmenter.py - Content augmentation
src/generation/job_config.py - Job configuration schema
src/generation/batch_processor.py - Batch job processor
src/generation/prompts/title_generation.json
src/generation/prompts/outline_generation.json
src/generation/prompts/content_generation.json
src/generation/prompts/outline_augmentation.json
src/generation/prompts/content_augmentation.json
tests/unit/test_generation_service.py
tests/unit/test_augmenter.py
tests/unit/test_job_config.py
tests/integration/test_content_generation.py
jobs/example_tier1_batch.json
jobs/example_multi_tier_batch.json
jobs/example_custom_anchors.json
jobs/README.md
docs/stories/story-2.3-ai-content-generation.md

Modified Files:

src/cli/commands.py - Added generate-batch command
requirements.txt - Added beautifulsoup4
docs/technical-debt.md - Added new items

Manual Testing

Prerequisites:

Set AI_API_KEY in .env
Initialize database: python scripts/init_db.py reset
Create admin user: python scripts/create_first_admin.py
Ingest CORA file: python main.py ingest-cora --file <path> --name "Test" -u admin -p pass

Test Commands:

# Test single tier batch
python main.py generate-batch -j jobs/example_tier1_batch.json -u admin -p password

# Test multi-tier batch  
python main.py generate-batch -j jobs/example_multi_tier_batch.json -u admin -p password

# Test custom anchors
python main.py generate-batch -j jobs/example_custom_anchors.json -u admin -p password

Validation:

-- Check generated content
SELECT id, project_id, tier, status, generation_stage, 
       title_attempts, outline_attempts, content_attempts,
       validation_errors, validation_warnings
FROM generated_content;

-- Check active content
SELECT id, project_id, tier, is_active, word_count, augmented
FROM generated_content
WHERE is_active = 1;

Performance Notes

Title generation: ~2-5 seconds
Outline generation: ~5-10 seconds
Content generation: ~20-60 seconds
Total per article: ~30-75 seconds
Batch of 15 (Tier 1): ~10-20 minutes

Varies by model and complexity.

Completion Checklist

GeneratedContent database model
GeneratedContentRepository
AI client service
Prompt templates
ContentGenerationService (3-stage pipeline)
ContentAugmenter
Stage validation
Batch processor
Job configuration schema
CLI command
Example job files
Unit tests (30+ tests)
Integration tests
Documentation
Database initialization support

Notes

OpenRouter provides unified API for multiple models
JSON prompt format preferred by user for better consistency
Augmentation essential for CORA compliance
Batch processing architecture scales well
Version tracking enables rollback and comparison
Tier system balances quality vs cost

14 KiB Raw Blame History

Story 2.3: AI-Powered Content Generation - COMPLETED

Overview

Status

Story Details

Acceptance Criteria - ALL MET

1. Script Initiation for Projects

2. AI-Powered Generation Using SEO Data

3. Content Rule Engine Validation

4. Database Storage

5. Progress Logging

6. AI Service Error Handling

Implementation Details

Architecture Components

1. Database Models (src/database/models.py)

2. AI Client (src/generation/ai_client.py)

3. Job Configuration (src/generation/job_config.py)

4. Three-Stage Generation Pipeline (src/generation/service.py)

5. Stage Validation (src/generation/validator.py)

6. Content Augmentation (src/generation/augmenter.py)

7. Batch Processor (src/generation/batch_processor.py)

8. Prompt Templates (src/generation/prompts/)

CLI Command

Example Job Files

Test Coverage

Database Schema

Dependencies Added

Configuration

Design Decisions

Why Three Separate Stages?

Why Programmatic Augmentation?

Why Separate GeneratedContent Table?

Why Job Configuration Files?

Why Tier-Aware Validation?

Known Limitations

Next Steps

Technical Debt Added

Files Created/Modified

New Files:

Modified Files:

Manual Testing

Prerequisites:

Test Commands:

Validation:

Performance Notes

Completion Checklist

Notes

14 KiB

Raw Blame History

1. Database Models (`src/database/models.py`)

2. AI Client (`src/generation/ai_client.py`)

3. Job Configuration (`src/generation/job_config.py`)

4. Three-Stage Generation Pipeline (`src/generation/service.py`)

5. Stage Validation (`src/generation/validator.py`)

6. Content Augmentation (`src/generation/augmenter.py`)

7. Batch Processor (`src/generation/batch_processor.py`)

8. Prompt Templates (`src/generation/prompts/`)