14 KiB
Story 2.3: AI-Powered Content Generation - COMPLETED
Overview
Implemented a comprehensive AI-powered content generation system with three-stage pipeline (title → outline → content), validation at each stage, programmatic augmentation for CORA compliance, and batch job processing across multiple tiers.
Status
COMPLETED
Story Details
As a User, I want to execute a job for a project that uses AI to generate a title, an outline, and full-text content, so that the core content is created automatically.
Acceptance Criteria - ALL MET
1. Script Initiation for Projects
Status: COMPLETE
- CLI command:
generate-batch --job-file <path> - Supports batch processing across multiple tiers
- Job configuration via JSON files
- Progress tracking and error reporting
2. AI-Powered Generation Using SEO Data
Status: COMPLETE
- Title generation with keyword validation
- Outline generation meeting CORA H2/H3 targets
- Full HTML content generation
- Uses project's SEO data (keywords, entities, related searches)
- Multiple AI models supported via OpenRouter
3. Content Rule Engine Validation
Status: COMPLETE
- Validates at each stage (title, outline, content)
- Uses ContentRuleEngine from Story 2.2
- Tier-aware validation (strict for Tier 1)
- Detailed error reporting
4. Database Storage
Status: COMPLETE
- Title, outline, and content stored in GeneratedContent table
- Version tracking and metadata
- Tracks attempts, models used, validation results
- Augmentation logs
5. Progress Logging
Status: COMPLETE
- Real-time progress updates via CLI
- Logs: "Generating title...", "Generating content...", etc.
- Tracks successful, failed, and skipped articles
- Detailed summary reports
6. AI Service Error Handling
Status: COMPLETE
- Graceful handling of API errors
- Retry logic with configurable attempts
- Fallback to programmatic augmentation
- Continue or stop on failures (configurable)
Implementation Details
Architecture Components
1. Database Models (src/database/models.py)
GeneratedContent Model:
class GeneratedContent(Base):
id, project_id, tier
title, outline, content
status, is_active
generation_stage
title_attempts, outline_attempts, content_attempts
title_model, outline_model, content_model
validation_errors, validation_warnings
validation_report (JSON)
word_count, augmented
augmentation_log (JSON)
generation_duration
error_message
created_at, updated_at
2. AI Client (src/generation/ai_client.py)
Features:
- OpenRouter API integration
- Multiple model support
- JSON-formatted responses
- Error handling and retries
- Model validation
Available Models:
- Claude 3.5 Sonnet (default)
- Claude 3 Haiku
- GPT-4o / GPT-4o-mini
- Llama 3.1 70B/8B
- Gemini Pro 1.5
3. Job Configuration (src/generation/job_config.py)
Job Structure:
{
"job_name": "Batch Name",
"project_id": 1,
"tiers": [
{
"tier": 1,
"article_count": 15,
"models": {
"title": "model-id",
"outline": "model-id",
"content": "model-id"
},
"anchor_text_config": {
"mode": "default|override|append"
},
"validation_attempts": 3
}
],
"failure_config": {
"max_consecutive_failures": 5,
"skip_on_failure": true
}
}
4. Three-Stage Generation Pipeline (src/generation/service.py)
Stage 1: Title Generation
- Uses title_generation.json prompt
- Validates keyword presence and length
- Retries on validation failure
- Max attempts configurable
Stage 2: Outline Generation
- Uses outline_generation.json prompt
- Returns JSON structure with H1, H2s, H3s
- Validates CORA targets (H2/H3 counts, keyword distribution)
- AI retry → Programmatic augmentation if needed
- Ensures FAQ section present
Stage 3: Content Generation
- Uses content_generation.json prompt
- Follows validated outline structure
- Generates full HTML (no CSS, just semantic markup)
- Validates against all CORA rules
- AI retry → Augmentation if needed
5. Stage Validation (src/generation/validator.py)
Title Validation:
- Length (30-100 chars)
- Keyword presence
- Non-empty
Outline Validation:
- H1 contains keyword
- H2/H3 counts meet targets
- Keyword distribution in headings
- Entity and related search incorporation
- FAQ section present
- Tier-aware strictness
Content Validation:
- Full CORA rule validation
- Word count (min/max)
- Keyword frequency
- Heading structure
- FAQ format
- Image alt text (when applicable)
6. Content Augmentation (src/generation/augmenter.py)
Outline Augmentation:
- Add missing H2s with keywords
- Add H3s with entities
- Modify existing headings
- Maintain logical flow
Content Augmentation:
- Strategy 1: Ask AI to add paragraphs (small deficits)
- Strategy 2: Programmatically insert terms (large deficits)
- Insert keywords into random sentences
- Capitalize if sentence-initial
- Add complete paragraphs with missing elements
7. Batch Processor (src/generation/batch_processor.py)
Features:
- Process multiple tiers sequentially
- Track progress per tier
- Handle failures (skip or stop)
- Consecutive failure threshold
- Real-time progress callbacks
- Detailed result reporting
8. Prompt Templates (src/generation/prompts/)
Files:
title_generation.json- Title promptsoutline_generation.json- Outline structure promptscontent_generation.json- Full content promptsoutline_augmentation.json- Outline fix promptscontent_augmentation.json- Content enhancement prompts
Format:
{
"system": "System message",
"user_template": "Prompt with {placeholders}",
"validation": {
"output_format": "text|json|html",
"requirements": []
}
}
CLI Command
python main.py generate-batch \
--job-file jobs/example_tier1_batch.json \
--username admin \
--password password
Options:
--job-file, -j: Path to job configuration JSON (required)--force-regenerate, -f: Force regeneration (flag, not implemented)--username, -u: Authentication username--password, -p: Authentication password
Example Output:
Authenticated as: admin (Admin)
Loading Job: Tier 1 Launch Batch
Project ID: 1
Total Articles: 15
Tiers:
Tier 1: 15 articles
Models: gpt-4o-mini / claude-3.5-sonnet / claude-3.5-sonnet
Proceed with generation? [y/N]: y
Starting batch generation...
--------------------------------------------------------------------------------
[Tier 1] Article 1/15: Generating...
[Tier 1] Article 1/15: Completed (ID: 1)
[Tier 1] Article 2/15: Generating...
...
--------------------------------------------------------------------------------
Batch Generation Complete!
Job: Tier 1 Launch Batch
Project ID: 1
Duration: 1234.56s
Results:
Total Articles: 15
Successful: 14
Failed: 0
Skipped: 1
By Tier:
Tier 1:
Successful: 14
Failed: 0
Skipped: 1
Example Job Files
Located in jobs/ directory:
example_tier1_batch.json- 15 tier 1 articlesexample_multi_tier_batch.json- 165 articles across 3 tiersexample_custom_anchors.json- Custom anchor text demoREADME.md- Job configuration guide
Test Coverage
Unit Tests (30+ tests):
test_generation_service.py- Pipeline stagestest_augmenter.py- Content augmentationtest_job_config.py- Job configuration validation
Integration Tests:
test_content_generation.py- Full pipeline with mocked AI- Repository CRUD operations
- Service initialization
- Job validation
Database Schema
New Table: generated_content
CREATE TABLE generated_content (
id INTEGER PRIMARY KEY,
project_id INTEGER REFERENCES projects(id),
tier INTEGER,
title TEXT,
outline TEXT,
content TEXT,
status VARCHAR(20) DEFAULT 'pending',
is_active BOOLEAN DEFAULT 0,
generation_stage VARCHAR(20) DEFAULT 'title',
title_attempts INTEGER DEFAULT 0,
outline_attempts INTEGER DEFAULT 0,
content_attempts INTEGER DEFAULT 0,
title_model VARCHAR(100),
outline_model VARCHAR(100),
content_model VARCHAR(100),
validation_errors INTEGER DEFAULT 0,
validation_warnings INTEGER DEFAULT 0,
validation_report JSON,
word_count INTEGER,
augmented BOOLEAN DEFAULT 0,
augmentation_log JSON,
generation_duration FLOAT,
error_message TEXT,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
CREATE INDEX idx_generated_content_project_id ON generated_content(project_id);
CREATE INDEX idx_generated_content_tier ON generated_content(tier);
CREATE INDEX idx_generated_content_status ON generated_content(status);
Dependencies Added
beautifulsoup4==4.12.2- HTML parsing for augmentation
All other dependencies already present (OpenAI SDK for OpenRouter).
Configuration
Environment Variables:
AI_API_KEY=sk-or-v1-your-openrouter-key
AI_API_BASE_URL=https://openrouter.ai/api/v1 # Optional
AI_MODEL=anthropic/claude-3.5-sonnet # Optional
master.config.json: Already configured in Story 2.2 with:
ai_servicesectioncontent_rulesfor validation- Available models list
Design Decisions
Why Three Separate Stages?
- Title First: Validates keyword usage early, informs outline
- Outline Next: Ensures structure before expensive content generation
- Content Last: Follows validated structure, reduces failures
Better success rate than single-prompt approach.
Why Programmatic Augmentation?
- AI is unreliable at precise keyword placement
- Validation failures are common with strict CORA targets
- Hybrid approach: AI for quality, programmatic for precision
- Saves API costs (no endless retries)
Why Separate GeneratedContent Table?
- Version history preserved
- Can rollback to previous generation
- Track attempts and augmentation
- Rich metadata for debugging
- A/B testing capability
Why Job Configuration Files?
- Reusable batch configurations
- Version control job definitions
- Easy to share and modify
- Future: Auto-process job folder
- Clear audit trail
Why Tier-Aware Validation?
- Tier 1: Strictest (all CORA targets mandatory)
- Tier 2+: Warnings only (more lenient)
- Matches real-world content quality needs
- Saves costs on bulk tier 2+ content
Known Limitations
- No Interlinking Yet: Links added in Epic 3 (Story 3.3)
- No CSS/Templates: Added in Story 2.4
- Sequential Processing: No parallel generation (future enhancement)
- Force-Regenerate Flag: Not yet implemented
- No Image Generation: Placeholder for future
- Single Project per Job: Can't mix projects in one batch
Next Steps
Story 2.4: HTML Formatting with Multiple Templates
- Wrap generated content in full HTML documents
- Apply CSS templates
- Map templates to deployment targets
- Add meta tags and SEO elements
Epic 3: Pre-Deployment & Interlinking
- Generate final URLs
- Inject interlinks (wheel structure)
- Add home page links
- Random existing article links
Technical Debt Added
Items added to technical-debt.md:
- A/B test different prompt templates
- Prompt optimization comparison tool
- Parallel article generation
- Job folder auto-processing
- Cost tracking per generation
- Model performance analytics
Files Created/Modified
New Files:
src/database/models.py- Added GeneratedContent modelsrc/database/interfaces.py- Added IGeneratedContentRepositorysrc/database/repositories.py- Added GeneratedContentRepositorysrc/generation/ai_client.py- OpenRouter AI clientsrc/generation/service.py- Content generation servicesrc/generation/validator.py- Stage validationsrc/generation/augmenter.py- Content augmentationsrc/generation/job_config.py- Job configuration schemasrc/generation/batch_processor.py- Batch job processorsrc/generation/prompts/title_generation.jsonsrc/generation/prompts/outline_generation.jsonsrc/generation/prompts/content_generation.jsonsrc/generation/prompts/outline_augmentation.jsonsrc/generation/prompts/content_augmentation.jsontests/unit/test_generation_service.pytests/unit/test_augmenter.pytests/unit/test_job_config.pytests/integration/test_content_generation.pyjobs/example_tier1_batch.jsonjobs/example_multi_tier_batch.jsonjobs/example_custom_anchors.jsonjobs/README.mddocs/stories/story-2.3-ai-content-generation.md
Modified Files:
src/cli/commands.py- Added generate-batch commandrequirements.txt- Added beautifulsoup4docs/technical-debt.md- Added new items
Manual Testing
Prerequisites:
- Set AI_API_KEY in
.env - Initialize database:
python scripts/init_db.py reset - Create admin user:
python scripts/create_first_admin.py - Ingest CORA file:
python main.py ingest-cora --file <path> --name "Test" -u admin -p pass
Test Commands:
# Test single tier batch
python main.py generate-batch -j jobs/example_tier1_batch.json -u admin -p password
# Test multi-tier batch
python main.py generate-batch -j jobs/example_multi_tier_batch.json -u admin -p password
# Test custom anchors
python main.py generate-batch -j jobs/example_custom_anchors.json -u admin -p password
Validation:
-- Check generated content
SELECT id, project_id, tier, status, generation_stage,
title_attempts, outline_attempts, content_attempts,
validation_errors, validation_warnings
FROM generated_content;
-- Check active content
SELECT id, project_id, tier, is_active, word_count, augmented
FROM generated_content
WHERE is_active = 1;
Performance Notes
- Title generation: ~2-5 seconds
- Outline generation: ~5-10 seconds
- Content generation: ~20-60 seconds
- Total per article: ~30-75 seconds
- Batch of 15 (Tier 1): ~10-20 minutes
Varies by model and complexity.
Completion Checklist
- GeneratedContent database model
- GeneratedContentRepository
- AI client service
- Prompt templates
- ContentGenerationService (3-stage pipeline)
- ContentAugmenter
- Stage validation
- Batch processor
- Job configuration schema
- CLI command
- Example job files
- Unit tests (30+ tests)
- Integration tests
- Documentation
- Database initialization support
Notes
- OpenRouter provides unified API for multiple models
- JSON prompt format preferred by user for better consistency
- Augmentation essential for CORA compliance
- Batch processing architecture scales well
- Version tracking enables rollback and comparison
- Tier system balances quality vs cost