Story 2.2: Simplified AI Content Generation - Detailed Task Breakdown
Overview
This document breaks down Story 2.2 into detailed tasks with specific implementation notes.
PHASE 1: Data Model & Schema Design
Task 1.1: Create GeneratedContent Database Model
File: src/database/models.py
Add new model class:
class GeneratedContent(Base):
    __tablename__ = "generated_content"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    project_id: Mapped[int] = mapped_column(Integer, ForeignKey('projects.id'), nullable=False, index=True)
    tier: Mapped[str] = mapped_column(String(20), nullable=False, index=True)
    keyword: Mapped[str] = mapped_column(String(255), nullable=False, index=True)
    title: Mapped[str] = mapped_column(Text, nullable=False)
    outline: Mapped[dict] = mapped_column(JSON, nullable=False)
    content: Mapped[str] = mapped_column(Text, nullable=False)
    word_count: Mapped[int] = mapped_column(Integer, nullable=False)
    status: Mapped[str] = mapped_column(String(20), nullable=False)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime,
        default=datetime.utcnow,
        onupdate=datetime.utcnow,
        nullable=False
    )
Status values: generated, augmented, failed
Update: scripts/init_db.py to create the table
Task 1.2: Create GeneratedContent Repository
File: src/database/repositories.py
Add repository class:
class GeneratedContentRepository(BaseRepository[GeneratedContent]):
    def __init__(self, session: Session):
        super().__init__(GeneratedContent, session)

    def get_by_project_id(self, project_id: int) -> list[GeneratedContent]:
        pass

    def get_by_project_and_tier(self, project_id: int, tier: str) -> list[GeneratedContent]:
        pass

    def get_by_keyword(self, keyword: str) -> list[GeneratedContent]:
        pass
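A minimal sketch of the query methods, assuming SQLAlchemy 2.0-style select()/Session.scalars() and a self.session attribute provided by BaseRepository; adjust to match the existing repository conventions:

from sqlalchemy import select

def get_by_project_id(self, project_id: int) -> list[GeneratedContent]:
    # All generated articles for a project, newest first
    stmt = (
        select(GeneratedContent)
        .where(GeneratedContent.project_id == project_id)
        .order_by(GeneratedContent.created_at.desc())
    )
    return list(self.session.scalars(stmt))

def get_by_project_and_tier(self, project_id: int, tier: str) -> list[GeneratedContent]:
    stmt = select(GeneratedContent).where(
        GeneratedContent.project_id == project_id,
        GeneratedContent.tier == tier,
    )
    return list(self.session.scalars(stmt))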
Task 1.3: Define Job File JSON Schema
File: jobs/README.md (create/update)
Job file structure (one project per job, multiple jobs per file):
{
  "jobs": [
    {
      "project_id": 1,
      "tiers": {
        "tier1": {
          "count": 5,
          "min_word_count": 2000,
          "max_word_count": 2500,
          "min_h2_tags": 3,
          "max_h2_tags": 5,
          "min_h3_tags": 5,
          "max_h3_tags": 10
        },
        "tier2": {
          "count": 10,
          "min_word_count": 1500,
          "max_word_count": 2000,
          "min_h2_tags": 2,
          "max_h2_tags": 4,
          "min_h3_tags": 3,
          "max_h3_tags": 8
        },
        "tier3": {
          "count": 15,
          "min_word_count": 1000,
          "max_word_count": 1500,
          "min_h2_tags": 2,
          "max_h2_tags": 3,
          "min_h3_tags": 2,
          "max_h3_tags": 6
        }
      }
    },
    {
      "project_id": 2,
      "tiers": {
        "tier1": { ... }
      }
    }
  ]
}
Tier defaults (constants if not specified in job file):
TIER_DEFAULTS = {
    "tier1": {
        "min_word_count": 2000,
        "max_word_count": 2500,
        "min_h2_tags": 3,
        "max_h2_tags": 5,
        "min_h3_tags": 5,
        "max_h3_tags": 10
    },
    "tier2": {
        "min_word_count": 1500,
        "max_word_count": 2000,
        "min_h2_tags": 2,
        "max_h2_tags": 4,
        "min_h3_tags": 3,
        "max_h3_tags": 8
    },
    "tier3": {
        "min_word_count": 1000,
        "max_word_count": 1500,
        "min_h2_tags": 2,
        "max_h2_tags": 3,
        "min_h3_tags": 2,
        "max_h3_tags": 6
    }
}
Future extensibility note: This structure allows adding more fields per job in future stories.
PHASE 2: AI Client & Prompt Management
Task 2.1: Implement AIClient for OpenRouter
File: src/generation/ai_client.py
OpenRouter API details:
- Base URL: https://openrouter.ai/api/v1
- Compatible with the OpenAI SDK (drop-in replacement)
- Requires the OPENROUTER_API_KEY environment variable
Initial model list:
AVAILABLE_MODELS = {
    "gpt-4o-mini": "openai/gpt-4o-mini",
    "claude-sonnet-4.5": "anthropic/claude-3.5-sonnet"
}
Implementation:
class AIClient:
    def __init__(self, api_key: str, model: str, base_url: str = "https://openrouter.ai/api/v1"):
        self.client = OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def generate_completion(
        self,
        prompt: str,
        system_message: str = None,
        max_tokens: int = 4000,
        temperature: float = 0.7,
        json_mode: bool = False
    ) -> str:
        """
        Generate completion from OpenRouter API
        json_mode: if True, adds response_format={"type": "json_object"}
        """
        pass
Error handling: Retry 3x with exponential backoff for network/rate limit errors
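A sketch of that retry policy, assuming the OpenAI v1 SDK exception classes (APIConnectionError, RateLimitError) when the client targets OpenRouter; the _call_with_retry helper name is illustrative:

import time
import openai

def _call_with_retry(self, **kwargs) -> str:
    # 3 retries with exponential backoff (1s, 2s, 4s); None marks the final attempt
    for delay in (1, 2, 4, None):
        try:
            response = self.client.chat.completions.create(model=self.model, **kwargs)
            return response.choices[0].message.content
        except openai.RateLimitError as exc:
            if delay is None:
                raise
            # Respect Retry-After when the provider sends it, otherwise back off exponentially
            retry_after = exc.response.headers.get("Retry-After")
            time.sleep(float(retry_after) if retry_after else delay)
        except openai.APIConnectionError:
            if delay is None:
                raise
            time.sleep(delay)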
Task 2.2: Create Prompt Templates
Files: src/generation/prompts/*.json
title_generation.json:
{
  "system_message": "You are an expert SEO content writer...",
  "user_prompt": "Generate an SEO-optimized title for an article about: {keyword}\n\nRelated entities: {entities}\n\nRelated searches: {related_searches}\n\nReturn only the title text, no formatting."
}
outline_generation.json:
{
  "system_message": "You are an expert content outliner...",
  "user_prompt": "Create an article outline for:\nTitle: {title}\nKeyword: {keyword}\n\nConstraints:\n- {min_h2} to {max_h2} H2 headings\n- {min_h3} to {max_h3} H3 subheadings total\n\nEntities: {entities}\nRelated searches: {related_searches}\n\nReturn as JSON: {\"outline\": [{\"h2\": \"...\", \"h3\": [\"...\", \"...\"]}]}"
}
content_generation.json:
{
  "system_message": "You are an expert content writer...",
  "user_prompt": "Write a complete article based on:\nTitle: {title}\nOutline: {outline}\nKeyword: {keyword}\n\nEntities to include: {entities}\nRelated searches: {related_searches}\n\nReturn as HTML fragment with <h2>, <h3>, <p> tags. Do NOT include <html>, <head>, or <body> tags."
}
content_augmentation.json:
{
  "system_message": "You are an expert content editor...",
  "user_prompt": "Please expand on the following article to add more detail and depth, ensuring you maintain the existing topical focus. Target word count: {target_word_count}\n\nCurrent article:\n{content}\n\nReturn the expanded article as an HTML fragment."
}
Task 2.3: Create PromptManager
File: src/generation/ai_client.py (add to same file)
class PromptManager:
    def __init__(self, prompts_dir: str = "src/generation/prompts"):
        self.prompts_dir = prompts_dir
        self.prompts = {}

    def load_prompt(self, prompt_name: str) -> dict:
        """Load prompt from JSON file"""
        pass

    def format_prompt(self, prompt_name: str, **kwargs) -> tuple[str, str]:
        """
        Format prompt with variables
        Returns: (system_message, user_prompt)
        """
        pass
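A minimal sketch of these two methods (one JSON file per prompt, cached after the first load). Note that if str.format() is used, the literal braces in the outline template's example JSON would need to be doubled ({{ }}):

import json
from pathlib import Path

def load_prompt(self, prompt_name: str) -> dict:
    # Cache prompts after the first read
    if prompt_name not in self.prompts:
        path = Path(self.prompts_dir) / f"{prompt_name}.json"
        with open(path, encoding="utf-8") as f:
            self.prompts[prompt_name] = json.load(f)
    return self.prompts[prompt_name]

def format_prompt(self, prompt_name: str, **kwargs) -> tuple[str, str]:
    prompt = self.load_prompt(prompt_name)
    return prompt["system_message"], prompt["user_prompt"].format(**kwargs)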
PHASE 3: Core Generation Pipeline
Task 3.1: Implement ContentGenerator Service
File: src/generation/service.py
class ContentGenerator:
    def __init__(
        self,
        ai_client: AIClient,
        prompt_manager: PromptManager,
        project_repo: ProjectRepository,
        content_repo: GeneratedContentRepository
    ):
        self.ai_client = ai_client
        self.prompt_manager = prompt_manager
        self.project_repo = project_repo
        self.content_repo = content_repo
Task 3.2: Implement Stage 1 - Title Generation
File: src/generation/service.py
def generate_title(self, project_id: int, debug: bool = False) -> str:
    """
    Generate SEO-optimized title
    Returns: title string
    Saves to debug_output/title_project_{id}_{timestamp}.txt if debug=True
    """
    # Fetch project
    # Load prompt
    # Call AI
    # If debug: save response to debug_output/
    # Return title
    pass
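A hedged sketch of this stage, assuming a project_repo.get_by_id() helper and the project data fields (main_keyword, entities, related_searches) used by the test fixture later in this document:

def generate_title(self, project_id: int, debug: bool = False) -> str:
    # Debug-file saving omitted here; see Task 5.2 for the naming convention
    project = self.project_repo.get_by_id(project_id)  # assumed BaseRepository helper
    system_message, user_prompt = self.prompt_manager.format_prompt(
        "title_generation",
        keyword=project.data["main_keyword"],
        entities=", ".join(project.data["entities"]),
        related_searches=", ".join(project.data["related_searches"]),
    )
    return self.ai_client.generate_completion(user_prompt, system_message=system_message).strip()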
Task 3.3: Implement Stage 2 - Outline Generation
File: src/generation/service.py
def generate_outline(
    self,
    project_id: int,
    title: str,
    min_h2: int,
    max_h2: int,
    min_h3: int,
    max_h3: int,
    debug: bool = False
) -> dict:
    """
    Generate article outline in JSON format
    Returns: {"outline": [{"h2": "...", "h3": ["...", "..."]}]}
    Uses json_mode=True in AI call to ensure JSON response
    Validates: at least min_h2 headings, at least min_h3 total subheadings
    Saves to debug_output/outline_project_{id}_{timestamp}.json if debug=True
    """
    pass
Validation:
- Parse JSON response
- Count h2 tags (must be >= min_h2)
- Count total h3 tags across all h2s (must be >= min_h3)
- Raise error if validation fails
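The checks listed above could look like the following sketch (the _validate_outline helper name is illustrative):

def _validate_outline(self, outline: dict, min_h2: int, min_h3: int) -> None:
    sections = outline.get("outline", [])
    h2_count = len(sections)
    h3_count = sum(len(section.get("h3", [])) for section in sections)
    if h2_count < min_h2:
        raise ValueError(f"Outline has {h2_count} H2 headings, expected at least {min_h2}")
    if h3_count < min_h3:
        raise ValueError(f"Outline has {h3_count} H3 subheadings total, expected at least {min_h3}")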
Task 3.4: Implement Stage 3 - Content Generation
File: src/generation/service.py
def generate_content(
    self,
    project_id: int,
    title: str,
    outline: dict,
    debug: bool = False
) -> str:
    """
    Generate full article HTML fragment
    Returns: HTML string with <h2>, <h3>, <p> tags
    Does NOT include <html>, <head>, or <body> tags
    Saves to debug_output/content_project_{id}_{timestamp}.html if debug=True
    """
    pass
HTML fragment format:
<h2>First Heading</h2>
<p>Paragraph content...</p>
<h3>Subheading</h3>
<p>More content...</p>
Task 3.5: Implement Word Count Validation
File: src/generation/service.py
def validate_word_count(self, content: str, min_words: int, max_words: int) -> tuple[bool, int]:
    """
    Validate content word count
    Returns: (is_valid, actual_count)
    - is_valid: True if min_words <= actual_count <= max_words
    - actual_count: number of words in content
    Implementation: Strip HTML tags, split on whitespace, count tokens
    """
    pass
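Assuming the count_words() helper shown under Critical Dev Notes, the body reduces to a couple of lines:

def validate_word_count(self, content: str, min_words: int, max_words: int) -> tuple[bool, int]:
    actual_count = count_words(content)  # see Word Count Method below
    return (min_words <= actual_count <= max_words, actual_count)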
Task 3.6: Implement Simple Augmentation
File: src/generation/service.py
def augment_content(
    self,
    content: str,
    target_word_count: int,
    debug: bool = False
) -> str:
    """
    Expand article content to meet minimum word count
    Called ONLY if word_count < min_word_count
    Makes ONE API call only
    Saves to debug_output/augmented_project_{id}_{timestamp}.html if debug=True
    """
    pass
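A sketch of the single augmentation pass, wired to the content_augmentation template and the AIClient interface above:

def augment_content(self, content: str, target_word_count: int, debug: bool = False) -> str:
    system_message, user_prompt = self.prompt_manager.format_prompt(
        "content_augmentation",
        target_word_count=target_word_count,
        content=content,
    )
    # One API call only; no loop, no second augmentation attempt
    return self.ai_client.generate_completion(
        user_prompt, system_message=system_message, max_tokens=4000
    )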
PHASE 4: Batch Processing
Task 4.1: Create JobConfig Parser
File: src/generation/job_config.py
from dataclasses import dataclass
from typing import Optional

TIER_DEFAULTS = {
    "tier1": {
        "min_word_count": 2000,
        "max_word_count": 2500,
        "min_h2_tags": 3,
        "max_h2_tags": 5,
        "min_h3_tags": 5,
        "max_h3_tags": 10
    },
    "tier2": {
        "min_word_count": 1500,
        "max_word_count": 2000,
        "min_h2_tags": 2,
        "max_h2_tags": 4,
        "min_h3_tags": 3,
        "max_h3_tags": 8
    },
    "tier3": {
        "min_word_count": 1000,
        "max_word_count": 1500,
        "min_h2_tags": 2,
        "max_h2_tags": 3,
        "min_h3_tags": 2,
        "max_h3_tags": 6
    }
}

@dataclass
class TierConfig:
    count: int
    min_word_count: int
    max_word_count: int
    min_h2_tags: int
    max_h2_tags: int
    min_h3_tags: int
    max_h3_tags: int

@dataclass
class Job:
    project_id: int
    tiers: dict[str, TierConfig]

class JobConfig:
    def __init__(self, job_file_path: str):
        """Load and parse job file, apply defaults"""
        pass

    def get_jobs(self) -> list[Job]:
        """Return list of all jobs in file"""
        pass

    def get_tier_config(self, job: Job, tier_name: str) -> Optional[TierConfig]:
        """Get tier config with defaults applied"""
        pass
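A sketch of the parsing-with-defaults step; _load_jobs is an illustrative internal helper that __init__ could call:

import json

def _load_jobs(self, job_file_path: str) -> list[Job]:
    with open(job_file_path, encoding="utf-8") as f:
        data = json.load(f)
    jobs = []
    for raw_job in data["jobs"]:
        tiers = {}
        for tier_name, overrides in raw_job["tiers"].items():
            # Job-file values win; anything missing falls back to TIER_DEFAULTS
            merged = {**TIER_DEFAULTS[tier_name], **overrides}
            tiers[tier_name] = TierConfig(**merged)
        jobs.append(Job(project_id=raw_job["project_id"], tiers=tiers))
    return jobs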
Task 4.2: Create BatchProcessor
File: src/generation/batch_processor.py
class BatchProcessor:
    def __init__(
        self,
        content_generator: ContentGenerator,
        content_repo: GeneratedContentRepository,
        project_repo: ProjectRepository
    ):
        pass

    def process_job(
        self,
        job_file_path: str,
        debug: bool = False,
        continue_on_error: bool = False
    ):
        """
        Process all jobs in job file
        For each job:
            For each tier:
                For count times:
                    1. Generate title (log to console)
                    2. Generate outline
                    3. Generate content
                    4. Validate word count
                    5. If below min, augment once
                    6. Save to GeneratedContent table
        Logs progress to console
        If debug=True, saves AI responses to debug_output/
        """
        pass
Console output format:
Processing Job 1/3: Project ID 5
  Tier 1: Generating 5 articles
    [1/5] Generating title... "Ultimate Guide to SEO in 2025"
    [1/5] Generating outline... 4 H2s, 8 H3s
    [1/5] Generating content... 1,845 words
    [1/5] Below minimum (2000), augmenting... 2,123 words
    [1/5] Saved (ID: 42, Status: augmented)
    [2/5] Generating title... "Advanced SEO Techniques"
    ...
  Tier 2: Generating 10 articles
  ...

Summary:
  Jobs processed: 3/3
  Articles generated: 45/45
  Augmented: 12
  Failed: 0
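A sketch of the per-article flow behind that log output; the helper name, the attribute names (taken from the __init__ parameters above), and the repository's create(**fields) call are illustrative:

def _generate_one_article(
    self, project_id: int, keyword: str, tier_name: str, cfg: TierConfig, debug: bool
) -> GeneratedContent:
    title = self.content_generator.generate_title(project_id, debug=debug)
    outline = self.content_generator.generate_outline(
        project_id, title,
        cfg.min_h2_tags, cfg.max_h2_tags, cfg.min_h3_tags, cfg.max_h3_tags,
        debug=debug,
    )
    content = self.content_generator.generate_content(project_id, title, outline, debug=debug)
    _, word_count = self.content_generator.validate_word_count(
        content, cfg.min_word_count, cfg.max_word_count
    )
    status = "generated"
    if word_count < cfg.min_word_count:
        # Augment exactly once, then recount
        content = self.content_generator.augment_content(content, cfg.min_word_count, debug=debug)
        _, word_count = self.content_generator.validate_word_count(
            content, cfg.min_word_count, cfg.max_word_count
        )
        status = "augmented"
    return self.content_repo.create(
        project_id=project_id, tier=tier_name, keyword=keyword, title=title,
        outline=outline, content=content, word_count=word_count, status=status,
    )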
Task 4.3: Error Handling & Retry Logic
File: src/generation/batch_processor.py
Error handling strategy:
- AI API errors: Log error, mark as status='failed', save to DB
- If continue_on_error=True: continue to next article
- If continue_on_error=False: stop batch processing
- Database errors: Always abort (data integrity)
- Invalid job file: Fail fast with validation error
Retry logic (in AIClient):
- Network errors: 3 retries with exponential backoff (1s, 2s, 4s)
- Rate limit errors: Respect Retry-After header
- Other errors: No retry, raise immediately
PHASE 5: CLI Integration
Task 5.1: Add generate-batch Command
File: src/cli/commands.py
@app.command("generate-batch")
@click.option('--job-file', '-j', required=True, type=click.Path(exists=True),
              help='Path to job JSON file')
@click.option('--username', '-u', help='Username for authentication')
@click.option('--password', '-p', help='Password for authentication')
@click.option('--debug', is_flag=True, help='Save AI responses to debug_output/')
@click.option('--continue-on-error', is_flag=True,
              help='Continue processing if article generation fails')
@click.option('--model', '-m', default='gpt-4o-mini',
              help='AI model to use (gpt-4o-mini, claude-sonnet-4.5)')
def generate_batch(
    job_file: str,
    username: Optional[str],
    password: Optional[str],
    debug: bool,
    continue_on_error: bool,
    model: str
):
    """Generate content batch from job file"""
    # Authenticate user
    # Initialize AIClient with OpenRouter
    # Initialize PromptManager, ContentGenerator, BatchProcessor
    # Call process_job()
    # Show summary
    pass
Task 5.2: Add Progress Logging & Debug Output
File: src/generation/batch_processor.py
Debug output (when --debug flag used):
- Create debug_output/ directory if it does not exist
- For each AI call, save the response to a file:
  - debug_output/title_project{id}_tier{tier}_{n}_{timestamp}.txt
  - debug_output/outline_project{id}_tier{tier}_{n}_{timestamp}.json
  - debug_output/content_project{id}_tier{tier}_{n}_{timestamp}.html
  - debug_output/augmented_project{id}_tier{tier}_{n}_{timestamp}.html
- Also echo to console with click.echo()

Normal output (without --debug):
- Always show title when generated: "Generated title: {title}"
- Show word counts and status
- Show progress counter [n/total]
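A sketch of a save helper following that convention (name and signature are illustrative):

from datetime import datetime
from pathlib import Path

import click

def save_debug_output(stage: str, project_id: int, tier: str, n: int, body: str, ext: str) -> Path:
    debug_dir = Path("debug_output")
    debug_dir.mkdir(exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = debug_dir / f"{stage}_project{project_id}_tier{tier}_{n}_{timestamp}.{ext}"
    path.write_text(body, encoding="utf-8")
    click.echo(f"Saved debug output: {path}")
    return path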
PHASE 6: Testing & Validation
Task 6.1: Create Unit Tests
tests/unit/test_ai_client.py
def test_generate_completion_success():
    """Test successful AI completion"""
    pass

def test_generate_completion_json_mode():
    """Test JSON mode returns valid JSON"""
    pass

def test_generate_completion_retry_on_network_error():
    """Test retry logic for network errors"""
    pass
tests/unit/test_content_generator.py
def test_generate_title():
    """Test title generation with mocked AI response"""
    pass

def test_generate_outline_valid_structure():
    """Test outline generation returns valid JSON with min h2/h3"""
    pass

def test_generate_content_html_fragment():
    """Test content is HTML fragment (no <html> tag)"""
    pass

def test_validate_word_count():
    """Test word count validation with various HTML inputs"""
    pass

def test_augment_content_called_once():
    """Test augmentation only called once"""
    pass
tests/unit/test_job_config.py
def test_load_job_config_valid():
    """Test loading valid job file"""
    pass

def test_tier_defaults_applied():
    """Test defaults applied when not in job file"""
    pass

def test_multiple_jobs_in_file():
    """Test parsing file with multiple jobs"""
    pass
tests/unit/test_batch_processor.py
def test_process_job_success():
    """Test successful batch processing"""
    pass

def test_process_job_with_augmentation():
    """Test articles below min word count are augmented"""
    pass

def test_process_job_continue_on_error():
    """Test continue_on_error flag behavior"""
    pass
Task 6.2: Create Integration Test
File: tests/integration/test_generate_batch.py
def test_generate_batch_end_to_end(test_db, mock_ai_client):
    """
    End-to-end test:
    1. Create test project in DB
    2. Create test job file
    3. Run batch processor
    4. Verify GeneratedContent records created
    5. Verify word counts within range
    6. Verify HTML structure
    """
    pass
Task 6.3: Create Example Job Files
jobs/example_tier1_batch.json
{
  "jobs": [
    {
      "project_id": 1,
      "tiers": {
        "tier1": {
          "count": 5
        }
      }
    }
  ]
}
(Uses all defaults for tier1)
jobs/example_multi_tier_batch.json
{
  "jobs": [
    {
      "project_id": 1,
      "tiers": {
        "tier1": {
          "count": 5,
          "min_word_count": 2200,
          "max_word_count": 2600
        },
        "tier2": {
          "count": 10
        },
        "tier3": {
          "count": 15,
          "max_h2_tags": 4
        }
      }
    },
    {
      "project_id": 2,
      "tiers": {
        "tier1": {
          "count": 3
        }
      }
    }
  ]
}
jobs/README.md
Document job file format and examples
PHASE 7: Cleanup & Deprecation
Task 7.1: Remove Old ContentRuleEngine
Action: Delete or gut src/generation/rule_engine.py
Only keep if it has reusable utilities. Otherwise remove entirely.
Task 7.2: Remove Old Validator Logic
Action: Review src/generation/validator.py (if exists)
Remove any strict CORA validation beyond word count. Keep only simple validation utilities.
Task 7.3: Update Documentation
Files to update:
- docs/stories/story-2.2.simplified-ai-content-generation.md - Update status from "In Progress" to "Done"
- docs/architecture/workflows.md - Document simplified generation flow
- docs/architecture/components.md - Update generation component description
Implementation Order Recommendation
- Phase 1 (Data Layer) - Required foundation
- Phase 2 (AI Client) - Required for generation
- Phase 3 (Core Logic) - Implement one stage at a time, test each
- Phase 4 (Batch Processing) - Orchestrate stages
- Phase 5 (CLI) - Make accessible to users
- Phase 6 (Testing) - Can be done in parallel with implementation
- Phase 7 (Cleanup) - Final polish
Estimated effort:
- Phase 1-2: 4-6 hours
- Phase 3: 6-8 hours
- Phase 4: 3-4 hours
- Phase 5: 2-3 hours
- Phase 6: 4-6 hours
- Phase 7: 1-2 hours
- Total: 20-29 hours
Critical Dev Notes
OpenRouter Specifics
- API key from environment: OPENROUTER_API_KEY
- Model format: "provider/model-name"
- Supports OpenAI SDK drop-in replacement
- Rate limits vary by model (check OpenRouter docs)
HTML Fragment Format
Content generation returns HTML like:
<h2>Main Topic</h2>
<p>Introduction paragraph with relevant keywords and entities.</p>
<h3>Subtopic One</h3>
<p>Detailed content about subtopic.</p>
<h3>Subtopic Two</h3>
<p>More detailed content.</p>
<h2>Second Main Topic</h2>
<p>Content continues...</p>
No document structure: No <!DOCTYPE>, <html>, <head>, or <body> tags.
Word Count Method
import re
from html import unescape

def count_words(html_content: str) -> int:
    # Strip HTML tags
    text = re.sub(r'<[^>]+>', '', html_content)
    # Unescape HTML entities
    text = unescape(text)
    # Split and count
    words = text.split()
    return len(words)
Debug Output Directory
- Create debug_output/ at project root if it does not exist
- Add it to .gitignore
- Filename format: {stage}_project{id}_tier{tier}_article{n}_{timestamp}.{ext}
- Example: title_project5_tier1_article3_20251020_143022.txt
Tier Constants Location
Define in src/generation/job_config.py as module-level constant for easy reference.
Future Extensibility
Job file structure designed to support:
- Custom interlinking rules (Story 2.4+)
- Template selection (Story 3.x)
- Deployment targets (Story 4.x)
- SEO metadata overrides
Keep job parsing flexible to add new fields without breaking existing jobs.
Testing Strategy
Unit Test Mocking
Mock AIClient.generate_completion() to return realistic HTML:
@pytest.fixture
def mock_title_response():
    return "The Ultimate Guide to Sustainable Gardening in 2025"

@pytest.fixture
def mock_outline_response():
    return {
        "outline": [
            {"h2": "Getting Started", "h3": ["Tools", "Planning"]},
            {"h2": "Best Practices", "h3": ["Watering", "Composting"]}
        ]
    }

@pytest.fixture
def mock_content_response():
    return """<h2>Getting Started</h2>
<p>Sustainable gardening begins with proper planning...</p>
<h3>Tools</h3>
<p>Essential tools include...</p>"""
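A sketch of how a unit test might wire these fixtures to a mocked AIClient; the exact ContentGenerator wiring depends on the final constructor and repository APIs:

from types import SimpleNamespace
from unittest.mock import MagicMock

def test_generate_title(mock_title_response):
    ai_client = MagicMock(spec=AIClient)
    ai_client.generate_completion.return_value = mock_title_response

    project_repo = MagicMock()
    project_repo.get_by_id.return_value = SimpleNamespace(
        id=1,
        data={
            "main_keyword": "sustainable gardening",
            "entities": ["composting"],
            "related_searches": ["organic gardening tips"],
        },
    )

    generator = ContentGenerator(ai_client, PromptManager(), project_repo, MagicMock())
    assert generator.generate_title(project_id=1) == mock_title_response
    ai_client.generate_completion.assert_called_once()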
Integration Test Database
Use conftest.py fixture with in-memory SQLite and test data:
@pytest.fixture
def test_project(test_db):
    project_repo = ProjectRepository(test_db)
    return project_repo.create(
        user_id=1,
        name="Test Project",
        data={
            "main_keyword": "sustainable gardening",
            "entities": ["composting", "organic soil"],
            "related_searches": ["how to compost", "organic gardening tips"]
        }
    )
Success Criteria
Story is complete when:
- All database models and repositories implemented
- AIClient successfully calls OpenRouter API
- Three-stage generation pipeline works end-to-end
- Batch processor handles multiple jobs/tiers
- CLI command generate-batch functional
- Debug output saves to debug_output/ when --debug used
- All unit tests pass
- Integration test demonstrates full workflow
- Example job files work correctly
- Documentation updated
Acceptance: Run generate-batch on real project, verify content saved to database with correct word count and structure.