# Story 2.2: Simplified AI Content Generation - Detailed Task Breakdown

## Overview
This document breaks down Story 2.2 into detailed tasks with specific implementation notes.

---

## **PHASE 1: Data Model & Schema Design**

### Task 1.1: Create GeneratedContent Database Model
**File**: `src/database/models.py`

**Add new model class:**
```python
class GeneratedContent(Base):
    __tablename__ = "generated_content"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    project_id: Mapped[int] = mapped_column(Integer, ForeignKey('projects.id'), nullable=False, index=True)
    tier: Mapped[str] = mapped_column(String(20), nullable=False, index=True)
    keyword: Mapped[str] = mapped_column(String(255), nullable=False, index=True)
    title: Mapped[str] = mapped_column(Text, nullable=False)
    outline: Mapped[dict] = mapped_column(JSON, nullable=False)
    content: Mapped[str] = mapped_column(Text, nullable=False)
    word_count: Mapped[int] = mapped_column(Integer, nullable=False)
    status: Mapped[str] = mapped_column(String(20), nullable=False)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime,
        default=datetime.utcnow,
        onupdate=datetime.utcnow,
        nullable=False
    )
```

**Status values**: `generated`, `augmented`, `failed`

**Update**: `scripts/init_db.py` to create the table

---

### Task 1.2: Create GeneratedContent Repository
**File**: `src/database/repositories.py`

**Add repository class:**
```python
class GeneratedContentRepository(BaseRepository[GeneratedContent]):
    def __init__(self, session: Session):
        super().__init__(GeneratedContent, session)

    def get_by_project_id(self, project_id: int) -> list[GeneratedContent]:
        pass

    def get_by_project_and_tier(self, project_id: int, tier: str) -> list[GeneratedContent]:
        pass

    def get_by_keyword(self, keyword: str) -> list[GeneratedContent]:
        pass
```

---

### Task 1.3: Define Job File JSON Schema
**File**: `jobs/README.md` (create/update)

**Job file structure** (one project per job, multiple jobs per file):
```json
{
  "jobs": [
    {
      "project_id": 1,
      "tiers": {
        "tier1": {
          "count": 5,
          "min_word_count": 2000,
          "max_word_count": 2500,
          "min_h2_tags": 3,
          "max_h2_tags": 5,
          "min_h3_tags": 5,
          "max_h3_tags": 10
        },
        "tier2": {
          "count": 10,
          "min_word_count": 1500,
          "max_word_count": 2000,
          "min_h2_tags": 2,
          "max_h2_tags": 4,
          "min_h3_tags": 3,
          "max_h3_tags": 8
        },
        "tier3": {
          "count": 15,
          "min_word_count": 1000,
          "max_word_count": 1500,
          "min_h2_tags": 2,
          "max_h2_tags": 3,
          "min_h3_tags": 2,
          "max_h3_tags": 6
        }
      }
    },
    {
      "project_id": 2,
      "tiers": {
        "tier1": { ... }
      }
    }
  ]
}
```

**Tier defaults** (constants applied when a field is not specified in the job file):
```python
TIER_DEFAULTS = {
    "tier1": {
        "min_word_count": 2000,
        "max_word_count": 2500,
        "min_h2_tags": 3,
        "max_h2_tags": 5,
        "min_h3_tags": 5,
        "max_h3_tags": 10
    },
    "tier2": {
        "min_word_count": 1500,
        "max_word_count": 2000,
        "min_h2_tags": 2,
        "max_h2_tags": 4,
        "min_h3_tags": 3,
        "max_h3_tags": 8
    },
    "tier3": {
        "min_word_count": 1000,
        "max_word_count": 1500,
        "min_h2_tags": 2,
        "max_h2_tags": 3,
        "min_h3_tags": 2,
        "max_h3_tags": 6
    }
}
```

**Future extensibility note**: This structure allows adding more fields per job in future stories.

---

## **PHASE 2: AI Client & Prompt Management**

### Task 2.1: Implement AIClient for OpenRouter
**File**: `src/generation/ai_client.py`

**OpenRouter API details**:
- Base URL: `https://openrouter.ai/api/v1`
- Compatible with the OpenAI SDK
- Requires the `OPENROUTER_API_KEY` environment variable

**Initial model list**:
```python
AVAILABLE_MODELS = {
    "gpt-4o-mini": "openai/gpt-4o-mini",
    "claude-sonnet-4.5": "anthropic/claude-sonnet-4.5",
    # Many others are available; check the OpenRouter model list for more.
}
```

**Implementation**:
```python
from typing import Optional

from openai import OpenAI


class AIClient:
    def __init__(self, api_key: str, model: str, base_url: str = "https://openrouter.ai/api/v1"):
        self.client = OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def generate_completion(
        self,
        prompt: str,
        system_message: Optional[str] = None,
        max_tokens: int = 4000,
        temperature: float = 0.7,
        json_mode: bool = False
    ) -> str:
        """
        Generate completion from OpenRouter API
        json_mode: if True, adds response_format={"type": "json_object"}
        """
        pass
```

**Error handling**: Retry up to 3 times with exponential backoff on network and rate-limit errors

---

### Task 2.2: Create Prompt Templates
**Files**: `src/generation/prompts/*.json`

**title_generation.json**:
```json
{
  "system_message": "You are an expert SEO content writer...",
  "user_prompt": "Generate an SEO-optimized title for an article about: {keyword}\n\nRelated entities: {entities}\n\nRelated searches: {related_searches}\n\nReturn only the title text, no formatting."
}
```

**outline_generation.json**:
```json
{
  "system_message": "You are an expert content outliner...",
  "user_prompt": "Create an article outline for:\nTitle: {title}\nKeyword: {keyword}\n\nConstraints:\n- {min_h2} to {max_h2} H2 headings\n- {min_h3} to {max_h3} H3 subheadings total\n\nEntities: {entities}\nRelated searches: {related_searches}\n\nReturn as JSON: {\"outline\": [{\"h2\": \"...\", \"h3\": [\"...\", \"...\"]}]}"
}
```

**content_generation.json**:
```json
{
  "system_message": "You are an expert content writer...",
  "user_prompt": "Write a complete article based on:\nTitle: {title}\nOutline: {outline}\nKeyword: {keyword}\n\nEntities to include: {entities}\nRelated searches: {related_searches}\n\nReturn as HTML fragment with <h2>, <h3>, <p> tags. Do NOT include <html>, <head>, or <body> tags."
}
```

**content_augmentation.json**:
```json
{
  "system_message": "You are an expert content editor...",
  "user_prompt": "Please expand on the following article to add more detail and depth, ensuring you maintain the existing topical focus. Target word count: {target_word_count}\n\nCurrent article:\n{content}\n\nReturn the expanded article as an HTML fragment."
}
```

---

### Task 2.3: Create PromptManager
**File**: `src/generation/ai_client.py` (add to same file)

```python
class PromptManager:
    def __init__(self, prompts_dir: str = "src/generation/prompts"):
        self.prompts_dir = prompts_dir
        self.prompts = {}

    def load_prompt(self, prompt_name: str) -> dict:
        """Load prompt from JSON file"""
        pass

    def format_prompt(self, prompt_name: str, **kwargs) -> tuple[str, str]:
        """
        Format prompt with variables
        Returns: (system_message, user_prompt)
        """
        pass
```

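
One detail worth handling when implementing `format_prompt`: the outline template in Task 2.2 contains literal JSON braces (`{"outline": ...}`), so plain `str.format()` would treat them as replacement fields and fail. The sketch below is one possible approach, not the required implementation: it substitutes only simple `{identifier}` placeholders with a regex. The helper names `fill_template` and `load_and_format` are hypothetical and chosen for illustration only.

```python
import json
import re
from pathlib import Path

# Matches simple {identifier} placeholders only, so literal JSON braces
# such as {"outline": [...]} in a template are left untouched.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def fill_template(template: str, **values: object) -> str:
    """Substitute {name} placeholders; raise KeyError if a value is missing."""

    def _replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing prompt variable: {name}")
        return str(values[name])

    return _PLACEHOLDER.sub(_replace, template)


def load_and_format(prompts_dir: str, prompt_name: str, **values: object) -> tuple[str, str]:
    """Read <prompts_dir>/<prompt_name>.json and return (system_message, user_prompt)."""
    path = Path(prompts_dir) / f"{prompt_name}.json"
    data = json.loads(path.read_text(encoding="utf-8"))
    return data["system_message"], fill_template(data["user_prompt"], **values)
```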
---

## **PHASE 3: Core Generation Pipeline**

### Task 3.1: Implement ContentGenerator Service
**File**: `src/generation/service.py`

```python
class ContentGenerator:
    def __init__(
        self,
        ai_client: AIClient,
        prompt_manager: PromptManager,
        project_repo: ProjectRepository,
        content_repo: GeneratedContentRepository
    ):
        self.ai_client = ai_client
        self.prompt_manager = prompt_manager
        self.project_repo = project_repo
        self.content_repo = content_repo
```

---

### Task 3.2: Implement Stage 1 - Title Generation
**File**: `src/generation/service.py`

```python
def generate_title(self, project_id: int, debug: bool = False) -> str:
    """
    Generate SEO-optimized title

    Returns: title string
    Saves to debug_output/title_project_{id}_{timestamp}.txt if debug=True
    """
    # Fetch project
    # Load prompt
    # Call AI
    # If debug: save response to debug_output/
    # Return title
    pass
```

---

### Task 3.3: Implement Stage 2 - Outline Generation
**File**: `src/generation/service.py`

```python
def generate_outline(
    self,
    project_id: int,
    title: str,
    min_h2: int,
    max_h2: int,
    min_h3: int,
    max_h3: int,
    debug: bool = False
) -> dict:
    """
    Generate article outline in JSON format

    Returns: {"outline": [{"h2": "...", "h3": ["...", "..."]}]}

    Uses json_mode=True in AI call to ensure JSON response
    Validates: at least min_h2 headings, at least min_h3 total subheadings
    Saves to debug_output/outline_project_{id}_{timestamp}.json if debug=True
    """
    pass
```

**Validation**:
- Parse JSON response
- Count h2 tags (must be >= min_h2)
- Count total h3 tags across all h2s (must be >= min_h3)
- Raise error if validation fails

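
The validation steps above could be sketched roughly as follows. This is a minimal illustration under assumptions: the function name `validate_outline` and the choice of `ValueError` are hypothetical, since the plan only says to "raise error".

```python
import json


def validate_outline(raw_response: str, min_h2: int, min_h3: int) -> dict:
    """Parse the model's JSON reply and enforce the minimum heading counts."""
    try:
        parsed = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"outline is not valid JSON: {exc}") from exc

    # Expect the shape {"outline": [{"h2": "...", "h3": ["...", ...]}, ...]}
    sections = parsed.get("outline") if isinstance(parsed, dict) else None
    if not isinstance(sections, list) or not all(isinstance(s, dict) for s in sections):
        raise ValueError('expected an object with an "outline" list of sections')

    h2_count = len(sections)
    # H3 headings are counted in total across all H2 sections.
    h3_count = sum(len(section.get("h3", [])) for section in sections)

    if h2_count < min_h2:
        raise ValueError(f"outline has {h2_count} H2 headings, need at least {min_h2}")
    if h3_count < min_h3:
        raise ValueError(f"outline has {h3_count} H3 headings, need at least {min_h3}")
    return parsed
```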
---

### Task 3.4: Implement Stage 3 - Content Generation
**File**: `src/generation/service.py`

```python
def generate_content(
    self,
    project_id: int,
    title: str,
    outline: dict,
    debug: bool = False
) -> str:
    """
    Generate full article HTML fragment

    Returns: HTML string with <h2>, <h3>, <p> tags
    Does NOT include <html>, <head>, or <body> tags

    Saves to debug_output/content_project_{id}_{timestamp}.html if debug=True
    """
    pass
```

**HTML fragment format**:
```html
<h2>First Heading</h2>
<p>Paragraph content...</p>
<h3>Subheading</h3>
<p>More content...</p>
```

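
A lightweight way to check the "fragment only" rule, for example inside `test_generate_content_html_fragment`, might look like the sketch below. The helper name `is_html_fragment` is hypothetical, and a regex check is only a heuristic rather than a full HTML parse.

```python
import re

# Document-level tags that must not appear in a fragment. The \b keeps
# tags such as <header> from matching the "head" alternative.
_DOCUMENT_TAG = re.compile(r"<\s*/?\s*(?:!doctype|html|head|body)\b", re.IGNORECASE)


def is_html_fragment(content: str) -> bool:
    """Return True when no document-level tag appears in the content."""
    return _DOCUMENT_TAG.search(content) is None
```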
---

### Task 3.5: Implement Word Count Validation
**File**: `src/generation/service.py`

```python
def validate_word_count(self, content: str, min_words: int, max_words: int) -> tuple[bool, int]:
    """
    Validate content word count

    Returns: (is_valid, actual_count)
    - is_valid: True if min_words <= actual_count <= max_words
    - actual_count: number of words in content

    Implementation: Strip HTML tags, split on whitespace, count tokens
    """
    pass
```

---

### Task 3.6: Implement Simple Augmentation
**File**: `src/generation/service.py`

```python
def augment_content(
    self,
    content: str,
    target_word_count: int,
    debug: bool = False
) -> str:
    """
    Expand article content to meet minimum word count

    Called ONLY if word_count < min_word_count
    Makes ONE API call only

    Saves to debug_output/augmented_project_{id}_{timestamp}.html if debug=True
    """
    pass
```

---

## **PHASE 4: Batch Processing**

### Task 4.1: Create JobConfig Parser
**File**: `src/generation/job_config.py`

```python
from dataclasses import dataclass
from typing import Optional

TIER_DEFAULTS = {
    "tier1": {
        "min_word_count": 2000,
        "max_word_count": 2500,
        "min_h2_tags": 3,
        "max_h2_tags": 5,
        "min_h3_tags": 5,
        "max_h3_tags": 10
    },
    "tier2": {
        "min_word_count": 1500,
        "max_word_count": 2000,
        "min_h2_tags": 2,
        "max_h2_tags": 4,
        "min_h3_tags": 3,
        "max_h3_tags": 8
    },
    "tier3": {
        "min_word_count": 1000,
        "max_word_count": 1500,
        "min_h2_tags": 2,
        "max_h2_tags": 3,
        "min_h3_tags": 2,
        "max_h3_tags": 6
    }
}

@dataclass
class TierConfig:
    count: int
    min_word_count: int
    max_word_count: int
    min_h2_tags: int
    max_h2_tags: int
    min_h3_tags: int
    max_h3_tags: int

@dataclass
class Job:
    project_id: int
    tiers: dict[str, TierConfig]

class JobConfig:
    def __init__(self, job_file_path: str):
        """Load and parse job file, apply defaults"""
        pass

    def get_jobs(self) -> list[Job]:
        """Return list of all jobs in file"""
        pass

    def get_tier_config(self, job: Job, tier_name: str) -> Optional[TierConfig]:
        """Get tier config with defaults applied"""
        pass
```

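
Applying the defaults can be as simple as a dictionary merge per tier. The sketch below shows one possible approach; it assumes that `count` is required in every tier entry and that unknown keys are ignored so future fields do not break parsing. Neither rule is stated in the plan, and the helper names `build_tier_config` and `parse_jobs` are hypothetical.

```python
import json
from dataclasses import dataclass

_FIELDS = (
    "min_word_count", "max_word_count",
    "min_h2_tags", "max_h2_tags",
    "min_h3_tags", "max_h3_tags",
)
# Same values as TIER_DEFAULTS, written compactly so the sketch is self-contained.
TIER_DEFAULTS = {
    "tier1": dict(zip(_FIELDS, (2000, 2500, 3, 5, 5, 10))),
    "tier2": dict(zip(_FIELDS, (1500, 2000, 2, 4, 3, 8))),
    "tier3": dict(zip(_FIELDS, (1000, 1500, 2, 3, 2, 6))),
}


@dataclass
class TierConfig:
    count: int
    min_word_count: int
    max_word_count: int
    min_h2_tags: int
    max_h2_tags: int
    min_h3_tags: int
    max_h3_tags: int


def build_tier_config(tier_name: str, overrides: dict) -> TierConfig:
    """Merge job-file values over the tier's defaults."""
    if tier_name not in TIER_DEFAULTS:
        raise ValueError(f"unknown tier: {tier_name}")
    if "count" not in overrides:
        raise ValueError(f"{tier_name}: 'count' is required")  # assumption
    # Keep only known fields; anything else is ignored for forward compatibility.
    known = {key: overrides[key] for key in _FIELDS if key in overrides}
    merged = {**TIER_DEFAULTS[tier_name], **known}
    return TierConfig(count=overrides["count"], **merged)


def parse_jobs(raw_json: str) -> list[tuple[int, dict[str, TierConfig]]]:
    """Return (project_id, {tier_name: TierConfig}) for each job in the file."""
    data = json.loads(raw_json)
    jobs = []
    for job in data["jobs"]:
        tiers = {name: build_tier_config(name, cfg) for name, cfg in job["tiers"].items()}
        jobs.append((job["project_id"], tiers))
    return jobs
```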
---

### Task 4.2: Create BatchProcessor
**File**: `src/generation/batch_processor.py`

```python
class BatchProcessor:
    def __init__(
        self,
        content_generator: ContentGenerator,
        content_repo: GeneratedContentRepository,
        project_repo: ProjectRepository
    ):
        pass

    def process_job(
        self,
        job_file_path: str,
        debug: bool = False,
        continue_on_error: bool = False
    ):
        """
        Process all jobs in job file

        For each job:
            0. Validate project configuration (fail fast if invalid)
                - Check project exists
                - Validate money_site_url is set (required for tiered linking strategy)
            For each tier:
                For count times:
                    1. Generate title (log to console)
                    2. Generate outline
                    3. Generate content
                    4. Validate word count
                    5. If below min, augment once
                    6. Save to GeneratedContent table

        Logs progress to console
        If debug=True, saves AI responses to debug_output/
        """
        pass
```

**Console output format**:
```
Processing Job 1/3: Project ID 5
  Tier 1: Generating 5 articles
    [1/5] Generating title... "Ultimate Guide to SEO in 2025"
    [1/5] Generating outline... 4 H2s, 8 H3s
    [1/5] Generating content... 1,845 words
    [1/5] Below minimum (2000), augmenting... 2,123 words
    [1/5] Saved (ID: 42, Status: augmented)
    [2/5] Generating title... "Advanced SEO Techniques"
    ...
  Tier 2: Generating 10 articles
    ...

Summary:
  Jobs processed: 3/3
  Articles generated: 45/45
  Augmented: 12
  Failed: 0
```

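
The per-article flow (generate, count, augment at most once, then tally the result) could be sketched as below. The generation steps are passed in as plain callables so the logic can be exercised without an AI client or database. `GenerationError`, `generate_article`, and `run_tier` are hypothetical names, and the real `BatchProcessor` would also save each record.

```python
from typing import Callable


class GenerationError(Exception):
    """Hypothetical error type for a failed AI call (not named in the plan)."""


def generate_article(
    generate: Callable[[], str],
    augment: Callable[[str, int], str],
    count_words: Callable[[str], int],
    min_words: int,
) -> tuple[str, int, str]:
    """Run steps 3-5 for one article and return (content, word_count, status)."""
    content = generate()
    words = count_words(content)
    status = "generated"
    if words < min_words:
        # Augment exactly once, even if the result is still short.
        content = augment(content, min_words)
        words = count_words(content)
        status = "augmented"
    return content, words, status


def run_tier(
    make_article: Callable[[int], tuple[str, int, str]],
    count: int,
    continue_on_error: bool = False,
) -> dict[str, int]:
    """Generate `count` articles and tally outcomes by status."""
    summary = {"generated": 0, "augmented": 0, "failed": 0}
    for n in range(1, count + 1):
        try:
            _content, words, status = make_article(n)
        except GenerationError as exc:
            summary["failed"] += 1
            print(f"[{n}/{count}] Failed: {exc}")
            if not continue_on_error:
                break
            continue
        summary[status] += 1
        print(f"[{n}/{count}] Saved ({words:,} words, Status: {status})")
    return summary
```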
---

### Task 4.3: Error Handling & Retry Logic
**File**: `src/generation/batch_processor.py`

**Error handling strategy**:
- Project validation errors: Fail fast before generation starts
  - Missing project: Abort with clear error
  - Missing `money_site_url`: Abort with clear error (required for all jobs)
- AI API errors: Log error, mark as `status='failed'`, save to DB
  - If `continue_on_error=True`: continue to next article
  - If `continue_on_error=False`: stop batch processing
- Database errors: Always abort (data integrity)
- Invalid job file: Fail fast with validation error

**Retry logic** (in AIClient):
- Network errors: 3 retries with exponential backoff (1s, 2s, 4s)
- Rate limit errors: Respect Retry-After header
- Other errors: No retry, raise immediately

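
The retry rules above could be sketched roughly as follows. The exception classes here are stand-ins: a real `AIClient` would catch whatever the SDK actually raises. Capping rate-limit retries at the same 3 attempts, and falling back to exponential backoff when no Retry-After value is available, are assumptions that the plan does not spell out.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class NetworkError(Exception):
    """Stand-in for a transient network failure."""


class RateLimitError(Exception):
    """Stand-in for a rate-limit response, carrying the Retry-After value if any."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on NetworkError and RateLimitError only."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError as exc:
            if attempt == max_retries:
                raise
            # Prefer the server-provided wait time when available.
            delay = exc.retry_after if exc.retry_after is not None else base_delay * 2 ** attempt
        except NetworkError:
            if attempt == max_retries:
                raise
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s
        sleep(delay)
    raise AssertionError("unreachable")
```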
---

## **PHASE 5: CLI Integration**

### Task 5.1: Add generate-batch Command
**File**: `src/cli/commands.py`

```python
from typing import Optional

import click


@app.command("generate-batch")
@click.option('--job-file', '-j', required=True, type=click.Path(exists=True),
              help='Path to job JSON file')
@click.option('--username', '-u', help='Username for authentication')
@click.option('--password', '-p', help='Password for authentication')
@click.option('--debug', is_flag=True, help='Save AI responses to debug_output/')
@click.option('--continue-on-error', is_flag=True,
              help='Continue processing if article generation fails')
@click.option('--model', '-m', default='gpt-4o-mini',
              help='AI model to use (gpt-4o-mini, claude-sonnet-4.5)')
def generate_batch(
    job_file: str,
    username: Optional[str],
    password: Optional[str],
    debug: bool,
    continue_on_error: bool,
    model: str
):
    """Generate content batch from job file"""
    # Authenticate user
    # Initialize AIClient with OpenRouter
    # Initialize PromptManager, ContentGenerator, BatchProcessor
    # Call process_job()
    # Show summary
    pass
```

---

### Task 5.2: Add Progress Logging & Debug Output
**File**: `src/generation/batch_processor.py`

**Debug output** (when the `--debug` flag is used):
- Create the `debug_output/` directory if it does not exist
- For each AI call, save the response to a file:
  - `debug_output/title_project{id}_tier{tier}_article{n}_{timestamp}.txt`
  - `debug_output/outline_project{id}_tier{tier}_article{n}_{timestamp}.json`
  - `debug_output/content_project{id}_tier{tier}_article{n}_{timestamp}.html`
  - `debug_output/augmented_project{id}_tier{tier}_article{n}_{timestamp}.html`
- Also echo to console with `click.echo()`

**Normal output** (without `--debug`):
- Always show title when generated: `"Generated title: {title}"`
- Show word counts and status
- Show progress counter `[n/total]`

---

## **PHASE 6: Testing & Validation**

### Task 6.1: Create Unit Tests

#### `tests/unit/test_ai_client.py`
```python
def test_generate_completion_success():
    """Test successful AI completion"""
    pass

def test_generate_completion_json_mode():
    """Test JSON mode returns valid JSON"""
    pass

def test_generate_completion_retry_on_network_error():
    """Test retry logic for network errors"""
    pass
```

#### `tests/unit/test_content_generator.py`
```python
def test_generate_title():
    """Test title generation with mocked AI response"""
    pass

def test_generate_outline_valid_structure():
    """Test outline generation returns valid JSON with min h2/h3"""
    pass

def test_generate_content_html_fragment():
    """Test content is HTML fragment (no <html> tag)"""
    pass

def test_validate_word_count():
    """Test word count validation with various HTML inputs"""
    pass

def test_augment_content_called_once():
    """Test augmentation only called once"""
    pass
```

#### `tests/unit/test_job_config.py`
```python
def test_load_job_config_valid():
    """Test loading valid job file"""
    pass

def test_tier_defaults_applied():
    """Test defaults applied when not in job file"""
    pass

def test_multiple_jobs_in_file():
    """Test parsing file with multiple jobs"""
    pass
```

#### `tests/unit/test_batch_processor.py`
```python
def test_process_job_success():
    """Test successful batch processing"""
    pass

def test_process_job_with_augmentation():
    """Test articles below min word count are augmented"""
    pass

def test_process_job_continue_on_error():
    """Test continue_on_error flag behavior"""
    pass
```

---

### Task 6.2: Create Integration Test
**File**: `tests/integration/test_generate_batch.py`

```python
def test_generate_batch_end_to_end(test_db, mock_ai_client):
    """
    End-to-end test:
    1. Create test project in DB
    2. Create test job file
    3. Run batch processor
    4. Verify GeneratedContent records created
    5. Verify word counts within range
    6. Verify HTML structure
    """
    pass
```

---

### Task 6.3: Create Example Job Files

#### `jobs/example_tier1_batch.json`
```json
{
  "jobs": [
    {
      "project_id": 1,
      "tiers": {
        "tier1": {
          "count": 5
        }
      }
    }
  ]
}
```
(Uses all defaults for tier1)

#### `jobs/example_multi_tier_batch.json`
```json
{
  "jobs": [
    {
      "project_id": 1,
      "tiers": {
        "tier1": {
          "count": 5,
          "min_word_count": 2200,
          "max_word_count": 2600
        },
        "tier2": {
          "count": 10
        },
        "tier3": {
          "count": 15,
          "max_h2_tags": 4
        }
      }
    },
    {
      "project_id": 2,
      "tiers": {
        "tier1": {
          "count": 3
        }
      }
    }
  ]
}
```

#### `jobs/README.md`
Document job file format and examples

---

## **PHASE 7: Cleanup & Deprecation**

### Task 7.1: Remove Old ContentRuleEngine
**Action**: Delete or gut `src/generation/rule_engine.py`

Keep the file only if it contains reusable utilities; otherwise remove it entirely.

---

### Task 7.2: Remove Old Validator Logic
**Action**: Review `src/generation/validator.py` (if it exists)

Remove any strict CORA validation beyond word count. Keep only simple validation utilities.

---

### Task 7.3: Update Documentation
**Files to update**:
- `docs/stories/story-2.2. simplified-ai-content-generation.md` - Change status from "In Progress" to "Done"
- `docs/architecture/workflows.md` - Document simplified generation flow
- `docs/architecture/components.md` - Update generation component description

---

## Implementation Order Recommendation

1. **Phase 1** (Data Layer) - Required foundation
2. **Phase 2** (AI Client) - Required for generation
3. **Phase 3** (Core Logic) - Implement one stage at a time, test each
4. **Phase 4** (Batch Processing) - Orchestrate stages
5. **Phase 5** (CLI) - Make accessible to users
6. **Phase 6** (Testing) - Can be done in parallel with implementation
7. **Phase 7** (Cleanup) - Final polish

**Estimated effort**:
- Phase 1-2: 4-6 hours
- Phase 3: 6-8 hours
- Phase 4: 3-4 hours
- Phase 5: 2-3 hours
- Phase 6: 4-6 hours
- Phase 7: 1-2 hours
- **Total**: 20-29 hours

---

## Critical Dev Notes

### OpenRouter Specifics
- API key from environment: `OPENROUTER_API_KEY`
- Model format: `"provider/model-name"`
- Works with the OpenAI SDK as a drop-in replacement
- Rate limits vary by model (check OpenRouter docs)

### HTML Fragment Format
Content generation returns HTML like:
```html
<h2>Main Topic</h2>
<p>Introduction paragraph with relevant keywords and entities.</p>
<h3>Subtopic One</h3>
<p>Detailed content about subtopic.</p>
<h3>Subtopic Two</h3>
<p>More detailed content.</p>
<h2>Second Main Topic</h2>
<p>Content continues...</p>
```

**No document structure**: No `<!DOCTYPE>`, `<html>`, `<head>`, or `<body>` tags.

### Word Count Method
```python
import re
from html import unescape


def count_words(html_content: str) -> int:
    # Replace HTML tags with spaces so words in adjacent elements
    # (e.g. "</p><p>") are not glued together
    text = re.sub(r'<[^>]+>', ' ', html_content)
    # Unescape HTML entities
    text = unescape(text)
    # Split on whitespace and count
    words = text.split()
    return len(words)
```

### Debug Output Directory
- Create `debug_output/` at the project root if it does not exist
- Add to `.gitignore`
- Filename format: `{stage}_project{id}_tier{tier}_article{n}_{timestamp}.{ext}`
- Example: `title_project5_tier1_article3_20251020_143022.txt`

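
A small helper can keep the filename format in one place. The sketch below is illustrative only: `debug_path` and `save_debug` are hypothetical names, and the timestamp layout `%Y%m%d_%H%M%S` is inferred from the example filename above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# File extension per pipeline stage, following the filenames listed in Task 5.2.
_EXTENSIONS = {"title": "txt", "outline": "json", "content": "html", "augmented": "html"}


def debug_path(
    stage: str,
    project_id: int,
    tier: int,
    article_n: int,
    now: Optional[datetime] = None,
    root: str = "debug_output",
) -> Path:
    """Build {stage}_project{id}_tier{tier}_article{n}_{timestamp}.{ext} under root."""
    ext = _EXTENSIONS[stage]
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    name = f"{stage}_project{project_id}_tier{tier}_article{article_n}_{stamp}.{ext}"
    return Path(root) / name


def save_debug(path: Path, text: str) -> None:
    """Write an AI response to disk, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
```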
### Tier Constants Location
Define in `src/generation/job_config.py` as module-level constant for easy reference.

### Future Extensibility
Job file structure designed to support:
- Custom interlinking rules (Story 2.4+)
- Template selection (Story 3.x)
- Deployment targets (Story 4.x)
- SEO metadata overrides

Keep job parsing flexible to add new fields without breaking existing jobs.

---

## Testing Strategy

### Unit Test Mocking
Mock `AIClient.generate_completion()` to return a realistic response for each stage (title text, outline JSON, HTML content):
```python
import json

import pytest


@pytest.fixture
def mock_title_response():
    return "The Ultimate Guide to Sustainable Gardening in 2025"

@pytest.fixture
def mock_outline_response():
    # generate_completion() returns a string, so the outline is serialized JSON
    return json.dumps({
        "outline": [
            {"h2": "Getting Started", "h3": ["Tools", "Planning"]},
            {"h2": "Best Practices", "h3": ["Watering", "Composting"]}
        ]
    })

@pytest.fixture
def mock_content_response():
    return """<h2>Getting Started</h2>
<p>Sustainable gardening begins with proper planning...</p>
<h3>Tools</h3>
<p>Essential tools include...</p>"""
```

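
One way to wire canned responses into a fake client is `unittest.mock.Mock` with a `side_effect` list, which returns one item per call in order. This is only a sketch: `make_mock_ai_client` is a hypothetical helper, and the three calls below stand in for whatever `ContentGenerator` would actually issue.

```python
import json
from unittest.mock import Mock


def make_mock_ai_client(responses: list) -> Mock:
    """Return a stand-in AIClient whose generate_completion() yields responses in order."""
    client = Mock()
    client.generate_completion.side_effect = list(responses)
    return client


outline = {"outline": [{"h2": "Getting Started", "h3": ["Tools", "Planning"]}]}
client = make_mock_ai_client([
    "The Ultimate Guide to Sustainable Gardening in 2025",
    json.dumps(outline),
    "<h2>Getting Started</h2>\n<p>Sustainable gardening begins with proper planning...</p>",
])

# Each call consumes the next canned response: title, outline, content.
title = client.generate_completion(prompt="title prompt")
outline_json = client.generate_completion(prompt="outline prompt", json_mode=True)
content = client.generate_completion(prompt="content prompt")
```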
### Integration Test Database
Use `conftest.py` fixture with in-memory SQLite and test data:
```python
@pytest.fixture
def test_project(test_db):
    project_repo = ProjectRepository(test_db)
    return project_repo.create(
        user_id=1,
        name="Test Project",
        data={
            "main_keyword": "sustainable gardening",
            "entities": ["composting", "organic soil"],
            "related_searches": ["how to compost", "organic gardening tips"]
        }
    )
```

---

## Success Criteria

Story is complete when:
1. All database models and repositories implemented
2. AIClient successfully calls OpenRouter API
3. Three-stage generation pipeline works end-to-end
4. Batch processor handles multiple jobs/tiers
5. CLI command `generate-batch` functional
6. Debug output saves to `debug_output/` when `--debug` used
7. All unit tests pass
8. Integration test demonstrates full workflow
9. Example job files work correctly
10. Documentation updated

**Acceptance**: Run `generate-batch` on a real project and verify that the content is saved to the database with the correct word count and structure.