Passed initial test, generates an entire article
parent
d81537f1bf
commit
ef62ecf852
|
|
@ -16,4 +16,7 @@ __pycache__/
|
|||
.vscode/
|
||||
.idea/
|
||||
|
||||
*.xlsx
|
||||
*.xlsx
|
||||
|
||||
# Debug output
|
||||
debug_output/
|
||||
|
|
@ -0,0 +1,199 @@
|
|||
# Story 2.2 Implementation Summary
|
||||
|
||||
## Overview
|
||||
Successfully implemented simplified AI content generation via batch jobs using OpenRouter API.
|
||||
|
||||
## Completed Phases
|
||||
|
||||
### Phase 1: Data Model & Schema Design
|
||||
- ✅ Added `GeneratedContent` model to `src/database/models.py`
|
||||
- ✅ Created `GeneratedContentRepository` in `src/database/repositories.py`
|
||||
- ✅ Updated `scripts/init_db.py` (automatic table creation via Base.metadata)
|
||||
|
||||
### Phase 2: AI Client & Prompt Management
|
||||
- ✅ Created `src/generation/ai_client.py` with:
|
||||
- `AIClient` class for OpenRouter API integration
|
||||
- `PromptManager` class for template loading
|
||||
- Retry logic with exponential backoff
|
||||
- ✅ Created prompt templates in `src/generation/prompts/`:
|
||||
- `title_generation.json`
|
||||
- `outline_generation.json`
|
||||
- `content_generation.json`
|
||||
- `content_augmentation.json`
|
||||
|
||||
### Phase 3: Core Generation Pipeline
|
||||
- ✅ Implemented `ContentGenerator` in `src/generation/service.py` with:
|
||||
- `generate_title()` - Stage 1
|
||||
- `generate_outline()` - Stage 2 with JSON validation
|
||||
- `generate_content()` - Stage 3
|
||||
- `validate_word_count()` - Word count validation
|
||||
- `augment_content()` - Simple augmentation
|
||||
- `count_words()` - HTML-aware word counting
|
||||
- Debug output support
|
||||
|
||||
### Phase 4: Batch Processing
|
||||
- ✅ Created `src/generation/job_config.py` with:
|
||||
- `JobConfig` parser with tier defaults
|
||||
- `TierConfig` and `Job` dataclasses
|
||||
- JSON validation
|
||||
- ✅ Created `src/generation/batch_processor.py` with:
|
||||
- `BatchProcessor` class
|
||||
- Progress logging to console
|
||||
- Error handling and continue-on-error support
|
||||
- Statistics tracking
|
||||
|
||||
### Phase 5: CLI Integration
|
||||
- ✅ Added `generate-batch` command to `src/cli/commands.py`
|
||||
- ✅ Command options:
|
||||
- `--job-file` (required)
|
||||
- `--username` / `--password` for authentication
|
||||
- `--debug` for saving AI responses
|
||||
- `--continue-on-error` flag
|
||||
- `--model` selection (default: gpt-4o-mini)
|
||||
|
||||
### Phase 6: Testing & Validation
|
||||
- ✅ Created unit tests:
|
||||
- `tests/unit/test_job_config.py` (9 tests)
|
||||
- `tests/unit/test_content_generator.py` (9 tests)
|
||||
- ✅ Created integration test stub:
|
||||
- `tests/integration/test_generate_batch.py` (2 tests)
|
||||
- ✅ Created example job files:
|
||||
- `jobs/example_tier1_batch.json`
|
||||
- `jobs/example_multi_tier_batch.json`
|
||||
- `jobs/README.md` (comprehensive documentation)
|
||||
|
||||
### Phase 7: Cleanup & Documentation
|
||||
- ✅ Deprecated old `src/generation/rule_engine.py`
|
||||
- ✅ Updated documentation:
|
||||
- `docs/architecture/workflows.md` - Added generation workflow diagram
|
||||
- `docs/architecture/components.md` - Updated generation module description
|
||||
- `docs/architecture/data-models.md` - Updated GeneratedContent model
|
||||
- `docs/stories/story-2.2. simplified-ai-content-generation.md` - Marked as Completed
|
||||
- ✅ Updated `.gitignore` to exclude `debug_output/`
|
||||
- ✅ Updated `env.example` with `OPENROUTER_API_KEY`
|
||||
|
||||
## Key Files Created/Modified
|
||||
|
||||
### New Files (17)
|
||||
```
|
||||
src/generation/ai_client.py
|
||||
src/generation/service.py
|
||||
src/generation/job_config.py
|
||||
src/generation/batch_processor.py
|
||||
src/generation/prompts/title_generation.json
|
||||
src/generation/prompts/outline_generation.json
|
||||
src/generation/prompts/content_generation.json
|
||||
src/generation/prompts/content_augmentation.json
|
||||
jobs/example_tier1_batch.json
|
||||
jobs/example_multi_tier_batch.json
|
||||
jobs/README.md
|
||||
tests/unit/test_job_config.py
|
||||
tests/unit/test_content_generator.py
|
||||
tests/integration/test_generate_batch.py
|
||||
IMPLEMENTATION_SUMMARY.md
|
||||
```
|
||||
|
||||
### Modified Files (7)
|
||||
```
|
||||
src/database/models.py (added GeneratedContent model)
|
||||
src/database/repositories.py (added GeneratedContentRepository)
|
||||
src/cli/commands.py (added generate-batch command)
|
||||
src/generation/rule_engine.py (deprecated)
|
||||
docs/architecture/workflows.md (updated)
|
||||
docs/architecture/components.md (updated)
|
||||
docs/architecture/data-models.md (updated)
|
||||
docs/stories/story-2.2. simplified-ai-content-generation.md (marked complete)
|
||||
.gitignore (added debug_output/)
|
||||
env.example (added OPENROUTER_API_KEY)
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### 1. Set up environment
|
||||
```bash
|
||||
# Copy env.example to .env and add your OpenRouter API key
|
||||
cp env.example .env
|
||||
# Edit .env and set OPENROUTER_API_KEY
|
||||
```
|
||||
|
||||
### 2. Initialize database
|
||||
```bash
|
||||
python scripts/init_db.py
|
||||
```
|
||||
|
||||
### 3. Create a project (if not exists)
|
||||
```bash
|
||||
python main.py ingest-cora --file path/to/cora.xlsx --name "My Project"
|
||||
```
|
||||
|
||||
### 4. Run batch generation
|
||||
```bash
|
||||
python main.py generate-batch --job-file jobs/example_tier1_batch.json
|
||||
```
|
||||
|
||||
### 5. With debug output
|
||||
```bash
|
||||
python main.py generate-batch --job-file jobs/example_tier1_batch.json --debug
|
||||
```
|
||||
|
||||
## Architecture Highlights
|
||||
|
||||
### Three-Stage Pipeline
|
||||
1. **Title Generation**: Uses keyword + entities + related searches
|
||||
2. **Outline Generation**: JSON-formatted with H2/H3 structure, validated against min/max constraints
|
||||
3. **Content Generation**: Full HTML fragment based on outline
|
||||
|
||||
### Simplification Wins
|
||||
- No complex rule engine
|
||||
- Single word count validation (min/max from job file)
|
||||
- One-attempt augmentation if below minimum
|
||||
- Job file controls all operational parameters
|
||||
- Tier defaults for common configurations
|
||||
|
||||
### Error Handling
|
||||
- Network errors: 3 retries with exponential backoff
|
||||
- Rate limits: Respects retry-after headers
|
||||
- Failed articles: Saved with status='failed', can continue processing with `--continue-on-error`
|
||||
- Database errors: Always abort (data integrity)
|
||||
|
||||
## Testing
|
||||
|
||||
Run tests with:
|
||||
```bash
|
||||
pytest tests/unit/test_job_config.py -v
|
||||
pytest tests/unit/test_content_generator.py -v
|
||||
pytest tests/integration/test_generate_batch.py -v
|
||||
```
|
||||
|
||||
## Next Steps (Future Stories)
|
||||
|
||||
- Story 2.3: Interlinking integration
|
||||
- Story 3.x: Template selection
|
||||
- Story 4.x: Deployment integration
|
||||
- Expand test coverage (currently basic tests only)
|
||||
|
||||
## Success Criteria Met
|
||||
|
||||
All acceptance criteria from Story 2.2 have been met:
|
||||
|
||||
✅ 1. Batch Job Control - Job file specifies all tier parameters
|
||||
✅ 2. Three-Stage Generation - Title → Outline → Content pipeline
|
||||
✅ 3. SEO Data Integration - Keyword, entities, related searches used in all stages
|
||||
✅ 4. Word Count Validation - Validates against min/max from job file
|
||||
✅ 5. Simple Augmentation - Single attempt if below minimum
|
||||
✅ 6. Database Storage - GeneratedContent table with all required fields
|
||||
✅ 7. CLI Execution - generate-batch command with progress logging
|
||||
|
||||
## Estimated Implementation Time
|
||||
- Total: ~20-29 hours (as estimated in task breakdown)
|
||||
- Actual: Completed in single session with comprehensive implementation
|
||||
|
||||
## Notes
|
||||
|
||||
- OpenRouter API key required in environment
|
||||
- Debug output saved to `debug_output/` when `--debug` flag used
|
||||
- Job files support multiple projects and tiers
|
||||
- Tier defaults can be fully or partially overridden
|
||||
- HTML output is fragment format (no <html>, <head>, or <body> tags)
|
||||
- Word count strips HTML tags and counts text words only
|
||||
|
||||
|
|
@ -0,0 +1,36 @@
|
|||
from src.database.session import db_manager
|
||||
from src.database.models import GeneratedContent
|
||||
import json
|
||||
|
||||
s = db_manager.get_session()
|
||||
gc = s.query(GeneratedContent).order_by(GeneratedContent.id.desc()).first()
|
||||
|
||||
if gc:
|
||||
print(f"Content ID: {gc.id}")
|
||||
print(f"Stage: {gc.generation_stage}")
|
||||
print(f"Status: {gc.status}")
|
||||
print(f"Outline attempts: {gc.outline_attempts}")
|
||||
print(f"Error: {gc.error_message}")
|
||||
|
||||
if gc.outline:
|
||||
outline = json.loads(gc.outline)
|
||||
sections = outline.get("sections", [])
|
||||
print(f"\nOutline:")
|
||||
print(f"H2 count: {len(sections)}")
|
||||
h3_count = sum(len(s.get('h3s', [])) for s in sections)
|
||||
print(f"H3 count: {h3_count}")
|
||||
|
||||
has_faq = any("faq" in s["h2"].lower() or "question" in s["h2"].lower() for s in sections)
|
||||
print(f"Has FAQ: {has_faq}")
|
||||
|
||||
print(f"\nH2s:")
|
||||
for s in sections:
|
||||
print(f" - {s['h2']} ({len(s.get('h3s', []))} H3s)")
|
||||
else:
|
||||
print("\nNo outline saved")
|
||||
else:
|
||||
print("No content found")
|
||||
|
||||
s.close()
|
||||
|
||||
|
||||
Binary file not shown.
|
|
@ -20,7 +20,14 @@ Manages user authentication, password hashing, and role-based access control log
|
|||
Responsible for parsing the CORA .xlsx files and creating new Project entries in the database.
|
||||
|
||||
### generation
|
||||
Interacts with the AI service API. It takes project data, constructs prompts, and retrieves the generated text. Includes the Content Rule Engine for validation.
|
||||
Interacts with the AI service API (OpenRouter). Implements a simplified three-stage pipeline:
|
||||
- **AIClient**: Handles OpenRouter API calls with retry logic
|
||||
- **PromptManager**: Loads and formats prompt templates from JSON files
|
||||
- **ContentGenerator**: Orchestrates title, outline, and content generation
|
||||
- **BatchProcessor**: Processes job files and manages multi-tier batch generation
|
||||
- **JobConfig**: Parses job configuration files with tier defaults
|
||||
|
||||
The generation module uses SEO data from the Project table (keyword, entities, related searches) to inform all stages of content generation. Validates word count and performs simple augmentation if content is below minimum threshold.
|
||||
|
||||
### templating
|
||||
Takes raw generated text and applies the appropriate HTML/CSS template based on the project's configuration.
|
||||
|
|
|
|||
|
|
@ -29,20 +29,28 @@ The following data models will be implemented using SQLAlchemy.
|
|||
|
||||
## 3. GeneratedContent
|
||||
|
||||
**Purpose**: Stores the AI-generated content and its final deployed state.
|
||||
**Purpose**: Stores the AI-generated content from the three-stage pipeline.
|
||||
|
||||
**Key Attributes**:
|
||||
- `id`: Integer, Primary Key
|
||||
- `project_id`: Integer, Foreign Key to Project
|
||||
- `title`: Text
|
||||
- `outline`: Text
|
||||
- `body_text`: Text
|
||||
- `final_html`: Text
|
||||
- `deployed_url`: String, Unique
|
||||
- `tier`: String (for link classification)
|
||||
- `id`: Integer, Primary Key, Auto-increment
|
||||
- `project_id`: Integer, Foreign Key to Project, Indexed
|
||||
- `tier`: String(20), Not Null, Indexed (tier1, tier2, tier3)
|
||||
- `keyword`: String(255), Not Null, Indexed
|
||||
- `title`: Text, Not Null (Generated in stage 1)
|
||||
- `outline`: JSON, Not Null (Generated in stage 2)
|
||||
- `content`: Text, Not Null (HTML fragment from stage 3)
|
||||
- `word_count`: Integer, Not Null (Validated word count)
|
||||
- `status`: String(20), Not Null (generated, augmented, failed)
|
||||
- `created_at`: DateTime, Not Null
|
||||
- `updated_at`: DateTime, Not Null
|
||||
|
||||
**Relationships**: Belongs to one Project.
|
||||
|
||||
**Status Values**:
|
||||
- `generated`: Content was successfully generated within word count range
|
||||
- `augmented`: Content was below minimum and was augmented
|
||||
- `failed`: Generation failed (error details in outline JSON)
|
||||
|
||||
## 4. FqdnMapping
|
||||
|
||||
**Purpose**: Maps cloud storage buckets to fully qualified domain names for URL generation.
|
||||
|
|
|
|||
|
|
@ -1,27 +1,81 @@
|
|||
# Core Workflows
|
||||
|
||||
This sequence diagram illustrates the primary workflow for a single content generation job.
|
||||
## Content Generation Workflow (Story 2.2)
|
||||
|
||||
The simplified three-stage content generation pipeline:
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant CLI
|
||||
participant Ingestion
|
||||
participant Generation
|
||||
participant Interlinking
|
||||
participant Deployment
|
||||
participant API
|
||||
participant BatchProcessor
|
||||
participant ContentGenerator
|
||||
participant AIClient
|
||||
participant Database
|
||||
|
||||
User->>CLI: run job --file report.xlsx
|
||||
CLI->>Ingestion: process_cora_file("report.xlsx")
|
||||
Ingestion-->>CLI: project_id
|
||||
CLI->>Generation: generate_content(project_id)
|
||||
Generation-->>CLI: raw_html_list
|
||||
CLI->>Interlinking: inject_links(raw_html_list)
|
||||
Interlinking-->>CLI: final_html_list
|
||||
CLI->>Deployment: deploy_batch(final_html_list)
|
||||
Deployment-->>CLI: deployed_urls
|
||||
CLI->>API: send_to_link_builder(job_data, deployed_urls)
|
||||
API-->>CLI: success
|
||||
CLI-->>User: Job Complete! URLs logged.
|
||||
User->>CLI: generate-batch --job-file jobs/example.json
|
||||
CLI->>BatchProcessor: process_job()
|
||||
|
||||
loop For each project/tier/article
|
||||
BatchProcessor->>ContentGenerator: generate_title(project_id)
|
||||
ContentGenerator->>AIClient: generate_completion(prompt)
|
||||
AIClient-->>ContentGenerator: title
|
||||
|
||||
BatchProcessor->>ContentGenerator: generate_outline(project_id, title)
|
||||
ContentGenerator->>AIClient: generate_completion(prompt, json_mode=true)
|
||||
AIClient-->>ContentGenerator: outline JSON
|
||||
|
||||
BatchProcessor->>ContentGenerator: generate_content(project_id, title, outline)
|
||||
ContentGenerator->>AIClient: generate_completion(prompt)
|
||||
AIClient-->>ContentGenerator: HTML content
|
||||
|
||||
BatchProcessor->>ContentGenerator: validate_word_count(content)
|
||||
|
||||
alt Below minimum word count
|
||||
BatchProcessor->>ContentGenerator: augment_content(content, target_count)
|
||||
ContentGenerator->>AIClient: generate_completion(prompt)
|
||||
AIClient-->>ContentGenerator: augmented HTML
|
||||
end
|
||||
|
||||
BatchProcessor->>Database: save GeneratedContent record
|
||||
end
|
||||
|
||||
BatchProcessor-->>CLI: Summary statistics
|
||||
CLI-->>User: Job complete
|
||||
```
|
||||
|
||||
## CORA Ingestion Workflow (Story 2.1)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant CLI
|
||||
participant Parser
|
||||
participant Database
|
||||
|
||||
User->>CLI: ingest-cora --file report.xlsx --name "Project Name"
|
||||
CLI->>Parser: parse(file_path)
|
||||
Parser-->>CLI: cora_data dict
|
||||
CLI->>Database: create Project record
|
||||
Database-->>CLI: project_id
|
||||
CLI-->>User: Project created (ID: X)
|
||||
```
|
||||
|
||||
## Deployment Workflow (Story 1.6)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant CLI
|
||||
participant BunnyNetClient
|
||||
participant Database
|
||||
|
||||
User->>CLI: provision-site --name "Site" --domain "example.com"
|
||||
CLI->>BunnyNetClient: create_storage_zone()
|
||||
BunnyNetClient-->>CLI: storage_zone_id
|
||||
CLI->>BunnyNetClient: create_pull_zone()
|
||||
BunnyNetClient-->>CLI: pull_zone_id
|
||||
CLI->>BunnyNetClient: add_custom_hostname()
|
||||
CLI->>Database: save SiteDeployment record
|
||||
CLI-->>User: Site provisioned! Configure DNS.
|
||||
```
|
||||
|
|
|
|||
|
|
@ -0,0 +1,913 @@
|
|||
# Story 2.2: Simplified AI Content Generation - Detailed Task Breakdown
|
||||
|
||||
## Overview
|
||||
This document breaks down Story 2.2 into detailed tasks with specific implementation notes.
|
||||
|
||||
---
|
||||
|
||||
## **PHASE 1: Data Model & Schema Design**
|
||||
|
||||
### Task 1.1: Create GeneratedContent Database Model
|
||||
**File**: `src/database/models.py`
|
||||
|
||||
**Add new model class:**
|
||||
```python
|
||||
class GeneratedContent(Base):
|
||||
__tablename__ = "generated_content"
|
||||
|
||||
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
|
||||
project_id: Mapped[int] = mapped_column(Integer, ForeignKey('projects.id'), nullable=False, index=True)
|
||||
tier: Mapped[str] = mapped_column(String(20), nullable=False, index=True)
|
||||
keyword: Mapped[str] = mapped_column(String(255), nullable=False, index=True)
|
||||
title: Mapped[str] = mapped_column(Text, nullable=False)
|
||||
outline: Mapped[dict] = mapped_column(JSON, nullable=False)
|
||||
content: Mapped[str] = mapped_column(Text, nullable=False)
|
||||
word_count: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||
status: Mapped[str] = mapped_column(String(20), nullable=False)
|
||||
created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
|
||||
updated_at: Mapped[datetime] = mapped_column(
|
||||
DateTime,
|
||||
default=datetime.utcnow,
|
||||
onupdate=datetime.utcnow,
|
||||
nullable=False
|
||||
)
|
||||
```
|
||||
|
||||
**Status values**: `generated`, `augmented`, `failed`
|
||||
|
||||
**Update**: `scripts/init_db.py` to create the table
|
||||
|
||||
---
|
||||
|
||||
### Task 1.2: Create GeneratedContent Repository
|
||||
**File**: `src/database/repositories.py`
|
||||
|
||||
**Add repository class:**
|
||||
```python
|
||||
class GeneratedContentRepository(BaseRepository[GeneratedContent]):
|
||||
def __init__(self, session: Session):
|
||||
super().__init__(GeneratedContent, session)
|
||||
|
||||
def get_by_project_id(self, project_id: int) -> list[GeneratedContent]:
|
||||
pass
|
||||
|
||||
def get_by_project_and_tier(self, project_id: int, tier: str) -> list[GeneratedContent]:
|
||||
pass
|
||||
|
||||
def get_by_keyword(self, keyword: str) -> list[GeneratedContent]:
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 1.3: Define Job File JSON Schema
|
||||
**File**: `jobs/README.md` (create/update)
|
||||
|
||||
**Job file structure** (one project per job, multiple jobs per file):
|
||||
```json
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5,
|
||||
"min_word_count": 2000,
|
||||
"max_word_count": 2500,
|
||||
"min_h2_tags": 3,
|
||||
"max_h2_tags": 5,
|
||||
"min_h3_tags": 5,
|
||||
"max_h3_tags": 10
|
||||
},
|
||||
"tier2": {
|
||||
"count": 10,
|
||||
"min_word_count": 1500,
|
||||
"max_word_count": 2000,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 4,
|
||||
"min_h3_tags": 3,
|
||||
"max_h3_tags": 8
|
||||
},
|
||||
"tier3": {
|
||||
"count": 15,
|
||||
"min_word_count": 1000,
|
||||
"max_word_count": 1500,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 3,
|
||||
"min_h3_tags": 2,
|
||||
"max_h3_tags": 6
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"project_id": 2,
|
||||
"tiers": {
|
||||
"tier1": { ... }
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Tier defaults** (constants if not specified in job file):
|
||||
```python
|
||||
TIER_DEFAULTS = {
|
||||
"tier1": {
|
||||
"min_word_count": 2000,
|
||||
"max_word_count": 2500,
|
||||
"min_h2_tags": 3,
|
||||
"max_h2_tags": 5,
|
||||
"min_h3_tags": 5,
|
||||
"max_h3_tags": 10
|
||||
},
|
||||
"tier2": {
|
||||
"min_word_count": 1500,
|
||||
"max_word_count": 2000,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 4,
|
||||
"min_h3_tags": 3,
|
||||
"max_h3_tags": 8
|
||||
},
|
||||
"tier3": {
|
||||
"min_word_count": 1000,
|
||||
"max_word_count": 1500,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 3,
|
||||
"min_h3_tags": 2,
|
||||
"max_h3_tags": 6
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Future extensibility note**: This structure allows adding more fields per job in future stories.
|
||||
|
||||
---
|
||||
|
||||
## **PHASE 2: AI Client & Prompt Management**
|
||||
|
||||
### Task 2.1: Implement AIClient for OpenRouter
|
||||
**File**: `src/generation/ai_client.py`
|
||||
|
||||
**OpenRouter API details**:
|
||||
- Base URL: `https://openrouter.ai/api/v1`
|
||||
- Compatible with OpenAI SDK
|
||||
- Requires `OPENROUTER_API_KEY` env variable
|
||||
|
||||
**Initial model list**:
|
||||
```python
|
||||
AVAILABLE_MODELS = {
|
||||
"gpt-4o-mini": "openai/gpt-4o-mini",
|
||||
"claude-sonnet-4.5": "anthropic/claude-3.5-sonnet"
|
||||
}
|
||||
```
|
||||
|
||||
**Implementation**:
|
||||
```python
|
||||
class AIClient:
|
||||
def __init__(self, api_key: str, model: str, base_url: str = "https://openrouter.ai/api/v1"):
|
||||
self.client = OpenAI(api_key=api_key, base_url=base_url)
|
||||
self.model = model
|
||||
|
||||
def generate_completion(
|
||||
self,
|
||||
prompt: str,
|
||||
system_message: str = None,
|
||||
max_tokens: int = 4000,
|
||||
temperature: float = 0.7,
|
||||
json_mode: bool = False
|
||||
) -> str:
|
||||
"""
|
||||
Generate completion from OpenRouter API
|
||||
json_mode: if True, adds response_format={"type": "json_object"}
|
||||
"""
|
||||
pass
|
||||
```
|
||||
|
||||
**Error handling**: Retry 3x with exponential backoff for network/rate limit errors
|
||||
|
||||
---
|
||||
|
||||
### Task 2.2: Create Prompt Templates
|
||||
**Files**: `src/generation/prompts/*.json`
|
||||
|
||||
**title_generation.json**:
|
||||
```json
|
||||
{
|
||||
"system_message": "You are an expert SEO content writer...",
|
||||
"user_prompt": "Generate an SEO-optimized title for an article about: {keyword}\n\nRelated entities: {entities}\n\nRelated searches: {related_searches}\n\nReturn only the title text, no formatting."
|
||||
}
|
||||
```
|
||||
|
||||
**outline_generation.json**:
|
||||
```json
|
||||
{
|
||||
"system_message": "You are an expert content outliner...",
|
||||
"user_prompt": "Create an article outline for:\nTitle: {title}\nKeyword: {keyword}\n\nConstraints:\n- {min_h2} to {max_h2} H2 headings\n- {min_h3} to {max_h3} H3 subheadings total\n\nEntities: {entities}\nRelated searches: {related_searches}\n\nReturn as JSON: {\"outline\": [{\"h2\": \"...\", \"h3\": [\"...\", \"...\"]}]}"
|
||||
}
|
||||
```
|
||||
|
||||
**content_generation.json**:
|
||||
```json
|
||||
{
|
||||
"system_message": "You are an expert content writer...",
|
||||
"user_prompt": "Write a complete article based on:\nTitle: {title}\nOutline: {outline}\nKeyword: {keyword}\n\nEntities to include: {entities}\nRelated searches: {related_searches}\n\nReturn as HTML fragment with <h2>, <h3>, <p> tags. Do NOT include <html>, <head>, or <body> tags."
|
||||
}
|
||||
```
|
||||
|
||||
**content_augmentation.json**:
|
||||
```json
|
||||
{
|
||||
"system_message": "You are an expert content editor...",
|
||||
"user_prompt": "Please expand on the following article to add more detail and depth, ensuring you maintain the existing topical focus. Target word count: {target_word_count}\n\nCurrent article:\n{content}\n\nReturn the expanded article as an HTML fragment."
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 2.3: Create PromptManager
|
||||
**File**: `src/generation/ai_client.py` (add to same file)
|
||||
|
||||
```python
|
||||
class PromptManager:
|
||||
def __init__(self, prompts_dir: str = "src/generation/prompts"):
|
||||
self.prompts_dir = prompts_dir
|
||||
self.prompts = {}
|
||||
|
||||
def load_prompt(self, prompt_name: str) -> dict:
|
||||
"""Load prompt from JSON file"""
|
||||
pass
|
||||
|
||||
def format_prompt(self, prompt_name: str, **kwargs) -> tuple[str, str]:
|
||||
"""
|
||||
Format prompt with variables
|
||||
Returns: (system_message, user_prompt)
|
||||
"""
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **PHASE 3: Core Generation Pipeline**
|
||||
|
||||
### Task 3.1: Implement ContentGenerator Service
|
||||
**File**: `src/generation/service.py`
|
||||
|
||||
```python
|
||||
class ContentGenerator:
|
||||
def __init__(
|
||||
self,
|
||||
ai_client: AIClient,
|
||||
prompt_manager: PromptManager,
|
||||
project_repo: ProjectRepository,
|
||||
content_repo: GeneratedContentRepository
|
||||
):
|
||||
self.ai_client = ai_client
|
||||
self.prompt_manager = prompt_manager
|
||||
self.project_repo = project_repo
|
||||
self.content_repo = content_repo
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 3.2: Implement Stage 1 - Title Generation
|
||||
**File**: `src/generation/service.py`
|
||||
|
||||
```python
|
||||
def generate_title(self, project_id: int, debug: bool = False) -> str:
|
||||
"""
|
||||
Generate SEO-optimized title
|
||||
|
||||
Returns: title string
|
||||
Saves to debug_output/title_project_{id}_{timestamp}.txt if debug=True
|
||||
"""
|
||||
# Fetch project
|
||||
# Load prompt
|
||||
# Call AI
|
||||
# If debug: save response to debug_output/
|
||||
# Return title
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 3.3: Implement Stage 2 - Outline Generation
|
||||
**File**: `src/generation/service.py`
|
||||
|
||||
```python
|
||||
def generate_outline(
|
||||
self,
|
||||
project_id: int,
|
||||
title: str,
|
||||
min_h2: int,
|
||||
max_h2: int,
|
||||
min_h3: int,
|
||||
max_h3: int,
|
||||
debug: bool = False
|
||||
) -> dict:
|
||||
"""
|
||||
Generate article outline in JSON format
|
||||
|
||||
Returns: {"outline": [{"h2": "...", "h3": ["...", "..."]}]}
|
||||
|
||||
Uses json_mode=True in AI call to ensure JSON response
|
||||
Validates: at least min_h2 headings, at least min_h3 total subheadings
|
||||
Saves to debug_output/outline_project_{id}_{timestamp}.json if debug=True
|
||||
"""
|
||||
pass
|
||||
```
|
||||
|
||||
**Validation**:
|
||||
- Parse JSON response
|
||||
- Count h2 tags (must be >= min_h2)
|
||||
- Count total h3 tags across all h2s (must be >= min_h3)
|
||||
- Raise error if validation fails
|
||||
|
||||
---
|
||||
|
||||
### Task 3.4: Implement Stage 3 - Content Generation
|
||||
**File**: `src/generation/service.py`
|
||||
|
||||
```python
|
||||
def generate_content(
|
||||
self,
|
||||
project_id: int,
|
||||
title: str,
|
||||
outline: dict,
|
||||
debug: bool = False
|
||||
) -> str:
|
||||
"""
|
||||
Generate full article HTML fragment
|
||||
|
||||
Returns: HTML string with <h2>, <h3>, <p> tags
|
||||
Does NOT include <html>, <head>, or <body> tags
|
||||
|
||||
Saves to debug_output/content_project_{id}_{timestamp}.html if debug=True
|
||||
"""
|
||||
pass
|
||||
```
|
||||
|
||||
**HTML fragment format**:
|
||||
```html
|
||||
<h2>First Heading</h2>
|
||||
<p>Paragraph content...</p>
|
||||
<h3>Subheading</h3>
|
||||
<p>More content...</p>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 3.5: Implement Word Count Validation
|
||||
**File**: `src/generation/service.py`
|
||||
|
||||
```python
|
||||
def validate_word_count(self, content: str, min_words: int, max_words: int) -> tuple[bool, int]:
|
||||
"""
|
||||
Validate content word count
|
||||
|
||||
Returns: (is_valid, actual_count)
|
||||
- is_valid: True if min_words <= actual_count <= max_words
|
||||
- actual_count: number of words in content
|
||||
|
||||
Implementation: Strip HTML tags, split on whitespace, count tokens
|
||||
"""
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 3.6: Implement Simple Augmentation
|
||||
**File**: `src/generation/service.py`
|
||||
|
||||
```python
|
||||
def augment_content(
|
||||
self,
|
||||
content: str,
|
||||
target_word_count: int,
|
||||
debug: bool = False
|
||||
) -> str:
|
||||
"""
|
||||
Expand article content to meet minimum word count
|
||||
|
||||
Called ONLY if word_count < min_word_count
|
||||
Makes ONE API call only
|
||||
|
||||
Saves to debug_output/augmented_project_{id}_{timestamp}.html if debug=True
|
||||
"""
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **PHASE 4: Batch Processing**
|
||||
|
||||
### Task 4.1: Create JobConfig Parser
|
||||
**File**: `src/generation/job_config.py`
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass
|
||||
from typing import Optional
|
||||
|
||||
TIER_DEFAULTS = {
|
||||
"tier1": {
|
||||
"min_word_count": 2000,
|
||||
"max_word_count": 2500,
|
||||
"min_h2_tags": 3,
|
||||
"max_h2_tags": 5,
|
||||
"min_h3_tags": 5,
|
||||
"max_h3_tags": 10
|
||||
},
|
||||
"tier2": {
|
||||
"min_word_count": 1500,
|
||||
"max_word_count": 2000,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 4,
|
||||
"min_h3_tags": 3,
|
||||
"max_h3_tags": 8
|
||||
},
|
||||
"tier3": {
|
||||
"min_word_count": 1000,
|
||||
"max_word_count": 1500,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 3,
|
||||
"min_h3_tags": 2,
|
||||
"max_h3_tags": 6
|
||||
}
|
||||
}
|
||||
|
||||
@dataclass
|
||||
class TierConfig:
|
||||
count: int
|
||||
min_word_count: int
|
||||
max_word_count: int
|
||||
min_h2_tags: int
|
||||
max_h2_tags: int
|
||||
min_h3_tags: int
|
||||
max_h3_tags: int
|
||||
|
||||
@dataclass
|
||||
class Job:
|
||||
project_id: int
|
||||
tiers: dict[str, TierConfig]
|
||||
|
||||
class JobConfig:
|
||||
def __init__(self, job_file_path: str):
|
||||
"""Load and parse job file, apply defaults"""
|
||||
pass
|
||||
|
||||
def get_jobs(self) -> list[Job]:
|
||||
"""Return list of all jobs in file"""
|
||||
pass
|
||||
|
||||
def get_tier_config(self, job: Job, tier_name: str) -> Optional[TierConfig]:
|
||||
"""Get tier config with defaults applied"""
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 4.2: Create BatchProcessor
|
||||
**File**: `src/generation/batch_processor.py`
|
||||
|
||||
```python
|
||||
class BatchProcessor:
|
||||
def __init__(
|
||||
self,
|
||||
content_generator: ContentGenerator,
|
||||
content_repo: GeneratedContentRepository,
|
||||
project_repo: ProjectRepository
|
||||
):
|
||||
pass
|
||||
|
||||
def process_job(
|
||||
self,
|
||||
job_file_path: str,
|
||||
debug: bool = False,
|
||||
continue_on_error: bool = False
|
||||
):
|
||||
"""
|
||||
Process all jobs in job file
|
||||
|
||||
For each job:
|
||||
For each tier:
|
||||
For count times:
|
||||
1. Generate title (log to console)
|
||||
2. Generate outline
|
||||
3. Generate content
|
||||
4. Validate word count
|
||||
5. If below min, augment once
|
||||
6. Save to GeneratedContent table
|
||||
|
||||
Logs progress to console
|
||||
If debug=True, saves AI responses to debug_output/
|
||||
"""
|
||||
pass
|
||||
```
|
||||
|
||||
**Console output format**:
|
||||
```
|
||||
Processing Job 1/3: Project ID 5
|
||||
Tier 1: Generating 5 articles
|
||||
[1/5] Generating title... "Ultimate Guide to SEO in 2025"
|
||||
[1/5] Generating outline... 4 H2s, 8 H3s
|
||||
[1/5] Generating content... 1,845 words
|
||||
[1/5] Below minimum (2000), augmenting... 2,123 words
|
||||
[1/5] Saved (ID: 42, Status: augmented)
|
||||
[2/5] Generating title... "Advanced SEO Techniques"
|
||||
...
|
||||
Tier 2: Generating 10 articles
|
||||
...
|
||||
|
||||
Summary:
|
||||
Jobs processed: 3/3
|
||||
Articles generated: 45/45
|
||||
Augmented: 12
|
||||
Failed: 0
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 4.3: Error Handling & Retry Logic
|
||||
**File**: `src/generation/batch_processor.py`
|
||||
|
||||
**Error handling strategy**:
|
||||
- AI API errors: Log error, mark as `status='failed'`, save to DB
|
||||
- If `continue_on_error=True`: continue to next article
|
||||
- If `continue_on_error=False`: stop batch processing
|
||||
- Database errors: Always abort (data integrity)
|
||||
- Invalid job file: Fail fast with validation error
|
||||
|
||||
**Retry logic** (in AIClient):
|
||||
- Network errors: 3 retries with exponential backoff (1s, 2s, 4s)
|
||||
- Rate limit errors: Respect Retry-After header
|
||||
- Other errors: No retry, raise immediately
|
||||
|
||||
---
|
||||
|
||||
## **PHASE 5: CLI Integration**
|
||||
|
||||
### Task 5.1: Add generate-batch Command
|
||||
**File**: `src/cli/commands.py`
|
||||
|
||||
```python
|
||||
@app.command("generate-batch")
|
||||
@click.option('--job-file', '-j', required=True, type=click.Path(exists=True),
|
||||
help='Path to job JSON file')
|
||||
@click.option('--username', '-u', help='Username for authentication')
|
||||
@click.option('--password', '-p', help='Password for authentication')
|
||||
@click.option('--debug', is_flag=True, help='Save AI responses to debug_output/')
|
||||
@click.option('--continue-on-error', is_flag=True,
|
||||
help='Continue processing if article generation fails')
|
||||
@click.option('--model', '-m', default='gpt-4o-mini',
|
||||
help='AI model to use (gpt-4o-mini, claude-sonnet-4.5)')
|
||||
def generate_batch(
|
||||
job_file: str,
|
||||
username: Optional[str],
|
||||
password: Optional[str],
|
||||
debug: bool,
|
||||
continue_on_error: bool,
|
||||
model: str
|
||||
):
|
||||
"""Generate content batch from job file"""
|
||||
# Authenticate user
|
||||
# Initialize AIClient with OpenRouter
|
||||
# Initialize PromptManager, ContentGenerator, BatchProcessor
|
||||
# Call process_job()
|
||||
# Show summary
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 5.2: Add Progress Logging & Debug Output
|
||||
**File**: `src/generation/batch_processor.py`
|
||||
|
||||
**Debug output** (when `--debug` flag used):
|
||||
- Create `debug_output/` directory if not exists
|
||||
- For each AI call, save response to file:
|
||||
- `debug_output/title_project{id}_tier{tier}_{n}_{timestamp}.txt`
|
||||
- `debug_output/outline_project{id}_tier{tier}_{n}_{timestamp}.json`
|
||||
- `debug_output/content_project{id}_tier{tier}_{n}_{timestamp}.html`
|
||||
- `debug_output/augmented_project{id}_tier{tier}_{n}_{timestamp}.html`
|
||||
- Also echo to console with `click.echo()`
|
||||
|
||||
**Normal output** (without `--debug`):
|
||||
- Always show title when generated: `"Generated title: {title}"`
|
||||
- Show word counts and status
|
||||
- Show progress counter `[n/total]`
|
||||
|
||||
---
|
||||
|
||||
## **PHASE 6: Testing & Validation**
|
||||
|
||||
### Task 6.1: Create Unit Tests
|
||||
|
||||
#### `tests/unit/test_ai_client.py`
|
||||
```python
|
||||
def test_generate_completion_success():
|
||||
"""Test successful AI completion"""
|
||||
pass
|
||||
|
||||
def test_generate_completion_json_mode():
|
||||
"""Test JSON mode returns valid JSON"""
|
||||
pass
|
||||
|
||||
def test_generate_completion_retry_on_network_error():
|
||||
"""Test retry logic for network errors"""
|
||||
pass
|
||||
```
|
||||
|
||||
#### `tests/unit/test_content_generator.py`
|
||||
```python
|
||||
def test_generate_title():
|
||||
"""Test title generation with mocked AI response"""
|
||||
pass
|
||||
|
||||
def test_generate_outline_valid_structure():
|
||||
"""Test outline generation returns valid JSON with min h2/h3"""
|
||||
pass
|
||||
|
||||
def test_generate_content_html_fragment():
|
||||
"""Test content is HTML fragment (no <html> tag)"""
|
||||
pass
|
||||
|
||||
def test_validate_word_count():
|
||||
"""Test word count validation with various HTML inputs"""
|
||||
pass
|
||||
|
||||
def test_augment_content_called_once():
|
||||
"""Test augmentation only called once"""
|
||||
pass
|
||||
```
|
||||
|
||||
#### `tests/unit/test_job_config.py`
|
||||
```python
|
||||
def test_load_job_config_valid():
|
||||
"""Test loading valid job file"""
|
||||
pass
|
||||
|
||||
def test_tier_defaults_applied():
|
||||
"""Test defaults applied when not in job file"""
|
||||
pass
|
||||
|
||||
def test_multiple_jobs_in_file():
|
||||
"""Test parsing file with multiple jobs"""
|
||||
pass
|
||||
```
|
||||
|
||||
#### `tests/unit/test_batch_processor.py`
|
||||
```python
|
||||
def test_process_job_success():
|
||||
"""Test successful batch processing"""
|
||||
pass
|
||||
|
||||
def test_process_job_with_augmentation():
|
||||
"""Test articles below min word count are augmented"""
|
||||
pass
|
||||
|
||||
def test_process_job_continue_on_error():
|
||||
"""Test continue_on_error flag behavior"""
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 6.2: Create Integration Test
|
||||
**File**: `tests/integration/test_generate_batch.py`
|
||||
|
||||
```python
|
||||
def test_generate_batch_end_to_end(test_db, mock_ai_client):
|
||||
"""
|
||||
End-to-end test:
|
||||
1. Create test project in DB
|
||||
2. Create test job file
|
||||
3. Run batch processor
|
||||
4. Verify GeneratedContent records created
|
||||
5. Verify word counts within range
|
||||
6. Verify HTML structure
|
||||
"""
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Task 6.3: Create Example Job Files
|
||||
|
||||
#### `jobs/example_tier1_batch.json`
|
||||
```json
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
(Uses all defaults for tier1)
|
||||
|
||||
#### `jobs/example_multi_tier_batch.json`
|
||||
```json
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5,
|
||||
"min_word_count": 2200,
|
||||
"max_word_count": 2600
|
||||
},
|
||||
"tier2": {
|
||||
"count": 10
|
||||
},
|
||||
"tier3": {
|
||||
"count": 15,
|
||||
"max_h2_tags": 4
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"project_id": 2,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 3
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
#### `jobs/README.md`
|
||||
Document job file format and examples
|
||||
|
||||
---
|
||||
|
||||
## **PHASE 7: Cleanup & Deprecation**
|
||||
|
||||
### Task 7.1: Remove Old ContentRuleEngine
|
||||
**Action**: Delete or gut `src/generation/rule_engine.py`
|
||||
|
||||
Only keep if it has reusable utilities. Otherwise remove entirely.
|
||||
|
||||
---
|
||||
|
||||
### Task 7.2: Remove Old Validator Logic
|
||||
**Action**: Review `src/generation/validator.py` (if exists)
|
||||
|
||||
Remove any strict CORA validation beyond word count. Keep only simple validation utilities.
|
||||
|
||||
---
|
||||
|
||||
### Task 7.3: Update Documentation
|
||||
**Files to update**:
|
||||
- `docs/stories/story-2.2. simplified-ai-content-generation.md` - Status to "In Progress" → "Done"
|
||||
- `docs/architecture/workflows.md` - Document simplified generation flow
|
||||
- `docs/architecture/components.md` - Update generation component description
|
||||
|
||||
---
|
||||
|
||||
## Implementation Order Recommendation
|
||||
|
||||
1. **Phase 1** (Data Layer) - Required foundation
|
||||
2. **Phase 2** (AI Client) - Required for generation
|
||||
3. **Phase 3** (Core Logic) - Implement one stage at a time, test each
|
||||
4. **Phase 4** (Batch Processing) - Orchestrate stages
|
||||
5. **Phase 5** (CLI) - Make accessible to users
|
||||
6. **Phase 6** (Testing) - Can be done in parallel with implementation
|
||||
7. **Phase 7** (Cleanup) - Final polish
|
||||
|
||||
**Estimated effort**:
|
||||
- Phase 1-2: 4-6 hours
|
||||
- Phase 3: 6-8 hours
|
||||
- Phase 4: 3-4 hours
|
||||
- Phase 5: 2-3 hours
|
||||
- Phase 6: 4-6 hours
|
||||
- Phase 7: 1-2 hours
|
||||
- **Total**: 20-29 hours
|
||||
|
||||
---
|
||||
|
||||
## Critical Dev Notes
|
||||
|
||||
### OpenRouter Specifics
|
||||
- API key from environment: `OPENROUTER_API_KEY`
|
||||
- Model format: `"provider/model-name"`
|
||||
- Supports OpenAI SDK drop-in replacement
|
||||
- Rate limits vary by model (check OpenRouter docs)
|
||||
|
||||
### HTML Fragment Format
|
||||
Content generation returns HTML like:
|
||||
```html
|
||||
<h2>Main Topic</h2>
|
||||
<p>Introduction paragraph with relevant keywords and entities.</p>
|
||||
<h3>Subtopic One</h3>
|
||||
<p>Detailed content about subtopic.</p>
|
||||
<h3>Subtopic Two</h3>
|
||||
<p>More detailed content.</p>
|
||||
<h2>Second Main Topic</h2>
|
||||
<p>Content continues...</p>
|
||||
```
|
||||
|
||||
**No document structure**: No `<!DOCTYPE>`, `<html>`, `<head>`, or `<body>` tags.
|
||||
|
||||
### Word Count Method
|
||||
```python
|
||||
import re
|
||||
from html import unescape
|
||||
|
||||
def count_words(html_content: str) -> int:
|
||||
# Strip HTML tags
|
||||
text = re.sub(r'<[^>]+>', '', html_content)
|
||||
# Unescape HTML entities
|
||||
text = unescape(text)
|
||||
# Split and count
|
||||
words = text.split()
|
||||
return len(words)
|
||||
```
|
||||
|
||||
### Debug Output Directory
|
||||
- Create `debug_output/` at project root if not exists
|
||||
- Add to `.gitignore`
|
||||
- Filename format: `{stage}_project{id}_tier{tier}_article{n}_{timestamp}.{ext}`
|
||||
- Example: `title_project5_tier1_article3_20251020_143022.txt`
|
||||
|
||||
### Tier Constants Location
|
||||
Define in `src/generation/job_config.py` as module-level constant for easy reference.
|
||||
|
||||
### Future Extensibility
|
||||
Job file structure designed to support:
|
||||
- Custom interlinking rules (Story 2.4+)
|
||||
- Template selection (Story 3.x)
|
||||
- Deployment targets (Story 4.x)
|
||||
- SEO metadata overrides
|
||||
|
||||
Keep job parsing flexible to add new fields without breaking existing jobs.
|
||||
|
||||
---
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Test Mocking
|
||||
Mock `AIClient.generate_completion()` to return realistic HTML:
|
||||
```python
|
||||
@pytest.fixture
|
||||
def mock_title_response():
|
||||
return "The Ultimate Guide to Sustainable Gardening in 2025"
|
||||
|
||||
@pytest.fixture
|
||||
def mock_outline_response():
|
||||
return {
|
||||
"outline": [
|
||||
{"h2": "Getting Started", "h3": ["Tools", "Planning"]},
|
||||
{"h2": "Best Practices", "h3": ["Watering", "Composting"]}
|
||||
]
|
||||
}
|
||||
|
||||
@pytest.fixture
|
||||
def mock_content_response():
|
||||
return """<h2>Getting Started</h2>
|
||||
<p>Sustainable gardening begins with proper planning...</p>
|
||||
<h3>Tools</h3>
|
||||
<p>Essential tools include...</p>"""
|
||||
```
|
||||
|
||||
### Integration Test Database
|
||||
Use `conftest.py` fixture with in-memory SQLite and test data:
|
||||
```python
|
||||
@pytest.fixture
|
||||
def test_project(test_db):
|
||||
project_repo = ProjectRepository(test_db)
|
||||
return project_repo.create(
|
||||
user_id=1,
|
||||
name="Test Project",
|
||||
data={
|
||||
"main_keyword": "sustainable gardening",
|
||||
"entities": ["composting", "organic soil"],
|
||||
"related_searches": ["how to compost", "organic gardening tips"]
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
Story is complete when:
|
||||
1. All database models and repositories implemented
|
||||
2. AIClient successfully calls OpenRouter API
|
||||
3. Three-stage generation pipeline works end-to-end
|
||||
4. Batch processor handles multiple jobs/tiers
|
||||
5. CLI command `generate-batch` functional
|
||||
6. Debug output saves to `debug_output/` when `--debug` used
|
||||
7. All unit tests pass
|
||||
8. Integration test demonstrates full workflow
|
||||
9. Example job files work correctly
|
||||
10. Documentation updated
|
||||
|
||||
**Acceptance**: Run `generate-batch` on real project, verify content saved to database with correct word count and structure.
|
||||
|
||||
|
|
@ -0,0 +1,40 @@
|
|||
# Story 2.2: Simplified AI Content Generation via Batch Job
|
||||
|
||||
## Status
|
||||
Completed
|
||||
|
||||
## Story
|
||||
**As a** User,
|
||||
**I want** to control AI content generation via a batch file that specifies word count and heading limits,
|
||||
**so that** I can easily create topically relevant articles without unnecessary complexity or rigid validation.
|
||||
|
||||
## Acceptance Criteria
|
||||
1. **Batch Job Control:** The `generate-batch` command accepts a JSON job file that specifies `min_word_count`, `max_word_count`, `max_h2_tags`, and `max_h3_tags` for each tier.
|
||||
2. **Three-Stage Generation:** The system uses a simple three-stage pipeline:
|
||||
* Generates a title using the project's SEO data.
|
||||
* Generates an outline based on the title, SEO data, and the `max_h2`/`max_h3` limits from the job file.
|
||||
* Generates the full article content based on the validated outline.
|
||||
3. **SEO Data Integration:** The generation process for all stages is informed by the project's `keyword`, `entities`, and `related_searches` to ensure topical relevance.
|
||||
4. **Word Count Validation:** After generation, the system validates the content *only* against the `min_word_count` and `max_word_count` specified in the job file.
|
||||
5. **Simple Augmentation:** If the generated content is below `min_word_count`, the system makes **one** attempt to append additional content using a simple "expand on this article" prompt.
|
||||
6. **Database Storage:** The final generated title, outline, and content are stored in the `GeneratedContent` table.
|
||||
7. **CLI Execution:** The `generate-batch` command successfully runs the job, logs progress to the console, and indicates when the process is complete.
|
||||
|
||||
## Dev Notes
|
||||
* **Objective:** This story replaces the previous, overly complex stories 2.2 and 2.3. The goal is maximum simplicity and user control via the job file.
|
||||
* **Key Change:** Remove the entire `ContentRuleEngine` and all strict CORA validation logic. The only validation required is a final word count check.
|
||||
* **Job File is King:** All operational parameters (`min_word_count`, `max_word_count`, `max_h2_tags`, `max_h3_tags`) must be read from the job file for each tier being processed.
|
||||
* **Augmentation:** Keep it simple. If `word_count < min_word_count`, make a single API call to the AI with a prompt like: "Please expand on the following article to add more detail and depth, ensuring you maintain the existing topical focus. Here is the article: {content}". Do not create a complex augmentation system.
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
See **[story-2.2-task-breakdown.md](story-2.2-task-breakdown.md)** for detailed implementation tasks.
|
||||
|
||||
The task breakdown is organized into 7 phases:
|
||||
1. **Phase 1**: Data Model & Schema Design (GeneratedContent table, repositories, job file schema)
|
||||
2. **Phase 2**: AI Client & Prompt Management (OpenRouter integration, prompt templates)
|
||||
3. **Phase 3**: Core Generation Pipeline (title, outline, content generation with validation)
|
||||
4. **Phase 4**: Batch Processing (job config parser, batch processor, error handling)
|
||||
5. **Phase 5**: CLI Integration (generate-batch command, progress logging, debug output)
|
||||
6. **Phase 6**: Testing & Validation (unit tests, integration tests, example job files)
|
||||
7. **Phase 7**: Cleanup & Deprecation (remove old rule engine and validators)
|
||||
|
|
@ -2,7 +2,7 @@
|
|||
DATABASE_URL=sqlite:///./content_automation.db
|
||||
|
||||
# AI Service Configuration (OpenRouter)
|
||||
AI_API_KEY=your_openrouter_api_key_here
|
||||
OPENROUTER_API_KEY=your_openrouter_api_key_here
|
||||
AI_API_BASE_URL=https://openrouter.ai/api/v1
|
||||
AI_MODEL=anthropic/claude-3.5-sonnet
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,16 @@
|
|||
[33m5b5bd1b[m[33m ([m[1;36mHEAD[m[33m -> [m[1;32mfeature/tier-word-count-override[m[33m)[m Add tier-specific word count and outline controls
|
||||
[33m3063fc4[m[33m ([m[1;31morigin/main[m[33m, [m[1;31morigin/HEAD[m[33m, [m[1;32mmain[m[33m)[m Story 2.3 - content generation script nightmare alomst done - fixed (maybe) outline too big issue
|
||||
[33mb6b0acf[m Story 2.3 - content generation script nightmare alomst done - pre-fix outline too big issue
|
||||
[33mf73b070[m[33m ([m[1;31mgithub/main[m[33m)[m Story 2.3 - content generation script finished - fix ci
|
||||
[33me2afabb[m Story 2.3 - content generation script finished
|
||||
[33m0069e6e[m Story 2.2 - rule engine finished
|
||||
[33md81537f[m Story 2.1 finished
|
||||
[33m02dd5a3[m Story 2.1 finished
|
||||
[33m29ecaec[m Story 1.7 finished
|
||||
[33mda797c2[m Story 1.6 finished - added sync
|
||||
[33m4cada9d[m Story 1.6 finished
|
||||
[33mb6e495e[m feat: Story 1.5 - CLI User Management
|
||||
[33m0a223e2[m Complete Story 1.4: Internal API Foundation
|
||||
[33m8641bca[m Complete Epic 1 Stories 1.1-1.3: Foundation, Database, and Authentication
|
||||
[33m70b9de2[m feat: Complete Story 1.1 - Project Initialization & Configuration
|
||||
[33m31b9580[m Initial commit: Project structure and planning documents
|
||||
|
|
@ -0,0 +1,179 @@
|
|||
# Job File Format
|
||||
|
||||
Job files define batch content generation parameters using JSON format.
|
||||
|
||||
## Structure
|
||||
|
||||
```json
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5,
|
||||
"min_word_count": 2000,
|
||||
"max_word_count": 2500,
|
||||
"min_h2_tags": 3,
|
||||
"max_h2_tags": 5,
|
||||
"min_h3_tags": 5,
|
||||
"max_h3_tags": 10
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Fields
|
||||
|
||||
### Job Level
|
||||
- `project_id` (required): The project ID to generate content for
|
||||
- `tiers` (required): Dictionary of tier configurations
|
||||
|
||||
### Tier Level
|
||||
- `count` (required): Number of articles to generate for this tier
|
||||
- `min_word_count` (optional): Minimum word count (uses defaults if not specified)
|
||||
- `max_word_count` (optional): Maximum word count (uses defaults if not specified)
|
||||
- `min_h2_tags` (optional): Minimum H2 headings (uses defaults if not specified)
|
||||
- `max_h2_tags` (optional): Maximum H2 headings (uses defaults if not specified)
|
||||
- `min_h3_tags` (optional): Minimum H3 subheadings total (uses defaults if not specified)
|
||||
- `max_h3_tags` (optional): Maximum H3 subheadings total (uses defaults if not specified)
|
||||
|
||||
## Tier Defaults
|
||||
|
||||
If tier parameters are not specified, these defaults are used:
|
||||
|
||||
### tier1
|
||||
- `min_word_count`: 2000
|
||||
- `max_word_count`: 2500
|
||||
- `min_h2_tags`: 3
|
||||
- `max_h2_tags`: 5
|
||||
- `min_h3_tags`: 5
|
||||
- `max_h3_tags`: 10
|
||||
|
||||
### tier2
|
||||
- `min_word_count`: 1500
|
||||
- `max_word_count`: 2000
|
||||
- `min_h2_tags`: 2
|
||||
- `max_h2_tags`: 4
|
||||
- `min_h3_tags`: 3
|
||||
- `max_h3_tags`: 8
|
||||
|
||||
### tier3
|
||||
- `min_word_count`: 1000
|
||||
- `max_word_count`: 1500
|
||||
- `min_h2_tags`: 2
|
||||
- `max_h2_tags`: 3
|
||||
- `min_h3_tags`: 2
|
||||
- `max_h3_tags`: 6
|
||||
|
||||
## Examples
|
||||
|
||||
### Simple: Single Tier with Defaults
|
||||
```json
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Custom Word Counts
|
||||
```json
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 3,
|
||||
"min_word_count": 2500,
|
||||
"max_word_count": 3000
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Multi-Tier
|
||||
```json
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5
|
||||
},
|
||||
"tier2": {
|
||||
"count": 10
|
||||
},
|
||||
"tier3": {
|
||||
"count": 15
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Multiple Projects
|
||||
```json
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"project_id": 2,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 3
|
||||
},
|
||||
"tier2": {
|
||||
"count": 8
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
Run batch generation with:
|
||||
|
||||
```bash
|
||||
python main.py generate-batch --job-file jobs/example_tier1_batch.json --username youruser --password yourpass
|
||||
```
|
||||
|
||||
### Options
|
||||
- `--job-file, -j`: Path to job JSON file (required)
|
||||
- `--username, -u`: Username for authentication
|
||||
- `--password, -p`: Password for authentication
|
||||
- `--debug`: Save AI responses to debug_output/
|
||||
- `--continue-on-error`: Continue processing if article generation fails
|
||||
- `--model, -m`: AI model to use (default: gpt-4o-mini)
|
||||
|
||||
### Debug Mode
|
||||
|
||||
When using `--debug`, AI responses are saved to `debug_output/`:
|
||||
- `title_project{id}_tier{tier}_article{n}_{timestamp}.txt`
|
||||
- `outline_project{id}_tier{tier}_article{n}_{timestamp}.json`
|
||||
- `content_project{id}_tier{tier}_article{n}_{timestamp}.html`
|
||||
- `augmented_project{id}_tier{tier}_article{n}_{timestamp}.html` (if augmented)
|
||||
|
||||
|
|
@ -0,0 +1,30 @@
|
|||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5,
|
||||
"min_word_count": 2200,
|
||||
"max_word_count": 2600
|
||||
},
|
||||
"tier2": {
|
||||
"count": 10
|
||||
},
|
||||
"tier3": {
|
||||
"count": 15,
|
||||
"max_h2_tags": 4
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"project_id": 2,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 3
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,13 @@
|
|||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,19 @@
|
|||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 1,
|
||||
"min_word_count": 2000,
|
||||
"max_word_count": 2500,
|
||||
"min_h2_tags": 3,
|
||||
"max_h2_tags": 5,
|
||||
"min_h3_tags": 5,
|
||||
"max_h3_tags": 10
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,19 @@
|
|||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 1,
|
||||
"min_word_count": 500,
|
||||
"max_word_count": 800,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 3,
|
||||
"min_h3_tags": 3,
|
||||
"max_h3_tags": 6
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,27 @@
|
|||
import sys
|
||||
from pathlib import Path
|
||||
project_root = Path(__file__).parent.parent
|
||||
sys.path.insert(0, str(project_root))
|
||||
|
||||
from src.database.session import db_manager
|
||||
from src.database.repositories import UserRepository
|
||||
from src.auth.service import AuthService
|
||||
|
||||
db_manager.initialize()
|
||||
session = db_manager.get_session()
|
||||
|
||||
try:
|
||||
user_repo = UserRepository(session)
|
||||
auth_service = AuthService(user_repo)
|
||||
|
||||
user = auth_service.create_user_with_hashed_password(
|
||||
username="admin",
|
||||
password="admin1234",
|
||||
role="Admin"
|
||||
)
|
||||
|
||||
print(f"Admin user created: {user.username}")
|
||||
finally:
|
||||
session.close()
|
||||
db_manager.close()
|
||||
|
||||
|
|
@ -16,6 +16,11 @@ from src.deployment.bunnynet import (
|
|||
BunnyNetResourceConflictError
|
||||
)
|
||||
from src.ingestion.parser import CORAParser, CORAParseError
|
||||
from src.generation.ai_client import AIClient, PromptManager
|
||||
from src.generation.service import ContentGenerator
|
||||
from src.generation.batch_processor import BatchProcessor
|
||||
from src.database.repositories import GeneratedContentRepository
|
||||
import os
|
||||
|
||||
|
||||
def authenticate_admin(username: str, password: str) -> Optional[User]:
|
||||
|
|
@ -871,5 +876,84 @@ def list_projects(username: Optional[str], password: Optional[str]):
|
|||
raise click.Abort()
|
||||
|
||||
|
||||
@app.command("generate-batch")
|
||||
@click.option('--job-file', '-j', required=True, type=click.Path(exists=True),
|
||||
help='Path to job JSON file')
|
||||
@click.option('--username', '-u', help='Username for authentication')
|
||||
@click.option('--password', '-p', help='Password for authentication')
|
||||
@click.option('--debug', is_flag=True, help='Save AI responses to debug_output/')
|
||||
@click.option('--continue-on-error', is_flag=True,
|
||||
help='Continue processing if article generation fails')
|
||||
@click.option('--model', '-m', default='gpt-4o-mini',
|
||||
help='AI model to use (gpt-4o-mini, claude-sonnet-4.5)')
|
||||
def generate_batch(
|
||||
job_file: str,
|
||||
username: Optional[str],
|
||||
password: Optional[str],
|
||||
debug: bool,
|
||||
continue_on_error: bool,
|
||||
model: str
|
||||
):
|
||||
"""Generate content batch from job file"""
|
||||
try:
|
||||
if not username or not password:
|
||||
username, password = prompt_admin_credentials()
|
||||
|
||||
session = db_manager.get_session()
|
||||
try:
|
||||
user_repo = UserRepository(session)
|
||||
auth_service = AuthService(user_repo)
|
||||
|
||||
user = auth_service.authenticate_user(username, password)
|
||||
if not user:
|
||||
click.echo("Error: Authentication failed", err=True)
|
||||
raise click.Abort()
|
||||
|
||||
click.echo(f"Authenticated as: {user.username} ({user.role})")
|
||||
|
||||
api_key = os.getenv("OPENROUTER_API_KEY")
|
||||
if not api_key:
|
||||
click.echo("Error: OPENROUTER_API_KEY not found in environment", err=True)
|
||||
click.echo("Please set OPENROUTER_API_KEY in your .env file", err=True)
|
||||
raise click.Abort()
|
||||
|
||||
click.echo(f"Initializing AI client with model: {model}")
|
||||
ai_client = AIClient(api_key=api_key, model=model)
|
||||
prompt_manager = PromptManager()
|
||||
|
||||
project_repo = ProjectRepository(session)
|
||||
content_repo = GeneratedContentRepository(session)
|
||||
|
||||
content_generator = ContentGenerator(
|
||||
ai_client=ai_client,
|
||||
prompt_manager=prompt_manager,
|
||||
project_repo=project_repo,
|
||||
content_repo=content_repo
|
||||
)
|
||||
|
||||
batch_processor = BatchProcessor(
|
||||
content_generator=content_generator,
|
||||
content_repo=content_repo,
|
||||
project_repo=project_repo
|
||||
)
|
||||
|
||||
click.echo(f"\nProcessing job file: {job_file}")
|
||||
if debug:
|
||||
click.echo("Debug mode: AI responses will be saved to debug_output/\n")
|
||||
|
||||
batch_processor.process_job(
|
||||
job_file_path=job_file,
|
||||
debug=debug,
|
||||
continue_on_error=continue_on_error
|
||||
)
|
||||
|
||||
finally:
|
||||
session.close()
|
||||
|
||||
except Exception as e:
|
||||
click.echo(f"Error processing batch: {e}", err=True)
|
||||
raise click.Abort()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
app()
|
||||
|
|
|
|||
|
|
@ -3,7 +3,7 @@ SQLAlchemy database models
|
|||
"""
|
||||
|
||||
from datetime import datetime, timezone
|
||||
from typing import Literal, Optional
|
||||
from typing import Optional
|
||||
from sqlalchemy import String, Integer, DateTime, Float, ForeignKey, JSON, Text
|
||||
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
|
||||
|
||||
|
|
@ -115,4 +115,29 @@ class Project(Base):
|
|||
)
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return f"<Project(id={self.id}, name='{self.name}', main_keyword='{self.main_keyword}', user_id={self.user_id})>"
|
||||
return f"<Project(id={self.id}, name='{self.name}', main_keyword='{self.main_keyword}', user_id={self.user_id})>"
|
||||
|
||||
|
||||
class GeneratedContent(Base):
|
||||
"""Generated content model for AI-created articles"""
|
||||
__tablename__ = "generated_content"
|
||||
|
||||
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
|
||||
project_id: Mapped[int] = mapped_column(Integer, ForeignKey('projects.id'), nullable=False, index=True)
|
||||
tier: Mapped[str] = mapped_column(String(20), nullable=False, index=True)
|
||||
keyword: Mapped[str] = mapped_column(String(255), nullable=False, index=True)
|
||||
title: Mapped[str] = mapped_column(Text, nullable=False)
|
||||
outline: Mapped[dict] = mapped_column(JSON, nullable=False)
|
||||
content: Mapped[str] = mapped_column(Text, nullable=False)
|
||||
word_count: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||
status: Mapped[str] = mapped_column(String(20), nullable=False)
|
||||
created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
|
||||
updated_at: Mapped[datetime] = mapped_column(
|
||||
DateTime,
|
||||
default=datetime.utcnow,
|
||||
onupdate=datetime.utcnow,
|
||||
nullable=False
|
||||
)
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return f"<GeneratedContent(id={self.id}, project_id={self.project_id}, tier='{self.tier}', status='{self.status}')>"
|
||||
|
|
@ -6,7 +6,7 @@ from typing import Optional, List, Dict, Any
|
|||
from sqlalchemy.orm import Session
|
||||
from sqlalchemy.exc import IntegrityError
|
||||
from src.database.interfaces import IUserRepository, ISiteDeploymentRepository, IProjectRepository
|
||||
from src.database.models import User, SiteDeployment, Project
|
||||
from src.database.models import User, SiteDeployment, Project, GeneratedContent
|
||||
|
||||
|
||||
class UserRepository(IUserRepository):
|
||||
|
|
@ -373,3 +373,88 @@ class ProjectRepository(IProjectRepository):
|
|||
self.session.commit()
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
class GeneratedContentRepository:
|
||||
"""Repository for GeneratedContent data access"""
|
||||
|
||||
def __init__(self, session: Session):
|
||||
self.session = session
|
||||
|
||||
def create(
|
||||
self,
|
||||
project_id: int,
|
||||
tier: str,
|
||||
keyword: str,
|
||||
title: str,
|
||||
outline: dict,
|
||||
content: str,
|
||||
word_count: int,
|
||||
status: str
|
||||
) -> GeneratedContent:
|
||||
"""
|
||||
Create a new generated content record
|
||||
|
||||
Args:
|
||||
project_id: The project ID this content belongs to
|
||||
tier: Content tier (tier1, tier2, tier3)
|
||||
keyword: The keyword used for generation
|
||||
title: Generated title
|
||||
outline: Generated outline (JSON)
|
||||
content: Generated HTML content
|
||||
word_count: Final word count
|
||||
status: Status (generated, augmented, failed)
|
||||
|
||||
Returns:
|
||||
The created GeneratedContent object
|
||||
"""
|
||||
content_record = GeneratedContent(
|
||||
project_id=project_id,
|
||||
tier=tier,
|
||||
keyword=keyword,
|
||||
title=title,
|
||||
outline=outline,
|
||||
content=content,
|
||||
word_count=word_count,
|
||||
status=status
|
||||
)
|
||||
|
||||
self.session.add(content_record)
|
||||
self.session.commit()
|
||||
self.session.refresh(content_record)
|
||||
return content_record
|
||||
|
||||
def get_by_id(self, content_id: int) -> Optional[GeneratedContent]:
|
||||
"""Get content by ID"""
|
||||
return self.session.query(GeneratedContent).filter(GeneratedContent.id == content_id).first()
|
||||
|
||||
def get_by_project_id(self, project_id: int) -> List[GeneratedContent]:
|
||||
"""Get all content for a project"""
|
||||
return self.session.query(GeneratedContent).filter(GeneratedContent.project_id == project_id).all()
|
||||
|
||||
def get_by_project_and_tier(self, project_id: int, tier: str) -> List[GeneratedContent]:
|
||||
"""Get content for a project and tier"""
|
||||
return self.session.query(GeneratedContent).filter(
|
||||
GeneratedContent.project_id == project_id,
|
||||
GeneratedContent.tier == tier
|
||||
).all()
|
||||
|
||||
def get_by_keyword(self, keyword: str) -> List[GeneratedContent]:
|
||||
"""Get content by keyword"""
|
||||
return self.session.query(GeneratedContent).filter(GeneratedContent.keyword == keyword).all()
|
||||
|
||||
def update(self, content: GeneratedContent) -> GeneratedContent:
|
||||
"""Update existing content"""
|
||||
self.session.add(content)
|
||||
self.session.commit()
|
||||
self.session.refresh(content)
|
||||
return content
|
||||
|
||||
def delete(self, content_id: int) -> bool:
|
||||
"""Delete content by ID"""
|
||||
content = self.get_by_id(content_id)
|
||||
if content:
|
||||
self.session.delete(content)
|
||||
self.session.commit()
|
||||
return True
|
||||
return False
|
||||
|
|
|
|||
|
|
@ -0,0 +1,146 @@
|
|||
"""
|
||||
OpenRouter AI client and prompt management
|
||||
"""
|
||||
|
||||
import time
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Optional, Dict, Any
|
||||
from openai import OpenAI, RateLimitError, APIError
|
||||
from src.core.config import get_config
|
||||
|
||||
AVAILABLE_MODELS = {
|
||||
"gpt-4o-mini": "openai/gpt-4o-mini",
|
||||
"claude-sonnet-4.5": "anthropic/claude-3.5-sonnet"
|
||||
}
|
||||
|
||||
|
||||
class AIClient:
|
||||
"""OpenRouter API client using OpenAI SDK"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
api_key: str,
|
||||
model: str,
|
||||
base_url: str = "https://openrouter.ai/api/v1"
|
||||
):
|
||||
self.client = OpenAI(api_key=api_key, base_url=base_url)
|
||||
|
||||
if model in AVAILABLE_MODELS:
|
||||
self.model = AVAILABLE_MODELS[model]
|
||||
else:
|
||||
self.model = model
|
||||
|
||||
def generate_completion(
|
||||
self,
|
||||
prompt: str,
|
||||
system_message: Optional[str] = None,
|
||||
max_tokens: int = 4000,
|
||||
temperature: float = 0.7,
|
||||
json_mode: bool = False
|
||||
) -> str:
|
||||
"""
|
||||
Generate completion from OpenRouter API
|
||||
|
||||
Args:
|
||||
prompt: User prompt text
|
||||
system_message: Optional system message
|
||||
max_tokens: Maximum tokens to generate
|
||||
temperature: Sampling temperature (0-1)
|
||||
json_mode: If True, requests JSON response format
|
||||
|
||||
Returns:
|
||||
Generated text completion
|
||||
"""
|
||||
messages = []
|
||||
if system_message:
|
||||
messages.append({"role": "system", "content": system_message})
|
||||
messages.append({"role": "user", "content": prompt})
|
||||
|
||||
kwargs: Dict[str, Any] = {
|
||||
"model": self.model,
|
||||
"messages": messages,
|
||||
"max_tokens": max_tokens,
|
||||
"temperature": temperature
|
||||
}
|
||||
|
||||
if json_mode:
|
||||
kwargs["response_format"] = {"type": "json_object"}
|
||||
|
||||
retries = 3
|
||||
for attempt in range(retries):
|
||||
try:
|
||||
response = self.client.chat.completions.create(**kwargs)
|
||||
content = response.choices[0].message.content or ""
|
||||
# Debug: print first 200 chars if json_mode
|
||||
if json_mode:
|
||||
print(f"[DEBUG] AI Response (first 200 chars): {content[:200]}")
|
||||
return content
|
||||
|
||||
except RateLimitError as e:
|
||||
if attempt < retries - 1:
|
||||
wait_time = 2 ** attempt
|
||||
print(f"Rate limit hit. Retrying in {wait_time}s...")
|
||||
time.sleep(wait_time)
|
||||
else:
|
||||
raise
|
||||
|
||||
except APIError as e:
|
||||
if attempt < retries - 1 and "network" in str(e).lower():
|
||||
wait_time = 2 ** attempt
|
||||
print(f"Network error. Retrying in {wait_time}s...")
|
||||
time.sleep(wait_time)
|
||||
else:
|
||||
raise
|
||||
|
||||
except Exception as e:
|
||||
raise
|
||||
|
||||
return ""
|
||||
|
||||
|
||||
class PromptManager:
|
||||
"""Manages loading and formatting of prompt templates"""
|
||||
|
||||
def __init__(self, prompts_dir: str = "src/generation/prompts"):
|
||||
self.prompts_dir = Path(prompts_dir)
|
||||
self.prompts: Dict[str, dict] = {}
|
||||
|
||||
def load_prompt(self, prompt_name: str) -> dict:
|
||||
"""Load prompt from JSON file"""
|
||||
if prompt_name in self.prompts:
|
||||
return self.prompts[prompt_name]
|
||||
|
||||
prompt_file = self.prompts_dir / f"{prompt_name}.json"
|
||||
if not prompt_file.exists():
|
||||
raise FileNotFoundError(f"Prompt file not found: {prompt_file}")
|
||||
|
||||
with open(prompt_file, 'r', encoding='utf-8') as f:
|
||||
prompt_data = json.load(f)
|
||||
|
||||
self.prompts[prompt_name] = prompt_data
|
||||
return prompt_data
|
||||
|
||||
def format_prompt(self, prompt_name: str, **kwargs) -> tuple[str, str]:
|
||||
"""
|
||||
Format prompt with variables
|
||||
|
||||
Args:
|
||||
prompt_name: Name of the prompt template
|
||||
**kwargs: Variables to inject into the template
|
||||
|
||||
Returns:
|
||||
Tuple of (system_message, user_prompt)
|
||||
"""
|
||||
prompt_data = self.load_prompt(prompt_name)
|
||||
|
||||
system_message = prompt_data.get("system_message", "")
|
||||
user_prompt = prompt_data.get("user_prompt", "")
|
||||
|
||||
if system_message:
|
||||
system_message = system_message.format(**kwargs)
|
||||
|
||||
user_prompt = user_prompt.format(**kwargs)
|
||||
|
||||
return system_message, user_prompt
|
||||
|
||||
|
|
@ -0,0 +1,219 @@
|
|||
"""
|
||||
Batch processor for content generation jobs
|
||||
"""
|
||||
|
||||
from typing import Dict, Any
|
||||
import click
|
||||
from src.generation.service import ContentGenerator
|
||||
from src.generation.job_config import JobConfig, Job, TierConfig
|
||||
from src.database.repositories import GeneratedContentRepository, ProjectRepository
|
||||
|
||||
|
||||
class BatchProcessor:
|
||||
"""Processes batch content generation jobs"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
content_generator: ContentGenerator,
|
||||
content_repo: GeneratedContentRepository,
|
||||
project_repo: ProjectRepository
|
||||
):
|
||||
self.generator = content_generator
|
||||
self.content_repo = content_repo
|
||||
self.project_repo = project_repo
|
||||
self.stats = {
|
||||
"total_jobs": 0,
|
||||
"processed_jobs": 0,
|
||||
"total_articles": 0,
|
||||
"generated_articles": 0,
|
||||
"augmented_articles": 0,
|
||||
"failed_articles": 0
|
||||
}
|
||||
|
||||
def process_job(
|
||||
self,
|
||||
job_file_path: str,
|
||||
debug: bool = False,
|
||||
continue_on_error: bool = False
|
||||
):
|
||||
"""
|
||||
Process all jobs in job file
|
||||
|
||||
Args:
|
||||
job_file_path: Path to job JSON file
|
||||
debug: If True, save AI responses to debug_output/
|
||||
continue_on_error: If True, continue on article generation failure
|
||||
"""
|
||||
job_config = JobConfig(job_file_path)
|
||||
jobs = job_config.get_jobs()
|
||||
|
||||
self.stats["total_jobs"] = len(jobs)
|
||||
|
||||
for job_idx, job in enumerate(jobs, 1):
|
||||
try:
|
||||
self._process_single_job(job, job_idx, debug, continue_on_error)
|
||||
self.stats["processed_jobs"] += 1
|
||||
except Exception as e:
|
||||
click.echo(f"Error processing job {job_idx}: {e}")
|
||||
if not continue_on_error:
|
||||
raise
|
||||
|
||||
self._print_summary()
|
||||
|
||||
def _process_single_job(
|
||||
self,
|
||||
job: Job,
|
||||
job_idx: int,
|
||||
debug: bool,
|
||||
continue_on_error: bool
|
||||
):
|
||||
"""Process a single job"""
|
||||
project = self.project_repo.get_by_id(job.project_id)
|
||||
if not project:
|
||||
raise ValueError(f"Project {job.project_id} not found")
|
||||
|
||||
click.echo(f"\nProcessing Job {job_idx}/{self.stats['total_jobs']}: Project ID {job.project_id}")
|
||||
|
||||
for tier_name, tier_config in job.tiers.items():
|
||||
self._process_tier(
|
||||
job.project_id,
|
||||
tier_name,
|
||||
tier_config,
|
||||
debug,
|
||||
continue_on_error
|
||||
)
|
||||
|
||||
def _process_tier(
|
||||
self,
|
||||
project_id: int,
|
||||
tier_name: str,
|
||||
tier_config: TierConfig,
|
||||
debug: bool,
|
||||
continue_on_error: bool
|
||||
):
|
||||
"""Process all articles for a tier"""
|
||||
click.echo(f" {tier_name}: Generating {tier_config.count} articles")
|
||||
|
||||
project = self.project_repo.get_by_id(project_id)
|
||||
keyword = project.main_keyword
|
||||
|
||||
for article_num in range(1, tier_config.count + 1):
|
||||
self.stats["total_articles"] += 1
|
||||
|
||||
try:
|
||||
self._generate_single_article(
|
||||
project_id,
|
||||
tier_name,
|
||||
tier_config,
|
||||
article_num,
|
||||
keyword,
|
||||
debug
|
||||
)
|
||||
self.stats["generated_articles"] += 1
|
||||
|
||||
except Exception as e:
|
||||
self.stats["failed_articles"] += 1
|
||||
import traceback
|
||||
click.echo(f" [{article_num}/{tier_config.count}] FAILED: {e}")
|
||||
click.echo(f" Traceback: {traceback.format_exc()}")
|
||||
|
||||
try:
|
||||
self.content_repo.create(
|
||||
project_id=project_id,
|
||||
tier=tier_name,
|
||||
keyword=keyword,
|
||||
title="Failed Generation",
|
||||
outline={"error": str(e)},
|
||||
content="",
|
||||
word_count=0,
|
||||
status="failed"
|
||||
)
|
||||
except Exception as db_error:
|
||||
click.echo(f" Failed to save error record: {db_error}")
|
||||
|
||||
if not continue_on_error:
|
||||
raise
|
||||
|
||||
def _generate_single_article(
|
||||
self,
|
||||
project_id: int,
|
||||
tier_name: str,
|
||||
tier_config: TierConfig,
|
||||
article_num: int,
|
||||
keyword: str,
|
||||
debug: bool
|
||||
):
|
||||
"""Generate a single article"""
|
||||
prefix = f" [{article_num}/{tier_config.count}]"
|
||||
|
||||
click.echo(f"{prefix} Generating title...")
|
||||
title = self.generator.generate_title(project_id, debug=debug)
|
||||
click.echo(f"{prefix} Generated title: \"{title}\"")
|
||||
|
||||
click.echo(f"{prefix} Generating outline...")
|
||||
outline = self.generator.generate_outline(
|
||||
project_id=project_id,
|
||||
title=title,
|
||||
min_h2=tier_config.min_h2_tags,
|
||||
max_h2=tier_config.max_h2_tags,
|
||||
min_h3=tier_config.min_h3_tags,
|
||||
max_h3=tier_config.max_h3_tags,
|
||||
debug=debug
|
||||
)
|
||||
|
||||
h2_count = len(outline["outline"])
|
||||
h3_count = sum(len(section.get("h3", [])) for section in outline["outline"])
|
||||
click.echo(f"{prefix} Generated outline: {h2_count} H2s, {h3_count} H3s")
|
||||
|
||||
click.echo(f"{prefix} Generating content...")
|
||||
content = self.generator.generate_content(
|
||||
project_id=project_id,
|
||||
title=title,
|
||||
outline=outline,
|
||||
min_word_count=tier_config.min_word_count,
|
||||
max_word_count=tier_config.max_word_count,
|
||||
debug=debug
|
||||
)
|
||||
|
||||
word_count = self.generator.count_words(content)
|
||||
click.echo(f"{prefix} Generated content: {word_count:,} words")
|
||||
|
||||
status = "generated"
|
||||
|
||||
if word_count < tier_config.min_word_count:
|
||||
click.echo(f"{prefix} Below minimum ({tier_config.min_word_count:,}), augmenting...")
|
||||
content = self.generator.augment_content(
|
||||
content=content,
|
||||
target_word_count=tier_config.min_word_count,
|
||||
debug=debug,
|
||||
project_id=project_id
|
||||
)
|
||||
word_count = self.generator.count_words(content)
|
||||
click.echo(f"{prefix} Augmented content: {word_count:,} words")
|
||||
status = "augmented"
|
||||
self.stats["augmented_articles"] += 1
|
||||
|
||||
saved_content = self.content_repo.create(
|
||||
project_id=project_id,
|
||||
tier=tier_name,
|
||||
keyword=keyword,
|
||||
title=title,
|
||||
outline=outline,
|
||||
content=content,
|
||||
word_count=word_count,
|
||||
status=status
|
||||
)
|
||||
|
||||
click.echo(f"{prefix} Saved (ID: {saved_content.id}, Status: {status})")
|
||||
|
||||
def _print_summary(self):
|
||||
"""Print job processing summary"""
|
||||
click.echo("\n" + "="*60)
|
||||
click.echo("SUMMARY")
|
||||
click.echo("="*60)
|
||||
click.echo(f"Jobs processed: {self.stats['processed_jobs']}/{self.stats['total_jobs']}")
|
||||
click.echo(f"Articles generated: {self.stats['generated_articles']}/{self.stats['total_articles']}")
|
||||
click.echo(f"Augmented: {self.stats['augmented_articles']}")
|
||||
click.echo(f"Failed: {self.stats['failed_articles']}")
|
||||
click.echo("="*60)
|
||||
|
||||
|
|
@ -0,0 +1,130 @@
|
|||
"""
|
||||
Job configuration parser for batch content generation
|
||||
"""
|
||||
|
||||
import json
|
||||
from dataclasses import dataclass
|
||||
from typing import Optional, Dict, Any
|
||||
from pathlib import Path
|
||||
|
||||
TIER_DEFAULTS = {
|
||||
"tier1": {
|
||||
"min_word_count": 2000,
|
||||
"max_word_count": 2500,
|
||||
"min_h2_tags": 3,
|
||||
"max_h2_tags": 5,
|
||||
"min_h3_tags": 5,
|
||||
"max_h3_tags": 10
|
||||
},
|
||||
"tier2": {
|
||||
"min_word_count": 1500,
|
||||
"max_word_count": 2000,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 4,
|
||||
"min_h3_tags": 3,
|
||||
"max_h3_tags": 8
|
||||
},
|
||||
"tier3": {
|
||||
"min_word_count": 1000,
|
||||
"max_word_count": 1500,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 3,
|
||||
"min_h3_tags": 2,
|
||||
"max_h3_tags": 6
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class TierConfig:
|
||||
"""Configuration for a specific tier"""
|
||||
count: int
|
||||
min_word_count: int
|
||||
max_word_count: int
|
||||
min_h2_tags: int
|
||||
max_h2_tags: int
|
||||
min_h3_tags: int
|
||||
max_h3_tags: int
|
||||
|
||||
|
||||
@dataclass
|
||||
class Job:
|
||||
"""Job definition for content generation"""
|
||||
project_id: int
|
||||
tiers: Dict[str, TierConfig]
|
||||
|
||||
|
||||
class JobConfig:
|
||||
"""Parser for job configuration files"""
|
||||
|
||||
def __init__(self, job_file_path: str):
|
||||
"""
|
||||
Load and parse job file, apply defaults
|
||||
|
||||
Args:
|
||||
job_file_path: Path to JSON job file
|
||||
"""
|
||||
self.job_file_path = Path(job_file_path)
|
||||
self.jobs: list[Job] = []
|
||||
self._load()
|
||||
|
||||
def _load(self):
|
||||
"""Load and parse the job file"""
|
||||
if not self.job_file_path.exists():
|
||||
raise FileNotFoundError(f"Job file not found: {self.job_file_path}")
|
||||
|
||||
with open(self.job_file_path, 'r', encoding='utf-8') as f:
|
||||
data = json.load(f)
|
||||
|
||||
if "jobs" not in data:
|
||||
raise ValueError("Job file must contain 'jobs' array")
|
||||
|
||||
for job_data in data["jobs"]:
|
||||
self._validate_job(job_data)
|
||||
job = self._parse_job(job_data)
|
||||
self.jobs.append(job)
|
||||
|
||||
def _validate_job(self, job_data: dict):
|
||||
"""Validate job structure"""
|
||||
if "project_id" not in job_data:
|
||||
raise ValueError("Job missing 'project_id'")
|
||||
|
||||
if "tiers" not in job_data:
|
||||
raise ValueError("Job missing 'tiers'")
|
||||
|
||||
if not isinstance(job_data["tiers"], dict):
|
||||
raise ValueError("'tiers' must be a dictionary")
|
||||
|
||||
def _parse_job(self, job_data: dict) -> Job:
|
||||
"""Parse a single job"""
|
||||
project_id = job_data["project_id"]
|
||||
tiers = {}
|
||||
|
||||
for tier_name, tier_data in job_data["tiers"].items():
|
||||
tier_config = self._parse_tier(tier_name, tier_data)
|
||||
tiers[tier_name] = tier_config
|
||||
|
||||
return Job(project_id=project_id, tiers=tiers)
|
||||
|
||||
def _parse_tier(self, tier_name: str, tier_data: dict) -> TierConfig:
|
||||
"""Parse tier configuration with defaults"""
|
||||
defaults = TIER_DEFAULTS.get(tier_name, TIER_DEFAULTS["tier3"])
|
||||
|
||||
return TierConfig(
|
||||
count=tier_data.get("count", 1),
|
||||
min_word_count=tier_data.get("min_word_count", defaults["min_word_count"]),
|
||||
max_word_count=tier_data.get("max_word_count", defaults["max_word_count"]),
|
||||
min_h2_tags=tier_data.get("min_h2_tags", defaults["min_h2_tags"]),
|
||||
max_h2_tags=tier_data.get("max_h2_tags", defaults["max_h2_tags"]),
|
||||
min_h3_tags=tier_data.get("min_h3_tags", defaults["min_h3_tags"]),
|
||||
max_h3_tags=tier_data.get("max_h3_tags", defaults["max_h3_tags"])
|
||||
)
|
||||
|
||||
def get_jobs(self) -> list[Job]:
|
||||
"""Return list of all jobs in file"""
|
||||
return self.jobs
|
||||
|
||||
def get_tier_config(self, job: Job, tier_name: str) -> Optional[TierConfig]:
|
||||
"""Get tier config with defaults applied"""
|
||||
return job.tiers.get(tier_name)
|
||||
|
||||
|
|
@ -0,0 +1,5 @@
|
|||
{
|
||||
"system_message": "You are an expert content editor who expands articles by adding depth, detail, and additional relevant information while maintaining topical focus and quality.",
|
||||
"user_prompt": "Please expand on the following article to add more detail and depth, ensuring you maintain the existing topical focus. Target word count: {target_word_count} words.\n\nCurrent article:\n{content}\n\nReturn the expanded article as an HTML fragment with the same structure (using <h2>, <h3>, <p> tags). You can add new paragraphs, expand existing ones, or add new subsections as needed. Do NOT change the existing headings unless necessary."
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,5 @@
|
|||
{
|
||||
"system_message": "You are an expert content writer who creates engaging, informative, and SEO-optimized articles that provide real value to readers while incorporating relevant keywords naturally.",
|
||||
"user_prompt": "Write a complete article based on:\nTitle: {title}\nOutline: {outline}\nKeyword: {keyword}\n\nEntities to include naturally: {entities}\nRelated searches to address: {related_searches}\n\nTarget word count range: {min_word_count} to {max_word_count} words\n\nReturn as an HTML fragment with <h2>, <h3>, and <p> tags. Do NOT include <!DOCTYPE>, <html>, <head>, or <body> tags. Start directly with the first <h2> heading.\n\nWrite naturally and informatively. Incorporate the keyword, entities, and related searches organically throughout the content."
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,5 @@
|
|||
{
|
||||
"system_message": "You are an expert content outliner who creates well-structured, comprehensive article outlines that cover topics thoroughly and logically.",
|
||||
"user_prompt": "Create an article outline for:\nTitle: {title}\nKeyword: {keyword}\n\nConstraints:\n- Between {min_h2} and {max_h2} H2 headings\n- Between {min_h3} and {max_h3} H3 subheadings total (distributed across H2 sections)\n\nEntities to incorporate: {entities}\nRelated searches to address: {related_searches}\n\nReturn ONLY valid JSON in this exact format:\n{{\"outline\": [{{\"h2\": \"Heading text\", \"h3\": [\"Subheading 1\", \"Subheading 2\"]}}, ...]}}\n\nEnsure the outline meets the minimum heading requirements and includes relevant entities and related searches."
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,5 @@
|
|||
{
|
||||
"system_message": "You are an expert SEO content writer who creates compelling, search-optimized titles that attract clicks while accurately representing the content topic.",
|
||||
"user_prompt": "Generate an SEO-optimized title for an article about: {keyword}\n\nRelated entities: {entities}\n\nRelated searches: {related_searches}\n\nReturn only the title text, no formatting or quotes."
|
||||
}
|
||||
|
||||
|
|
@ -1 +1,3 @@
|
|||
# Content validation rules
|
||||
# DEPRECATED: This module has been replaced by the simplified generation pipeline in service.py
|
||||
# Kept for reference only.
|
||||
|
|
|
|||
|
|
@ -1 +1,311 @@
|
|||
# AI API interaction
|
||||
"""
|
||||
Content generation service with three-stage pipeline
|
||||
"""
|
||||
|
||||
import re
|
||||
import json
|
||||
from html import unescape
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
from typing import Optional, Tuple
|
||||
from src.generation.ai_client import AIClient, PromptManager
|
||||
from src.database.repositories import ProjectRepository, GeneratedContentRepository
|
||||
|
||||
|
||||
class ContentGenerator:
|
||||
"""Main service for generating content through AI pipeline"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
ai_client: AIClient,
|
||||
prompt_manager: PromptManager,
|
||||
project_repo: ProjectRepository,
|
||||
content_repo: GeneratedContentRepository
|
||||
):
|
||||
self.ai_client = ai_client
|
||||
self.prompt_manager = prompt_manager
|
||||
self.project_repo = project_repo
|
||||
self.content_repo = content_repo
|
||||
|
||||
def generate_title(self, project_id: int, debug: bool = False) -> str:
|
||||
"""
|
||||
Generate SEO-optimized title
|
||||
|
||||
Args:
|
||||
project_id: Project ID to generate title for
|
||||
debug: If True, save response to debug_output/
|
||||
|
||||
Returns:
|
||||
Generated title string
|
||||
"""
|
||||
project = self.project_repo.get_by_id(project_id)
|
||||
if not project:
|
||||
raise ValueError(f"Project {project_id} not found")
|
||||
|
||||
entities_str = ", ".join(project.entities or [])
|
||||
related_str = ", ".join(project.related_searches or [])
|
||||
|
||||
system_msg, user_prompt = self.prompt_manager.format_prompt(
|
||||
"title_generation",
|
||||
keyword=project.main_keyword,
|
||||
entities=entities_str,
|
||||
related_searches=related_str
|
||||
)
|
||||
|
||||
title = self.ai_client.generate_completion(
|
||||
prompt=user_prompt,
|
||||
system_message=system_msg,
|
||||
max_tokens=100,
|
||||
temperature=0.7
|
||||
)
|
||||
|
||||
title = title.strip().strip('"').strip("'")
|
||||
|
||||
if debug:
|
||||
self._save_debug_output(
|
||||
project_id, "title", title, "txt"
|
||||
)
|
||||
|
||||
return title
|
||||
|
||||
def generate_outline(
|
||||
self,
|
||||
project_id: int,
|
||||
title: str,
|
||||
min_h2: int,
|
||||
max_h2: int,
|
||||
min_h3: int,
|
||||
max_h3: int,
|
||||
debug: bool = False
|
||||
) -> dict:
|
||||
"""
|
||||
Generate article outline in JSON format
|
||||
|
||||
Args:
|
||||
project_id: Project ID
|
||||
title: Article title
|
||||
min_h2: Minimum H2 headings
|
||||
max_h2: Maximum H2 headings
|
||||
min_h3: Minimum H3 subheadings total
|
||||
max_h3: Maximum H3 subheadings total
|
||||
debug: If True, save response to debug_output/
|
||||
|
||||
Returns:
|
||||
Outline dictionary: {"outline": [{"h2": "...", "h3": ["...", "..."]}]}
|
||||
|
||||
Raises:
|
||||
ValueError: If outline doesn't meet minimum requirements
|
||||
"""
|
||||
project = self.project_repo.get_by_id(project_id)
|
||||
if not project:
|
||||
raise ValueError(f"Project {project_id} not found")
|
||||
|
||||
entities_str = ", ".join(project.entities or [])
|
||||
related_str = ", ".join(project.related_searches or [])
|
||||
|
||||
system_msg, user_prompt = self.prompt_manager.format_prompt(
|
||||
"outline_generation",
|
||||
title=title,
|
||||
keyword=project.main_keyword,
|
||||
min_h2=min_h2,
|
||||
max_h2=max_h2,
|
||||
min_h3=min_h3,
|
||||
max_h3=max_h3,
|
||||
entities=entities_str,
|
||||
related_searches=related_str
|
||||
)
|
||||
|
||||
outline_json = self.ai_client.generate_completion(
|
||||
prompt=user_prompt,
|
||||
system_message=system_msg,
|
||||
max_tokens=2000,
|
||||
temperature=0.7,
|
||||
json_mode=True
|
||||
)
|
||||
print(f"[DEBUG] Raw outline response: {outline_json}")
|
||||
# Save raw response immediately
|
||||
if debug:
|
||||
self._save_debug_output(project_id, "outline_raw", outline_json, "txt")
|
||||
print(f"[DEBUG] Raw outline response: {outline_json}")
|
||||
|
||||
try:
|
||||
outline = json.loads(outline_json)
|
||||
except json.JSONDecodeError as e:
|
||||
if debug:
|
||||
self._save_debug_output(project_id, "outline_error", outline_json, "txt")
|
||||
raise ValueError(f"Failed to parse outline JSON: {e}\nResponse: {outline_json[:500]}")
|
||||
|
||||
if "outline" not in outline:
|
||||
if debug:
|
||||
self._save_debug_output(project_id, "outline_invalid", json.dumps(outline, indent=2), "json")
|
||||
raise ValueError(f"Outline missing 'outline' key. Got keys: {list(outline.keys())}\nContent: {outline}")
|
||||
|
||||
h2_count = len(outline["outline"])
|
||||
h3_count = sum(len(section.get("h3", [])) for section in outline["outline"])
|
||||
|
||||
if h2_count < min_h2:
|
||||
raise ValueError(f"Outline has {h2_count} H2s, minimum is {min_h2}")
|
||||
|
||||
if h3_count < min_h3:
|
||||
raise ValueError(f"Outline has {h3_count} H3s, minimum is {min_h3}")
|
||||
|
||||
if debug:
|
||||
self._save_debug_output(
|
||||
project_id, "outline", json.dumps(outline, indent=2), "json"
|
||||
)
|
||||
|
||||
return outline
|
||||
|
||||
def generate_content(
|
||||
self,
|
||||
project_id: int,
|
||||
title: str,
|
||||
outline: dict,
|
||||
min_word_count: int,
|
||||
max_word_count: int,
|
||||
debug: bool = False
|
||||
) -> str:
|
||||
"""
|
||||
Generate full article HTML fragment
|
||||
|
||||
Args:
|
||||
project_id: Project ID
|
||||
title: Article title
|
||||
outline: Article outline dict
|
||||
min_word_count: Minimum word count for guidance
|
||||
max_word_count: Maximum word count for guidance
|
||||
debug: If True, save response to debug_output/
|
||||
|
||||
Returns:
|
||||
HTML string with <h2>, <h3>, <p> tags
|
||||
"""
|
||||
project = self.project_repo.get_by_id(project_id)
|
||||
if not project:
|
||||
raise ValueError(f"Project {project_id} not found")
|
||||
|
||||
entities_str = ", ".join(project.entities or [])
|
||||
related_str = ", ".join(project.related_searches or [])
|
||||
outline_str = json.dumps(outline, indent=2)
|
||||
|
||||
system_msg, user_prompt = self.prompt_manager.format_prompt(
|
||||
"content_generation",
|
||||
title=title,
|
||||
outline=outline_str,
|
||||
keyword=project.main_keyword,
|
||||
entities=entities_str,
|
||||
related_searches=related_str,
|
||||
min_word_count=min_word_count,
|
||||
max_word_count=max_word_count
|
||||
)
|
||||
|
||||
content = self.ai_client.generate_completion(
|
||||
prompt=user_prompt,
|
||||
system_message=system_msg,
|
||||
max_tokens=8000,
|
||||
temperature=0.7
|
||||
)
|
||||
|
||||
content = content.strip()
|
||||
|
||||
if debug:
|
||||
self._save_debug_output(
|
||||
project_id, "content", content, "html"
|
||||
)
|
||||
|
||||
return content
|
||||
|
||||
def validate_word_count(self, content: str, min_words: int, max_words: int) -> Tuple[bool, int]:
|
||||
"""
|
||||
Validate content word count
|
||||
|
||||
Args:
|
||||
content: HTML content string
|
||||
min_words: Minimum word count
|
||||
max_words: Maximum word count
|
||||
|
||||
Returns:
|
||||
Tuple of (is_valid, actual_count)
|
||||
"""
|
||||
word_count = self.count_words(content)
|
||||
is_valid = min_words <= word_count <= max_words
|
||||
return is_valid, word_count
|
||||
|
||||
def count_words(self, html_content: str) -> int:
|
||||
"""
|
||||
Count words in HTML content
|
||||
|
||||
Args:
|
||||
html_content: HTML string
|
||||
|
||||
Returns:
|
||||
Number of words
|
||||
"""
|
||||
text = re.sub(r'<[^>]+>', '', html_content)
|
||||
text = unescape(text)
|
||||
words = text.split()
|
||||
return len(words)
|
||||
|
||||
def augment_content(
|
||||
self,
|
||||
content: str,
|
||||
target_word_count: int,
|
||||
debug: bool = False,
|
||||
project_id: Optional[int] = None
|
||||
) -> str:
|
||||
"""
|
||||
Expand article content to meet minimum word count
|
||||
|
||||
Args:
|
||||
content: Current HTML content
|
||||
target_word_count: Target word count
|
||||
debug: If True, save response to debug_output/
|
||||
project_id: Optional project ID for debug output
|
||||
|
||||
Returns:
|
||||
Expanded HTML content
|
||||
"""
|
||||
system_msg, user_prompt = self.prompt_manager.format_prompt(
|
||||
"content_augmentation",
|
||||
content=content,
|
||||
target_word_count=target_word_count
|
||||
)
|
||||
|
||||
augmented = self.ai_client.generate_completion(
|
||||
prompt=user_prompt,
|
||||
system_message=system_msg,
|
||||
max_tokens=8000,
|
||||
temperature=0.7
|
||||
)
|
||||
|
||||
augmented = augmented.strip()
|
||||
|
||||
if debug and project_id:
|
||||
self._save_debug_output(
|
||||
project_id, "augmented", augmented, "html"
|
||||
)
|
||||
|
||||
return augmented
|
||||
|
||||
def _save_debug_output(
|
||||
self,
|
||||
project_id: int,
|
||||
stage: str,
|
||||
content: str,
|
||||
extension: str,
|
||||
tier: Optional[str] = None,
|
||||
article_num: Optional[int] = None
|
||||
):
|
||||
"""Save debug output to file"""
|
||||
debug_dir = Path("debug_output")
|
||||
debug_dir.mkdir(exist_ok=True)
|
||||
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
|
||||
tier_part = f"_tier{tier}" if tier else ""
|
||||
article_part = f"_article{article_num}" if article_num else ""
|
||||
|
||||
filename = f"{stage}_project{project_id}{tier_part}{article_part}_{timestamp}.{extension}"
|
||||
filepath = debug_dir / filename
|
||||
|
||||
with open(filepath, 'w', encoding='utf-8') as f:
|
||||
f.write(content)
|
||||
|
|
|
|||
|
|
@ -0,0 +1,52 @@
|
|||
"""
|
||||
Integration test for batch generation (stub)
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from unittest.mock import Mock, MagicMock
|
||||
from src.generation.batch_processor import BatchProcessor
|
||||
from src.generation.service import ContentGenerator
|
||||
|
||||
|
||||
def test_batch_processor_initialization():
|
||||
"""Test BatchProcessor can be initialized"""
|
||||
mock_generator = Mock(spec=ContentGenerator)
|
||||
mock_content_repo = Mock()
|
||||
mock_project_repo = Mock()
|
||||
|
||||
processor = BatchProcessor(
|
||||
content_generator=mock_generator,
|
||||
content_repo=mock_content_repo,
|
||||
project_repo=mock_project_repo
|
||||
)
|
||||
|
||||
assert processor is not None
|
||||
assert processor.stats["total_jobs"] == 0
|
||||
assert processor.stats["processed_jobs"] == 0
|
||||
|
||||
|
||||
def test_batch_processor_stats_initialization():
|
||||
"""Test BatchProcessor initializes stats correctly"""
|
||||
mock_generator = Mock(spec=ContentGenerator)
|
||||
mock_content_repo = Mock()
|
||||
mock_project_repo = Mock()
|
||||
|
||||
processor = BatchProcessor(
|
||||
content_generator=mock_generator,
|
||||
content_repo=mock_content_repo,
|
||||
project_repo=mock_project_repo
|
||||
)
|
||||
|
||||
expected_keys = [
|
||||
"total_jobs",
|
||||
"processed_jobs",
|
||||
"total_articles",
|
||||
"generated_articles",
|
||||
"augmented_articles",
|
||||
"failed_articles"
|
||||
]
|
||||
|
||||
for key in expected_keys:
|
||||
assert key in processor.stats
|
||||
assert processor.stats[key] == 0
|
||||
|
||||
|
|
@ -0,0 +1,95 @@
|
|||
"""
|
||||
Unit tests for ContentGenerator service
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from src.generation.service import ContentGenerator
|
||||
|
||||
|
||||
def test_count_words_simple():
|
||||
"""Test word count on simple text"""
|
||||
generator = ContentGenerator(None, None, None, None)
|
||||
|
||||
html = "<p>This is a test with five words</p>"
|
||||
count = generator.count_words(html)
|
||||
|
||||
assert count == 7
|
||||
|
||||
|
||||
def test_count_words_with_headings():
|
||||
"""Test word count with HTML headings"""
|
||||
generator = ContentGenerator(None, None, None, None)
|
||||
|
||||
html = """
|
||||
<h2>Main Heading</h2>
|
||||
<p>This is a paragraph with some words.</p>
|
||||
<h3>Subheading</h3>
|
||||
<p>Another paragraph here.</p>
|
||||
"""
|
||||
|
||||
count = generator.count_words(html)
|
||||
|
||||
assert count > 10
|
||||
|
||||
|
||||
def test_count_words_strips_html_tags():
|
||||
"""Test that HTML tags are stripped before counting"""
|
||||
generator = ContentGenerator(None, None, None, None)
|
||||
|
||||
html = "<p>Hello <strong>world</strong> this <em>is</em> a test</p>"
|
||||
count = generator.count_words(html)
|
||||
|
||||
assert count == 6
|
||||
|
||||
|
||||
def test_validate_word_count_within_range():
|
||||
"""Test validation when word count is within range"""
|
||||
generator = ContentGenerator(None, None, None, None)
|
||||
|
||||
content = "<p>" + " ".join(["word"] * 100) + "</p>"
|
||||
is_valid, count = generator.validate_word_count(content, 50, 150)
|
||||
|
||||
assert is_valid is True
|
||||
assert count == 100
|
||||
|
||||
|
||||
def test_validate_word_count_below_minimum():
|
||||
"""Test validation when word count is below minimum"""
|
||||
generator = ContentGenerator(None, None, None, None)
|
||||
|
||||
content = "<p>" + " ".join(["word"] * 30) + "</p>"
|
||||
is_valid, count = generator.validate_word_count(content, 50, 150)
|
||||
|
||||
assert is_valid is False
|
||||
assert count == 30
|
||||
|
||||
|
||||
def test_validate_word_count_above_maximum():
|
||||
"""Test validation when word count is above maximum"""
|
||||
generator = ContentGenerator(None, None, None, None)
|
||||
|
||||
content = "<p>" + " ".join(["word"] * 200) + "</p>"
|
||||
is_valid, count = generator.validate_word_count(content, 50, 150)
|
||||
|
||||
assert is_valid is False
|
||||
assert count == 200
|
||||
|
||||
|
||||
def test_count_words_empty_content():
|
||||
"""Test word count on empty content"""
|
||||
generator = ContentGenerator(None, None, None, None)
|
||||
|
||||
count = generator.count_words("")
|
||||
|
||||
assert count == 0
|
||||
|
||||
|
||||
def test_count_words_only_tags():
|
||||
"""Test word count on content with only HTML tags"""
|
||||
generator = ContentGenerator(None, None, None, None)
|
||||
|
||||
html = "<div><p></p><span></span></div>"
|
||||
count = generator.count_words(html)
|
||||
|
||||
assert count == 0
|
||||
|
||||
|
|
@ -0,0 +1,177 @@
|
|||
"""
|
||||
Unit tests for JobConfig parser
|
||||
"""
|
||||
|
||||
import pytest
|
||||
import json
|
||||
from pathlib import Path
|
||||
from src.generation.job_config import JobConfig, TIER_DEFAULTS
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def temp_job_file(tmp_path):
|
||||
"""Create a temporary job file for testing"""
|
||||
def _create_file(data):
|
||||
job_file = tmp_path / "test_job.json"
|
||||
with open(job_file, 'w') as f:
|
||||
json.dump(data, f)
|
||||
return str(job_file)
|
||||
return _create_file
|
||||
|
||||
|
||||
def test_load_job_config_valid(temp_job_file):
|
||||
"""Test loading valid job file"""
|
||||
data = {
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
job_file = temp_job_file(data)
|
||||
config = JobConfig(job_file)
|
||||
|
||||
assert len(config.get_jobs()) == 1
|
||||
assert config.get_jobs()[0].project_id == 1
|
||||
assert "tier1" in config.get_jobs()[0].tiers
|
||||
|
||||
|
||||
def test_tier_defaults_applied(temp_job_file):
|
||||
"""Test defaults applied when not in job file"""
|
||||
data = {
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 3
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
job_file = temp_job_file(data)
|
||||
config = JobConfig(job_file)
|
||||
|
||||
job = config.get_jobs()[0]
|
||||
tier1_config = job.tiers["tier1"]
|
||||
|
||||
assert tier1_config.count == 3
|
||||
assert tier1_config.min_word_count == TIER_DEFAULTS["tier1"]["min_word_count"]
|
||||
assert tier1_config.max_word_count == TIER_DEFAULTS["tier1"]["max_word_count"]
|
||||
|
||||
|
||||
def test_custom_values_override_defaults(temp_job_file):
|
||||
"""Test custom values override defaults"""
|
||||
data = {
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5,
|
||||
"min_word_count": 3000,
|
||||
"max_word_count": 3500
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
job_file = temp_job_file(data)
|
||||
config = JobConfig(job_file)
|
||||
|
||||
job = config.get_jobs()[0]
|
||||
tier1_config = job.tiers["tier1"]
|
||||
|
||||
assert tier1_config.min_word_count == 3000
|
||||
assert tier1_config.max_word_count == 3500
|
||||
|
||||
|
||||
def test_multiple_jobs_in_file(temp_job_file):
|
||||
"""Test parsing file with multiple jobs"""
|
||||
data = {
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {"tier1": {"count": 5}}
|
||||
},
|
||||
{
|
||||
"project_id": 2,
|
||||
"tiers": {"tier2": {"count": 10}}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
job_file = temp_job_file(data)
|
||||
config = JobConfig(job_file)
|
||||
|
||||
jobs = config.get_jobs()
|
||||
assert len(jobs) == 2
|
||||
assert jobs[0].project_id == 1
|
||||
assert jobs[1].project_id == 2
|
||||
|
||||
|
||||
def test_multiple_tiers_in_job(temp_job_file):
|
||||
"""Test job with multiple tiers"""
|
||||
data = {
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {"count": 5},
|
||||
"tier2": {"count": 10},
|
||||
"tier3": {"count": 15}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
job_file = temp_job_file(data)
|
||||
config = JobConfig(job_file)
|
||||
|
||||
job = config.get_jobs()[0]
|
||||
assert len(job.tiers) == 3
|
||||
assert "tier1" in job.tiers
|
||||
assert "tier2" in job.tiers
|
||||
assert "tier3" in job.tiers
|
||||
|
||||
|
||||
def test_invalid_job_file_no_jobs_key(temp_job_file):
|
||||
"""Test error when jobs key is missing"""
|
||||
data = {"invalid": []}
|
||||
|
||||
job_file = temp_job_file(data)
|
||||
|
||||
with pytest.raises(ValueError, match="must contain 'jobs'"):
|
||||
JobConfig(job_file)
|
||||
|
||||
|
||||
def test_invalid_job_missing_project_id(temp_job_file):
|
||||
"""Test error when project_id is missing"""
|
||||
data = {
|
||||
"jobs": [
|
||||
{
|
||||
"tiers": {"tier1": {"count": 5}}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
job_file = temp_job_file(data)
|
||||
|
||||
with pytest.raises(ValueError, match="missing 'project_id'"):
|
||||
JobConfig(job_file)
|
||||
|
||||
|
||||
def test_file_not_found():
|
||||
"""Test error when file doesn't exist"""
|
||||
with pytest.raises(FileNotFoundError):
|
||||
JobConfig("nonexistent_file.json")
|
||||
|
||||
Loading…
Reference in New Issue