# Story 3.1 Implementation Summary
## Overview
Implemented URL generation and site assignment for batch content generation, including full auto-creation capabilities and priority-based site assignment.
## What Was Implemented
### 1. Database Schema Changes
- **Modified**: `src/database/models.py`
  - Made `custom_hostname` nullable in `SiteDeployment` model
  - Added unique constraint to `pull_zone_bcdn_hostname`
  - Updated `__repr__` to handle both custom and bcdn hostnames
- **Migration Script**: `scripts/migrate_story_3.1.sql`
  - SQL script to update existing databases
  - Run this on your dev database before testing
### 2. Repository Layer Updates
- **Modified**: `src/database/interfaces.py`
  - Changed `custom_hostname` to optional parameter in `create()` signature
  - Added `get_by_bcdn_hostname()` method signature
  - Updated `exists()` to check both hostname types
- **Modified**: `src/database/repositories.py`
  - Made `custom_hostname` parameter optional with default `None`
  - Implemented `get_by_bcdn_hostname()` method
  - Updated `exists()` to query both custom and bcdn hostnames
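
The two-column existence check can be illustrated with a minimal sketch. It uses the standard-library `sqlite3` module rather than the project's actual repository class, and the helper name `hostname_exists` is hypothetical; only the table and column names come from this summary.

```python
import sqlite3


def hostname_exists(conn: sqlite3.Connection, hostname: str) -> bool:
    """Return True if the hostname matches either hostname column."""
    row = conn.execute(
        "SELECT 1 FROM site_deployments "
        "WHERE custom_hostname = ? OR pull_zone_bcdn_hostname = ? LIMIT 1",
        (hostname, hostname),
    ).fetchone()
    return row is not None


# Throwaway in-memory table mirroring the nullable/unique columns (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_deployments ("
    "custom_hostname TEXT NULL, "
    "pull_zone_bcdn_hostname TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO site_deployments VALUES (NULL, 'mysite123.b-cdn.net')")
conn.execute("INSERT INTO site_deployments VALUES ('www.example.com', 'other.b-cdn.net')")
```

A bcdn-only row has `custom_hostname` set to `NULL`, so checking only the custom column would miss it; querying both columns covers either kind of site.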
### 3. Template Service Update
- **Modified**: `src/templating/service.py`
  - Line 92: Changed to `hostname = site_deployment.custom_hostname or site_deployment.pull_zone_bcdn_hostname`
  - Now handles sites with only bcdn hostnames
### 4. CLI Updates
- **Modified**: `src/cli/commands.py`
  - Updated `sync-sites` command to import sites without custom domains
  - Removed filter that skipped bcdn-only sites
  - Now imports all bunny.net sites (with or without custom domains)
### 5. Site Provisioning Module (NEW)
- **Created**: `src/generation/site_provisioning.py`
  - `generate_random_suffix()`: Creates random 4-char suffixes
  - `slugify_keyword()`: Converts keywords to URL-safe slugs
  - `create_bunnynet_site()`: Creates Storage Zone + Pull Zone via API
  - `provision_keyword_sites()`: Pre-creates sites for specific keywords
  - `create_generic_sites()`: Creates generic sites on-demand
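
As a rough illustration of how a keyword and a random suffix might combine into a site name, here is a minimal sketch. The helper names, the suffix alphabet (lowercase letters and digits), and the `keyword-suffix` layout are assumptions for illustration, not the module's confirmed behavior.

```python
import secrets
import string

# Assumed alphabet; the real module may use a different character set.
SUFFIX_ALPHABET = string.ascii_lowercase + string.digits


def random_suffix(length: int = 4) -> str:
    """Return a random suffix of the given length (4 chars by default)."""
    return "".join(secrets.choice(SUFFIX_ALPHABET) for _ in range(length))


def site_name_for_keyword(keyword: str) -> str:
    """Build a hypothetical site name: slugified keyword plus a random suffix."""
    slug = "-".join(keyword.lower().split())
    return f"{slug}-{random_suffix()}"
```

For example, `site_name_for_keyword("Engine Repair")` yields a name of the form `engine-repair-xxxx`, where the last four characters vary per call.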
### 6. URL Generator Module (NEW)
- **Created**: `src/generation/url_generator.py`
  - `generate_slug()`: Converts article titles to URL-safe slugs
  - `generate_urls_for_batch()`: Generates complete URLs for all articles in batch
  - Handles custom domains and bcdn hostnames
  - Returns full URL mappings with metadata
### 7. Job Config Extensions
- **Modified**: `src/generation/job_config.py`
  - Added `tier1_preferred_sites: Optional[List[str]]` field
  - Added `auto_create_sites: bool` field (default: False)
  - Added `create_sites_for_keywords: Optional[List[Dict]]` field
  - Full validation for all new fields
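
To make the shape of the three new fields concrete, the following sketch parses and validates them from a plain dict. The `JobSiteOptions` class, the `parse_site_options` function, and the specific validation rules (non-empty keyword, positive integer count) are assumptions; the actual checks in `job_config.py` may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class JobSiteOptions:
    """Hypothetical container for the three new job fields."""
    tier1_preferred_sites: Optional[List[str]] = None
    auto_create_sites: bool = False
    create_sites_for_keywords: Optional[List[Dict[str, Any]]] = None


def parse_site_options(job: Dict[str, Any]) -> JobSiteOptions:
    """Validate the new fields of one job entry (assumed rules)."""
    preferred = job.get("tier1_preferred_sites")
    if preferred is not None and not (
        isinstance(preferred, list) and all(isinstance(h, str) for h in preferred)
    ):
        raise ValueError("tier1_preferred_sites must be a list of hostnames")

    auto_create = job.get("auto_create_sites", False)
    if not isinstance(auto_create, bool):
        raise ValueError("auto_create_sites must be a boolean")

    keyword_sites = job.get("create_sites_for_keywords")
    if keyword_sites is not None:
        if not isinstance(keyword_sites, list):
            raise ValueError("create_sites_for_keywords must be a list")
        for entry in keyword_sites:
            if not isinstance(entry, dict):
                raise ValueError("each keyword entry must be an object")
            keyword, count = entry.get("keyword"), entry.get("count")
            if not isinstance(keyword, str) or not keyword.strip():
                raise ValueError("keyword must be a non-empty string")
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(count, bool) or not isinstance(count, int) or count < 1:
                raise ValueError("count must be a positive integer")

    return JobSiteOptions(preferred, auto_create, keyword_sites)
```

Omitting all three fields yields the defaults, so existing job files would keep working under this sketch.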
### 8. Site Assignment Module (NEW)
- **Created**: `src/generation/site_assignment.py`
  - `assign_sites_to_batch()`: Main assignment function with full priority system
  - `_get_keyword_sites()`: Helper to match sites by keyword
  - **Priority system**:
    - Tier1: preferred sites → keyword sites → random
    - Tier2+: keyword sites → random
  - Auto-creates sites when pool is insufficient (if enabled)
  - Prevents duplicate assignments within same batch
### 9. Comprehensive Tests
- **Created**: `tests/unit/test_url_generator.py` - URL generation tests
- **Created**: `tests/unit/test_site_provisioning.py` - Site creation tests
- **Created**: `tests/unit/test_site_assignment.py` - Assignment logic tests
- **Created**: `tests/unit/test_job_config_extensions.py` - Config parsing tests
- **Created**: `tests/integration/test_story_3_1_integration.py` - Full workflow tests
### 10. Example Job Config
- **Created**: `jobs/example_story_3.1_full_features.json`
  - Demonstrates all new features
  - Ready-to-use template
## How to Use
### Step 1: Migrate Your Database
Run the migration script on your development database:
```sql
-- From scripts/migrate_story_3.1.sql
ALTER TABLE site_deployments MODIFY COLUMN custom_hostname VARCHAR(255) NULL;
ALTER TABLE site_deployments ADD CONSTRAINT uq_pull_zone_bcdn_hostname UNIQUE (pull_zone_bcdn_hostname);
```
### Step 2: Sync Existing Bunny.net Sites
Import your 400+ existing bunny.net buckets:
```bash
uv run python main.py sync-sites --admin-user your_admin --dry-run
```
Review the output, then run without `--dry-run` to import.
### Step 3: Create a Job Config
Use the new fields in your job configuration:
```json
{
  "jobs": [{
    "project_id": 1,
    "tiers": {
      "tier1": {"count": 10}
    },
    "tier1_preferred_sites": ["www.premium.com"],
    "auto_create_sites": true,
    "create_sites_for_keywords": [
      {"keyword": "engine repair", "count": 3}
    ]
  }]
}
```
### Step 4: Use in Your Workflow
In your content generation workflow:
```python
from src.generation.site_assignment import assign_sites_to_batch
from src.generation.url_generator import generate_urls_for_batch

# After content generation, assign sites
assign_sites_to_batch(
    content_records=generated_articles,
    job=job_config,
    site_repo=site_repository,
    bunny_client=bunny_client,
    project_keyword=project.main_keyword,
)

# Generate URLs
urls = generate_urls_for_batch(
    content_records=generated_articles,
    site_repo=site_repository,
)

# urls is a list of:
# [{
#     "content_id": 1,
#     "title": "How to Fix Your Engine",
#     "url": "https://www.example.com/how-to-fix-your-engine.html",
#     "tier": "tier1",
#     "slug": "how-to-fix-your-engine",
#     "hostname": "www.example.com"
# }, ...]
```
## Site Assignment Priority Logic
### For Tier1 Articles:
1. **Preferred Sites** (from `tier1_preferred_sites`) - if specified
2. **Keyword Sites** (matching article keyword in site name)
3. **Random** from available pool
### For Tier2+ Articles:
1. **Keyword Sites** (matching article keyword in site name)
2. **Random** from available pool
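
The priority order for both tiers can be sketched as a single selection function. This is an illustration under assumptions, not the code in `site_assignment.py`: the function name `pick_site`, the use of plain hostname strings, and the keyword match (slugified keyword as a substring of the hostname) are all hypothetical.

```python
import random
from typing import List, Optional, Sequence


def pick_site(
    tier: str,
    keyword: str,
    available: List[str],
    preferred: Sequence[str] = (),
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick one hostname for an article and remove it from the pool."""
    rng = rng or random.Random()
    choice = None
    if tier == "tier1":
        # 1. Preferred sites (tier1 only), in the order given.
        choice = next((h for h in preferred if h in available), None)
    if choice is None:
        # 2. Keyword sites: exact substring match, no fuzzy matching.
        slug = keyword.lower().replace(" ", "-")
        choice = next((h for h in available if slug and slug in h.lower()), None)
    if choice is None and available:
        # 3. Random fallback from whatever is left.
        choice = rng.choice(available)
    if choice is not None:
        # Removing the pick prevents duplicate assignment within the batch.
        available.remove(choice)
    return choice
```

Returning `None` when the pool is empty leaves the decision about auto-creation or failure to the caller.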
### Auto-Creation:
If `auto_create_sites` is `true` and the pool is insufficient:
- Creates minimum number of generic sites needed
- Uses project main keyword in site names
- Creates via bunny.net API (Storage Zone + Pull Zone)
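
The "minimum number needed" rule amounts to a shortfall calculation. The sketch below assumes one site per article within a batch (consistent with the no-duplicates rule) and assumes that an insufficient pool with auto-creation disabled is an error; the function name `sites_to_create` is hypothetical.

```python
def sites_to_create(article_count: int, available_sites: int, auto_create_sites: bool) -> int:
    """Return how many generic sites would have to be created for the batch."""
    shortfall = max(0, article_count - available_sites)
    if shortfall and not auto_create_sites:
        # Assumed behavior: fail with a clear message rather than reuse sites.
        raise ValueError(
            f"Need {shortfall} more site(s) but auto_create_sites is disabled"
        )
    return shortfall
```

With 10 articles and 7 available sites, this returns 3; with a pool at least as large as the batch, it returns 0 and nothing is created.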
## URL Structure
### With Custom Domain:
```
https://www.example.com/how-to-fix-your-engine.html
```
### With Bunny.net CDN Only:
```
https://mysite123.b-cdn.net/how-to-fix-your-engine.html
```
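
Both URL forms follow from the same hostname fallback used in the template service. A minimal sketch, assuming a hypothetical helper `build_article_url` that takes the two hostnames and a slug:

```python
from typing import Optional


def build_article_url(custom_hostname: Optional[str], bcdn_hostname: str, slug: str) -> str:
    """Compose the article URL, preferring the custom domain when present."""
    # Same fallback as the template service: custom domain first, then b-cdn.net.
    hostname = custom_hostname or bcdn_hostname
    return f"https://{hostname}/{slug}.html"
```

Passing `None` as the custom hostname produces the `b-cdn.net` form shown above.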
## Slug Generation Rules
- Lowercase
- Replace spaces with hyphens
- Remove special characters
- Max 100 characters
- Fallback: `article-{content_id}` if empty
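
The rules above can be sketched in a few lines. This is an illustration, not the actual `generate_slug()`: the function name and signature are hypothetical, and it assumes that "special characters" means anything outside ASCII letters, digits, spaces, and hyphens.

```python
import re

MAX_SLUG_LENGTH = 100  # from the rules above


def generate_slug_sketch(title: str, content_id: int) -> str:
    """Apply the slug rules to a title, falling back to article-{content_id}."""
    slug = title.lower()
    # Drop anything that is not an ASCII letter, digit, whitespace, or hyphen.
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)
    # Collapse runs of whitespace and hyphens into a single hyphen.
    slug = re.sub(r"[\s-]+", "-", slug).strip("-")
    # Enforce the length cap, then trim a hyphen left dangling by the cut.
    slug = slug[:MAX_SLUG_LENGTH].rstrip("-")
    return slug or f"article-{content_id}"
```

A title made up entirely of special characters reduces to an empty string, which is the case the `article-{content_id}` fallback covers.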
## Testing
Run the tests:
```bash
# Unit tests
uv run pytest tests/unit/test_url_generator.py
uv run pytest tests/unit/test_site_provisioning.py
uv run pytest tests/unit/test_site_assignment.py
uv run pytest tests/unit/test_job_config_extensions.py
# Integration tests
uv run pytest tests/integration/test_story_3_1_integration.py
# All Story 3.1 tests
uv run pytest tests/ -k "story_3_1 or url_generator or site_provisioning or site_assignment or job_config_extensions"
```
## Key Features
### Simple Over Complex
- No fuzzy keyword matching (as requested)
- Straightforward priority system
- Clear error messages
- Minimal dependencies
### Full Auto-Creation
- Pre-create sites for specific keywords
- Auto-create generic sites when needed
- All sites use bunny.net API
### Full Priority System
- Tier1 preferred sites
- Keyword-based matching
- Random assignment fallback
### Flexible Hostnames
- Supports custom domains
- Supports bcdn-only sites
- Automatically chooses correct hostname
## Production Deployment
When moving to production:
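1. On a fresh production database, the model changes apply automatically (SQLAlchemy creates the tables with the new schema)
2. No additional migration scripts are needed in that case; a database that already has the old `site_deployments` table still needs the Step 1 migration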
3. Just ensure your production `.env` has `BUNNY_ACCOUNT_API_KEY` set
4. Run `sync-sites` to import existing bunny.net infrastructure
## Files Changed/Created
### Modified (6 files):
- `src/database/models.py`
- `src/database/interfaces.py`
- `src/database/repositories.py`
- `src/templating/service.py`
- `src/cli/commands.py`
- `src/generation/job_config.py`
### Created (11 files):
- `scripts/migrate_story_3.1.sql`
- `src/generation/site_provisioning.py`
- `src/generation/url_generator.py`
- `src/generation/site_assignment.py`
- `tests/unit/test_url_generator.py`
- `tests/unit/test_site_provisioning.py`
- `tests/unit/test_site_assignment.py`
- `tests/unit/test_job_config_extensions.py`
- `tests/integration/test_story_3_1_integration.py`
- `jobs/example_story_3.1_full_features.json`
- `STORY_3.1_IMPLEMENTATION_SUMMARY.md`
## Total Effort
Completed all 10 tasks from the story specification.