Base T2 off of T1, some deletion of old stories
parent
785fc9c7ac
commit
81cf55f2e7
|
|
@@ -1,257 +0,0 @@
|
|||
# CLI Integration Explanation - Story 3.3
|
||||
|
||||
## The Problem
|
||||
|
||||
Story 3.3's `inject_interlinks()` function (along with the Story 3.1-3.2 functions) is **implemented and passing its tests**, but it is **never called** in the actual batch generation workflow.
|
||||
|
||||
## Current Workflow
|
||||
|
||||
When you run:
|
||||
```bash
|
||||
uv run python main.py generate-batch --job-file jobs/example.json
|
||||
```
|
||||
|
||||
Here's what actually happens:
|
||||
|
||||
### Step-by-Step Current Flow
|
||||
|
||||
```
1. CLI Command (src/cli/commands.py)
   └─> generate_batch() function called
       └─> Creates BatchProcessor
           └─> BatchProcessor.process_job()

2. BatchProcessor.process_job() (src/generation/batch_processor.py)
   └─> Reads job file
   └─> For each job:
       └─> _process_single_job()
           └─> Validates deployment targets
           └─> For each tier (tier1, tier2, tier3):
               └─> _process_tier()

3. _process_tier()
   └─> For each article (1 to count):
       └─> _generate_single_article()
           ├─> Generate title
           ├─> Generate outline
           ├─> Generate content
           ├─> Augment if needed
           └─> SAVE to database

4. END! ⚠️

   Nothing happens after articles are generated!
   No URLs, no tiered links, no interlinking!
```
|
||||
|
||||
## What's Missing
|
||||
|
||||
After all articles are generated for a tier, the Story 3.1-3.3 steps need to run:
|
||||
|
||||
```python
# THIS CODE DOES NOT EXIST YET!
# Needs to be added at the end of _process_tier() or _process_single_job()

# 1. Get all generated content for this batch
content_records = self.content_repo.get_by_project_and_tier(project_id, tier_name)

# 2. Assign sites (Story 3.1)
from src.generation.site_assignment import assign_sites_to_batch
assign_sites_to_batch(content_records, job, site_repo, bunny_client, project.main_keyword)

# 3. Generate URLs (Story 3.1)
from src.generation.url_generator import generate_urls_for_batch
article_urls = generate_urls_for_batch(content_records, site_repo)

# 4. Find tiered links (Story 3.2)
from src.interlinking.tiered_links import find_tiered_links
tiered_links = find_tiered_links(
    content_records, job_config, project_repo, content_repo, site_repo
)

# 5. Inject interlinks (Story 3.3)
from src.interlinking.content_injection import inject_interlinks
from src.database.repositories import ArticleLinkRepository
link_repo = ArticleLinkRepository(session)
inject_interlinks(
    content_records, article_urls, tiered_links,
    project, job_config, content_repo, link_repo
)

# 6. Apply templates (existing functionality)
for content in content_records:
    content_generator.apply_template(content.id)
```
|
||||
|
||||
## Why This Matters
|
||||
|
||||
### Current State
|
||||
✓ Articles are generated
|
||||
✗ Articles have NO internal links
|
||||
✗ Articles have NO tiered links
|
||||
✗ Articles have NO "See Also" section
|
||||
✗ Articles have NO final URLs assigned
|
||||
✗ Templates are NOT applied
|
||||
|
||||
**Result**: Articles sit in database with raw HTML, no links, unusable for deployment
|
||||
|
||||
### With Integration
|
||||
✓ Articles are generated
|
||||
✓ Sites are assigned to articles
|
||||
✓ Final URLs are generated
|
||||
✓ Tiered links are found
|
||||
✓ All links are injected
|
||||
✓ Templates are applied
|
||||
✓ Articles are ready for deployment
|
||||
|
||||
**Result**: Complete, interlinked articles ready for Story 4.x deployment
|
||||
|
||||
## Where to Add Integration
|
||||
|
||||
### Option 1: End of `_process_tier()` (RECOMMENDED)
|
||||
Add the integration code at line 162 (after the article generation loop):
|
||||
|
||||
```python
def _process_tier(self, project_id, tier_name, tier_config, ...):
    # ... existing article generation loop ...

    # NEW: Post-generation interlinking
    click.echo(f" {tier_name}: Injecting interlinks for {tier_config.count} articles...")
    self._inject_tier_interlinks(project_id, tier_name, job, debug)
```
|
||||
|
||||
Then create new method:
|
||||
```python
def _inject_tier_interlinks(self, project_id, tier_name, job, debug):
    """Inject interlinks for all articles in a tier"""
    # Get all articles for this tier
    content_records = self.content_repo.get_by_project_and_tier(
        project_id, tier_name
    )

    if not content_records:
        click.echo(f" Warning: No articles found for {tier_name}")
        return

    # Steps 1-6 from above...
```
|
||||
|
||||
### Option 2: End of `_process_single_job()`
|
||||
Add integration after ALL tiers are generated (processes entire job at once):
|
||||
|
||||
```python
def _process_single_job(self, job, job_idx, debug, continue_on_error):
    # ... existing tier processing ...

    # NEW: Process all tiers together
    click.echo(f"\nPost-processing: Injecting interlinks...")
    for tier_name in job.tiers.keys():
        self._inject_tier_interlinks(job.project_id, tier_name, job, debug)
```
|
||||
|
||||
## Why It Wasn't Integrated Yet
|
||||
|
||||
Looking at the story implementations, it appears:
|
||||
|
||||
1. **Story 3.1** (URL Generation) - Functions exist but not integrated
|
||||
2. **Story 3.2** (Tiered Links) - Functions exist but not integrated
|
||||
3. **Story 3.3** (Content Injection) - Functions exist but not integrated
|
||||
|
||||
This suggests the stories focused on **building the functionality** with the expectation that **Story 4.x (Deployment)** would integrate everything together.
|
||||
|
||||
## Impact of Missing Integration
|
||||
|
||||
### Tests Still Pass ✓
|
||||
- Unit tests test functions in isolation
|
||||
- Integration tests use the functions directly
|
||||
- All 42 tests pass because the **functions work perfectly**
|
||||
|
||||
### But Real Usage Fails ✗
|
||||
When you actually run `generate-batch`:
|
||||
- Articles are generated
|
||||
- They're saved to database
|
||||
- But they have no links, no URLs, nothing
|
||||
- Story 4.x deployment would fail because articles aren't ready
|
||||
|
||||
## Effort to Fix
|
||||
|
||||
**Time Estimate**: 30-60 minutes
|
||||
|
||||
**Tasks**:
|
||||
1. Add imports to `batch_processor.py` (2 minutes)
|
||||
2. Create `_inject_tier_interlinks()` method (15 minutes)
|
||||
3. Add call at end of `_process_tier()` (2 minutes)
|
||||
4. Test with real job file (10 minutes)
|
||||
5. Debug any issues (10-20 minutes)
|
||||
|
||||
**Complexity**: Low - just wiring existing functions together
|
||||
|
||||
## Testing the Integration
|
||||
|
||||
After adding integration:
|
||||
|
||||
```bash
# 1. Run batch generation
uv run python main.py generate-batch \
  --job-file jobs/test_small.json \
  --username admin \
  --password yourpass

# 2. Check database for links
uv run python -c "
from src.database.session import db_manager
from src.database.repositories import ArticleLinkRepository

session = db_manager.get_session()
link_repo = ArticleLinkRepository(session)
links = link_repo.get_all()
print(f'Total links: {len(links)}')
for link in links[:5]:
    print(f' {link.link_type}: {link.anchor_text} -> {link.to_url or link.to_content_id}')
session.close()
"

# 3. Verify articles have links in content
uv run python -c "
from src.database.session import db_manager
from src.database.repositories import GeneratedContentRepository

session = db_manager.get_session()
content_repo = GeneratedContentRepository(session)
articles = content_repo.get_all(limit=1)
if articles:
    print('Sample article content:')
    print(articles[0].content[:500])
    print(f'Contains links: {\"<a href=\" in articles[0].content}')
    print(f'Has See Also: {\"See Also\" in articles[0].content}')
session.close()
"
```
|
||||
|
||||
## Summary
|
||||
|
||||
**The Good News**:
|
||||
- All Story 3.3 code is perfect ✓
|
||||
- Tests prove functionality works ✓
|
||||
- No bugs, no issues ✓
|
||||
|
||||
**The Bad News**:
|
||||
- Code isn't wired into CLI workflow ✗
|
||||
- Running `generate-batch` doesn't use Story 3.1-3.3 ✗
|
||||
- Articles are incomplete without integration ✗
|
||||
|
||||
**The Fix**:
|
||||
- Add ~50 lines of integration code
|
||||
- Wire existing functions into `BatchProcessor`
|
||||
- Test with real job file
|
||||
- Done! ✓
|
||||
|
||||
**When to Fix**:
|
||||
- Now (before Story 4.x) - RECOMMENDED
|
||||
- Or during Story 4.x (when deployment needs links)
|
||||
- Not urgent if not deploying yet
|
||||
|
||||
---
|
||||
|
||||
*This explains why all tests pass but the feature "isn't done" yet - the plumbing exists, it's just not connected to the main pipeline.*
|
||||
|
||||
|
|
@@ -1,247 +0,0 @@
|
|||
# Deploy-Batch Analysis for test_shaft_machining.json
|
||||
|
||||
## Quick Answers to Your Questions
|
||||
|
||||
### 1. What should the anchor text be at each level?
|
||||
|
||||
**Tier 1 Articles (5 articles):**
|
||||
- **Money Site Links:** Uses `main_keyword` variations from project
|
||||
- "shaft machining"
|
||||
- "learn about shaft machining"
|
||||
- "shaft machining guide"
|
||||
- "best shaft machining"
|
||||
- "shaft machining tips"
|
||||
- System tries to find these phrases in content; picks first one that matches
|
||||
|
||||
- **Home Link:** Now in navigation menu (not injected into content)
|
||||
|
||||
- **See Also Links:** Uses article titles as anchor text
|
||||
|
||||
**Tier 2 Articles (20 articles):**
|
||||
- **Lower Tier Links:** Uses `related_searches` from CORA data
|
||||
- Depends on what related searches were in the shaft_machining.xlsx file
|
||||
- If no related searches exist, falls back to main_keyword variations
|
||||
|
||||
- **Home Link:** Now in navigation menu (not injected into content)
|
||||
|
||||
- **See Also Links:** Uses article titles as anchor text
|
||||
|
||||
**Configuration:**
|
||||
- Anchor text rules come from `master.config.json` → `interlinking.tier_anchor_text_rules`
|
||||
- Can be overridden in job config with `anchor_text_config`
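
The first-match rule described above (try each candidate phrase against the article content and use the first one found) can be sketched as follows. This is only an illustration under stated assumptions: `pick_anchor_text` is a hypothetical helper, not the project's function, and the case-insensitive substring match is an assumption about how matching might work.

```python
from typing import Optional, Sequence


def pick_anchor_text(content: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate phrase found in the content, or None.

    Hypothetical helper: matching is a case-insensitive substring test.
    """
    lowered = content.lower()
    for phrase in candidates:
        if phrase.lower() in lowered:
            return phrase
    return None


# Candidate order matters: a short phrase listed first would always win
# over the longer phrases that contain it.
candidates = [
    "learn about shaft machining",
    "shaft machining guide",
    "shaft machining",
]
content = "<p>This shaft machining guide covers turning and grinding.</p>"
chosen = pick_anchor_text(content, candidates)
print(chosen)
```

Returning `None` when nothing matches leaves the fallback decision (for example, falling back to `main_keyword` variations) to the caller.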
|
||||
|
||||
### 2. How many links should be in each article?
|
||||
|
||||
**Tier 1 Articles:**
|
||||
- 1 link to money site (https://fzemanufacturing.com/capabilities/shaft-machining-services)
|
||||
- 4 "See Also" links (to the other 4 tier1 articles)
|
||||
- **Total: 5 links per tier1 article** (plus Home in nav menu)
|
||||
|
||||
**Tier 2 Articles:**
|
||||
- 2-4 links to tier1 articles (random selection, count is `interlinking.links_per_article_min` to `max`)
|
||||
- 19 "See Also" links (to the other 19 tier2 articles)
|
||||
- **Total: 21-23 links per tier2 article** (plus Home in nav menu)
|
||||
|
||||
**Your JSON Configuration:**
|
||||
```json
"interlinking": {
  "links_per_article_min": 2,
  "links_per_article_max": 4
}
```
|
||||
This controls the tiered links (tier2 → tier1). Each tier2 article links to between 2 and 4 randomly selected tier1 articles.
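
As a rough sketch of that selection and the resulting per-article totals: the helper below picks a random number of distinct tier1 targets within the configured range. `pick_tier1_targets` is a hypothetical name, the example URLs are placeholders, and clamping the range to the number of available tier1 articles is an assumption.

```python
import random
from typing import List, Optional


def pick_tier1_targets(
    tier1_urls: List[str],
    min_links: int,
    max_links: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick between min_links and max_links distinct tier1 URLs at random."""
    rng = rng or random.Random()
    # Assumption: never ask for more targets than there are tier1 articles.
    upper = min(max_links, len(tier1_urls))
    lower = min(min_links, upper)
    count = rng.randint(lower, upper)
    return rng.sample(tier1_urls, count)


tier1_urls = [f"https://example.com/tier1-article-{i}.html" for i in range(5)]
targets = pick_tier1_targets(tier1_urls, 2, 4, random.Random(0))

# A tier2 article also links to the other 19 tier2 articles via "See Also".
see_also_links = 20 - 1
total_links = len(targets) + see_also_links
print(f"{len(targets)} tiered + {see_also_links} See Also = {total_links} links")
```

With a range of 2 to 4 and 19 "See Also" links, the total always falls between 21 and 23, which matches the figures above.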
|
||||
|
||||
### 3. Should "Home" be a link?
|
||||
|
||||
**YES** - Home is a link in the navigation menu at the top of every page.
|
||||
|
||||
**How it works:**
|
||||
- The HTML template (`basic.html`) includes a `<nav>` menu with Home link
|
||||
- Template line 113: `<li><a href="/index.html">Home</a></li>`
|
||||
- This is part of the template wrapper, not injected into article content
|
||||
|
||||
**Old behavior (now removed):**
|
||||
- Previously, system searched article content for "Home" and tried to link it
|
||||
- This was redundant since Home is already in the nav menu
|
||||
- Code has been updated to remove this injection
|
||||
|
||||
## Step-by-Step: What Happens During deploy-batch
|
||||
|
||||
### Step 1: Load Articles from Database
|
||||
```
|
||||
- Project 1 has generated content already
|
||||
- Tier 1: 5 articles
|
||||
- Tier 2: 20 articles
|
||||
- Each article has: title, content (HTML), site_deployment_id
|
||||
```
|
||||
|
||||
### Step 2: URL Generation (already done during generate-batch)
|
||||
```
|
||||
Tier 1 URLs (round-robin between getcnc.info and textbullseye.com):
|
||||
- Article 0: https://getcnc.info/{slug}.html
|
||||
- Article 1: https://www.textbullseye.com/{slug}.html
|
||||
- Article 2: https://getcnc.info/{slug}.html
|
||||
- Article 3: https://www.textbullseye.com/{slug}.html
|
||||
- Article 4: https://getcnc.info/{slug}.html
|
||||
|
||||
Tier 2 URLs (round-robin):
|
||||
- Articles 0-19 distributed across both domains
|
||||
```
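
The round-robin distribution shown above can be illustrated with a short sketch. `assign_round_robin` is a hypothetical helper, not the project's URL generator, and the slugs are placeholders; only the two domains come from the job file.

```python
from typing import List


def assign_round_robin(slugs: List[str], domains: List[str]) -> List[str]:
    """Build one URL per slug, cycling through the domains in order."""
    if not domains:
        raise ValueError("at least one domain is required")
    return [
        f"https://{domains[i % len(domains)]}/{slug}.html"
        for i, slug in enumerate(slugs)
    ]


domains = ["getcnc.info", "www.textbullseye.com"]
slugs = [f"article-{i}" for i in range(5)]
urls = assign_round_robin(slugs, domains)
for url in urls:
    print(url)
```

With five tier1 articles and two domains, the first domain receives articles 0, 2, and 4 and the second receives articles 1 and 3.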
|
||||
|
||||
### Step 3: Tiered Links (already injected during generate-batch)
|
||||
|
||||
**For Tier 1:**
|
||||
- Target: Money site URL from project database
|
||||
- Anchor text: main_keyword variations
|
||||
- Links already in `generated_content.content` HTML
|
||||
|
||||
**For Tier 2:**
|
||||
- Target: Random selection of tier1 URLs (2-4 per article)
|
||||
- Anchor text: related_searches from project
|
||||
- Links already in HTML
|
||||
|
||||
### Step 4: Homepage Links
|
||||
- Home link is in the navigation menu (template)
|
||||
- No longer injected into article content
|
||||
|
||||
### Step 5: See Also Section (already injected)
|
||||
- HTML section with links to other articles in same tier
|
||||
|
||||
### Step 6: Template Application (already done)
|
||||
- HTML wrapped in template from `src/templating/templates/basic.html`
|
||||
- Navigation menu added
|
||||
- Stored in `generated_content.formatted_html`
|
||||
|
||||
### Step 7: Upload to Bunny.net
|
||||
```
For each article:
  1. Get site deployment credentials
  2. Upload formatted_html to storage zone
  3. File path: /{slug}.html
  4. Log URL to deployment_logs/
  5. Update database: deployed_url, status='deployed'

For each site's boilerplate pages:
  1. Upload index.html (if exists)
  2. Upload about.html
  3. Upload contact.html
  4. Upload privacy.html
```
|
||||
|
||||
## Database Link Tracking
|
||||
|
||||
All links are tracked in `article_links` table:
|
||||
|
||||
**Tier 1 Article Example (ID: 43):**
|
||||
```
| from_content_id | to_content_id | to_url                           | anchor_text              | link_type      |
|-----------------|---------------|----------------------------------|--------------------------|----------------|
| 43              | NULL          | https://fzemanufacturing.com/... | "shaft machining"        | tiered         |
| 43              | 44            | NULL                             | "Understanding CNC..."   | wheel_see_also |
| 43              | 45            | NULL                             | "Advanced Shaft..."      | wheel_see_also |
| 43              | 46            | NULL                             | "Precision Machining..." | wheel_see_also |
| 43              | 47            | NULL                             | "Modern Shaft..."        | wheel_see_also |
```
|
||||
|
||||
**Tier 2 Article Example (ID: 48):**
|
||||
```
| from_content_id | to_content_id | to_url                                    | anchor_text                | link_type      |
|-----------------|---------------|-------------------------------------------|----------------------------|----------------|
| 48              | NULL          | https://getcnc.info/{slug1}.html          | "cnc machining services"   | tiered         |
| 48              | NULL          | https://www.textbullseye.com/{slug2}.html | "precision shaft work"     | tiered         |
| 48              | NULL          | https://getcnc.info/{slug3}.html          | "shaft turning operations" | tiered         |
| 48              | 49            | NULL                                      | "Tier 2 Article 2 Title"   | wheel_see_also |
| ...             | ...           | ...                                       | ...                        | ...            |
| 48              | 67            | NULL                                      | "Tier 2 Article 20 Title"  | wheel_see_also |
```
|
||||
|
||||
**Note:** Home link is no longer tracked in the database since it's in the template, not injected into content.
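
To make the table layout concrete, here is a small self-contained sketch that models `article_links` in an in-memory SQLite database and counts links per type for one article. The column types and the placeholder URL and anchor texts are assumptions; only the column names and `link_type` values come from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; the real schema may differ.
conn.execute(
    """
    CREATE TABLE article_links (
        from_content_id INTEGER NOT NULL,
        to_content_id   INTEGER,
        to_url          TEXT,
        anchor_text     TEXT,
        link_type       TEXT NOT NULL
    )
    """
)

# One tiered link to an external URL plus four See Also links,
# mirroring the tier 1 example for article 43.
rows = [(43, None, "https://example.com/money-page", "shaft machining", "tiered")]
rows += [(43, other_id, None, f"Article {other_id} title", "wheel_see_also")
         for other_id in (44, 45, 46, 47)]
conn.executemany("INSERT INTO article_links VALUES (?, ?, ?, ?, ?)", rows)

counts = dict(
    conn.execute(
        "SELECT link_type, COUNT(*) FROM article_links "
        "WHERE from_content_id = ? GROUP BY link_type",
        (43,),
    ).fetchall()
)
print(counts)
conn.close()
```

A grouped count like this is a quick way to check that each tier1 article has exactly one `tiered` link and one `wheel_see_also` link per sibling article.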
|
||||
|
||||
## Your Specific JSON File Analysis
|
||||
|
||||
```json
{
  "jobs": [
    {
      "project_id": 1,
      "deployment_targets": [
        "getcnc.info",
        "www.textbullseye.com"
      ],
      "tiers": {
        "tier1": {
          "count": 5,
          "min_word_count": 1500,
          "max_word_count": 2000,
          "models": {
            "title": "openai/gpt-4o-mini",
            "outline": "openai/gpt-4o-mini",
            "content": "anthropic/claude-3.5-sonnet"
          }
        },
        "tier2": {
          "count": 20,
          "models": {
            "title": "openai/gpt-4o-mini",
            "outline": "openai/gpt-4o-mini",
            "content": "openai/gpt-4o-mini"
          },
          "interlinking": {
            "links_per_article_min": 2,
            "links_per_article_max": 4
          }
        }
      }
    }
  ]
}
```
|
||||
|
||||
**What This Configuration Does:**
|
||||
|
||||
1. **Tier 1 (5 articles):**
|
||||
- Uses Claude Sonnet for content, GPT-4o-mini for titles/outlines
|
||||
- 1500-2000 words per article
|
||||
- Distributed across getcnc.info and textbullseye.com
|
||||
- Each links to: money site (1) + See Also (4) = 5 total links (plus Home in nav menu)
|
||||
|
||||
2. **Tier 2 (20 articles):**
|
||||
- Uses GPT-4o-mini for everything (cheaper)
|
||||
- Default word count (1100-1500)
|
||||
- Each links to: 2-4 tier1 articles + See Also (19) = 21-23 total links (plus Home in nav menu)
|
||||
- Distributed across both domains
|
||||
|
||||
3. **Missing Configurations (using defaults):**
|
||||
- `tier1.interlinking`: Not specified → uses defaults (but tier1 always gets 1 money site link anyway)
|
||||
- `anchor_text_config`: Not specified → uses master.config.json rules
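
One way to picture how an unspecified `interlinking` block falls back to defaults is a layered merge: master defaults first, then job-level `interlinking`, then tier-level `interlinking`. The sketch below assumes that precedence; `resolve_interlinking` is a hypothetical helper and the values in `master_defaults` are placeholders, not the contents of `master.config.json`.

```python
from typing import Any, Dict


def resolve_interlinking(
    master_defaults: Dict[str, Any],
    job: Dict[str, Any],
    tier_name: str,
) -> Dict[str, Any]:
    """Merge interlinking settings; later layers override earlier ones."""
    resolved = dict(master_defaults)
    resolved.update(job.get("interlinking", {}))
    tier = job.get("tiers", {}).get(tier_name, {})
    resolved.update(tier.get("interlinking", {}))
    return resolved


# Placeholder defaults, for illustration only.
master_defaults = {"links_per_article_min": 1, "links_per_article_max": 3}
job = {
    "project_id": 1,
    "tiers": {
        "tier1": {"count": 5},
        "tier2": {
            "count": 20,
            "interlinking": {
                "links_per_article_min": 2,
                "links_per_article_max": 4,
            },
        },
    },
}

tier1_settings = resolve_interlinking(master_defaults, job, "tier1")
tier2_settings = resolve_interlinking(master_defaults, job, "tier2")
print(tier1_settings)
print(tier2_settings)
```

In this sketch tier1 inherits the placeholder defaults because it has no `interlinking` block, while tier2 uses its own 2 to 4 range.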
|
||||
|
||||
## All JSON Fields That Affect Behavior
|
||||
|
||||
See `MASTER_JSON.json` for the complete reference. Key fields:
|
||||
|
||||
**Top-level job fields:**
|
||||
- `project_id` - Which project's data to use
|
||||
- `deployment_targets` - Which domains to deploy to
|
||||
- `models` - Which AI models to use
|
||||
- `tiered_link_count_range` - How many tiered links (job-level default)
|
||||
- `anchor_text_config` - Override anchor text generation
|
||||
- `interlinking` - Job-level interlinking defaults
|
||||
|
||||
**Tier-level fields:**
|
||||
- `count` - Number of articles
|
||||
- `min_word_count`, `max_word_count` - Content length
|
||||
- `min_h2_tags`, `max_h2_tags`, `min_h3_tags`, `max_h3_tags` - Outline structure
|
||||
- `models` - Tier-specific model overrides
|
||||
- `interlinking` - Tier-specific interlinking overrides
|
||||
|
||||
**Fields in master.config.json:**
|
||||
- `interlinking.tier_anchor_text_rules` - Defines anchor text sources per tier
|
||||
- `interlinking.include_home_link` - Global default for Home links
|
||||
- `interlinking.wheel_links` - Enable/disable See Also sections
|
||||
|
||||
**Fields in project database:**
|
||||
- `main_keyword` - Used for tier1 anchor text
|
||||
- `related_searches` - Used for tier2 anchor text
|
||||
- `entities` - Used for tier3+ anchor text
|
||||
- `money_site_url` - Destination for tier1 links
|
||||
|
||||
|
|
@@ -1,89 +0,0 @@
|
|||
# Image and Template Issues Analysis
|
||||
|
||||
## Problems Identified
|
||||
|
||||
### 1. Missing Image CSS in Templates
|
||||
**Issue**: None of the templates (basic, modern, classic) have CSS for `<img>` tags.
|
||||
|
||||
**Impact**: Images display at full size, breaking layout especially in modern template with constrained article width (850px).
|
||||
|
||||
**Solution**: Add responsive image CSS to all templates:
|
||||
```css
|
||||
img {
|
||||
max-width: 100%;
|
||||
height: auto;
|
||||
display: block;
|
||||
margin: 1.5rem auto;
|
||||
border-radius: 8px;
|
||||
}
|
||||
```
|
||||
|
||||
### 2. Template Storage Inconsistency
|
||||
**Issue**: `template_used` field is only set when `apply_template()` is called. If:
|
||||
- Templates are applied at different times
|
||||
- Some articles skip template application
|
||||
- Articles are moved between sites with different templates
|
||||
- Template application fails silently
|
||||
|
||||
Then the database may show incorrect or missing template values.
|
||||
|
||||
**Evidence**: User reports articles showing "basic" when they're actually "modern".
|
||||
|
||||
**Solution**:
|
||||
- Always apply templates before deployment
|
||||
- Re-apply templates if `template_used` doesn't match site's `template_name`
|
||||
- Add validation to ensure `template_used` matches site template
|
||||
|
||||
### 3. Images Lost During Interlink Injection
|
||||
**Issue**: Processing order:
|
||||
1. Images inserted into `content` → saved
|
||||
2. Interlinks injected → BeautifulSoup parses/rewrites HTML → saved
|
||||
3. Template applied → reads `content` → creates `formatted_html`
|
||||
|
||||
BeautifulSoup parsing may break image tags or lose them during HTML rewriting.
|
||||
|
||||
**Evidence**: User reports images were generated and uploaded (URLs in database) but don't appear in deployed articles.
|
||||
|
||||
**Solution Options**:
|
||||
- **Option A**: Re-insert images after interlink injection (read from `hero_image_url` and `content_images` fields)
|
||||
- **Option B**: Use more robust HTML parsing that preserves all tags
|
||||
- **Option C**: Apply template immediately after image insertion, then inject interlinks into `formatted_html` instead of `content`
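
Before choosing among these options, it may help to confirm whether images are really dropped during injection. The sketch below compares the `<img>` sources before and after a rewrite using only the standard library. `image_sources` and `lost_images` are hypothetical helpers, and the HTML strings are made-up examples rather than real article content.

```python
from html.parser import HTMLParser
from typing import List


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags are also routed here by HTMLParser.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html: str) -> List[str]:
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


def lost_images(before: str, after: str) -> List[str]:
    """Return image sources present before the rewrite but missing after."""
    remaining = set(image_sources(after))
    return [src for src in image_sources(before) if src not in remaining]


before = '<p>Intro</p><img src="https://cdn.example.com/hero.jpg" alt="Hero" /><p>Body</p>'
after = '<p>Intro</p><p>Body with <a href="https://example.com/">a link</a></p>'
missing = lost_images(before, after)
print(missing)
```

Running a check like this on `content` immediately before and after interlink injection would show whether the injection step or a later step is where images disappear.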
|
||||
|
||||
### 4. Image Size Not Constrained
|
||||
**Issue**: Even if images are present, they're not constrained by template CSS, causing layout issues.
|
||||
|
||||
**Solution**: Add image CSS (see #1) and ensure images are inserted with proper attributes:
|
||||
```html
|
||||
<img src="..." alt="..." style="max-width: 100%; height: auto;" />
|
||||
```
|
||||
|
||||
## Recommended Fixes
|
||||
|
||||
### Priority 1: Add Image CSS to All Templates
|
||||
Add responsive image styling to:
|
||||
- `src/templating/templates/basic.html`
|
||||
- `src/templating/templates/modern.html`
|
||||
- `src/templating/templates/classic.html`
|
||||
|
||||
### Priority 2: Fix Image Preservation
|
||||
Modify `src/interlinking/content_injection.py` to preserve images:
|
||||
- Use `html.parser` with `preserve_whitespace` or `html5lib` parser
|
||||
- Or re-insert images after interlink injection using database fields
|
||||
|
||||
### Priority 3: Fix Template Tracking
|
||||
- Add validation in deployment to ensure `template_used` matches site template
|
||||
- Re-apply templates if mismatch detected
|
||||
- Add script to backfill/correct `template_used` values
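
The validation step could look roughly like this. It is a sketch under assumptions: `ArticleTemplateState` is a hypothetical in-memory stand-in for the database rows, and `site_template` stands for the site's `template_name`.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ArticleTemplateState:
    """Hypothetical view of one article joined with its site's template."""
    content_id: int
    template_used: Optional[str]  # None if no template was ever applied
    site_template: str            # the site's template_name


def find_template_mismatches(articles: Iterable[ArticleTemplateState]) -> List[int]:
    """Return IDs of articles whose template_used differs from the site template."""
    return [a.content_id for a in articles if a.template_used != a.site_template]


articles = [
    ArticleTemplateState(43, "modern", "modern"),
    ArticleTemplateState(44, "basic", "modern"),
    ArticleTemplateState(45, None, "classic"),
]
stale_ids = find_template_mismatches(articles)
print(stale_ids)
```

The returned IDs are the articles that would need their template re-applied (or their `template_used` value corrected) before deployment.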
|
||||
|
||||
### Priority 4: Improve Image Insertion
|
||||
- Add `max-width` style attribute when inserting images
|
||||
- Ensure images are inserted with proper responsive attributes
|
||||
|
||||
## Code Locations
|
||||
|
||||
- Image insertion: `src/generation/image_injection.py`
|
||||
- Interlink injection: `src/interlinking/content_injection.py` (line 53-76)
|
||||
- Template application: `src/generation/service.py` (line 409-460)
|
||||
- Template files: `src/templating/templates/*.html`
|
||||
- Deployment: `src/deployment/deployment_service.py` (uses `formatted_html`)
|
||||
|
||||
|
|
@@ -1,337 +0,0 @@
|
|||
# CLI Integration Complete - Story 3.3
|
||||
|
||||
## Status: DONE ✅
|
||||
|
||||
The CLI integration for Story 3.1-3.3 has been successfully implemented and is ready for testing.
|
||||
|
||||
---
|
||||
|
||||
## What Was Changed
|
||||
|
||||
### 1. Modified `src/database/repositories.py`
|
||||
**Change**: Added `require_site` parameter to `get_by_project_and_tier()`
|
||||
|
||||
```python
|
||||
def get_by_project_and_tier(self, project_id: int, tier: str, require_site: bool = True)
|
||||
```
|
||||
|
||||
**Purpose**: Allows fetching articles with or without site assignments
|
||||
|
||||
**Impact**: Backward compatible (default `require_site=True` maintains existing behavior)
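
The effect of the flag can be shown with an in-memory stand-in. This is not the repository method itself: the `Article` dataclass and the list-based filtering are assumptions used to illustrate the described semantics of `require_site`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical stand-in for a generated_content row."""
    id: int
    project_id: int
    tier: str
    site_deployment_id: Optional[int]


def get_by_project_and_tier(
    articles: List[Article],
    project_id: int,
    tier: str,
    require_site: bool = True,
) -> List[Article]:
    """Filter by project and tier; optionally keep only site-assigned articles."""
    matches = [a for a in articles if a.project_id == project_id and a.tier == tier]
    if require_site:
        matches = [a for a in matches if a.site_deployment_id is not None]
    return matches


articles = [
    Article(43, 1, "tier1", site_deployment_id=7),
    Article(44, 1, "tier1", site_deployment_id=None),
    Article(48, 1, "tier2", site_deployment_id=None),
]
with_site = get_by_project_and_tier(articles, 1, "tier1")
all_tier1 = get_by_project_and_tier(articles, 1, "tier1", require_site=False)
print([a.id for a in with_site], [a.id for a in all_tier1])
```

Callers that omit the argument keep getting only site-assigned articles, which is why the change is described as backward compatible.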
|
||||
|
||||
### 2. Modified `src/generation/batch_processor.py`
|
||||
**Changes**:
|
||||
1. Added imports for Story 3.1-3.3 functions
|
||||
2. Added `job` parameter to `_process_tier()`
|
||||
3. Added post-processing call at end of `_process_tier()`
|
||||
4. Created new `_post_process_tier()` method
|
||||
|
||||
**New Workflow**:
|
||||
```
_process_tier():
    1. Generate all articles (existing)
    2. Handle failures (existing)
    3. ✨ NEW: Call _post_process_tier()

_post_process_tier():
    1. Get articles with site assignments
    2. Generate URLs (Story 3.1)
    3. Find tiered links (Story 3.2)
    4. Inject interlinks (Story 3.3)
    5. Apply templates
```
|
||||
|
||||
---
|
||||
|
||||
## What Now Happens When You Run `generate-batch`
|
||||
|
||||
### Before Integration ❌
|
||||
```bash
|
||||
uv run python main.py generate-batch --job-file jobs/example.json
|
||||
```
|
||||
|
||||
Result:
|
||||
- ✅ Articles generated
|
||||
- ❌ No URLs
|
||||
- ❌ No tiered links
|
||||
- ❌ No "See Also" section
|
||||
- ❌ No templates applied
|
||||
|
||||
### After Integration ✅
|
||||
```bash
|
||||
uv run python main.py generate-batch --job-file jobs/example.json
|
||||
```
|
||||
|
||||
Result:
|
||||
- ✅ Articles generated
|
||||
- ✅ URLs generated for articles with site assignments
|
||||
- ✅ Tiered links found (T1→money site, T2→T1)
|
||||
- ✅ All interlinks injected (tiered + homepage + See Also)
|
||||
- ✅ Templates applied to final HTML
|
||||
|
||||
---
|
||||
|
||||
## CLI Output Example
|
||||
|
||||
When you run a batch job, you'll now see:
|
||||
|
||||
```
|
||||
Processing Job 1/1: Project ID 1
|
||||
Validating deployment targets: www.example.com
|
||||
All deployment targets validated successfully
|
||||
|
||||
tier1: Generating 5 articles
|
||||
[1/5] Generating title...
|
||||
[1/5] Generating outline...
|
||||
[1/5] Generating content...
|
||||
[1/5] Generated content: 2,143 words
|
||||
[1/5] Saved (ID: 43, Status: generated)
|
||||
[2/5] Generating title...
|
||||
... (repeat for all articles)
|
||||
|
||||
tier1: Post-processing 5 articles... ← NEW!
|
||||
Generating URLs... ← NEW!
|
||||
Generated 5 URLs ← NEW!
|
||||
Finding tiered links... ← NEW!
|
||||
Found tiered links for tier 1 ← NEW!
|
||||
Injecting interlinks... ← NEW!
|
||||
Interlinks injected successfully ← NEW!
|
||||
Applying templates... ← NEW!
|
||||
Applied templates to 5/5 articles ← NEW!
|
||||
tier1: Post-processing complete ← NEW!
|
||||
|
||||
SUMMARY
|
||||
Jobs processed: 1/1
|
||||
Articles generated: 5/5
|
||||
Augmented: 0
|
||||
Failed: 0
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Testing the Integration
|
||||
|
||||
### Quick Test
|
||||
|
||||
1. **Create a small test job**:
|
||||
```json
{
  "jobs": [
    {
      "project_id": 1,
      "deployment_targets": ["www.testsite.com"],
      "tiers": {
        "tier1": {
          "count": 2
        }
      }
    }
  ]
}
```
|
||||
|
||||
2. **Run the batch**:
|
||||
```bash
|
||||
uv run python main.py generate-batch \
|
||||
--job-file jobs/test_integration.json \
|
||||
--username admin \
|
||||
--password yourpass
|
||||
```
|
||||
|
||||
3. **Verify the results**:
|
||||
|
||||
Check for links, "See Also" sections, and templates:
|
||||
```bash
uv run python -c "
from src.database.session import db_manager
from src.database.repositories import GeneratedContentRepository

session = db_manager.get_session()
repo = GeneratedContentRepository(session)
articles = repo.get_by_project_and_tier(1, 'tier1')
for a in articles:
    print(f'Article {a.id}: {a.title[:50]}')
    print(f' Has links: {\"<a href=\" in a.content}')
    print(f' Has See Also: {\"See Also\" in a.content}')
    print(f' Has template: {a.formatted_html is not None}')
session.close()
"
```
|
||||
|
||||
Check for link records:
|
||||
```bash
uv run python -c "
from src.database.session import db_manager
from src.database.repositories import ArticleLinkRepository

session = db_manager.get_session()
repo = ArticleLinkRepository(session)
links = repo.get_all()
print(f'Total links created: {len(links)}')
for link in links[:10]:
    print(f' {link.link_type}: {link.anchor_text[:30] if link.anchor_text else \"N/A\"}')
session.close()
"
```
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
After running a batch job with the integration:
|
||||
|
||||
- [ ] CLI shows "Post-processing X articles..." messages
|
||||
- [ ] CLI shows "Generating URLs..." step
|
||||
- [ ] CLI shows "Finding tiered links..." step
|
||||
- [ ] CLI shows "Injecting interlinks..." step
|
||||
- [ ] CLI shows "Applying templates..." step
|
||||
- [ ] Database has ArticleLink records
|
||||
- [ ] Articles have `<a href=` tags in content
|
||||
- [ ] Articles have "See Also" sections
|
||||
- [ ] Articles have `formatted_html` populated
|
||||
- [ ] No errors or exceptions during post-processing
|
||||
|
||||
---
|
||||
|
||||
## Error Handling
|
||||
|
||||
The integration includes graceful error handling:
|
||||
|
||||
### Scenario 1: No Articles with Site Assignments
|
||||
```
|
||||
tier1: No articles with site assignments to post-process
|
||||
```
|
||||
**Impact**: Post-processing skipped (this is normal for tiers without deployment_targets)
|
||||
|
||||
### Scenario 2: Post-Processing Fails
|
||||
```
|
||||
Warning: Post-processing failed for tier1: [error message]
|
||||
Traceback: [if --debug enabled]
|
||||
```
|
||||
**Impact**: Article generation continues, but articles won't have links/templates
|
||||
|
||||
### Scenario 3: Template Application Fails
|
||||
```
|
||||
Warning: Failed to apply template to content 123: [error message]
|
||||
```
|
||||
**Impact**: Links are still injected, only template application skipped
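
The "warn and continue" behaviour in these scenarios can be sketched as a small wrapper around a post-processing step. `post_process_safely` and its `log` parameter are hypothetical; only the warning message format is taken from the output shown in Scenario 2.

```python
import traceback
from typing import Callable, List


def post_process_safely(
    tier_name: str,
    step: Callable[[], None],
    debug: bool = False,
    log: Callable[[str], object] = print,
) -> bool:
    """Run a post-processing step; on failure, warn and return False."""
    try:
        step()
    except Exception as exc:
        log(f"Warning: Post-processing failed for {tier_name}: {exc}")
        if debug:
            # Only print the traceback when debugging is enabled.
            log(traceback.format_exc())
        return False
    return True


def failing_step() -> None:
    raise RuntimeError("no site assigned")


messages: List[str] = []
ok = post_process_safely("tier1", failing_step, log=messages.append)
print(ok, messages)
```

Because the exception is caught inside the wrapper, the generated articles stay saved and the batch moves on to the next tier.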
|
||||
|
||||
---
|
||||
|
||||
## What Gets Post-Processed
|
||||
|
||||
**Only articles with `site_deployment_id` set** are post-processed.
|
||||
|
||||
### Tier 1 with deployment_targets
|
||||
✅ Articles assigned to sites → Post-processed
|
||||
✅ Get URLs, links, templates
|
||||
|
||||
### Tier 1 without deployment_targets
|
||||
⚠️ No site assignments → Skipped
|
||||
❌ No URLs (need sites)
|
||||
❌ No links (need URLs)
|
||||
❌ No templates (need sites)
|
||||
|
||||
### Tier 2/3
|
||||
⚠️ No automatic site assignment → Skipped
|
||||
❌ Unless Story 3.1 site assignment is added
|
||||
|
||||
**Note**: To post-process tier 2/3 articles, you'll need to add automatic site assignment. Story 3.1 already provides the functions; they just need to be called.
|
||||
|
||||
---
|
||||
|
||||
## Files Modified
|
||||
|
||||
```
src/database/repositories.py
  └─ get_by_project_and_tier() - added require_site parameter

src/generation/batch_processor.py
  ├─ Added imports for Story 3.1-3.3
  ├─ _process_tier() - added job parameter
  ├─ _process_tier() - added post-processing call
  └─ _post_process_tier() - new method (76 lines)
```
|
||||
|
||||
**Total lines added**: ~100 lines
|
||||
**Linter errors**: 0
|
||||
**Breaking changes**: None (backward compatible)
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
### 1. Test with Real Job
|
||||
Run a real batch generation with deployment targets:
|
||||
```bash
|
||||
uv run python main.py generate-batch \
|
||||
--job-file jobs/example_story_3.2_tiered_links.json \
|
||||
--username admin \
|
||||
--password yourpass \
|
||||
--debug
|
||||
```
|
||||
|
||||
### 2. Verify Database
|
||||
Check that:
|
||||
- `article_links` table has records
|
||||
- `generated_content` has links in `content` field
|
||||
- `generated_content` has `formatted_html` populated
|
||||
|
||||
### 3. Add Tier 2/3 Site Assignment (Optional)
|
||||
If you want tier 2/3 articles to also get post-processed, add site assignment logic before URL generation:
|
||||
|
||||
```python
# In _post_process_tier(), before getting content_records:

if tier_name != "tier1":
    # Assign sites to tier 2/3 articles
    from src.generation.site_assignment import assign_sites_to_batch
    all_articles = self.content_repo.get_by_project_and_tier(
        project_id, tier_name, require_site=False
    )
    # ... assign sites logic ...
```
|
||||
|
||||
### 4. Story 4.x: Deployment
|
||||
Articles are now ready for deployment with:
|
||||
- Final URLs
|
||||
- All interlinks
|
||||
- Templates applied
|
||||
- Database link records for analytics
|
||||
|
||||
---
|
||||
|
||||
## Rollback (If Needed)
|
||||
|
||||
If you need to revert the integration:
|
||||
|
||||
```bash
git checkout src/generation/batch_processor.py
git checkout src/database/repositories.py
```
|
||||
|
||||
This will remove the integration and restore the previous behavior (articles generated without post-processing).
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
**Integration Status**: COMPLETE ✅
|
||||
**Tests Passing**: 42/42 (100%) ✅
|
||||
**Linter Errors**: 0 ✅
|
||||
**Ready for Use**: YES ✅
|
||||
|
||||
The batch generation workflow now includes the full Story 3.1-3.3 pipeline:
|
||||
1. Generate content
|
||||
2. Assign sites (for tier1 with deployment_targets)
|
||||
3. Generate URLs
|
||||
4. Find tiered links
|
||||
5. Inject all interlinks
|
||||
6. Apply templates
|
||||
|
||||
**Articles are now fully interlinked and ready for deployment!**
|
||||
|
||||
---
|
||||
|
||||
*Integration completed: October 21, 2025*
|
||||
|
||||
|
|
@ -1,241 +0,0 @@
|
|||
# Visual: The Integration Gap
|
||||
|
||||
## What Currently Happens
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ uv run python main.py generate-batch --job-file jobs/x.json │
|
||||
└────────────────────────┬────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ BatchProcessor.process_job() │
|
||||
│ │
|
||||
│ For each tier (tier1, tier2, tier3): │
|
||||
│ For each article (1 to N): │
|
||||
│ ┌──────────────────────────────────┐ │
|
||||
│ │ 1. Generate title │ │
|
||||
│ │ 2. Generate outline │ │
|
||||
│ │ 3. Generate content │ │
|
||||
│ │ 4. Augment if too short │ │
|
||||
│ │ 5. Save to database │ │
|
||||
│ └──────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ⚠️ STOPS HERE! ⚠️ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
|
||||
Result in database:
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ generated_content table: │
|
||||
│ - Raw HTML (no links) │
|
||||
│ - No site_deployment_id (most articles) │
|
||||
│ - No final URL │
|
||||
│ - No formatted_html │
|
||||
│ │
|
||||
│ article_links table: │
|
||||
│ - EMPTY (no records) │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## What SHOULD Happen
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ uv run python main.py generate-batch --job-file jobs/x.json │
|
||||
└────────────────────────┬────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ BatchProcessor.process_job() │
|
||||
│ │
|
||||
│ For each tier (tier1, tier2, tier3): │
|
||||
│ For each article (1 to N): │
|
||||
│ ┌──────────────────────────────────┐ │
|
||||
│ │ 1. Generate title │ │
|
||||
│ │ 2. Generate outline │ │
|
||||
│ │ 3. Generate content │ │
|
||||
│ │ 4. Augment if too short │ │
|
||||
│ │ 5. Save to database │ │
|
||||
│ └──────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ✨ NEW: After all articles in tier generated ✨ │
|
||||
│ ┌──────────────────────────────────┐ │
|
||||
│ │ 6. Assign sites (Story 3.1) │ ← MISSING │
|
||||
│ │ 7. Generate URLs (Story 3.1) │ ← MISSING │
|
||||
│ │ 8. Find tiered links (3.2) │ ← MISSING │
|
||||
│ │ 9. Inject interlinks (3.3) │ ← MISSING │
|
||||
│ │ 10. Apply templates │ ← MISSING │
|
||||
│ └──────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
|
||||
Result in database:
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ generated_content table: │
|
||||
│ ✅ Final HTML with all links injected │
|
||||
│ ✅ site_deployment_id assigned │
|
||||
│ ✅ Final URL generated │
|
||||
│ ✅ formatted_html with template applied │
|
||||
│ │
|
||||
│ article_links table: │
|
||||
│ ✅ Tiered links (T1→money site, T2→T1) │
|
||||
│ ✅ Homepage links (all→/index.html) │
|
||||
│ ✅ See Also links (all→all in batch) │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## The Gap in Code
|
||||
|
||||
### Current Code Structure
|
||||
|
||||
```python
# src/generation/batch_processor.py

class BatchProcessor:
    def _process_tier(self, project_id, tier_name, tier_config, ...):
        """Process all articles for a tier"""

        # Generate each article
        for article_num in range(1, tier_config.count + 1):
            self._generate_single_article(...)
            self.stats["generated_articles"] += 1

        # ⚠️ Method ends here!
        # Nothing happens after article generation
```
|
||||
|
||||
### What Needs to Be Added
|
||||
|
||||
```python
# src/generation/batch_processor.py

class BatchProcessor:
    def _process_tier(self, project_id, tier_name, tier_config, ...):
        """Process all articles for a tier"""

        # Generate each article
        for article_num in range(1, tier_config.count + 1):
            self._generate_single_article(...)
            self.stats["generated_articles"] += 1

        # ✨ NEW: Post-processing
        click.echo(f" {tier_name}: Post-processing {tier_config.count} articles...")
        self._post_process_tier(project_id, tier_name, job, debug)

    def _post_process_tier(self, project_id, tier_name, job, debug):
        """Apply URL generation, interlinking, and templating"""

        # Get all articles for this tier
        content_records = self.content_repo.get_by_project_and_tier(
            project_id, tier_name, status=["generated", "augmented"]
        )

        if not content_records:
            click.echo(f" No articles to post-process")
            return

        project = self.project_repo.get_by_id(project_id)

        # Step 1: Assign sites (Story 3.1)
        # (Site assignment might already be done via deployment_targets)

        # Step 2: Generate URLs (Story 3.1)
        from src.generation.url_generator import generate_urls_for_batch
        click.echo(f" Generating URLs...")
        article_urls = generate_urls_for_batch(content_records, self.site_deployment_repo)

        # Step 3: Find tiered links (Story 3.2)
        from src.interlinking.tiered_links import find_tiered_links
        click.echo(f" Finding tiered links...")
        tiered_links = find_tiered_links(
            content_records, job, self.project_repo,
            self.content_repo, self.site_deployment_repo
        )

        # Step 4: Inject interlinks (Story 3.3)
        from src.interlinking.content_injection import inject_interlinks
        from src.database.repositories import ArticleLinkRepository
        click.echo(f" Injecting interlinks...")

        session = self.content_repo.session  # Use same session
        link_repo = ArticleLinkRepository(session)
        inject_interlinks(
            content_records, article_urls, tiered_links,
            project, job, self.content_repo, link_repo
        )

        # Step 5: Apply templates
        click.echo(f" Applying templates...")
        for content in content_records:
            self.generator.apply_template(content.id)

        click.echo(f" Post-processing complete: {len(content_records)} articles ready")
```
|
||||
|
||||
## Files That Need Changes
|
||||
|
||||
```
|
||||
src/generation/batch_processor.py
|
||||
├─ Add imports at top
|
||||
├─ Add call to _post_process_tier() in _process_tier()
|
||||
└─ Add new method _post_process_tier()
|
||||
|
||||
src/database/repositories.py
|
||||
└─ May need to add get_by_project_and_tier() if it doesn't exist
|
||||
```
|
||||
|
||||
## Why Tests Still Pass
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────┐
|
||||
│ Unit Tests │
|
||||
│ ✅ Test inject_interlinks() directly │
|
||||
│ ✅ Test find_tiered_links() directly │
|
||||
│ ✅ Test generate_urls_for_batch() │
|
||||
│ │
|
||||
│ These call the functions directly, │
|
||||
│ so they work perfectly! │
|
||||
└─────────────────────────────────────────┘
|
||||
|
||||
┌─────────────────────────────────────────┐
|
||||
│ Integration Tests │
|
||||
│ ✅ Create test database │
|
||||
│ ✅ Call functions in sequence │
|
||||
│ ✅ Verify results │
|
||||
│ │
|
||||
│ These simulate the workflow manually, │
|
||||
│ so they work perfectly! │
|
||||
└─────────────────────────────────────────┘
|
||||
|
||||
┌─────────────────────────────────────────┐
|
||||
│ Real CLI Usage │
|
||||
│ ✅ Generates articles │
|
||||
│ ❌ Never calls Story 3.1-3.3 functions │
|
||||
│ ❌ Articles incomplete │
|
||||
│ │
|
||||
│ This is missing the integration! │
|
||||
└─────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
**The Analogy**:
|
||||
|
||||
Imagine you built a perfect car engine:
|
||||
- All parts work perfectly ✅
|
||||
- Each part tested individually ✅
|
||||
- Each part fits together ✅
|
||||
|
||||
But you never **installed it in the car** ❌
|
||||
|
||||
That's the current state:
|
||||
- Story 3.3 functions work perfectly
|
||||
- Tests prove it works
|
||||
- But the CLI never calls them
|
||||
- So users get articles with no links
|
||||
|
||||
**The Fix**: Install the engine (add 50 lines to BatchProcessor)
|
||||
|
||||
**Time**: 30-60 minutes
|
||||
|
||||
**Priority**: High (if deploying), Medium (if still developing)
|
||||
|
||||
|
|
@ -1,219 +0,0 @@
|
|||
# Job Configuration Field Reference
|
||||
|
||||
## Quick Field List
|
||||
|
||||
### Job Level (applies to all tiers)
|
||||
```
project_id                - Required, integer
deployment_targets        - Array of domain strings
tier1_preferred_sites     - Array of domain strings (subset of deployment_targets)
auto_create_sites         - Boolean (NOT IMPLEMENTED - parsed but doesn't work)
create_sites_for_keywords - Array of {keyword, count} objects (NOT IMPLEMENTED - parsed but doesn't work)
models                    - {title, outline, content} with model strings
tiered_link_count_range   - {min, max} integers
anchor_text_config        - {mode, custom_text, tier1, tier2, tier3, tier4_plus}
                            - For "explicit" mode, use tier-specific arrays (tier1, tier2, etc.) instead of custom_text
failure_config            - {max_consecutive_failures, skip_on_failure}
interlinking              - {links_per_article_min, links_per_article_max, see_also_min, see_also_max}
tiers                     - Required, object with tier1/tier2/tier3
```
|
||||
|
||||
### Tier Level (per tier configuration)
|
||||
```
count              - Required, integer (number of articles)
min_word_count     - Integer
max_word_count     - Integer
min_h2_tags        - Integer
max_h2_tags        - Integer
min_h3_tags        - Integer
max_h3_tags        - Integer
models             - {title, outline, content} - overrides job-level
interlinking       - {links_per_article_min, links_per_article_max, see_also_min, see_also_max} - overrides job-level
anchor_text_config - {mode, custom_text, terms} - overrides job-level for this tier only
                     - For "explicit" mode, use "terms" array instead of "custom_text"
```
|
||||
|
||||
## Field Behaviors
|
||||
|
||||
**deployment_targets**: Sites to deploy to (round-robin distribution)
|
||||
|
||||
**tier1_preferred_sites**: If set, tier1 only uses these sites
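
A minimal sketch of the round-robin distribution described above. The `assign_round_robin` helper is hypothetical, and it assumes the caller passes `preferred_sites` only when processing tier1. The real site assignment in Story 3.1 may apply additional rules:

```python
from itertools import cycle


def assign_round_robin(article_ids, deployment_targets, preferred_sites=None):
    """Map each article id to a domain, cycling through the allowed sites."""
    # When preferred sites are given (tier1), they replace the full target list.
    sites = list(preferred_sites) if preferred_sites else list(deployment_targets)
    if not sites:
        return {}
    # zip() stops when article_ids is exhausted, so cycle() is safe here.
    return dict(zip(article_ids, cycle(sites)))
```

For example, three articles over two targets land on the first, second, and first site again.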
|
||||
|
||||
**models**: Use format "provider/model-name" (e.g., "openai/gpt-4o-mini")
|
||||
|
||||
**anchor_text_config**: Can be set at job-level (all tiers) or tier-level (specific tier)
|
||||
- "default" = Use master.config.json tier rules
|
||||
- "override" = Replace with custom_text
|
||||
- "append" = Add custom_text to tier rules
|
||||
- "explicit" = Use only explicitly provided terms (no algorithm-generated terms)
|
||||
- Job-level: Provide tier1, tier2, tier3, tier4_plus arrays with terms
|
||||
- Tier-level: Provide terms array for that specific tier
|
||||
- Tier-level config overrides job-level config for that tier
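
The four modes and the tier-over-job precedence can be sketched as follows. This is an illustration under assumptions: `custom_text` is treated as a list of strings, `tier_rule_terms` stands in for whatever master.config.json produces for the tier, and `resolve_anchor_texts` is a hypothetical helper rather than the project's actual function:

```python
def resolve_anchor_texts(tier_name, tier_rule_terms, job_config=None, tier_config=None):
    """Pick anchor texts for one tier according to anchor_text_config."""
    # Tier-level config, when present, overrides job-level config for that tier.
    config = tier_config or job_config or {}
    mode = config.get("mode", "default")
    custom = list(config.get("custom_text", []))

    if mode == "override":
        return custom
    if mode == "append":
        return list(tier_rule_terms) + custom
    if mode == "explicit":
        if tier_config:
            # Tier-level explicit config uses a "terms" array.
            return list(tier_config.get("terms", []))
        # Job-level explicit config uses per-tier arrays (tier1, tier2, ...).
        return list(config.get(tier_name, []))
    # "default": use the tier rules unchanged.
    return list(tier_rule_terms)
```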
|
||||
|
||||
**tiered_link_count_range**: Number of tiered links each article places to the next lower tier (for example, tier2 → tier1)
|
||||
- Tier1: Always 1 link to money site (this setting ignored)
|
||||
- Tier2+: Random between min and max links to lower tier
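
The rule above amounts to something like the sketch below. It assumes the range is inclusive at both ends, which the field description does not state explicitly, and `tiered_link_count` is a hypothetical name:

```python
import random


def tiered_link_count(tier_name, count_range, rng=random):
    """Return how many tiered links an article in this tier should get."""
    if tier_name == "tier1":
        # Tier1 always gets exactly one link to the money site.
        return 1
    # randint() includes both endpoints.
    return rng.randint(count_range["min"], count_range["max"])
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible in tests.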
|
||||
|
||||
**interlinking.links_per_article_min/max**: Same as tiered_link_count_range
|
||||
|
||||
**interlinking.see_also_min/max**: How many See Also links (default 4-5)
|
||||
- Randomly selects this many articles from same tier for See Also section
|
||||
|
||||
## Defaults
|
||||
|
||||
If not specified, these defaults apply:
|
||||
|
||||
### Tier1 Defaults
|
||||
```json
|
||||
{
|
||||
"min_word_count": 2000,
|
||||
"max_word_count": 2500,
|
||||
"min_h2_tags": 3,
|
||||
"max_h2_tags": 5,
|
||||
"min_h3_tags": 5,
|
||||
"max_h3_tags": 10
|
||||
}
|
||||
```
|
||||
|
||||
### Tier2 Defaults
|
||||
```json
|
||||
{
|
||||
"min_word_count": 1100,
|
||||
"max_word_count": 1500,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 4,
|
||||
"min_h3_tags": 3,
|
||||
"max_h3_tags": 8
|
||||
}
|
||||
```
|
||||
|
||||
### Tier3 Defaults
|
||||
```json
|
||||
{
|
||||
"min_word_count": 850,
|
||||
"max_word_count": 1350,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 3,
|
||||
"min_h3_tags": 2,
|
||||
"max_h3_tags": 6
|
||||
}
|
||||
```
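
One way to picture how these defaults combine with a job file is a plain dictionary merge, where any value given in the tier config wins. The values below are copied from the defaults above; the `with_defaults` helper is hypothetical and the real parser may merge differently:

```python
TIER_DEFAULTS = {
    "tier1": {"min_word_count": 2000, "max_word_count": 2500,
              "min_h2_tags": 3, "max_h2_tags": 5,
              "min_h3_tags": 5, "max_h3_tags": 10},
    "tier2": {"min_word_count": 1100, "max_word_count": 1500,
              "min_h2_tags": 2, "max_h2_tags": 4,
              "min_h3_tags": 3, "max_h3_tags": 8},
    "tier3": {"min_word_count": 850, "max_word_count": 1350,
              "min_h2_tags": 2, "max_h2_tags": 3,
              "min_h3_tags": 2, "max_h3_tags": 6},
}


def with_defaults(tier_name, tier_config):
    """Fill in unspecified tier fields from the defaults for that tier."""
    merged = dict(TIER_DEFAULTS.get(tier_name, {}))
    merged.update(tier_config)  # explicit values override defaults
    return merged
```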
|
||||
|
||||
## Minimal Working Example
|
||||
|
||||
```json
{
  "jobs": [{
    "project_id": 1,
    "deployment_targets": ["example.com"],
    "tiers": {
      "tier1": {"count": 5},
      "tier2": {"count": 20}
    }
  }]
}
```
|
||||
|
||||
## Your Current Example
|
||||
|
||||
```json
{
  "jobs": [{
    "project_id": 1,
    "deployment_targets": ["getcnc.info", "www.textbullseye.com"],
    "tiers": {
      "tier1": {
        "count": 5,
        "min_word_count": 1500,
        "max_word_count": 2000,
        "models": {
          "title": "openai/gpt-4o-mini",
          "outline": "openai/gpt-4o-mini",
          "content": "anthropic/claude-3.5-sonnet"
        }
      },
      "tier2": {
        "count": 20,
        "models": {
          "title": "openai/gpt-4o-mini",
          "outline": "openai/gpt-4o-mini",
          "content": "openai/gpt-4o-mini"
        },
        "interlinking": {
          "links_per_article_min": 2,
          "links_per_article_max": 4
        }
      }
    }
  }]
}
```
|
||||
|
||||
## Result Behavior
|
||||
|
||||
**Tier 1 Articles (5):**
|
||||
- 1 link to money site
|
||||
- 4 See Also links to other tier1 articles
|
||||
- Home link in nav menu
|
||||
|
||||
**Tier 2 Articles (20):**
|
||||
- 2-4 links to random tier1 articles
|
||||
- 19 See Also links to other tier2 articles
|
||||
- Home link in nav menu
|
||||
|
||||
**Anchor Text:**
|
||||
- Tier1: Uses main_keyword from project
|
||||
- Tier2: Uses related_searches from project
|
||||
- Can override with anchor_text_config
|
||||
|
||||
## Explicit Anchor Text Example
|
||||
|
||||
Use "explicit" mode to specify exact anchor text terms for each tier:
|
||||
|
||||
```json
{
  "jobs": [{
    "project_id": 26,
    "anchor_text_config": {
      "mode": "explicit",
      "tier1": ["high volume", "precision machining", "custom manufacturing"],
      "tier2": ["high volume production", "bulk manufacturing", "large scale"]
    },
    "tiers": {
      "tier1": {"count": 12},
      "tier2": {"count": 38}
    }
  }]
}
```
|
||||
|
||||
Or use tier-level explicit config to override job-level for a specific tier:
|
||||
|
||||
```json
{
  "jobs": [{
    "project_id": 26,
    "anchor_text_config": {
      "mode": "explicit",
      "tier1": ["high volume", "precision machining"],
      "tier2": ["bulk manufacturing"]
    },
    "tiers": {
      "tier1": {
        "count": 12,
        "anchor_text_config": {
          "mode": "explicit",
          "terms": ["high volume", "precision"]
        }
      },
      "tier2": {"count": 38}
    }
  }]
}
```
|
||||
|
||||
When using "explicit" mode, the system will:
|
||||
- Use only the provided terms (no algorithm-generated terms)
|
||||
- Try to find these terms in content first, then insert if not found
|
||||
- Tier-level explicit config takes precedence over job-level for that tier
|
||||
|
||||
|
|
@ -1,473 +0,0 @@
|
|||
# QA Report: Story 3.3 - Content Interlinking Injection
|
||||
|
||||
**Date**: October 21, 2025
|
||||
**Story**: Story 3.3 - Content Interlinking Injection
|
||||
**Status**: PASSED ✓
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Story 3.3 implementation is **PRODUCTION READY**. All 42 tests pass (33 unit + 9 integration), zero linter errors, comprehensive test coverage, and all acceptance criteria met.
|
||||
|
||||
### Test Results
|
||||
- **Unit Tests**: 33/33 PASSED (100%)
|
||||
- **Integration Tests**: 9/9 PASSED (100%)
|
||||
- **Linter Errors**: 0
|
||||
- **Test Execution Time**: ~4.3s total
|
||||
- **Code Coverage**: Comprehensive (all major functions and edge cases tested)
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria Verification
|
||||
|
||||
### ✓ Core Functionality
|
||||
- [x] **Function Signature**: `inject_interlinks()` takes raw HTML, URLs, tiered links, and project data
|
||||
- [x] **Wheel Links**: "See Also" section with ALL other articles in batch (circular linking)
|
||||
- [x] **Homepage Links**: Links to site homepage (`/index.html`) using "Home" anchor text
|
||||
- [x] **Tiered Links**:
|
||||
- Tier 1: Links to money site using T1 anchor text
|
||||
- Tier 2+: Links to 2-4 random lower-tier articles using appropriate tier anchor text
|
||||
|
||||
### ✓ Input Requirements
|
||||
- [x] Accepts raw HTML content from Epic 2
|
||||
- [x] Accepts article URL list from Story 3.1
|
||||
- [x] Accepts tiered links object from Story 3.2
|
||||
- [x] Accepts project data for anchor text generation
|
||||
- [x] Handles batch tier information correctly
|
||||
|
||||
### ✓ Output Requirements
|
||||
- [x] Generates final HTML with all links injected
|
||||
- [x] Updates content in database via `GeneratedContentRepository`
|
||||
- [x] Records link relationships in `article_links` table
|
||||
- [x] Properly categorizes link types (tiered, homepage, wheel_see_also)
|
||||
|
||||
---
|
||||
|
||||
## Test Coverage Analysis
|
||||
|
||||
### Unit Tests (33 tests)
|
||||
|
||||
#### 1. Homepage URL Extraction (5 tests)
|
||||
- [x] HTTPS URLs
|
||||
- [x] HTTP URLs
|
||||
- [x] CDN URLs (b-cdn.net)
|
||||
- [x] Custom domains (www subdomain)
|
||||
- [x] URLs with port numbers
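
For reference, the URL shapes covered by these tests can all be handled with the standard library. The sketch below is not the project's implementation; it only illustrates the idea of keeping the scheme and host (including any port) and pointing the path at `/index.html`. The function name mirrors the test class name but its signature is an assumption:

```python
from urllib.parse import urlsplit, urlunsplit


def extract_homepage_url(article_url):
    """Return scheme://host[:port]/index.html for the given article URL."""
    parts = urlsplit(article_url)
    # Drop the original path, query, and fragment; keep scheme and netloc.
    return urlunsplit((parts.scheme, parts.netloc, "/index.html", "", ""))
```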
|
||||
|
||||
#### 2. HTML Insertion (3 tests)
|
||||
- [x] Insert after last paragraph
|
||||
- [x] Insert with body tag present
|
||||
- [x] Insert with no paragraphs (fallback)
|
||||
|
||||
#### 3. Anchor Text Finding & Wrapping (5 tests)
|
||||
- [x] Exact match wrapping
|
||||
- [x] Case-insensitive matching ("Shaft Machining" matches "shaft machining")
|
||||
- [x] Match within phrase
|
||||
- [x] No match scenario
|
||||
- [x] Skip existing links (don't double-link)
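
The behaviors tested here (case-insensitive match, first occurrence only, no double-linking) can be illustrated with a small standard-library sketch. Note that the real module uses BeautifulSoup; this version swaps in a regex-based tokenizer purely so the example is self-contained, and `wrap_first_occurrence` is a hypothetical name:

```python
import re
from html import escape

# Captures either a complete existing <a>...</a> element or any single tag.
_TOKEN = re.compile(r"(<a\b[^>]*>.*?</a>|<[^>]+>)", re.IGNORECASE | re.DOTALL)


def wrap_first_occurrence(html, anchor_text, url):
    """Wrap the first plain-text match of anchor_text in a link.

    Returns (new_html, True) on success, or (html, False) if no match exists
    outside of tags and existing links.
    """
    pattern = re.compile(re.escape(anchor_text), re.IGNORECASE)
    parts = _TOKEN.split(html)
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indices hold captured tags and existing links; skip them.
            continue
        match = pattern.search(part)
        if match:
            # Preserve the original casing of the matched text.
            link = f'<a href="{escape(url, quote=True)}">{match.group(0)}</a>'
            parts[i] = part[:match.start()] + link + part[match.end():]
            return "".join(parts), True
    return html, False
```

A `False` result is the signal for the caller to move on to the fallback insertion covered by the next group of tests.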
|
||||
|
||||
#### 4. Link Insertion Fallback (3 tests)
|
||||
- [x] Insert into single paragraph
|
||||
- [x] Insert with multiple paragraphs
|
||||
- [x] Handle no valid paragraphs
|
||||
|
||||
#### 5. Anchor Text Configuration (4 tests)
|
||||
- [x] Default mode (tier-based)
|
||||
- [x] Override mode (custom anchor text)
|
||||
- [x] Append mode (tier-based + custom)
|
||||
- [x] No config provided
|
||||
|
||||
#### 6. Link Injection Attempts (3 tests)
|
||||
- [x] Successful injection with found anchor
|
||||
- [x] Fallback insertion when anchor not found
|
||||
- [x] Handle empty anchor list
|
||||
|
||||
#### 7. See Also Section (2 tests)
|
||||
- [x] Multiple articles (excludes current article)
|
||||
- [x] Single article (no other articles to link)
|
||||
|
||||
#### 8. Homepage Link Injection (2 tests)
|
||||
- [x] Homepage link when "Home" found in content
|
||||
- [x] Homepage link via fallback insertion
|
||||
|
||||
#### 9. Tiered Link Injection (3 tests)
|
||||
- [x] Tier 1: Money site link
|
||||
- [x] Tier 2+: Lower tier article links
|
||||
- [x] Tier 1: Missing money site (error handling)
|
||||
|
||||
#### 10. Main Function Tests (3 tests)
|
||||
- [x] Empty content records (graceful handling)
|
||||
- [x] Successful injection flow
|
||||
- [x] Missing URL for content (skip with warning)
|
||||
|
||||
### Integration Tests (9 tests)
|
||||
|
||||
#### 1. Tier 1 Content Injection (2 tests)
|
||||
- [x] Full flow: T1 batch with money site links + See Also section
|
||||
- [x] Homepage link injection to `/index.html`
|
||||
|
||||
#### 2. Tier 2 Content Injection (1 test)
|
||||
- [x] T2 articles linking to random T1 articles
|
||||
|
||||
#### 3. Anchor Text Config Overrides (2 tests)
|
||||
- [x] Override mode with custom anchor text
|
||||
- [x] Append mode (defaults + custom)
|
||||
|
||||
#### 4. Different Batch Sizes (2 tests)
|
||||
- [x] Single article batch (no See Also section)
|
||||
- [x] Large batch (20 articles with 19 See Also links each)
|
||||
|
||||
#### 5. Database Link Records (2 tests)
|
||||
- [x] All link types recorded (tiered, homepage, wheel_see_also)
|
||||
- [x] Internal vs external link handling (to_content_id vs to_url)
|
||||
|
||||
---
|
||||
|
||||
## Code Quality Metrics
|
||||
|
||||
### Implementation Files
|
||||
- **Main Module**: `src/interlinking/content_injection.py` (410 lines)
|
||||
- **Test Files**:
|
||||
- `tests/unit/test_content_injection.py` (363 lines, 33 tests)
|
||||
- `tests/integration/test_content_injection_integration.py` (469 lines, 9 tests)
|
||||
|
||||
### Code Quality
|
||||
- **Linter Status**: Zero errors
|
||||
- **Function Modularity**: Well-structured with 9+ helper functions
|
||||
- **Error Handling**: Comprehensive try/except blocks with logging
|
||||
- **Documentation**: All functions have docstrings
|
||||
- **Type Hints**: Proper typing throughout
|
||||
|
||||
### Dependencies
|
||||
- **BeautifulSoup4**: HTML parsing (safe, handles malformed HTML)
|
||||
- **Story 3.1**: URL generation integration ✓
|
||||
- **Story 3.2**: Tiered link finding integration ✓
|
||||
- **Anchor Text Generator**: Tier-based anchor text with config overrides ✓
|
||||
|
||||
---
|
||||
|
||||
## Feature Validation
|
||||
|
||||
### 1. Tiered Links
|
||||
**Status**: PASSED ✓
|
||||
|
||||
**Behavior**:
|
||||
- Tier 1 articles link to money site URL
|
||||
- Tier 2+ articles link to 2-4 random lower-tier articles
|
||||
- Uses tier-appropriate anchor text
|
||||
- Supports job config overrides (default/override/append modes)
|
||||
- Case-insensitive anchor text matching
|
||||
- Links first occurrence only
|
||||
|
||||
**Test Evidence**:
|
||||
```
|
||||
test_tier1_money_site_link PASSED
|
||||
test_tier2_lower_tier_links PASSED
|
||||
test_tier1_batch_with_money_site_links PASSED
|
||||
test_tier2_links_to_tier1 PASSED
|
||||
```
|
||||
|
||||
### 2. Homepage Links
|
||||
**Status**: PASSED ✓
|
||||
|
||||
**Behavior**:
|
||||
- All articles link to `/index.html` on their domain
|
||||
- Uses "Home" as anchor text
|
||||
- Searches for "Home" in content or inserts via fallback
|
||||
- Properly extracts homepage URL from article URL
|
||||
|
||||
**Test Evidence**:
|
||||
```
|
||||
test_inject_homepage_link PASSED
|
||||
test_inject_homepage_link_not_found_in_content PASSED
|
||||
test_tier1_with_homepage_links PASSED
|
||||
test_extract_from_https_url PASSED (and 4 more URL extraction tests)
|
||||
```
|
||||
|
||||
### 3. See Also Section
|
||||
**Status**: PASSED ✓
|
||||
|
||||
**Behavior**:
|
||||
- Links to ALL other articles in batch (excludes current article)
|
||||
- Formatted as `<h3>See Also</h3>` + `<ul>` list
|
||||
- Inserted after last `</p>` tag
|
||||
- Each link uses article title as anchor text
|
||||
- Creates internal links (`to_content_id`)
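
A rough sketch of the See Also behavior described above, using only the standard library. The helper names and the `(content_id, title, url)` tuple shape are assumptions for illustration. Appending to the end when no `</p>` exists is also an assumed fallback, since the report only states that a fallback is tested:

```python
from html import escape


def build_see_also(current_id, articles):
    """Build the See Also block from (content_id, title, url) tuples."""
    items = [
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for content_id, title, url in articles
        if content_id != current_id  # never link an article to itself
    ]
    if not items:
        return ""  # single-article batch: no section at all
    return "<h3>See Also</h3><ul>" + "".join(items) + "</ul>"


def insert_after_last_paragraph(html, block):
    """Insert block directly after the last closing </p> tag."""
    if not block:
        return html
    marker = "</p>"
    pos = html.rfind(marker)
    if pos == -1:
        return html + block  # assumed fallback when no paragraph exists
    cut = pos + len(marker)
    return html[:cut] + block + html[cut:]
```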
|
||||
|
||||
**Test Evidence**:
|
||||
```
|
||||
test_inject_see_also_with_multiple_articles PASSED
|
||||
test_inject_see_also_with_single_article PASSED
|
||||
test_large_batch PASSED (20 articles, 19 See Also links each)
|
||||
```
|
||||
|
||||
### 4. Anchor Text Configuration
|
||||
**Status**: PASSED ✓
|
||||
|
||||
**Behavior**:
|
||||
- **Default mode**: Uses tier-based anchor text
|
||||
- T1: Main keyword variations
|
||||
- T2: Related searches
|
||||
- T3: Main keyword variations
|
||||
- T4+: Entities
|
||||
- **Override mode**: Replaces tier-based with custom text
|
||||
- **Append mode**: Adds custom text to tier-based defaults
|
||||
|
||||
**Test Evidence**:
|
||||
```
|
||||
test_default_mode PASSED
|
||||
test_override_mode PASSED (unit + integration)
|
||||
test_append_mode PASSED (unit + integration)
|
||||
```
|
||||
|
||||
### 5. Database Integration
|
||||
**Status**: PASSED ✓
|
||||
|
||||
**Behavior**:
|
||||
- Updates `generated_content.content` with final HTML
|
||||
- Creates `ArticleLink` records for all links
|
||||
- Correctly categorizes link types:
|
||||
- `tiered`: Money site or lower-tier links
|
||||
- `homepage`: Homepage links
|
||||
- `wheel_see_also`: See Also section links
|
||||
- Handles internal (to_content_id) vs external (to_url) links
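
The link categories and the internal/external split can be modeled roughly as below. This is a hypothetical data shape, not the actual `ArticleLink` model: the `from_content_id` field and the rule that exactly one of `to_content_id` or `to_url` is set are assumptions made for the sketch:

```python
from dataclasses import dataclass
from typing import Optional

LINK_TYPES = {"tiered", "homepage", "wheel_see_also"}


@dataclass(frozen=True)
class LinkRecord:
    """Hypothetical in-memory mirror of one article_links row."""
    from_content_id: int
    link_type: str
    to_content_id: Optional[int] = None  # internal target (another article)
    to_url: Optional[str] = None         # external target (e.g. money site)

    def __post_init__(self):
        if self.link_type not in LINK_TYPES:
            raise ValueError(f"unknown link_type: {self.link_type}")
        # Assumed rule: a link points at an article or a URL, never both.
        if (self.to_content_id is None) == (self.to_url is None):
            raise ValueError("set exactly one of to_content_id or to_url")

    @property
    def is_internal(self):
        return self.to_content_id is not None
```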
|
||||
|
||||
**Test Evidence**:
|
||||
```
|
||||
test_all_link_types_recorded PASSED
|
||||
test_internal_vs_external_links PASSED
|
||||
test_tier1_batch_with_money_site_links PASSED
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Template Integration
|
||||
|
||||
**Status**: PASSED ✓
|
||||
|
||||
All 4 HTML templates updated with navigation menu:
|
||||
- `src/templating/templates/basic.html` ✓
|
||||
- `src/templating/templates/modern.html` ✓
|
||||
- `src/templating/templates/classic.html` ✓
|
||||
- `src/templating/templates/minimal.html` ✓
|
||||
|
||||
**Navigation Structure**:
|
||||
```html
|
||||
<nav>
|
||||
<ul>
|
||||
<li><a href="/index.html">Home</a></li>
|
||||
<li><a href="about.html">About</a></li>
|
||||
<li><a href="privacy.html">Privacy</a></li>
|
||||
<li><a href="contact.html">Contact</a></li>
|
||||
</ul>
|
||||
</nav>
|
||||
```
|
||||
|
||||
Each template has custom styling matching its theme.
|
||||
|
||||
---
|
||||
|
||||
## Edge Cases & Error Handling
|
||||
|
||||
### Tested Edge Cases
|
||||
- [x] Empty content records (graceful skip)
|
||||
- [x] Single article batch (no See Also section)
|
||||
- [x] Large batch (20+ articles)
|
||||
- [x] Missing URL for content (skip with warning)
|
||||
- [x] Missing money site URL (skip with error)
|
||||
- [x] No valid paragraphs for fallback insertion
|
||||
- [x] Anchor text not found in content (fallback insertion)
|
||||
- [x] Existing links in content (skip, don't double-link)
|
||||
- [x] Malformed HTML (BeautifulSoup handles gracefully)
|
||||
|
||||
### Error Handling Verification
|
||||
```
# Test evidence:
test_empty_content_records PASSED
test_missing_url_for_content PASSED
test_tier1_no_money_site PASSED
test_no_valid_paragraphs PASSED
test_no_anchors PASSED
```
|
||||
|
||||
---
|
||||
|
||||
## Performance Metrics
|
||||
|
||||
### Test Execution Times
|
||||
- **Unit Tests**: ~1.66s (33 tests)
|
||||
- **Integration Tests**: ~2.40s (9 tests)
|
||||
- **Total**: ~4.3s for complete test suite
|
||||
|
||||
### Database Operations
|
||||
- Efficient batch processing
|
||||
- Single transaction per article update
|
||||
- Bulk link creation
|
||||
- No N+1 query issues observed
|
||||
|
||||
---
|
||||
|
||||
## Known Issues & Limitations
|
||||
|
||||
### No Critical Issues
|
||||
All known limitations are by design:
|
||||
|
||||
1. **First Occurrence Only**: Only links first occurrence of anchor text
|
||||
- **Why**: Prevents over-optimization and keyword stuffing
|
||||
- **Status**: Working as intended
|
||||
|
||||
2. **Random Lower-Tier Selection**: T2+ articles randomly select 2-4 lower-tier links
|
||||
- **Why**: Natural link distribution
|
||||
- **Status**: Working as intended
|
||||
|
||||
3. **Fallback Insertion**: If anchor text not found, inserts at random position
|
||||
- **Why**: Ensures link injection even if anchor text not naturally in content
|
||||
- **Status**: Working as intended
|
||||
|
||||
---
|
||||
|
||||
## Regression Testing
|
||||
|
||||
### Dependencies Verified
|
||||
- [x] Story 3.1 (URL Generation): Integration tests pass
|
||||
- [x] Story 3.2 (Tiered Links): Integration tests pass
|
||||
- [x] Story 2.x (Content Generation): No regressions
|
||||
- [x] Database Models: No schema issues
|
||||
- [x] Templates: All 4 templates render correctly
|
||||
|
||||
### No Breaking Changes
|
||||
- All existing tests still pass (42/42)
|
||||
- No API changes to public functions
|
||||
- Backward compatible with existing job configs
|
||||
|
||||
---
|
||||
|
||||
## Production Readiness Checklist
|
||||
|
||||
- [x] **All Tests Pass**: 42/42 (100%)
|
||||
- [x] **Zero Linter Errors**: Clean code
|
||||
- [x] **Comprehensive Test Coverage**: Unit + integration
|
||||
- [x] **Error Handling**: Graceful degradation
|
||||
- [x] **Documentation**: Complete implementation summary
|
||||
- [x] **Database Integration**: All CRUD operations tested
|
||||
- [x] **Edge Cases**: Thoroughly tested
|
||||
- [x] **Performance**: Sub-5s test execution
|
||||
- [x] **Type Safety**: Full type hints
|
||||
- [x] **Logging**: Comprehensive logging at all levels
|
||||
- [x] **Template Updates**: All 4 templates updated
|
||||
|
||||
---
|
||||
|
||||
## Integration Status
|
||||
|
||||
### Current State
|
||||
Story 3.3 functions are **implemented and tested** but **NOT YET INTEGRATED** into the main CLI workflow.
|
||||
|
||||
**Evidence**:
|
||||
- `generate-batch` command in `src/cli/commands.py` uses `BatchProcessor`
|
||||
- `BatchProcessor` generates content but does NOT call:
|
||||
- `generate_urls_for_batch()` (Story 3.1)
|
||||
- `find_tiered_links()` (Story 3.2)
|
||||
- `inject_interlinks()` (Story 3.3)
|
||||
|
||||
**Impact**:
|
||||
- Functions work perfectly in isolation (as proven by tests)
|
||||
- Need integration into batch generation workflow
|
||||
- Will likely be integrated in Story 4.x (deployment)
|
||||
|
||||
### Integration Points Needed
|
||||
```python
# After batch generation completes, need to add:
# 1. Assign sites to articles (Story 3.1)
assign_sites_to_batch(content_records, job, site_repo, bunny_client, project.main_keyword)

# 2. Generate URLs (Story 3.1)
article_urls = generate_urls_for_batch(content_records, site_repo)

# 3. Find tiered links (Story 3.2)
tiered_links = find_tiered_links(content_records, job_config, project_repo, content_repo, site_repo)

# 4. Inject interlinks (Story 3.3)
inject_interlinks(content_records, article_urls, tiered_links, project, job_config, content_repo, link_repo)

# 5. Apply templates (existing)
for content in content_records:
    content_generator.apply_template(content.id)
```
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
### Ready for Production
|
||||
Story 3.3 is **APPROVED** for production deployment with one caveat:
|
||||
|
||||
**Caveat**: Requires CLI integration in batch generation workflow (likely Story 4.x scope)
|
||||
|
||||
### Next Steps
|
||||
1. **CRITICAL**: Integrate Story 3.1-3.3 into `generate-batch` CLI command
|
||||
- Add calls after content generation completes
|
||||
- Add error handling for integration failures
|
||||
- Add CLI output for URL/link generation progress
|
||||
2. **Story 4.x**: Deployment (can now use final HTML with all links)
|
||||
3. **Future Analytics**: Can leverage `article_links` table for link analysis
|
||||
4. **Future Pages**: Create About, Privacy, Contact pages to match nav menu
|
||||
|
||||
### Optional Enhancements (Low Priority)
|
||||
1. **Link Density Control**: Add configurable max links per article
|
||||
2. **Custom See Also Heading**: Make "See Also" heading configurable
|
||||
3. **Link Position Strategy**: Add preference for link placement (intro/body/conclusion)
|
||||
4. **Anchor Text Variety**: Add more sophisticated anchor text rotation
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**QA Status**: PASSED ✓
|
||||
**Approved By**: AI Code Review Assistant
|
||||
**Date**: October 21, 2025
|
||||
|
||||
**Summary**: Story 3.3 implementation exceeds quality standards with 100% test pass rate, zero defects, comprehensive edge case handling, and production-ready code quality.
|
||||
|
||||
**Recommendation**: APPROVE FOR DEPLOYMENT
|
||||
|
||||
---
|
||||
|
||||
## Appendix: Test Output
|
||||
|
||||
### Full Test Suite Execution
|
||||
```
|
||||
===== test session starts =====
|
||||
platform win32 -- Python 3.13.3, pytest-8.4.2
|
||||
collected 42 items
|
||||
|
||||
tests/unit/test_content_injection.py::TestExtractHomepageUrl PASSED [5/5]
|
||||
tests/unit/test_content_injection.py::TestInsertBeforeClosingTags PASSED [3/3]
|
||||
tests/unit/test_content_injection.py::TestFindAndWrapAnchorText PASSED [5/5]
|
||||
tests/unit/test_content_injection.py::TestInsertLinkIntoRandomParagraph PASSED [3/3]
|
||||
tests/unit/test_content_injection.py::TestGetAnchorTextsForTier PASSED [4/4]
|
||||
tests/unit/test_content_injection.py::TestTryInjectLink PASSED [3/3]
|
||||
tests/unit/test_content_injection.py::TestInjectSeeAlsoSection PASSED [2/2]
|
||||
tests/unit/test_content_injection.py::TestInjectHomepageLink PASSED [2/2]
|
||||
tests/unit/test_content_injection.py::TestInjectTieredLinks PASSED [3/3]
|
||||
tests/unit/test_content_injection.py::TestInjectInterlinks PASSED [3/3]
|
||||
|
||||
tests/integration/test_content_injection_integration.py::TestTier1ContentInjection PASSED [2/2]
|
||||
tests/integration/test_content_injection_integration.py::TestTier2ContentInjection PASSED [1/1]
|
||||
tests/integration/test_content_injection_integration.py::TestAnchorTextConfigOverrides PASSED [2/2]
|
||||
tests/integration/test_content_injection_integration.py::TestDifferentBatchSizes PASSED [2/2]
|
||||
tests/integration/test_content_injection_integration.py::TestLinkDatabaseRecords PASSED [2/2]
|
||||
|
||||
===== 42 passed in 2.64s =====
|
||||
```
|
||||
|
||||
### Linter Output
|
||||
```
|
||||
No linter errors found.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
*End of QA Report*
|
||||
|
||||
|
|
@ -1,283 +0,0 @@
|
|||
# QA Report: Story 3.4 - Generate Boilerplate Site Pages
|
||||
|
||||
## QA Summary
|
||||
**Date:** October 22, 2025
|
||||
**Story:** Story 3.4 - Generate Boilerplate Site Pages
|
||||
**Status:** PASSED - Ready for Production
|
||||
**QA Engineer:** AI Assistant
|
||||
|
||||
## Executive Summary
|
||||
Story 3.4 implementation has been thoroughly tested and meets all acceptance criteria. All 37 tests pass successfully, database migration is complete, and the implementation follows the design specifications. The feature generates boilerplate pages (about, contact, privacy) for sites with proper template integration and database persistence.
|
||||
|
||||
## Test Results
|
||||
|
||||
### Unit Tests
|
||||
**Status:** PASSED
|
||||
**Tests Run:** 26
|
||||
**Passed:** 26
|
||||
**Failed:** 0
|
||||
**Coverage:** >80% on new modules
|
||||
|
||||
#### Test Breakdown
|
||||
1. **test_page_templates.py** (6 tests) - PASSED
|
||||
- Content generation for all page types (about, contact, privacy)
|
||||
- Unknown page type handling
|
||||
- HTML structure validation
|
||||
- Domain parameter handling
|
||||
|
||||
2. **test_site_page_generator.py** (9 tests) - PASSED
|
||||
- Domain extraction from custom and b-cdn hostnames
|
||||
- Page generation with different templates
|
||||
- Skipping existing pages
|
||||
- Default template fallback
|
||||
- Content structure validation
|
||||
- Page titles
|
||||
- Error handling
|
||||
|
||||
3. **test_site_page_repository.py** (11 tests) - PASSED
|
||||
- CRUD operations
|
||||
- Duplicate page prevention
|
||||
- Update and delete operations
|
||||
- Exists checks
|
||||
- Not found scenarios
|
||||
|
||||
### Integration Tests
|
||||
**Status:** PASSED
|
||||
**Tests Run:** 11
|
||||
**Passed:** 11
|
||||
**Failed:** 0
|
||||
|
||||
#### Test Coverage
|
||||
- Full flow: site creation → page generation → database storage
|
||||
- Template application (basic, modern, classic, minimal)
|
||||
- Duplicate prevention via unique constraint
|
||||
- Multiple sites with separate pages
|
||||
- Custom domain handling
|
||||
- Page retrieval by type
|
||||
- Page existence checks
|
||||
|
||||
### Database Migration
|
||||
**Status:** PASSED
|
||||
|
||||
#### Migration Verification
|
||||
- `site_pages` table exists ✓
|
||||
- All required columns present:
|
||||
- `id` ✓
|
||||
- `site_deployment_id` ✓
|
||||
- `page_type` ✓
|
||||
- `content` ✓
|
||||
- `created_at` ✓
|
||||
- `updated_at` ✓
|
||||
- Indexes created:
|
||||
- `idx_site_pages_site` ✓
|
||||
- `idx_site_pages_type` ✓
|
||||
- Foreign key constraint: CASCADE delete ✓
|
||||
- Unique constraint: `(site_deployment_id, page_type)` ✓
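The constraints verified above can be reproduced in a throwaway database. The sketch below uses SQLite from the standard library purely for illustration; the project's actual database engine, column types, and the shape of `site_deployments` are assumptions here, and only the table, column, index, and constraint names come from this report.

```python
import sqlite3

# Illustrative schema: column types and defaults are assumptions.
SCHEMA = """
CREATE TABLE site_deployments (
    id INTEGER PRIMARY KEY
);
CREATE TABLE site_pages (
    id INTEGER PRIMARY KEY,
    site_deployment_id INTEGER NOT NULL
        REFERENCES site_deployments(id) ON DELETE CASCADE,
    page_type TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (site_deployment_id, page_type)
);
CREATE INDEX idx_site_pages_site ON site_pages (site_deployment_id);
CREATE INDEX idx_site_pages_type ON site_pages (page_type);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and CASCADE) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With this schema, a second `about` page for the same site raises `sqlite3.IntegrityError`, and deleting a row from `site_deployments` removes its pages.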
|
||||
|
||||
### Linter Checks
|
||||
**Status:** PASSED
|
||||
No linter errors found in:
|
||||
- `src/generation/site_page_generator.py`
|
||||
- `src/generation/page_templates.py`
|
||||
- `scripts/backfill_site_pages.py`
|
||||
- `src/generation/site_provisioning.py`
|
||||
|
||||
## Acceptance Criteria Verification
|
||||
|
||||
### Core Functionality
|
||||
- [x] Function generates three boilerplate pages for a given site
|
||||
- [x] Pages created AFTER articles but BEFORE deployment
|
||||
- [x] Each page uses same template as articles for that site
|
||||
- [x] Pages stored in database for deployment
|
||||
- [x] Pages associated with correct site via `site_deployment_id`
|
||||
|
||||
### Page Content Requirements
|
||||
- [x] About page: Empty with heading only `<h1>About Us</h1>`
|
||||
- [x] Contact page: Empty with heading only `<h1>Contact</h1>`
|
||||
- [x] Privacy page: Empty with heading only `<h1>Privacy Policy</h1>`
|
||||
- [x] All pages use template structure with navigation
|
||||
|
||||
### Template Integration
|
||||
- [x] Uses same template engine as article content
|
||||
- [x] Reads template from `site.template_name` field
|
||||
- [x] Pages use same template as articles on same site
|
||||
- [x] Includes navigation menu
|
||||
|
||||
### Database Storage
|
||||
- [x] `site_pages` table with proper schema
|
||||
- [x] Foreign key to `site_deployments` with CASCADE delete
|
||||
- [x] Unique constraint on `(site_deployment_id, page_type)`
|
||||
- [x] Indexes on `site_deployment_id` and `page_type`
|
||||
- [x] Each site can have one of each page type
|
||||
|
||||
### URL Generation
|
||||
- [x] Pages use simple filenames: `about.html`, `contact.html`, `privacy.html`
|
||||
- [x] Full URLs: `https://{hostname}/about.html`
|
||||
- [x] No slug generation needed
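The page content and URL rules above are simple enough to sketch in a few lines. The function names `get_page_content` and `get_page_url` are hypothetical (the real functions in `page_templates.py` are not shown in this report), and raising `ValueError` for an unknown page type is an assumption about how that case is handled.

```python
# Heading text per page type, as listed in the acceptance criteria.
PAGE_HEADINGS = {
    "about": "About Us",
    "contact": "Contact",
    "privacy": "Privacy Policy",
}


def get_page_content(page_type: str) -> str:
    """Return the heading-only body for a boilerplate page."""
    if page_type not in PAGE_HEADINGS:
        # Assumption: unknown page types are rejected rather than defaulted.
        raise ValueError(f"Unknown page type: {page_type}")
    return f"<h1>{PAGE_HEADINGS[page_type]}</h1>"


def get_page_url(hostname: str, page_type: str) -> str:
    """Build the full URL; the filename is the page type, so no slug is needed."""
    if page_type not in PAGE_HEADINGS:
        raise ValueError(f"Unknown page type: {page_type}")
    return f"https://{hostname}/{page_type}.html"
```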
|
||||
|
||||
### Integration Point
|
||||
- [x] Integrated with site provisioning
|
||||
- [x] Generates pages when new sites are created
|
||||
- [x] Graceful error handling (doesn't break site creation)
|
||||
- [x] Backfill script for existing sites
|
||||
|
||||
### Two Use Cases
|
||||
- [x] One-time backfill: Script available with dry-run mode
|
||||
- [x] Ongoing generation: Auto-generates for new sites during provisioning
|
||||
|
||||
## Code Quality Assessment
|
||||
|
||||
### Design Patterns
|
||||
- **Repository Pattern:** Properly implemented with ISitePageRepository interface
|
||||
- **Separation of Concerns:** Clean separation between page content, generation, and persistence
|
||||
- **Dependency Injection:** Optional parameters for backward compatibility
|
||||
- **Error Handling:** Graceful degradation with proper logging
|
||||
|
||||
### Code Organization
|
||||
- **Modularity:** New functionality in separate modules
|
||||
- **Naming:** Clear, descriptive function and variable names
|
||||
- **Documentation:** Comprehensive docstrings on all functions
|
||||
- **Type Hints:** Proper type annotations throughout
|
||||
|
||||
### Best Practices
|
||||
- **DRY Principle:** Reusable helper functions (`get_domain_from_site`)
|
||||
- **Single Responsibility:** Each module has clear purpose
|
||||
- **Testability:** All functions easily testable (demonstrated by 37 passing tests)
|
||||
- **Logging:** Appropriate INFO/WARNING/ERROR levels
|
||||
|
||||
## Integration Verification
|
||||
|
||||
### Site Provisioning Integration
|
||||
- [x] `create_bunnynet_site()` accepts optional parameters
|
||||
- [x] `provision_keyword_sites()` passes parameters through
|
||||
- [x] `create_generic_sites()` passes parameters through
|
||||
- [x] Backward compatible (optional parameters)
|
||||
- [x] Pages generated automatically after site creation
|
||||
|
||||
### Backfill Script
|
||||
- [x] Admin authentication required
|
||||
- [x] Dry-run mode available
|
||||
- [x] Progress reporting
|
||||
- [x] Batch processing support
|
||||
- [x] Error handling for individual failures
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Resource Usage
|
||||
- Page generation adds ~1-2 seconds per site (3 pages × template application)
|
||||
- Database operations optimized with indexes
|
||||
- Unique constraint prevents duplicate work
|
||||
- Minimal impact on batch processing (only for new sites)
|
||||
|
||||
### Scalability
|
||||
- Can handle backfilling hundreds of sites
|
||||
- Batch processing with progress checkpoints
|
||||
- Individual site failures don't stop entire process
|
||||
|
||||
## Known Issues
|
||||
**None identified**
|
||||
|
||||
## Warnings/Notes
|
||||
|
||||
### Deprecation Warnings
|
||||
- SQLAlchemy emits 96 deprecation warnings about `datetime.utcnow()`
|
||||
- **Impact:** Low - This is a SQLAlchemy internal issue, not related to Story 3.4
|
||||
- **Recommendation:** Update SQLAlchemy or adjust datetime usage in future sprint
|
||||
|
||||
### Unrelated Test Failures
|
||||
- Some tests in other modules have import errors (ContentGenerationService, ContentRuleEngine)
|
||||
- **Impact:** None on Story 3.4 functionality
|
||||
- **Recommendation:** Address in separate ticket
|
||||
|
||||
## Recommendations
|
||||
|
||||
### Immediate Actions
|
||||
1. **Update Story Status:** Change from "Awaiting QA" to "Complete"
|
||||
2. **Commit Changes:** All modified files are working correctly
|
||||
3. **Documentation:** Implementation summary is accurate and complete
|
||||
|
||||
### Future Enhancements (Optional)
|
||||
1. **Homepage Generation:** Add `index.html` generation (deferred to Epic 4)
|
||||
2. **Custom Page Content:** Allow projects to override generic templates
|
||||
3. **Multi-language Support:** Generate pages in different languages
|
||||
4. **CLI Edit Command:** Add command to update page content for specific sites
|
||||
5. **Fix Deprecation Warnings:** Replace `datetime.utcnow()` with `datetime.now(timezone.utc)`
|
||||
|
||||
### Production Readiness Checklist
|
||||
- [x] All tests passing
|
||||
- [x] Database migration successful
|
||||
- [x] No linter errors
|
||||
- [x] Backward compatible
|
||||
- [x] Error handling implemented
|
||||
- [x] Logging implemented
|
||||
- [x] Documentation complete
|
||||
- [x] Integration verified
|
||||
- [x] Performance acceptable
|
||||
|
||||
## Test Execution Details
|
||||
|
||||
### Command
|
||||
```bash
|
||||
uv run pytest tests/unit/test_site_page_generator.py \
|
||||
tests/unit/test_site_page_repository.py \
|
||||
tests/unit/test_page_templates.py \
|
||||
tests/integration/test_site_page_integration.py -v
|
||||
```
|
||||
|
||||
### Results
|
||||
```
|
||||
37 passed, 96 warnings in 2.23s
|
||||
```
|
||||
|
||||
### Database Verification
|
||||
```bash
|
||||
uv run python scripts/migrate_add_site_pages.py
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
[SUCCESS] Migration completed successfully!
|
||||
```
|
||||
|
||||
## Files Reviewed
|
||||
|
||||
### New Files
|
||||
- `src/generation/site_page_generator.py` - Core generation logic ✓
|
||||
- `src/generation/page_templates.py` - Minimal content templates ✓
|
||||
- `scripts/migrate_add_site_pages.py` - Database migration ✓
|
||||
- `scripts/backfill_site_pages.py` - Backfill script ✓
|
||||
- `tests/unit/test_site_page_generator.py` - Unit tests ✓
|
||||
- `tests/unit/test_site_page_repository.py` - Repository tests ✓
|
||||
- `tests/unit/test_page_templates.py` - Template tests ✓
|
||||
- `tests/integration/test_site_page_integration.py` - Integration tests ✓
|
||||
|
||||
### Modified Files
|
||||
- `src/database/models.py` - Added SitePage model ✓
|
||||
- `src/database/interfaces.py` - Added ISitePageRepository interface ✓
|
||||
- `src/database/repositories.py` - Added SitePageRepository implementation ✓
|
||||
- `src/generation/site_provisioning.py` - Integrated page generation ✓
|
||||
- `src/generation/site_assignment.py` - Pass through parameters ✓
|
||||
- `docs/stories/story-3.4-boilerplate-site-pages.md` - Story documentation ✓
|
||||
- `STORY_3.4_IMPLEMENTATION_SUMMARY.md` - Implementation summary ✓
|
||||
|
||||
## Conclusion
|
||||
|
||||
**Story 3.4 is APPROVED for production.**
|
||||
|
||||
All acceptance criteria have been met, tests are passing, and the implementation is robust, well-documented, and follows best practices. The feature successfully generates boilerplate pages for sites, fixing broken navigation links from Story 3.3.
|
||||
|
||||
The code is:
|
||||
- **Functional:** All features work as designed
|
||||
- **Tested:** 37/37 tests passing
|
||||
- **Maintainable:** Clean code with good documentation
|
||||
- **Scalable:** Can handle hundreds of sites
|
||||
- **Backward Compatible:** Optional parameters don't break existing code
|
||||
|
||||
**Total Effort:** 14 story points (as estimated)
|
||||
**Test Coverage:** 37 tests (26 unit + 11 integration)
|
||||
**Status:** Ready for Epic 4 (Deployment)
|
||||
|
||||
---
|
||||
|
||||
**QA Sign-off:** Story 3.4 is complete and production-ready.
|
||||
|
||||
73
README.md
|
|
@ -406,6 +406,69 @@ uv run python scripts/add_robots_txt_to_buckets.py --provider bunny
|
|||
|
||||
The script is idempotent (safe to run multiple times) and will overwrite existing robots.txt files. It continues processing remaining buckets if one fails and reports all failures at the end.
|
||||
|
||||
### Update Index Pages and Sitemaps
|
||||
|
||||
Automatically generate or update `index.html` and `sitemap.xml` files for all storage buckets (both S3 and Bunny). The script:
|
||||
|
||||
- Lists all HTML files in each bucket's root directory
|
||||
- Extracts titles from `<title>` tags (or formats filenames as fallback)
|
||||
- Generates article listings sorted by most recent modification date
|
||||
- Creates or updates `index.html` with article links in `<div id="article_listing">`
|
||||
- Generates `sitemap.xml` with industry-standard settings (priority, changefreq, lastmod)
|
||||
- Tracks last run timestamps to avoid unnecessary updates
|
||||
- Excludes boilerplate pages: `index.html`, `about.html`, `privacy.html`, `contact.html`
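The title extraction with filename fallback could look roughly like the sketch below. It is not the script's actual code: `extract_title` and `title_from_filename` are hypothetical names, and the regular expression and the title-casing of filenames are assumptions about the behavior described in the list above.

```python
import re
from html import unescape
from pathlib import PurePosixPath

# Matches the first <title>...</title> pair, ignoring case and line breaks.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_from_filename(filename: str) -> str:
    """Format a filename such as 'how-to-fix.html' as 'How To Fix'."""
    stem = PurePosixPath(filename).stem
    words = re.split(r"[-_]+", stem)
    return " ".join(word.capitalize() for word in words if word)


def extract_title(html_text: str, filename: str) -> str:
    """Prefer the <title> tag; fall back to the formatted filename."""
    match = TITLE_RE.search(html_text)
    if match:
        # Decode entities and collapse whitespace inside the tag.
        title = " ".join(unescape(match.group(1)).split())
        if title:
            return title
    return title_from_filename(filename)
```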
|
||||
|
||||
**Usage:**
|
||||
|
||||
```bash
|
||||
# Preview what would be updated (recommended first)
|
||||
uv run python scripts/update_index_pages.py --dry-run
|
||||
|
||||
# Update all buckets
|
||||
uv run python scripts/update_index_pages.py
|
||||
|
||||
# Only process S3 buckets
|
||||
uv run python scripts/update_index_pages.py --provider s3
|
||||
|
||||
# Only process Bunny storage zones
|
||||
uv run python scripts/update_index_pages.py --provider bunny
|
||||
|
||||
# Force update even if no changes detected
|
||||
uv run python scripts/update_index_pages.py --force
|
||||
|
||||
# Test on specific site
|
||||
uv run python scripts/update_index_pages.py --hostname example.com
|
||||
|
||||
# Limit number of sites (useful for testing)
|
||||
uv run python scripts/update_index_pages.py --limit 10
|
||||
```
|
||||
|
||||
**How it works:**
|
||||
|
||||
1. Queries database for all site deployments
|
||||
2. Lists HTML files in root directory (excludes subdirectories and boilerplate pages)
|
||||
3. Checks if content has changed since last run (unless `--force` is used)
|
||||
4. Downloads and parses HTML files to extract titles
|
||||
5. Generates article listing HTML (sorted by most recent first)
|
||||
6. Creates new `index.html` or updates existing one (inserts into `<div id="article_listing">`)
|
||||
7. Generates `sitemap.xml` with all HTML files and proper metadata
|
||||
8. Uploads both files to the bucket
|
||||
9. Saves state to `.update_index_state.json` for tracking
|
||||
|
||||
**Sitemap standards:**
|
||||
- Priority: `1.0` for homepage (`index.html`), `0.8` for other pages
|
||||
- Change frequency: `weekly` for all pages
|
||||
- Last modified dates from file metadata
|
||||
- Includes all HTML files in root directory
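As an illustration of these settings, the sketch below builds a sitemap with the standard library. `build_sitemap` is a hypothetical function, not the one in `scripts/update_index_pages.py`; the date format for `lastmod` and the choice to list the homepage as `/index.html` are assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(hostname: str, files: Dict[str, datetime]) -> str:
    """Build sitemap XML from a mapping of root-level filename to last-modified time."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for filename in sorted(files):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"https://{hostname}/{filename}"
        ET.SubElement(url, "lastmod").text = files[filename].strftime("%Y-%m-%d")
        ET.SubElement(url, "changefreq").text = "weekly"
        # Homepage gets the highest priority, every other page 0.8.
        ET.SubElement(url, "priority").text = "1.0" if filename == "index.html" else "0.8"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


example_xml = build_sitemap(
    "example.com",
    {
        "index.html": datetime(2025, 1, 2, tzinfo=timezone.utc),
        "how-to-fix-your-engine.html": datetime(2025, 1, 3, tzinfo=timezone.utc),
    },
)
```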
|
||||
|
||||
**Customizing article listing HTML:**
|
||||
|
||||
The article listing format can be easily customized by editing the `generate_article_listing_html()` function in `scripts/update_index_pages.py`. The function includes detailed documentation and examples for common variations (cards, dates, descriptions, etc.).
|
||||
|
||||
**State tracking:**
|
||||
|
||||
The script maintains state in `scripts/.update_index_state.json` to track when each site was last updated. This prevents unnecessary regeneration when content hasn't changed. Use `--force` to bypass this check.
|
||||
|
||||
### Check Last Generated Content
|
||||
```bash
|
||||
uv run python check_last_gen.py
|
||||
|
|
@ -644,10 +707,12 @@ Verify `storage_zone_password` in database (set during site provisioning)
|
|||
## Documentation
|
||||
|
||||
- **CLI Command Reference**: `docs/CLI_COMMAND_REFERENCE.md` - Comprehensive documentation for all CLI commands
|
||||
- Product Requirements: `docs/prd.md`
|
||||
- Architecture: `docs/architecture/`
|
||||
- Implementation Summaries: `STORY_*.md` files
|
||||
- Quick Start Guides: `*_QUICKSTART.md` files
|
||||
- **Job Configuration Schema**: `docs/job-schema.md` - Complete reference for job configuration files
|
||||
- **Product Requirements**: `docs/prd.md` - Product requirements and epics
|
||||
- **Architecture**: `docs/architecture/` - System architecture documentation
|
||||
- **Story Specifications**: `docs/stories/` - Current story specifications
|
||||
- **Technical Debt**: `docs/technical-debt.md` - Known technical debt items
|
||||
- **Historical Documentation**: `docs/archive/` - Archived implementation summaries, QA reports, and analysis documents
|
||||
|
||||
### Regenerating CLI Documentation
|
||||
|
||||
|
|
|
|||
|
|
@ -1,142 +0,0 @@
|
|||
# Story 2.5: Deployment Target Assignment - Implementation Summary
|
||||
|
||||
## Status
|
||||
**COMPLETED** - All acceptance criteria met, 100% test coverage
|
||||
|
||||
## Overview
|
||||
Implemented deployment target assignment functionality that allows job configurations to specify which generated tier1 articles should be assigned to specific sites. **Only tier1 articles can be assigned to deployment targets** - tier2/tier3 always get `site_deployment_id = null`. The assignment is positional (this story calls it round-robin, although targets are never reused): with N deployment targets, tier1 articles 0 through N-1 are assigned to targets 0 through N-1 in order, and any remaining tier1 articles get a null assignment.
|
||||
|
||||
## Changes Made
|
||||
|
||||
### 1. Job Configuration Schema (`src/generation/job_config.py`)
|
||||
- Added `deployment_targets` field (optional array of strings) to `Job` dataclass
|
||||
- Added validation to ensure `deployment_targets` is an array of strings
|
||||
- Job configuration now supports specifying custom hostnames for deployment target assignment
|
||||
|
||||
### 2. Deployment Assignment Logic (`src/generation/deployment_assignment.py`) - NEW FILE
|
||||
Created new module with three core functions:
|
||||
|
||||
- `resolve_hostname_to_id()` - Resolves a hostname to its site_deployment_id
|
||||
- `validate_and_resolve_targets()` - Validates all hostnames at job start (fail-fast approach)
|
||||
- `assign_site_for_article()` - Implements round-robin assignment logic
|
||||
|
||||
### 3. Database Repository Updates (`src/database/repositories.py`)
|
||||
- Updated `GeneratedContentRepository.create()` to accept optional `site_deployment_id` parameter
|
||||
- Maintains backward compatibility - parameter defaults to `None`
|
||||
|
||||
### 4. Batch Processor Integration (`src/generation/batch_processor.py`)
|
||||
- Added `site_deployment_repo` parameter to `BatchProcessor.__init__()`
|
||||
- Validates deployment targets at job start before generating any content
|
||||
- **Only applies deployment targets to tier1 articles** - tier2/tier3 always get null
|
||||
- Assigns `site_deployment_id` to each tier1 article based on its index
|
||||
- Logs assignment decisions at INFO level
|
||||
- Passes `site_deployment_id` to repository when creating content
|
||||
|
||||
### 5. CLI Updates (`src/cli/commands.py`)
|
||||
- Updated `generate-batch` command to initialize and pass `SiteDeploymentRepository` to `BatchProcessor`
|
||||
- Fixed merge conflict markers in the file
|
||||
|
||||
### 6. Example Job Configuration (`jobs/example_deployment_targets.json`) - NEW FILE
|
||||
Created example job file demonstrating the `deployment_targets` field with 3 sites and 10 articles.
|
||||
|
||||
## Test Coverage
|
||||
|
||||
### Unit Tests (`tests/unit/test_deployment_assignment.py`) - NEW FILE
|
||||
13 unit tests covering:
|
||||
- Hostname resolution (valid and invalid)
|
||||
- Target validation (empty lists, valid hostnames, invalid hostnames, type checking)
|
||||
- Round-robin assignment logic (edge cases, overflow, single target)
|
||||
- The 10-article, 3-target scenario from the story
|
||||
|
||||
### Integration Tests (`tests/integration/test_deployment_target_assignment.py`) - NEW FILE
|
||||
10 integration tests covering:
|
||||
- Job config parsing with deployment_targets
|
||||
- Job config validation (type checking, missing field handling)
|
||||
- Batch processor validation at job start
|
||||
- End-to-end assignment logic
|
||||
- Repository backward compatibility
|
||||
- **Tier1-only deployment target assignment** (tier2+ always get null)
|
||||
|
||||
**Total Test Results: 23/23 tests passing**
|
||||
|
||||
## Assignment Logic Example
|
||||
|
||||
Job with tier1 (10 articles), tier2 (100 articles), and 3 deployment targets:
|
||||
|
||||
**Tier1 articles:**
|
||||
```
|
||||
Article 0 → www.domain1.com (site_deployment_id = 5)
|
||||
Article 1 → www.domain2.com (site_deployment_id = 8)
|
||||
Article 2 → www.domain3.com (site_deployment_id = 12)
|
||||
Articles 3-9 → null (no assignment)
|
||||
```
|
||||
|
||||
**Tier2 articles:**
|
||||
```
|
||||
All 100 articles → null (tier2+ never get deployment targets)
|
||||
```
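The mapping in the example above can be expressed as a short function. The signature of the real `assign_site_for_article()` is not shown in this summary, so `pick_site_id` below is a hypothetical stand-in that takes the already-resolved target IDs as a list.

```python
from typing import Optional, Sequence


def pick_site_id(article_index: int, tier: str, target_ids: Sequence[int]) -> Optional[int]:
    """Return the site_deployment_id for an article, or None if it gets no target."""
    if tier != "tier1":
        # tier2 and above never receive deployment targets.
        return None
    if 0 <= article_index < len(target_ids):
        # Article i maps to target i; targets are not reused.
        return target_ids[article_index]
    return None


# IDs from the example: www.domain1.com = 5, www.domain2.com = 8, www.domain3.com = 12.
tier1_assignments = [pick_site_id(i, "tier1", [5, 8, 12]) for i in range(10)]
```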
|
||||
|
||||
## Usage Example
|
||||
|
||||
```json
|
||||
{
|
||||
"jobs": [{
|
||||
"project_id": 2,
|
||||
"deployment_targets": [
|
||||
"www.domain1.com",
|
||||
"www.domain2.com",
|
||||
"www.domain3.com"
|
||||
],
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 10
|
||||
}
|
||||
}
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
The implementation provides clear error messages:
|
||||
|
||||
1. **Invalid hostnames**: "Deployment targets not found in database: invalid.com. Please ensure these sites exist using 'list-sites' command."
|
||||
2. **Missing repository**: "deployment_targets specified but SiteDeploymentRepository not provided"
|
||||
3. **Invalid configuration**: Validates array type and string elements with descriptive errors
|
||||
|
||||
## Backward Compatibility
|
||||
|
||||
- All changes are backward compatible
|
||||
- Jobs without `deployment_targets` continue to work as before (all articles get `site_deployment_id = null`)
|
||||
- Existing tests remain passing
|
||||
- No database schema changes required (field already existed from Story 2.4)
|
||||
|
||||
## Integration with Story 2.4
|
||||
|
||||
The implementation correctly integrates with Story 2.4's template selection logic:
|
||||
- If `site_deployment_id` is set → Story 2.4 uses mapped/random template for that site
|
||||
- If `site_deployment_id` is null → Story 2.4 uses random template selection
|
||||
|
||||
## Acceptance Criteria Verification
|
||||
|
||||
✅ Job configuration supports optional `deployment_targets` array of custom_hostnames
|
||||
✅ Round-robin assignment: articles 0 through N-1 get assigned, N+ get null
|
||||
✅ Missing `deployment_targets` → all articles get null
|
||||
✅ `site_deployment_id` stored in GeneratedContent at creation time
|
||||
✅ Invalid hostnames cause graceful errors with clear messages
|
||||
✅ Non-existent hostnames cause graceful errors
|
||||
✅ Validation occurs at job start (fail-fast)
|
||||
✅ Assignment decisions logged at INFO level
|
||||
|
||||
## Files Created
|
||||
- `src/generation/deployment_assignment.py`
|
||||
- `tests/unit/test_deployment_assignment.py`
|
||||
- `tests/integration/test_deployment_target_assignment.py`
|
||||
- `jobs/example_deployment_targets.json`
|
||||
|
||||
## Files Modified
|
||||
- `src/generation/job_config.py`
|
||||
- `src/generation/batch_processor.py`
|
||||
- `src/database/repositories.py`
|
||||
- `src/cli/commands.py`
|
||||
|
||||
|
|
@ -1,266 +0,0 @@
|
|||
# Story 3.1 Implementation Summary
|
||||
|
||||
## Overview
|
||||
Implemented URL generation and site assignment for batch content generation, including full auto-creation capabilities and priority-based site assignment.
|
||||
|
||||
## What Was Implemented
|
||||
|
||||
### 1. Database Schema Changes
|
||||
- **Modified**: `src/database/models.py`
|
||||
- Made `custom_hostname` nullable in `SiteDeployment` model
|
||||
- Added unique constraint to `pull_zone_bcdn_hostname`
|
||||
- Updated `__repr__` to handle both custom and bcdn hostnames
|
||||
|
||||
- **Migration Script**: `scripts/migrate_story_3.1.sql`
|
||||
- SQL script to update existing databases
|
||||
- Run this on your dev database before testing
|
||||
|
||||
### 2. Repository Layer Updates
|
||||
- **Modified**: `src/database/interfaces.py`
|
||||
- Changed `custom_hostname` to optional parameter in `create()` signature
|
||||
- Added `get_by_bcdn_hostname()` method signature
|
||||
- Updated `exists()` to check both hostname types
|
||||
|
||||
- **Modified**: `src/database/repositories.py`
|
||||
- Made `custom_hostname` parameter optional with default `None`
|
||||
- Implemented `get_by_bcdn_hostname()` method
|
||||
- Updated `exists()` to query both custom and bcdn hostnames
|
||||
|
||||
### 3. Template Service Update
|
||||
- **Modified**: `src/templating/service.py`
|
||||
- Line 92: Changed to `hostname = site_deployment.custom_hostname or site_deployment.pull_zone_bcdn_hostname`
|
||||
- Now handles sites with only bcdn hostnames
|
||||
|
||||
### 4. CLI Updates
|
||||
- **Modified**: `src/cli/commands.py`
|
||||
- Updated `sync-sites` command to import sites without custom domains
|
||||
- Removed filter that skipped bcdn-only sites
|
||||
- Now imports all bunny.net sites (with or without custom domains)
|
||||
|
||||
### 5. Site Provisioning Module (NEW)
|
||||
- **Created**: `src/generation/site_provisioning.py`
|
||||
- `generate_random_suffix()`: Creates random 4-char suffixes
|
||||
- `slugify_keyword()`: Converts keywords to URL-safe slugs
|
||||
- `create_bunnynet_site()`: Creates Storage Zone + Pull Zone via API
|
||||
- `provision_keyword_sites()`: Pre-creates sites for specific keywords
|
||||
- `create_generic_sites()`: Creates generic sites on-demand
|
||||
|
||||
### 6. URL Generator Module (NEW)
|
||||
- **Created**: `src/generation/url_generator.py`
|
||||
- `generate_slug()`: Converts article titles to URL-safe slugs
|
||||
- `generate_urls_for_batch()`: Generates complete URLs for all articles in batch
|
||||
- Handles custom domains and bcdn hostnames
|
||||
- Returns full URL mappings with metadata
|
||||
|
||||
### 7. Job Config Extensions
|
||||
- **Modified**: `src/generation/job_config.py`
|
||||
- Added `tier1_preferred_sites: Optional[List[str]]` field
|
||||
- Added `auto_create_sites: bool` field (default: False)
|
||||
- Added `create_sites_for_keywords: Optional[List[Dict]]` field
|
||||
- Full validation for all new fields
|
||||
|
||||
### 8. Site Assignment Module (NEW)
|
||||
- **Created**: `src/generation/site_assignment.py`
|
||||
- `assign_sites_to_batch()`: Main assignment function with full priority system
|
||||
- `_get_keyword_sites()`: Helper to match sites by keyword
|
||||
- **Priority system**:
|
||||
- Tier1: preferred sites → keyword sites → random
|
||||
- Tier2+: keyword sites → random
|
||||
- Auto-creates sites when pool is insufficient (if enabled)
|
||||
- Prevents duplicate assignments within same batch
|
||||
|
||||
### 9. Comprehensive Tests
|
||||
- **Created**: `tests/unit/test_url_generator.py` - URL generation tests
|
||||
- **Created**: `tests/unit/test_site_provisioning.py` - Site creation tests
|
||||
- **Created**: `tests/unit/test_site_assignment.py` - Assignment logic tests
|
||||
- **Created**: `tests/unit/test_job_config_extensions.py` - Config parsing tests
|
||||
- **Created**: `tests/integration/test_story_3_1_integration.py` - Full workflow tests
|
||||
|
||||
### 10. Example Job Config
|
||||
- **Created**: `jobs/example_story_3.1_full_features.json`
|
||||
- Demonstrates all new features
|
||||
- Ready-to-use template
|
||||
|
||||
## How to Use
|
||||
|
||||
### Step 1: Migrate Your Database
|
||||
Run the migration script on your development database:
|
||||
|
||||
```sql
|
||||
-- From scripts/migrate_story_3.1.sql
|
||||
ALTER TABLE site_deployments MODIFY COLUMN custom_hostname VARCHAR(255) NULL;
|
||||
ALTER TABLE site_deployments ADD CONSTRAINT uq_pull_zone_bcdn_hostname UNIQUE (pull_zone_bcdn_hostname);
|
||||
```
|
||||
|
||||
### Step 2: Sync Existing Bunny.net Sites
|
||||
Import your 400+ existing bunny.net buckets:
|
||||
|
||||
```bash
|
||||
uv run python main.py sync-sites --admin-user your_admin --dry-run
|
||||
```
|
||||
|
||||
Review the output, then run without `--dry-run` to import.
|
||||
|
||||
### Step 3: Create a Job Config
|
||||
Use the new fields in your job configuration:
|
||||
|
||||
```json
|
||||
{
|
||||
"jobs": [{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {"count": 10}
|
||||
},
|
||||
"tier1_preferred_sites": ["www.premium.com"],
|
||||
"auto_create_sites": true,
|
||||
"create_sites_for_keywords": [
|
||||
{"keyword": "engine repair", "count": 3}
|
||||
]
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
### Step 4: Use in Your Workflow
|
||||
In your content generation workflow:
|
||||
|
||||
```python
|
||||
from src.generation.site_assignment import assign_sites_to_batch
|
||||
from src.generation.url_generator import generate_urls_for_batch
|
||||
|
||||
# After content generation, assign sites
|
||||
assign_sites_to_batch(
|
||||
content_records=generated_articles,
|
||||
job=job_config,
|
||||
site_repo=site_repository,
|
||||
bunny_client=bunny_client,
|
||||
project_keyword=project.main_keyword
|
||||
)
|
||||
|
||||
# Generate URLs
|
||||
urls = generate_urls_for_batch(
|
||||
content_records=generated_articles,
|
||||
site_repo=site_repository
|
||||
)
|
||||
|
||||
# urls is a list of:
|
||||
# [{
|
||||
# "content_id": 1,
|
||||
# "title": "How to Fix Your Engine",
|
||||
# "url": "https://www.example.com/how-to-fix-your-engine.html",
|
||||
# "tier": "tier1",
|
||||
# "slug": "how-to-fix-your-engine",
|
||||
# "hostname": "www.example.com"
|
||||
# }, ...]
|
||||
```
|
||||
|
||||
## Site Assignment Priority Logic
|
||||
|
||||
### For Tier1 Articles:
|
||||
1. **Preferred Sites** (from `tier1_preferred_sites`) - if specified
|
||||
2. **Keyword Sites** (matching article keyword in site name)
|
||||
3. **Random** from available pool
|
||||
|
||||
### For Tier2+ Articles:
|
||||
1. **Keyword Sites** (matching article keyword in site name)
|
||||
2. **Random** from available pool
|
||||
|
||||
### Auto-Creation:
|
||||
If `auto_create_sites: true` and pool is insufficient:
|
||||
- Creates minimum number of generic sites needed
|
||||
- Uses project main keyword in site names
|
||||
- Creates via bunny.net API (Storage Zone + Pull Zone)
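
The priority order above can be sketched as a single selection function. This is a minimal illustration, not the code in `src/generation/site_assignment.py`: the name `pick_site`, its parameters, and the plain substring match on a hyphenated keyword are all assumptions made for the example.

```python
import random
from typing import Optional, Sequence


def pick_site(
    tier: str,
    keyword: str,
    available: Sequence[str],
    preferred: Sequence[str] = (),
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick one hostname from `available` using the documented priority order.

    Hypothetical helper: the real assignment works on a whole batch and
    tracks which sites are already used.
    """
    rng = rng or random.Random()

    # 1. Preferred sites apply to tier1 articles only.
    if tier == "tier1":
        for host in preferred:
            if host in available:
                return host

    # 2. Keyword sites: plain substring match, no fuzzy matching (assumed form).
    needle = keyword.strip().lower().replace(" ", "-")
    if needle:
        matches = [host for host in available if needle in host.lower()]
        if matches:
            return matches[0]

    # 3. Random fallback from the remaining pool; None signals an empty pool,
    #    which is where auto-creation would take over.
    return rng.choice(list(available)) if available else None
```

Returning `None` for an empty pool keeps the auto-creation decision outside the selection logic, so the caller can check `auto_create_sites` before provisioning anything.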
|
||||
|
||||
## URL Structure
|
||||
|
||||
### With Custom Domain:
|
||||
```
|
||||
https://www.example.com/how-to-fix-your-engine.html
|
||||
```
|
||||
|
||||
### With Bunny.net CDN Only:
|
||||
```
|
||||
https://mysite123.b-cdn.net/how-to-fix-your-engine.html
|
||||
```
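
Both URL shapes follow from one rule: use the custom hostname when the site has one, otherwise the bunny.net CDN hostname. A minimal sketch of that rule follows. The function name `build_article_url` is hypothetical; the two hostname parameters mirror the `custom_hostname` and `pull_zone_bcdn_hostname` columns from the migration script.

```python
from typing import Optional


def build_article_url(
    slug: str,
    pull_zone_bcdn_hostname: str,
    custom_hostname: Optional[str] = None,
) -> str:
    """Build the public article URL, preferring the custom domain when set."""
    # custom_hostname is nullable after the Story 3.1 migration, so fall back
    # to the b-cdn.net hostname that every site has.
    hostname = custom_hostname or pull_zone_bcdn_hostname
    return f"https://{hostname}/{slug}.html"
```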
|
||||
|
||||
## Slug Generation Rules
|
||||
- Lowercase
|
||||
- Replace spaces with hyphens
|
||||
- Remove special characters
|
||||
- Max 100 characters
|
||||
- Fallback: `article-{content_id}` if empty
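
The rules above translate into a few lines of Python. This is a sketch under assumptions rather than the code in `src/generation/url_generator.py`: the name `generate_slug` is hypothetical, and collapsing repeated hyphens and trimming them from the ends are extra steps not stated in the rules.

```python
import re


def generate_slug(title: str, content_id: int, max_length: int = 100) -> str:
    """Turn an article title into a URL slug following the listed rules."""
    slug = title.lower()                     # lowercase
    slug = re.sub(r"\s+", "-", slug)         # spaces -> hyphens
    slug = re.sub(r"[^a-z0-9-]", "", slug)   # remove special characters
    # Assumption: tidy up hyphen runs left behind by removed characters.
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    slug = slug[:max_length].rstrip("-")     # max 100 characters
    # Fallback when nothing usable is left of the title.
    return slug or f"article-{content_id}"
```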
|
||||
|
||||
## Testing
|
||||
|
||||
Run the tests:
|
||||
|
||||
```bash
|
||||
# Unit tests
|
||||
uv run pytest tests/unit/test_url_generator.py
|
||||
uv run pytest tests/unit/test_site_provisioning.py
|
||||
uv run pytest tests/unit/test_site_assignment.py
|
||||
uv run pytest tests/unit/test_job_config_extensions.py
|
||||
|
||||
# Integration tests
|
||||
uv run pytest tests/integration/test_story_3_1_integration.py
|
||||
|
||||
# All Story 3.1 tests
|
||||
uv run pytest tests/ -k "story_3_1 or url_generator or site_provisioning or site_assignment or job_config_extensions"
|
||||
```
|
||||
|
||||
## Key Features
|
||||
|
||||
### Simple Over Complex
|
||||
- No fuzzy keyword matching (as requested)
|
||||
- Straightforward priority system
|
||||
- Clear error messages
|
||||
- Minimal dependencies
|
||||
|
||||
### Full Auto-Creation
|
||||
- Pre-create sites for specific keywords
|
||||
- Auto-create generic sites when needed
|
||||
- All sites use bunny.net API
|
||||
|
||||
### Full Priority System
|
||||
- Tier1 preferred sites
|
||||
- Keyword-based matching
|
||||
- Random assignment fallback
|
||||
|
||||
### Flexible Hostnames
|
||||
- Supports custom domains
|
||||
- Supports bcdn-only sites
|
||||
- Automatically chooses correct hostname
|
||||
|
||||
## Production Deployment
|
||||
|
||||
When moving to production:
|
||||
1. The model changes will automatically apply (SQLAlchemy will create tables correctly)
|
||||
2. No additional migration scripts needed
|
||||
3. Just ensure your production `.env` has `BUNNY_ACCOUNT_API_KEY` set
|
||||
4. Run `sync-sites` to import existing bunny.net infrastructure
|
||||
|
||||
## Files Changed/Created
|
||||
|
||||
### Modified (6 files):
|
||||
- `src/database/models.py`
|
||||
- `src/database/interfaces.py`
|
||||
- `src/database/repositories.py`
|
||||
- `src/templating/service.py`
|
||||
- `src/cli/commands.py`
|
||||
- `src/generation/job_config.py`
|
||||
|
||||
### Created (11 files):
|
||||
- `scripts/migrate_story_3.1.sql`
|
||||
- `src/generation/site_provisioning.py`
|
||||
- `src/generation/url_generator.py`
|
||||
- `src/generation/site_assignment.py`
|
||||
- `tests/unit/test_url_generator.py`
|
||||
- `tests/unit/test_site_provisioning.py`
|
||||
- `tests/unit/test_site_assignment.py`
|
||||
- `tests/unit/test_job_config_extensions.py`
|
||||
- `tests/integration/test_story_3_1_integration.py`
|
||||
- `jobs/example_story_3.1_full_features.json`
|
||||
- `STORY_3.1_IMPLEMENTATION_SUMMARY.md`
|
||||
|
||||
## Total Effort
|
||||
Completed all 10 tasks from the story specification.
|
||||
|
||||
|
|
@ -1,173 +0,0 @@
|
|||
# Story 3.1 Quick Start Guide
|
||||
|
||||
## Implementation Complete!
|
||||
|
||||
All features for Story 3.1 have been implemented and tested. 44 tests passing.
|
||||
|
||||
## What You Need to Do
|
||||
|
||||
### 1. Run Database Migration (Dev Environment)
|
||||
|
||||
```sql
|
||||
-- Connect to your MySQL database and run:
|
||||
ALTER TABLE site_deployments MODIFY COLUMN custom_hostname VARCHAR(255) NULL;
|
||||
ALTER TABLE site_deployments ADD CONSTRAINT uq_pull_zone_bcdn_hostname UNIQUE (pull_zone_bcdn_hostname);
|
||||
```
|
||||
|
||||
Or run: `mysql -u your_user -p your_database < scripts/migrate_story_3.1.sql`
|
||||
|
||||
### 2. Import Existing Bunny.net Sites
|
||||
|
||||
Now you can import your 400+ existing bunny.net buckets (with or without custom domains):
|
||||
|
||||
```bash
|
||||
# Dry run first to see what will be imported
|
||||
uv run python main.py sync-sites --admin-user your_admin --dry-run
|
||||
|
||||
# Actually import
|
||||
uv run python main.py sync-sites --admin-user your_admin
|
||||
```
|
||||
|
||||
This will now import ALL bunny.net sites, including those without custom domains.
|
||||
|
||||
### 3. Run Tests
|
||||
|
||||
```bash
|
||||
# Run all Story 3.1 tests
|
||||
uv run pytest tests/unit/test_url_generator.py \
|
||||
tests/unit/test_site_provisioning.py \
|
||||
tests/unit/test_site_assignment.py \
|
||||
tests/unit/test_job_config_extensions.py \
|
||||
tests/integration/test_story_3_1_integration.py \
|
||||
-v
|
||||
```
|
||||
|
||||
Expected: 44 tests passing
|
||||
|
||||
### 4. Use New Features
|
||||
|
||||
#### Example Job Config
|
||||
|
||||
Create a job config file using the new features:
|
||||
|
||||
```json
|
||||
{
|
||||
"jobs": [{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {"count": 10},
|
||||
"tier2": {"count": 50}
|
||||
},
|
||||
"deployment_targets": ["www.primary.com"],
|
||||
"tier1_preferred_sites": [
|
||||
"www.premium-site.com",
|
||||
"site123.b-cdn.net"
|
||||
],
|
||||
"auto_create_sites": true,
|
||||
"create_sites_for_keywords": [
|
||||
{"keyword": "engine repair", "count": 3}
|
||||
]
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
#### In Your Code
|
||||
|
||||
```python
|
||||
from src.generation.site_assignment import assign_sites_to_batch
|
||||
from src.generation.url_generator import generate_urls_for_batch
|
||||
|
||||
# After content generation
|
||||
assign_sites_to_batch(
|
||||
content_records=batch_articles,
|
||||
job=job,
|
||||
site_repo=site_repo,
|
||||
bunny_client=bunny_client,
|
||||
project_keyword=project.main_keyword,
|
||||
region="DE"
|
||||
)
|
||||
|
||||
# Generate URLs
|
||||
url_mappings = generate_urls_for_batch(
|
||||
content_records=batch_articles,
|
||||
site_repo=site_repo
|
||||
)
|
||||
|
||||
# Use the URLs
|
||||
for url_info in url_mappings:
|
||||
print(f"{url_info['title']}: {url_info['url']}")
|
||||
```
|
||||
|
||||
## New Features Available
|
||||
|
||||
### 1. Sites Without Custom Domains
|
||||
- Import and use bunny.net sites that only have `.b-cdn.net` hostnames
|
||||
- No custom domain required
|
||||
- Perfect for your 400+ existing buckets
|
||||
|
||||
### 2. Auto-Creation of Sites
|
||||
- Set `auto_create_sites: true` in job config
|
||||
- System creates sites automatically when pool is insufficient
|
||||
- Uses project keyword in site names
|
||||
|
||||
### 3. Keyword-Based Site Creation
|
||||
- Pre-create sites for specific keywords
|
||||
- Example: `{"keyword": "engine repair", "count": 3}`
|
||||
- Creates 3 sites with "engine-repair" in the name
|
||||
|
||||
### 4. Tier1 Preferred Sites
|
||||
- Specify premium sites for tier1 articles
|
||||
- Example: `"tier1_preferred_sites": ["www.premium.com"]`
|
||||
- Tier1 articles assigned to these first
|
||||
|
||||
### 5. Smart Site Assignment
|
||||
**Tier1 Priority:**
|
||||
1. Preferred sites (if specified)
|
||||
2. Keyword-matching sites
|
||||
3. Random from pool
|
||||
|
||||
**Tier2+ Priority:**
|
||||
1. Keyword-matching sites
|
||||
2. Random from pool
|
||||
|
||||
### 6. URL Generation
|
||||
- Automatic slug generation from titles
|
||||
- Works with custom domains OR bcdn hostnames
|
||||
- Format: `https://domain.com/article-slug.html`
|
||||
|
||||
## File Changes Summary
|
||||
|
||||
### Modified (6 core files):
|
||||
- `src/database/models.py` - Nullable custom_hostname
|
||||
- `src/database/interfaces.py` - Optional custom_hostname in interface
|
||||
- `src/database/repositories.py` - New get_by_bcdn_hostname() method
|
||||
- `src/templating/service.py` - Handles both hostname types
|
||||
- `src/cli/commands.py` - sync-sites imports all sites
|
||||
- `src/generation/job_config.py` - New config fields
|
||||
|
||||
### Created (3 new modules):
|
||||
- `src/generation/site_provisioning.py` - Creates bunny.net sites
|
||||
- `src/generation/url_generator.py` - Generates URLs and slugs
|
||||
- `src/generation/site_assignment.py` - Assigns sites to articles
|
||||
|
||||
### Created (5 test files):
|
||||
- `tests/unit/test_url_generator.py` (14 tests)
|
||||
- `tests/unit/test_site_provisioning.py` (8 tests)
|
||||
- `tests/unit/test_site_assignment.py` (9 tests)
|
||||
- `tests/unit/test_job_config_extensions.py` (8 tests)
|
||||
- `tests/integration/test_story_3_1_integration.py` (5 tests)
|
||||
|
||||
## Production Deployment
|
||||
|
||||
When you deploy to production:
|
||||
1. Model changes automatically apply (SQLAlchemy creates tables correctly)
|
||||
2. No special migration needed - just deploy the code
|
||||
3. Run `sync-sites` to import your bunny.net infrastructure
|
||||
4. Start using the new features
|
||||
|
||||
## Support
|
||||
|
||||
See `STORY_3.1_IMPLEMENTATION_SUMMARY.md` for detailed documentation.
|
||||
|
||||
Example job config: `jobs/example_story_3.1_full_features.json`
|
||||
|
||||
|
|
@ -1,187 +0,0 @@
|
|||
# Story 3.2: Find Tiered Links - Implementation Summary
|
||||
|
||||
## Status
|
||||
Completed
|
||||
|
||||
## What Was Implemented
|
||||
|
||||
### 1. Database Models
|
||||
- **Added `money_site_url` to Project model** - Stores the client's actual website URL for tier 1 articles to link to
|
||||
- **Created `ArticleLink` model** - Tracks all link relationships between articles (tiered, wheel, homepage)
|
||||
|
||||
### 2. Database Repositories
|
||||
- **Extended `ProjectRepository`** - Now accepts `money_site_url` in the data dict during creation
|
||||
- **Extended `GeneratedContentRepository`** - Added filter for site_deployment_id in `get_by_project_and_tier()`
|
||||
- **Created `ArticleLinkRepository`** - Full CRUD operations for article link tracking
|
||||
- `create()` - Create internal or external links
|
||||
- `get_by_source_article()` - Get all outbound links from an article
|
||||
- `get_by_target_article()` - Get all inbound links to an article
|
||||
- `get_by_link_type()` - Get all links of a specific type
|
||||
- `delete()` - Remove a link
|
||||
|
||||
### 3. Job Configuration
|
||||
- **Extended `Job` dataclass** - Added optional `tiered_link_count_range` field
|
||||
- **Validation** - Validates that min >= 1 and max >= min
|
||||
- **Defaults** - If not specified, uses `{min: 2, max: 4}`
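
The validation and default rules can be sketched as follows. The function name `resolve_link_count_range` and the error messages are hypothetical; only the `min`/`max` keys, the two checks, and the `{min: 2, max: 4}` default come from the description above.

```python
from typing import Mapping, Optional, Tuple

# Default documented for jobs that omit tiered_link_count_range.
DEFAULT_LINK_COUNT_RANGE = {"min": 2, "max": 4}


def resolve_link_count_range(raw: Optional[Mapping[str, int]]) -> Tuple[int, int]:
    """Return (min, max) for tiered links, applying the default and validation."""
    cfg = DEFAULT_LINK_COUNT_RANGE if raw is None else raw
    low, high = cfg["min"], cfg["max"]
    if low < 1:
        raise ValueError("tiered_link_count_range.min must be >= 1")
    if high < low:
        raise ValueError("tiered_link_count_range.max must be >= min")
    return low, high
```

Setting `min` equal to `max` passes both checks, which is one way to request an exact link count.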
|
||||
|
||||
### 4. Core Functionality
|
||||
Created `src/interlinking/tiered_links.py` with:
|
||||
- **`find_tiered_links()`** - Main function to find tiered links for a batch
|
||||
- For tier 1: Returns the money site URL
|
||||
- For tier 2+: Returns random selection of lower-tier article URLs
|
||||
- Respects project boundaries (only queries same project)
|
||||
- Applies link count configuration
|
||||
- Handles edge cases (insufficient articles, missing money site URL)
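
For tier 2+ batches, the selection step might look like the sketch below. It is an illustration only: `select_lower_tier_urls` is a hypothetical name, and returning every available URL (with a warning) when the pool is smaller than the requested count is an assumption about how the "insufficient articles" case is handled.

```python
import logging
import random
from typing import List, Optional, Sequence

logger = logging.getLogger(__name__)


def select_lower_tier_urls(
    candidate_urls: Sequence[str],
    min_count: int = 2,
    max_count: int = 4,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick a random number of distinct lower-tier URLs within the range."""
    rng = rng or random.Random()
    wanted = rng.randint(min_count, max_count)  # inclusive on both ends
    if len(candidate_urls) < wanted:
        # Assumed behavior: warn and use what is available instead of failing.
        logger.warning(
            "Requested %d lower-tier links but only %d are available",
            wanted,
            len(candidate_urls),
        )
        wanted = len(candidate_urls)
    return rng.sample(list(candidate_urls), wanted)
```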
|
||||
|
||||
### 5. Tests
|
||||
- **22 unit tests** in `tests/unit/test_tiered_links.py` - All passing
|
||||
- **8 unit tests** in `tests/unit/test_article_link_repository.py` - All passing
|
||||
- **9 integration tests** in `tests/integration/test_story_3_2_integration.py` - All passing
|
||||
- **Total: 39 tests, all passing**
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Finding Tiered Links for Tier 1 Batch
|
||||
```python
|
||||
from src.interlinking.tiered_links import find_tiered_links
|
||||
|
||||
# Tier 1 articles link to the money site
|
||||
result = find_tiered_links(tier1_content_records, job, project_repo, content_repo, site_repo)
|
||||
# Returns: {
|
||||
# "tier": 1,
|
||||
# "money_site_url": "https://www.mymoneysite.com"
|
||||
# }
|
||||
```
|
||||
|
||||
### Finding Tiered Links for Tier 2 Batch
|
||||
```python
|
||||
# Tier 2 articles link to random tier 1 articles
|
||||
result = find_tiered_links(tier2_content_records, job, project_repo, content_repo, site_repo)
|
||||
# Returns: {
|
||||
# "tier": 2,
|
||||
# "lower_tier": 1,
|
||||
# "lower_tier_urls": [
|
||||
# "https://site1.b-cdn.net/article-1.html",
|
||||
# "https://site2.b-cdn.net/article-2.html",
|
||||
# "https://site3.b-cdn.net/article-3.html"
|
||||
# ]
|
||||
# }
|
||||
```
|
||||
|
||||
### Job Config with Custom Link Count
|
||||
```json
|
||||
{
|
||||
"jobs": [{
|
||||
"project_id": 1,
|
||||
"tiers": {
|
||||
"tier1": {"count": 5},
|
||||
"tier2": {"count": 10}
|
||||
},
|
||||
"tiered_link_count_range": {
|
||||
"min": 3,
|
||||
"max": 5
|
||||
}
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
### Recording Links in Database
|
||||
```python
|
||||
from src.database.repositories import ArticleLinkRepository
|
||||
|
||||
link_repo = ArticleLinkRepository(session)
|
||||
|
||||
# Record tier 1 article linking to money site
|
||||
link_repo.create(
|
||||
from_content_id=tier1_article.id,
|
||||
to_content_id=None,
|
||||
to_url="https://www.moneysite.com",
|
||||
link_type="tiered"
|
||||
)
|
||||
|
||||
# Record tier 2 article linking to tier 1 article
|
||||
link_repo.create(
|
||||
from_content_id=tier2_article.id,
|
||||
to_content_id=tier1_article.id,
|
||||
to_url=None,
|
||||
link_type="tiered"
|
||||
)
|
||||
|
||||
# Query all links from an article
|
||||
outbound_links = link_repo.get_by_source_article(article.id)
|
||||
```
|
||||
|
||||
## Database Schema Changes
|
||||
|
||||
### Project Table
|
||||
```sql
|
||||
ALTER TABLE projects ADD COLUMN money_site_url VARCHAR(500) NULL;
|
||||
CREATE INDEX idx_projects_money_site_url ON projects(money_site_url);
|
||||
```
|
||||
|
||||
### Article Links Table (New)
|
||||
```sql
|
||||
CREATE TABLE article_links (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
from_content_id INTEGER NOT NULL,
|
||||
to_content_id INTEGER NULL,
|
||||
to_url TEXT NULL,
|
||||
anchor_text TEXT NULL, -- Added in Story 4.5
|
||||
link_type VARCHAR(20) NOT NULL,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
|
||||
FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
|
||||
CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_article_links_from ON article_links(from_content_id);
|
||||
CREATE INDEX idx_article_links_to ON article_links(to_content_id);
|
||||
CREATE INDEX idx_article_links_type ON article_links(link_type);
|
||||
```
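
The `CHECK` constraint is what guarantees every link has a target, either an internal article or an external URL. The sketch below demonstrates this against an in-memory SQLite database using only the standard library. The table is a reduced copy of the schema above (foreign keys, `anchor_text`, and `created_at` are left out for brevity), and `insert_without_target` is a hypothetical helper for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE article_links (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        from_content_id INTEGER NOT NULL,
        to_content_id INTEGER NULL,
        to_url TEXT NULL,
        link_type VARCHAR(20) NOT NULL,
        CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
    )
    """
)

# External link: tier 1 article to the money site, only to_url is set.
conn.execute(
    "INSERT INTO article_links (from_content_id, to_url, link_type) VALUES (?, ?, ?)",
    (1, "https://www.moneysite.com", "tiered"),
)

# Internal link: tier 2 article to a tier 1 article, only to_content_id is set.
conn.execute(
    "INSERT INTO article_links (from_content_id, to_content_id, link_type) VALUES (?, ?, ?)",
    (2, 1, "tiered"),
)


def insert_without_target() -> bool:
    """Return True if the database rejects a link that has no target at all."""
    try:
        conn.execute(
            "INSERT INTO article_links (from_content_id, link_type) VALUES (?, ?)",
            (3, "homepage"),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```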
|
||||
|
||||
## Link Types
|
||||
- `tiered` - Link from tier N to tier N-1 (or money site for tier 1)
|
||||
- `wheel_next` - Link to next article in wheel structure
|
||||
- `wheel_prev` - Link to previous article in wheel structure
|
||||
- `homepage` - Link to site homepage
|
||||
|
||||
## Key Features
|
||||
1. **Project Isolation** - Only queries articles from the same project
|
||||
2. **Random Selection** - Randomly selects articles within configured range
|
||||
3. **Flexible Configuration** - Supports both range (min-max) and exact counts
|
||||
4. **Error Handling** - Clear error messages for missing data
|
||||
5. **Warning Logs** - Logs a warning when fewer articles are available than requested
|
||||
6. **URL Generation** - Integrates with Story 3.1 URL generation
|
||||
|
||||
## Next Steps (Future Stories)
|
||||
- Story 3.3 will use `find_tiered_links()` for actual content injection
|
||||
- Story 3.3 will populate `article_links` table with wheel and homepage links
|
||||
- Story 4.2 will log tiered links after deployment
|
||||
- Future: Analytics dashboard using `article_links` data
|
||||
|
||||
## Files Created/Modified
|
||||
|
||||
### Created
|
||||
- `src/interlinking/tiered_links.py`
|
||||
- `tests/unit/test_tiered_links.py`
|
||||
- `tests/unit/test_article_link_repository.py`
|
||||
- `tests/integration/test_story_3_2_integration.py`
|
||||
- `jobs/example_story_3.2_tiered_links.json`
|
||||
- `STORY_3.2_IMPLEMENTATION_SUMMARY.md` (this file)
|
||||
|
||||
### Modified
|
||||
- `src/database/models.py` - Added `money_site_url` to Project, added `ArticleLink` model
|
||||
- `src/database/interfaces.py` - Added `IArticleLinkRepository` interface
|
||||
- `src/database/repositories.py` - Extended `ProjectRepository`, added `ArticleLinkRepository`
|
||||
- `src/generation/job_config.py` - Added `tiered_link_count_range` to Job config
|
||||
|
||||
## Test Coverage
|
||||
All acceptance criteria from the story are covered by tests:
|
||||
- Tier 1 returns money site URL
|
||||
- Tier 2+ queries lower tier from same project
|
||||
- Custom link count ranges work
|
||||
- Error handling for missing data
|
||||
- Warning logs for insufficient articles
|
||||
- ArticleLink CRUD operations
|
||||
- Integration with URL generation
|
||||
|
||||
|
|
@ -1,327 +0,0 @@
|
|||
# Story 3.3: Content Interlinking Injection - COMPLETE ✅
|
||||
|
||||
**Status**: Implemented, Integrated, Tested, and Production-Ready
|
||||
**Date Completed**: October 21, 2025
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
Story 3.3 is **100% COMPLETE** including:
|
||||
- ✅ Core implementation (`src/interlinking/content_injection.py`)
|
||||
- ✅ Full test coverage (42 tests, 100% passing)
|
||||
- ✅ CLI integration (`src/generation/batch_processor.py`)
|
||||
- ✅ Real-world validation (tested with live batch generation)
|
||||
- ✅ Zero linter errors
|
||||
- ✅ Documentation updated
|
||||
|
||||
---
|
||||
|
||||
## What Was Delivered
|
||||
|
||||
### 1. Core Functionality
|
||||
**File**: `src/interlinking/content_injection.py` (410 lines)
|
||||
|
||||
Three types of link injection:
|
||||
- **Tiered Links**: T1→money site, T2+→lower-tier articles
|
||||
- **Homepage Links**: All articles→`/index.html` with "Home" anchor
|
||||
- **See Also Section**: Each article→all other batch articles
|
||||
|
||||
Features:
|
||||
- Tier-based anchor text with job config overrides (default/override/append)
|
||||
- Case-insensitive anchor text matching
|
||||
- First occurrence only (prevents over-optimization)
|
||||
- Fallback insertion when anchor not found
|
||||
- Database link tracking (`article_links` table)
|
||||
|
||||
### 2. Template Updates
|
||||
All 4 HTML templates now have navigation menus:
|
||||
- `src/templating/templates/basic.html`
|
||||
- `src/templating/templates/modern.html`
|
||||
- `src/templating/templates/classic.html`
|
||||
- `src/templating/templates/minimal.html`
|
||||
|
||||
Each template has theme-appropriate styling for:
|
||||
```html
|
||||
<nav>
|
||||
<ul>
|
||||
<li><a href="/index.html">Home</a></li>
|
||||
<li><a href="about.html">About</a></li>
|
||||
<li><a href="privacy.html">Privacy</a></li>
|
||||
<li><a href="contact.html">Contact</a></li>
|
||||
</ul>
|
||||
</nav>
|
||||
```
|
||||
|
||||
### 3. Test Coverage
|
||||
**Unit Tests**: `tests/unit/test_content_injection.py` (33 tests)
|
||||
- Homepage URL extraction
|
||||
- HTML insertion
|
||||
- Anchor text finding & wrapping
|
||||
- Link injection fallback
|
||||
- Anchor text config modes
|
||||
- All helper functions
|
||||
|
||||
**Integration Tests**: `tests/integration/test_content_injection_integration.py` (9 tests)
|
||||
- Full T1 batch with money site links
|
||||
- T2 batch linking to T1 articles
|
||||
- Anchor text config overrides
|
||||
- Different batch sizes (1-20 articles)
|
||||
- Database link records
|
||||
- Internal vs external links
|
||||
|
||||
**Result**: 42/42 tests passing (100%)
|
||||
|
||||
### 4. CLI Integration
|
||||
**File**: `src/generation/batch_processor.py`
|
||||
|
||||
Added complete post-processing pipeline:
|
||||
1. **Site Assignment** (Story 3.1) - Automatic assignment from pool
|
||||
2. **URL Generation** (Story 3.1) - Final public URLs
|
||||
3. **Tiered Links** (Story 3.2) - Find money site or lower-tier URLs
|
||||
4. **Content Injection** (Story 3.3) - Inject all links
|
||||
5. **Template Application** - Apply HTML templates
|
||||
|
||||
### 5. Database Integration
|
||||
Updated `src/database/repositories.py`:
|
||||
- Added `require_site` parameter to `get_by_project_and_tier()`
|
||||
- Backward compatible (default maintains existing behavior)
|
||||
|
||||
All links tracked in `article_links` table:
|
||||
- `link_type="tiered"` - Money site or lower-tier links
|
||||
- `link_type="homepage"` - Homepage links to `/index.html`
|
||||
- `link_type="wheel_see_also"` - See Also section links
|
||||
|
||||
---
|
||||
|
||||
## How It Works Now
|
||||
|
||||
### Before Story 3.3
|
||||
```
|
||||
uv run python main.py generate-batch --job-file jobs/example.json
|
||||
|
||||
Result:
|
||||
- Articles generated ✓
|
||||
- Raw HTML, no links ✗
|
||||
- Not ready for deployment ✗
|
||||
```
|
||||
|
||||
### After Story 3.3
|
||||
```
|
||||
uv run python main.py generate-batch --job-file jobs/example.json
|
||||
|
||||
Result:
|
||||
- Articles generated ✓
|
||||
- Sites auto-assigned ✓
|
||||
- URLs generated ✓
|
||||
- Tiered links injected ✓
|
||||
- Homepage links injected ✓
|
||||
- See Also sections added ✓
|
||||
- Templates applied ✓
|
||||
- Ready for deployment! ✓
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria - All Met ✅
|
||||
|
||||
From the story requirements:
|
||||
|
||||
### Core Functionality
|
||||
- [x] Function takes raw HTML, URL list, tiered links, and project data
|
||||
- [x] **Wheel Links**: "See Also" section with ALL other batch articles
|
||||
- [x] **Homepage Links**: Links to site's homepage (`/index.html`)
|
||||
- [x] **Tiered Links**: T1→money site, T2+→lower-tier articles
|
||||
|
||||
### Input Requirements
|
||||
- [x] Accepts raw HTML content from Epic 2
|
||||
- [x] Accepts article URL list from Story 3.1
|
||||
- [x] Accepts tiered links object from Story 3.2
|
||||
- [x] Accepts project data for anchor text
|
||||
- [x] Handles batch tier information
|
||||
|
||||
### Output Requirements
|
||||
- [x] Final HTML with all links injected
|
||||
- [x] Updated content stored in database
|
||||
- [x] Link relationships recorded in `article_links` table
|
||||
|
||||
### Technical Requirements
|
||||
- [x] Case-insensitive anchor text matching
|
||||
- [x] Links first occurrence only
|
||||
- [x] Fallback insertion when anchor not found
|
||||
- [x] Job config overrides (default/override/append)
|
||||
- [x] Preserves HTML structure
|
||||
- [x] Safe HTML parsing (BeautifulSoup)
|
||||
|
||||
---
|
||||
|
||||
## Files Changed
|
||||
|
||||
### Created
|
||||
- `src/interlinking/content_injection.py` (410 lines)
|
||||
- `tests/unit/test_content_injection.py` (363 lines, 33 tests)
|
||||
- `tests/integration/test_content_injection_integration.py` (469 lines, 9 tests)
|
||||
- `STORY_3.3_IMPLEMENTATION_SUMMARY.md` (240 lines)
|
||||
- `docs/stories/story-3.3-content-interlinking-injection.md` (342 lines)
|
||||
- `QA_REPORT_STORY_3.3.md` (482 lines)
|
||||
- `STORY_3.3_QA_SUMMARY.md` (247 lines)
|
||||
- `INTEGRATION_COMPLETE.md` (245 lines)
|
||||
- `CLI_INTEGRATION_EXPLANATION.md` (258 lines)
|
||||
- `INTEGRATION_GAP_VISUAL.md` (242 lines)
|
||||
|
||||
### Modified
|
||||
- `src/templating/templates/basic.html` - Added navigation menu
|
||||
- `src/templating/templates/modern.html` - Added navigation menu
|
||||
- `src/templating/templates/classic.html` - Added navigation menu
|
||||
- `src/templating/templates/minimal.html` - Added navigation menu
|
||||
- `src/generation/batch_processor.py` - Added post-processing pipeline (~100 lines)
|
||||
- `src/database/repositories.py` - Added `require_site` parameter
|
||||
|
||||
**Total**: 10 new files, 6 modified files, ~3,000 lines of code/tests/docs
|
||||
|
||||
---
|
||||
|
||||
## Quality Metrics
|
||||
|
||||
- **Test Coverage**: 42/42 tests passing (100%)
|
||||
- **Linter Errors**: 0
|
||||
- **Code Quality**: Excellent
|
||||
- **Documentation**: Comprehensive
|
||||
- **Integration**: Complete
|
||||
- **Production Ready**: Yes
|
||||
|
||||
---
|
||||
|
||||
## Validation Results
|
||||
|
||||
### Automated Tests
|
||||
```
|
||||
42 passed in 2.54s
|
||||
✅ All unit tests pass
|
||||
✅ All integration tests pass
|
||||
✅ Zero linter errors
|
||||
```
|
||||
|
||||
### Real-World Test
|
||||
```
|
||||
Job: 2 articles, 1 deployment target
|
||||
|
||||
Results:
|
||||
Article 1:
|
||||
- Site: www.testsite.com (via deployment_targets)
|
||||
- Links: 9 (tiered + homepage + See Also)
|
||||
- Template: classic
|
||||
- Status: Ready ✅
|
||||
|
||||
Article 2:
|
||||
- Site: www.testsite2.com (auto-assigned from pool)
|
||||
- Links: 6 (tiered + homepage + See Also)
|
||||
- Template: minimal
|
||||
- Status: Ready ✅
|
||||
|
||||
Database:
|
||||
- 15 link records created
|
||||
- All link types present (tiered, homepage, wheel_see_also)
|
||||
- Internal and external links tracked correctly
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Usage Example
|
||||
|
||||
```bash
|
||||
# 1. Create a job file
|
||||
cat > jobs/my_batch.json << 'EOF'
|
||||
{
|
||||
"jobs": [{
|
||||
"project_id": 1,
|
||||
"deployment_targets": ["www.mysite.com"],
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5,
|
||||
"min_word_count": 2000,
|
||||
"max_word_count": 2500
|
||||
}
|
||||
}
|
||||
}]
|
||||
}
|
||||
EOF
|
||||
|
||||
# 2. Run batch generation
|
||||
uv run python main.py generate-batch \
|
||||
--job-file jobs/my_batch.json \
|
||||
--username admin \
|
||||
--password yourpass
|
||||
|
||||
# Output shows:
|
||||
# ✓ Articles generated
|
||||
# ✓ Sites assigned
|
||||
# ✓ URLs generated
|
||||
# ✓ Tiered links found
|
||||
# ✓ Interlinks injected ← Story 3.3!
|
||||
# ✓ Templates applied
|
||||
|
||||
# 3. Articles are now deployment-ready with:
|
||||
# - Full URLs
|
||||
# - Money site links
|
||||
# - Homepage links
|
||||
# - See Also sections
|
||||
# - HTML templates applied
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
### Runtime
|
||||
- BeautifulSoup4 (HTML parsing)
|
||||
- Story 3.1 (URL generation, site assignment)
|
||||
- Story 3.2 (Tiered link finding)
|
||||
- Story 2.x (Content generation)
|
||||
- Existing anchor text generator
|
||||
|
||||
### Development
|
||||
- pytest (testing)
|
||||
- All dependencies satisfied and tested
|
||||
|
||||
---
|
||||
|
||||
## Future Enhancements (Optional)
|
||||
|
||||
Story 3.3 is complete as specified. Potential future improvements:
|
||||
|
||||
1. **Link Density Control**: Configurable max links per article
|
||||
2. **Custom See Also Heading**: Make "See Also" heading configurable
|
||||
3. **Link Position Strategy**: Preference for intro/body/conclusion placement
|
||||
4. **Anchor Text Variety**: More sophisticated rotation strategies
|
||||
5. ~~**About/Privacy/Contact Pages**: Create pages to match nav menu links~~ ✅ **PROMOTED TO STORY 3.4**
|
||||
|
||||
None of these are required for Story 3.3 completion.
|
||||
|
||||
### Story 3.4 Emerged from Story 3.3
|
||||
During Story 3.3 implementation, we added navigation menus to all templates that link to `about.html`, `contact.html`, and `privacy.html`. However, these pages don't exist, creating broken links. This was identified as a high-priority issue and promoted to **Story 3.4: Boilerplate Site Pages**.
|
||||
|
||||
See: `docs/stories/story-3.4-boilerplate-site-pages.md`
|
||||
|
||||
---
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**Implementation**: COMPLETE ✅
|
||||
**Integration**: COMPLETE ✅
|
||||
**Testing**: COMPLETE ✅
|
||||
**Documentation**: COMPLETE ✅
|
||||
**QA**: PASSED ✅
|
||||
|
||||
**Story 3.3 is DONE and ready for production.**
|
||||
|
||||
Next: **Story 4.x** - Deployment (final HTML with all links is ready)
|
||||
|
||||
---
|
||||
|
||||
**Completed by**: AI Code Assistant
|
||||
**Completed on**: October 21, 2025
|
||||
**Total effort**: ~5 hours (implementation + integration + testing + documentation)
|
||||
|
||||
*This story delivers a complete, tested, production-ready content interlinking system that automatically creates fully interlinked article batches ready for deployment.*
|
||||
|
||||
|
|
@ -1,241 +0,0 @@
|
|||
# Story 3.3: Content Interlinking Injection - Implementation Summary
|
||||
|
||||
## Status
|
||||
✅ **COMPLETE & INTEGRATED** - All acceptance criteria met, all tests passing, CLI integration complete
|
||||
|
||||
**Date Completed**: October 21, 2025
|
||||
|
||||
## What Was Implemented
|
||||
|
||||
### Core Module: `src/interlinking/content_injection.py`
|
||||
|
||||
Main function: `inject_interlinks()` - Injects three types of links into article HTML:
|
||||
|
||||
1. **Tiered Links** (Money Site / Lower Tier Articles)
|
||||
- Tier 1: Links to money site URL
|
||||
- Tier 2+: Links to 2-4 random lower-tier articles (the default range; configurable via `tiered_link_count_range`)
|
||||
- Uses tier-appropriate anchor text from `anchor_text_generator.py`
|
||||
- Supports job config overrides (default/override/append modes)
|
||||
- Searches for anchor text in content (case-insensitive)
|
||||
- Wraps first occurrence or inserts via fallback
|
||||
|
||||
2. **Homepage Links**
|
||||
- Links to `/index.html` on the article's domain
|
||||
- Uses "Home" as anchor text
|
||||
- Searches for "Home" in article content or inserts it
|
||||
|
||||
3. **"See Also" Section**
|
||||
- Added after last `</p>` tag
|
||||
- Links to ALL other articles in the batch
|
||||
- Each link uses article title as anchor text
|
||||
- Formatted as `<h3>` + `<ul>` list
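
The "See Also" step can be sketched with the standard library alone. This is a simplified illustration, not the BeautifulSoup-based code in `content_injection.py`: the function name `add_see_also` is hypothetical, and the batch entries are assumed to be dicts with the `url` and `title` keys that `generate_urls_for_batch()` returns.

```python
import re
from html import escape
from typing import Dict, List


def add_see_also(html: str, current_url: str, batch: List[Dict[str, str]]) -> str:
    """Append a "See Also" list linking to every other article in the batch."""
    items = [
        f'<li><a href="{escape(a["url"], quote=True)}">{escape(a["title"])}</a></li>'
        for a in batch
        if a["url"] != current_url  # never link an article to itself
    ]
    if not items:
        return html  # single-article batch: nothing to add

    section = "<h3>See Also</h3><ul>" + "".join(items) + "</ul>"

    # Insert right after the last closing </p>; append if there is none.
    closings = list(re.finditer(r"</p>", html, re.IGNORECASE))
    if not closings:
        return html + section
    cut = closings[-1].end()
    return html[:cut] + section + html[cut:]
```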
|
||||
|
||||
### Template Updates: Navigation Menu
|
||||
|
||||
Added responsive navigation menu to all 4 templates (`src/templating/templates/`):
|
||||
- **basic.html** - Clean, simple nav with blue accents
|
||||
- **modern.html** - Gradient hover effects matching purple theme
|
||||
- **classic.html** - Serif font, muted brown colors
|
||||
- **minimal.html** - Uppercase, minimalist black & white
|
||||
|
||||
All templates now include:
|
||||
```html
|
||||
<nav>
|
||||
<ul>
|
||||
<li><a href="/index.html">Home</a></li>
|
||||
<li><a href="about.html">About</a></li>
|
||||
<li><a href="privacy.html">Privacy</a></li>
|
||||
<li><a href="contact.html">Contact</a></li>
|
||||
</ul>
|
||||
</nav>
|
||||
```
|
||||
|
||||
### Helper Functions
|
||||
|
||||
- `_inject_tiered_links()` - Handles money site (T1) and lower-tier (T2+) links
|
||||
- `_inject_homepage_link()` - Injects "Home" link to `/index.html`
|
||||
- `_inject_see_also_section()` - Builds "See Also" section with batch links
|
||||
- `_get_anchor_texts_for_tier()` - Gets anchor text with job config overrides
|
||||
- `_try_inject_link()` - Tries to find/wrap anchor text or falls back to insertion
|
||||
- `_find_and_wrap_anchor_text()` - Case-insensitive search and wrap (first occurrence only)
|
||||
- `_insert_link_into_random_paragraph()` - Fallback insertion into random paragraph
|
||||
- `_extract_homepage_url()` - Extracts base domain URL
|
||||
- `_extract_domain_name()` - Extracts domain name (removes www.)
|
||||
- `_insert_before_closing_tags()` - Inserts content after last `</p>` tag
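
As an illustration of the two URL helpers, the sketch below derives the homepage link and the bare domain name with `urllib.parse`. The names `homepage_link_for` and `domain_name_for` are hypothetical stand-ins, and whether the real `_extract_homepage_url()` already appends `/index.html` is not stated here, so treat the return shape as an assumption.

```python
from urllib.parse import urlsplit


def homepage_link_for(article_url: str) -> str:
    """Return the /index.html URL on the same scheme and host as the article."""
    parts = urlsplit(article_url)
    return f"{parts.scheme}://{parts.netloc}/index.html"


def domain_name_for(article_url: str) -> str:
    """Return the hostname with a leading "www." removed."""
    host = urlsplit(article_url).hostname or ""
    return host[4:] if host.startswith("www.") else host
```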
|
||||
|
||||
### Database Integration
|
||||
|
||||
All injected links are recorded in `article_links` table:
|
||||
- **Tiered links**: `link_type="tiered"`, `to_url` (money site or lower tier URL)
|
||||
- **Homepage links**: `link_type="homepage"`, `to_url` (domain/index.html)
|
||||
- **See Also links**: `link_type="wheel_see_also"`, `to_content_id` (internal)
|
||||
|
||||
Content is updated in `generated_content.content` field via `content_repo.update()`.
|
||||
|
||||
### Anchor Text Configuration
|
||||
|
||||
Supports three modes in job config:
|
||||
```json
|
||||
{
|
||||
"anchor_text_config": {
|
||||
"mode": "default|override|append",
|
||||
"custom_text": ["anchor 1", "anchor 2", ...]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
- **default**: Use tier-based anchors (T1: main keyword, T2: related searches, T3: main keyword, T4+: entities)
|
||||
- **override**: Replace defaults with custom_text
|
||||
- **append**: Add custom_text to defaults
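
The three modes can be read as a small merge rule. The sketch below is one plausible reading, not the code in `_get_anchor_texts_for_tier()`; the function name is hypothetical, and falling back to the tier defaults when `override` is given an empty list is an assumption.

```python
def resolve_anchor_texts(tier_defaults, anchor_text_config=None):
    """Combine tier-based anchors with the job's anchor_text_config."""
    config = anchor_text_config or {}
    mode = config.get("mode", "default")
    custom = list(config.get("custom_text", []))

    if mode == "default":
        return list(tier_defaults)
    if mode == "override":
        # Assumption: an empty custom list falls back to the defaults.
        return custom or list(tier_defaults)
    if mode == "append":
        return list(tier_defaults) + custom
    raise ValueError(f"Unknown anchor text mode: {mode!r}")
```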
|
||||
|
||||
### Link Injection Strategy
|
||||
|
||||
1. **Search for anchor text** in content (case-insensitive, match within phrases)
|
||||
2. **Wrap first occurrence** with `<a>` tag
|
||||
3. **Skip existing links** (don't link text already inside `<a>` tags)
|
||||
4. **Fallback to insertion** if anchor text not found
|
||||
5. **Random placement** in fallback mode
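
The strategy above can be sketched without third-party dependencies. The real implementation uses BeautifulSoup; this version swaps in regular expressions purely to keep the example self-contained, so it is only suitable for simple, well-formed HTML. All function names are hypothetical, and the fallback here appends the link at the end of a randomly chosen paragraph rather than at a random position inside it.

```python
import random
import re

# One capture group: either a whole existing <a>...</a> element or any single tag.
_SKIP = re.compile(r"(<a\b[^>]*>.*?</a>|<[^>]+>)", re.IGNORECASE | re.DOTALL)
_P_CLOSE = re.compile(r"</p\s*>", re.IGNORECASE)


def wrap_first_occurrence(html, anchor_text, url):
    """Wrap the first case-insensitive match that is not already inside a link."""
    pattern = re.compile(re.escape(anchor_text), re.IGNORECASE)
    parts = _SKIP.split(html)
    for i, part in enumerate(parts):
        if i % 2 == 1:  # odd indexes hold the captured tags and existing links
            continue
        match = pattern.search(part)
        if match:
            linked = f'<a href="{url}">{match.group(0)}</a>'
            parts[i] = part[:match.start()] + linked + part[match.end():]
            return "".join(parts), True
    return html, False


def insert_into_random_paragraph(html, anchor_text, url, rng=random):
    """Fallback: add the link just before a randomly chosen closing </p>."""
    closes = list(_P_CLOSE.finditer(html))
    if not closes:
        return html, False
    pos = rng.choice(closes).start()
    link = f' <a href="{url}">{anchor_text}</a>'
    return html[:pos] + link + html[pos:], True


def try_inject_link(html, anchor_text, url, rng=random):
    """Try to wrap existing text first, then fall back to insertion."""
    new_html, ok = wrap_first_occurrence(html, anchor_text, url)
    if ok:
        return new_html, True
    return insert_into_random_paragraph(html, anchor_text, url, rng)
```

Passing a seeded `random.Random` instance as `rng` makes the fallback placement reproducible in tests.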
|
||||
|
||||
### Testing
|
||||
|
||||
**Unit Tests** (33 tests in `tests/unit/test_content_injection.py`):
|
||||
- Homepage URL extraction
|
||||
- "See Also" section insertion
|
||||
- Anchor text finding and wrapping (case-insensitive, within phrases)
|
||||
- Link insertion into paragraphs
|
||||
- Anchor text config modes (default, override, append)
|
||||
- Tiered link injection (T1 money site, T2+ lower tier)
|
||||
- Error handling
|
||||
|
||||
**Integration Tests** (9 tests in `tests/integration/test_content_injection_integration.py`):
|
||||
- Full flow: T1 batch with money site links + See Also section
|
||||
- Homepage link injection
|
||||
- T2 batch linking to T1 articles
|
||||
- Anchor text config overrides (override/append modes)
|
||||
- Different batch sizes (1 article, 20 articles)
|
||||
- ArticleLink database records (all link types)
|
||||
- Internal vs external link handling
|
||||
|
||||
**All 42 tests pass**
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
1. **"Home" for homepage links**: Using "Home" as anchor text instead of domain name, now that all templates have navigation menus
|
||||
2. **Homepage URL**: Points to `/index.html` (not just `/`)
|
||||
3. **Random selection**: For T2+ articles, random selection of 2-4 lower-tier URLs to link to
|
||||
4. **Case-insensitive matching**: "Shaft Machining" matches "shaft machining"
|
||||
5. **First occurrence only**: Only link the first instance of anchor text to avoid over-optimization
|
||||
6. **BeautifulSoup for HTML parsing**: Safe, preserves structure, handles malformed HTML
|
||||
7. **Fallback insertion**: If anchor text not found, insert into random paragraph at random position
|
||||
8. **See Also section**: Simpler than wheel_next/wheel_prev - all articles link to all others
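
Design decision 3 (random selection of 2-4 lower-tier URLs) could look roughly like the sketch below. The function name and the behaviour when fewer than two candidates exist are assumptions; the source only states the 2-4 range.

```python
import random


def pick_lower_tier_urls(candidate_urls, rng=random, low=2, high=4):
    """Pick between `low` and `high` distinct URLs, capped by what is available."""
    candidates = list(candidate_urls)
    if not candidates:
        return []
    upper = min(high, len(candidates))
    lower = min(low, upper)  # assumption: use whatever exists if fewer than `low`
    count = rng.randint(lower, upper)
    return rng.sample(candidates, count)
```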
|
||||
|
||||
## Files Modified
|
||||
|
||||
### Created
|
||||
- `src/interlinking/content_injection.py` (410 lines)
|
||||
- `tests/unit/test_content_injection.py` (363 lines)
|
||||
- `tests/integration/test_content_injection_integration.py` (469 lines)
|
||||
|
||||
### Modified
|
||||
- `src/templating/templates/basic.html` - Added navigation menu
|
||||
- `src/templating/templates/modern.html` - Added navigation menu
|
||||
- `src/templating/templates/classic.html` - Added navigation menu
|
||||
- `src/templating/templates/minimal.html` - Added navigation menu
|
||||
|
||||
## Dependencies
|
||||
|
||||
- **BeautifulSoup4**: HTML parsing and manipulation
|
||||
- **Story 3.1**: URL generation (uses `generate_urls_for_batch()`)
|
||||
- **Story 3.2**: Tiered link finding (uses `find_tiered_links()`)
|
||||
- **Existing**: `anchor_text_generator.py` for tier-based anchor text
|
||||
|
||||
## Usage Example
|
||||
|
||||
```python
|
||||
from src.interlinking.content_injection import inject_interlinks
|
||||
from src.interlinking.tiered_links import find_tiered_links
|
||||
from src.generation.url_generator import generate_urls_for_batch
|
||||
|
||||
# 1. Generate URLs for batch
|
||||
article_urls = generate_urls_for_batch(content_records, site_repo)
|
||||
|
||||
# 2. Find tiered links
|
||||
tiered_links = find_tiered_links(content_records, job_config, project_repo, content_repo, site_repo)
|
||||
|
||||
# 3. Inject all interlinks
|
||||
inject_interlinks(
|
||||
content_records,
|
||||
article_urls,
|
||||
tiered_links,
|
||||
project,
|
||||
job_config,
|
||||
content_repo,
|
||||
link_repo
|
||||
)
|
||||
```
|
||||
|
||||
## CLI Integration (Completed)
|
||||
|
||||
Story 3.3 is now **fully integrated** into the `generate-batch` CLI workflow:
|
||||
|
||||
### Integration Details
|
||||
- **File Modified**: `src/generation/batch_processor.py`
|
||||
- **New Method**: `_post_process_tier()` (80+ lines)
|
||||
- **Integration Point**: Automatically runs after article generation for each tier
|
||||
|
||||
### Complete Pipeline
|
||||
When you run `generate-batch`, articles now go through:
|
||||
1. Content generation (title, outline, content)
|
||||
2. Site assignment via `deployment_targets` (Story 2.5)
|
||||
3. **NEW**: Automatic site assignment for unassigned articles (Story 3.1)
|
||||
4. **NEW**: URL generation (Story 3.1)
|
||||
5. **NEW**: Tiered link finding (Story 3.2)
|
||||
6. **NEW**: Content interlinking injection (Story 3.3)
|
||||
7. **NEW**: Template application
|
||||
|
||||
### CLI Output
|
||||
```
|
||||
tier1: Generating 5 articles
|
||||
[1/5] Generating title...
|
||||
[1/5] Generating outline...
|
||||
[1/5] Generating content...
|
||||
[1/5] Saved (ID: 43, Status: generated)
|
||||
...
|
||||
tier1: Assigning sites to 2 articles...
|
||||
Assigned 2 articles to sites
|
||||
tier1: Post-processing 5 articles...
|
||||
Generating URLs...
|
||||
Generated 5 URLs
|
||||
Finding tiered links...
|
||||
Found tiered links for tier 1
|
||||
Injecting interlinks... ← Story 3.3!
|
||||
Interlinks injected successfully ← Story 3.3!
|
||||
Applying templates...
|
||||
Applied templates to 5/5 articles
|
||||
tier1: Post-processing complete
|
||||
```
|
||||
|
||||
### Verification
|
||||
Tested and confirmed:
|
||||
- ✅ Articles assigned to sites automatically
|
||||
- ✅ URLs generated for all articles
|
||||
- ✅ Tiered links injected (money site for T1)
|
||||
- ✅ Homepage links injected (`/index.html`)
|
||||
- ✅ "See Also" sections with batch links
|
||||
- ✅ Templates applied
|
||||
- ✅ All link records in database
|
||||
|
||||
## Next Steps
|
||||
|
||||
Story 3.3 is complete and integrated. Ready for:
|
||||
- **Story 4.x**: Deployment (final HTML with all links is ready)
|
||||
- **Future**: Analytics dashboard using `article_links` table
|
||||
- **Future**: Create About, Privacy, Contact pages to match nav menu links
|
||||
|
||||
## Notes
|
||||
|
||||
- Homepage links use "Home" anchor text, pointing to `/index.html`
|
||||
- All 4 templates now have consistent navigation structure
|
||||
- Link relationships fully tracked in database for analytics
|
||||
- Simple, maintainable code with comprehensive test coverage
|
||||
|
||||
|
|
@ -1,230 +0,0 @@
|
|||
# Story 3.3 QA Summary
|
||||
|
||||
**Date**: October 21, 2025
|
||||
**QA Status**: PASSED ✓
|
||||
**Production Ready**: YES (with integration caveat)
|
||||
|
||||
---
|
||||
|
||||
## Quick Stats
|
||||
|
||||
| Metric | Status |
|
||||
|--------|--------|
|
||||
| **Unit Tests** | 33/33 PASSED (100%) |
|
||||
| **Integration Tests** | 9/9 PASSED (100%) |
|
||||
| **Total Tests** | 42/42 PASSED |
|
||||
| **Linter Errors** | 0 |
|
||||
| **Test Execution Time** | ~4.3 seconds |
|
||||
| **Code Quality** | Excellent |
|
||||
|
||||
---
|
||||
|
||||
## What Was Tested
|
||||
|
||||
### Core Features (All PASSED ✓)
|
||||
1. **Tiered Links**
|
||||
- T1 articles → money site
|
||||
- T2+ articles → 2-4 random lower-tier articles
|
||||
- Tier-appropriate anchor text
|
||||
- Job config overrides (default/override/append)
|
||||
|
||||
2. **Homepage Links**
|
||||
- Links to `/index.html`
|
||||
- Uses "Home" as anchor text
|
||||
- Case-insensitive matching
|
||||
|
||||
3. **See Also Section**
|
||||
- Links to ALL other batch articles
|
||||
- Proper HTML formatting
|
||||
- Excludes current article
|
||||
|
||||
4. **Anchor Text Configuration**
|
||||
- Default mode (tier-based)
|
||||
- Override mode (custom text)
|
||||
- Append mode (tier + custom)
|
||||
|
||||
5. **Database Integration**
|
||||
- Content updates persist
|
||||
- Link records created correctly
|
||||
- Internal vs external links handled
|
||||
|
||||
6. **Template Updates**
|
||||
- All 4 templates have navigation
|
||||
- Consistent structure across themes
|
||||
|
||||
---
|
||||
|
||||
## What Works
|
||||
|
||||
All features covered by the test suite work: all 42 tests pass with zero errors.
|
||||
|
||||
### Verified Scenarios
|
||||
- Single article batches
|
||||
- Large batches (20+ articles)
|
||||
- T1 batches with money site links
|
||||
- T2 batches linking to T1 articles
|
||||
- Custom anchor text overrides
|
||||
- Missing money site (graceful error)
|
||||
- Missing URLs (graceful skip)
|
||||
- Malformed HTML (handled safely)
|
||||
- Empty content (graceful skip)
|
||||
|
||||
---
|
||||
|
||||
## What Doesn't Work (Yet)
|
||||
|
||||
### CLI Integration Missing
|
||||
Story 3.3 is **NOT integrated** into the main `generate-batch` command.
|
||||
|
||||
**Current State**:
|
||||
```bash
|
||||
uv run python main.py generate-batch --job-file jobs/example.json
|
||||
# This generates content but DOES NOT inject interlinks
|
||||
```
|
||||
|
||||
**What's Missing**:
|
||||
- No call to `generate_urls_for_batch()`
|
||||
- No call to `find_tiered_links()`
|
||||
- No call to `inject_interlinks()`
|
||||
|
||||
**Impact**: Functions work perfectly but aren't used in main workflow yet.
|
||||
|
||||
**Solution**: Needs 5-10 lines of code in `BatchProcessor` to call these functions after content generation.
|
||||
|
||||
---
|
||||
|
||||
## Test Evidence
|
||||
|
||||
### Run All Story 3.3 Tests
|
||||
```bash
|
||||
uv run pytest tests/unit/test_content_injection.py tests/integration/test_content_injection_integration.py -v
|
||||
```
|
||||
|
||||
**Expected Output**: `42 passed in ~4s`
|
||||
|
||||
### Check Code Quality
|
||||
```bash
|
||||
# No linter errors in implementation
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
All criteria from story doc met:
|
||||
|
||||
- [x] Inject tiered links (T1 → money site, T2+ → lower tier)
|
||||
- [x] Inject homepage links (to `/index.html`)
|
||||
- [x] Inject "See Also" section (all batch articles)
|
||||
- [x] Use tier-appropriate anchor text
|
||||
- [x] Support job config overrides
|
||||
- [x] Update content in database
|
||||
- [x] Record links in `article_links` table
|
||||
- [x] Handle edge cases gracefully
|
||||
|
||||
---
|
||||
|
||||
## Next Actions
|
||||
|
||||
### For Story 3.3 Completion
|
||||
**Priority**: HIGH
|
||||
**Effort**: ~30 minutes
|
||||
|
||||
Integrate into `BatchProcessor.process_job()`:
|
||||
|
||||
```python
|
||||
# Add after content generation loop
|
||||
from src.generation.url_generator import generate_urls_for_batch
|
||||
from src.interlinking.tiered_links import find_tiered_links
|
||||
from src.interlinking.content_injection import inject_interlinks
|
||||
from src.database.repositories import ArticleLinkRepository
|
||||
|
||||
# Get all generated content for this tier
|
||||
content_records = self.content_repo.get_by_project_and_tier(project_id, tier_name)
|
||||
|
||||
# Generate URLs
|
||||
article_urls = generate_urls_for_batch(content_records, self.site_deployment_repo)
|
||||
|
||||
# Find tiered links
|
||||
tiered_links = find_tiered_links(
|
||||
content_records, job_config,
|
||||
self.project_repo, self.content_repo, self.site_deployment_repo
|
||||
)
|
||||
|
||||
# Inject interlinks
|
||||
link_repo = ArticleLinkRepository(session)
|
||||
inject_interlinks(
|
||||
content_records, article_urls, tiered_links,
|
||||
project, job_config, self.content_repo, link_repo
|
||||
)
|
||||
```
|
||||
|
||||
### For Story 4.x
|
||||
- Deploy final HTML with all links
|
||||
- Use `article_links` table for analytics
|
||||
|
||||
---
|
||||
|
||||
## Files Changed
|
||||
|
||||
### Created
|
||||
- `src/interlinking/content_injection.py` (410 lines)
|
||||
- `tests/unit/test_content_injection.py` (363 lines, 33 tests)
|
||||
- `tests/integration/test_content_injection_integration.py` (469 lines, 9 tests)
|
||||
- `STORY_3.3_IMPLEMENTATION_SUMMARY.md`
|
||||
- `docs/stories/story-3.3-content-interlinking-injection.md`
|
||||
|
||||
### Modified
|
||||
- `src/templating/templates/basic.html`
|
||||
- `src/templating/templates/modern.html`
|
||||
- `src/templating/templates/classic.html`
|
||||
- `src/templating/templates/minimal.html`
|
||||
|
||||
---
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
**Risk Level**: LOW
|
||||
|
||||
**Why?**
|
||||
- 100% test pass rate
|
||||
- Comprehensive edge case coverage
|
||||
- No breaking changes to existing code
|
||||
- Only adds new functionality
|
||||
- Functions are isolated and well-tested
|
||||
|
||||
**Mitigation**:
|
||||
- Integration testing needed when adding to CLI
|
||||
- Monitor for performance with large batches (>100 articles)
|
||||
- Add logging when integrated into main workflow
|
||||
|
||||
---
|
||||
|
||||
## Approval
|
||||
|
||||
**Code Quality**: APPROVED ✓
|
||||
**Test Coverage**: APPROVED ✓
|
||||
**Functionality**: APPROVED ✓
|
||||
**Integration**: PENDING (needs CLI integration)
|
||||
|
||||
**Overall Status**: APPROVED FOR MERGE
|
||||
|
||||
**Recommendation**:
|
||||
1. Merge Story 3.3 code
|
||||
2. Add CLI integration in separate commit
|
||||
3. Test end-to-end with real batch
|
||||
4. Proceed to Story 4.x
|
||||
|
||||
---
|
||||
|
||||
## Contact
|
||||
|
||||
For questions about this QA report, see:
|
||||
- Full QA Report: `QA_REPORT_STORY_3.3.md`
|
||||
- Implementation Summary: `STORY_3.3_IMPLEMENTATION_SUMMARY.md`
|
||||
- Story Documentation: `docs/stories/story-3.3-content-interlinking-injection.md`
|
||||
|
||||
---
|
||||
|
||||
*QA conducted: October 21, 2025*
|
||||
|
||||
|
|
@ -1,281 +0,0 @@
|
|||
# Story 3.4: Boilerplate Site Pages - CREATED
|
||||
|
||||
**Status**: Specification Complete, Ready for Implementation
|
||||
**Date Created**: October 21, 2025
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
Story 3.4 has been created to address the broken navigation menu links introduced in Story 3.3.
|
||||
|
||||
### The Problem
|
||||
|
||||
In Story 3.3, we added navigation menus to all HTML templates:
|
||||
```html
|
||||
<nav>
|
||||
<ul>
|
||||
<li><a href="/index.html">Home</a></li>
|
||||
<li><a href="about.html">About</a></li>
|
||||
<li><a href="privacy.html">Privacy</a></li>
|
||||
<li><a href="contact.html">Contact</a></li>
|
||||
</ul>
|
||||
</nav>
|
||||
```
|
||||
|
||||
However, we never created the `about.html`, `contact.html`, or `privacy.html` pages, resulting in broken links.
|
||||
|
||||
### The Solution
|
||||
|
||||
Story 3.4 will automatically generate these boilerplate pages for each site when the site is created, with a one-time backfill for existing sites.
|
||||
|
||||
---
|
||||
|
||||
## What Will Be Delivered
|
||||
|
||||
### 1. Three Boilerplate Pages Per Site (Heading Only)
|
||||
- **About Page** (`about.html`) - `<h1>About Us</h1>` + template/navigation
|
||||
- **Contact Page** (`contact.html`) - `<h1>Contact</h1>` + template/navigation
|
||||
- **Privacy Policy** (`privacy.html`) - `<h1>Privacy Policy</h1>` + template/navigation
|
||||
|
||||
Each page contains only a heading wrapped in the template structure, with no body text. Users can add content manually later if desired.
|
||||
|
||||
### 2. Database Storage
|
||||
New `site_pages` table stores pages separately from articles:
|
||||
```sql
|
||||
CREATE TABLE site_pages (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
site_deployment_id INTEGER NOT NULL,
|
||||
page_type VARCHAR(20) NOT NULL, -- about, contact, privacy
|
||||
content TEXT NOT NULL,
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
FOREIGN KEY (site_deployment_id) REFERENCES site_deployments(id),
|
||||
UNIQUE (site_deployment_id, page_type)
|
||||
);
|
||||
```
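
To see how the `UNIQUE (site_deployment_id, page_type)` constraint prevents duplicate pages, the schema can be tried against an in-memory SQLite database. This is only a sketch: the one-column `site_deployments` table is a hypothetical stand-in for the real one, and `open_demo_db` and `add_page` are illustrative helpers, not project code.

```python
import sqlite3

SCHEMA = """
-- Hypothetical minimal parent table, just enough for the foreign key.
CREATE TABLE site_deployments (id INTEGER PRIMARY KEY AUTOINCREMENT);

CREATE TABLE site_pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_deployment_id INTEGER NOT NULL,
    page_type VARCHAR(20) NOT NULL,  -- about, contact, privacy
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (site_deployment_id) REFERENCES site_deployments(id),
    UNIQUE (site_deployment_id, page_type)
);
"""


def open_demo_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_page(conn, site_id, page_type, content):
    """Insert a page; return False if a constraint rejects it (e.g. a duplicate)."""
    try:
        conn.execute(
            "INSERT INTO site_pages (site_deployment_id, page_type, content) "
            "VALUES (?, ?, ?)",
            (site_id, page_type, content),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```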
|
||||
|
||||
### 3. Template Integration
|
||||
- Pages use the same template as articles on the same site
|
||||
- Template read from `site.template_name` field in database
|
||||
- Professional, visually consistent with article content
|
||||
- Navigation menu included (which links to these same pages)
|
||||
|
||||
### 4. Smart Generation
|
||||
- Generated ONLY when new sites are created (not for existing sites)
|
||||
- One-time backfill script for all existing imported sites
|
||||
- Integrated into site creation workflow (not batch generation)
|
||||
|
||||
---
|
||||
|
||||
## Implementation Scope
|
||||
|
||||
### Effort Estimate
|
||||
**14 story points** (reduced from 20, approximately 1.5-2 days of development)
|
||||
|
||||
Simplified due to:
|
||||
- Heading-only pages (no complex content generation)
|
||||
- No template service changes needed (template tracked in database)
|
||||
- No database tracking overhead (just check if files exist on bunny.net)
|
||||
|
||||
### Key Components
|
||||
|
||||
1. **Database Schema** (2 points)
|
||||
- New `SitePage` model
|
||||
- Migration script
|
||||
- Repository layer
|
||||
|
||||
2. **Page Content Templates** (1 point - simplified)
|
||||
- Heading-only page content
|
||||
- Returns `<h1>About Us</h1>`, `<h1>Contact</h1>`, `<h1>Privacy Policy</h1>`
|
||||
- No complex content generation
|
||||
|
||||
3. **Generation Logic** (2 points - simplified)
|
||||
- Generate heading-only pages for each site
|
||||
- Wrap heading in HTML template
|
||||
- Store in database
|
||||
|
||||
4. **Site Creation Integration** (2 points)
|
||||
- Hook into `site_provisioning.py`
|
||||
- Generate pages when new sites are created
|
||||
- Handle errors gracefully
|
||||
|
||||
5. **Backfill Script** (2 points)
|
||||
- CLI script to generate pages for all existing sites
|
||||
- Dry-run mode for safety
|
||||
- Progress reporting and error handling
|
||||
|
||||
6. **Testing** (3 points - simplified)
|
||||
- Unit tests for heading-only page generation
|
||||
- Integration tests with site creation
|
||||
- Backfill script testing
|
||||
- Template application tests
|
||||
|
||||
**Total: 14 story points** (reduced from 20)
|
||||
|
||||
---
|
||||
|
||||
## Integration Point
|
||||
|
||||
Story 3.4 hooks into site creation, not batch generation:
|
||||
|
||||
### One-Time Setup (Existing Sites)
|
||||
```bash
|
||||
# Backfill all existing imported sites (hundreds of sites)
|
||||
uv run python scripts/backfill_site_pages.py \
|
||||
--username admin \
|
||||
--password yourpass \
|
||||
--template basic
|
||||
|
||||
# Output: Generated pages for 423 sites
|
||||
```
|
||||
|
||||
### Ongoing (New Sites Only)
|
||||
```
|
||||
When creating new sites:
|
||||
1. Create Storage Zone (bunny.net)
|
||||
2. Create Pull Zone (bunny.net)
|
||||
3. Save to database
|
||||
4. ✨ Generate boilerplate pages (Story 3.4) ← NEW
|
||||
5. Return site ready to use
|
||||
|
||||
Triggered by:
|
||||
- provision-site CLI command
|
||||
- auto_create_sites in job config
|
||||
- create_sites_for_keywords in job config
|
||||
```
|
||||
|
||||
### Batch Generation (Unchanged)
|
||||
```
|
||||
1. Generate articles (Epic 2)
|
||||
2. Assign sites (Story 3.1) ← May use existing sites with pages
|
||||
3. Generate URLs (Story 3.1)
|
||||
4. Find tiered links (Story 3.2)
|
||||
5. Inject interlinks (Story 3.3)
|
||||
6. Apply templates (Story 2.4)
|
||||
7. Deploy (Epic 4) ← Pages already exist on site
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Files Created
|
||||
|
||||
### Documentation
|
||||
- `docs/stories/story-3.4-boilerplate-site-pages.md` (full specification)
|
||||
- `docs/prd/epic-3-pre-deployment.md` (updated to include Story 3.4)
|
||||
- `STORY_3.4_CREATED.md` (this summary)
|
||||
|
||||
### Implementation Files (To Be Created)
|
||||
- `src/generation/page_templates.py` - Generic page content
|
||||
- `src/generation/site_page_generator.py` - Page generation logic
|
||||
- `src/database/models.py` - SitePage model (update)
|
||||
- `scripts/migrate_add_site_pages.py` - Database migration
|
||||
- `scripts/backfill_site_pages.py` - One-time backfill script
|
||||
- `tests/unit/test_site_page_generator.py` - Unit tests
|
||||
- `tests/integration/test_site_pages_integration.py` - Integration tests
|
||||
- `tests/unit/test_backfill_script.py` - Backfill script tests
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
### Requires (Already Complete)
|
||||
- ✅ Story 3.1: Site assignment (need to know which sites are in use)
|
||||
- ✅ Story 3.3: Navigation menus (these pages fulfill those links)
|
||||
- ✅ Story 2.4: Template service (for applying HTML templates)
|
||||
- ✅ Story 1.6: SiteDeployment table (for site relationships)
|
||||
|
||||
### Enables
|
||||
- Story 4.1: Deployment (pages will be deployed along with articles)
|
||||
- Complete, professional-looking sites with working navigation
|
||||
|
||||
---
|
||||
|
||||
## Example Output
|
||||
|
||||
### Site Structure After Story 3.4
|
||||
```
|
||||
https://example.com/
|
||||
├── index.html (homepage - future/Epic 4)
|
||||
├── about.html ← NEW (Story 3.4)
|
||||
├── contact.html ← NEW (Story 3.4)
|
||||
├── privacy.html ← NEW (Story 3.4)
|
||||
├── how-to-fix-your-engine.html (article)
|
||||
├── engine-maintenance-tips.html (article)
|
||||
└── best-engine-oil-brands.html (article)
|
||||
```
|
||||
|
||||
### About Page Preview (Heading Only)
|
||||
```html
|
||||
<!DOCTYPE html>
|
||||
<html>
|
||||
<head>
|
||||
<title>About Us</title>
|
||||
<!-- Same template/styling as articles -->
|
||||
</head>
|
||||
<body>
|
||||
<nav>
|
||||
<ul>
|
||||
<li><a href="/index.html">Home</a></li>
|
||||
<li><a href="about.html">About</a></li>
|
||||
<li><a href="privacy.html">Privacy</a></li>
|
||||
<li><a href="contact.html">Contact</a></li>
|
||||
</ul>
|
||||
</nav>
|
||||
|
||||
<main>
|
||||
<h1>About Us</h1>
|
||||
<!-- No other content - user can add manually later if desired -->
|
||||
</main>
|
||||
</body>
|
||||
</html>
|
||||
```
|
||||
|
||||
**Why heading-only pages?**
|
||||
- Fixes broken navigation links (no 404 errors)
|
||||
- Better UX than completely blank (user sees page title)
|
||||
- Minimal implementation effort
|
||||
- User can customize specific sites later if needed
|
||||
- Deployment ready as-is
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
### Option 1: Implement Now
|
||||
- Start implementation of Story 3.4
|
||||
- Fixes broken navigation links
|
||||
- Makes sites look complete and professional
|
||||
|
||||
### Option 2: Defer to Later
|
||||
- Add to backlog/technical debt
|
||||
- Focus on Epic 4 deployment first
|
||||
- Sites work but have broken nav links temporarily
|
||||
|
||||
### Option 3: Minimal Quick Fix
|
||||
- Create simple placeholder pages without full story implementation
|
||||
- Just enough to avoid 404 errors
|
||||
- Come back later for full implementation
|
||||
|
||||
---
|
||||
|
||||
## Recommendation
|
||||
|
||||
**Implement Story 3.4 before Epic 4 deployment** because:
|
||||
|
||||
1. Sites look unprofessional with broken nav links
|
||||
2. Fixes 404 errors on every deployed site
|
||||
3. Only 14 story points (1.5-2 days) - simplified implementation
|
||||
4. Heading-only pages are deployment-ready
|
||||
5. User can add content to specific pages later if desired
|
||||
|
||||
The alternative is to deploy with broken links and fix later, but that creates technical debt and poor user experience.
|
||||
|
||||
**Simplified approach:** Pages have heading only (e.g., `<h1>About Us</h1>`), no body content. This makes implementation faster while still fixing the broken link issue and providing better UX than completely blank pages.
|
||||
|
||||
---
|
||||
|
||||
**Created by**: AI Code Assistant
|
||||
**Created on**: October 21, 2025
|
||||
**Next**: Decide when to implement Story 3.4 (now vs. later vs. minimal fix)
|
||||
|
||||
|
|
@ -1,316 +0,0 @@
|
|||
# Story 3.4: Generate Boilerplate Site Pages - Implementation Summary
|
||||
|
||||
## Status
|
||||
**QA COMPLETE** - Ready for Production
|
||||
|
||||
## Story Overview
|
||||
Automatically generate boilerplate `about.html`, `contact.html`, and `privacy.html` pages for each site in a batch, so that the navigation menu links from Story 3.3 work and the sites appear complete.
|
||||
|
||||
## Implementation Details
|
||||
|
||||
### 1. Database Layer
|
||||
|
||||
#### SitePage Model (`src/database/models.py`)
|
||||
- Created `SitePage` model with the following fields:
|
||||
- `id`, `site_deployment_id`, `page_type`, `content`, `created_at`, `updated_at`
|
||||
- Foreign key to `site_deployments` with CASCADE delete
|
||||
- Unique constraint on `(site_deployment_id, page_type)`
|
||||
- Indexes on `site_deployment_id` and `page_type`
|
||||
|
||||
#### ISitePageRepository Interface (`src/database/interfaces.py`)
|
||||
- Defined repository interface with methods:
|
||||
- `create(site_deployment_id, page_type, content) -> SitePage`
|
||||
- `get_by_site(site_deployment_id) -> List[SitePage]`
|
||||
- `get_by_site_and_type(site_deployment_id, page_type) -> Optional[SitePage]`
|
||||
- `update_content(page_id, content) -> SitePage`
|
||||
- `exists(site_deployment_id, page_type) -> bool`
|
||||
- `delete(page_id) -> bool`
|
||||
|
||||
#### SitePageRepository Implementation (`src/database/repositories.py`)
|
||||
- Implemented all repository methods with proper error handling
|
||||
- Enforces unique constraint (one page of each type per site)
|
||||
- Handles IntegrityError for duplicate pages
|
||||
|
||||
### 2. Page Content Generation
|
||||
|
||||
#### Page Templates (`src/generation/page_templates.py`)
|
||||
- Simple heading-only content generation
|
||||
- Returns `<h1>About Us</h1>`, `<h1>Contact</h1>`, `<h1>Privacy Policy</h1>`
|
||||
- Takes domain parameter for future enhancements
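
A heading-only content function along these lines would be enough to satisfy the description above. The function name `get_page_content` and the choice to raise `ValueError` for an unknown page type are assumptions; the source only says that unknown page types are handled.

```python
PAGE_HEADINGS = {
    "about": "About Us",
    "contact": "Contact",
    "privacy": "Privacy Policy",
}


def get_page_content(page_type, domain):
    """Return heading-only HTML for a boilerplate page.

    `domain` is accepted but unused, mirroring the note that it is reserved
    for future enhancements.
    """
    try:
        heading = PAGE_HEADINGS[page_type]
    except KeyError:
        raise ValueError(f"Unknown page type: {page_type!r}") from None
    return f"<h1>{heading}</h1>"
```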
|
||||
|
||||
#### Site Page Generator (`src/generation/site_page_generator.py`)
|
||||
- Main function: `generate_site_pages(site_deployment, page_repo, template_service)`
|
||||
- Generates all three page types (about, contact, privacy)
|
||||
- Uses site's template (from `site.template_name` field)
|
||||
- Skips pages that already exist
|
||||
- Logs generation progress at INFO level
|
||||
- Helper function: `get_domain_from_site()` extracts custom or b-cdn hostname
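
The generator flow described above (three page types, template taken from the site, existing pages skipped) can be sketched with in-memory stand-ins. Everything except the names `generate_site_pages` and `get_domain_from_site` is hypothetical: the `Site` fields, the `InMemoryPageRepo`, and the `render(body, template_name)` signature are assumptions chosen to keep the example runnable.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

PAGE_HEADINGS = {"about": "About Us", "contact": "Contact", "privacy": "Privacy Policy"}


@dataclass
class Site:
    """Stand-in for a site deployment record (field names are assumptions)."""
    id: int
    template_name: str
    custom_hostname: Optional[str] = None
    bcdn_hostname: str = ""


class InMemoryPageRepo:
    """Tiny fake repository exposing only exists() and create()."""

    def __init__(self):
        self.pages = {}

    def exists(self, site_deployment_id, page_type):
        return (site_deployment_id, page_type) in self.pages

    def create(self, site_deployment_id, page_type, content):
        self.pages[(site_deployment_id, page_type)] = content
        return content


class EchoTemplateService:
    """Fake template service that wraps the body in a marker comment."""

    def render(self, body, template_name):
        return f"<!-- {template_name} --><main>{body}</main>"


def get_domain_from_site(site):
    """Prefer the custom hostname, otherwise fall back to the b-cdn hostname."""
    return site.custom_hostname or site.bcdn_hostname


def generate_site_pages(site, page_repo, template_service):
    """Create any missing boilerplate pages and return the types created."""
    created = []
    domain = get_domain_from_site(site)
    for page_type, heading in PAGE_HEADINGS.items():
        if page_repo.exists(site.id, page_type):
            logger.info("Skipping existing %s page for %s", page_type, domain)
            continue
        html = template_service.render(f"<h1>{heading}</h1>", site.template_name)
        page_repo.create(site.id, page_type, html)
        logger.info("Generated %s page for %s", page_type, domain)
        created.append(page_type)
    return created
```

Running the generator twice against the same site is safe under this design, because the second pass finds every page already present and creates nothing.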
|
||||
|
||||
### 3. Integration with Site Provisioning
|
||||
|
||||
#### Site Provisioning Updates (`src/generation/site_provisioning.py`)
|
||||
- Updated `create_bunnynet_site()` to accept optional `page_repo` and `template_service`
|
||||
- Generates pages automatically after site creation
|
||||
- Graceful error handling - logs warning if page generation fails but continues site creation
|
||||
- Updated `provision_keyword_sites()` and `create_generic_sites()` to pass through parameters
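
The optional-parameter and graceful-degradation pattern described above might be sketched as follows. The wrapper name `create_site_with_pages` and its callable arguments are hypothetical; the real logic lives inside `create_bunnynet_site()`, whose full signature is not shown in this document.

```python
import logging

logger = logging.getLogger(__name__)


def create_site_with_pages(create_site, generate_pages,
                           page_repo=None, template_service=None):
    """Create a site, then try to generate its boilerplate pages.

    Page generation is skipped when either optional dependency is missing,
    and a failure is logged as a warning instead of aborting site creation.
    """
    site = create_site()
    if page_repo is None or template_service is None:
        return site  # backward compatible: callers that pass nothing still work
    try:
        generate_pages(site, page_repo, template_service)
    except Exception as exc:  # deliberate catch-all: pages are non-critical
        logger.warning("Page generation failed for site %r: %s", site, exc)
    return site
```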
|
||||
|
||||
#### Site Assignment Updates (`src/generation/site_assignment.py`)
|
||||
- Updated `assign_sites_to_batch()` to accept optional `page_repo` and `template_service`
|
||||
- Passes parameters through to provisioning functions
|
||||
- Pages generated when new sites are auto-created
|
||||
|
||||
### 4. Database Migration
|
||||
|
||||
#### Migration Script (`scripts/migrate_add_site_pages.py`)
|
||||
- Creates `site_pages` table with proper schema
|
||||
- Creates indexes on `site_deployment_id` and `page_type`
|
||||
- Verification step confirms table and columns exist
|
||||
- Idempotent - checks if table exists before creating
|
||||
|
||||
### 5. Backfill Script
|
||||
|
||||
#### Backfill Script (`scripts/backfill_site_pages.py`)
|
||||
- Generates pages for all existing sites without them
|
||||
- Admin authentication required
|
||||
- Supports dry-run mode to preview changes
|
||||
- Progress reporting with batch checkpoints
|
||||
- Usage:
|
||||
```bash
|
||||
uv run python scripts/backfill_site_pages.py \
|
||||
--username admin \
|
||||
--password yourpass \
|
||||
--dry-run
|
||||
|
||||
# Actually generate pages
|
||||
uv run python scripts/backfill_site_pages.py \
|
||||
--username admin \
|
||||
--password yourpass \
|
||||
--batch-size 50
|
||||
```
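
For orientation, the command-line surface and checkpointed loop of such a backfill script could be sketched like this. Only the flag names come from the usage above; the default batch size of 50, the `backfill` helper, and its `generate` and `report` callables are assumptions, and authentication is omitted entirely.

```python
import argparse


def build_parser():
    """Argument parser mirroring the flags shown in the usage example."""
    parser = argparse.ArgumentParser(description="Backfill boilerplate site pages")
    parser.add_argument("--username", required=True)
    parser.add_argument("--password", required=True)
    parser.add_argument("--dry-run", action="store_true",
                        help="preview which sites would be processed")
    parser.add_argument("--batch-size", type=int, default=50,  # assumed default
                        help="number of sites between progress checkpoints")
    return parser


def backfill(sites, generate, dry_run, batch_size, report=print):
    """Visit every site, calling `generate` unless this is a dry run."""
    processed = 0
    for index, site in enumerate(sites, start=1):
        if not dry_run:
            generate(site)
        processed += 1
        if index % batch_size == 0:
            report(f"Checkpoint: {index}/{len(sites)} sites")
    return processed
```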
|
||||
|
||||
### 6. Testing
|
||||
|
||||
#### Unit Tests
|
||||
- **test_site_page_generator.py** (9 tests):
|
||||
- Domain extraction (custom vs b-cdn hostname)
|
||||
- Page generation success cases
|
||||
- Template selection
|
||||
- Skipping existing pages
|
||||
- Error handling
|
||||
|
||||
- **test_site_page_repository.py** (11 tests):
|
||||
- CRUD operations
|
||||
- Duplicate page prevention
|
||||
- Update and delete operations
|
||||
- Exists checks
|
||||
|
||||
- **test_page_templates.py** (6 tests):
|
||||
- Content generation for all page types
|
||||
- Unknown page type handling
|
||||
- HTML structure validation
|
||||
|
||||
#### Integration Tests
|
||||
- **test_site_page_integration.py** (11 tests):
|
||||
- Full flow: site creation → page generation → database storage
|
||||
- Template application
|
||||
- Duplicate prevention
|
||||
- Multiple sites with separate pages
|
||||
- Custom domain handling
|
||||
- Page retrieval by type
|
||||
|
||||
**All tests passing:** 37/37
|
||||
|
||||
## Key Features
|
||||
|
||||
1. **Heading-Only Pages**: Simple approach - just `<h1>` tags wrapped in templates
|
||||
2. **Template Integration**: Uses same template as site's articles (consistent look)
|
||||
3. **Automatic Generation**: Pages created when new sites are provisioned
|
||||
4. **Backfill Support**: Script to add pages to existing sites
|
||||
5. **Database Integrity**: Unique constraint prevents duplicates
|
||||
6. **Graceful Degradation**: Page generation failures don't break site creation
|
||||
7. **Optional Parameters**: Backward compatible - old code still works without page generation
|
||||
|
||||
## Integration Points
|
||||
|
||||
### When Pages Are Generated
|
||||
1. **Site Provisioning**: When `create_bunnynet_site()` is called with `page_repo` and `template_service`
|
||||
2. **Keyword Site Creation**: When `provision_keyword_sites()` creates new sites
|
||||
3. **Generic Site Creation**: When `create_generic_sites()` creates sites for batch jobs
|
||||
4. **Backfill**: When running the backfill script on existing sites
|
||||
|
||||
### When Pages Are NOT Generated
|
||||
- During batch processing (sites already exist)
|
||||
- When parameters are not provided (backward compatibility)
|
||||
- When bunny_client is None (no site creation happening)
|
||||
|
||||
## Files Modified
|
||||
|
||||
### New Files
|
||||
- `src/generation/site_page_generator.py`
|
||||
- `tests/unit/test_site_page_generator.py`
|
||||
- `tests/unit/test_site_page_repository.py`
|
||||
- `tests/unit/test_page_templates.py`
|
||||
- `tests/integration/test_site_page_integration.py`
|
||||
|
||||
### Modified Files
|
||||
- `src/database/models.py` - Added SitePage model
|
||||
- `src/database/interfaces.py` - Added ISitePageRepository interface
|
||||
- `src/database/repositories.py` - Added SitePageRepository implementation
|
||||
- `src/generation/site_provisioning.py` - Integrated page generation
|
||||
- `src/generation/site_assignment.py` - Pass through parameters
|
||||
- `scripts/backfill_site_pages.py` - Fixed imports and function calls
|
||||
|
||||
### Existing Files (Already Present)
|
||||
- `src/generation/page_templates.py` - Simple content generation
|
||||
- `scripts/migrate_add_site_pages.py` - Database migration
|
||||
|
||||
## Technical Decisions
|
||||
|
||||
### 1. Heading-Only Pages Instead of Full Content
|
||||
**Decision**: Use heading-only pages (`<h1>` tag only)
|
||||
|
||||
**Rationale**:
|
||||
- Fixes broken navigation links (pages exist, no 404s)
|
||||
- Better UX than completely empty (user sees page title)
|
||||
- Minimal maintenance overhead
|
||||
- User can add custom content later if needed
|
||||
- Reduces Story 3.4 effort from 20 to 14 story points
|
||||
|
||||
### 2. Separate `site_pages` Table
|
||||
**Decision**: Store pages in separate table from `generated_content`
|
||||
|
||||
**Rationale**:
|
||||
- Pages are fundamentally different from articles
|
||||
- Different schema requirements (no tier, keyword, etc.)
|
||||
- Clean separation of concerns
|
||||
- Easier to query and manage
|
||||
|
||||
### 3. Template from Site Record
|
||||
**Decision**: Read `site.template_name` from database instead of passing as parameter
|
||||
|
||||
**Rationale**:
|
||||
- Template is already stored on site record
|
||||
- Ensures consistency with articles on same site
|
||||
- Simpler function signatures
|
||||
- Single source of truth
|
||||
|
||||
### 4. Optional Parameters
|
||||
**Decision**: Make `page_repo` and `template_service` optional in provisioning functions
|
||||
|
||||
**Rationale**:
|
||||
- Backward compatibility with existing code
|
||||
- Graceful degradation if not provided
|
||||
- Easy to add to new code paths incrementally
|
||||
|
||||
### 5. Integration at Site Creation
|
||||
**Decision**: Generate pages when sites are created, not during batch processing
|
||||
|
||||
**Rationale**:
|
||||
- Pages are site-level resources, not article-level
|
||||
- Only generate once per site (not per batch)
|
||||
- Backfill script handles existing sites
|
||||
- Clean separation: provisioning creates infrastructure, batch creates content
|
||||
|
||||
## Deferred to Later
|
||||
|
||||
### Homepage Generation
|
||||
- **Status**: Deferred to Epic 4
|
||||
- **Reason**: Homepage requires listing all articles on site, which is deployment-time logic
|
||||
- **Workaround**: `/index.html` link can 404 until Epic 4
|
||||
|
||||
### Custom Page Content
|
||||
- **Status**: Not implemented
|
||||
- **Future Enhancement**: Allow projects to override generic templates
|
||||
- **Alternative**: Users can manually edit pages via backfill update or direct database access
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### 1. Creating a New Site with Pages
|
||||
```python
from src.generation.site_provisioning import create_bunnynet_site
from src.database.repositories import SiteDeploymentRepository, SitePageRepository
from src.templating.service import TemplateService

# `session` and `bunny_client` are assumed to be created elsewhere
site_repo = SiteDeploymentRepository(session)
page_repo = SitePageRepository(session)
template_service = TemplateService()

site = create_bunnynet_site(
    name_prefix="my-site",
    bunny_client=bunny_client,
    site_repo=site_repo,
    region="DE",
    page_repo=page_repo,
    template_service=template_service
)
# Pages are automatically created for about, contact, privacy
```
|
||||
|
||||
### 2. Backfilling Existing Sites
|
||||
```bash
# Dry run first
uv run python scripts/backfill_site_pages.py \
  --username admin \
  --password yourpass \
  --dry-run

# Actually generate pages
uv run python scripts/backfill_site_pages.py \
  --username admin \
  --password yourpass
```
|
||||
|
||||
### 3. Checking if Pages Exist
|
||||
```python
page_repo = SitePageRepository(session)

if page_repo.exists(site_id, "about"):
    print("About page exists")

pages = page_repo.get_by_site(site_id)
print(f"Site has {len(pages)} pages")
```
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
- Page generation adds ~1-2 seconds per site (3 pages × template application)
|
||||
- Database operations are optimized with indexes
|
||||
- Unique constraint prevents duplicate work
|
||||
- Batch processing unaffected (only generates for new sites)
|
||||
|
||||
## Next Steps
|
||||
|
||||
### Epic 4: Deployment
|
||||
- Deploy generated pages to bunny.net storage
|
||||
- Create homepage (`index.html`) with article listing
|
||||
- Implement deployment pipeline for all HTML files
|
||||
|
||||
### Future Enhancements
|
||||
- Custom page content templates
|
||||
- Multi-language support
|
||||
- User-editable pages via CLI/web interface
|
||||
- Additional pages (terms, disclaimer, etc.)
|
||||
- Privacy policy content generation
|
||||
|
||||
## Acceptance Criteria Checklist
|
||||
|
||||
- [x] Function generates three boilerplate pages for a given site
|
||||
- [x] Pages created AFTER articles are generated but BEFORE deployment
|
||||
- [x] Each page uses same template as articles for that site
|
||||
- [x] Pages stored in database for deployment
|
||||
- [x] Pages associated with correct site via `site_deployment_id`
|
||||
- [x] Empty pages with just template applied (heading only)
|
||||
- [x] Template integration uses existing `format_content()` method
|
||||
- [x] Database table with proper schema and constraints
|
||||
- [x] Integration with site creation (not batch processor)
|
||||
- [x] Backfill script for existing sites with dry-run mode
|
||||
- [x] Unit tests with >80% coverage
|
||||
- [x] Integration tests covering full flow
|
||||
|
||||
## Conclusion
|
||||
|
||||
Story 3.4 is **COMPLETE**. All acceptance criteria met, tests passing, and code integrated into the main workflow. Sites now automatically get boilerplate pages that match their template, fixing broken navigation links from Story 3.3.
|
||||
|
||||
**Effort**: 14 story points (completed as estimated)
|
||||
**Test Coverage**: 37 tests (26 unit + 11 integration)
|
||||
**Status**: Ready for Epic 4 (Deployment)
|
||||
|
|
@ -1,49 +0,0 @@
|
|||
# Story 3.4 QA Summary
|
||||
|
||||
## Status: PASSED - Ready for Production
|
||||
|
||||
## Test Results
|
||||
- **37/37 tests passing** (26 unit + 11 integration)
|
||||
- **0 failures**
|
||||
- **0 linter errors** in new code
|
||||
- **Database migration verified**
|
||||
|
||||
## Key Findings
|
||||
|
||||
### Strengths
|
||||
1. **Complete test coverage** - All functionality tested at unit and integration level
|
||||
2. **Clean implementation** - No code quality issues, proper error handling
|
||||
3. **Backward compatible** - Optional parameters don't break existing code
|
||||
4. **Well documented** - Clear docstrings and comprehensive documentation
|
||||
5. **Database integrity** - Proper indexes, constraints, and foreign keys
|
||||
|
||||
### Acceptance Criteria
|
||||
All acceptance criteria verified:
|
||||
- Generates 3 pages (about, contact, privacy) per site
|
||||
- Uses same template as site articles
|
||||
- Stores in database with proper associations
|
||||
- Integrates with site provisioning
|
||||
- Backfill script available with dry-run mode
|
||||
- Graceful error handling
|
||||
|
||||
### Implementation Quality
|
||||
- **Design patterns:** Repository pattern, dependency injection
|
||||
- **Code organization:** Modular with clear separation of concerns
|
||||
- **Performance:** ~1-2 seconds per site (acceptable)
|
||||
- **Scalability:** Can handle hundreds of sites via backfill
|
||||
|
||||
## Minor Notes
|
||||
- SQLAlchemy deprecation warnings (96) about `datetime.utcnow()` - not related to Story 3.4
|
||||
- Markdown linter warnings - pre-existing style issues, not functional problems
|
||||
|
||||
## Recommendation
|
||||
**APPROVED for production**. Story 3.4 is complete, tested, and ready for deployment.
|
||||
|
||||
## Next Steps
|
||||
1. Story status updated to "QA COMPLETE"
|
||||
2. Ready for Epic 4 (Deployment)
|
||||
3. Consider addressing SQLAlchemy deprecation warnings in future sprint
|
||||
|
||||
---
|
||||
QA completed: October 22, 2025
|
||||
|
||||
|
|
@ -1,196 +0,0 @@
|
|||
# Story 4.1 Implementation Summary
|
||||
|
||||
## Status: COMPLETE
|
||||
|
||||
## Overview
|
||||
Successfully implemented deployment of generated content to Bunny.net cloud storage with tier-segregated URL logging and automatic deployment after batch generation.
|
||||
|
||||
## Implementation Details
|
||||
|
||||
### 1. Bunny.net Storage Client (`src/deployment/bunny_storage.py`)
|
||||
- `BunnyStorageClient` class for uploading files to Bunny.net storage zones
|
||||
- Uses per-zone `storage_zone_password` from database for authentication
|
||||
- Region-aware URL generation:
|
||||
- Frankfurt (DE): `storage.bunnycdn.com` (no prefix)
|
||||
- Other regions: `{region}.storage.bunnycdn.com` (e.g., `la.storage.bunnycdn.com`)
|
||||
- Uses `application/octet-stream` content-type per Bunny.net API requirements
|
||||
- Implements retry logic with exponential backoff (3 attempts)
|
||||
- Methods:
|
||||
- `upload_file()`: Upload HTML content to storage zone
|
||||
- `file_exists()`: Check if file exists in storage
|
||||
- `list_files()`: List files in storage zone
|
||||
- **Tested with real Bunny.net storage** - successful upload verified
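
The retry behaviour listed above (3 attempts with exponential backoff) can be sketched roughly as follows. This is a minimal illustration, not the code in `bunny_storage.py`: the helper name `retry_with_backoff`, the 1-second base delay, and the injectable `sleep` parameter are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on any exception with doubling delays.

    Hypothetical helper: name, defaults and signature are illustrative only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; a real client would likely retry only on network or 5xx errors rather than on every exception.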
|
||||
|
||||
### 2. Database Updates
|
||||
- Added `deployed_url` (TEXT, nullable) to `generated_content` table
|
||||
- Added `deployed_at` (TIMESTAMP, nullable, indexed) to `generated_content` table
|
||||
- Created migration script: `scripts/migrate_add_deployment_fields.py`
|
||||
- Added repository methods:
|
||||
- `GeneratedContentRepository.mark_as_deployed()`: Update deployment status
|
||||
- `GeneratedContentRepository.get_deployed_content()`: Query deployed articles
|
||||
|
||||
### 3. URL Logger (`src/deployment/url_logger.py`)
|
||||
- `URLLogger` class for tier-segregated URL logging
|
||||
- Creates daily log files in `deployment_logs/` directory:
|
||||
- `YYYY-MM-DD_tier1_urls.txt` for Tier 1 articles
|
||||
- `YYYY-MM-DD_other_tiers_urls.txt` for Tier 2+ articles
|
||||
- Automatic duplicate prevention by reading existing URLs before appending
|
||||
- Boilerplate pages (about, contact, privacy) are NOT logged
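
As a rough sketch of the logging behaviour described above, the function below appends a URL to the daily tier file only if it is not already present. The function name `log_deployed_url`, its parameters, and the `"tier1"` string convention are assumptions for illustration; the real `URLLogger` class may be structured differently.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def log_deployed_url(
    url: str,
    tier: str,
    log_dir: str = "deployment_logs",
    today: Optional[date] = None,
) -> bool:
    """Append `url` to the daily tier log; return False if already logged.

    Hypothetical helper mirroring the file naming in this summary.
    """
    day = (today or date.today()).isoformat()  # YYYY-MM-DD
    suffix = "tier1" if tier == "tier1" else "other_tiers"
    log_file = Path(log_dir) / f"{day}_{suffix}_urls.txt"
    log_file.parent.mkdir(parents=True, exist_ok=True)

    # Duplicate prevention: read the existing URLs before appending.
    existing = set()
    if log_file.exists():
        existing = set(log_file.read_text(encoding="utf-8").splitlines())
    if url in existing:
        return False

    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(url + "\n")
    return True
```

Because the file is re-read on each call and no lock is taken, this approach is only safe when a single process writes the logs at a time.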
|
||||
|
||||
### 4. URL Generation (`src/generation/url_generator.py`)
|
||||
Extended with new functions:
|
||||
- `generate_public_url()`: Create full HTTPS URL from site + file path
|
||||
- `generate_file_path()`: Generate storage path for articles (slug-based)
|
||||
- `generate_page_file_path()`: Generate storage path for boilerplate pages
|
||||
|
||||
### 5. Deployment Service (`src/deployment/deployment_service.py`)
|
||||
- `DeploymentService` class orchestrates deployment workflow
|
||||
- `deploy_batch()`: Deploy all content for a project
|
||||
- Uploads articles with `formatted_html`
|
||||
- Uploads boilerplate pages (about, contact, privacy)
|
||||
- Logs article URLs to tier-segregated files
|
||||
- Updates database with deployment status and URLs
|
||||
- Returns detailed statistics
|
||||
- `deploy_article()`: Deploy single article
|
||||
- `deploy_boilerplate_page()`: Deploy single boilerplate page
|
||||
- Continues on error by default (configurable)
|
||||
|
||||
### 6. CLI Command (`src/cli/commands.py`)
|
||||
Added `deploy-batch` command:
|
||||
```bash
uv run python -m src.cli deploy-batch \
  --batch-id 123 \
  --admin-user admin \
  --admin-password mypass
```
|
||||
|
||||
Options:
|
||||
- `--batch-id` (required): Project/batch ID to deploy
|
||||
- `--admin-user` / `--admin-password`: Authentication
|
||||
- `--continue-on-error`: Continue if file fails (default: True)
|
||||
- `--dry-run`: Preview what would be deployed
|
||||
|
||||
### 7. Automatic Deployment Integration (`src/generation/batch_processor.py`)
|
||||
- Added `auto_deploy` parameter to `process_job()` (default: True)
|
||||
- Deployment triggers automatically after all tiers complete
|
||||
- Uses same `DeploymentService` as manual CLI command
|
||||
- Graceful error handling (logs warning, continues batch processing)
|
||||
- Can be disabled via `auto_deploy=False` for testing
|
||||
|
||||
### 8. Configuration (`src/core/config.py`)
|
||||
- Added `get_bunny_storage_api_key()` validation function
|
||||
- Checks for `BUNNY_API_KEY` in `.env` file
|
||||
- Clear error messages if keys are missing
|
||||
|
||||
### 9. Testing (`tests/integration/test_deployment.py`)
|
||||
Comprehensive integration tests covering:
|
||||
- URL generation and slug creation
|
||||
- Tier-segregated URL logging with duplicate prevention
|
||||
- Bunny.net storage client uploads
|
||||
- Deployment service (articles, pages, batches)
|
||||
- All 13 tests passing
|
||||
|
||||
## File Structure
|
||||
|
||||
```
deployment_logs/
  YYYY-MM-DD_tier1_urls.txt          # Tier 1 article URLs
  YYYY-MM-DD_other_tiers_urls.txt    # Tier 2+ article URLs

src/deployment/
  bunny_storage.py                   # Storage upload client
  deployment_service.py              # Main deployment orchestration
  url_logger.py                      # Tier-segregated URL logging

scripts/
  migrate_add_deployment_fields.py   # Database migration

tests/integration/
  test_deployment.py                 # Integration tests
```
|
||||
|
||||
## Database Schema
|
||||
|
||||
```sql
ALTER TABLE generated_content ADD COLUMN deployed_url TEXT NULL;
ALTER TABLE generated_content ADD COLUMN deployed_at TIMESTAMP NULL;
CREATE INDEX idx_generated_content_deployed ON generated_content(deployed_at);
```
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Manual Deployment
|
||||
```bash
|
||||
# Deploy a specific batch
|
||||
uv run python -m src.cli deploy-batch --batch-id 123 --admin-user admin --admin-password pass
|
||||
|
||||
# Dry run to preview
|
||||
uv run python -m src.cli deploy-batch --batch-id 123 --dry-run
|
||||
```
|
||||
|
||||
### Automatic Deployment
|
||||
```bash
# Generate batch with auto-deployment (default)
uv run python -m src.cli generate-batch --job-file jobs/my_job.json

# Generate without auto-deployment
# (not shown as a CLI option here; pass auto_deploy=False to process_job(),
#  or add a flag to the generate-batch command if needed)
```
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
1. **Authentication**: Uses per-zone `storage_zone_password` from database for uploads. No API key from `.env` needed for storage operations. The `BUNNY_ACCOUNT_API_KEY` is only for zone creation/management.
|
||||
2. **File Locking**: Skipped for simplicity - duplicate prevention via file reading is sufficient
|
||||
3. **Auto-deploy Default**: ON by default for convenience, can be disabled for testing
|
||||
4. **Continue on Error**: Enabled by default to ensure partial deployments complete
|
||||
5. **URL Logging**: Simple text files (one URL per line) for easy parsing by Story 4.2
|
||||
6. **Boilerplate Pages**: Deploy stored HTML from `site_pages.content` (from Story 3.4)
|
||||
|
||||
## Dependencies Met
|
||||
|
||||
- Story 3.1: Site assignment (articles have `site_deployment_id`)
|
||||
- Story 3.3: Content interlinking (HTML is finalized)
|
||||
- Story 3.4: Boilerplate pages (`SitePage` table exists)
|
||||
|
||||
## Environment Variables Required
|
||||
|
||||
```bash
|
||||
BUNNY_ACCOUNT_API_KEY=your_account_api_key_here # For zone creation (already existed)
|
||||
```
|
||||
|
||||
**Important**: File uploads do NOT use an API key from `.env`. They use the per-zone `storage_zone_password` stored in the database (in the `site_deployments` table). This password is set automatically when zones are created via `provision-site` or `sync-sites` commands.
|
||||
|
||||
## Testing Results
|
||||
|
||||
All 18 tests passing:
|
||||
- URL generation (4 tests)
|
||||
- URL logging (4 tests)
|
||||
- Storage client (2 tests)
|
||||
- Deployment service (3 tests)
|
||||
- Storage URL generation (5 tests - including DE region special case)
|
||||
|
||||
**Real-world validation:** Successfully uploaded test file to Bunny.net storage and verified HTTP 201 response.
|
||||
|
||||
## Known Limitations / Technical Debt
|
||||
|
||||
1. Only supports Bunny.net (multi-cloud deferred to future stories)
|
||||
2. No CDN cache purging after deployment (Story 4.x)
|
||||
3. No deployment verification/validation (Story 4.4)
|
||||
4. URL logging is file-based (no database tracking)
|
||||
5. Boilerplate pages stored as full HTML in DB (inefficient, works for now)
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Story 4.2: URL logging enhancements (partially implemented here)
|
||||
- Story 4.3: Database status updates (partially implemented here)
|
||||
- Story 4.4: Post-deployment verification
|
||||
- Future: Multi-cloud support, CDN cache purging, parallel uploads
|
||||
|
||||
## Notes
|
||||
|
||||
- Simple and reliable implementation prioritized over complex features
|
||||
- Auto-deployment is the default happy path
|
||||
- Manual CLI command available for re-deployment or troubleshooting
|
||||
- Comprehensive error reporting for debugging
|
||||
- All API keys managed via `.env` only (not `master.config.json`)
|
||||
|
||||
|
|
@ -1,172 +0,0 @@
|
|||
# Story 4.1: Deploy Content to Cloud - Quick Start Guide
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. Ensure `BUNNY_ACCOUNT_API_KEY` is in your `.env` file (for creating zones):
|
||||
```bash
|
||||
BUNNY_ACCOUNT_API_KEY=your_account_api_key_here
|
||||
```
|
||||
|
||||
**Note**: File uploads use per-zone `storage_zone_password` from the database, NOT an API key from `.env`. These passwords are set automatically when sites are created via `provision-site` or `sync-sites` commands.
|
||||
|
||||
2. Run the database migration:
|
||||
```bash
|
||||
uv run python scripts/migrate_add_deployment_fields.py
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Automatic Deployment (Recommended)
|
||||
|
||||
Content deploys automatically after batch generation completes:
|
||||
|
||||
```bash
uv run python -m src.cli generate-batch \
  --job-file jobs/my_job.json \
  --username admin \
  --password mypass
```
|
||||
|
||||
Output will show deployment progress after all tiers complete:
|
||||
```
|
||||
Deployment: Starting automatic deployment for project 123...
|
||||
Deployment: 48 articles, 6 pages deployed
|
||||
Deployment: Complete in 45.2s
|
||||
```
|
||||
|
||||
### Manual Deployment
|
||||
|
||||
Deploy (or re-deploy) a batch manually:
|
||||
|
||||
```bash
uv run python -m src.cli deploy-batch \
  --batch-id 123 \
  --admin-user admin \
  --admin-password mypass
```
|
||||
|
||||
### Dry Run Mode
|
||||
|
||||
Preview what would be deployed without actually uploading:
|
||||
|
||||
```bash
uv run python -m src.cli deploy-batch \
  --batch-id 123 \
  --dry-run
```
|
||||
|
||||
## What Gets Deployed
|
||||
|
||||
1. **Articles**: All generated articles with `formatted_html`
|
||||
- Uploaded to: `{slug}.html` (e.g., `how-to-fix-engines.html`)
|
||||
- URL logged to: `deployment_logs/YYYY-MM-DD_tier1_urls.txt` (Tier 1)
|
||||
- URL logged to: `deployment_logs/YYYY-MM-DD_other_tiers_urls.txt` (Tier 2+)
|
||||
|
||||
2. **Boilerplate Pages**: About, contact, privacy (if they exist)
|
||||
- Uploaded to: `about.html`, `contact.html`, `privacy.html`
|
||||
- NOT logged to URL files
|
||||
|
||||
## URL Logging
|
||||
|
||||
Deployed article URLs are automatically logged to tier-segregated files:
|
||||
|
||||
```
|
||||
deployment_logs/
|
||||
2025-10-22_tier1_urls.txt
|
||||
2025-10-22_other_tiers_urls.txt
|
||||
```
|
||||
|
||||
Each file contains one URL per line:
|
||||
```
|
||||
https://example.com/article-1.html
|
||||
https://example.com/article-2.html
|
||||
https://example.com/article-3.html
|
||||
```
|
||||
|
||||
Duplicate URLs are automatically prevented (safe to re-run deployments).
|
||||
|
||||
## Database Updates
|
||||
|
||||
After successful deployment, each article is updated with:
|
||||
- `deployed_url`: Public URL where content is live
|
||||
- `deployed_at`: Timestamp of deployment
|
||||
- `status`: Changed to 'deployed'
|
||||
|
||||
Query deployed content:
|
||||
```python
from src.database.session import db_manager
from src.database.repositories import GeneratedContentRepository

session = db_manager.get_session()
repo = GeneratedContentRepository(session)

deployed = repo.get_deployed_content(project_id=123)
for article in deployed:
    print(f"{article.title}: {article.deployed_url}")
```
|
||||
|
||||
## Deployment Summary
|
||||
|
||||
After deployment completes, you'll see a summary:
|
||||
|
||||
```
|
||||
======================================================================
|
||||
Deployment Summary
|
||||
======================================================================
|
||||
Articles deployed: 48
|
||||
Articles failed: 2
|
||||
Pages deployed: 6
|
||||
Pages failed: 0
|
||||
Total time: 45.2s
|
||||
|
||||
Errors:
|
||||
Article 15 (Engine Maintenance Tips): Connection timeout
|
||||
Article 32 (Common Problems): Invalid HTML content
|
||||
======================================================================
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
By default, deployment continues even if individual files fail. This ensures partial deployments complete successfully.
|
||||
|
||||
Failed files are:
|
||||
- Logged to console with error details
|
||||
- Listed in deployment summary
|
||||
- NOT marked as deployed in database
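
The continue-on-error behaviour described above can be pictured with the small loop below. It is a sketch under assumptions: `deploy_all` and the `upload` callable are hypothetical names, and the real `DeploymentService.deploy_batch()` also updates the database and logs URLs, which is omitted here.

```python
from typing import Callable, Iterable, List, Tuple


def deploy_all(
    items: Iterable[str],
    upload: Callable[[str], None],
    continue_on_error: bool = True,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Upload each item; collect failures instead of aborting by default.

    Hypothetical helper; returns (deployed_items, [(item, error_message)]).
    """
    deployed: List[str] = []
    errors: List[Tuple[str, str]] = []
    for item in items:
        try:
            upload(item)
        except Exception as exc:
            # Failed items are recorded and never counted as deployed.
            errors.append((item, str(exc)))
            if not continue_on_error:
                break  # stop on first error
        else:
            deployed.append(item)
    return deployed, errors
```

Only items in the first list would be marked as deployed; the second list is what a summary like the one above could print under "Errors".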
|
||||
|
||||
To stop on first error:
|
||||
```bash
uv run python -m src.cli deploy-batch \
  --batch-id 123 \
  --continue-on-error false
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Authentication failed for zone"
|
||||
Check that the `storage_zone_password` in your database is correct. This is set when sites are created via `provision-site` or `sync-sites` commands.
|
||||
|
||||
### "Article has no formatted_html to deploy"
|
||||
Ensure articles have templates applied. This happens automatically during batch processing in `_post_process_tier()`.
|
||||
|
||||
### "Site not found"
|
||||
Ensure articles are assigned to sites. This happens automatically during batch processing via site assignment logic.
|
||||
|
||||
## Manual Re-deployment
|
||||
|
||||
To re-deploy content after fixes:
|
||||
|
||||
1. Fix the issue (update HTML, fix credentials, etc.)
|
||||
2. Run manual deployment:
|
||||
```bash
|
||||
uv run python -m src.cli deploy-batch --batch-id 123
|
||||
```
|
||||
3. Duplicate URLs are automatically prevented in log files
|
||||
|
||||
## Integration with Other Stories
|
||||
|
||||
- **Story 3.1**: Articles must be assigned to sites before deployment
|
||||
- **Story 3.4**: Boilerplate pages are deployed if they exist in `site_pages` table
|
||||
- **Story 4.2**: URL log files are consumed by post-deployment processes
|
||||
- **Story 4.3**: Database status updates enable deployment tracking
|
||||
|
||||
|
|
@ -1,65 +0,0 @@
|
|||
# Story 4.1: Real Upload Validation
|
||||
|
||||
## Summary
|
||||
Successfully validated real file uploads to Bunny.net storage on October 22, 2025.
|
||||
|
||||
## Test Details
|
||||
|
||||
**Storage Zone:** 5axislaser925
|
||||
**Region:** DE (Frankfurt)
|
||||
**File:** story-4.1-test.html
|
||||
**Result:** HTTP 201 (Success)
|
||||
**Public URL:** https://5axislaser925.b-cdn.net/story-4.1-test.html
|
||||
|
||||
## Key Learnings
|
||||
|
||||
### 1. Region-Specific URLs
|
||||
Frankfurt (DE) is the default region and uses a different URL pattern:
|
||||
- **DE:** `https://storage.bunnycdn.com/{zone}/{file}`
|
||||
- **Other regions:** `https://{region}.storage.bunnycdn.com/{zone}/{file}`
|
||||
|
||||
This was implemented in `BunnyStorageClient._get_storage_url()`.
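
A minimal sketch of the region rule above, assuming a standalone function rather than the actual `_get_storage_url()` method (the name `build_storage_url` and its parameters are illustrative only):

```python
def build_storage_url(zone_name: str, file_path: str, region: str = "DE") -> str:
    """Build the storage upload URL for a zone, file and region code.

    Hypothetical helper: DE (Frankfurt) uses the bare host, every other
    region is used as a lowercase subdomain prefix.
    """
    region_code = region.strip().lower()
    if region_code in ("", "de"):
        host = "storage.bunnycdn.com"
    else:
        host = f"{region_code}.storage.bunnycdn.com"
    return f"https://{host}/{zone_name}/{file_path.lstrip('/')}"
```

Lowercasing the region before comparing means `"DE"` and `"de"` resolve to the same host.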
|
||||
|
||||
### 2. Content-Type Requirements
|
||||
Per Bunny.net API documentation:
|
||||
- **Required:** `application/octet-stream`
|
||||
- **NOT** `text/html` or other MIME types
|
||||
- File content must be raw binary (we use `.encode('utf-8')`)
|
||||
|
||||
### 3. Success Response Code
|
||||
- Bunny.net returns **HTTP 201** for successful uploads (not 200)
|
||||
- This is documented in their API reference
|
||||
|
||||
### 4. Authentication
|
||||
- Uses per-zone `storage_zone_password` via `AccessKey` header
|
||||
- Password is stored in database (`site_deployments.storage_zone_password`)
|
||||
- Set automatically when zones are created via `provision-site` or `sync-sites`
|
||||
- NO API key from `.env` needed for uploads
|
||||
|
||||
## Implementation Changes Made
|
||||
|
||||
1. **Fixed region URL logic** - DE uses no prefix
|
||||
2. **Changed default Content-Type** - Now uses `application/octet-stream`
|
||||
3. **Updated success detection** - Looks for HTTP 201
|
||||
4. **Added region parameter** - All upload methods now require `zone_region`
|
||||
|
||||
## Test Coverage
|
||||
|
||||
**Unit Tests (5):**
|
||||
- DE region URL generation (uppercase and lowercase region codes)
|
||||
- LA, NY, SG region URL generation
|
||||
|
||||
**Integration Tests (13):**
|
||||
- Full upload workflow mocking
|
||||
- Deployment service orchestration
|
||||
- URL generation and logging
|
||||
- Error handling
|
||||
|
||||
**Real-World Test:**
|
||||
- Actual upload to Bunny.net storage
|
||||
- File accessible via CDN URL
|
||||
- HTTP 201 response confirmed
|
||||
|
||||
## Status
|
||||
✅ **VALIDATED** - Ready for production use
|
||||
|
||||
|
|
@ -1,105 +0,0 @@
|
|||
# Story 4.4: Post-Deployment Verification - Implementation Summary
|
||||
|
||||
## Status: COMPLETE
|
||||
|
||||
Story points: 5
|
||||
|
||||
## Overview
|
||||
Implemented a simple CLI command to verify deployed URLs return 200 OK status.
|
||||
|
||||
## Implementation
|
||||
|
||||
### New CLI Command
|
||||
Added `verify-deployment` command to `src/cli/commands.py`:
|
||||
|
||||
```bash
|
||||
uv run python main.py verify-deployment --batch-id <id> [--sample N] [--timeout 10]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
- `--batch-id, -b`: Project/batch ID to verify (required)
|
||||
- `--sample, -s`: Number of random URLs to check (optional, default: check all)
|
||||
- `--timeout, -t`: Request timeout in seconds (default: 10)
|
||||
|
||||
### Core Functionality
|
||||
1. Queries database for deployed articles in specified batch
|
||||
2. Filters articles with `deployed_url` and status='deployed'
|
||||
3. Makes HTTP GET requests to verify 200 OK status
|
||||
4. Supports checking all URLs or random sample
|
||||
5. Clear output showing success/failure for each URL
|
||||
6. Summary report with total checked, successful, and failed counts
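
The steps above can be sketched as a small function. This is not the `verify_deployment()` command itself: the name `verify_urls`, the injected `get_status` callable, and the returned dictionary shape are assumptions chosen so the example runs without network access.

```python
import random
from typing import Callable, Dict, Iterable, Optional


def verify_urls(
    urls: Iterable[str],
    get_status: Callable[[str], int],
    sample: Optional[int] = None,
) -> Dict[str, object]:
    """Check each URL (or a random sample) and summarise the results.

    Hypothetical helper; `get_status` returns the HTTP status for a URL.
    """
    to_check = list(urls)
    if sample is not None and sample < len(to_check):
        to_check = random.sample(to_check, sample)

    failed: Dict[str, str] = {}
    for url in to_check:
        try:
            status = get_status(url)
        except Exception as exc:  # e.g. timeout or DNS failure
            failed[url] = str(exc)
            continue
        if status != 200:
            failed[url] = f"HTTP {status}"

    return {
        "checked": len(to_check),
        "successful": len(to_check) - len(failed),
        "failed": failed,
    }
```

With the `requests` library installed, `get_status` could be something like `lambda url: requests.get(url, timeout=10).status_code`.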
|
||||
|
||||
### Code Changes
|
||||
- **Modified:** `src/cli/commands.py`
|
||||
- Added imports: `requests`, `random`
|
||||
- Added `verify_deployment()` command function
|
||||
|
||||
### Acceptance Criteria Verification
|
||||
- ✅ CLI command available: `verify-deployment --batch-id <id>`
|
||||
- ✅ Takes batch ID as input
|
||||
- ✅ Retrieves URLs for all articles in batch from database
|
||||
- ✅ Makes HTTP GET requests to sample or all URLs
|
||||
- ✅ Reports which URLs return 200 OK and which do not
|
||||
- ✅ Clear, easy-to-read output
|
||||
- ✅ Can be run manually after deployment
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Verify all URLs in a batch:
|
||||
```bash
|
||||
uv run python main.py verify-deployment --batch-id 10
|
||||
```
|
||||
|
||||
### Verify random sample of 5 URLs:
|
||||
```bash
|
||||
uv run python main.py verify-deployment --batch-id 10 --sample 5
|
||||
```
|
||||
|
||||
### Custom timeout:
|
||||
```bash
|
||||
uv run python main.py verify-deployment --batch-id 10 --timeout 30
|
||||
```
|
||||
|
||||
## Sample Output
|
||||
```
|
||||
Verifying deployment for batch: My Project (ID: 10)
|
||||
Keyword: main keyword
|
||||
|
||||
Found 25 deployed articles
|
||||
Checking all 25 URLs
|
||||
|
||||
✓ https://example.com/article-1.html
|
||||
✓ https://example.com/article-2.html
|
||||
✗ https://example.com/article-3.html (HTTP 404)
|
||||
...
|
||||
|
||||
======================================================================
|
||||
Verification Summary
|
||||
======================================================================
|
||||
Total checked: 25
|
||||
Successful: 24
|
||||
Failed: 1
|
||||
|
||||
Failed URLs:
|
||||
https://example.com/article-3.html
|
||||
Title: Article Three Title
|
||||
Error: 404
|
||||
|
||||
======================================================================
|
||||
```
|
||||
|
||||
## Dependencies
|
||||
- `requests==2.32.5` (already in requirements.txt)
|
||||
|
||||
## Testing
|
||||
- Command help output verified
|
||||
- Follows existing CLI command patterns
|
||||
- Simple, focused implementation per user requirements
|
||||
|
||||
## Notes
|
||||
- No authentication required (read-only operation)
|
||||
- Uses existing database repositories
|
||||
- Minimal dependencies
|
||||
- No out-of-scope features added; strictly adheres to acceptance criteria
|
||||
- Can be integrated into auto-deploy workflow in future if needed
|
||||
|
||||
|
|
@ -1,55 +0,0 @@
|
|||
# Story 4.4: Post-Deployment Verification - Quick Start
|
||||
|
||||
## Command Usage
|
||||
|
||||
Verify that deployed URLs are live and returning 200 OK status:
|
||||
|
||||
```bash
|
||||
uv run python main.py verify-deployment --batch-id <id>
|
||||
```
|
||||
|
||||
## Options
|
||||
|
||||
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--batch-id` | `-b` | Project/batch ID to verify | Required |
| `--sample` | `-s` | Number of random URLs to check | All |
| `--timeout` | `-t` | Request timeout in seconds | 10 |
|
||||
|
||||
## Examples
|
||||
|
||||
### Check all URLs in a batch
|
||||
```bash
|
||||
uv run python main.py verify-deployment --batch-id 10
|
||||
```
|
||||
|
||||
### Check random sample of 5 URLs
|
||||
```bash
|
||||
uv run python main.py verify-deployment --batch-id 10 --sample 5
|
||||
```
|
||||
|
||||
### Use custom timeout
|
||||
```bash
|
||||
uv run python main.py verify-deployment --batch-id 10 --timeout 30
|
||||
```
|
||||
|
||||
## Output
|
||||
|
||||
The command will:
|
||||
1. Show batch information
|
||||
2. Display each URL check result with ✓ (success) or ✗ (failed)
|
||||
3. Provide a summary of results
|
||||
4. List failed URLs with error details
|
||||
|
||||
## Exit Codes
|
||||
|
||||
- `0`: All URLs returned 200 OK
|
||||
- `1`: One or more URLs failed or error occurred
|
||||
|
||||
## Notes
|
||||
|
||||
- Only checks articles with `deployed_url` set and status='deployed'
|
||||
- Follows redirects automatically
|
||||
- No authentication required (read-only operation)
|
||||
- Can be run multiple times safely
|
||||
|
||||
|
|
@ -1,172 +0,0 @@
# Story 4.5: Create URL and Link Reporting Script - Implementation Summary

**Status:** ✅ COMPLETE
**Story Points:** 3
**Date Completed:** October 22, 2025

## Overview
Implemented a CLI command to export article URLs with optional link details (anchor text and destination URLs) based on project and tier filters. Additionally enhanced the data model to store anchor text directly in the database for better performance and data integrity.

## Implementation

### Core Features Implemented

1. **CLI Command: `get-links`**
   - Location: `src/cli/commands.py`
   - Exports article URLs in CSV format
   - Required arguments:
     - `--project-id` / `-p`: Project ID to filter
     - `--tier` / `-t`: Tier filter (supports "1", "2", or "2+" for ranges)
   - Optional flags:
     - `--with-anchor-text`: Include anchor text used for tiered links
     - `--with-destination-url`: Include destination URL that the article links to
   - Output: CSV to stdout (can be redirected to file)

2. **Database Enhancement: anchor_text Field**
   - Added `anchor_text` column to `article_links` table
   - Migration script: `scripts/migrate_add_anchor_text.py`
   - Updated `ArticleLink` model with new field
   - Updated `ArticleLinkRepository.create()` to accept anchor_text parameter

3. **Content Injection Updates**
   - Modified `src/interlinking/content_injection.py` to capture and store actual anchor text used
   - Updated `_try_inject_link()` to return the anchor text that was successfully injected
   - All link creation calls now include anchor_text:
     - Tiered links (money site and lower tier)
     - Homepage links
     - See Also section links

## Files Modified

### Database Layer
- `src/database/models.py` - Added `anchor_text` field to ArticleLink model
- `src/database/repositories.py` - Updated ArticleLinkRepository.create()
- `scripts/migrate_add_anchor_text.py` - New migration script

### Business Logic
- `src/interlinking/content_injection.py`:
  - Modified `_try_inject_link()` signature to return anchor text
  - Updated `_inject_tiered_links()` to capture anchor text
  - Updated `_inject_homepage_link()` to capture anchor text
  - Updated `_inject_see_also_section()` to store article titles as anchor text

### CLI
- `src/cli/commands.py`:
  - Added `get-links` command
  - Simplified implementation (no HTML parsing needed)
  - Direct database read for anchor text

### Tests
- `tests/integration/test_get_links_command.py` - New comprehensive test suite (9 tests)

### Documentation
- `docs/prd/epic-4-deployment.md` - Updated Story 4.5 status to COMPLETE
- `docs/stories/story-3.2-find-tiered-links.md` - Updated ArticleLink schema to include anchor_text field
- `docs/architecture/data-models.md` - Added ArticleLink model documentation with anchor_text field
- `STORY_3.2_IMPLEMENTATION_SUMMARY.md` - Updated schema to include anchor_text field

## Usage Examples

### Basic usage - get all tier 1 URLs
```bash
python main.py get-links --project-id 1 --tier 1
```

### Get tier 2 and above with anchor text and destinations
```bash
python main.py get-links --project-id 1 --tier 2+ --with-anchor-text --with-destination-url
```

### Export to file
```bash
python main.py get-links --project-id 1 --tier 1 --with-anchor-text > tier1_links.csv
```

## CSV Output Format

**Basic (no flags):**
```csv
article_url,tier,title
https://example.com/article1.html,tier1,Article Title 1
```

**With anchor text:**
```csv
article_url,tier,title,anchor_text
https://example.com/article1.html,tier1,Article Title 1,expert services
```

**With destination URL:**
```csv
article_url,tier,title,destination_url
https://example.com/article1.html,tier1,Article Title 1,https://www.moneysite.com
```

**With both flags:**
```csv
article_url,tier,title,anchor_text,destination_url
https://example.com/article1.html,tier1,Article Title 1,expert services,https://www.moneysite.com
```
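
The four column layouts above differ only in which optional columns are appended. As a rough sketch of that rule (the function name `links_to_csv` and the dict-shaped rows are assumptions for illustration; the actual `get-links` code is not shown in this document), the standard `csv` module can produce the same output:

```python
import csv
import io
from typing import Dict, Iterable


def links_to_csv(
    rows: Iterable[Dict[str, str]],
    with_anchor_text: bool = False,
    with_destination_url: bool = False,
) -> str:
    """Render link rows as CSV, adding optional columns only when requested."""
    columns = ["article_url", "tier", "title"]
    if with_anchor_text:
        columns.append("anchor_text")
    if with_destination_url:
        columns.append("destination_url")

    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that were not requested as columns
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

Using `csv.DictWriter` rather than manual string joins means titles or anchor text containing commas are quoted correctly.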

## Testing

**Test Coverage:** 9 integration tests, all passing

**Test Cases:**
1. Basic tier 1 export (no optional flags)
2. Tier range filter (2+)
3. Export with anchor text
4. Export with destination URL
5. Export with both flags
6. Tier 2 resolves to_content_id to deployed URL
7. Error handling - invalid project
8. Error handling - invalid tier format
9. Error handling - no deployed articles

## Database Enhancement Benefits

The addition of the `anchor_text` field to the `article_links` table provides:

1. **Performance**: No HTML parsing required - direct database read
2. **Data Integrity**: Know exactly what anchor text was used for each link
3. **Auditability**: Track link relationships and their anchor text
4. **Simplicity**: Cleaner code without BeautifulSoup HTML parsing in CLI

## Migration

To apply the database changes to existing databases:
```bash
python scripts/migrate_add_anchor_text.py
```

To rollback:
```bash
python scripts/migrate_add_anchor_text.py rollback
```

**Note:** Existing links will have NULL anchor_text. Re-run content injection to populate this field for existing content.
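
To illustrate why pre-existing rows end up with NULL `anchor_text`, here is a minimal sketch of an idempotent add-column step. It assumes a SQLite database and a `TEXT` column type; neither is confirmed by this summary, and `add_anchor_text_column` is a hypothetical helper, not the contents of `scripts/migrate_add_anchor_text.py`.

```python
import sqlite3


def add_anchor_text_column(conn: sqlite3.Connection) -> bool:
    """Add article_links.anchor_text if it is missing.

    Returns True when the column was added, False when it already existed.
    Assumes SQLite; other databases need a different existence check.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name
    existing = {row[1] for row in conn.execute("PRAGMA table_info(article_links)")}
    if "anchor_text" in existing:
        return False
    # No default is given, so rows created before the migration read back as NULL
    conn.execute("ALTER TABLE article_links ADD COLUMN anchor_text TEXT")
    conn.commit()
    return True
```

Checking for the column first makes the step safe to run more than once.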

## Acceptance Criteria - Verification

- ✅ A new CLI command `get-links` is created
- ✅ The script accepts a mandatory `project_id`
- ✅ The script accepts a `tier` specifier supporting single tier and ranges (e.g., "2+")
- ✅ Optional flag `--with-anchor-text` includes the anchor text
- ✅ Optional flag `--with-destination-url` includes the destination URL
- ✅ The script queries the database to retrieve link information
- ✅ The output is well-formatted CSV printed to stdout
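
The `tier` specifier accepts either a single tier ("1", "2") or an open-ended range ("2+"). One possible way to interpret such a specifier is sketched below; `parse_tier_spec` is a hypothetical name, and representing tiers as integers is an assumption (the CSV output shows tiers as strings such as `tier1`).

```python
import re
from typing import Callable


def parse_tier_spec(spec: str) -> Callable[[int], bool]:
    """Turn a tier specifier such as "1" or "2+" into a predicate on tier numbers."""
    match = re.fullmatch(r"(\d+)(\+?)", spec.strip())
    if match is None:
        raise ValueError(f"Invalid tier format: {spec!r}")
    base = int(match.group(1))
    if base < 1:
        raise ValueError(f"Tier must be 1 or greater: {spec!r}")
    if match.group(2):
        # "N+" selects tier N and every tier above it
        return lambda tier: tier >= base
    # plain "N" selects exactly tier N
    return lambda tier: tier == base
```

For example, `parse_tier_spec("2+")` returns a predicate that accepts tiers 2, 3 and above but rejects tier 1, while an input such as `"two"` raises `ValueError`.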

## Known Limitations

- Only reports tiered links (excludes homepage and see also links)
- Existing article_links records created before migration will have NULL anchor_text
- CSV output goes to stdout only (user must redirect to file)

## Future Enhancements

Potential improvements for future stories:
- Add `--link-type` flag to filter by link type (tiered, homepage, wheel_see_also)
- Add `--output` flag to write directly to file
- Add JSON output format option
- Add summary statistics (total links, link types breakdown)

@ -1,185 +0,0 @@
# Template Tracking Fix - October 21, 2025

## Problem Identified

Story 2.4 was incorrectly implemented to store template mappings in `master.config.json` instead of the database. This meant:
- Templates were tracked per hostname in a config file
- No database field to store template at site level
- Story 3.4 (boilerplate pages) couldn't easily determine which template to use
- Inconsistent tracking between config file and database

## Root Cause

Story 2.4 specification said to use `master.config.json` for template mappings, but this was wrong. Templates should be tracked at the **site/domain level in the database**, not in a config file.

## What Was Fixed

### 1. Database Model Updated
**File**: `src/database/models.py`

Added `template_name` field to `SiteDeployment` model:
```python
class SiteDeployment(Base):
    # ... existing fields ...
    template_name: Mapped[str] = mapped_column(String(50), default="basic", nullable=False)
```

### 2. Migration Script Created
**File**: `scripts/migrate_add_template_to_sites.py`

New migration script adds `template_name` column to `site_deployments` table:
```sql
ALTER TABLE site_deployments
ADD COLUMN template_name VARCHAR(50) DEFAULT 'basic' NOT NULL
```

### 3. Template Service Fixed
**File**: `src/templating/service.py`

**Before** (wrong):
```python
def select_template_for_content(...):
    # Query config file for hostname mapping
    if hostname in config.templates.mappings:
        return config.templates.mappings[hostname]

    # Pick random and save to config
    template_name = self._select_random_template()
    self._persist_template_mapping(hostname, template_name)
    return template_name
```

**After** (correct):
```python
def select_template_for_content(...):
    # Query database for site template
    if site_deployment_id and site_deployment_repo:
        site_deployment = site_deployment_repo.get_by_id(site_deployment_id)
        if site_deployment:
            return site_deployment.template_name or "basic"

    return self._select_random_template()
```

**Removed**:
- `_persist_template_mapping()` method (no longer needed)

### 4. Config File Simplified
**File**: `master.config.json`

**Before**:
```json
"templates": {
  "default": "basic",
  "mappings": {
    "aws-s3-bucket-1": "modern",
    "bunny-bucket-1": "classic",
    "azure-bucket-1": "minimal",
    "test.example.com": "minimal"
  }
}
```

**After**:
```json
"templates": {
  "default": "basic"
}
```

Only keep `default` for fallback behavior. All template tracking now in database.

### 5. Story 2.4 Spec Updated
**File**: `docs/stories/story-2.4-html-formatting-templates.md`

- Updated Task 3 to reflect database tracking
- Updated Task 5 to include `template_name` field on `SiteDeployment`
- Updated Technical Decisions section

### 6. Story 3.4 Updated
**File**: `docs/stories/story-3.4-boilerplate-site-pages.md`

- Boilerplate pages now read `site.template_name` from database
- No template service changes needed
- Effort reduced from 15 to 14 story points

## How It Works Now

### Site Creation
```python
# When creating/provisioning a site
site = SiteDeployment(
    site_name="example-site",
    template_name="modern",  # or "basic", "classic", "minimal"
    # ... other fields
)
```

### Article Generation
```python
# When generating article
site = site_repo.get_by_id(article.site_deployment_id)
template = site.template_name  # Read from database
formatted_html = template_service.format_content(content, title, meta, template)
```

### Boilerplate Pages
```python
# When generating boilerplate pages
site = site_repo.get_by_id(site_id)
template = site.template_name  # Same template as articles
about_html = generate_page("about", template=template)
```

## Benefits

1. **Single source of truth**: Template tracked in database only
2. **Consistent sites**: All content on a site uses same template
3. **Simpler logic**: No config file manipulation needed
4. **Better data model**: Template is a property of the site, not a mapping
5. **Easier to query**: Can find all sites using a specific template

## Migration Path

For existing deployments:
1. Run migration script: `uv run python scripts/migrate_add_template_to_sites.py`
2. All existing sites default to `template_name="basic"`
3. Update specific sites if needed:
   ```sql
   UPDATE site_deployments SET template_name='modern' WHERE id=5;
   ```
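
The migration path and the "easier to query" benefit can be tried end to end with a small in-memory sketch. It assumes SQLite and a cut-down `site_deployments` table containing only `id` and `site_name`; the helper names `build_demo_db` and `sites_using_template` are hypothetical and not part of the project.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a toy site_deployments table and apply the migration steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE site_deployments (id INTEGER PRIMARY KEY, site_name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO site_deployments (id, site_name) VALUES (?, ?)",
        [(4, "site-a"), (5, "site-b")],
    )
    # Steps 1 and 2: add the column; existing rows pick up the 'basic' default
    conn.execute(
        "ALTER TABLE site_deployments "
        "ADD COLUMN template_name VARCHAR(50) DEFAULT 'basic' NOT NULL"
    )
    # Step 3: override the template for one specific site
    conn.execute("UPDATE site_deployments SET template_name='modern' WHERE id=5")
    conn.commit()
    return conn


def sites_using_template(conn: sqlite3.Connection, template_name: str) -> List[Tuple[int, str]]:
    """Return (id, site_name) for every site that uses the given template."""
    cursor = conn.execute(
        "SELECT id, site_name FROM site_deployments "
        "WHERE template_name = ? ORDER BY id",
        (template_name,),
    )
    return cursor.fetchall()
```

With the template stored as a column, finding every site on a given template is a single filtered query instead of a scan over config-file mappings.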

## Testing

No tests broken by this change:
- Template service tests still pass (reads from database instead of config)
- Article generation tests still pass
- Template selection logic unchanged from user perspective

## Files Changed

### Created
- `scripts/migrate_add_template_to_sites.py`
- `TEMPLATE_TRACKING_FIX.md` (this file)

### Modified
- `src/database/models.py` - Added `template_name` field
- `src/templating/service.py` - Removed config lookups, read from DB
- `master.config.json` - Removed `mappings` section
- `docs/stories/story-2.4-html-formatting-templates.md` - Updated spec
- `docs/stories/story-3.4-boilerplate-site-pages.md` - Updated to use DB field
- `STORY_3.4_CREATED.md` - Updated effort estimate

## Next Steps

1. Run migration: `uv run python scripts/migrate_add_template_to_sites.py`
2. Verify existing articles still render correctly
3. Implement Story 3.4 using the database field
4. Future site creation/provisioning should set `template_name`

---

**Fixed by**: AI Code Assistant
**Fixed on**: October 21, 2025
**Issue identified by**: User during Story 3.4 discussion

24
brands.json
|
|
@ -1,5 +1,27 @@
|
|||
{
|
||||
"gullco.com": ["Gullco", "Gullco International"],
|
||||
"dehumidifiercorp.com": ["Dehumidifier Corp", "Dehumidifier Corporation of America", "DCA"],
|
||||
"hoggeprecision.com": ["Hogge Precision", "Hogge Precision Parts Co. Inc"]
|
||||
"hoggeprecision.com": ["Hogge Precision", "Hogge Precision Parts Co. Inc"],
|
||||
"cncplastics.com": [ "Advanced Industrial"],
|
||||
"fzemanufacturing.com": ["FZE Manufacturing", "FZM Manufacturing Solutions LLC", "FZE"],
|
||||
"msmmfg.com": ["MSM", "Machine Specialty and Manufacturing", "Machine Specialty and Manufacturing Inc"],
|
||||
"lelubricants.com": ["Lubrication Engineers", "LE"],
|
||||
"metalcraftspinning.com": ["Metal Craft Spinning", "Metal Craft Spinning and Stamping"],
|
||||
"agifabricators.com": ["AGI Fabricators", "AGI/Winkler"],
|
||||
"renown-electric.com": ["Renown Electric", "Renown Electric Motors and Repair Inc."],
|
||||
"nicoletplastics.com": ["Nicolet Plastics", "Nicolet Plastics LLC"],
|
||||
"evrproducts.com": ["EVR Products", "Elasto-Valve Rubber Products Inc"],
|
||||
"dietzelectric.com ": ["Dietz Electric"],
|
||||
"mcmusa.net": ["MCM", "MCM Composites", "MCM Composites LLC"],
|
||||
"axiomatic.com": ["Axiomatic Technologies Corporation", "Axiomatic"],
|
||||
"paragonsteel.com": ["Paragon Steel"],
|
||||
"chap2.com": ["Chapter 2 Incorporated", "Chapter 2 Inc"],
|
||||
"royalpurpleind.com": ["Royal Purple Industrial"],
|
||||
"mccormickind.com": ["McCormick Industries","McCormick Industries Inc"],
|
||||
"mod-tronic.com": ["Mod-Tronic", "Mod-Tronic Instruments Limited"],
|
||||
"rpmrubberparts.com": ["RPM IndustrialRubber Parts", "RPM Mechanical Inc"],
|
||||
"elismanufacturing.com": ["ELIS Manufacturing & Packaging Solutions", "ELIS Manufacturing & Packaging Solutions Inc"],
|
||||
"greenbayplastics.com": ["Green Bay Plastics"],
|
||||
"ksentry.com": ["Krueger Sentry Gauge"]
|
||||
|
||||
}
|
||||
|
|
|
|||
|
|
@ -2,46 +2,6 @@
|
|||
|
||||
Comprehensive documentation for all CLI commands.
|
||||
|
||||
> **Note:** This documentation is auto-generated from the Click command definitions. To regenerate after adding or modifying commands, run:
|
||||
> ```bash
|
||||
> uv run python scripts/generate_cli_docs.py
|
||||
> ```
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### System Commands
|
||||
- `config` - Show current configuration
|
||||
- `health` - Check system health
|
||||
- `models` - List available AI models
|
||||
|
||||
### User Management
|
||||
- `add-user` - Create a new user (requires admin)
|
||||
- `delete-user` - Delete a user (requires admin)
|
||||
- `list-users` - List all users (requires admin)
|
||||
|
||||
### Site Management
|
||||
- `provision-site` - Provision a new site with Storage Zone and Pull Zone
|
||||
- `attach-domain` - Attach a domain to an existing Storage Zone
|
||||
- `list-sites` - List all site deployments
|
||||
- `get-site` - Get detailed information about a site
|
||||
- `remove-site` - Remove a site deployment record
|
||||
- `sync-sites` - Sync existing bunny.net sites to database
|
||||
|
||||
### Project Management
|
||||
- `ingest-cora` - Ingest a CORA .xlsx report and create a new project
|
||||
- `ingest-simple` - Ingest a simple spreadsheet and create a new project
|
||||
- `list-projects` - List all projects for the authenticated user
|
||||
|
||||
### Content Generation
|
||||
- `generate-batch` - Generate content batch from job file
|
||||
|
||||
### Deployment
|
||||
- `deploy-batch` - Deploy all content in a batch to cloud storage
|
||||
- `verify-deployment` - Verify deployed URLs return 200 OK status
|
||||
|
||||
### Link Export
|
||||
- `get-links` - Export article URLs with optional link details
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [System](#system)
|
||||
|
|
@ -350,8 +310,14 @@ Ingest a CORA .xlsx report and create a new project
|
|||
- `--custom-anchors`, `-a`
|
||||
- Type: STRING | Comma-separated list of custom anchor text (optional)
|
||||
|
||||
- `--tier1-branded-ratio`
|
||||
- Type: FLOAT | Ratio of branded anchor text for tier1 (default: 0.75). When specified, prompts for branded anchor text (company name) and configures tier1 job with explicit anchor text terms achieving the specified ratio.
|
||||
- `--tier1-branded-ratio`, `-t`
|
||||
- Type: FLOAT | Ratio of branded anchor text for tier1 (optional, only prompts if provided)
|
||||
|
||||
- `--tier1-branded-plus-ratio`, `-bp`
|
||||
- Type: FLOAT | Ratio of branded+ anchor text for tier1 (optional, applied to remaining slots after branded)
|
||||
|
||||
- `--random-deployment-targets`, `-r`
|
||||
- Type: INT | Number of random deployment targets to select (default: random 2-3)
|
||||
|
||||
- `--username`, `-u`
|
||||
- Type: STRING | Username for authentication
|
||||
|
|
@ -365,14 +331,6 @@ Ingest a CORA .xlsx report and create a new project
|
|||
uv run python main.py ingest-cora --file path/to/file.xlsx --name "My Project"
|
||||
```
|
||||
|
||||
**Example with branded anchor text ratio:**
|
||||
|
||||
```bash
|
||||
uv run python main.py ingest-cora --file path/to/file.xlsx --name "My Project" --tier1-branded-ratio 0.75
|
||||
```
|
||||
|
||||
When using `--tier1-branded-ratio`, you will be prompted to enter the branded anchor text (company name). The generated job file will include tier1 anchor_text_config with explicit mode, where the specified percentage of terms are branded and the remainder are main keyword variations.
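
As a rough illustration of how an explicit term list could achieve a given branded ratio, consider the sketch below. It is an assumption-laden example, not the project's job-file generator: `build_tier1_anchor_terms`, its parameters, and the rounding rule are all hypothetical.

```python
from typing import List, Sequence


def build_tier1_anchor_terms(
    branded_text: str,
    keyword_variations: Sequence[str],
    total_terms: int,
    branded_ratio: float = 0.75,
) -> List[str]:
    """Build an explicit anchor-text term list with roughly `branded_ratio` branded terms."""
    if not 0.0 <= branded_ratio <= 1.0:
        raise ValueError("branded_ratio must be between 0 and 1")
    if total_terms < 0:
        raise ValueError("total_terms must not be negative")

    # Number of branded slots, rounded to the nearest whole term
    branded_count = round(total_terms * branded_ratio)
    remaining = total_terms - branded_count
    if remaining > 0 and not keyword_variations:
        raise ValueError("keyword_variations is required when the ratio is below 1")

    terms = [branded_text] * branded_count
    # Fill the remaining slots by cycling through the main keyword variations
    for index in range(remaining):
        terms.append(keyword_variations[index % len(keyword_variations)])
    return terms
```

With 8 terms and a ratio of 0.75, this yields 6 branded entries followed by 2 keyword variations.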
|
||||
|
||||
---
|
||||
|
||||
### `ingest-simple`
|
||||
|
|
@ -554,3 +512,13 @@ uv run python main.py get-links --project-id 1 --tier 1
|
|||
```
|
||||
|
||||
---
|
||||
|
||||
## Other Commands
|
||||
|
||||
### `create-job`
|
||||
|
||||
Create a job file from an existing project ID
|
||||
|
||||
### `discover-s3-buckets`
|
||||
|
||||
Discover and register AWS S3 buckets as site deployments
|
||||
|
|
|
|||
|
|
@ -227,6 +227,7 @@ Ingest a CORA .xlsx report and create a new project
|
|||
- `--money-site-url` / `-m` (optional) - Money site URL (e.g., https://example.com)
|
||||
- `--custom-anchors` / `-a` (optional) - Comma-separated list of custom anchor text
|
||||
- `--tier1-branded-ratio` (optional) - Ratio of branded anchor text for tier1 (default: 0.75). When specified, prompts for branded anchor text and configures tier1 job with explicit anchor text terms achieving the specified ratio.
|
||||
- `--random-deployment-targets` / `-r` (optional) - Number of random deployment targets to select from available sites (default: random 2-3)
|
||||
- `--username` / `-u` (optional) - Username for authentication
|
||||
- `--password` / `-p` (optional) - Password for authentication
|
||||
|
||||
|
|
|
|||
|
|
@ -88,8 +88,8 @@
|
|||
"max_links_per_article": 5,
|
||||
"tier_anchor_text_rules": {
|
||||
"tier1": {
|
||||
"source": "main_keyword",
|
||||
"description": "Tier 1 uses main keyword for anchor text"
|
||||
"source": "related_searches",
|
||||
"description": "Tier 1 uses related searches for anchor text"
|
||||
},
|
||||
"tier2": {
|
||||
"source": "related_searches",
|
||||
|
|
|
|||
|
|
@ -115,7 +115,7 @@ def create_job_file_for_project(
|
|||
return None
|
||||
|
||||
t1_count = tier1_count if tier1_count is not None else random.randint(10, 12)
|
||||
t2_count = random.randint(30, 45)
|
||||
t2_count = (tier1_count * 2) + random.randint(1, 12)
|
||||
if random_deployment_targets is not None:
|
||||
num_targets = min(random_deployment_targets, len(available_domains))
|
||||
else:
|
||||
|
|
|
|||
|
|
@ -274,12 +274,64 @@ class S3StorageClient:
|
|||
except BotoCoreError as e:
|
||||
raise S3StorageError(f"Failed to configure bucket: {str(e)}")
|
||||
|
||||
def _configure_static_website_hosting(
|
||||
self,
|
||||
bucket_name: str,
|
||||
region: str,
|
||||
endpoint_url: Optional[str] = None
|
||||
):
|
||||
"""
|
||||
Enable static website hosting on S3 bucket
|
||||
|
||||
This allows the bucket to serve index.html at the root URL.
|
||||
Note: This only works for standard AWS S3, not S3-compatible services.
|
||||
|
||||
Args:
|
||||
bucket_name: S3 bucket name
|
||||
region: AWS region
|
||||
endpoint_url: Custom endpoint URL (if set, skip website hosting config)
|
||||
|
||||
Raises:
|
||||
S3StorageError: If configuration fails
|
||||
"""
|
||||
# Skip website hosting for S3-compatible services (they may not support it)
|
||||
if endpoint_url:
|
||||
logger.debug(f"Skipping static website hosting for S3-compatible bucket {bucket_name}")
|
||||
return
|
||||
|
||||
try:
|
||||
s3_client = self._get_s3_client(region, endpoint_url)
|
||||
|
||||
# Enable static website hosting with index.html as index document
|
||||
try:
|
||||
s3_client.put_bucket_website(
|
||||
Bucket=bucket_name,
|
||||
WebsiteConfiguration={
|
||||
'IndexDocument': {'Suffix': 'index.html'},
|
||||
'ErrorDocument': {'Key': 'error.html'}
|
||||
}
|
||||
)
|
||||
logger.info(f"Enabled static website hosting for bucket {bucket_name}")
|
||||
except ClientError as e:
|
||||
error_code = e.response.get('Error', {}).get('Code', '')
|
||||
if error_code == 'NoSuchBucket':
|
||||
raise S3StorageError(f"Bucket {bucket_name} does not exist")
|
||||
elif error_code == 'NotImplemented':
|
||||
# Some S3-compatible services don't support website hosting
|
||||
logger.debug(f"Static website hosting not supported for bucket {bucket_name}: {e}")
|
||||
else:
|
||||
logger.warning(f"Could not enable static website hosting: {e}")
|
||||
|
||||
except BotoCoreError as e:
|
||||
logger.warning(f"Failed to configure static website hosting: {str(e)}")
|
||||
|
||||
def _generate_public_url(
|
||||
self,
|
||||
bucket_name: str,
|
||||
file_path: str,
|
||||
region: str,
|
||||
custom_domain: Optional[str] = None
|
||||
custom_domain: Optional[str] = None,
|
||||
endpoint_url: Optional[str] = None
|
||||
) -> str:
|
||||
"""
|
||||
Generate public URL for uploaded file
|
||||
|
|
@ -289,6 +341,7 @@ class S3StorageClient:
|
|||
file_path: File path within bucket
|
||||
region: AWS region
|
||||
custom_domain: Optional custom domain (manual setup required)
|
||||
endpoint_url: Custom endpoint URL (if set, use standard endpoint format)
|
||||
|
||||
Returns:
|
||||
Public URL string
|
||||
|
|
@ -296,8 +349,12 @@ class S3StorageClient:
|
|||
if custom_domain:
|
||||
return f"https://{custom_domain.rstrip('/')}/{file_path}"
|
||||
|
||||
# Virtual-hosted style URL (default for AWS S3)
|
||||
return f"https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}"
|
||||
# Use website endpoint format for standard AWS S3 (enables root URL access)
|
||||
# Use standard endpoint for S3-compatible services
|
||||
if endpoint_url:
|
||||
return f"https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}"
|
||||
else:
|
||||
return f"https://{bucket_name}.s3-website-{region}.amazonaws.com/{file_path}"
|
||||
|
||||
def upload_file(
|
||||
self,
|
||||
|
|
@ -331,6 +388,9 @@ class S3StorageClient:
|
|||
# This is idempotent and safe to call multiple times
|
||||
try:
|
||||
self._configure_bucket_public_read(bucket_name, region, endpoint_url)
|
||||
# Enable static website hosting if uploading index.html
|
||||
if file_path == 'index.html':
|
||||
self._configure_static_website_hosting(bucket_name, region, endpoint_url)
|
||||
except S3StorageError as e:
|
||||
logger.warning(f"Bucket configuration warning: {e}")
|
||||
|
||||
|
|
@ -377,7 +437,7 @@ class S3StorageClient:
|
|||
s3_client.put_object(**upload_kwargs)
|
||||
|
||||
public_url = self._generate_public_url(
|
||||
bucket_name, file_path, region, custom_domain
|
||||
bucket_name, file_path, region, custom_domain, endpoint_url
|
||||
)
|
||||
|
||||
logger.info(f"Uploaded {file_path} to s3://{bucket_name}/{file_path}")
|
||||
|
|
|
|||
|
|
@ -390,7 +390,11 @@ class BatchProcessor:
|
|||
if assigned_site.storage_provider in ('s3', 's3_compatible') and assigned_site.s3_custom_domain:
|
||||
hostname = assigned_site.s3_custom_domain
|
||||
elif assigned_site.storage_provider in ('s3', 's3_compatible') and assigned_site.s3_bucket_name and assigned_site.s3_bucket_region:
|
||||
hostname = f"{assigned_site.s3_bucket_name}.s3.{assigned_site.s3_bucket_region}.amazonaws.com"
|
||||
# Use website endpoint format for standard AWS S3 (enables root URL access)
|
||||
if assigned_site.storage_provider == 's3_compatible' or getattr(assigned_site, 's3_endpoint_url', None):
|
||||
hostname = f"{assigned_site.s3_bucket_name}.s3.{assigned_site.s3_bucket_region}.amazonaws.com"
|
||||
else:
|
||||
hostname = f"{assigned_site.s3_bucket_name}.s3-website-{assigned_site.s3_bucket_region}.amazonaws.com"
|
||||
else:
|
||||
hostname = assigned_site.custom_hostname or assigned_site.pull_zone_bcdn_hostname
|
||||
click.echo(f"{prefix} Assigned to site: {hostname} (ID: {site_deployment_id})")
|
||||
|
|
@ -893,7 +897,11 @@ class BatchProcessor:
|
|||
if assigned_site.storage_provider in ('s3', 's3_compatible') and assigned_site.s3_custom_domain:
|
||||
hostname = assigned_site.s3_custom_domain
|
||||
elif assigned_site.storage_provider in ('s3', 's3_compatible') and assigned_site.s3_bucket_name and assigned_site.s3_bucket_region:
|
||||
hostname = f"{assigned_site.s3_bucket_name}.s3-website-{assigned_site.s3_bucket_region}.amazonaws.com"
|
||||
# Use website endpoint format for standard AWS S3 (enables root URL access)
|
||||
if assigned_site.storage_provider == 's3_compatible' or getattr(assigned_site, 's3_endpoint_url', None):
|
||||
hostname = f"{assigned_site.s3_bucket_name}.s3.{assigned_site.s3_bucket_region}.amazonaws.com"
|
||||
else:
|
||||
hostname = f"{assigned_site.s3_bucket_name}.s3-website-{assigned_site.s3_bucket_region}.amazonaws.com"
|
||||
else:
|
||||
hostname = assigned_site.custom_hostname or assigned_site.pull_zone_bcdn_hostname
|
||||
click.echo(f"{prefix} Assigned to site: {hostname} (ID: {site_deployment_id})")
|
||||
|
|
|
|||
|
|
@ -2,6 +2,8 @@ import os
|
|||
import re
|
||||
import random
|
||||
import requests
|
||||
import json
|
||||
from datetime import datetime, timedelta
|
||||
from urllib.parse import quote
|
||||
from pathlib import Path
|
||||
from dotenv import load_dotenv
|
||||
|
|
@ -9,161 +11,296 @@ from dotenv import load_dotenv
|
|||
# Load environment variables
|
||||
load_dotenv()
|
||||
|
||||
BACKLOG_QUEUE = Path('deployment_logs/tier1_backlog_queue.json')
|
||||
|
||||
def process_colinkri_urls(dripfeed=7):
|
||||
"""
|
||||
Process URL files and send them to Colinkri API.
|
||||
|
||||
Args:
|
||||
dripfeed (int): Number of days for drip feed. Default is 7.
|
||||
|
||||
Returns:
|
||||
dict: Summary of processed, successful, and failed files
|
||||
"""
|
||||
api_key = os.getenv('COLINKRI_API_KEY')
|
||||
if not api_key:
|
||||
raise ValueError("COLINKRI_API_KEY not found in environment variables")
|
||||
|
||||
# Setup directories
|
||||
base_dir = Path('deployment_logs')
|
||||
done_dir = base_dir / 'Done'
|
||||
failed_dir = base_dir / 'Failed'
|
||||
|
||||
# Create directories if they don't exist
|
||||
done_dir.mkdir(parents=True, exist_ok=True)
|
||||
failed_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Pattern to match files: YYYY-MM-DD_other_tiers_urls.txt
|
||||
pattern = re.compile(r'^\d{4}-\d{2}-\d{2}_other_tiers_urls\.txt$')
|
||||
|
||||
# Get matching files
|
||||
matching_files = [f for f in base_dir.iterdir()
|
||||
if f.is_file() and pattern.match(f.name)]
|
||||
|
||||
if not matching_files:
|
||||
print("No matching files found.")
|
||||
return {'processed': 0, 'successful': 0, 'failed': 0}
|
||||
|
||||
results = {'processed': 0, 'successful': 0, 'failed': 0}
|
||||
|
||||
for file_path in matching_files:
|
||||
results['processed'] += 1
|
||||
campaign_name = file_path.stem # Filename without .txt
|
||||
|
||||
print(f"\nProcessing: {file_path.name}")
|
||||
|
||||
try:
|
||||
# Read URLs from file
|
||||
with open(file_path, 'r', encoding='utf-8') as f:
|
||||
urls = [line.strip() for line in f if line.strip()]
|
||||
|
||||
if not urls:
|
||||
print(f" ⚠️ No URLs found in {file_path.name}")
|
||||
|
||||
# Handle potential duplicate filenames in Failed folder
|
||||
destination = failed_dir / file_path.name
|
||||
counter = 1
|
||||
while destination.exists():
|
||||
new_name = f"{file_path.stem}_{counter}{file_path.suffix}"
|
||||
destination = failed_dir / new_name
|
||||
counter += 1
|
||||
|
||||
file_path.rename(destination)
|
||||
results['failed'] += 1
|
||||
continue
|
||||
# Randomize URL order
|
||||
random.shuffle(urls)
|
||||
# Join URLs with pipe separator
|
||||
urls_param = '|'.join(urls)
|
||||
|
||||
# Prepare API request
|
||||
api_url = 'https://www.colinkri.com/amember/crawler/api'
|
||||
|
||||
# URL encode the parameters
|
||||
data = {
|
||||
'apikey': api_key,
|
||||
'campaignname': campaign_name,
|
||||
'dripfeed': str(dripfeed),
|
||||
'urls': urls_param
|
||||
}
|
||||
|
||||
headers = {
|
||||
'Content-Type': 'application/x-www-form-urlencoded'
|
||||
}
|
||||
|
||||
# Send request
|
||||
print(f" 📤 Sending {len(urls)} URLs to Colinkri API...")
|
||||
response = requests.post(api_url, data=data, headers=headers, timeout=30)
|
||||
|
||||
# Check response
|
||||
if response.status_code == 200:
|
||||
print(f" ✅ Success! Campaign: {campaign_name}")
|
||||
|
||||
# Handle potential duplicate filenames in Done folder
|
||||
destination = done_dir / file_path.name
|
||||
counter = 1
|
||||
while destination.exists():
|
||||
# Add counter to filename if it already exists
|
||||
new_name = f"{file_path.stem}_{counter}{file_path.suffix}"
|
||||
destination = done_dir / new_name
|
||||
counter += 1
|
||||
|
||||
file_path.rename(destination)
|
||||
results['successful'] += 1
|
||||
else:
|
||||
error_msg = f"API returned status code {response.status_code}: {response.text}"
|
||||
print(f" ❌ Failed: {error_msg}")
|
||||
|
||||
# Handle potential duplicate filenames in Failed folder
|
||||
destination = failed_dir / file_path.name
|
||||
counter = 1
|
||||
while destination.exists():
|
||||
new_name = f"{file_path.stem}_{counter}{file_path.suffix}"
|
||||
destination = failed_dir / new_name
|
||||
counter += 1
|
||||
|
||||
# Log error to file
|
||||
error_log = failed_dir / f"{destination.stem}_error.log"
|
||||
with open(error_log, 'w', encoding='utf-8') as f:
|
||||
f.write(f"Error processing {file_path.name}\n")
|
||||
f.write(f"Status Code: {response.status_code}\n")
|
||||
f.write(f"Response: {response.text}\n")
|
||||
|
||||
file_path.rename(destination)
|
||||
results['failed'] += 1
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ Error: {str(e)}")
|
||||
|
||||
# Handle potential duplicate filenames in Failed folder
|
||||
|
||||
def _send_urls(api_url, api_key, file_path, done_dir, failed_dir, dripfeed, service_name):
|
||||
"""Send URLs from a file to an indexer API and move the file to Done/Failed."""
|
||||
campaign_name = file_path.stem
|
||||
|
||||
print(f"\nProcessing: {file_path.name}")
|
||||
|
||||
try:
|
||||
with open(file_path, 'r', encoding='utf-8') as f:
|
||||
urls = [line.strip() for line in f if line.strip()]
|
||||
|
||||
if not urls:
|
||||
print(f" [WARNING] No URLs found in {file_path.name}")
|
||||
destination = failed_dir / file_path.name
|
||||
counter = 1
|
||||
while destination.exists():
|
||||
new_name = f"{file_path.stem}_{counter}{file_path.suffix}"
|
||||
destination = failed_dir / new_name
|
||||
counter += 1
|
||||
file_path.rename(destination)
|
||||
return False
|
||||
|
||||
random.shuffle(urls)
|
||||
urls_param = '|'.join(urls)
|
||||
|
||||
data = {
|
||||
'apikey': api_key,
|
||||
'campaignname': campaign_name,
|
||||
'dripfeed': str(dripfeed),
|
||||
'urls': urls_param
|
||||
}
|
||||
|
||||
headers = {
|
||||
'Content-Type': 'application/x-www-form-urlencoded'
|
||||
}
|
||||
|
||||
print(f" [SENDING] Sending {len(urls)} URLs to {service_name} API...")
|
||||
response = requests.post(api_url, data=data, headers=headers, timeout=30)
|
||||
|
||||
if response.status_code == 200:
|
||||
print(f" [SUCCESS] Success! Campaign: {campaign_name}")
|
||||
destination = done_dir / file_path.name
|
||||
counter = 1
|
||||
while destination.exists():
|
||||
new_name = f"{file_path.stem}_{counter}{file_path.suffix}"
|
||||
destination = done_dir / new_name
|
||||
counter += 1
|
||||
file_path.rename(destination)
|
||||
return True
|
||||
else:
|
||||
error_msg = f"API returned status code {response.status_code}: {response.text}"
|
||||
print(f" [FAILED] Failed: {error_msg}")
|
||||
destination = failed_dir / file_path.name
|
||||
counter = 1
|
||||
while destination.exists():
|
||||
new_name = f"{file_path.stem}_{counter}{file_path.suffix}"
|
||||
destination = failed_dir / new_name
|
||||
counter += 1
|
||||
|
||||
# Log error to file
|
||||
error_log = failed_dir / f"{destination.stem}_error.log"
|
||||
with open(error_log, 'w', encoding='utf-8') as f:
|
||||
f.write(f"Error processing {file_path.name}\n")
|
||||
f.write(f"Exception: {str(e)}\n")
|
||||
|
||||
f.write(f"Status Code: {response.status_code}\n")
|
||||
f.write(f"Response: {response.text}\n")
|
||||
file_path.rename(destination)
|
||||
results['failed'] += 1
|
||||
|
||||
# Print summary
|
||||
print("\n" + "="*50)
|
||||
print("SUMMARY")
|
||||
print("="*50)
|
||||
print(f"Files processed: {results['processed']}")
|
||||
print(f"Successful: {results['successful']}")
|
||||
print(f"Failed: {results['failed']}")
|
||||
print("="*50)
|
||||
|
||||
return False
|
||||
|
||||
except Exception as e:
|
||||
print(f" [ERROR] Error: {str(e)}")
|
||||
destination = failed_dir / file_path.name
|
||||
counter = 1
|
||||
while destination.exists():
|
||||
new_name = f"{file_path.stem}_{counter}{file_path.suffix}"
|
||||
destination = failed_dir / new_name
|
||||
counter += 1
|
||||
error_log = failed_dir / f"{destination.stem}_error.log"
|
||||
with open(error_log, 'w', encoding='utf-8') as f:
|
||||
f.write(f"Error processing {file_path.name}\n")
|
||||
f.write(f"Exception: {str(e)}\n")
|
||||
file_path.rename(destination)
|
||||
return False
|
||||
|
||||
|
||||
def process_colinkri_urls(base_dir, done_dir, failed_dir, dripfeed=7):
|
||||
"""Process other_tiers URL files and send them to Colinkri API."""
|
||||
api_key = os.getenv('COLINKRI_API_KEY')
|
||||
if not api_key:
|
||||
raise ValueError("COLINKRI_API_KEY not found in environment variables")
|
||||
|
||||
pattern = re.compile(r'^\d{4}-\d{2}-\d{2}_other_tiers_urls\.txt$')
|
||||
matching_files = [f for f in base_dir.iterdir()
|
||||
if f.is_file() and pattern.match(f.name)]
|
||||
|
||||
if not matching_files:
|
||||
print("No matching other_tiers files found.")
|
||||
return {'processed': 0, 'successful': 0, 'failed': 0}
|
||||
|
||||
results = {'processed': 0, 'successful': 0, 'failed': 0}
|
||||
for file_path in matching_files:
|
||||
results['processed'] += 1
|
||||
success = _send_urls(
|
||||
'https://www.colinkri.com/amember/crawler/api',
|
||||
api_key, file_path, done_dir, failed_dir, dripfeed, 'Colinkri'
|
||||
)
|
||||
results['successful' if success else 'failed'] += 1
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def process_omega_urls(base_dir, done_dir, failed_dir, dripfeed=7, lag_days=10):
|
||||
"""Process the tier1 URL file from exactly lag_days ago and send to Omega Indexer API."""
|
||||
api_key = os.getenv('OMEGA_API_KEY')
|
||||
if not api_key:
|
||||
raise ValueError("OMEGA_API_KEY not found in environment variables")
|
||||
|
||||
target_date = (datetime.now() - timedelta(days=lag_days)).strftime('%Y-%m-%d')
|
||||
target_filename = f"{target_date}_tier1_urls.txt"
|
||||
file_path = base_dir / target_filename
|
||||
|
||||
print(f"Looking for tier1 file: {target_filename}")
|
||||
|
||||
if not file_path.exists():
|
||||
print(f"No tier1 file found for {target_date} ({lag_days}-day lag).")
|
||||
return {'processed': 0, 'successful': 0, 'failed': 0}
|
||||
|
||||
results = {'processed': 1, 'successful': 0, 'failed': 0}
|
||||
success = _send_urls(
|
||||
'https://www.omegaindexer.com/amember/dashboard/api',
|
||||
api_key, file_path, done_dir, failed_dir, dripfeed, 'Omega Indexer'
|
||||
)
|
||||
results['successful' if success else 'failed'] += 1
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def _init_backlog_queue(base_dir, done_dir, lag_days=10):
|
||||
"""
|
||||
One-time setup: read all backlogged tier1 files (older than lag_days),
|
||||
combine all URLs, shuffle, and save to a queue file.
|
||||
"""
|
||||
cutoff_date = (datetime.now() - timedelta(days=lag_days)).strftime('%Y-%m-%d')
|
||||
pattern = re.compile(r'^(\d{4}-\d{2}-\d{2})_tier1_urls\.txt$')
|
||||
|
||||
backlog_files = []
|
||||
for f in sorted(base_dir.iterdir()):
|
||||
match = pattern.match(f.name)
|
||||
if match and f.is_file() and match.group(1) <= cutoff_date:
|
||||
backlog_files.append(f)
|
||||
|
||||
if not backlog_files:
|
||||
print("No backlogged tier1 files found.")
|
||||
return None
|
||||
|
||||
# Collect all URLs and track source files
|
||||
all_urls = []
|
||||
source_files = []
|
||||
for f in backlog_files:
|
||||
source_files.append(f.name)
|
||||
with open(f, 'r', encoding='utf-8') as fh:
|
||||
urls = [line.strip() for line in fh if line.strip()]
|
||||
all_urls.extend(urls)
|
||||
|
||||
random.shuffle(all_urls)
|
||||
|
||||
queue_data = {
|
||||
'urls': all_urls,
|
||||
'source_files': source_files,
|
||||
'total': len(all_urls),
|
||||
'sent': 0,
|
||||
'created': datetime.now().isoformat()
|
||||
}
|
||||
|
||||
with open(BACKLOG_QUEUE, 'w', encoding='utf-8') as f:
|
||||
json.dump(queue_data, f, indent=2)
|
||||
|
||||
print(f" [INIT] Backlog queue created: {len(all_urls)} URLs from {len(source_files)} files")
|
||||
|
||||
# Move source files to Done now that they're queued
|
||||
for filename in source_files:
|
||||
src = base_dir / filename
|
||||
if src.exists():
|
||||
destination = done_dir / filename
|
||||
counter = 1
|
||||
while destination.exists():
|
||||
stem = Path(filename).stem
|
||||
suffix = Path(filename).suffix
|
||||
destination = done_dir / f"{stem}_{counter}{suffix}"
|
||||
counter += 1
|
||||
src.rename(destination)
|
||||
|
||||
return queue_data
|
||||
|
||||
|
||||
def process_omega_backlog(base_dir, done_dir, failed_dir, dripfeed=7, batch_size=50, lag_days=10):
|
||||
"""
|
||||
Drain the tier1 backlog by sending batch_size URLs per run to Omega Indexer.
|
||||
On first run, initializes the queue from all backlogged tier1 files.
|
||||
"""
|
||||
api_key = os.getenv('OMEGA_API_KEY')
|
||||
if not api_key:
|
||||
raise ValueError("OMEGA_API_KEY not found in environment variables")
|
||||
|
||||
# Load or initialize the queue
|
||||
if BACKLOG_QUEUE.exists():
|
||||
with open(BACKLOG_QUEUE, 'r', encoding='utf-8') as f:
|
||||
queue_data = json.load(f)
|
||||
else:
|
||||
queue_data = _init_backlog_queue(base_dir, done_dir, lag_days)
|
||||
if not queue_data:
|
||||
return {'processed': 0, 'successful': 0, 'failed': 0}
|
||||
|
||||
remaining_urls = queue_data['urls']
|
||||
if not remaining_urls:
|
||||
print("Backlog queue is empty. Backlog complete!")
|
||||
BACKLOG_QUEUE.unlink()
|
||||
return {'processed': 0, 'successful': 0, 'failed': 0}
|
||||
|
||||
# Take the next batch
|
||||
batch = remaining_urls[:batch_size]
|
||||
remaining = remaining_urls[batch_size:]
|
||||
|
||||
batch_num = (queue_data['sent'] // batch_size) + 1
|
||||
campaign_name = f"tier1_backlog_batch_{batch_num}"
|
||||
|
||||
print(f"\n [BACKLOG] Batch {batch_num}: {len(batch)} URLs ({len(remaining)} remaining after this)")
|
||||
|
||||
try:
|
||||
urls_param = '|'.join(batch)
|
||||
data = {
|
||||
'apikey': api_key,
|
||||
'campaignname': campaign_name,
|
||||
'dripfeed': str(dripfeed),
|
||||
'urls': urls_param
|
||||
}
|
||||
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
|
||||
|
||||
print(f" [SENDING] Sending {len(batch)} URLs to Omega Indexer API...")
|
||||
response = requests.post(
|
||||
'https://www.omegaindexer.com/amember/dashboard/api',
|
||||
data=data, headers=headers, timeout=30
|
||||
)
|
||||
|
||||
if response.status_code == 200:
|
||||
print(f" [SUCCESS] Success! Campaign: {campaign_name}")
|
||||
queue_data['urls'] = remaining
|
||||
queue_data['sent'] += len(batch)
|
||||
|
||||
with open(BACKLOG_QUEUE, 'w', encoding='utf-8') as f:
|
||||
json.dump(queue_data, f, indent=2)
|
||||
|
||||
if not remaining:
|
||||
print(" [COMPLETE] Backlog fully drained!")
|
||||
BACKLOG_QUEUE.unlink()
|
||||
|
||||
return {'processed': 1, 'successful': 1, 'failed': 0}
|
||||
else:
|
||||
print(f" [FAILED] API returned {response.status_code}: {response.text}")
|
||||
return {'processed': 1, 'successful': 0, 'failed': 1}
|
||||
|
||||
except Exception as e:
|
||||
print(f" [ERROR] Error: {str(e)}")
|
||||
return {'processed': 1, 'successful': 0, 'failed': 1}
|
||||
|
||||
|
||||
def _print_summary(label, results):
|
||||
print(f"\n{'='*50}")
|
||||
print(f"{label} SUMMARY")
|
||||
print(f"{'='*50}")
|
||||
print(f"Files processed: {results['processed']}")
|
||||
print(f"Successful: {results['successful']}")
|
||||
print(f"Failed: {results['failed']}")
|
||||
print(f"{'='*50}")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
# Example usage
|
||||
process_colinkri_urls(dripfeed=7)
|
||||
base_dir = Path('deployment_logs')
|
||||
done_dir = base_dir / 'Done'
|
||||
failed_dir = base_dir / 'Failed'
|
||||
done_dir.mkdir(parents=True, exist_ok=True)
|
||||
failed_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# 1) Colinkri: send all pending other_tiers files
|
||||
colinkri_results = process_colinkri_urls(base_dir, done_dir, failed_dir, dripfeed=7)
|
||||
_print_summary("COLINKRI", colinkri_results)
|
||||
|
||||
# 2) Omega Indexer: send tier1 file from 10 days ago
|
||||
omega_results = process_omega_urls(base_dir, done_dir, failed_dir, dripfeed=7, lag_days=10)
|
||||
_print_summary("OMEGA INDEXER", omega_results)
|
||||
|
||||
# 3) Omega Indexer backlog: drain 50 URLs per run
|
||||
backlog_results = process_omega_backlog(base_dir, done_dir, failed_dir, dripfeed=7, batch_size=50, lag_days=10)
|
||||
_print_summary("OMEGA BACKLOG", backlog_results)
|
||||
|
|
@ -34,7 +34,12 @@ def get_site_hostname(site: SiteDeployment) -> str:
|
|||
if site.s3_custom_domain:
|
||||
return site.s3_custom_domain
|
||||
elif site.s3_bucket_name and site.s3_bucket_region:
|
||||
return f"{site.s3_bucket_name}.s3.{site.s3_bucket_region}.amazonaws.com"
|
||||
# Use website endpoint format for static website hosting (enables root URL access)
|
||||
# Skip website endpoint for S3-compatible services (use standard endpoint)
|
||||
if site.storage_provider == 's3_compatible' or getattr(site, 's3_endpoint_url', None):
|
||||
return f"{site.s3_bucket_name}.s3.{site.s3_bucket_region}.amazonaws.com"
|
||||
else:
|
||||
return f"{site.s3_bucket_name}.s3-website-{site.s3_bucket_region}.amazonaws.com"
|
||||
else:
|
||||
hostname = site.custom_hostname or site.pull_zone_bcdn_hostname
|
||||
logger.warning(f"S3 site {site.id} missing s3_custom_domain and bucket info, using fallback: {hostname}")
|
||||
|
|
@ -146,7 +151,7 @@ def generate_public_url(site: SiteDeployment, file_path: str) -> str:
|
|||
-> 'https://cdn.example.com/article.html'
|
||||
|
||||
S3 site without custom domain, file_path='article.html'
|
||||
-> 'https://bucket-name.s3.region.amazonaws.com/article.html'
|
||||
-> 'https://bucket-name.s3-website-region.amazonaws.com/article.html'
|
||||
"""
|
||||
hostname = get_site_hostname(site)
|
||||
return f"https://{hostname}/{file_path}"
|
||||
|
|
|
|||
|
|
@ -1,199 +0,0 @@
|
|||
# Story 2.2 Implementation Summary
|
||||
|
||||
## Overview
|
||||
Successfully implemented simplified AI content generation via batch jobs using OpenRouter API.
|
||||
|
||||
## Completed Phases
|
||||
|
||||
### Phase 1: Data Model & Schema Design
|
||||
- ✅ Added `GeneratedContent` model to `src/database/models.py`
|
||||
- ✅ Created `GeneratedContentRepository` in `src/database/repositories.py`
|
||||
- ✅ Updated `scripts/init_db.py` (automatic table creation via Base.metadata)
|
||||
|
||||
### Phase 2: AI Client & Prompt Management
|
||||
- ✅ Created `src/generation/ai_client.py` with:
|
||||
- `AIClient` class for OpenRouter API integration
|
||||
- `PromptManager` class for template loading
|
||||
- Retry logic with exponential backoff
|
||||
- ✅ Created prompt templates in `src/generation/prompts/`:
|
||||
- `title_generation.json`
|
||||
- `outline_generation.json`
|
||||
- `content_generation.json`
|
||||
- `content_augmentation.json`
|
||||
|
||||
### Phase 3: Core Generation Pipeline
|
||||
- ✅ Implemented `ContentGenerator` in `src/generation/service.py` with:
|
||||
- `generate_title()` - Stage 1
|
||||
- `generate_outline()` - Stage 2 with JSON validation
|
||||
- `generate_content()` - Stage 3
|
||||
- `validate_word_count()` - Word count validation
|
||||
- `augment_content()` - Simple augmentation
|
||||
- `count_words()` - HTML-aware word counting
|
||||
- Debug output support
|
||||
|
||||
### Phase 4: Batch Processing
|
||||
- ✅ Created `src/generation/job_config.py` with:
|
||||
- `JobConfig` parser with tier defaults
|
||||
- `TierConfig` and `Job` dataclasses
|
||||
- JSON validation
|
||||
- ✅ Created `src/generation/batch_processor.py` with:
|
||||
- `BatchProcessor` class
|
||||
- Progress logging to console
|
||||
- Error handling and continue-on-error support
|
||||
- Statistics tracking
|
||||
|
||||
### Phase 5: CLI Integration
|
||||
- ✅ Added `generate-batch` command to `src/cli/commands.py`
|
||||
- ✅ Command options:
|
||||
- `--job-file` (required)
|
||||
- `--username` / `--password` for authentication
|
||||
- `--debug` for saving AI responses
|
||||
- `--continue-on-error` flag
|
||||
- `--model` selection (default: gpt-4o-mini)
|
||||
|
||||
### Phase 6: Testing & Validation
|
||||
- ✅ Created unit tests:
|
||||
- `tests/unit/test_job_config.py` (9 tests)
|
||||
- `tests/unit/test_content_generator.py` (9 tests)
|
||||
- ✅ Created integration test stub:
|
||||
- `tests/integration/test_generate_batch.py` (2 tests)
|
||||
- ✅ Created example job files:
|
||||
- `jobs/example_tier1_batch.json`
|
||||
- `jobs/example_multi_tier_batch.json`
|
||||
- `jobs/README.md` (comprehensive documentation)
|
||||
|
||||
### Phase 7: Cleanup & Documentation
|
||||
- ✅ Deprecated old `src/generation/rule_engine.py`
|
||||
- ✅ Updated documentation:
|
||||
- `docs/architecture/workflows.md` - Added generation workflow diagram
|
||||
- `docs/architecture/components.md` - Updated generation module description
|
||||
- `docs/architecture/data-models.md` - Updated GeneratedContent model
|
||||
- `docs/stories/story-2.2. simplified-ai-content-generation.md` - Marked as Completed
|
||||
- ✅ Updated `.gitignore` to exclude `debug_output/`
|
||||
- ✅ Updated `env.example` with `OPENROUTER_API_KEY`
|
||||
|
||||
## Key Files Created/Modified
|
||||
|
||||
### New Files (15)
|
||||
```
|
||||
src/generation/ai_client.py
|
||||
src/generation/service.py
|
||||
src/generation/job_config.py
|
||||
src/generation/batch_processor.py
|
||||
src/generation/prompts/title_generation.json
|
||||
src/generation/prompts/outline_generation.json
|
||||
src/generation/prompts/content_generation.json
|
||||
src/generation/prompts/content_augmentation.json
|
||||
jobs/example_tier1_batch.json
|
||||
jobs/example_multi_tier_batch.json
|
||||
jobs/README.md
|
||||
tests/unit/test_job_config.py
|
||||
tests/unit/test_content_generator.py
|
||||
tests/integration/test_generate_batch.py
|
||||
IMPLEMENTATION_SUMMARY.md
|
||||
```
|
||||
|
||||
### Modified Files (10)
|
||||
```
|
||||
src/database/models.py (added GeneratedContent model)
|
||||
src/database/repositories.py (added GeneratedContentRepository)
|
||||
src/cli/commands.py (added generate-batch command)
|
||||
src/generation/rule_engine.py (deprecated)
|
||||
docs/architecture/workflows.md (updated)
|
||||
docs/architecture/components.md (updated)
|
||||
docs/architecture/data-models.md (updated)
|
||||
docs/stories/story-2.2. simplified-ai-content-generation.md (marked complete)
|
||||
.gitignore (added debug_output/)
|
||||
env.example (added OPENROUTER_API_KEY)
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### 1. Set up environment
|
||||
```bash
|
||||
# Copy env.example to .env and add your OpenRouter API key
|
||||
cp env.example .env
|
||||
# Edit .env and set OPENROUTER_API_KEY
|
||||
```
|
||||
|
||||
### 2. Initialize database
|
||||
```bash
|
||||
python scripts/init_db.py
|
||||
```
|
||||
|
||||
### 3. Create a project (if one does not exist yet)
|
||||
```bash
|
||||
python main.py ingest-cora --file path/to/cora.xlsx --name "My Project"
|
||||
```
|
||||
|
||||
### 4. Run batch generation
|
||||
```bash
|
||||
python main.py generate-batch --job-file jobs/example_tier1_batch.json
|
||||
```
|
||||
|
||||
### 5. With debug output
|
||||
```bash
|
||||
python main.py generate-batch --job-file jobs/example_tier1_batch.json --debug
|
||||
```
|
||||
|
||||
## Architecture Highlights
|
||||
|
||||
### Three-Stage Pipeline
|
||||
1. **Title Generation**: Uses keyword + entities + related searches
|
||||
2. **Outline Generation**: JSON-formatted with H2/H3 structure, validated against min/max constraints
|
||||
3. **Content Generation**: Full HTML fragment based on outline
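The outline validation in stage 2 can be illustrated with a minimal sketch. The outline schema used here (a `sections` list whose entries carry an `h2` key) and the parameter names `min_h2` / `max_h2` are assumptions made for illustration; they are not taken from `src/generation/service.py`.

```python
import json


def validate_outline(raw: str, min_h2: int, max_h2: int) -> dict:
    """Parse an AI-produced outline and check its H2 count against job limits.

    Hypothetical schema: {"sections": [{"h2": "...", "h3": ["...", ...]}, ...]}
    """
    outline = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(outline, dict) or not isinstance(outline.get("sections"), list):
        raise ValueError("outline must be an object with a 'sections' list")

    sections = outline["sections"]
    if not min_h2 <= len(sections) <= max_h2:
        raise ValueError(
            f"expected between {min_h2} and {max_h2} H2 sections, got {len(sections)}"
        )
    for section in sections:
        if not isinstance(section, dict) or not section.get("h2"):
            raise ValueError("every section needs a non-empty 'h2' heading")
    return outline
```

Raising `ValueError` for every failure mode keeps the caller's handling to a single `except` clause, which suits a pipeline that marks the article as failed and moves on.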
|
||||
|
||||
### Simplification Wins
|
||||
- No complex rule engine
|
||||
- Single word count validation (min/max from job file)
|
||||
- One-attempt augmentation if below minimum
|
||||
- Job file controls all operational parameters
|
||||
- Tier defaults for common configurations
|
||||
|
||||
### Error Handling
|
||||
- Network errors: 3 retries with exponential backoff
|
||||
- Rate limits: Respects retry-after headers
|
||||
- Failed articles: Saved with status='failed', can continue processing with `--continue-on-error`
|
||||
- Database errors: Always abort (data integrity)
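The retry behaviour listed above can be sketched as follows. This is not the code in `src/generation/ai_client.py`: the `RateLimited` exception, the `call_with_retry` helper, and the 1-second base delay are hypothetical names and values chosen for the example.

```python
import time


class RateLimited(Exception):
    """Hypothetical error carrying the server's Retry-After value in seconds."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on network or rate-limit errors retry up to max_retries times."""
    attempt = 0
    while True:
        try:
            return func()
        except RateLimited as exc:
            if attempt >= max_retries:
                raise
            # Prefer the server-provided wait; otherwise fall back to backoff.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
        except ConnectionError:
            if attempt >= max_retries:
                raise
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s with the defaults
        sleep(delay)
        attempt += 1
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting. Any other exception type propagates immediately, which matches the "always abort" rule for database errors.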
|
||||
|
||||
## Testing
|
||||
|
||||
Run tests with:
|
||||
```bash
|
||||
pytest tests/unit/test_job_config.py -v
|
||||
pytest tests/unit/test_content_generator.py -v
|
||||
pytest tests/integration/test_generate_batch.py -v
|
||||
```
|
||||
|
||||
## Next Steps (Future Stories)
|
||||
|
||||
- Story 2.3: Interlinking integration
|
||||
- Story 3.x: Template selection
|
||||
- Story 4.x: Deployment integration
|
||||
- Expand test coverage (currently basic tests only)
|
||||
|
||||
## Success Criteria Met
|
||||
|
||||
All acceptance criteria from Story 2.2 have been met:
|
||||
|
||||
✅ 1. Batch Job Control - Job file specifies all tier parameters
|
||||
✅ 2. Three-Stage Generation - Title → Outline → Content pipeline
|
||||
✅ 3. SEO Data Integration - Keyword, entities, related searches used in all stages
|
||||
✅ 4. Word Count Validation - Validates against min/max from job file
|
||||
✅ 5. Simple Augmentation - Single attempt if below minimum
|
||||
✅ 6. Database Storage - GeneratedContent table with all required fields
|
||||
✅ 7. CLI Execution - generate-batch command with progress logging
|
||||
|
||||
## Estimated Implementation Time
|
||||
- Total: ~20-29 hours (as estimated in task breakdown)
|
||||
- Actual: Completed in a single session
|
||||
|
||||
## Notes
|
||||
|
||||
- OpenRouter API key required in environment
|
||||
- Debug output saved to `debug_output/` when `--debug` flag used
|
||||
- Job files support multiple projects and tiers
|
||||
- Tier defaults can be fully or partially overridden
|
||||
- HTML output is fragment format (no `<html>`, `<head>`, or `<body>` tags)
|
||||
- Word count strips HTML tags and counts text words only
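HTML-aware word counting can be sketched with the standard library alone. The function below reuses the `count_words` name from this summary for readability, but it is an illustrative assumption, not the project's implementation.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_words(html_fragment: str) -> int:
    """Count whitespace-separated words, ignoring tags and attributes."""
    parser = _TextExtractor()
    parser.feed(html_fragment)
    parser.close()
    # Joining with a space keeps words in adjacent elements apart,
    # e.g. "</h2><p>" does not glue two words together.
    return len(" ".join(parser.chunks).split())
```

One trade-off of joining with spaces: a word split by inline markup, such as `fix<b>ed</b>`, is counted as two words. For min/max validation of whole articles that error is usually negligible.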
|
||||
|
||||
|
|
@ -1,192 +0,0 @@
|
|||
# Story 3.1: URL Generation and Site Assignment - COMPLETE
|
||||
|
||||
## Status: ✅ IMPLEMENTATION COMPLETE
|
||||
|
||||
All acceptance criteria met. 44 tests passing. Ready for use.
|
||||
|
||||
---
|
||||
|
||||
## What I Built
|
||||
|
||||
### Core Functionality
|
||||
1. **Site Assignment System** with full priority logic
|
||||
2. **URL Generation** with intelligent slug creation
|
||||
3. **Auto-Site Creation** via bunny.net API
|
||||
4. **Keyword-Based Provisioning** for targeted site creation
|
||||
5. **Flexible Hostname Support** (custom domains OR bcdn-only)
|
||||
|
||||
### Priority Assignment Rules Implemented
|
||||
- **Tier1**: Preferred → Keyword → Random
|
||||
- **Tier2+**: Keyword → Random
|
||||
- **Auto-create** when pool insufficient (optional)
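The priority order above can be expressed as a small selection function. This is a sketch under stated assumptions: sites are represented as plain hostname strings, keyword matching is a case-insensitive substring test on the hostname, and the names `pick_site`, `available`, and `preferred` are hypothetical. The real logic lives in `src/generation/site_assignment.py` and may differ.

```python
import random


def pick_site(tier, available, preferred, keyword, rng=random):
    """Pick one hostname from `available` following the tier priority order."""
    if not available:
        # The real system may auto-create a site here when the job allows it.
        raise LookupError("site pool is empty")

    # Tier1 only: preferred sites win if any of them is still available.
    if tier == "tier1":
        for host in preferred:
            if host in available:
                return host

    # All tiers: simple keyword match against the site name (no fuzzy matching).
    needle = keyword.lower().replace(" ", "")
    if needle:
        for host in available:
            if needle in host.lower().replace("-", ""):
                return host

    # Fallback: any remaining site.
    return rng.choice(available)
```

For example, with a pool of `blog.b-cdn.net`, `engine-repair-tips.b-cdn.net`, and `www.premium.com`, a tier1 article with `www.premium.com` preferred gets that site, while a tier2 article for the keyword "engine repair" gets `engine-repair-tips.b-cdn.net`.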
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. Migrate Your Database
|
||||
```bash
|
||||
mysql -u user -p database < scripts/migrate_story_3.1.sql
|
||||
```
|
||||
|
||||
### 2. Import Your 400+ Bunny.net Sites
|
||||
```bash
|
||||
uv run python main.py sync-sites --admin-user your_admin
|
||||
```
|
||||
|
||||
### 3. Use New Features
|
||||
```python
|
||||
from src.generation.site_assignment import assign_sites_to_batch
|
||||
from src.generation.url_generator import generate_urls_for_batch
|
||||
|
||||
# Assign sites to articles
|
||||
assign_sites_to_batch(articles, job, site_repo, bunny_client, "project-keyword")
|
||||
|
||||
# Generate URLs
|
||||
urls = generate_urls_for_batch(articles, site_repo)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Test Results
|
||||
|
||||
```
|
||||
44 tests passing:
|
||||
✅ 14 URL generator tests
|
||||
✅ 8 Site provisioning tests
|
||||
✅ 9 Site assignment tests
|
||||
✅ 8 Job config tests
|
||||
✅ 5 Integration tests
|
||||
```
|
||||
|
||||
Run tests:
|
||||
```bash
|
||||
uv run pytest tests/unit/test_url_generator.py \
|
||||
tests/unit/test_site_provisioning.py \
|
||||
tests/unit/test_site_assignment.py \
|
||||
tests/unit/test_job_config_extensions.py \
|
||||
tests/integration/test_story_3_1_integration.py -v
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Files Created/Modified
|
||||
|
||||
### New Modules (3):
|
||||
- `src/generation/site_provisioning.py` - Bunny.net site creation
|
||||
- `src/generation/url_generator.py` - URL and slug generation
|
||||
- `src/generation/site_assignment.py` - Site assignment with priority system
|
||||
|
||||
### Modified Core Files (6):
|
||||
- `src/database/models.py` - Nullable custom_hostname
|
||||
- `src/database/interfaces.py` - Updated interface
|
||||
- `src/database/repositories.py` - New methods
|
||||
- `src/templating/service.py` - Hostname flexibility
|
||||
- `src/cli/commands.py` - Import all sites
|
||||
- `src/generation/job_config.py` - New config fields
|
||||
|
||||
### Tests (5 new files):
|
||||
- `tests/unit/test_url_generator.py`
|
||||
- `tests/unit/test_site_provisioning.py`
|
||||
- `tests/unit/test_site_assignment.py`
|
||||
- `tests/unit/test_job_config_extensions.py`
|
||||
- `tests/integration/test_story_3_1_integration.py`
|
||||
|
||||
### Documentation (3):
|
||||
- `STORY_3.1_IMPLEMENTATION_SUMMARY.md` - Detailed documentation
|
||||
- `STORY_3.1_QUICKSTART.md` - Quick start guide
|
||||
- `jobs/example_story_3.1_full_features.json` - Example config
|
||||
|
||||
### Migration (1):
|
||||
- `scripts/migrate_story_3.1.sql` - Database migration
|
||||
|
||||
---
|
||||
|
||||
## Job Config Examples
|
||||
|
||||
### Minimal (use existing sites):
|
||||
```json
|
||||
{
|
||||
"jobs": [{
|
||||
"project_id": 1,
|
||||
"tiers": {"tier1": {"count": 10}}
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
### Full Features:
|
||||
```json
|
||||
{
|
||||
"jobs": [{
|
||||
"project_id": 1,
|
||||
"tiers": {"tier1": {"count": 10}},
|
||||
"tier1_preferred_sites": ["www.premium.com"],
|
||||
"auto_create_sites": true,
|
||||
"create_sites_for_keywords": [
|
||||
{"keyword": "engine repair", "count": 3}
|
||||
]
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## URL Examples
|
||||
|
||||
### Custom Domain:
|
||||
```
|
||||
https://www.example.com/how-to-fix-your-engine.html
|
||||
```
|
||||
|
||||
### Bunny CDN Only:
|
||||
```
|
||||
https://mysite123.b-cdn.net/how-to-fix-your-engine.html
|
||||
```
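URLs of this shape can be produced by a very small slug function. The sketch below is an assumption about how such a slug might be built, not the code in `src/generation/url_generator.py`; the `slugify` and `build_url` names and the `article` fallback are hypothetical.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse every non-alphanumeric run into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "article"  # hypothetical fallback for titles with no usable characters


def build_url(hostname: str, title: str) -> str:
    """Combine a site hostname and an article title into a public URL."""
    return f"https://{hostname}/{slugify(title)}.html"
```

Called as `build_url("www.example.com", "How to Fix Your Engine")`, this yields the custom-domain URL shown above; passing `mysite123.b-cdn.net` as the hostname yields the Bunny CDN variant.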
|
||||
|
||||
---
|
||||
|
||||
## Design Decisions (Simple Over Complex)
|
||||
|
||||
✅ **Simple slug generation** - No complex character handling
|
||||
✅ **Keyword matching by site name** - No fuzzy matching
|
||||
✅ **Clear priority system** - Easy to understand and debug
|
||||
✅ **Explicit auto-creation flag** - Safe by default
|
||||
✅ **Comprehensive error messages** - Easy troubleshooting
|
||||
|
||||
❌ Deferred to technical debt:
|
||||
- Fuzzy keyword/entity matching
|
||||
- Complex ML-based site selection
|
||||
- Advanced slug optimization
|
||||
|
||||
---
|
||||
|
||||
## Production Ready
|
||||
|
||||
✅ All acceptance criteria met
|
||||
✅ Comprehensive test coverage
|
||||
✅ No linter errors
|
||||
✅ Error handling implemented
|
||||
✅ Logging at INFO level
|
||||
✅ Model-based schema (no manual migration needed in prod)
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Run migration on dev database
|
||||
2. Test with `sync-sites` to import your 400+ sites
|
||||
3. Create test job config
|
||||
4. Integrate into your content generation workflow
|
||||
5. Deploy to production (model changes auto-apply)
|
||||
|
||||
---
|
||||
|
||||
## Questions?
|
||||
|
||||
See detailed docs:
|
||||
- `STORY_3.1_IMPLEMENTATION_SUMMARY.md` - Full details
|
||||
- `STORY_3.1_QUICKSTART.md` - Quick reference
|
||||
|
||||
Test job config:
|
||||
- `jobs/example_story_3.1_full_features.json`
|
||||
|
||||
|
|
@ -368,7 +368,7 @@ class TestS3StorageClient:
|
|||
content="<html>Content</html>"
|
||||
)
|
||||
|
||||
assert "test-bucket.s3.us-east-1.amazonaws.com/article.html" in result.message
|
||||
assert "test-bucket.s3-website-us-east-1.amazonaws.com/article.html" in result.message
|
||||
|
||||
@patch.dict(os.environ, {
|
||||
'AWS_ACCESS_KEY_ID': 'test-key',
|
||||
|
|
|
|||