Technical Debt & Future Enhancements
This document tracks technical debt, future enhancements, and features that were deferred from the MVP.
Story 1.6: Deployment Infrastructure Management
Domain Health Check / Verification Status
Priority: Medium
Epic Suggestion: Epic 4 (Deployment) or Epic 3 (Pre-deployment)
Estimated Effort: Small (1-2 days)
Problem
After importing or provisioning sites, there's no way to verify:
- Domain ownership is still valid (user didn't let domain expire)
- DNS configuration is correct and pointing to bunny.net
- Custom domain is actually serving content
- SSL certificates are valid
With 50+ domains, manual checking is impractical.
Proposed Solution
Option 1: Active Health Check
- Create a health check file in each Storage Zone (e.g., .health-check.txt)
- Periodically attempt to fetch it via the custom domain
- Record results in database
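A minimal sketch of the active check using only the standard library; the status strings mirror the proposed health_status values, and the marker filename is the one suggested above:

```python
import ssl
import urllib.error
import urllib.request

def check_domain_health(hostname: str, timeout: float = 10.0) -> str:
    """Fetch the marker file over the custom domain and classify the result."""
    url = f"https://{hostname}/.health-check.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "healthy" if resp.status == 200 else "unreachable"
    except urllib.error.HTTPError:
        # Domain resolves and serves, but the marker file is missing.
        return "unreachable"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ssl.SSLError):
            return "ssl_error"
        return "dns_failure"  # DNS lookup or connection failure
    except (ssl.SSLError, OSError):
        return "unreachable"
```

Note that distinguishing expired from dns_failure would need an extra WHOIS or registrar lookup; the sketch only classifies what a plain HTTPS fetch can observe.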
Option 2: Use bunny.net API
- Check if bunny.net exposes domain verification status via API
- Query verification status for each custom hostname
Database Changes
Add a health_status field to the SiteDeployment table:
- unknown - Not yet checked
- healthy - Domain resolving and serving content
- dns_failure - Cannot resolve domain
- ssl_error - Certificate issues
- unreachable - Domain not responding
- expired - Likely domain ownership lost
Add a last_health_check timestamp field.
CLI Commands
# Check single domain
check-site-health --domain www.example.com
# Check all domains
check-all-sites-health
# List unhealthy sites
list-sites --status unhealthy
Use Cases
- Automated monitoring to detect when domains expire
- Pre-deployment validation before pushing new content
- Dashboard showing health of entire portfolio
- Alert system for broken domains
Impact
- Prevents wasted effort deploying to expired domains
- Early detection of DNS/SSL issues
- Better operational visibility across large domain portfolios
Story 2.3: AI-Powered Content Generation
Prompt Template A/B Testing & Optimization
Priority: Medium
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Medium (3-5 days)
Problem
Content quality and AI compliance with CORA targets vary with prompt wording. There is no systematic way to:
- Test different prompt variations
- Compare results objectively
- Select optimal prompts for different scenarios
- Track which prompts work best with which models
Proposed Solution
Prompt Versioning System:
- Support multiple versions of each prompt template
- Name prompts with version suffix (e.g., title_generation_v1.json, title_generation_v2.json)
- Job config specifies which prompt version to use per stage
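Resolving the versioned template file could be as simple as the sketch below (the directory layout and stage/version naming are illustrative, not the actual config schema):

```python
from pathlib import Path

def resolve_prompt_path(prompts_dir: str, stage: str, version: str = "v1") -> Path:
    """Build the template path for a stage, e.g. title_generation_v2.json."""
    path = Path(prompts_dir) / f"{stage}_generation_{version}.json"
    if not path.exists():
        raise FileNotFoundError(f"No {stage} prompt template: {path}")
    return path
```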
Comparison Tool:
# Generate with multiple prompt versions
compare-prompts --project-id 1 --variants v1,v2,v3 --stages title,outline
# Outputs:
# - Side-by-side content comparison
# - Validation scores
# - Augmentation requirements
# - Generation time/cost
# - Recommendation
Metrics to Track:
- Validation pass rate
- Augmentation frequency
- Average attempts per stage
- Word count variance
- Keyword density accuracy
- Generation time
- API cost
Database Changes:
Add prompt_version fields to GeneratedContent:
- title_prompt_version
- outline_prompt_version
- content_prompt_version
Impact
- Higher quality content
- Reduced augmentation needs
- Lower API costs
- Model-specific optimizations
- Data-driven prompt improvements
Parallel Article Generation
Priority: Low
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Medium (3-5 days)
Problem
Articles are generated sequentially, which is slow for large batches:
- 15 tier 1 articles: ~10-20 minutes
- 150 tier 2 articles: ~2-3 hours
This could be parallelized since articles are independent.
Proposed Solution
Multi-threading/Multi-processing:
- Add a --parallel N flag to the generate-batch command
- Process N articles simultaneously
- Share database session pool
- Rate limit API calls to avoid throttling
Considerations:
- Database connection pooling
- OpenRouter rate limits
- Memory usage (N concurrent AI calls)
- Progress tracking complexity
- Error handling across threads
Example:
# Generate 4 articles in parallel
generate-batch -j job.json --parallel 4
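Under the hood, the flag could map onto a thread pool along these lines; generate_article is a stand-in for the existing per-article pipeline, and the semaphore bounds in-flight API calls to stay under OpenRouter rate limits:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def generate_batch_parallel(articles, generate_article, parallel=4):
    """Generate independent articles concurrently, isolating per-article errors."""
    gate = threading.Semaphore(parallel)  # bound concurrent API calls

    def worker(article):
        with gate:
            return generate_article(article)

    results, errors = [], []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(worker, a): a for a in articles}
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception as exc:  # one failed article must not sink the batch
                errors.append((futures[fut], exc))
    return results, errors
```

Each worker would need its own database session drawn from the shared pool, since SQLAlchemy sessions are not thread-safe.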
Impact
- 3-4x faster for large batches
- Better resource utilization
- Reduced total job time
Job Folder Auto-Processing
Priority: Low
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Small (1-2 days)
Problem
Currently, each job file must be run individually. For large operations with many batches, it would be useful to:
- Queue multiple jobs
- Process jobs/folder automatically
- Run overnight batches
Proposed Solution
Job Queue System:
# Process all jobs in folder
generate-batch --folder jobs/pending/
# Process and move to completed/
generate-batch --folder jobs/pending/ --move-on-complete jobs/completed/
# Watch folder for new jobs
generate-batch --watch jobs/queue/ --interval 60
Features:
- Process jobs in order (alphabetical or by timestamp)
- Move completed jobs to archive folder
- Skip failed jobs or retry
- Summary report for all jobs
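A sketch of the folder-processing loop; run_job stands in for the existing single-job entry point:

```python
import shutil
from pathlib import Path

def process_job_folder(pending_dir, run_job, completed_dir=None):
    """Run job files in name order; optionally archive the ones that succeed."""
    summary = {"succeeded": [], "failed": []}
    for job_file in sorted(Path(pending_dir).glob("*.json")):
        try:
            run_job(job_file)
        except Exception:
            summary["failed"].append(job_file.name)  # skip and keep going
            continue
        summary["succeeded"].append(job_file.name)
        if completed_dir is not None:
            dest = Path(completed_dir)
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(job_file), str(dest / job_file.name))
    return summary
```

The --watch mode would wrap this loop in a sleep cycle at the configured interval.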
Database Changes:
Add JobRun table to track batch job executions:
- job_file_path
- start_time, end_time
- total_articles, successful, failed
- status (running/completed/failed)
Impact
- Hands-off batch processing
- Better for large-scale operations
- Easier job management
Cost Tracking & Analytics
Priority: Medium
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Medium (2-4 days)
Problem
No visibility into:
- API costs per article/batch
- Which models are most cost-effective
- Cost per tier/quality level
- Budget tracking
Proposed Solution
Track API Usage:
- Log tokens used per API call
- Store in database with cost calculation
- Dashboard showing costs
Cost Fields in GeneratedContent:
- title_tokens_used, title_cost_usd
- outline_tokens_used, outline_cost_usd
- content_tokens_used, content_cost_usd
- total_cost_usd
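With token counts logged per call, the cost fields reduce to a lookup against a price table; the per-1K-token prices below are placeholders, not current provider rates:

```python
# Placeholder prices per 1K tokens; real rates come from the provider.
PRICES_PER_1K = {
    "claude-3.5-sonnet": {"input": 0.003, "output": 0.015},
    "gpt-4o": {"input": 0.0025, "output": 0.01},
}

def call_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call from its token counts."""
    p = PRICES_PER_1K[model]
    return round(
        input_tokens / 1000 * p["input"] + output_tokens / 1000 * p["output"], 6
    )
```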
Analytics Commands:
# Show costs for project
cost-report --project-id 1
# Compare model costs
model-cost-comparison --models claude-3.5-sonnet,gpt-4o
# Budget tracking
cost-summary --date-range 2025-10-01:2025-10-31
Reports:
- Cost per article by tier
- Model efficiency (cost vs quality)
- Daily/weekly/monthly spend
- Budget alerts
Impact
- Cost optimization
- Better budget planning
- Model selection data
- ROI tracking
Model Performance Analytics
Priority: Low
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Medium (3-5 days)
Problem
No data on which models perform best for:
- Different tiers
- Different content types
- Title vs outline vs content generation
- Pass rates and quality scores
Proposed Solution
Performance Tracking:
- Track validation metrics per model
- Generate comparison reports
- Recommend optimal models for scenarios
Metrics:
- First-attempt pass rate
- Average attempts to success
- Augmentation frequency
- Validation score distributions
- Generation time
- Cost per successful article
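Given per-stage validation records in the database, the report reduces to a group-by; the record shape here is illustrative:

```python
from collections import defaultdict

def summarize_model_performance(records):
    """Per (model, stage): first-attempt pass rate and average attempts.

    Each record is a dict like
    {"model": str, "stage": str, "passed_first_try": bool, "attempts": int}.
    """
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["model"], r["stage"])].append(r)
    return {
        key: {
            "pass_rate": sum(r["passed_first_try"] for r in rows) / len(rows),
            "avg_attempts": sum(r["attempts"] for r in rows) / len(rows),
        }
        for key, rows in buckets.items()
    }
```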
Dashboard:
# Model performance report
model-performance --days 30
# Output:
Model: claude-3.5-sonnet
Title: 98% pass rate, 1.02 avg attempts, $0.05 avg cost
Outline: 85% pass rate, 1.35 avg attempts, $0.15 avg cost
Content: 72% pass rate, 1.67 avg attempts, $0.89 avg cost
Model: gpt-4o
...
Recommendations:
- Use claude-3.5-sonnet for titles (best pass rate)
- Use gpt-4o for content (better quality scores)
Impact
- Data-driven model selection
- Optimize quality vs cost
- Identify model strengths/weaknesses
- Better tier-model mapping
Improved Content Augmentation
Priority: Medium
Epic Suggestion: Epic 2 (Content Generation) - Enhancement
Estimated Effort: Medium (3-5 days)
Problem
Current augmentation is basic:
- Random word insertion can break sentence flow
- Doesn't consider context
- Can feel unnatural
- No quality scoring
Proposed Solution
Smarter Augmentation:
- Use AI to rewrite sentences with missing terms
- Analyze sentence structure before insertion
- Add quality scoring for augmented vs original
- User-reviewable augmentation suggestions
Example:
# Instead of: "The process involves machine learning techniques."
# Random insert: "The process involves keyword machine learning techniques."
# Smarter: "The process involves keyword-driven machine learning techniques."
# Or: "The process, focused on keyword optimization, involves machine learning."
Features:
- Context-aware term insertion
- Sentence rewriting option
- A/B comparison (original vs augmented)
- Quality scoring
- Manual review mode
Impact
- More natural augmented content
- Better readability
- Higher quality scores
- User confidence in output
Story 3.1: URL Generation and Site Assignment
Fuzzy Keyword/Entity Matching for Site Assignment
Priority: Medium
Epic Suggestion: Epic 3 (Pre-deployment) - Enhancement
Estimated Effort: Medium (5-8 story points)
Problem
Currently tier1 site assignment uses:
- Explicit preferred sites from job config
- Random selection from available pool
This doesn't leverage semantic matching between article content and site domains/names. For SEO and organizational purposes, it would be valuable to assign articles to sites based on topic/keyword relevance.
Proposed Solution
Intelligent Site Matching:
- Extract article keywords and entities from GeneratedContent
- Parse keywords/entities from site hostnames and names
- Score each (article, site) pair based on keyword/entity overlap
- Assign tier1 articles to highest-scoring available sites
- Fall back to random if no good matches
Example:
Article: "Engine Repair Basics"
Keywords: ["engine repair", "automotive", "maintenance"]
Entities: ["engine", "carburetor", "cylinder"]
Available Sites:
- auto-repair-tips.com Score: 0.85 (high match)
- engine-maintenance-guide.com Score: 0.92 (very high match)
- cooking-recipes.com Score: 0.05 (low match)
Assignment: engine-maintenance-guide.com (best match)
Implementation Details:
- Scoring algorithm: weighted combination of keyword match + entity match
- Fuzzy matching: use Levenshtein distance or similar for partial matches
- Track assignments to avoid reusing sites within same batch
- Configurable threshold (e.g., only assign if score > 0.5, else random)
Job Configuration:
{
  "tier1_site_matching": {
    "enabled": true,
    "min_score": 0.5,
    "weight_keywords": 0.6,
    "weight_entities": 0.4
  }
}
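A sketch of the scoring, using difflib for the fuzzy token comparison (one option; Levenshtein distance would work equally well). The helper names and the 0.8 similarity threshold are illustrative:

```python
import re
from difflib import SequenceMatcher

def _tokens(text):
    """Split a hostname or phrase into lowercase word tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def _fuzzy_overlap(terms, hostname, threshold=0.8):
    """Fraction of terms with a close-enough token match in the hostname."""
    host_tokens = _tokens(hostname)
    if not terms:
        return 0.0
    hits = 0
    for term in terms:
        if any(
            SequenceMatcher(None, word, h).ratio() >= threshold
            for word in _tokens(term)
            for h in host_tokens
        ):
            hits += 1
    return hits / len(terms)

def site_score(keywords, entities, hostname, w_keywords=0.6, w_entities=0.4):
    """Weighted combination of keyword and entity overlap."""
    return (w_keywords * _fuzzy_overlap(keywords, hostname)
            + w_entities * _fuzzy_overlap(entities, hostname))
```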
Database Changes: None required - uses existing GeneratedContent fields (keyword, entities) and SiteDeployment fields (custom_hostname, site_name)
Complexity Factors
- Keyword extraction from domain names (e.g., "auto-repair-tips.com" → ["auto", "repair", "tips"])
- Entity recognition and normalization
- Scoring algorithm design and tuning
- Testing with various domain/content combinations
- Performance optimization for large site pools
Impact
- Better SEO through topical site clustering
- More organized content portfolio
- Easier to identify which sites cover which topics
- Improved content discoverability
Alternative: Simpler Keyword-Only Matching
If full fuzzy matching is too complex, start with exact keyword substring matching:
# Simple version: check if article keyword appears in hostname
if article.main_keyword.lower() in site.custom_hostname.lower():
    score = 1.0
else:
    score = 0.0
This would still provide value with much less complexity (2-3 story points instead of 5-8).
Story 3.3: Content Interlinking Injection
Boilerplate Site Pages (About, Contact, Privacy)
Priority: High
Epic Suggestion: Epic 3 (Pre-deployment) - Story 3.4
Estimated Effort: Medium (20 story points, 2-3 days)
Status: ✅ PROMOTED TO STORY 3.4 (specification complete)
Problem
During Story 3.3 implementation, we added navigation menus to all HTML templates with links to:
- about.html
- contact.html
- privacy.html
- /index.html
However, these pages don't exist, creating broken links on every deployed site.
Impact
- Unprofessional appearance (404 errors on nav links)
- Poor user experience
- Privacy policy may be legally required for public sites
- No contact mechanism for users
Solution (Now Story 3.4)
See full specification: docs/stories/story-3.4-boilerplate-site-pages.md
Summary:
- Automatically generate boilerplate pages for each site during batch generation
- Store in a new site_pages table
- Use same template as articles for visual consistency
- Generic but professional content suitable for any niche
- Generated once per site, skip if already exists
Implementation tracked in Story 3.4.
Future Sections
Add new technical debt items below as they're identified during development.