Technical Debt & Future Enhancements

This document tracks technical debt, future enhancements, and features that were deferred from the MVP.


Story 1.6: Deployment Infrastructure Management

Domain Health Check / Verification Status

Priority: Medium
Epic Suggestion: Epic 4 (Deployment) or Epic 3 (Pre-deployment)
Estimated Effort: Small (1-2 days)

Problem

After importing or provisioning sites, there's no way to verify:

  • Domain ownership is still valid (the user hasn't let the domain expire)
  • DNS configuration is correct and points to bunny.net
  • The custom domain is actually serving content
  • SSL certificates are valid

With 50+ domains, manual checking is impractical.

Proposed Solution

Option 1: Active Health Check

  1. Create a health check file in each Storage Zone (e.g., .health-check.txt)
  2. Periodically attempt to fetch it via the custom domain
  3. Record results in database
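The active check above could be sketched roughly as follows, using only the Python standard library. The function names (`classify_failure`, `check_domain`) are hypothetical, and the mapping of low-level errors onto the proposed health_status values is an assumption, not established behavior:

```python
import socket
import ssl
import urllib.error
import urllib.request

HEALTH_PATH = "/.health-check.txt"  # file provisioned in each Storage Zone

def classify_failure(exc: Exception) -> str:
    """Map a fetch error onto the proposed health_status values (assumed mapping)."""
    # urllib wraps the underlying cause in URLError.reason
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, socket.gaierror):
        return "dns_failure"      # name did not resolve
    if isinstance(reason, ssl.SSLError):
        return "ssl_error"        # certificate problem
    return "unreachable"          # timeout, refused connection, HTTP error

def check_domain(domain: str, timeout: float = 10.0) -> str:
    """Fetch the health-check file via the custom domain and classify the result."""
    try:
        with urllib.request.urlopen(f"https://{domain}{HEALTH_PATH}",
                                    timeout=timeout) as resp:
            return "healthy" if resp.status == 200 else "unreachable"
    except Exception as exc:
        return classify_failure(exc)
```

A scheduler (cron or the proposed check-all-sites-health command) would call check_domain per deployment and persist the result.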

Option 2: Use bunny.net API

  • Check if bunny.net exposes domain verification status via API
  • Query verification status for each custom hostname

Database Changes: Add a health_status field to the SiteDeployment table:

  • unknown - Not yet checked
  • healthy - Domain resolving and serving content
  • dns_failure - Cannot resolve domain
  • ssl_error - Certificate issues
  • unreachable - Domain not responding
  • expired - Likely domain ownership lost

Add last_health_check timestamp field.
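Assuming a SQLite-backed store (the actual backend isn't specified here), the schema change could be a one-off migration along these lines:

```python
import sqlite3

# The six proposed health_status values
HEALTH_STATUSES = {"unknown", "healthy", "dns_failure",
                   "ssl_error", "unreachable", "expired"}

def migrate(conn: sqlite3.Connection) -> None:
    """Add the two proposed columns to SiteDeployment (idempotence not handled)."""
    conn.execute(
        "ALTER TABLE SiteDeployment ADD COLUMN health_status TEXT "
        "NOT NULL DEFAULT 'unknown'"
    )
    conn.execute(
        "ALTER TABLE SiteDeployment ADD COLUMN last_health_check TIMESTAMP"
    )
    conn.commit()
```

Existing rows pick up health_status = 'unknown' via the column default, matching the "Not yet checked" state above.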

CLI Commands

# Check single domain
check-site-health --domain www.example.com

# Check all domains
check-all-sites-health

# List unhealthy sites
list-sites --status unhealthy

Use Cases

  • Automated monitoring to detect when domains expire
  • Pre-deployment validation before pushing new content
  • Dashboard showing health of entire portfolio
  • Alert system for broken domains

Impact

  • Prevents wasted effort deploying to expired domains
  • Early detection of DNS/SSL issues
  • Better operational visibility across large domain portfolios

Story 2.3: AI-Powered Content Generation

Prompt Template A/B Testing & Optimization

Priority: Medium
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Medium (3-5 days)

Problem

Content quality and AI compliance with CORA targets vary with prompt wording. There is currently no systematic way to:

  • Test different prompt variations
  • Compare results objectively
  • Select optimal prompts for different scenarios
  • Track which prompts work best with which models

Proposed Solution

Prompt Versioning System:

  1. Support multiple versions of each prompt template
  2. Name prompts with version suffix (e.g., title_generation_v1.json, title_generation_v2.json)
  3. Job config specifies which prompt version to use per stage
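The versioning scheme above could be resolved at job-load time roughly like this. The job-config key prompt_versions and the stage names are assumptions about the eventual config shape:

```python
import json
from pathlib import Path

PROMPT_DIR = Path("prompts")  # assumed layout: prompts/title_generation_v2.json

STAGES = ("title", "outline", "content")

def resolve_versions(job_config: dict) -> dict:
    """Pick the prompt version per stage from the job config, defaulting to v1."""
    versions = job_config.get("prompt_versions", {})
    return {stage: versions.get(stage, "v1") for stage in STAGES}

def load_prompt(stage: str, version: str) -> dict:
    """Load one versioned prompt template, e.g. title_generation_v2.json."""
    path = PROMPT_DIR / f"{stage}_generation_{version}.json"
    return json.loads(path.read_text())
```

A job config could then say `"prompt_versions": {"title": "v2"}` to A/B a new title prompt while leaving the other stages on v1.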

Comparison Tool:

# Generate with multiple prompt versions
compare-prompts --project-id 1 --variants v1,v2,v3 --stages title,outline

# Outputs:
# - Side-by-side content comparison
# - Validation scores
# - Augmentation requirements
# - Generation time/cost
# - Recommendation

Metrics to Track:

  • Validation pass rate
  • Augmentation frequency
  • Average attempts per stage
  • Word count variance
  • Keyword density accuracy
  • Generation time
  • API cost

Database Changes: Add prompt_version fields to GeneratedContent:

  • title_prompt_version
  • outline_prompt_version
  • content_prompt_version

Impact

  • Higher quality content
  • Reduced augmentation needs
  • Lower API costs
  • Model-specific optimizations
  • Data-driven prompt improvements

Parallel Article Generation

Priority: Low
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Medium (3-5 days)

Problem

Articles are generated sequentially, which is slow for large batches:

  • 15 tier 1 articles: ~10-20 minutes
  • 150 tier 2 articles: ~2-3 hours

This could be parallelized since articles are independent.

Proposed Solution

Multi-threading/Multi-processing:

  1. Add --parallel N flag to generate-batch command
  2. Process N articles simultaneously
  3. Share database session pool
  4. Rate limit API calls to avoid throttling
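The four steps above can be sketched with the standard library's thread pool. The rate-limiter interval and the generate_one callback are illustrative assumptions; real throttling would have to follow OpenRouter's published limits:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

class RateLimiter:
    """Minimal fixed-interval limiter shared across worker threads."""
    def __init__(self, calls_per_second: float):
        self._interval = 1.0 / calls_per_second
        self._lock = threading.Lock()
        self._next_at = 0.0

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            sleep_for = max(0.0, self._next_at - now)
            self._next_at = max(now, self._next_at) + self._interval
        if sleep_for:
            time.sleep(sleep_for)

def generate_parallel(articles, generate_one, parallel=4, calls_per_second=2.0):
    """Run generate_one(article) for independent articles, N at a time."""
    limiter = RateLimiter(calls_per_second)
    results, errors = [], []

    def worker(article):
        limiter.wait()                # throttle before each API-bound call
        return generate_one(article)

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(worker, a): a for a in articles}
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception as exc:  # isolate per-article failures
                errors.append((futures[fut], exc))
    return results, errors
```

Because each article's generation is independent, failures are collected per article rather than aborting the whole batch.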

Considerations:

  • Database connection pooling
  • OpenRouter rate limits
  • Memory usage (N concurrent AI calls)
  • Progress tracking complexity
  • Error handling across threads

Example:

# Generate 4 articles in parallel
generate-batch -j job.json --parallel 4

Impact

  • 3-4x faster for large batches
  • Better resource utilization
  • Reduced total job time

Job Folder Auto-Processing

Priority: Low
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Small (1-2 days)

Problem

Currently, each job file must be run individually. For large operations with many batches, we want to:

  • Queue multiple jobs
  • Process jobs/folder automatically
  • Run overnight batches

Proposed Solution

Job Queue System:

# Process all jobs in folder
generate-batch --folder jobs/pending/

# Process and move to completed/
generate-batch --folder jobs/pending/ --move-on-complete jobs/completed/

# Watch folder for new jobs
generate-batch --watch jobs/queue/ --interval 60

Features:

  • Process jobs in order (alphabetical or by timestamp)
  • Move completed jobs to archive folder
  • Skip failed jobs or retry
  • Summary report for all jobs

Database Changes: Add JobRun table to track batch job executions:

  • job_file_path
  • start_time, end_time
  • total_articles, successful, failed
  • status (running/completed/failed)

Impact

  • Hands-off batch processing
  • Better for large-scale operations
  • Easier job management

Cost Tracking & Analytics

Priority: Medium
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Medium (2-4 days)

Problem

No visibility into:

  • API costs per article/batch
  • Which models are most cost-effective
  • Cost per tier/quality level
  • Budget tracking

Proposed Solution

Track API Usage:

  1. Log tokens used per API call
  2. Store in database with cost calculation
  3. Dashboard showing costs
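The cost calculation in step 2 reduces to a per-token multiplication. The prices below are illustrative placeholders, not authoritative rates; in practice they would come from OpenRouter's model metadata:

```python
# Illustrative per-million-token prices (input vs output); real rates
# should be pulled from the provider, not hard-coded.
PRICES_USD_PER_MTOK = {
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute the USD cost of one API call from its token counts."""
    price = PRICES_USD_PER_MTOK[model]
    return (input_tokens * price["input"] +
            output_tokens * price["output"]) / 1_000_000
```

Summing call_cost over the title, outline, and content stages yields the proposed total_cost_usd field.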

Cost Fields in GeneratedContent:

  • title_tokens_used
  • title_cost_usd
  • outline_tokens_used
  • outline_cost_usd
  • content_tokens_used
  • content_cost_usd
  • total_cost_usd

Analytics Commands:

# Show costs for project
cost-report --project-id 1

# Compare model costs
model-cost-comparison --models claude-3.5-sonnet,gpt-4o

# Budget tracking
cost-summary --date-range 2025-10-01:2025-10-31

Reports:

  • Cost per article by tier
  • Model efficiency (cost vs quality)
  • Daily/weekly/monthly spend
  • Budget alerts

Impact

  • Cost optimization
  • Better budget planning
  • Model selection data
  • ROI tracking

Model Performance Analytics

Priority: Low
Epic Suggestion: Epic 2 (Content Generation) - Post-MVP
Estimated Effort: Medium (3-5 days)

Problem

No data on which models perform best for:

  • Different tiers
  • Different content types
  • Title vs outline vs content generation
  • Pass rates and quality scores

Proposed Solution

Performance Tracking:

  1. Track validation metrics per model
  2. Generate comparison reports
  3. Recommend optimal models for scenarios

Metrics:

  • First-attempt pass rate
  • Average attempts to success
  • Augmentation frequency
  • Validation score distributions
  • Generation time
  • Cost per successful article
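Aggregating these metrics from per-call records could look like the sketch below. The record shape is an assumption about what the generation pipeline would log:

```python
from collections import defaultdict
from statistics import mean

def summarize(records):
    """Aggregate per-(model, stage) metrics from generation records.

    Each record is assumed to look like:
      {"model": str, "stage": str, "attempts": int, "cost_usd": float}
    First-attempt pass rate = fraction of records with attempts == 1.
    """
    grouped = defaultdict(list)
    for r in records:
        grouped[(r["model"], r["stage"])].append(r)

    report = {}
    for key, rows in grouped.items():
        report[key] = {
            "pass_rate": sum(r["attempts"] == 1 for r in rows) / len(rows),
            "avg_attempts": mean(r["attempts"] for r in rows),
            "avg_cost": mean(r["cost_usd"] for r in rows),
        }
    return report
```

The model-performance report above is then just a formatted dump of this dictionary, filtered to the requested date range.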

Dashboard:

# Model performance report
model-performance --days 30

# Output:
Model: claude-3.5-sonnet
  Title: 98% pass rate, 1.02 avg attempts, $0.05 avg cost
  Outline: 85% pass rate, 1.35 avg attempts, $0.15 avg cost  
  Content: 72% pass rate, 1.67 avg attempts, $0.89 avg cost
  
Model: gpt-4o
  ...
  
Recommendations:
- Use claude-3.5-sonnet for titles (best pass rate)
- Use gpt-4o for content (better quality scores)

Impact

  • Data-driven model selection
  • Optimize quality vs cost
  • Identify model strengths/weaknesses
  • Better tier-model mapping

Improved Content Augmentation

Priority: Medium
Epic Suggestion: Epic 2 (Content Generation) - Enhancement
Estimated Effort: Medium (3-5 days)

Problem

Current augmentation is basic:

  • Random word insertion can break sentence flow
  • Doesn't consider context
  • Can feel unnatural
  • No quality scoring

Proposed Solution

Smarter Augmentation:

  1. Use AI to rewrite sentences with missing terms
  2. Analyze sentence structure before insertion
  3. Add quality scoring for augmented vs original
  4. User-reviewable augmentation suggestions

Example:

# Instead of: "The process involves machine learning techniques."
# Random insert: "The process involves keyword machine learning techniques."

# Smarter: "The process involves keyword-driven machine learning techniques."
# Or: "The process, focused on keyword optimization, involves machine learning."
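The AI-rewrite approach (step 1 above) mostly comes down to prompt construction; a minimal, hypothetical builder might be:

```python
def build_rewrite_prompt(sentence: str, missing_terms: list[str]) -> str:
    """Ask the model to weave missing CORA terms into a sentence naturally.

    Hypothetical helper; the exact instruction wording would be tuned
    via the prompt-versioning system proposed earlier in this document.
    """
    terms = ", ".join(f'"{t}"' for t in missing_terms)
    return (
        "Rewrite the following sentence so it naturally includes the terms "
        f"{terms}, preserving its original meaning and tone. "
        "Return only the rewritten sentence.\n\n"
        f"Sentence: {sentence}"
    )
```

The rewritten sentence would then be diffed against the original for the A/B comparison and quality scoring listed below.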

Features:

  • Context-aware term insertion
  • Sentence rewriting option
  • A/B comparison (original vs augmented)
  • Quality scoring
  • Manual review mode

Impact

  • More natural augmented content
  • Better readability
  • Higher quality scores
  • User confidence in output

Future Sections

Add new technical debt items below as they're identified during development.