Story 3.4: Generate Boilerplate Site Pages

Status

QA COMPLETE - Ready for Production

Story

As a developer, I want to automatically generate boilerplate about.html, contact.html, and privacy.html pages for each site in my batch, so that the navigation menu links from Story 3.3 work and the sites appear complete.

Context

  • Story 3.3 added navigation menus to all HTML templates with links to:
    • /index.html (homepage)
    • about.html (about page)
    • privacy.html (privacy policy)
    • contact.html (contact page)
  • Currently, these pages don't exist, creating broken links
  • Each site needs its own set of these pages
  • Pages should use the same template as the articles (basic/modern/classic/minimal)
  • Content should be generic but professional enough for a real site
  • Privacy policy needs to be comprehensive and legally sound (generic template)

Acceptance Criteria

Core Functionality

  • A function generates the three boilerplate pages for a given site
  • Pages are created AFTER articles are generated but BEFORE deployment
  • Each page uses the same template as the articles for that site
  • Pages are stored in the database for deployment
  • Pages are associated with the correct site (via site_deployment_id)

Page Content Requirements

About Page (about.html)

  • Empty page with just the template applied
  • No content text required (just template navigation/structure)
  • User can add content later if needed

Contact Page (contact.html)

  • Empty page with just the template applied
  • No content text required (just template navigation/structure)
  • User can add content later if needed

Privacy Policy (privacy.html)

  • Option 1 (Minimal): Empty page like about/contact
  • No content text required (just template navigation/structure)
  • User can add content later if needed

Decision: Start with Option 1 for all three pages: no body text, only the template plus a page heading (see Task 3). Privacy policy content can be added later via a backfill update or manual edit if needed.

Template Integration

  • Use same template engine as article content (src/templating/service.py)
  • Read template from site.template_name field in database
  • Pages use same template as articles on the same site (consistent look)
  • Include navigation menu (which will link to these same pages)

Database Storage

  • Create new site_pages table (clean separation from articles):
    • id, site_deployment_id, page_type, content, created_at, updated_at
    • Foreign key to site_deployments with CASCADE delete
    • Unique constraint on (site_deployment_id, page_type)
    • Indexes on site_deployment_id and page_type
  • Each site can have one of each page type (about, contact, privacy)
  • Pages are fundamentally different from articles, deserve own table

URL Generation

  • Pages use simple filenames: about.html, contact.html, privacy.html
  • Full URLs: https://{hostname}/about.html
  • No slug generation needed (fixed filenames)
  • Pages tracked separately from article URLs
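
Because the filenames are fixed, URL construction reduces to a lookup plus string formatting. The following is a minimal sketch of that rule; the names `PAGE_FILENAMES` and `build_page_url` are hypothetical and do not appear elsewhere in this story.

```python
# Hypothetical helper illustrating the fixed-filename URL rule.
PAGE_FILENAMES = {
    "about": "about.html",
    "contact": "contact.html",
    "privacy": "privacy.html",
}


def build_page_url(hostname: str, page_type: str) -> str:
    """Return the public URL for a boilerplate page on the given hostname."""
    try:
        filename = PAGE_FILENAMES[page_type]
    except KeyError:
        # Reject unknown page types instead of silently building a bad URL.
        raise ValueError(f"Unknown page type: {page_type!r}") from None
    return f"https://{hostname}/{filename}"
```

For example, `build_page_url("example.com", "about")` yields `https://example.com/about.html`, matching the pattern above.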

Integration Point

  • Hook into site creation in src/generation/site_provisioning.py rather than src/generation/batch_processor.py (see Task 5)
  • After site assignment (Story 3.1) and before deployment (Epic 4)
  • Generate pages ONLY for newly created sites (not existing sites)
  • One-time backfill script to add pages to all existing imported sites

Two Use Cases

  1. One-time backfill: Script to generate pages for all existing sites in database (hundreds of sites)
  2. Ongoing generation: Automatically generate pages only when new sites are created (provision-site, auto_create_sites, etc.)

Tasks / Subtasks

1. Create SitePage Database Table

Effort: 2 story points

  • Create new site_pages table with schema:
    • id, site_deployment_id, page_type, content, created_at, updated_at
  • Add SitePage model to src/database/models.py
  • Create migration script scripts/migrate_add_site_pages.py
  • Add unique constraint on (site_deployment_id, page_type)
  • Add indexes on site_deployment_id and page_type
  • Add CASCADE delete (if site deleted, pages deleted)
  • Test migration on development database

2. Create SitePage Repository

Effort: 2 story points

  • Create ISitePageRepository interface in src/database/interfaces.py:
    • create(site_deployment_id, page_type, content) -> SitePage
    • get_by_site(site_deployment_id) -> List[SitePage]
    • get_by_site_and_type(site_deployment_id, page_type) -> Optional[SitePage]
    • update_content(page_id, content) -> SitePage
    • exists(site_deployment_id, page_type) -> bool
    • delete(page_id) -> bool
  • Implement SitePageRepository in src/database/repositories.py
  • Add to repository factory/dependency injection
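
The interface listed above could look roughly like the sketch below. It is only an illustration: the `SitePage` dataclass is a stand-in for the ORM model, and `InMemorySitePageRepository` is a hypothetical test double (not part of the planned code) that mimics the unique (site_deployment_id, page_type) rule so unit tests can run without a database.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SitePage:
    """Stand-in for the ORM model; only the fields the repository touches."""
    id: int
    site_deployment_id: int
    page_type: str
    content: str


class ISitePageRepository(ABC):
    @abstractmethod
    def create(self, site_deployment_id: int, page_type: str, content: str) -> SitePage: ...

    @abstractmethod
    def get_by_site(self, site_deployment_id: int) -> List[SitePage]: ...

    @abstractmethod
    def get_by_site_and_type(self, site_deployment_id: int, page_type: str) -> Optional[SitePage]: ...

    @abstractmethod
    def update_content(self, page_id: int, content: str) -> SitePage: ...

    @abstractmethod
    def exists(self, site_deployment_id: int, page_type: str) -> bool: ...

    @abstractmethod
    def delete(self, page_id: int) -> bool: ...


class InMemorySitePageRepository(ISitePageRepository):
    """Hypothetical test double that enforces one page per (site, page_type)."""

    def __init__(self) -> None:
        self._pages: Dict[int, SitePage] = {}
        self._next_id = 1

    def create(self, site_deployment_id, page_type, content):
        if self.exists(site_deployment_id, page_type):
            raise ValueError(f"{page_type!r} page already exists for site {site_deployment_id}")
        page = SitePage(self._next_id, site_deployment_id, page_type, content)
        self._pages[page.id] = page
        self._next_id += 1
        return page

    def get_by_site(self, site_deployment_id):
        return [p for p in self._pages.values() if p.site_deployment_id == site_deployment_id]

    def get_by_site_and_type(self, site_deployment_id, page_type):
        for page in self._pages.values():
            if page.site_deployment_id == site_deployment_id and page.page_type == page_type:
                return page
        return None

    def update_content(self, page_id, content):
        page = self._pages[page_id]  # raises KeyError if the page does not exist
        page.content = content
        return page

    def exists(self, site_deployment_id, page_type):
        return self.get_by_site_and_type(site_deployment_id, page_type) is not None

    def delete(self, page_id):
        return self._pages.pop(page_id, None) is not None
```

Raising `ValueError` on a duplicate is a choice made for this sketch; the database-backed repository would more likely surface the integrity error from the unique constraint.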

3. Create Page Content Templates (SIMPLIFIED)

Effort: 1 story point (reduced from 3)

  • Create src/generation/page_templates.py module
  • Implement get_page_content(page_type: str, domain: str) -> str:
    • Returns just a heading: <h1>About Us</h1>, <h1>Contact</h1>, <h1>Privacy Policy</h1>
    • All three pages use same heading-only approach
    • No other content text
  • No need for extensive content generation
  • Pages are just placeholders until user adds content manually

4. Implement Page Generation Logic (SIMPLIFIED)

Effort: 2 story points (reduced from 3)

  • Create src/generation/site_page_generator.py module
  • Implement generate_site_pages(site_deployment: SiteDeployment, template_name: str, page_repo, template_service) -> List[SitePage]:
    • Get domain from site (custom_hostname, falling back to pull_zone_bcdn_hostname)
    • For each page type (about, contact, privacy):
      • Get heading-only content from page_templates.py
      • Wrap heading in HTML template using template_service
      • Store page in database
    • Return list of created pages
  • Pages have just heading (e.g., <h1>About Us</h1>) wrapped in template
  • Log page generation at INFO level
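
The steps above can be sketched as follows. This is a sketch under assumptions, not the final implementation: the keyword arguments passed to `format_content()` are guessed (this story does not state that method's signature), and the arguments are typed as `Any` so the example stays self-contained.

```python
import logging
from typing import Any, List

logger = logging.getLogger(__name__)

# Page types and their headings, as listed in Task 3.
PAGE_TITLES = {"about": "About Us", "contact": "Contact", "privacy": "Privacy Policy"}


def generate_site_pages(site_deployment: Any, template_name: str,
                        page_repo: Any, template_service: Any) -> List[Any]:
    """Create heading-only about/contact/privacy pages for one site."""
    # Prefer the custom hostname, fall back to the bunny.net hostname.
    domain = site_deployment.custom_hostname or site_deployment.pull_zone_bcdn_hostname
    created = []
    for page_type, title in PAGE_TITLES.items():
        heading = f"<h1>{title}</h1>"
        # ASSUMPTION: the real format_content() parameters may differ.
        html = template_service.format_content(
            content=heading, title=title, template_name=template_name
        )
        created.append(page_repo.create(site_deployment.id, page_type, html))
    logger.info("Generated %d boilerplate pages for site %s (%s)",
                len(created), site_deployment.id, domain)
    return created
```

The function creates pages unconditionally, so calling it for a site that already has one of the three pages would trip the unique constraint. Callers that may see partially populated sites should check `exists()` first.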

5. Integrate with Site Creation (Not Batch Processor)

Effort: 2 story points

  • Update src/generation/site_provisioning.py:
    • After creating new site via bunny.net API, generate boilerplate pages
    • Call generate_site_pages() immediately after site creation
    • Log page generation results
  • Update provision-site CLI command:
    • Generate pages after site is provisioned
  • Handle errors gracefully (log warning if page generation fails, continue with site creation)
  • DO NOT generate pages in batch processor (only for new sites, not existing sites)

6. Update Template Service (No Changes Needed)

Effort: 0 story points

  • Template service already handles simple content
  • Just pass heading HTML through existing format_content() method
  • No changes needed to template service

7. Create Backfill Script for Existing Sites

Effort: 2 story points

  • Create scripts/backfill_site_pages.py:
    • Query all sites in database that don't have pages
    • For each site: generate about, contact, privacy pages
    • Use default template (or infer from site name if possible)
    • Progress reporting (e.g., "Generating pages for site 50/400...")
    • Dry-run mode to preview changes
    • CLI arguments: --dry-run, --template, --batch-size
  • Add error handling for individual site failures (continue with next site)
  • Log results: successful, failed, skipped counts
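
A possible shape for the argument parsing, assuming the standard-library `argparse` module is used (this story does not name a CLI library). The defaults mirror values used elsewhere in this document (`basic` template, batch size of 100); the `--username` and `--password` flags shown in the usage examples are omitted here for brevity.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the backfill script's command-line flags."""
    parser = argparse.ArgumentParser(
        description="Generate boilerplate pages for existing sites."
    )
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview changes without writing any pages")
    parser.add_argument("--template", default="basic",
                        choices=["basic", "modern", "classic", "minimal"],
                        help="Template applied to every generated page")
    parser.add_argument("--batch-size", type=int, default=100,
                        help="Log a progress checkpoint every N sites")
    return parser.parse_args(argv)
```

Passing `argv` explicitly keeps the parser testable; for example, `parse_args(["--dry-run"])` returns a namespace with `dry_run=True` and the other defaults intact.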

8. Homepage Generation (Optional - Deferred)

Effort: 2 story points (if implemented)

  • DEFER to Epic 4 or later
  • Homepage (index.html) requires knowing all articles on the site
  • Not needed for Story 3.4 (navigation menu links to /index.html can 404 for now)
  • Document in technical notes

9. Unit Tests (SIMPLIFIED)

Effort: 2 story points (reduced from 3)

  • Test heading-only page content generation
  • Test domain extraction from SiteDeployment (custom vs bcdn hostname)
  • Test page HTML wrapping with each template type
  • Test SitePage repository CRUD operations
  • Test duplicate page prevention (unique constraint)
  • Test page generation for single site
  • Test backfill script logic
  • Mock template service and repositories
  • Achieve >80% code coverage for new modules

10. Integration Tests (SIMPLIFIED)

Effort: 1 story point (reduced from 2)

  • Test site creation triggers page generation
  • Test with different template types (basic, modern, classic, minimal)
  • Test with custom domain sites vs bunny.net-only sites
  • Test pages stored correctly in database
  • Test backfill script on real database
  • Verify navigation menu links work (pages exist at expected paths)

Technical Notes

SitePage Model

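# Imports assumed for this snippet; Base is the project's existing declarative base
from datetime import datetime

from sqlalchemy import DateTime, ForeignKey, Integer, String, Text, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column, relationship


class SitePage(Base):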
    __tablename__ = "site_pages"
    
    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    site_deployment_id: Mapped[int] = mapped_column(
        Integer, 
        ForeignKey('site_deployments.id', ondelete='CASCADE'), 
        nullable=False
    )
    page_type: Mapped[str] = mapped_column(String(20), nullable=False)  # about, contact, privacy, homepage
    content: Mapped[str] = mapped_column(Text, nullable=False)  # Full HTML
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime, 
        default=datetime.utcnow, 
        onupdate=datetime.utcnow, 
        nullable=False
    )
    
    # Relationships
    site_deployment: Mapped["SiteDeployment"] = relationship("SiteDeployment", back_populates="pages")
    
    # Unique constraint
    __table_args__ = (
        UniqueConstraint('site_deployment_id', 'page_type', name='uq_site_page_type'),
    )

Database Migration

CREATE TABLE site_pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_deployment_id INTEGER NOT NULL,
    page_type VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (site_deployment_id) REFERENCES site_deployments(id) ON DELETE CASCADE,
    UNIQUE (site_deployment_id, page_type)
);

CREATE INDEX idx_site_pages_site ON site_pages(site_deployment_id);
CREATE INDEX idx_site_pages_type ON site_pages(page_type);
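
The two constraints can be checked quickly against an in-memory database. The sketch below assumes SQLite (which the AUTOINCREMENT syntax suggests) and uses a trimmed copy of the schema with a one-column stand-in for site_deployments. Note that SQLite only enforces ON DELETE CASCADE when foreign keys are enabled on the connection.

```python
import sqlite3

# Trimmed schema: timestamps and indexes are left out to keep the demo short.
SCHEMA = """
CREATE TABLE site_deployments (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE site_pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_deployment_id INTEGER NOT NULL,
    page_type VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    FOREIGN KEY (site_deployment_id) REFERENCES site_deployments(id) ON DELETE CASCADE,
    UNIQUE (site_deployment_id, page_type)
);
"""
INSERT_PAGE = "INSERT INTO site_pages (site_deployment_id, page_type, content) VALUES (?, ?, ?)"

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # without this, SQLite ignores the cascade
conn.executescript(SCHEMA)
conn.execute("INSERT INTO site_deployments (id) VALUES (1)")
conn.execute(INSERT_PAGE, (1, "about", "<h1>About Us</h1>"))

# A second 'about' page for the same site violates the unique constraint.
try:
    conn.execute(INSERT_PAGE, (1, "about", "<h1>Duplicate</h1>"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting the site should remove its pages through the cascade.
conn.execute("DELETE FROM site_deployments WHERE id = 1")
remaining_pages = conn.execute("SELECT COUNT(*) FROM site_pages").fetchone()[0]
```

If the project connects through SQLAlchemy, the same pragma has to be issued for each new connection, otherwise deleting a site would leave orphaned pages behind.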

Page Content Template Examples (SIMPLIFIED)

Implementation - Heading Only

# src/generation/page_templates.py

def get_page_content(page_type: str, domain: str) -> str:
    """
    Generate minimal content for boilerplate pages.
    Just a heading - no other content text.
    """
    page_titles = {
        "about": "About Us",
        "contact": "Contact",
        "privacy": "Privacy Policy"
    }
    return f"<h1>{page_titles.get(page_type, page_type.title())}</h1>"

Result - Heading Only Example

<!DOCTYPE html>
<html>
<head>
    <title>About Us</title>
    <!-- Same template/styling as articles -->
</head>
<body>
    <nav>
        <ul>
            <li><a href="/index.html">Home</a></li>
            <li><a href="about.html">About</a></li>
            <li><a href="privacy.html">Privacy</a></li>
            <li><a href="contact.html">Contact</a></li>
        </ul>
    </nav>
    
    <main>
        <h1>About Us</h1>
        <!-- No other content - user can add later if needed -->
    </main>
</body>
</html>

Why Heading-Only Pages Work

  1. Fixes broken nav links - Pages exist, no 404 errors
  2. Better UX than completely empty - User sees something when they click the link
  3. User can customize - Add content manually later for specific sites
  4. Minimal effort - No need to generate/maintain generic content
  5. Deployment ready - Pages can be deployed as-is
  6. Future enhancement - Can add content generation later if needed

Integration with Site Creation

# In src/generation/site_provisioning.py

def create_bunnynet_site(name_prefix: str, region: str = "DE", template: str = "basic"):
    # Step 1: Create Storage Zone
    storage = bunny_client.create_storage_zone(...)
    
    # Step 2: Create Pull Zone
    pull = bunny_client.create_pull_zone(...)
    
    # Step 3: Save to database
    site = site_repo.create(...)
    
    # Step 4: Generate boilerplate pages (NEW - Story 3.4)
    logger.info(f"Generating boilerplate pages for new site {site.id}...")
    try:
        generate_site_pages(site, template, page_repo, template_service)
        logger.info(f"Successfully created about, contact, privacy pages for site {site.id}")
    except Exception as e:
        logger.warning(f"Failed to generate pages for site {site.id}: {e}")
        # Don't fail site creation if page generation fails
    
    return site

Backfill Script Usage

# One-time backfill for all existing sites (dry-run first)
uv run python scripts/backfill_site_pages.py \
  --username admin \
  --password yourpass \
  --template basic \
  --dry-run

# Output:
# Found 423 sites without boilerplate pages
# [DRY RUN] Would generate pages for site 1 (www.example.com)
# [DRY RUN] Would generate pages for site 2 (site123.b-cdn.net)
# ...
# [DRY RUN] Total: 423 sites would be updated

# Actually generate pages
uv run python scripts/backfill_site_pages.py \
  --username admin \
  --password yourpass \
  --template basic

# Output:
# Generating pages for site 1/423 (www.example.com)... ✓
# Generating pages for site 2/423 (site123.b-cdn.net)... ✓
# ...
# Complete: 423 successful, 0 failed, 0 skipped

# Use a different template for every site in the run (default: basic)
uv run python scripts/backfill_site_pages.py \
  --username admin \
  --password yourpass \
  --template modern \
  --batch-size 50  # Log a progress checkpoint every 50 sites

Page URL Structure

Homepage:       https://example.com/index.html
About:          https://example.com/about.html
Contact:        https://example.com/contact.html
Privacy:        https://example.com/privacy.html
Article 1:      https://example.com/how-to-fix-engines.html
Article 2:      https://example.com/engine-maintenance-tips.html

Template Application Example

# For articles (existing)
template_service.apply_template(
    content=article.content,
    template_name="modern",
    title=article.title,
    meta_description=article.meta_description,
    url=article_url
)

# For pages (new)
template_service.apply_template_to_page(
    content=page_content,  # Heading-only HTML from page_templates.py
    template_name="modern",
    page_title="About Us",  # Static title
    domain=site.custom_hostname or site.pull_zone_bcdn_hostname
)

Backfill Script Implementation

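# scripts/backfill_site_pages.py

import logging

from src.generation.site_page_generator import generate_site_pages

logger = logging.getLogger(__name__)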

def backfill_site_pages(
    page_repo, 
    site_repo, 
    template_service, 
    template: str = "basic",
    dry_run: bool = False,
    batch_size: int = 100
):
    """Generate boilerplate pages for all sites that don't have them"""
    
    # Get all sites
    all_sites = site_repo.get_all()
    logger.info(f"Found {len(all_sites)} total sites in database")
    
    # Filter to sites without pages
    sites_needing_pages = []
    for site in all_sites:
        existing_pages = page_repo.get_by_site(site.id)
        if len(existing_pages) < 3:  # Should have about, contact, privacy
            sites_needing_pages.append(site)
    
    logger.info(f"Found {len(sites_needing_pages)} sites without boilerplate pages")
    
    if dry_run:
        for site in sites_needing_pages:
            domain = site.custom_hostname or site.pull_zone_bcdn_hostname
            logger.info(f"[DRY RUN] Would generate pages for site {site.id} ({domain})")
        logger.info(f"[DRY RUN] Total: {len(sites_needing_pages)} sites would be updated")
        return
    
    # Generate pages for each site
    successful = 0
    failed = 0
    
    for idx, site in enumerate(sites_needing_pages, 1):
        domain = site.custom_hostname or site.pull_zone_bcdn_hostname
        logger.info(f"Generating pages for site {idx}/{len(sites_needing_pages)} ({domain})...")
        
        try:
            generate_site_pages(site, template, page_repo, template_service)
            successful += 1
        except Exception as e:
            logger.error(f"Failed to generate pages for site {site.id}: {e}")
            failed += 1
        
        # Progress checkpoint every batch_size sites
        if idx % batch_size == 0:
            logger.info(f"Progress: {idx}/{len(sites_needing_pages)} sites processed")
    
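    skipped = len(all_sites) - len(sites_needing_pages)  # sites that already had all pages
    logger.info(f"Complete: {successful} successful, {failed} failed, {skipped} skipped")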
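
One caveat with the `len(existing_pages) < 3` filter: it also selects sites that already hold one or two pages, and it miscounts if another page type (such as homepage) is stored. If `generate_site_pages()` creates all three pages unconditionally, such sites would hit the unique constraint. A hedged alternative is to compute exactly which page types are missing; `REQUIRED_PAGE_TYPES` and `missing_page_types` below are hypothetical names used only for illustration.

```python
from typing import Iterable, List

# The three boilerplate page types this story covers.
REQUIRED_PAGE_TYPES = ("about", "contact", "privacy")


def missing_page_types(existing_page_types: Iterable[str]) -> List[str]:
    """Return the required page types not yet present, in a stable order."""
    present = set(existing_page_types)
    return [page_type for page_type in REQUIRED_PAGE_TYPES if page_type not in present]
```

In the backfill loop this could be called as `missing_page_types(p.page_type for p in page_repo.get_by_site(site.id))`, generating only the returned types and treating an empty result as a skipped site.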

Domain Extraction

def get_domain_from_site(site_deployment: SiteDeployment) -> str:
    """Extract domain for use in page content (email addresses, etc.)"""
    if site_deployment.custom_hostname:
        return site_deployment.custom_hostname
    else:
        return site_deployment.pull_zone_bcdn_hostname

If privacy policy content is added later (the current decision is heading-only pages), the template should be:

  • Generic enough to apply to blog/content sites
  • Comprehensive enough to cover common scenarios (cookies, analytics, third-party links)
  • NOT legal advice - users should consult a lawyer for specific requirements
  • Include standard disclaimers
  • Regularly reviewed and updated (document version/date)

Recommended approach: Use a well-tested generic template from a reputable source (e.g., Privacy Policy Generator) and adapt it to fit our template structure.

Dependencies

  • Story 3.1: Site assignment must be complete (need to know which sites are in use)
  • Story 3.3: Navigation menu is already in templates (pages fulfill those links)
  • Story 2.4: Template service exists and can apply HTML templates
  • Story 1.6: SiteDeployment table exists

Future Considerations

  • Story 4.1 will deploy these pages along with articles
  • Future: Custom page content per project (override generic templates)
  • Future: Homepage generation with dynamic article listing
  • Future: Allow users to edit boilerplate page content via CLI or web interface
  • Future: Additional pages (terms of service, disclaimer, etc.)
  • Future: Page templates with more customization options (site name, tagline, etc.)

Deferred to Later

  • Homepage (index.html) generation - Could be part of this story or deferred to Epic 4
    • If generated here: Simple page listing all articles on the site
    • If deferred: Epic 4 deployment could create a basic redirect or placeholder
  • Custom page content per project - Allow projects to override default templates
  • Multi-language support - Generate pages in different languages based on project settings

Total Effort

14 story points (reduced from 20 due to heading-only simplification and no template service changes)

Effort Breakdown

  1. Database Schema (2 points) - site_pages table only
  2. Repository Layer (2 points) - SitePageRepository
  3. Page Content Templates (1 point) - heading-only
  4. Generation Logic (2 points) - reads site.template_name from DB
  5. Site Creation Integration (2 points)
  6. Template Service Updates (0 points) - no changes needed
  7. Backfill Script (2 points)
  8. Homepage Generation (deferred)
  9. Unit Tests (2 points)
  10. Integration Tests (1 point)

Total: 14 story points

Effort Reduction

Original estimate: 20 story points (with full page content)
Simplified (heading-only pages): 14 story points
Savings: 6 story points (no complex content generation needed)

Notes

  • Pages should be visually consistent with articles (same template)
  • Pages have heading only - just <h1> tag, no body content
  • Better UX than completely empty (user sees page title when they click nav link)
  • User can manually add content later for specific sites if desired
  • Pages are generated once per site at creation time
  • Future enhancement: Add content generation for privacy policy if legally required
  • Future enhancement: CLI command to update page content for specific sites