# Story 3.4: Generate Boilerplate Site Pages
## Status
**QA COMPLETE** - Ready for Production
## Story
**As a developer**, I want to automatically generate boilerplate `about.html`, `contact.html`, and `privacy.html` pages for each site in my batch, so that the navigation menu links from Story 3.3 work and the sites appear complete.
## Context
- Story 3.3 added navigation menus to all HTML templates with links to:
  - `/index.html` (homepage)
  - `about.html` (about page)
  - `privacy.html` (privacy policy)
  - `contact.html` (contact page)
- Currently, these pages don't exist, creating broken links
- Each site needs its own set of these pages
- Pages should use the same template as the articles (basic/modern/classic/minimal)
- Content should be generic but professional enough for a real site
- Privacy policy needs to be comprehensive and legally sound (generic template)
## Acceptance Criteria
### Core Functionality
- A function generates the three boilerplate pages for a given site
- Pages are created AFTER articles are generated but BEFORE deployment
- Each page uses the same template as the articles for that site
- Pages are stored in the database for deployment
- Pages are associated with the correct site (via `site_deployment_id`)
### Page Content Requirements
#### About Page (`about.html`)
- Empty page with just the template applied
- No content text required (just template navigation/structure)
- User can add content later if needed
#### Contact Page (`contact.html`)
- Empty page with just the template applied
- No content text required (just template navigation/structure)
- User can add content later if needed
#### Privacy Policy (`privacy.html`)
- **Option 1 (Minimal):** Empty page like about/contact
  - No content text required (just template navigation/structure)
  - User can add content later if needed
**Decision:** Start with Option 1 (empty pages) for all three pages. Privacy policy content can be added later via backfill update or manual edit if needed.
### Template Integration
- Use same template engine as article content (`src/templating/service.py`)
- Read template from `site.template_name` field in database
- Pages use same template as articles on the same site (consistent look)
- Include navigation menu (which will link to these same pages)
### Database Storage
- Create new `site_pages` table (clean separation from articles):
  - `id`, `site_deployment_id`, `page_type`, `content`, `created_at`, `updated_at`
  - Foreign key to `site_deployments` with CASCADE delete
  - Unique constraint on (site_deployment_id, page_type)
  - Indexes on site_deployment_id and page_type
- Each site can have one of each page type (about, contact, privacy)
- Pages are fundamentally different from articles, deserve own table
### URL Generation
- Pages use simple filenames: `about.html`, `contact.html`, `privacy.html`
- Full URLs: `https://{hostname}/about.html`
- No slug generation needed (fixed filenames)
- Pages tracked separately from article URLs
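
The fixed-filename scheme above could be sketched as a small helper. This is only an illustration: `PAGE_FILENAMES` and `build_page_url` are hypothetical names, not existing code in the repository.

```python
# Hypothetical helper; the names are illustrative, not part of the codebase.
PAGE_FILENAMES = {
    "about": "about.html",
    "contact": "contact.html",
    "privacy": "privacy.html",
}


def build_page_url(hostname: str, page_type: str) -> str:
    """Return the public URL for a boilerplate page on the given hostname."""
    try:
        filename = PAGE_FILENAMES[page_type]
    except KeyError:
        # Only the three fixed page types are supported; no slugs are generated.
        raise ValueError(f"Unknown page type: {page_type!r}") from None
    return f"https://{hostname}/{filename}"


print(build_page_url("example.com", "about"))  # https://example.com/about.html
```

Because the filenames are fixed, the mapping doubles as the list of valid page types.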
### Integration Point
- Hook into site creation in `src/generation/site_provisioning.py` rather than the batch generation workflow in `src/generation/batch_processor.py` (see Task 5)
- After site assignment (Story 3.1) and before deployment (Epic 4)
- Generate pages ONLY for newly created sites (not existing sites)
- One-time backfill script to add pages to all existing imported sites
### Two Use Cases
1. **One-time backfill**: Script to generate pages for all existing sites in database (hundreds of sites)
2. **Ongoing generation**: Automatically generate pages only when new sites are created (provision-site, auto_create_sites, etc.)
## Tasks / Subtasks
### 1. Create SitePage Database Table
**Effort:** 2 story points
- [ ] Create new `site_pages` table with schema:
  - `id`, `site_deployment_id`, `page_type`, `content`, `created_at`, `updated_at`
- [ ] Add `SitePage` model to `src/database/models.py`
- [ ] Create migration script `scripts/migrate_add_site_pages.py`
- [ ] Add unique constraint on (site_deployment_id, page_type)
- [ ] Add indexes on site_deployment_id and page_type
- [ ] Add CASCADE delete (if site deleted, pages deleted)
- [ ] Test migration on development database
### 2. Create SitePage Repository
**Effort:** 2 story points
- [ ] Create `ISitePageRepository` interface in `src/database/interfaces.py`:
  - `create(site_deployment_id, page_type, content) -> SitePage`
  - `get_by_site(site_deployment_id) -> List[SitePage]`
  - `get_by_site_and_type(site_deployment_id, page_type) -> Optional[SitePage]`
  - `update_content(page_id, content) -> SitePage`
  - `exists(site_deployment_id, page_type) -> bool`
  - `delete(page_id) -> bool`
- [ ] Implement `SitePageRepository` in `src/database/repositories.py`
- [ ] Add to repository factory/dependency injection
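
The interface listed above might look roughly like the following. This is a sketch under assumptions: the `SitePage` dataclass is a stand-in for the ORM model, and `InMemorySitePageRepository` is a hypothetical implementation (useful for unit tests), not the SQLAlchemy-backed `SitePageRepository` the task calls for.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SitePage:
    """Stand-in for the ORM model so the sketch is self-contained."""
    id: int
    site_deployment_id: int
    page_type: str
    content: str


class ISitePageRepository(ABC):
    @abstractmethod
    def create(self, site_deployment_id: int, page_type: str, content: str) -> SitePage: ...

    @abstractmethod
    def get_by_site(self, site_deployment_id: int) -> List[SitePage]: ...

    @abstractmethod
    def get_by_site_and_type(self, site_deployment_id: int, page_type: str) -> Optional[SitePage]: ...

    @abstractmethod
    def update_content(self, page_id: int, content: str) -> SitePage: ...

    @abstractmethod
    def exists(self, site_deployment_id: int, page_type: str) -> bool: ...

    @abstractmethod
    def delete(self, page_id: int) -> bool: ...


class InMemorySitePageRepository(ISitePageRepository):
    """Hypothetical in-memory implementation, e.g. for unit tests."""

    def __init__(self) -> None:
        self._pages: Dict[int, SitePage] = {}
        self._next_id = 1

    def create(self, site_deployment_id, page_type, content):
        # Mirror the (site_deployment_id, page_type) unique constraint.
        if self.exists(site_deployment_id, page_type):
            raise ValueError(f"Page {page_type!r} already exists for site {site_deployment_id}")
        page = SitePage(self._next_id, site_deployment_id, page_type, content)
        self._pages[page.id] = page
        self._next_id += 1
        return page

    def get_by_site(self, site_deployment_id):
        return [p for p in self._pages.values() if p.site_deployment_id == site_deployment_id]

    def get_by_site_and_type(self, site_deployment_id, page_type):
        for page in self.get_by_site(site_deployment_id):
            if page.page_type == page_type:
                return page
        return None

    def update_content(self, page_id, content):
        page = self._pages[page_id]
        page.content = content
        return page

    def exists(self, site_deployment_id, page_type):
        return self.get_by_site_and_type(site_deployment_id, page_type) is not None

    def delete(self, page_id):
        # True if a page was removed, False if the id was unknown.
        return self._pages.pop(page_id, None) is not None
```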
### 3. Create Page Content Templates (SIMPLIFIED)
**Effort:** 1 story point (reduced from 3)
- [ ] Create `src/generation/page_templates.py` module
- [ ] Implement `get_page_content(page_type: str, domain: str) -> str`:
  - Returns just a heading: `<h1>About Us</h1>`, `<h1>Contact</h1>`, `<h1>Privacy Policy</h1>`
  - All three pages use same heading-only approach
  - No other content text
- [ ] No need for extensive content generation
- [ ] Pages are just placeholders until user adds content manually
### 4. Implement Page Generation Logic (SIMPLIFIED)
**Effort:** 2 story points (reduced from 3)
- [ ] Create `src/generation/site_page_generator.py` module
- [ ] Implement `generate_site_pages(site_deployment: SiteDeployment, template_name: str, page_repo, template_service) -> List[SitePage]`:
  - Get domain from site (custom_hostname or bcdn_hostname)
  - For each page type (about, contact, privacy):
    - Get heading-only content from `page_templates.py`
    - Wrap heading in HTML template using `template_service`
    - Store page in database
  - Return list of created pages
- [ ] Pages have just heading (e.g., `<h1>About Us</h1>`) wrapped in template
- [ ] Log page generation at INFO level
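
A minimal sketch of the generation steps listed above, assuming a `format_content(content, title, template_name)` signature on the template service. The real signature is not documented in this story and may differ; `StubTemplateService` and `StubPageRepo` are hypothetical stand-ins so the example runs on its own.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

PAGE_TITLES = {"about": "About Us", "contact": "Contact", "privacy": "Privacy Policy"}


def get_domain_from_site(site):
    """Prefer the custom hostname, fall back to the bunny.net hostname."""
    return site.custom_hostname or site.pull_zone_bcdn_hostname


def generate_site_pages(site_deployment, template_name, page_repo, template_service):
    """Create about/contact/privacy pages for one site and return them."""
    domain = get_domain_from_site(site_deployment)
    created = []
    for page_type, title in PAGE_TITLES.items():
        heading = f"<h1>{title}</h1>"
        # ASSUMPTION: the real format_content() signature may differ.
        html = template_service.format_content(heading, title, template_name)
        created.append(page_repo.create(site_deployment.id, page_type, html))
    logger.info("Generated %d pages for site %s (%s)", len(created), site_deployment.id, domain)
    return created


# --- Hypothetical stubs so the sketch is runnable without the real services ---
class StubTemplateService:
    def format_content(self, content, title, template_name):
        return f"<html><head><title>{title}</title></head><body>{content}</body></html>"


class StubPageRepo:
    def __init__(self):
        self.rows = []

    def create(self, site_deployment_id, page_type, content):
        row = SimpleNamespace(
            site_deployment_id=site_deployment_id, page_type=page_type, content=content
        )
        self.rows.append(row)
        return row


site = SimpleNamespace(id=1, custom_hostname=None, pull_zone_bcdn_hostname="site123.b-cdn.net")
pages = generate_site_pages(site, "basic", StubPageRepo(), StubTemplateService())
print([p.page_type for p in pages])  # ['about', 'contact', 'privacy']
```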
### 5. Integrate with Site Creation (Not Batch Processor)
**Effort:** 2 story points
- [ ] Update `src/generation/site_provisioning.py`:
  - After creating new site via bunny.net API, generate boilerplate pages
  - Call `generate_site_pages()` immediately after site creation
  - Log page generation results
- [ ] Update `provision-site` CLI command:
  - Generate pages after site is provisioned
- [ ] Handle errors gracefully (log warning if page generation fails, continue with site creation)
- [ ] **DO NOT generate pages in batch processor** (only for new sites, not existing sites)
### 6. Update Template Service (No Changes Needed)
**Effort:** 0 story points
- [x] Template service already handles simple content
- [x] Just pass heading HTML through existing `format_content()` method
- [x] No changes needed to template service
### 7. Create Backfill Script for Existing Sites
**Effort:** 2 story points
- [ ] Create `scripts/backfill_site_pages.py`:
  - Query all sites in database that don't have pages
  - For each site: generate about, contact, privacy pages
  - Use default template (or infer from site name if possible)
  - Progress reporting (e.g., "Generating pages for site 50/400...")
  - Dry-run mode to preview changes
  - CLI arguments: `--dry-run`, `--template`, `--batch-size`
- [ ] Add error handling for individual site failures (continue with next site)
- [ ] Log results: successful, failed, skipped counts
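
The three CLI arguments named above could be wired up with `argparse` roughly as follows. This is a sketch: `build_parser` is a hypothetical name, the defaults mirror the `backfill_site_pages()` signature in the technical notes, and the template choices are taken from the four template names mentioned in this story.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser for the backfill script (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Generate boilerplate pages for existing sites"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Preview which sites would be updated without writing anything",
    )
    parser.add_argument(
        "--template",
        default="basic",
        choices=["basic", "modern", "classic", "minimal"],
        help="Template applied to the generated pages",
    )
    parser.add_argument(
        "--batch-size",
        type=int,
        default=100,
        help="Number of sites between progress checkpoints",
    )
    return parser


args = build_parser().parse_args(["--dry-run", "--template", "modern"])
print(args.dry_run, args.template, args.batch_size)  # True modern 100
```

The usage examples later in this story also pass `--username` and `--password`; those are omitted here because the task list does not define them.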
### 8. Homepage Generation (Optional - Deferred)
**Effort:** 2 story points (if implemented)
- [ ] **DEFER to Epic 4 or later**
- [ ] Homepage (`index.html`) requires knowing all articles on the site
- [ ] Not needed for Story 3.4 (navigation menu links to `/index.html` can 404 for now)
- [ ] Document in technical notes
### 9. Unit Tests (SIMPLIFIED)
**Effort:** 2 story points (reduced from 3)
- [ ] Test heading-only page content generation
- [ ] Test domain extraction from SiteDeployment (custom vs bcdn hostname)
- [ ] Test page HTML wrapping with each template type
- [ ] Test SitePage repository CRUD operations
- [ ] Test duplicate page prevention (unique constraint)
- [ ] Test page generation for single site
- [ ] Test backfill script logic
- [ ] Mock template service and repositories
- [ ] Achieve >80% code coverage for new modules
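
As an illustration of the first two test items above, a pytest-style sketch might look like this. The two functions under test are copied from the technical notes so the example is self-contained; the test names are hypothetical, and `SimpleNamespace` stands in for a `SiteDeployment` row.

```python
from types import SimpleNamespace


def get_page_content(page_type: str, domain: str) -> str:
    """Copy of the heading-only generator from page_templates.py."""
    page_titles = {"about": "About Us", "contact": "Contact", "privacy": "Privacy Policy"}
    return f"<h1>{page_titles.get(page_type, page_type.title())}</h1>"


def get_domain_from_site(site_deployment) -> str:
    """Copy of the domain helper: custom hostname first, bunny.net hostname otherwise."""
    if site_deployment.custom_hostname:
        return site_deployment.custom_hostname
    return site_deployment.pull_zone_bcdn_hostname


def test_heading_only_content():
    assert get_page_content("about", "example.com") == "<h1>About Us</h1>"
    assert get_page_content("contact", "example.com") == "<h1>Contact</h1>"
    assert get_page_content("privacy", "example.com") == "<h1>Privacy Policy</h1>"
    # Unknown page types fall back to a title-cased heading.
    assert get_page_content("terms", "example.com") == "<h1>Terms</h1>"


def test_domain_extraction_prefers_custom_hostname():
    custom = SimpleNamespace(
        custom_hostname="www.example.com", pull_zone_bcdn_hostname="site123.b-cdn.net"
    )
    bcdn_only = SimpleNamespace(custom_hostname=None, pull_zone_bcdn_hostname="site123.b-cdn.net")
    assert get_domain_from_site(custom) == "www.example.com"
    assert get_domain_from_site(bcdn_only) == "site123.b-cdn.net"


# pytest would discover these automatically; calling them directly keeps the sketch runnable.
test_heading_only_content()
test_domain_extraction_prefers_custom_hostname()
```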
### 10. Integration Tests (SIMPLIFIED)
**Effort:** 1 story point (reduced from 2)
- [ ] Test site creation triggers page generation
- [ ] Test with different template types (basic, modern, classic, minimal)
- [ ] Test with custom domain sites vs bunny.net-only sites
- [ ] Test pages stored correctly in database
- [ ] Test backfill script on real database
- [ ] Verify navigation menu links work (pages exist at expected paths)
## Technical Notes
### SitePage Model
```python
from datetime import datetime

from sqlalchemy import DateTime, ForeignKey, Integer, String, Text, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column, relationship

# Base: the declarative base, assumed to be defined in src/database/models.py


class SitePage(Base):
    __tablename__ = "site_pages"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    site_deployment_id: Mapped[int] = mapped_column(
        Integer,
        ForeignKey('site_deployments.id', ondelete='CASCADE'),
        nullable=False
    )
    page_type: Mapped[str] = mapped_column(String(20), nullable=False)  # about, contact, privacy, homepage
    content: Mapped[str] = mapped_column(Text, nullable=False)  # Full HTML
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime,
        default=datetime.utcnow,
        onupdate=datetime.utcnow,
        nullable=False
    )

    # Relationships
    site_deployment: Mapped["SiteDeployment"] = relationship("SiteDeployment", back_populates="pages")

    # Unique constraint: one page of each type per site
    __table_args__ = (
        UniqueConstraint('site_deployment_id', 'page_type', name='uq_site_page_type'),
    )
```
### Database Migration
```sql
CREATE TABLE site_pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_deployment_id INTEGER NOT NULL,
    page_type VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (site_deployment_id) REFERENCES site_deployments(id) ON DELETE CASCADE,
    UNIQUE (site_deployment_id, page_type)
);

CREATE INDEX idx_site_pages_site ON site_pages(site_deployment_id);
CREATE INDEX idx_site_pages_type ON site_pages(page_type);
```
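
To check that the schema behaves as intended, the unique constraint and cascade delete can be exercised against an in-memory SQLite database. This sketch assumes SQLite (the `AUTOINCREMENT` keyword suggests it, but the story does not say so explicitly) and uses a one-column stand-in for `site_deployments`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and therefore CASCADE) when this is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE site_deployments (id INTEGER PRIMARY KEY);  -- minimal stand-in table
CREATE TABLE site_pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    site_deployment_id INTEGER NOT NULL,
    page_type VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (site_deployment_id) REFERENCES site_deployments(id) ON DELETE CASCADE,
    UNIQUE (site_deployment_id, page_type)
);
INSERT INTO site_deployments (id) VALUES (1);
INSERT INTO site_pages (site_deployment_id, page_type, content)
VALUES (1, 'about', '<h1>About Us</h1>');
""")

# A second 'about' page for the same site should violate the unique constraint.
try:
    conn.execute(
        "INSERT INTO site_pages (site_deployment_id, page_type, content) VALUES (1, 'about', 'dup')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting the site should remove its pages via ON DELETE CASCADE.
conn.execute("DELETE FROM site_deployments WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM site_pages").fetchone()[0]

print(duplicate_rejected, remaining)  # True 0
```

Note that the cascade silently does nothing if `PRAGMA foreign_keys` is left at SQLite's default of off, which is worth covering in the migration test.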
### Page Content Template Examples (SIMPLIFIED)
#### Implementation - Heading Only
```python
# src/generation/page_templates.py
def get_page_content(page_type: str, domain: str) -> str:
    """
    Generate minimal content for boilerplate pages.
    Just a heading - no other content text.
    """
    page_titles = {
        "about": "About Us",
        "contact": "Contact",
        "privacy": "Privacy Policy"
    }
    return f"<h1>{page_titles.get(page_type, page_type.title())}</h1>"
```
#### Result - Heading Only Example
```html
<!DOCTYPE html>
<html>
<head>
    <title>About Us</title>
    <!-- Same template/styling as articles -->
</head>
<body>
    <nav>
        <ul>
            <li><a href="/index.html">Home</a></li>
            <li><a href="about.html">About</a></li>
            <li><a href="privacy.html">Privacy</a></li>
            <li><a href="contact.html">Contact</a></li>
        </ul>
    </nav>
    <main>
        <h1>About Us</h1>
        <!-- No other content - user can add later if needed -->
    </main>
</body>
</html>
```
#### Why Heading-Only Pages Work
1. **Fixes broken nav links** - Pages exist, no 404 errors
2. **Better UX than completely empty** - User sees something when they click the link
3. **User can customize** - Add content manually later for specific sites
4. **Minimal effort** - No need to generate/maintain generic content
5. **Deployment ready** - Pages can be deployed as-is
6. **Future enhancement** - Can add content generation later if needed
### Integration with Site Creation
```python
# In src/generation/site_provisioning.py
def create_bunnynet_site(name_prefix: str, region: str = "DE", template: str = "basic"):
    # Step 1: Create Storage Zone
    storage = bunny_client.create_storage_zone(...)

    # Step 2: Create Pull Zone
    pull = bunny_client.create_pull_zone(...)

    # Step 3: Save to database
    site = site_repo.create(...)

    # Step 4: Generate boilerplate pages (NEW - Story 3.4)
    logger.info(f"Generating boilerplate pages for new site {site.id}...")
    try:
        generate_site_pages(site, template, page_repo, template_service)
        logger.info(f"Successfully created about, contact, privacy pages for site {site.id}")
    except Exception as e:
        logger.warning(f"Failed to generate pages for site {site.id}: {e}")
        # Don't fail site creation if page generation fails

    return site
```
### Backfill Script Usage
```bash
# One-time backfill for all existing sites (dry-run first)
uv run python scripts/backfill_site_pages.py \
    --username admin \
    --password yourpass \
    --template basic \
    --dry-run

# Output:
# Found 423 sites without boilerplate pages
# [DRY RUN] Would generate pages for site 1 (www.example.com)
# [DRY RUN] Would generate pages for site 2 (site123.b-cdn.net)
# ...
# [DRY RUN] Total: 423 sites would be updated

# Actually generate pages
uv run python scripts/backfill_site_pages.py \
    --username admin \
    --password yourpass \
    --template basic

# Output:
# Generating pages for site 1/423 (www.example.com)... ✓
# Generating pages for site 2/423 (site123.b-cdn.net)... ✓
# ...
# Complete: 423 successful, 0 failed, 0 skipped

# Use a different template for the run (default: basic) with a custom batch size
uv run python scripts/backfill_site_pages.py \
    --username admin \
    --password yourpass \
    --template modern \
    --batch-size 50  # Process 50 sites at a time
```
### Page URL Structure
```
Homepage:   https://example.com/index.html
About:      https://example.com/about.html
Contact:    https://example.com/contact.html
Privacy:    https://example.com/privacy.html
Article 1:  https://example.com/how-to-fix-engines.html
Article 2:  https://example.com/engine-maintenance-tips.html
```
### Template Application Example
```python
# For articles (existing)
template_service.apply_template(
    content=article.content,
    template_name="modern",
    title=article.title,
    meta_description=article.meta_description,
    url=article_url
)

# For pages (new)
template_service.apply_template_to_page(
    content=page_content,  # Heading-only HTML from page_templates.py
    template_name="modern",
    page_title="About Us",  # Static title
    domain=site.custom_hostname or site.pull_zone_bcdn_hostname
)
```
### Backfill Script Implementation
```python
# scripts/backfill_site_pages.py
import logging

# Module created in Task 4
from src.generation.site_page_generator import generate_site_pages

logger = logging.getLogger(__name__)


def backfill_site_pages(
    page_repo,
    site_repo,
    template_service,
    template: str = "basic",
    dry_run: bool = False,
    batch_size: int = 100
):
    """Generate boilerplate pages for all sites that don't have them"""
    # Get all sites
    all_sites = site_repo.get_all()
    logger.info(f"Found {len(all_sites)} total sites in database")

    # Filter to sites without pages
    sites_needing_pages = []
    for site in all_sites:
        existing_pages = page_repo.get_by_site(site.id)
        if len(existing_pages) < 3:  # Should have about, contact, privacy
            sites_needing_pages.append(site)
    logger.info(f"Found {len(sites_needing_pages)} sites without boilerplate pages")

    if dry_run:
        for site in sites_needing_pages:
            domain = site.custom_hostname or site.pull_zone_bcdn_hostname
            logger.info(f"[DRY RUN] Would generate pages for site {site.id} ({domain})")
        logger.info(f"[DRY RUN] Total: {len(sites_needing_pages)} sites would be updated")
        return

    # Generate pages for each site
    successful = 0
    failed = 0
    for idx, site in enumerate(sites_needing_pages, 1):
        domain = site.custom_hostname or site.pull_zone_bcdn_hostname
        logger.info(f"Generating pages for site {idx}/{len(sites_needing_pages)} ({domain})...")
        try:
            generate_site_pages(site, template, page_repo, template_service)
            successful += 1
        except Exception as e:
            # Continue with the next site if one fails
            logger.error(f"Failed to generate pages for site {site.id}: {e}")
            failed += 1

        # Progress checkpoint every batch_size sites
        if idx % batch_size == 0:
            logger.info(f"Progress: {idx}/{len(sites_needing_pages)} sites processed")

    logger.info(f"Complete: {successful} successful, {failed} failed")
```
### Domain Extraction
```python
def get_domain_from_site(site_deployment: SiteDeployment) -> str:
    """Extract domain for use in page content (email addresses, etc.)"""
    if site_deployment.custom_hostname:
        return site_deployment.custom_hostname
    else:
        return site_deployment.pull_zone_bcdn_hostname
```
### Privacy Policy Legal Note
Pages are heading-only for now. If privacy policy content is added later, the template should be:
- Generic enough to apply to blog/content sites
- Comprehensive enough to cover common scenarios (cookies, analytics, third-party links)
- NOT legal advice - users should consult a lawyer for specific requirements
- Include standard disclaimers
- Regularly reviewed and updated (document version/date)
Recommended approach: Use a well-tested generic template from a reputable source (e.g., Privacy Policy Generator) and adapt it to fit our template structure.
## Dependencies
- Story 3.1: Site assignment must be complete (need to know which sites are in use)
- Story 3.3: Navigation menu is already in templates (pages fulfill those links)
- Story 2.4: Template service exists and can apply HTML templates
- Story 1.6: SiteDeployment table exists
## Future Considerations
- Story 4.1 will deploy these pages along with articles
- Future: Custom page content per project (override generic templates)
- Future: Homepage generation with dynamic article listing
- Future: Allow users to edit boilerplate page content via CLI or web interface
- Future: Additional pages (terms of service, disclaimer, etc.)
- Future: Page templates with more customization options (site name, tagline, etc.)
## Deferred to Later
- **Homepage (`index.html`) generation** - Could be part of this story or deferred to Epic 4
  - If generated here: Simple page listing all articles on the site
  - If deferred: Epic 4 deployment could create a basic redirect or placeholder
- **Custom page content per project** - Allow projects to override default templates
- **Multi-language support** - Generate pages in different languages based on project settings
## Total Effort
14 story points (reduced from 20 due to heading-only simplification and no template service changes)
### Effort Breakdown
1. Database Schema (2 points) - site_pages table only
2. Repository Layer (2 points) - SitePageRepository
3. Page Content Templates (1 point) - heading-only
4. Generation Logic (2 points) - reads site.template_name from DB
5. Site Creation Integration (2 points)
6. Template Service Updates (0 points) - no changes needed
7. Backfill Script (2 points)
8. Homepage Generation (deferred)
9. Unit Tests (2 points)
10. Integration Tests (1 point)
**Total: 14 story points**
### Effort Reduction
Original estimate: 20 story points (with full page content)
Simplified (heading-only pages): 14 story points
Savings: 6 story points (no complex content generation or template service changes needed)
## Notes
- Pages should be visually consistent with articles (same template)
- **Pages have heading only** - just `<h1>` tag, no body content
- Better UX than completely empty (user sees page title when they click nav link)
- User can manually add content later for specific sites if desired
- Pages are generated once per site at creation time
- Future enhancement: Add content generation for privacy policy if legally required
- Future enhancement: CLI command to update page content for specific sites