# Epic 4: Cloud Deployment & Handoff
## Epic Goal
To deploy all finalized HTML content (articles and boilerplate pages) for a batch to the correct cloud storage targets, purge the CDN cache, and verify the successful deployment.
## Status
- **Story 4.1**: ✅ COMPLETE (22 story points, real-world validated)
- **Story 4.2**: ✅ COMPLETE (implemented in Story 4.1)
- **Story 4.3**: ✅ COMPLETE (implemented in Story 4.1)
- **Story 4.4**: Not Started
- **Story 4.5**: Not Started
## Stories
### Story 4.1: Deploy Content to Cloud Storage
**Status:** ✅ COMPLETE (22 story points, real-world validated)
**Document:** [story-4.1-deploy-content-to-cloud.md](../stories/story-4.1-deploy-content-to-cloud.md)
**As a developer**, I want to upload all generated HTML files for a batch to their designated cloud storage buckets so that the content is hosted and ready to be served.
**Acceptance Criteria**
* CLI command (`deploy-batch --batch_id <id>`) and auto-deploy after batch generation
* Bunny.net Storage API integration (multi-cloud is technical debt)
* Uploads articles and boilerplate pages (about, contact, privacy) if they exist
* Authentication via `BUNNY_API_KEY` from `.env` and `storage_zone_password` from database
* Continue on error, report detailed summary (successful, failed, total time)
* **Includes Story 4.2 functionality**: Log URLs to tier-segregated daily text files
* **Includes Story 4.3 functionality**: Update database status to 'deployed', store public URLs
**Implementation Notes**
* Auto-deploy is ON by default
* Duplicate URL prevention in text files (critical for manual re-runs)
* All API keys from `.env` only (not master.config.json)
* Storage API authentication details TBD during implementation
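The upload path itself reduces to one authenticated HTTP PUT per file against the Bunny.net Storage API, where the `AccessKey` header carries the storage zone password and a successful upload answers `201 Created`. A minimal sketch (function names and the hard-coded default endpoint are illustrative, not the actual implementation, which reads `storage_zone_password` from the database):

```python
import urllib.request

BUNNY_STORAGE_HOST = "storage.bunnycdn.com"  # default endpoint; region-specific hosts also exist

def storage_upload_url(storage_zone: str, remote_path: str) -> str:
    """Build the Storage API PUT URL for one file in a zone."""
    return f"https://{BUNNY_STORAGE_HOST}/{storage_zone}/{remote_path.lstrip('/')}"

def upload_file(storage_zone: str, zone_password: str,
                local_path: str, remote_path: str) -> bool:
    """PUT a local file to Bunny Storage; True when the API answers 201."""
    with open(local_path, "rb") as fh:
        req = urllib.request.Request(
            storage_upload_url(storage_zone, remote_path),
            data=fh.read(),
            method="PUT",
            headers={"AccessKey": zone_password,
                     "Content-Type": "application/octet-stream"},
        )
    with urllib.request.urlopen(req) as resp:
        return resp.status == 201
```

Per the continue-on-error criterion, the batch loop would catch exceptions from `upload_file` per article and fold them into the summary instead of aborting.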
### Story 4.2: Log Deployed URLs to Tiered Text Files
**Status:** ✅ COMPLETE (implemented in Story 4.1)
**As a developer**, I want to save the URLs of all deployed articles into daily, tier-segregated text files, so that I have a clean list for indexing services and other external tools.
**Acceptance Criteria** (Implemented in Story 4.1)
* After an article is successfully deployed, its public URL is logged to a text file.
* A `deployment_logs/` folder will be used to store the output files.
* Two separate files are created for each day's deployments, using a `YYYY-MM-DD` timestamp:
    * `deployment_logs/YYYY-MM-DD_tier1_urls.txt`
    * `deployment_logs/YYYY-MM-DD_other_tiers_urls.txt`
* URLs for Tier 1 articles are appended to the `_tier1_urls.txt` file.
* URLs for all other tiers (T2, T3, etc.) are appended to the `_other_tiers_urls.txt` file.
* The script automatically creates new files when the date changes.
* URLs for boilerplate pages (about, contact, privacy) are explicitly excluded from these files.
* **Duplicate prevention**: Check file before appending to avoid duplicate URLs
### Story 4.3: Update Deployment Status
**Status:** ✅ COMPLETE (implemented in Story 4.1)
**As a developer**, I want to update the status of each article and site in the database to 'deployed' and record the final public URL, so that the system has an accurate record of what content is live.
**Acceptance Criteria** (Implemented in Story 4.1)
* Upon successful upload of an article, its status in the `generated_content` table is updated to 'deployed'.
* The final, verified public URL for the article is stored in the `deployed_url` field.
* New fields added: `deployed_url` and `deployed_at` to `generated_content` table.
* Database updates are transactional to ensure data integrity.
**Note:** The `last_deployed_at` timestamp for `site_deployments` could be added as an enhancement if needed.
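The transactional update can be sketched with `sqlite3`; the table and column names (`generated_content`, `deployed_url`, `deployed_at`) come from the criteria, while the `id` primary key is an assumption:

```python
import sqlite3
from datetime import datetime, timezone

def mark_deployed(conn: sqlite3.Connection, article_id: int, public_url: str) -> None:
    """Atomically set status, deployed_url, and deployed_at for one article."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE generated_content "
            "SET status = 'deployed', deployed_url = ?, deployed_at = ? "
            "WHERE id = ?",
            (public_url, datetime.now(timezone.utc).isoformat(), article_id),
        )
```

Using the connection as a context manager keeps the status flip and the URL write in one transaction, so a failure mid-update never leaves an article marked 'deployed' without its URL.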
### Story 4.4: Post-Deployment Verification
**Status:** Not Started
**As a user**, I want a simple way to verify that a batch of articles has been deployed successfully, by checking a sample of URLs for a `200 OK` status, so I can have confidence the deployment worked.
**Acceptance Criteria:**
* A post-deployment script or CLI command is available (e.g., `verify-deployment --batch_id <id>`).
* The script takes a batch ID as input.
* It retrieves the URLs for all articles in that batch from the database.
* It makes an HTTP GET request to a random sample of the URLs (or to all of them, depending on command flags).
* It reports which URLs return a `200 OK` status and which do not.
* The output is clear and easy to read (e.g., a list of successful and failed URLs).
* Can be run manually after deployment or integrated into auto-deploy workflow.
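One way to keep the check testable is to separate the sampling and classification logic from the HTTP call itself (all names below are illustrative):

```python
import random
import urllib.request
from urllib.error import URLError

def _http_status(url: str) -> int:
    """GET the URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status

def check_urls(urls, fetch=_http_status, sample_size=None, seed=None):
    """Check a random sample of URLs (or all when sample_size is None).

    Returns (ok, failed): URLs that answered 200 OK vs. everything else.
    """
    rng = random.Random(seed)
    urls = list(urls)
    targets = urls if sample_size is None else rng.sample(urls, min(sample_size, len(urls)))
    ok, failed = [], []
    for url in targets:
        try:
            status = fetch(url)
        except (URLError, OSError):
            status = None  # network errors and 4xx/5xx both count as failures
        (ok if status == 200 else failed).append(url)
    return ok, failed
```

Injecting `fetch` lets unit tests substitute a fake status lookup, while the CLI command would use the default and print the two lists as its report.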
### Story 4.5: Create URL and Link Reporting Script
**Status:** Not Started
**As a user**, I want a script to generate custom lists of URLs based on project and tier, with optional link details, so that I can easily export data for analysis or external tools.
**Acceptance Criteria:**
* A new CLI script is created (e.g., `scripts/get_links.py` or CLI command `get-links`).
* The script accepts a mandatory `project_id`.
* The script accepts a `tier` specifier that supports:
* A single tier (e.g., `--tier 1`).
* An open-ended range for a tier and above (e.g., `--tier 2+`).
* An optional flag (`--with-anchor-text`) includes the anchor text used for each link in the output.
* An optional flag (`--with-destination-url`) includes the destination URL of the tiered link placed on that page.
* The script queries the database to retrieve the required link and content information.
* The output is a well-formatted list (e.g., CSV or plain text) printed to the console.
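The `--tier` specifier's two forms (an exact tier vs. an open-ended `N+` range) could be parsed into a filter predicate along these lines; the helper name is hypothetical:

```python
def parse_tier(spec: str):
    """Parse a tier specifier: '1' matches that tier only, '2+' matches tier 2 and above.

    Returns a predicate suitable for filtering rows by their tier number.
    """
    if spec.endswith("+"):
        floor = int(spec[:-1])
        return lambda tier: tier >= floor
    exact = int(spec)
    return lambda tier: tier == exact
```

The predicate then slots into the database query's post-filter (or is translated into `tier = ?` / `tier >= ?` SQL) before the CSV or plain-text output is written.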
## Technical Debt
- Multi-cloud support (AWS S3, Azure, DigitalOcean, etc.) - deferred from Story 4.1
- CDN cache purging after deployment
- Boilerplate page storage optimization (regenerate on-the-fly vs storing HTML)
- Homepage (`index.html`) generation for sites
## Notes
- Story 4.1 is the primary deployment story and includes core functionality from Stories 4.2 and 4.3
- Auto-deploy is enabled by default to streamline workflow
- All cloud provider credentials come from `.env` file only
- Story 4.4 and 4.5 are independent utilities that can be implemented as needed