# Epic 6: Multi-Cloud Storage Support
## Epic Goal
To extend the deployment system to support AWS S3 and S3-compatible cloud storage providers (DigitalOcean Spaces, Backblaze B2, Linode Object Storage, etc.), providing flexibility beyond Bunny.net while maintaining backward compatibility with existing deployments.
## Rationale
Currently, the system only supports Bunny.net storage, creating vendor lock-in and limiting deployment options. Many users have existing infrastructure on AWS S3 or prefer S3-compatible services for cost, performance, or compliance reasons. This epic will:
- **Increase Flexibility**: Support multiple cloud storage providers
- **Reduce Vendor Lock-in**: Enable migration between providers
- **Leverage Existing Infrastructure**: Use existing S3 buckets and credentials
- **Maintain Compatibility**: Existing Bunny.net deployments continue to work unchanged
## Status
- **Story 6.1**: 🔄 PLANNING (Storage Provider Abstraction)
- **Story 6.2**: 🔄 PLANNING (S3 Client Implementation)
- **Story 6.3**: 🔄 PLANNING (Database Schema Updates)
- **Story 6.4**: 🔄 PLANNING (URL Generation for S3)
- **Story 6.5**: 🔄 PLANNING (S3-Compatible Services Support)
- **Story 6.6**: 🔄 PLANNING (Bucket Provisioning Script)
## Stories
### Story 6.1: Storage Provider Abstraction Layer
**Estimated Effort**: 3 story points
**As a developer**, I want a simple way to support multiple storage providers without cluttering `DeploymentService` with if/elif chains, so that adding new providers (eventually 8+) is straightforward.
**Acceptance Criteria**:
* Create a simple factory function `create_storage_client(site: SiteDeployment)` that returns the appropriate client:
- `'bunny'` → `BunnyStorageClient()`
- `'s3'` → `S3StorageClient()`
- `'s3_compatible'` → `S3StorageClient()` (with custom endpoint)
- Future providers added here
* Refactor `BunnyStorageClient.upload_file()` to accept `site: SiteDeployment` parameter:
- Change from: `upload_file(zone_name, zone_password, zone_region, file_path, content)`
- Change to: `upload_file(site: SiteDeployment, file_path: str, content: str)`
- Client extracts bunny-specific fields from `site` internally
* Update `DeploymentService` to use factory and unified interface:
- Remove hardcoded `BunnyStorageClient` from `__init__`
- In `deploy_article()` and `deploy_boilerplate_page()`: create client per site
- Call: `client.upload_file(site, file_path, content)` (same signature for all providers)
* Optional: Add `StorageClient` Protocol for type hints (helps with 8+ providers)
* All existing Bunny.net deployments continue to work without changes
* Unit tests verify factory returns correct clients
**Technical Notes**:
* Factory function is simple if/elif chain (one place to maintain)
* All clients use same method signature: `upload_file(site, file_path, content)`
* Each client extracts provider-specific fields from `site` object internally
* Protocol is optional but recommended for type safety with many providers
* Factory pattern keeps `DeploymentService` clean (no provider-specific logic)
* Backward compatibility: default provider is "bunny" if not specified
---
### Story 6.2: AWS S3 Client Implementation
**Estimated Effort**: 8 story points
**As a user**, I want to deploy content to AWS S3 buckets, so that I can use my existing AWS infrastructure.
**Acceptance Criteria**:
* Create `S3StorageClient` implementing `StorageClient` interface
* Use boto3 library for AWS S3 operations
* Support standard AWS S3 regions
* Authentication via AWS credentials from environment variables
* Automatically configure bucket for public READ access only (not write):
- Apply public-read ACL or bucket policy on first upload
- Ensure bucket allows public read access (disable block public access settings)
- Verify public read access is enabled before deployment
- **Security**: Never enable public write access - only read permissions
* Upload files with correct content-type headers
* Generate public URLs from bucket name and region
* Support custom domain mapping (if configured)
* Error handling for common S3 errors (403, 404, bucket not found, etc.)
* Retry logic with exponential backoff (consistent with BunnyStorageClient)
* Unit tests with mocked boto3 calls
**Configuration**:
* AWS credentials from environment variables (global):
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_REGION` (default region, can be overridden per-site)
* Per-site configuration stored in database:
- `s3_bucket_name`: S3 bucket name
- `s3_bucket_region`: AWS region (optional, uses default if not set)
- `s3_custom_domain`: Optional custom domain for URL generation (manual setup)
**URL Generation**:
* Default: `https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}`
* With custom domain: `https://{custom_domain}/{file_path}`
* Support for path-style URLs if needed: `https://s3.{region}.amazonaws.com/{bucket_name}/{file_path}`
**Technical Notes**:
* boto3 session management (reuse sessions for performance)
* Content-type detection (text/html for HTML files)
* Automatic public read access configuration (read-only, never write):
- Check and configure bucket policy for public read access only
- Disable "Block Public Access" settings for read access
- Apply public-read ACL to uploaded objects (not public-write)
- Validate public read access before deployment
- **Security**: Uploads require authenticated credentials, only reads are public
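The content-type and public-read requirements above can be sketched as follows. This is a minimal illustration, not the actual `S3StorageClient`: the helper names are hypothetical, and the `put_object` call assumes the bucket's object-ownership settings permit ACLs.

```python
import mimetypes

def guess_content_type(file_path: str) -> str:
    """Map a file path to a Content-Type header (text/html for HTML files)."""
    content_type, _ = mimetypes.guess_type(file_path)
    return content_type or "application/octet-stream"

def upload_object(s3_client, bucket: str, file_path: str, content: str) -> None:
    """Upload one object with a public-read ACL.

    Reads become public; writes still require authenticated credentials.
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=file_path.lstrip("/"),
        Body=content.encode("utf-8"),
        ContentType=guess_content_type(file_path),
        ACL="public-read",  # read-only public access, never public-write
    )
```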
---
### Story 6.3: Database Schema Updates for Multi-Cloud
**Estimated Effort**: 3 story points
**As a developer**, I want to store provider-specific configuration in the database, so that each site can use its preferred storage provider.
**Acceptance Criteria**:
* Add `storage_provider` field to `site_deployments` table:
- Type: String(20), Not Null, Default: 'bunny'
- Values: 'bunny', 's3', 's3_compatible'
- Indexed for query performance
* Add S3-specific fields (nullable, only used when provider is 's3' or 's3_compatible'):
- `s3_bucket_name`: String(255), Nullable
- `s3_bucket_region`: String(50), Nullable
- `s3_custom_domain`: String(255), Nullable
- `s3_endpoint_url`: String(500), Nullable (for S3-compatible services)
* Create migration script to:
- Add new fields with appropriate defaults
- Set `storage_provider='bunny'` for all existing records
- Preserve all existing Bunny.net fields
* Update `SiteDeployment` model with new fields
* Update repository methods to handle new fields
* Backward compatibility: existing queries continue to work
**Migration Strategy**:
* Existing sites default to 'bunny' provider
* No data loss or breaking changes
* New fields are nullable to support gradual migration
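Assuming the project's migration tooling is Alembic (the epic does not name it), the schema change above could be sketched like this; the index name is a placeholder, and `server_default='bunny'` covers existing records:

```python
"""Add multi-cloud storage fields to site_deployments (sketch)."""
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column("site_deployments",
        sa.Column("storage_provider", sa.String(20), nullable=False, server_default="bunny"))
    op.add_column("site_deployments", sa.Column("s3_bucket_name", sa.String(255), nullable=True))
    op.add_column("site_deployments", sa.Column("s3_bucket_region", sa.String(50), nullable=True))
    op.add_column("site_deployments", sa.Column("s3_custom_domain", sa.String(255), nullable=True))
    op.add_column("site_deployments", sa.Column("s3_endpoint_url", sa.String(500), nullable=True))
    op.create_index("ix_site_deployments_storage_provider",
                    "site_deployments", ["storage_provider"])

def downgrade():
    op.drop_index("ix_site_deployments_storage_provider", table_name="site_deployments")
    for col in ("s3_endpoint_url", "s3_custom_domain", "s3_bucket_region",
                "s3_bucket_name", "storage_provider"):
        op.drop_column("site_deployments", col)
```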
---
### Story 6.4: URL Generation for S3 Providers
**Estimated Effort**: 3 story points
**As a user**, I want public URLs for S3-deployed content to be generated correctly, so that articles are accessible via the expected URLs.
**Acceptance Criteria**:
* Update `generate_public_url()` in `url_generator.py` to handle S3 providers
* Support multiple URL formats:
- Virtual-hosted style: `https://bucket.s3.region.amazonaws.com/file.html`
- Path-style: `https://s3.region.amazonaws.com/bucket/file.html` (if needed)
- Custom domain: `https://custom-domain.com/file.html`
* URL generation logic based on `storage_provider` field
* Maintain existing behavior for Bunny.net (no changes)
* Handle S3-compatible services with custom endpoints
* Unit tests for all URL generation scenarios
**Technical Notes**:
* Virtual-hosted style is default for AWS S3
* Custom domain takes precedence if configured
* S3-compatible services may need path-style URLs depending on endpoint
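The dispatch logic could look like the sketch below. The `Site` dataclass is a stand-in for `SiteDeployment`, and the `cdn_hostname` field name for the Bunny.net case is an assumption; path-style URLs are used for S3-compatible endpoints per the note above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Site:
    """Stand-in for SiteDeployment; field names are illustrative."""
    storage_provider: str = "bunny"
    cdn_hostname: Optional[str] = None
    s3_bucket_name: Optional[str] = None
    s3_bucket_region: Optional[str] = None
    s3_custom_domain: Optional[str] = None
    s3_endpoint_url: Optional[str] = None

def generate_public_url(site: Site, file_path: str) -> str:
    path = file_path.lstrip("/")
    if site.storage_provider == "bunny":
        return f"https://{site.cdn_hostname}/{path}"  # existing behavior, unchanged
    if site.s3_custom_domain:
        return f"https://{site.s3_custom_domain}/{path}"  # custom domain wins
    if site.storage_provider == "s3":
        # Virtual-hosted style, the AWS default
        return f"https://{site.s3_bucket_name}.s3.{site.s3_bucket_region}.amazonaws.com/{path}"
    if site.storage_provider == "s3_compatible":
        # Path-style against the per-site endpoint
        return f"{site.s3_endpoint_url.rstrip('/')}/{site.s3_bucket_name}/{path}"
    raise ValueError(f"Unknown provider: {site.storage_provider}")
```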
---
### Story 6.5: S3-Compatible Services Support
**Estimated Effort**: 5 story points
**As a user**, I want to deploy to S3-compatible services (Linode Object Storage, DreamHost Object Storage, DigitalOcean Spaces), so that I can use S3-compatible storage providers the same way I use Bunny.net.
**Acceptance Criteria**:
* Extend `S3StorageClient` to support S3-compatible endpoints
* Support provider-specific configurations:
- **Linode Object Storage**: Custom endpoint
- **DreamHost Object Storage**: Custom endpoint
- **DigitalOcean Spaces**: Custom endpoint (e.g., `https://nyc3.digitaloceanspaces.com`)
* Store `s3_endpoint_url` per site for custom endpoints
* Handle provider-specific authentication differences
* Support provider-specific URL generation
* Configuration examples in documentation
* Unit tests for each supported service
**Supported Services**:
* AWS S3 (standard)
* Linode Object Storage
* DreamHost Object Storage
* DigitalOcean Spaces
* Backblaze B2
* Cloudflare R2
* (Other S3-compatible services can be added as needed)
**Configuration**:
* Per-service credentials in `.env` (global environment variables):
- `LINODE_ACCESS_KEY` / `LINODE_SECRET_KEY` (for Linode)
- `DREAMHOST_ACCESS_KEY` / `DREAMHOST_SECRET_KEY` (for DreamHost)
- `DO_SPACES_ACCESS_KEY` / `DO_SPACES_SECRET_KEY` (for DigitalOcean)
* Endpoint URLs stored per-site in `s3_endpoint_url` field
* Provider type stored in `storage_provider` ('s3_compatible')
* Automatic public access configuration (same as AWS S3)
**Technical Notes**:
* Most S3-compatible services work with boto3 using custom endpoints
* Some may require minor authentication adjustments
* URL generation may differ (e.g., DigitalOcean uses different domain structure)
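A sketch of how the per-provider credentials and custom endpoints might be wired into boto3. The `ENV_VARS` mapping mirrors the variable names listed above; `make_boto3_client` is a hypothetical helper, and boto3 is imported lazily so the lookup logic stays importable on its own.

```python
import os
from typing import Optional, Tuple

# Provider -> (access-key var, secret-key var), per the epic's .env convention
ENV_VARS = {
    "aws": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "linode": ("LINODE_ACCESS_KEY", "LINODE_SECRET_KEY"),
    "dreamhost": ("DREAMHOST_ACCESS_KEY", "DREAMHOST_SECRET_KEY"),
    "digitalocean": ("DO_SPACES_ACCESS_KEY", "DO_SPACES_SECRET_KEY"),
}

def resolve_credentials(provider: str) -> Tuple[str, str]:
    """Look up the access/secret key pair for a provider from the environment."""
    try:
        key_var, secret_var = ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider}")
    return os.environ[key_var], os.environ[secret_var]

def make_boto3_client(provider: str, region: str, endpoint_url: Optional[str] = None):
    """Build an S3 client; endpoint_url is set only for S3-compatible services."""
    import boto3  # lazy import: only needed when actually creating a client
    access_key, secret_key = resolve_credentials(provider)
    return boto3.client(
        "s3",
        region_name=region,
        endpoint_url=endpoint_url,  # None -> standard AWS endpoints
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
```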
---
### Story 6.6: S3 Bucket Provisioning Script
**Estimated Effort**: 3 story points
**As a user**, I want a script to automatically create and configure S3 buckets with proper public access settings, so that I can quickly set up new storage targets without manual AWS console work.
**Acceptance Criteria**:
* Create CLI command: `provision-s3-bucket --name <bucket> --region <region> [--provider <s3|linode|dreamhost|do>]`
* Automatically create bucket if it doesn't exist
* Configure bucket for public read access only (not write):
- Apply bucket policy allowing public read (GET requests only)
- Disable "Block Public Access" settings for read access
- Set appropriate CORS headers if needed
- **Security**: Never enable public write access - uploads require authentication
* Support multiple providers:
- AWS S3 (standard regions)
- Linode Object Storage
- DreamHost Object Storage
- DigitalOcean Spaces
* Validate bucket configuration after creation
* Option to link bucket to existing site deployment
* Clear error messages for common issues (bucket name conflicts, permissions, etc.)
* Documentation with examples for each provider
**Usage Examples**:
```bash
# Create AWS S3 bucket
provision-s3-bucket --name my-site-bucket --region us-east-1
# Create Linode bucket
provision-s3-bucket --name my-site-bucket --region us-east-1 --provider linode
# Create and link to site
provision-s3-bucket --name my-site-bucket --region us-east-1 --site-id 5
```
**Technical Notes**:
* Uses boto3 for all providers (with custom endpoints for S3-compatible)
* Bucket naming validation (AWS rules apply)
* Idempotent: safe to run multiple times
* Optional: Can be integrated into `provision-site` command later
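Two pieces the script needs can be sketched in isolation: bucket-name validation (a simplified subset of the AWS rules, not the full specification) and the standard read-only bucket policy. Function names are hypothetical.

```python
import json
import re

# Simplified AWS rules: 3-63 chars, lowercase letters/digits/hyphens/dots,
# must start and end with a letter or digit, no consecutive dots, not an IP.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    """Check a bucket name against a simplified form of the AWS naming rules."""
    if not _BUCKET_RE.match(name) or ".." in name:
        return False
    return not re.match(r"^\d{1,3}(\.\d{1,3}){3}$", name)  # IP-like names rejected

def public_read_policy(bucket: str) -> str:
    """Bucket policy granting anonymous GET only; no public write access."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",  # read-only
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    })
```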
---
## Technical Considerations
### Architecture Changes
1. **Unified Method Signature**:
```python
# All storage clients use the same signature
class BunnyStorageClient:
    def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult:
        # Extract bunny-specific fields from site
        zone_name = site.storage_zone_name
        zone_password = site.storage_zone_password
        # ... do upload

class S3StorageClient:
    def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult:
        # Extract S3-specific fields from site
        bucket_name = site.s3_bucket_name
        # ... do upload
```
2. **Simple Factory Function**:
```python
def create_storage_client(site: SiteDeployment):
    """Create appropriate storage client based on site provider"""
    if site.storage_provider == 'bunny':
        return BunnyStorageClient()
    elif site.storage_provider == 's3':
        return S3StorageClient()
    elif site.storage_provider == 's3_compatible':
        return S3StorageClient()  # Same client, uses site.s3_endpoint_url
    # Future: elif site.storage_provider == 'cloudflare': ...
    else:
        raise ValueError(f"Unknown provider: {site.storage_provider}")
```
3. **Clean DeploymentService**:
```python
# In deploy_article():
client = create_storage_client(site) # One line, works for all providers
client.upload_file(site, file_path, content) # Same call for all
```
4. **Optional Protocol** (recommended for type safety with 8+ providers):
```python
from typing import Protocol

class StorageClient(Protocol):
    def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult: ...
```
### Credential Management
**Decision: Global Environment Variables**
- All credentials stored in `.env` file (global)
- AWS: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`
- Linode: `LINODE_ACCESS_KEY`, `LINODE_SECRET_KEY`
- DreamHost: `DREAMHOST_ACCESS_KEY`, `DREAMHOST_SECRET_KEY`
- DigitalOcean: `DO_SPACES_ACCESS_KEY`, `DO_SPACES_SECRET_KEY`
- Simple, secure, follows cloud provider best practices
- Works well for single-account deployments
- Per-site credentials can be added later if needed for multi-account scenarios
### URL Generation Strategy
**Bunny.net**: Uses CDN hostname (custom or bunny.net domain)
**AWS S3**: Uses bucket name + region or custom domain (manual setup)
**S3-Compatible**: Uses service-specific endpoint or custom domain (manual setup)
Custom domain mapping is supported but requires manual configuration (documented, not automated).
### Backward Compatibility
- All existing Bunny.net sites continue to work
- Default `storage_provider='bunny'` for existing records
- No breaking changes to existing APIs
- No migration tools provided (sites can stay on Bunny.net or be manually reconfigured)
### Testing Strategy
- Unit tests with mocked boto3/requests
- Integration tests with test S3 buckets (optional)
- Backward compatibility tests for Bunny.net
- URL generation tests for all providers
## Dependencies
- **boto3** library for AWS S3 operations
- Existing deployment infrastructure (Epic 4)
- Database migration tools
## Decisions Made
1. **Credential Storage**: ✅ Global environment variables (Option A)
- All credentials in `.env` file
- Simple, secure, follows cloud provider best practices
2. **S3-Compatible Services**: ✅ Support Linode, DreamHost, and DigitalOcean
- All services supported equally - no priority/decision logic in this epic
- Provider selection happens elsewhere in the codebase
- This epic just enables S3-compatible services to work the same as Bunny.net
3. **Custom Domains**: ✅ Manual setup (deferred automation)
- Custom domains require manual configuration
- Documented process, no automation in this epic
4. **Bucket Provisioning**: ✅ Manual with optional script (Story 6.6)
- Primary: Manual bucket creation
- Optional: `provision-s3-bucket` CLI script for automated setup
5. **Public Access**: ✅ Automatic configuration (read-only)
- System automatically configures buckets for public READ access only
- Applies bucket policies for read access, disables block public access, sets public-read ACLs
- **Security**: Never enables public write access - all uploads require authenticated credentials
6. **Migration Path**: ✅ No migration tools
- No automated migration from Bunny.net to S3
- Sites can be manually reconfigured if needed
## Success Metrics
- ✅ Deploy content to AWS S3 successfully
- ✅ Deploy content to S3-compatible services (Linode, DreamHost, DigitalOcean) successfully
- ✅ All existing Bunny.net deployments continue working
- ✅ URL generation works correctly for all providers
- ✅ Buckets automatically configured for public read access (not write)
- ✅ Zero breaking changes to existing functionality
- ✅ Bucket provisioning script works for all supported providers