# Epic 6: Multi-Cloud Storage Support

## Epic Goal

To extend the deployment system to support AWS S3 and S3-compatible cloud storage providers (DigitalOcean Spaces, Backblaze B2, Linode Object Storage, etc.), providing flexibility beyond Bunny.net while maintaining backward compatibility with existing deployments.

## Rationale

Currently, the system supports only Bunny.net storage, creating vendor lock-in and limiting deployment options. Many users have existing infrastructure on AWS S3 or prefer S3-compatible services for cost, performance, or compliance reasons. This epic will:

- **Increase Flexibility**: Support multiple cloud storage providers
- **Reduce Vendor Lock-in**: Enable migration between providers
- **Leverage Existing Infrastructure**: Use existing S3 buckets and credentials
- **Maintain Compatibility**: Existing Bunny.net deployments continue to work unchanged

## Status

- **Story 6.1**: 🔄 PLANNING (Storage Provider Abstraction)
- **Story 6.2**: 🔄 PLANNING (S3 Client Implementation)
- **Story 6.3**: 🔄 PLANNING (Database Schema Updates)
- **Story 6.4**: 🔄 PLANNING (URL Generation for S3)
- **Story 6.5**: 🔄 PLANNING (S3-Compatible Services Support)
- **Story 6.6**: 🔄 PLANNING (Bucket Provisioning Script)

## Stories

### Story 6.1: Storage Provider Abstraction Layer

**Estimated Effort**: 3 story points

**As a developer**, I want a simple way to support multiple storage providers without cluttering `DeploymentService` with if/elif chains, so that adding new providers (eventually 8+) is straightforward.
**Acceptance Criteria**:

* Create a simple factory function `create_storage_client(site: SiteDeployment)` that returns the appropriate client:
  - `'bunny'` → `BunnyStorageClient()`
  - `'s3'` → `S3StorageClient()`
  - `'s3_compatible'` → `S3StorageClient()` (with custom endpoint)
  - Future providers added here
* Refactor `BunnyStorageClient.upload_file()` to accept a `site: SiteDeployment` parameter:
  - Change from: `upload_file(zone_name, zone_password, zone_region, file_path, content)`
  - Change to: `upload_file(site: SiteDeployment, file_path: str, content: str)`
  - The client extracts Bunny-specific fields from `site` internally
* Update `DeploymentService` to use the factory and unified interface:
  - Remove the hardcoded `BunnyStorageClient` from `__init__`
  - In `deploy_article()` and `deploy_boilerplate_page()`: create a client per site
  - Call: `client.upload_file(site, file_path, content)` (same signature for all providers)
* Optional: Add a `StorageClient` Protocol for type hints (helps with 8+ providers)
* All existing Bunny.net deployments continue to work without changes
* Unit tests verify the factory returns the correct clients

**Technical Notes**:

* The factory function is a simple if/elif chain (one place to maintain)
* All clients use the same method signature: `upload_file(site, file_path, content)`
* Each client extracts provider-specific fields from the `site` object internally
* The Protocol is optional but recommended for type safety with many providers
* The factory pattern keeps `DeploymentService` clean (no provider-specific logic)
* Backward compatibility: the default provider is `'bunny'` if not specified

---

### Story 6.2: AWS S3 Client Implementation

**Estimated Effort**: 8 story points

**As a user**, I want to deploy content to AWS S3 buckets, so that I can use my existing AWS infrastructure.
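One of this story's criteria is uploading files with correct content-type headers. A minimal sketch of that piece using only the standard library (the helper name `detect_content_type` is illustrative, not an existing function in the codebase):

```python
import mimetypes


def detect_content_type(file_path: str) -> str:
    """Guess the Content-Type header for an uploaded file.

    Falls back to application/octet-stream for unknown extensions.
    """
    content_type, _ = mimetypes.guess_type(file_path)
    return content_type or "application/octet-stream"
```

The result would then be passed to the boto3 upload call, e.g. `s3.put_object(..., ContentType=detect_content_type(file_path))`, so HTML articles are served as `text/html` rather than downloaded as binary.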
**Acceptance Criteria**:

* Create `S3StorageClient` implementing the `StorageClient` interface
* Use the boto3 library for AWS S3 operations
* Support standard AWS S3 regions
* Authentication via AWS credentials from environment variables
* Automatically configure the bucket for public READ access only (not write):
  - Apply a public-read ACL or bucket policy on first upload
  - Ensure the bucket allows public read access (disable Block Public Access settings)
  - Verify public read access is enabled before deployment
  - **Security**: Never enable public write access; only read permissions
* Upload files with correct content-type headers
* Generate public URLs from the bucket name and region
* Support custom domain mapping (if configured)
* Error handling for common S3 errors (403, 404, bucket not found, etc.)
* Retry logic with exponential backoff (consistent with `BunnyStorageClient`)
* Unit tests with mocked boto3 calls

**Configuration**:

* AWS credentials from environment variables (global):
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
  - `AWS_REGION` (default region, can be overridden per site)
* Per-site configuration stored in the database:
  - `s3_bucket_name`: S3 bucket name
  - `s3_bucket_region`: AWS region (optional, uses the default if not set)
  - `s3_custom_domain`: Optional custom domain for URL generation (manual setup)

**URL Generation**:

* Default: `https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}`
* With custom domain: `https://{custom_domain}/{file_path}`
* Support for path-style URLs if needed: `https://s3.{region}.amazonaws.com/{bucket_name}/{file_path}`

**Technical Notes**:

* boto3 session management (reuse sessions for performance)
* Content-type detection (`text/html` for HTML files)
* Automatic public read access configuration (read-only, never write):
  - Check and configure the bucket policy for public read access only
  - Disable "Block Public Access" settings for read access
  - Apply a public-read ACL to uploaded objects (not public-write)
  - Validate public read access before deployment
  - **Security**: Uploads require authenticated credentials; only reads are public

---

### Story 6.3: Database Schema Updates for Multi-Cloud

**Estimated Effort**: 3 story points

**As a developer**, I want to store provider-specific configuration in the database, so that each site can use its preferred storage provider.

**Acceptance Criteria**:

* Add a `storage_provider` field to the `site_deployments` table:
  - Type: String(20), Not Null, Default: `'bunny'`
  - Values: `'bunny'`, `'s3'`, `'s3_compatible'`
  - Indexed for query performance
* Add S3-specific fields (nullable, only used when the provider is `'s3'` or `'s3_compatible'`):
  - `s3_bucket_name`: String(255), Nullable
  - `s3_bucket_region`: String(50), Nullable
  - `s3_custom_domain`: String(255), Nullable
  - `s3_endpoint_url`: String(500), Nullable (for S3-compatible services)
* Create a migration script to:
  - Add the new fields with appropriate defaults
  - Set `storage_provider='bunny'` for all existing records
  - Preserve all existing Bunny.net fields
* Update the `SiteDeployment` model with the new fields
* Update repository methods to handle the new fields
* Backward compatibility: existing queries continue to work

**Migration Strategy**:

* Existing sites default to the `'bunny'` provider
* No data loss or breaking changes
* New fields are nullable to support gradual migration

---

### Story 6.4: URL Generation for S3 Providers

**Estimated Effort**: 3 story points

**As a user**, I want public URLs for S3-deployed content to be generated correctly, so that articles are accessible via the expected URLs.
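The three URL formats this story targets can be sketched as a single helper (the function name and signature are illustrative, not the existing `url_generator.py` API):

```python
from typing import Optional


def generate_s3_public_url(bucket: str, region: str, file_path: str,
                           custom_domain: Optional[str] = None,
                           path_style: bool = False) -> str:
    """Build a public URL for an S3-hosted file.

    A configured custom domain takes precedence; otherwise virtual-hosted
    style is the default, with path-style as an opt-in for providers that
    require it.
    """
    if custom_domain:
        return f"https://{custom_domain}/{file_path}"
    if path_style:
        return f"https://s3.{region}.amazonaws.com/{bucket}/{file_path}"
    return f"https://{bucket}.s3.{region}.amazonaws.com/{file_path}"
```

For example, `generate_s3_public_url("my-bucket", "us-east-1", "post.html")` yields `https://my-bucket.s3.us-east-1.amazonaws.com/post.html`.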
**Acceptance Criteria**:

* Update `generate_public_url()` in `url_generator.py` to handle S3 providers
* Support multiple URL formats:
  - Virtual-hosted style: `https://bucket.s3.region.amazonaws.com/file.html`
  - Path-style: `https://s3.region.amazonaws.com/bucket/file.html` (if needed)
  - Custom domain: `https://custom-domain.com/file.html`
* URL generation logic based on the `storage_provider` field
* Maintain existing behavior for Bunny.net (no changes)
* Handle S3-compatible services with custom endpoints
* Unit tests for all URL generation scenarios

**Technical Notes**:

* Virtual-hosted style is the default for AWS S3
* A custom domain takes precedence if configured
* S3-compatible services may need path-style URLs depending on the endpoint

---

### Story 6.5: S3-Compatible Services Support

**Estimated Effort**: 5 story points

**As a user**, I want to deploy to S3-compatible services (Linode Object Storage, DreamHost Object Storage, DigitalOcean Spaces), so that I can use S3-compatible storage providers the same way I use Bunny.net.
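Resolving per-service credentials from the global environment variables could look like the sketch below. The mapping and the `resolve_credentials` helper are assumptions based on this epic's configuration notes, not existing code:

```python
import os

# Env var names per provider, as listed in this epic's configuration notes.
PROVIDER_ENV_KEYS = {
    "aws": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "linode": ("LINODE_ACCESS_KEY", "LINODE_SECRET_KEY"),
    "dreamhost": ("DREAMHOST_ACCESS_KEY", "DREAMHOST_SECRET_KEY"),
    "digitalocean": ("DO_SPACES_ACCESS_KEY", "DO_SPACES_SECRET_KEY"),
}


def resolve_credentials(provider: str) -> tuple:
    """Look up the access/secret key pair for a provider from the environment."""
    try:
        access_var, secret_var = PROVIDER_ENV_KEYS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider}") from None
    return os.environ[access_var], os.environ[secret_var]
```

Keeping the lookup table in one place mirrors the factory-function decision from Story 6.1: adding a new S3-compatible provider means adding one dictionary entry.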
**Acceptance Criteria**:

* Extend `S3StorageClient` to support S3-compatible endpoints
* Support provider-specific configurations:
  - **Linode Object Storage**: Custom endpoint
  - **DreamHost Object Storage**: Custom endpoint
  - **DigitalOcean Spaces**: Custom endpoint (e.g., `https://nyc3.digitaloceanspaces.com`)
* Store `s3_endpoint_url` per site for custom endpoints
* Handle provider-specific authentication differences
* Support provider-specific URL generation
* Configuration examples in the documentation
* Unit tests for each supported service

**Supported Services**:

* AWS S3 (standard)
* Linode Object Storage
* DreamHost Object Storage
* DigitalOcean Spaces
* Backblaze B2
* Cloudflare R2
* (Other S3-compatible services can be added as needed)

**Configuration**:

* Per-service credentials in `.env` (global environment variables):
  - `LINODE_ACCESS_KEY` / `LINODE_SECRET_KEY` (for Linode)
  - `DREAMHOST_ACCESS_KEY` / `DREAMHOST_SECRET_KEY` (for DreamHost)
  - `DO_SPACES_ACCESS_KEY` / `DO_SPACES_SECRET_KEY` (for DigitalOcean)
* Endpoint URLs stored per site in the `s3_endpoint_url` field
* Provider type stored in `storage_provider` (`'s3_compatible'`)
* Automatic public access configuration (same as AWS S3)

**Technical Notes**:

* Most S3-compatible services work with boto3 using custom endpoints
* Some may require minor authentication adjustments
* URL generation may differ (e.g., DigitalOcean uses a different domain structure)

---

### Story 6.6: S3 Bucket Provisioning Script

**Estimated Effort**: 3 story points

**As a user**, I want a script to automatically create and configure S3 buckets with proper public access settings, so that I can quickly set up new storage targets without manual AWS console work.
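The read-only bucket policy this story applies is a standard AWS policy document; building it might look like this (the `public_read_policy` helper name is illustrative):

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Return a bucket policy JSON string allowing anonymous GET only.

    Grants s3:GetObject to everyone on objects in the bucket; no write
    actions are included, so uploads still require authenticated credentials.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
        }],
    }
    return json.dumps(policy)
```

The resulting string would be applied with boto3's `put_bucket_policy` after bucket creation; because the policy names only `s3:GetObject`, public write access is never granted.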
**Acceptance Criteria**:

* Create a CLI command: `provision-s3-bucket --name <bucket-name> --region <region> [--provider <provider>]`
* Automatically create the bucket if it doesn't exist
* Configure the bucket for public read access only (not write):
  - Apply a bucket policy allowing public read (GET requests only)
  - Disable "Block Public Access" settings for read access
  - Set appropriate CORS headers if needed
  - **Security**: Never enable public write access; uploads require authentication
* Support multiple providers:
  - AWS S3 (standard regions)
  - Linode Object Storage
  - DreamHost Object Storage
  - DigitalOcean Spaces
* Validate the bucket configuration after creation
* Option to link the bucket to an existing site deployment
* Clear error messages for common issues (bucket name conflicts, permissions, etc.)
* Documentation with examples for each provider

**Usage Examples**:

```bash
# Create an AWS S3 bucket
provision-s3-bucket --name my-site-bucket --region us-east-1

# Create a Linode bucket
provision-s3-bucket --name my-site-bucket --region us-east-1 --provider linode

# Create and link to a site
provision-s3-bucket --name my-site-bucket --region us-east-1 --site-id 5
```

**Technical Notes**:

* Uses boto3 for all providers (with custom endpoints for S3-compatible services)
* Bucket naming validation (AWS rules apply)
* Idempotent: safe to run multiple times
* Optional: Can be integrated into the `provision-site` command later

---

## Technical Considerations

### Architecture Changes

1. **Unified Method Signature**:

   ```python
   # All storage clients use the same signature
   class BunnyStorageClient:
       def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult:
           # Extract Bunny-specific fields from site
           zone_name = site.storage_zone_name
           zone_password = site.storage_zone_password
           # ... do upload

   class S3StorageClient:
       def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult:
           # Extract S3-specific fields from site
           bucket_name = site.s3_bucket_name
           # ... do upload
   ```

2. **Simple Factory Function**:

   ```python
   def create_storage_client(site: SiteDeployment):
       """Create the appropriate storage client based on the site's provider."""
       if site.storage_provider == 'bunny':
           return BunnyStorageClient()
       elif site.storage_provider == 's3':
           return S3StorageClient()
       elif site.storage_provider == 's3_compatible':
           return S3StorageClient()  # Same client, uses site.s3_endpoint_url
       # Future: elif site.storage_provider == 'cloudflare': ...
       else:
           raise ValueError(f"Unknown provider: {site.storage_provider}")
   ```

3. **Clean DeploymentService**:

   ```python
   # In deploy_article():
   client = create_storage_client(site)          # One line, works for all providers
   client.upload_file(site, file_path, content)  # Same call for all
   ```

4. **Optional Protocol** (recommended for type safety with 8+ providers):

   ```python
   from typing import Protocol

   class StorageClient(Protocol):
       def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult:
           ...
   ```

### Credential Management

**Decision: Global Environment Variables**

- All credentials stored in the `.env` file (global)
- AWS: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`
- Linode: `LINODE_ACCESS_KEY`, `LINODE_SECRET_KEY`
- DreamHost: `DREAMHOST_ACCESS_KEY`, `DREAMHOST_SECRET_KEY`
- DigitalOcean: `DO_SPACES_ACCESS_KEY`, `DO_SPACES_SECRET_KEY`
- Simple, secure, and follows cloud provider best practices
- Works well for single-account deployments
- Per-site credentials can be added later if needed for multi-account scenarios

### URL Generation Strategy

- **Bunny.net**: Uses the CDN hostname (custom or bunny.net domain)
- **AWS S3**: Uses the bucket name + region, or a custom domain (manual setup)
- **S3-Compatible**: Uses the service-specific endpoint or a custom domain (manual setup)

Custom domain mapping is supported but requires manual configuration (documented, not automated).
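The per-provider URL strategy above could be dispatched in one function. In this sketch, `SiteDeployment` is a minimal stand-in for the real model (only this epic's fields, plus an assumed `cdn_hostname` field for Bunny.net), and the endpoint handling for S3-compatible services is an assumption:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteDeployment:  # minimal stand-in for the real model
    storage_provider: str
    s3_bucket_name: Optional[str] = None
    s3_bucket_region: Optional[str] = None
    s3_custom_domain: Optional[str] = None
    s3_endpoint_url: Optional[str] = None
    cdn_hostname: Optional[str] = None  # assumed Bunny.net field


def generate_public_url(site: SiteDeployment, file_path: str) -> str:
    """Pick the URL format by provider; a custom domain wins when set."""
    if site.storage_provider == "bunny":
        return f"https://{site.cdn_hostname}/{file_path}"
    if site.s3_custom_domain:
        return f"https://{site.s3_custom_domain}/{file_path}"
    if site.storage_provider == "s3_compatible" and site.s3_endpoint_url:
        # Virtual-hosted style against the service endpoint
        host = site.s3_endpoint_url.removeprefix("https://")
        return f"https://{site.s3_bucket_name}.{host}/{file_path}"
    return (f"https://{site.s3_bucket_name}.s3."
            f"{site.s3_bucket_region}.amazonaws.com/{file_path}")
```

Because the dispatch keys off `storage_provider` and falls through to AWS defaults, Bunny.net sites keep their existing URLs untouched, matching the backward-compatibility requirement.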
### Backward Compatibility

- All existing Bunny.net sites continue to work
- Default `storage_provider='bunny'` for existing records
- No breaking changes to existing APIs
- No migration tools provided (sites can stay on Bunny.net or be manually reconfigured)

### Testing Strategy

- Unit tests with mocked boto3/requests
- Integration tests with test S3 buckets (optional)
- Backward compatibility tests for Bunny.net
- URL generation tests for all providers

## Dependencies

- **boto3** library for AWS S3 operations
- Existing deployment infrastructure (Epic 4)
- Database migration tools

## Decisions Made

1. **Credential Storage**: ✅ Global environment variables (Option A)
   - All credentials in the `.env` file
   - Simple, secure, and follows cloud provider best practices
2. **S3-Compatible Services**: ✅ Support Linode, DreamHost, and DigitalOcean
   - All services supported equally; no priority/decision logic in this epic
   - Provider selection happens elsewhere in the codebase
   - This epic just enables S3-compatible services to work the same as Bunny.net
3. **Custom Domains**: ✅ Manual setup (deferred automation)
   - Custom domains require manual configuration
   - Documented process, no automation in this epic
4. **Bucket Provisioning**: ✅ Manual with optional script (Story 6.6)
   - Primary: Manual bucket creation
   - Optional: `provision-s3-bucket` CLI script for automated setup
5. **Public Access**: ✅ Automatic configuration (read-only)
   - The system automatically configures buckets for public READ access only
   - Applies bucket policies for read access, disables Block Public Access, sets public-read ACLs
   - **Security**: Never enables public write access; all uploads require authenticated credentials
6. **Migration Path**: ✅ No migration tools
   - No automated migration from Bunny.net to S3
   - Sites can be manually reconfigured if needed

## Success Metrics

- ✅ Deploy content to AWS S3 successfully
- ✅ Deploy content to S3-compatible services (Linode, DreamHost, DigitalOcean) successfully
- ✅ All existing Bunny.net deployments continue working
- ✅ URL generation works correctly for all providers
- ✅ Buckets automatically configured for public read access (not write)
- ✅ Zero breaking changes to existing functionality
- ✅ Bucket provisioning script works for all supported providers