# Epic 6: Multi-Cloud Storage Support

## Epic Goal

To extend the deployment system to support AWS S3 and S3-compatible cloud storage providers (DigitalOcean Spaces, Backblaze B2, Linode Object Storage, etc.), providing flexibility beyond Bunny.net while maintaining backward compatibility with existing deployments.

## Rationale

Currently, the system only supports Bunny.net storage, creating vendor lock-in and limiting deployment options. Many users have existing infrastructure on AWS S3 or prefer S3-compatible services for cost, performance, or compliance reasons. This epic will:

- **Increase Flexibility**: Support multiple cloud storage providers
- **Reduce Vendor Lock-in**: Enable migration between providers
- **Leverage Existing Infrastructure**: Use existing S3 buckets and credentials
- **Maintain Compatibility**: Existing Bunny.net deployments continue to work unchanged

## Status

- **Story 6.1**: 🔄 PLANNING (Storage Provider Abstraction)
- **Story 6.2**: 🔄 PLANNING (S3 Client Implementation)
- **Story 6.3**: 🔄 PLANNING (Database Schema Updates)
- **Story 6.4**: 🔄 PLANNING (URL Generation for S3)
- **Story 6.5**: 🔄 PLANNING (S3-Compatible Services Support)

## Stories

### Story 6.1: Storage Provider Abstraction Layer

**Estimated Effort**: 5 story points

**As a developer**, I want a unified storage interface that abstracts provider-specific details, so that the deployment service can work with any storage provider without code changes.
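As a sketch of this unified interface, the methods could be defined with `typing.Protocol` so that clients comply structurally, without inheriting from a base class. `UploadResult` is assumed to be an existing result type and is modeled here as a placeholder dataclass; `InMemoryStorageClient` is a hypothetical toy implementation for illustration only:

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, runtime_checkable


@dataclass
class UploadResult:
    """Placeholder for the project's existing upload result type (assumed shape)."""
    file_path: str
    success: bool


@runtime_checkable
class StorageClient(Protocol):
    """Unified storage interface; any provider with these methods satisfies it."""

    def upload_file(self, file_path: str, content: str, content_type: str) -> UploadResult: ...
    def file_exists(self, file_path: str) -> bool: ...
    def list_files(self, prefix: str = "") -> List[str]: ...


class InMemoryStorageClient:
    """Toy implementation for tests; note it never inherits from StorageClient."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def upload_file(self, file_path: str, content: str, content_type: str) -> UploadResult:
        self._files[file_path] = content
        return UploadResult(file_path=file_path, success=True)

    def file_exists(self, file_path: str) -> bool:
        return file_path in self._files

    def list_files(self, prefix: str = "") -> List[str]:
        return [p for p in self._files if p.startswith(prefix)]
```

Because the protocol is `@runtime_checkable`, unit tests can assert `isinstance(client, StorageClient)` to verify interface compliance for each provider.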
**Acceptance Criteria**:

* Create a `StorageClient` protocol/interface with common methods:
  - `upload_file(file_path: str, content: str, content_type: str) -> UploadResult`
  - `file_exists(file_path: str) -> bool`
  - `list_files(prefix: str = '') -> List[str]`
* Refactor `BunnyStorageClient` to implement the interface
* Create a `StorageClientFactory` that returns the appropriate client based on provider type
* Update `DeploymentService` to use the factory instead of hardcoding `BunnyStorageClient`
* All existing Bunny.net deployments continue to work without changes
* Unit tests verify interface compliance

**Technical Notes**:

* Use Python `Protocol` (typing) or ABC for interface definition
* Factory pattern: `create_storage_client(site: SiteDeployment) -> StorageClient`
* Maintain backward compatibility: default provider is "bunny" if not specified

---

### Story 6.2: AWS S3 Client Implementation

**Estimated Effort**: 8 story points

**As a user**, I want to deploy content to AWS S3 buckets, so that I can use my existing AWS infrastructure.

**Acceptance Criteria**:

* Create `S3StorageClient` implementing the `StorageClient` interface
* Use the boto3 library for AWS S3 operations
* Support standard AWS S3 regions
* Authentication via AWS credentials (access key ID, secret access key)
* Handle bucket permissions (public read access required)
* Upload files with correct content-type headers
* Generate public URLs from bucket name and region
* Support custom domain mapping (if configured)
* Error handling for common S3 errors (403, 404, bucket not found, etc.)
* Retry logic with exponential backoff (consistent with `BunnyStorageClient`)
* Unit tests with mocked boto3 calls

**Configuration**:

* AWS credentials from environment variables:
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
  - `AWS_REGION` (default region, can be overridden per-site)
* Per-site configuration stored in database:
  - `bucket_name`: S3 bucket name
  - `bucket_region`: AWS region (optional, uses default if not set)
  - `custom_domain`: Optional custom domain for URL generation

**URL Generation**:

* Default: `https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}`
* With custom domain: `https://{custom_domain}/{file_path}`
* Support for path-style URLs if needed: `https://s3.{region}.amazonaws.com/{bucket_name}/{file_path}`

**Technical Notes**:

* boto3 session management (reuse sessions for performance)
* Content-type detection (text/html for HTML files)
* Public read ACL or bucket policy required for public URLs

---

### Story 6.3: Database Schema Updates for Multi-Cloud

**Estimated Effort**: 3 story points

**As a developer**, I want to store provider-specific configuration in the database, so that each site can use its preferred storage provider.
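A minimal sketch of the intended migration semantics, using an in-memory SQLite database purely for illustration (the real migration would use the project's migration tooling and column types; the column names follow this story's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Pre-migration table with one existing Bunny.net site.
conn.execute("CREATE TABLE site_deployments (id INTEGER PRIMARY KEY, site_name TEXT)")
conn.execute("INSERT INTO site_deployments (site_name) VALUES ('legacy-site')")

# Migration: add the provider field (defaulting to 'bunny') plus nullable S3 fields.
conn.execute(
    "ALTER TABLE site_deployments "
    "ADD COLUMN storage_provider VARCHAR(20) NOT NULL DEFAULT 'bunny'"
)
for column in (
    "s3_bucket_name VARCHAR(255)",
    "s3_bucket_region VARCHAR(50)",
    "s3_custom_domain VARCHAR(255)",
    "s3_endpoint_url VARCHAR(500)",
):
    conn.execute(f"ALTER TABLE site_deployments ADD COLUMN {column}")
conn.execute(
    "CREATE INDEX ix_site_deployments_storage_provider "
    "ON site_deployments (storage_provider)"
)

# Existing rows pick up the 'bunny' default; the S3 fields stay NULL.
row = conn.execute(
    "SELECT storage_provider, s3_bucket_name FROM site_deployments"
).fetchone()
```

The key property being demonstrated is the backward-compatibility guarantee: no existing row is rewritten, yet every existing site reads back as a `'bunny'` site.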
**Acceptance Criteria**:

* Add `storage_provider` field to `site_deployments` table:
  - Type: String(20), Not Null, Default: 'bunny'
  - Values: 'bunny', 's3', 's3_compatible'
  - Indexed for query performance
* Add S3-specific fields (nullable, only used when provider is 's3' or 's3_compatible'):
  - `s3_bucket_name`: String(255), Nullable
  - `s3_bucket_region`: String(50), Nullable
  - `s3_custom_domain`: String(255), Nullable
  - `s3_endpoint_url`: String(500), Nullable (for S3-compatible services)
* Create migration script to:
  - Add new fields with appropriate defaults
  - Set `storage_provider='bunny'` for all existing records
  - Preserve all existing Bunny.net fields
* Update `SiteDeployment` model with new fields
* Update repository methods to handle new fields
* Backward compatibility: existing queries continue to work

**Migration Strategy**:

* Existing sites default to 'bunny' provider
* No data loss or breaking changes
* New fields are nullable to support gradual migration

---

### Story 6.4: URL Generation for S3 Providers

**Estimated Effort**: 3 story points

**As a user**, I want public URLs for S3-deployed content to be generated correctly, so that articles are accessible via the expected URLs.
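One way the generator could branch between the three URL styles (a hedged sketch; the helper name and parameters are illustrative, and the real logic belongs in `generate_public_url()` in `url_generator.py`):

```python
from typing import Optional


def generate_s3_public_url(
    file_path: str,
    bucket_name: str,
    region: str,
    custom_domain: Optional[str] = None,
    path_style: bool = False,
) -> str:
    """Illustrative URL builder for S3-hosted files (AWS endpoints only)."""
    if custom_domain:
        # Custom domain takes precedence if configured.
        return f"https://{custom_domain}/{file_path}"
    if path_style:
        # Path-style, needed by some endpoints/configurations.
        return f"https://s3.{region}.amazonaws.com/{bucket_name}/{file_path}"
    # Virtual-hosted style is the AWS default.
    return f"https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}"
```

S3-compatible services would need an additional branch that substitutes the per-site endpoint for the `amazonaws.com` host.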
**Acceptance Criteria**:

* Update `generate_public_url()` in `url_generator.py` to handle S3 providers
* Support multiple URL formats:
  - Virtual-hosted style: `https://bucket.s3.region.amazonaws.com/file.html`
  - Path-style: `https://s3.region.amazonaws.com/bucket/file.html` (if needed)
  - Custom domain: `https://custom-domain.com/file.html`
* URL generation logic based on `storage_provider` field
* Maintain existing behavior for Bunny.net (no changes)
* Handle S3-compatible services with custom endpoints
* Unit tests for all URL generation scenarios

**Technical Notes**:

* Virtual-hosted style is the default for AWS S3
* Custom domain takes precedence if configured
* S3-compatible services may need path-style URLs depending on the endpoint

---

### Story 6.5: S3-Compatible Services Support

**Estimated Effort**: 5 story points

**As a user**, I want to deploy to S3-compatible services (DigitalOcean Spaces, Backblaze B2, Linode Object Storage), so that I can use cost-effective alternatives to AWS.

**Acceptance Criteria**:

* Extend `S3StorageClient` to support S3-compatible endpoints
* Support provider-specific configurations:
  - **DigitalOcean Spaces**: Custom endpoint (e.g., `https://nyc3.digitaloceanspaces.com`)
  - **Backblaze B2**: Custom endpoint and authentication
  - **Linode Object Storage**: Custom endpoint
* Store `s3_endpoint_url` per site for custom endpoints
* Handle provider-specific authentication differences
* Support provider-specific URL generation
* Configuration examples in documentation
* Unit tests for each supported service

**Supported Services** (Initial):

* DigitalOcean Spaces
* Backblaze B2
* Linode Object Storage
* (Others can be added as needed)

**Configuration**:

* Per-service credentials in `.env` or per-site in database
* Endpoint URLs stored per-site in the `s3_endpoint_url` field
* Provider type stored in `storage_provider` ('s3_compatible')

**Technical Notes**:

* Most S3-compatible services work with boto3 using custom endpoints
* Some may require minor authentication adjustments
* URL generation may differ (e.g., DigitalOcean uses a different domain structure)

---

## Technical Considerations

### Architecture Changes

1. **Interface/Protocol Design**:

   ```python
   class StorageClient(Protocol):
       def upload_file(self, file_path: str, content: str, content_type: str) -> UploadResult: ...
       def file_exists(self, file_path: str) -> bool: ...
       def list_files(self, prefix: str = '') -> List[str]: ...
   ```

2. **Factory Pattern**:

   ```python
   def create_storage_client(site: SiteDeployment) -> StorageClient:
       if site.storage_provider == 'bunny':
           return BunnyStorageClient()
       elif site.storage_provider in ('s3', 's3_compatible'):
           return S3StorageClient(site)
       else:
           raise ValueError(f"Unknown provider: {site.storage_provider}")
   ```

3. **Dependency Injection**:
   - `DeploymentService` receives its `StorageClient` from the factory
   - No hardcoded provider dependencies

### Credential Management

**Option A: Environment Variables (Recommended for AWS)**

- Global AWS credentials in `.env`
- Simple, secure, follows AWS best practices
- Works well for single-account deployments

**Option B: Per-Site Credentials**

- Store credentials in database (encrypted)
- Required for multi-account or S3-compatible services
- More complex but more flexible

**Decision Needed**: Which approach for the initial implementation?

### URL Generation Strategy

- **Bunny.net**: Uses CDN hostname (custom or bunny.net domain)
- **AWS S3**: Uses bucket name + region, or custom domain
- **S3-Compatible**: Uses service-specific endpoint or custom domain

All providers should support custom domain mapping for consistent URLs.
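For S3-compatible services, the main boto3-side difference is passing `endpoint_url` when constructing the client. A hedged sketch of how `S3StorageClient` might assemble client options from per-site fields (the helper is hypothetical; field names follow Story 6.3, and the `boto3.client(...)` call itself is shown only in a comment so the example stays dependency-free):

```python
from typing import Any, Dict, Optional


def build_s3_client_options(
    access_key_id: str,
    secret_access_key: str,
    region: Optional[str] = None,
    endpoint_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3.client('s3', **options).

    For AWS S3, endpoint_url stays unset and boto3 derives the endpoint
    from the region. For S3-compatible services (DigitalOcean Spaces,
    Backblaze B2, Linode Object Storage), the per-site s3_endpoint_url
    value is passed through as endpoint_url.
    """
    options: Dict[str, Any] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    if region:
        options["region_name"] = region
    if endpoint_url:  # set only for 's3_compatible' sites
        options["endpoint_url"] = endpoint_url
    return options


# client = boto3.client("s3", **build_s3_client_options(...))
```

Keeping the option-building separate from client construction also makes the provider differences easy to unit-test without network access or mocked boto3 calls.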
### Backward Compatibility

- All existing Bunny.net sites continue to work
- Default `storage_provider='bunny'` for existing records
- No breaking changes to existing APIs
- Migration is optional (sites can stay on Bunny.net)

### Testing Strategy

- Unit tests with mocked boto3/requests
- Integration tests with test S3 buckets (optional)
- Backward compatibility tests for Bunny.net
- URL generation tests for all providers

## Dependencies

- **boto3** library for AWS S3 operations
- Existing deployment infrastructure (Epic 4)
- Database migration tools

## Open Questions

1. **Credential Storage**: Per-site in DB vs. global env vars? (Recommendation: Start with env vars, add per-site later if needed)
2. **S3-Compatible Priority**: Which services to support first? (Recommendation: DigitalOcean Spaces, then Backblaze B2)
3. **Custom Domains**: How are custom domains configured? Manual setup or automated? (Recommendation: Manual for now, document the process)
4. **Bucket Provisioning**: Should we automate S3 bucket creation, or require manual setup? (Recommendation: Manual for now, similar to the current Bunny.net approach)
5. **Public Access**: How to ensure buckets are publicly readable? (Recommendation: Document requirements, validate in tests)
6. **Migration Path**: Should we provide tools to migrate existing Bunny.net sites to S3? (Recommendation: Defer to a future story)

## Success Metrics

- ✅ Deploy content to AWS S3 successfully
- ✅ Deploy content to at least one S3-compatible service
- ✅ All existing Bunny.net deployments continue working
- ✅ URL generation works correctly for all providers
- ✅ Zero breaking changes to existing functionality