
Epic 6: Multi-Cloud Storage Support

Epic Goal

To extend the deployment system to support AWS S3 and S3-compatible cloud storage providers (DigitalOcean Spaces, Backblaze B2, Linode Object Storage, etc.), providing flexibility beyond Bunny.net while maintaining backward compatibility with existing deployments.

Rationale

Currently, the system only supports Bunny.net storage, creating vendor lock-in and limiting deployment options. Many users have existing infrastructure on AWS S3 or prefer S3-compatible services for cost, performance, or compliance reasons. This epic will:

  • Increase Flexibility: Support multiple cloud storage providers
  • Reduce Vendor Lock-in: Enable migration between providers
  • Leverage Existing Infrastructure: Use existing S3 buckets and credentials
  • Maintain Compatibility: Existing Bunny.net deployments continue to work unchanged

Status

  • Story 6.1: 🔄 PLANNING (Storage Provider Abstraction)
  • Story 6.2: 🔄 PLANNING (S3 Client Implementation)
  • Story 6.3: 🔄 PLANNING (Database Schema Updates)
  • Story 6.4: 🔄 PLANNING (URL Generation for S3)
  • Story 6.5: 🔄 PLANNING (S3-Compatible Services Support)

Stories

Story 6.1: Storage Provider Abstraction Layer

Estimated Effort: 5 story points

As a developer, I want a unified storage interface that abstracts provider-specific details, so that the deployment service can work with any storage provider without code changes.

Acceptance Criteria:

  • Create a StorageClient protocol/interface with common methods:
    • upload_file(file_path: str, content: str, content_type: str) -> UploadResult
    • file_exists(file_path: str) -> bool
    • list_files(prefix: str = '') -> List[str]
  • Refactor BunnyStorageClient to implement the interface
  • Create a StorageClientFactory that returns the appropriate client based on provider type
  • Update DeploymentService to use the factory instead of hardcoding BunnyStorageClient
  • All existing Bunny.net deployments continue to work without changes
  • Unit tests verify interface compliance

Technical Notes:

  • Use Python Protocol (typing) or ABC for interface definition
  • Factory pattern: create_storage_client(site: SiteDeployment) -> StorageClient
  • Maintain backward compatibility: default provider is "bunny" if not specified
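The acceptance criterion "unit tests verify interface compliance" can be sketched with a `runtime_checkable` Protocol and an in-memory test double. Names here (`StorageClient`, `UploadResult`, `InMemoryStorageClient`) follow the interface proposed above; actual signatures may differ in the implementation.

```python
# Sketch: structural interface-compliance check via typing.Protocol.
from dataclasses import dataclass
from typing import List, Protocol, runtime_checkable


@dataclass
class UploadResult:
    success: bool
    url: str = ""


@runtime_checkable
class StorageClient(Protocol):
    def upload_file(self, file_path: str, content: str, content_type: str) -> UploadResult: ...
    def file_exists(self, file_path: str) -> bool: ...
    def list_files(self, prefix: str = "") -> List[str]: ...


class InMemoryStorageClient:
    """Test double that satisfies the protocol structurally (no inheritance needed)."""

    def __init__(self) -> None:
        self._files: dict = {}

    def upload_file(self, file_path: str, content: str, content_type: str) -> UploadResult:
        self._files[file_path] = content
        return UploadResult(success=True, url=f"https://example.com/{file_path}")

    def file_exists(self, file_path: str) -> bool:
        return file_path in self._files

    def list_files(self, prefix: str = "") -> List[str]:
        return [p for p in self._files if p.startswith(prefix)]


# runtime_checkable only verifies method *presence*, not signatures,
# so the unit tests should also exercise each method.
assert isinstance(InMemoryStorageClient(), StorageClient)
```

Because Protocols are structural, `BunnyStorageClient` needs no base-class change to "implement" the interface — the same `isinstance` check covers it.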

Story 6.2: AWS S3 Client Implementation

Estimated Effort: 8 story points

As a user, I want to deploy content to AWS S3 buckets, so that I can use my existing AWS infrastructure.

Acceptance Criteria:

  • Create S3StorageClient implementing StorageClient interface
  • Use boto3 library for AWS S3 operations
  • Support standard AWS S3 regions
  • Authentication via AWS credentials (access key ID, secret access key)
  • Handle bucket permissions (public read access required)
  • Upload files with correct content-type headers
  • Generate public URLs from bucket name and region
  • Support custom domain mapping (if configured)
  • Error handling for common S3 errors (403, 404, bucket not found, etc.)
  • Retry logic with exponential backoff (consistent with BunnyStorageClient)
  • Unit tests with mocked boto3 calls

Configuration:

  • AWS credentials from environment variables:
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_REGION (default region, can be overridden per-site)
  • Per-site configuration stored in database:
    • bucket_name: S3 bucket name
    • bucket_region: AWS region (optional, uses default if not set)
    • custom_domain: Optional custom domain for URL generation

URL Generation:

  • Default: https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}
  • With custom domain: https://{custom_domain}/{file_path}
  • Support for path-style URLs if needed: https://s3.{region}.amazonaws.com/{bucket_name}/{file_path}

Technical Notes:

  • boto3 session management (reuse sessions for performance)
  • Content-type detection (text/html for HTML files)
  • Public read ACL or bucket policy required for public URLs
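The retry-with-exponential-backoff criterion, combined with "unit tests with mocked boto3 calls", suggests injecting the client so a stub can stand in for boto3. A minimal sketch (function and parameter names are illustrative, not the final API):

```python
# Sketch: exponential-backoff retry around an S3 put_object call.
# The client is injected so unit tests can substitute a stub for boto3.
import time
from typing import Any


def upload_with_retry(client: Any, bucket: str, key: str, body: str,
                      content_type: str, max_retries: int = 3,
                      base_delay: float = 0.5) -> None:
    """Try put_object up to max_retries times, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            client.put_object(Bucket=bucket, Key=key,
                              Body=body.encode("utf-8"),
                              ContentType=content_type)
            return
        except Exception:
            if attempt == max_retries - 1:
                raise  # exhausted retries: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...


class FlakyStub:
    """Stands in for a mocked boto3 client: fails twice, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    def put_object(self, **kwargs: Any) -> None:
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("transient S3 error")
```

The real boto3 client exposes the same `put_object(Bucket=..., Key=..., Body=..., ContentType=...)` call, so the production path only swaps the injected object.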

Story 6.3: Database Schema Updates for Multi-Cloud

Estimated Effort: 3 story points

As a developer, I want to store provider-specific configuration in the database, so that each site can use its preferred storage provider.

Acceptance Criteria:

  • Add storage_provider field to site_deployments table:
    • Type: String(20), Not Null, Default: 'bunny'
    • Values: 'bunny', 's3', 's3_compatible'
    • Indexed for query performance
  • Add S3-specific fields (nullable, only used when provider is 's3' or 's3_compatible'):
    • s3_bucket_name: String(255), Nullable
    • s3_bucket_region: String(50), Nullable
    • s3_custom_domain: String(255), Nullable
    • s3_endpoint_url: String(500), Nullable (for S3-compatible services)
  • Create migration script to:
    • Add new fields with appropriate defaults
    • Set storage_provider='bunny' for all existing records
    • Preserve all existing Bunny.net fields
  • Update SiteDeployment model with new fields
  • Update repository methods to handle new fields
  • Backward compatibility: existing queries continue to work

Migration Strategy:

  • Existing sites default to 'bunny' provider
  • No data loss or breaking changes
  • New fields are nullable to support gradual migration
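The migration steps above can be sketched against a throwaway SQLite copy of `site_deployments`; the real migration would go through the project's migration tooling, but the DDL sequence is the same. Column names follow this story's schema; the starting table shape is illustrative.

```python
# Illustrative migration applying the Story 6.3 schema changes.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_deployments (id INTEGER PRIMARY KEY, site_name TEXT)")
conn.execute("INSERT INTO site_deployments (site_name) VALUES ('legacy-site')")

# Step 1: add the provider column with the backward-compatible default,
# so every existing row becomes storage_provider='bunny' automatically.
conn.execute("ALTER TABLE site_deployments "
             "ADD COLUMN storage_provider TEXT NOT NULL DEFAULT 'bunny'")

# Step 2: add the nullable S3-specific columns.
for col in ("s3_bucket_name", "s3_bucket_region",
            "s3_custom_domain", "s3_endpoint_url"):
    conn.execute(f"ALTER TABLE site_deployments ADD COLUMN {col} TEXT")

# Step 3: index the provider column for query performance.
conn.execute("CREATE INDEX ix_site_deployments_storage_provider "
             "ON site_deployments (storage_provider)")

row = conn.execute("SELECT storage_provider FROM site_deployments").fetchone()
```

Because the new columns are nullable and the default is `'bunny'`, the migration is additive: no existing row changes meaning and the down-migration is a plain column drop.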

Story 6.4: URL Generation for S3 Providers

Estimated Effort: 3 story points

As a user, I want public URLs for S3-deployed content to be generated correctly, so that articles are accessible via the expected URLs.

Acceptance Criteria:

  • Update generate_public_url() in url_generator.py to handle S3 providers
  • Support multiple URL formats:
    • Virtual-hosted style: https://bucket.s3.region.amazonaws.com/file.html
    • Path-style: https://s3.region.amazonaws.com/bucket/file.html (if needed)
    • Custom domain: https://custom-domain.com/file.html
  • URL generation logic based on storage_provider field
  • Maintain existing behavior for Bunny.net (no changes)
  • Handle S3-compatible services with custom endpoints
  • Unit tests for all URL generation scenarios

Technical Notes:

  • Virtual-hosted style is default for AWS S3
  • Custom domain takes precedence if configured
  • S3-compatible services may need path-style URLs depending on endpoint
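The precedence rules above (custom domain first, then provider-specific format) can be sketched as a branch in `generate_public_url()`. This uses a plain dict in place of the `SiteDeployment` model, with the field names from Story 6.3; the Bunny.net branch is unchanged and omitted.

```python
# Sketch of the S3 branches of generate_public_url() (url_generator.py).
def generate_public_url(site: dict, file_path: str) -> str:
    # Custom domain takes precedence for every provider.
    if site.get("s3_custom_domain"):
        return f"https://{site['s3_custom_domain']}/{file_path}"
    if site["storage_provider"] == "s3_compatible":
        # The stored endpoint already names the service host; many
        # S3-compatible services accept path-style addressing.
        endpoint = site["s3_endpoint_url"].rstrip("/")
        return f"{endpoint}/{site['s3_bucket_name']}/{file_path}"
    if site["storage_provider"] == "s3":
        # Virtual-hosted style, the AWS default.
        return (f"https://{site['s3_bucket_name']}.s3."
                f"{site['s3_bucket_region']}.amazonaws.com/{file_path}")
    # 'bunny' falls through to the existing Bunny.net logic (not shown).
    raise ValueError(f"Unhandled provider in this sketch: {site['storage_provider']}")
```

Keeping the dispatch on `storage_provider` in one place means the unit tests enumerate exactly one scenario per (provider, custom-domain) combination.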

Story 6.5: S3-Compatible Services Support

Estimated Effort: 5 story points

As a user, I want to deploy to S3-compatible services (DigitalOcean Spaces, Backblaze B2, Linode Object Storage), so that I can use cost-effective alternatives to AWS.

Acceptance Criteria:

  • Extend S3StorageClient to support S3-compatible endpoints
  • Support provider-specific configurations:
    • DigitalOcean Spaces: Custom endpoint (e.g., https://nyc3.digitaloceanspaces.com)
    • Backblaze B2: Custom endpoint and authentication
    • Linode Object Storage: Custom endpoint
  • Store s3_endpoint_url per site for custom endpoints
  • Handle provider-specific authentication differences
  • Support provider-specific URL generation
  • Configuration examples in documentation
  • Unit tests for each supported service

Supported Services (Initial):

  • DigitalOcean Spaces
  • Backblaze B2
  • Linode Object Storage
  • (Others can be added as needed)

Configuration:

  • Per-service credentials in .env or per-site in database
  • Endpoint URLs stored per-site in s3_endpoint_url field
  • Provider type stored in storage_provider ('s3_compatible')

Technical Notes:

  • Most S3-compatible services work with boto3 using custom endpoints
  • Some may require minor authentication adjustments
  • URL generation may differ (e.g., DigitalOcean uses different domain structure)
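Since boto3 reaches S3-compatible services purely through `endpoint_url`, the per-site connection settings reduce to a keyword-argument builder. A sketch (credential field names are assumptions; the real client would be created with `boto3.client(**make_client_kwargs(site))`):

```python
# Sketch: build boto3.client(...) keyword arguments from per-site config,
# without importing boto3 itself.
def make_client_kwargs(site: dict) -> dict:
    kwargs = {
        "service_name": "s3",
        "aws_access_key_id": site["access_key_id"],
        "aws_secret_access_key": site["secret_access_key"],
    }
    if site["storage_provider"] == "s3_compatible":
        # Spaces, B2, and Linode are all addressed via a custom endpoint.
        kwargs["endpoint_url"] = site["s3_endpoint_url"]
    if site.get("s3_bucket_region"):
        kwargs["region_name"] = site["s3_bucket_region"]
    return kwargs
```

This keeps `S3StorageClient` identical for AWS and S3-compatible targets: the only difference between providers is which kwargs reach `boto3.client`.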

Technical Considerations

Architecture Changes

  1. Interface/Protocol Design:

    from typing import List, Protocol

    class StorageClient(Protocol):
        def upload_file(self, file_path: str, content: str,
                        content_type: str) -> 'UploadResult': ...
        def file_exists(self, file_path: str) -> bool: ...
        def list_files(self, prefix: str = '') -> List[str]: ...
    
  2. Factory Pattern:

    def create_storage_client(site: SiteDeployment) -> StorageClient:
        if site.storage_provider == 'bunny':
            return BunnyStorageClient()
        elif site.storage_provider in ('s3', 's3_compatible'):
            return S3StorageClient(site)
        else:
            raise ValueError(f"Unknown provider: {site.storage_provider}")
    
  3. Dependency Injection:

    • DeploymentService receives StorageClient from factory
    • No hardcoded provider dependencies

Credential Management

Option A: Environment Variables (Recommended for AWS)

  • Global AWS credentials in .env
  • Simple, secure, follows AWS best practices
  • Works well for single-account deployments

Option B: Per-Site Credentials

  • Store credentials in database (encrypted)
  • Required for multi-account or S3-compatible services
  • More complex but more flexible

Decision Needed: Which approach for initial implementation?

URL Generation Strategy

  • Bunny.net: uses CDN hostname (custom or bunny.net domain)
  • AWS S3: uses bucket name + region, or custom domain
  • S3-Compatible: uses service-specific endpoint or custom domain

All providers should support custom domain mapping for consistent URLs.

Backward Compatibility

  • All existing Bunny.net sites continue to work
  • Default storage_provider='bunny' for existing records
  • No breaking changes to existing APIs
  • Migration is optional (sites can stay on Bunny.net)

Testing Strategy

  • Unit tests with mocked boto3/requests
  • Integration tests with test S3 buckets (optional)
  • Backward compatibility tests for Bunny.net
  • URL generation tests for all providers

Dependencies

  • boto3 library for AWS S3 operations
  • Existing deployment infrastructure (Epic 4)
  • Database migration tools

Open Questions

  1. Credential Storage: Per-site in DB vs. global env vars? (Recommendation: Start with env vars, add per-site later if needed)

  2. S3-Compatible Priority: Which services to support first? (Recommendation: DigitalOcean Spaces, then Backblaze B2)

  3. Custom Domains: How are custom domains configured? Manual setup or automated? (Recommendation: Manual for now, document process)

  4. Bucket Provisioning: Should we automate S3 bucket creation, or require manual setup? (Recommendation: Manual for now, similar to current Bunny.net approach)

  5. Public Access: How to ensure buckets are publicly readable? (Recommendation: Document requirements, validate in tests)

  6. Migration Path: Should we provide tools to migrate existing Bunny.net sites to S3? (Recommendation: Defer to future story)

Success Metrics

  • Deploy content to AWS S3 successfully
  • Deploy content to at least one S3-compatible service
  • All existing Bunny.net deployments continue working
  • URL generation works correctly for all providers
  • Zero breaking changes to existing functionality