Big-Link-Man/docs/prd/epic-6-multi-cloud-storage.md

Epic 6: Multi-Cloud Storage Support

Epic Goal

To extend the deployment system to support AWS S3 and S3-compatible cloud storage providers (DigitalOcean Spaces, Backblaze B2, Linode Object Storage, etc.), providing flexibility beyond Bunny.net while maintaining backward compatibility with existing deployments.

Rationale

Currently, the system only supports Bunny.net storage, creating vendor lock-in and limiting deployment options. Many users have existing infrastructure on AWS S3 or prefer S3-compatible services for cost, performance, or compliance reasons. This epic will:

  • Increase Flexibility: Support multiple cloud storage providers
  • Reduce Vendor Lock-in: Enable migration between providers
  • Leverage Existing Infrastructure: Use existing S3 buckets and credentials
  • Maintain Compatibility: Existing Bunny.net deployments continue to work unchanged

Status

  • Story 6.1: 🔄 PLANNING (Storage Provider Abstraction)
  • Story 6.2: 🔄 PLANNING (S3 Client Implementation)
  • Story 6.3: 🔄 PLANNING (Database Schema Updates)
  • Story 6.4: 🔄 PLANNING (URL Generation for S3)
  • Story 6.5: 🔄 PLANNING (S3-Compatible Services Support)
  • Story 6.6: 🔄 PLANNING (Bucket Provisioning Script)

Stories

Story 6.1: Storage Provider Abstraction Layer

Estimated Effort: 3 story points

As a developer, I want a simple way to support multiple storage providers without cluttering DeploymentService with if/elif chains, so that adding new providers (eventually 8+) is straightforward.

Acceptance Criteria:

  • Create a simple factory function create_storage_client(site: SiteDeployment) that returns the appropriate client:
    • 'bunny' → BunnyStorageClient()
    • 's3' → S3StorageClient()
    • 's3_compatible' → S3StorageClient() (with custom endpoint)
    • Future providers added here
  • Refactor BunnyStorageClient.upload_file() to accept site: SiteDeployment parameter:
    • Change from: upload_file(zone_name, zone_password, zone_region, file_path, content)
    • Change to: upload_file(site: SiteDeployment, file_path: str, content: str)
    • Client extracts bunny-specific fields from site internally
  • Update DeploymentService to use factory and unified interface:
    • Remove hardcoded BunnyStorageClient from __init__
    • In deploy_article() and deploy_boilerplate_page(): create client per site
    • Call: client.upload_file(site, file_path, content) (same signature for all providers)
  • Optional: Add StorageClient Protocol for type hints (helps with 8+ providers)
  • All existing Bunny.net deployments continue to work without changes
  • Unit tests verify factory returns correct clients

Technical Notes:

  • Factory function is simple if/elif chain (one place to maintain)
  • All clients use same method signature: upload_file(site, file_path, content)
  • Each client extracts provider-specific fields from site object internally
  • Protocol is optional but recommended for type safety with many providers
  • Factory pattern keeps DeploymentService clean (no provider-specific logic)
  • Backward compatibility: default provider is "bunny" if not specified

Story 6.2: AWS S3 Client Implementation

Estimated Effort: 8 story points

As a user, I want to deploy content to AWS S3 buckets, so that I can use my existing AWS infrastructure.

Acceptance Criteria:

  • Create S3StorageClient implementing StorageClient interface
  • Use boto3 library for AWS S3 operations
  • Support standard AWS S3 regions
  • Authentication via AWS credentials from environment variables
  • Automatically configure bucket for public READ access only (not write):
    • Apply public-read ACL or bucket policy on first upload
    • Ensure bucket allows public read access (disable block public access settings)
    • Verify public read access is enabled before deployment
    • Security: Never enable public write access - only read permissions
  • Upload files with correct content-type headers
  • Generate public URLs from bucket name and region
  • Support custom domain mapping (if configured)
  • Error handling for common S3 errors (403, 404, bucket not found, etc.)
  • Retry logic with exponential backoff (consistent with BunnyStorageClient)
  • Unit tests with mocked boto3 calls

Configuration:

  • AWS credentials from environment variables (global):
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_REGION (default region, can be overridden per-site)
  • Per-site configuration stored in database:
    • s3_bucket_name: S3 bucket name
    • s3_bucket_region: AWS region (optional, uses default if not set)
    • s3_custom_domain: Optional custom domain for URL generation (manual setup)

URL Generation:

  • Default: https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}
  • With custom domain: https://{custom_domain}/{file_path}
  • Support for path-style URLs if needed: https://s3.{region}.amazonaws.com/{bucket_name}/{file_path}

Technical Notes:

  • boto3 session management (reuse sessions for performance)
  • Content-type detection (text/html for HTML files)
  • Automatic public read access configuration (read-only, never write):
    • Check and configure bucket policy for public read access only
    • Disable "Block Public Access" settings for read access
    • Apply public-read ACL to uploaded objects (not public-write)
    • Validate public read access before deployment
    • Security: Uploads require authenticated credentials, only reads are public
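In boto3 terms, the upload flow reduces to one put_object call. A minimal sketch of the argument assembly (the SiteDeployment stand-in and helper name here are illustrative, not the real model; retry logic and error mapping are omitted):

```python
import mimetypes
from dataclasses import dataclass

@dataclass
class SiteDeployment:
    # Minimal stand-in for the real model; only the fields used here.
    s3_bucket_name: str
    s3_bucket_region: str

def build_put_object_args(site: SiteDeployment, file_path: str, content: str) -> dict:
    """Build kwargs for boto3 S3 put_object: correct content type plus a
    public-read ACL (reads are public, writes stay authenticated)."""
    # Detect content type from the extension; default to text/html since
    # deployed articles are HTML pages.
    content_type = mimetypes.guess_type(file_path)[0] or "text/html"
    return {
        "Bucket": site.s3_bucket_name,
        "Key": file_path.lstrip("/"),
        "Body": content.encode("utf-8"),
        "ContentType": content_type,
        "ACL": "public-read",
    }

# The client would then reuse one boto3 session per deployment:
#   session = boto3.session.Session(region_name=site.s3_bucket_region)
#   session.client("s3").put_object(**build_put_object_args(site, path, html))
```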

Story 6.3: Database Schema Updates for Multi-Cloud

Estimated Effort: 3 story points

As a developer, I want to store provider-specific configuration in the database, so that each site can use its preferred storage provider.

Acceptance Criteria:

  • Add storage_provider field to site_deployments table:
    • Type: String(20), Not Null, Default: 'bunny'
    • Values: 'bunny', 's3', 's3_compatible'
    • Indexed for query performance
  • Add S3-specific fields (nullable, only used when provider is 's3' or 's3_compatible'):
    • s3_bucket_name: String(255), Nullable
    • s3_bucket_region: String(50), Nullable
    • s3_custom_domain: String(255), Nullable
    • s3_endpoint_url: String(500), Nullable (for S3-compatible services)
  • Create migration script to:
    • Add new fields with appropriate defaults
    • Set storage_provider='bunny' for all existing records
    • Preserve all existing Bunny.net fields
  • Update SiteDeployment model with new fields
  • Update repository methods to handle new fields
  • Backward compatibility: existing queries continue to work

Migration Strategy:

  • Existing sites default to 'bunny' provider
  • No data loss or breaking changes
  • New fields are nullable to support gradual migration
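The additive nature of the migration can be illustrated with plain SQL (shown here against an in-memory SQLite table for brevity; the real migration would use the project's migration tooling against the actual site_deployments schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_deployments (id INTEGER PRIMARY KEY, storage_zone_name TEXT)")
conn.execute("INSERT INTO site_deployments (storage_zone_name) VALUES ('zone-a')")

# Additive migration: the new columns are nullable except storage_provider,
# which defaults to 'bunny' so every existing row stays a Bunny.net site.
conn.execute("ALTER TABLE site_deployments ADD COLUMN storage_provider TEXT NOT NULL DEFAULT 'bunny'")
conn.execute("ALTER TABLE site_deployments ADD COLUMN s3_bucket_name TEXT")
conn.execute("ALTER TABLE site_deployments ADD COLUMN s3_bucket_region TEXT")
conn.execute("ALTER TABLE site_deployments ADD COLUMN s3_custom_domain TEXT")
conn.execute("ALTER TABLE site_deployments ADD COLUMN s3_endpoint_url TEXT")
conn.execute("CREATE INDEX ix_site_deployments_storage_provider ON site_deployments (storage_provider)")

# The pre-existing record now reads ('bunny', None): no data loss,
# no breaking change to existing queries.
row = conn.execute("SELECT storage_provider, s3_bucket_name FROM site_deployments").fetchone()
```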

Story 6.4: URL Generation for S3 Providers

Estimated Effort: 3 story points

As a user, I want public URLs for S3-deployed content to be generated correctly, so that articles are accessible via the expected URLs.

Acceptance Criteria:

  • Update generate_public_url() in url_generator.py to handle S3 providers
  • Support multiple URL formats:
    • Virtual-hosted style: https://bucket.s3.region.amazonaws.com/file.html
    • Path-style: https://s3.region.amazonaws.com/bucket/file.html (if needed)
    • Custom domain: https://custom-domain.com/file.html
  • URL generation logic based on storage_provider field
  • Maintain existing behavior for Bunny.net (no changes)
  • Handle S3-compatible services with custom endpoints
  • Unit tests for all URL generation scenarios

Technical Notes:

  • Virtual-hosted style is default for AWS S3
  • Custom domain takes precedence if configured
  • S3-compatible services may need path-style URLs depending on endpoint
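Putting the formats and precedence rules together, the dispatch in generate_public_url() might look like this (a sketch over the Story 6.3 fields; the Bunny.net branch stays in the existing url_generator.py code path and is omitted here):

```python
def generate_public_url(site, file_path: str) -> str:
    """Build the public URL for a deployed file from the site's
    provider and domain fields (sketch; Bunny.net handled elsewhere)."""
    path = file_path.lstrip("/")
    if site.s3_custom_domain:
        # Custom domain takes precedence for every provider.
        return f"https://{site.s3_custom_domain}/{path}"
    if site.storage_provider == "s3":
        # Virtual-hosted style, the AWS S3 default.
        return f"https://{site.s3_bucket_name}.s3.{site.s3_bucket_region}.amazonaws.com/{path}"
    if site.storage_provider == "s3_compatible":
        # Path-style against the stored endpoint; some services need this.
        endpoint = site.s3_endpoint_url.rstrip("/")
        return f"{endpoint}/{site.s3_bucket_name}/{path}"
    raise ValueError(f"Unknown provider: {site.storage_provider}")
```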

Story 6.5: S3-Compatible Services Support

Estimated Effort: 5 story points

As a user, I want to deploy to S3-compatible services (Linode Object Storage, DreamHost Object Storage, DigitalOcean Spaces), so that I can use S3-compatible storage providers the same way I use Bunny.net.

Acceptance Criteria:

  • Extend S3StorageClient to support S3-compatible endpoints
  • Support provider-specific configurations:
    • Linode Object Storage: Custom endpoint
    • DreamHost Object Storage: Custom endpoint
    • DigitalOcean Spaces: Custom endpoint (e.g., https://nyc3.digitaloceanspaces.com)
  • Store s3_endpoint_url per site for custom endpoints
  • Handle provider-specific authentication differences
  • Support provider-specific URL generation
  • Configuration examples in documentation
  • Unit tests for each supported service

Supported Services:

  • AWS S3 (standard)
  • Linode Object Storage
  • DreamHost Object Storage
  • DigitalOcean Spaces
  • Backblaze B2
  • Cloudflare R2
  • (Other S3-compatible services can be added as needed)

Configuration:

  • Per-service credentials in .env (global environment variables):
    • LINODE_ACCESS_KEY / LINODE_SECRET_KEY (for Linode)
    • DREAMHOST_ACCESS_KEY / DREAMHOST_SECRET_KEY (for DreamHost)
    • DO_SPACES_ACCESS_KEY / DO_SPACES_SECRET_KEY (for DigitalOcean)
  • Endpoint URLs stored per-site in s3_endpoint_url field
  • Provider type stored in storage_provider ('s3_compatible')
  • Automatic public access configuration (same as AWS S3)

Technical Notes:

  • Most S3-compatible services work with boto3 using custom endpoints
  • Some may require minor authentication adjustments
  • URL generation may differ (e.g., DigitalOcean uses different domain structure)
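With boto3, supporting a compatible service is mostly a matter of passing endpoint_url alongside the standard credentials. A sketch of the client construction (the helper name is illustrative; it returns the kwargs so the assembly is testable without a network call):

```python
def s3_client_kwargs(site, access_key: str, secret_key: str) -> dict:
    """Kwargs for boto3.client('s3', ...): S3-compatible services only
    need an endpoint_url on top of the usual credentials and region."""
    kwargs = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    if site.s3_bucket_region:
        kwargs["region_name"] = site.s3_bucket_region
    if site.s3_endpoint_url:
        # e.g. https://nyc3.digitaloceanspaces.com; plain AWS S3 leaves
        # this unset and boto3 uses the standard AWS endpoint.
        kwargs["endpoint_url"] = site.s3_endpoint_url
    return kwargs

# client = boto3.client("s3", **s3_client_kwargs(site, key, secret))
```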

Story 6.6: S3 Bucket Provisioning Script

Estimated Effort: 3 story points

As a user, I want a script to automatically create and configure S3 buckets with proper public access settings, so that I can quickly set up new storage targets without manual AWS console work.

Acceptance Criteria:

  • Create CLI command: provision-s3-bucket --name <bucket> --region <region> [--provider <s3|linode|dreamhost|do>]
  • Automatically create bucket if it doesn't exist
  • Configure bucket for public read access only (not write):
    • Apply bucket policy allowing public read (GET requests only)
    • Disable "Block Public Access" settings for read access
    • Set appropriate CORS headers if needed
    • Security: Never enable public write access - uploads require authentication
  • Support multiple providers:
    • AWS S3 (standard regions)
    • Linode Object Storage
    • DreamHost Object Storage
    • DigitalOcean Spaces
  • Validate bucket configuration after creation
  • Option to link bucket to existing site deployment
  • Clear error messages for common issues (bucket name conflicts, permissions, etc.)
  • Documentation with examples for each provider

Usage Examples:

# Create AWS S3 bucket
provision-s3-bucket --name my-site-bucket --region us-east-1

# Create Linode bucket
provision-s3-bucket --name my-site-bucket --region us-east-1 --provider linode

# Create and link to site
provision-s3-bucket --name my-site-bucket --region us-east-1 --site-id 5

Technical Notes:

  • Uses boto3 for all providers (with custom endpoints for S3-compatible)
  • Bucket naming validation (AWS rules apply)
  • Idempotent: safe to run multiple times
  • Optional: Can be integrated into provision-site command later
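The read-only policy the script applies could be built like this (a sketch; the provisioning calls in the trailing comment are indicative boto3 usage, not the final script):

```python
import json

def public_read_policy(bucket_name: str) -> str:
    """Bucket policy granting anonymous s3:GetObject only; no write
    actions are granted, so uploads still require credentials."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadOnly",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
        }],
    })

# Provisioning then runs roughly (idempotent -- create_bucket and
# put_bucket_policy can be re-applied safely):
#   s3.create_bucket(Bucket=name)   # plus CreateBucketConfiguration outside us-east-1
#   s3.put_public_access_block(Bucket=name, PublicAccessBlockConfiguration={...})
#   s3.put_bucket_policy(Bucket=name, Policy=public_read_policy(name))
```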

Technical Considerations

Architecture Changes

  1. Unified Method Signature:

    # All storage clients use the same signature
    class BunnyStorageClient:
        def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult:
            # Extract bunny-specific fields from site
            zone_name = site.storage_zone_name
            zone_password = site.storage_zone_password
            # ... do upload
    
    class S3StorageClient:
        def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult:
            # Extract S3-specific fields from site
            bucket_name = site.s3_bucket_name
            # ... do upload
    
  2. Simple Factory Function:

    def create_storage_client(site: SiteDeployment):
        """Create appropriate storage client based on site provider"""
        if site.storage_provider == 'bunny':
            return BunnyStorageClient()
        elif site.storage_provider == 's3':
            return S3StorageClient()
        elif site.storage_provider == 's3_compatible':
            return S3StorageClient()  # Same client, uses site.s3_endpoint_url
        # Future: elif site.storage_provider == 'cloudflare': ...
        else:
            raise ValueError(f"Unknown provider: {site.storage_provider}")
    
  3. Clean DeploymentService:

    # In deploy_article():
    client = create_storage_client(site)  # One line, works for all providers
    client.upload_file(site, file_path, content)  # Same call for all
    
  4. Optional Protocol (recommended for type safety with 8+ providers):

    from typing import Protocol
    
    class StorageClient(Protocol):
        def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult: ...
    

Credential Management

Decision: Global Environment Variables

  • All credentials stored in .env file (global)
  • AWS: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
  • Linode: LINODE_ACCESS_KEY, LINODE_SECRET_KEY
  • DreamHost: DREAMHOST_ACCESS_KEY, DREAMHOST_SECRET_KEY
  • DigitalOcean: DO_SPACES_ACCESS_KEY, DO_SPACES_SECRET_KEY
  • Simple, secure, follows cloud provider best practices
  • Works well for single-account deployments
  • Per-site credentials can be added later if needed for multi-account scenarios
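A hypothetical lookup table makes the mapping explicit (the service keys here are illustrative and not stored in the schema; the env var names are the ones listed above):

```python
import os

# Hypothetical helper: maps each service to its global env-var pair.
_CREDENTIAL_VARS = {
    "aws": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "linode": ("LINODE_ACCESS_KEY", "LINODE_SECRET_KEY"),
    "dreamhost": ("DREAMHOST_ACCESS_KEY", "DREAMHOST_SECRET_KEY"),
    "digitalocean": ("DO_SPACES_ACCESS_KEY", "DO_SPACES_SECRET_KEY"),
}

def resolve_credentials(service: str) -> tuple[str, str]:
    """Look up the (access_key, secret_key) pair for a service,
    failing loudly when a variable is missing from the environment."""
    key_var, secret_var = _CREDENTIAL_VARS[service]
    try:
        return os.environ[key_var], os.environ[secret_var]
    except KeyError as missing:
        raise RuntimeError(f"Missing credential env var: {missing}") from None
```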

URL Generation Strategy

  • Bunny.net: Uses CDN hostname (custom or bunny.net domain)
  • AWS S3: Uses bucket name + region or custom domain (manual setup)
  • S3-Compatible: Uses service-specific endpoint or custom domain (manual setup)

Custom domain mapping is supported but requires manual configuration (documented, not automated).

Backward Compatibility

  • All existing Bunny.net sites continue to work
  • Default storage_provider='bunny' for existing records
  • No breaking changes to existing APIs
  • No migration tools provided (sites can stay on Bunny.net or be manually reconfigured)

Testing Strategy

  • Unit tests with mocked boto3/requests
  • Integration tests with test S3 buckets (optional)
  • Backward compatibility tests for Bunny.net
  • URL generation tests for all providers

Dependencies

  • boto3 library for AWS S3 operations
  • Existing deployment infrastructure (Epic 4)
  • Database migration tools

Decisions Made

  1. Credential Storage: Global environment variables (Option A)

    • All credentials in .env file
    • Simple, secure, follows cloud provider best practices
  2. S3-Compatible Services: Support Linode, DreamHost, and DigitalOcean

    • All services supported equally - no priority/decision logic in this epic
    • Provider selection happens elsewhere in the codebase
    • This epic just enables S3-compatible services to work the same as Bunny.net
  3. Custom Domains: Manual setup (deferred automation)

    • Custom domains require manual configuration
    • Documented process, no automation in this epic
  4. Bucket Provisioning: Manual with optional script (Story 6.6)

    • Primary: Manual bucket creation
    • Optional: provision-s3-bucket CLI script for automated setup
  5. Public Access: Automatic configuration (read-only)

    • System automatically configures buckets for public READ access only
    • Applies bucket policies for read access, disables block public access, sets public-read ACLs
    • Security: Never enables public write access - all uploads require authenticated credentials
  6. Migration Path: No migration tools

    • No automated migration from Bunny.net to S3
    • Sites can be manually reconfigured if needed

Success Metrics

  • Deploy content to AWS S3 successfully
  • Deploy content to S3-compatible services (Linode, DreamHost, DigitalOcean) successfully
  • All existing Bunny.net deployments continue working
  • URL generation works correctly for all providers
  • Buckets automatically configured for public read access (not write)
  • Zero breaking changes to existing functionality
  • Bucket provisioning script works for all supported providers