diff --git a/docs/prd/epic-6-multi-cloud-storage.md b/docs/prd/epic-6-multi-cloud-storage.md new file mode 100644 index 0000000..6543117 --- /dev/null +++ b/docs/prd/epic-6-multi-cloud-storage.md @@ -0,0 +1,265 @@ +# Epic 6: Multi-Cloud Storage Support + +## Epic Goal +To extend the deployment system to support AWS S3 and S3-compatible cloud storage providers (DigitalOcean Spaces, Backblaze B2, Linode Object Storage, etc.), providing flexibility beyond Bunny.net while maintaining backward compatibility with existing deployments. + +## Rationale +Currently, the system only supports Bunny.net storage, creating vendor lock-in and limiting deployment options. Many users have existing infrastructure on AWS S3 or prefer S3-compatible services for cost, performance, or compliance reasons. This epic will: + +- **Increase Flexibility**: Support multiple cloud storage providers +- **Reduce Vendor Lock-in**: Enable migration between providers +- **Leverage Existing Infrastructure**: Use existing S3 buckets and credentials +- **Maintain Compatibility**: Existing Bunny.net deployments continue to work unchanged + +## Status +- **Story 6.1**: 🔄 PLANNING (Storage Provider Abstraction) +- **Story 6.2**: 🔄 PLANNING (S3 Client Implementation) +- **Story 6.3**: 🔄 PLANNING (Database Schema Updates) +- **Story 6.4**: 🔄 PLANNING (URL Generation for S3) +- **Story 6.5**: 🔄 PLANNING (S3-Compatible Services Support) + +## Stories + +### Story 6.1: Storage Provider Abstraction Layer +**Estimated Effort**: 5 story points + +**As a developer**, I want a unified storage interface that abstracts provider-specific details, so that the deployment service can work with any storage provider without code changes. 
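A minimal sketch of this interface using `typing.Protocol`, matching the method names in the acceptance criteria below (the fields of `UploadResult` are an assumption, since this epic does not define them):

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, runtime_checkable


@dataclass
class UploadResult:
    # Hypothetical shape; the real UploadResult is whatever
    # BunnyStorageClient already returns today.
    success: bool
    file_path: str
    error: Optional[str] = None


@runtime_checkable
class StorageClient(Protocol):
    """Provider-agnostic storage interface (Story 6.1 sketch)."""

    def upload_file(self, file_path: str, content: str, content_type: str) -> UploadResult: ...

    def file_exists(self, file_path: str) -> bool: ...

    def list_files(self, prefix: str = "") -> List[str]: ...
```

Because the protocol is `runtime_checkable`, unit tests can assert interface compliance with `isinstance(client, StorageClient)` without inheriting from a base class.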
+ +**Acceptance Criteria**: +* Create a `StorageClient` protocol/interface with common methods: + - `upload_file(file_path: str, content: str, content_type: str) -> UploadResult` + - `file_exists(file_path: str) -> bool` + - `list_files(prefix: str = '') -> List[str]` +* Refactor `BunnyStorageClient` to implement the interface +* Create a `StorageClientFactory` that returns the appropriate client based on provider type +* Update `DeploymentService` to use the factory instead of hardcoding `BunnyStorageClient` +* All existing Bunny.net deployments continue to work without changes +* Unit tests verify interface compliance + +**Technical Notes**: +* Use Python `Protocol` (typing) or ABC for interface definition +* Factory pattern: `create_storage_client(site: SiteDeployment) -> StorageClient` +* Maintain backward compatibility: default provider is "bunny" if not specified + +--- + +### Story 6.2: AWS S3 Client Implementation +**Estimated Effort**: 8 story points + +**As a user**, I want to deploy content to AWS S3 buckets, so that I can use my existing AWS infrastructure. + +**Acceptance Criteria**: +* Create `S3StorageClient` implementing `StorageClient` interface +* Use boto3 library for AWS S3 operations +* Support standard AWS S3 regions +* Authentication via AWS credentials (access key ID, secret access key) +* Handle bucket permissions (public read access required) +* Upload files with correct content-type headers +* Generate public URLs from bucket name and region +* Support custom domain mapping (if configured) +* Error handling for common S3 errors (403, 404, bucket not found, etc.) 
+* Retry logic with exponential backoff (consistent with BunnyStorageClient) +* Unit tests with mocked boto3 calls + +**Configuration**: +* AWS credentials from environment variables: + - `AWS_ACCESS_KEY_ID` + - `AWS_SECRET_ACCESS_KEY` + - `AWS_REGION` (default region, can be overridden per-site) +* Per-site configuration stored in database: + - `bucket_name`: S3 bucket name + - `bucket_region`: AWS region (optional, uses default if not set) + - `custom_domain`: Optional custom domain for URL generation + +**URL Generation**: +* Default: `https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}` +* With custom domain: `https://{custom_domain}/{file_path}` +* Support for path-style URLs if needed: `https://s3.{region}.amazonaws.com/{bucket_name}/{file_path}` + +**Technical Notes**: +* boto3 session management (reuse sessions for performance) +* Content-type detection (text/html for HTML files) +* Public read ACL or bucket policy required for public URLs + +--- + +### Story 6.3: Database Schema Updates for Multi-Cloud +**Estimated Effort**: 3 story points + +**As a developer**, I want to store provider-specific configuration in the database, so that each site can use its preferred storage provider. 
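The per-site configuration this story persists could be modeled as below. This is a plain-dataclass sketch for illustration only; the real change extends the existing `SiteDeployment` ORM model plus a migration script, and the field names and defaults mirror this story's acceptance criteria:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageConfig:
    """Sketch of the new site_deployments columns (Story 6.3)."""

    storage_provider: str = "bunny"  # 'bunny' | 's3' | 's3_compatible'
    s3_bucket_name: Optional[str] = None
    s3_bucket_region: Optional[str] = None   # falls back to AWS_REGION
    s3_custom_domain: Optional[str] = None
    s3_endpoint_url: Optional[str] = None    # S3-compatible services only
```

The `"bunny"` default is what makes the migration backward compatible: existing rows pick it up without any data backfill beyond setting the column.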
+ +**Acceptance Criteria**: +* Add `storage_provider` field to `site_deployments` table: + - Type: String(20), Not Null, Default: 'bunny' + - Values: 'bunny', 's3', 's3_compatible' + - Indexed for query performance +* Add S3-specific fields (nullable, only used when provider is 's3' or 's3_compatible'): + - `s3_bucket_name`: String(255), Nullable + - `s3_bucket_region`: String(50), Nullable + - `s3_custom_domain`: String(255), Nullable + - `s3_endpoint_url`: String(500), Nullable (for S3-compatible services) +* Create migration script to: + - Add new fields with appropriate defaults + - Set `storage_provider='bunny'` for all existing records + - Preserve all existing bunny.net fields +* Update `SiteDeployment` model with new fields +* Update repository methods to handle new fields +* Backward compatibility: existing queries continue to work + +**Migration Strategy**: +* Existing sites default to 'bunny' provider +* No data loss or breaking changes +* New fields are nullable to support gradual migration + +--- + +### Story 6.4: URL Generation for S3 Providers +**Estimated Effort**: 3 story points + +**As a user**, I want public URLs for S3-deployed content to be generated correctly, so that articles are accessible via the expected URLs. 
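A hedged sketch of the S3 branches of this logic (the existing Bunny.net path in `url_generator.py` is unchanged and not shown; field names assume the Story 6.3 schema):

```python
def generate_s3_public_url(site, file_path: str) -> str:
    """URL generation sketch for S3 providers (Story 6.4)."""
    # Custom domain takes precedence for every provider.
    if site.s3_custom_domain:
        return f"https://{site.s3_custom_domain}/{file_path}"
    if site.storage_provider == "s3":
        # Virtual-hosted style, the AWS default.
        return (
            f"https://{site.s3_bucket_name}.s3."
            f"{site.s3_bucket_region}.amazonaws.com/{file_path}"
        )
    if site.storage_provider == "s3_compatible":
        # Path-style against the stored endpoint; some services may
        # prefer virtual-hosted URLs instead, depending on the endpoint.
        return f"{site.s3_endpoint_url.rstrip('/')}/{site.s3_bucket_name}/{file_path}"
    raise ValueError(f"Unsupported provider: {site.storage_provider}")
```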
+ +**Acceptance Criteria**: +* Update `generate_public_url()` in `url_generator.py` to handle S3 providers +* Support multiple URL formats: + - Virtual-hosted style: `https://bucket.s3.region.amazonaws.com/file.html` + - Path-style: `https://s3.region.amazonaws.com/bucket/file.html` (if needed) + - Custom domain: `https://custom-domain.com/file.html` +* URL generation logic based on `storage_provider` field +* Maintain existing behavior for Bunny.net (no changes) +* Handle S3-compatible services with custom endpoints +* Unit tests for all URL generation scenarios + +**Technical Notes**: +* Virtual-hosted style is default for AWS S3 +* Custom domain takes precedence if configured +* S3-compatible services may need path-style URLs depending on endpoint + +--- + +### Story 6.5: S3-Compatible Services Support +**Estimated Effort**: 5 story points + +**As a user**, I want to deploy to S3-compatible services (DigitalOcean Spaces, Backblaze B2, Linode Object Storage), so that I can use cost-effective alternatives to AWS. 
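In practice, most S3-compatible services only require overriding boto3's `endpoint_url` when constructing the client. A sketch of how the client configuration might branch, without importing boto3 itself (field names follow Story 6.3 and the env-var names follow Story 6.2; all are assumptions until those stories land):

```python
import os


def s3_client_kwargs(site) -> dict:
    """Build kwargs for boto3.client("s3", **kwargs) -- sketch only."""
    kwargs = {
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        "region_name": site.s3_bucket_region
        or os.environ.get("AWS_REGION", "us-east-1"),
    }
    # The only change an S3-compatible service usually needs:
    if site.storage_provider == "s3_compatible" and site.s3_endpoint_url:
        # e.g. "https://nyc3.digitaloceanspaces.com"
        kwargs["endpoint_url"] = site.s3_endpoint_url
    return kwargs
```

Keeping endpoint selection in one helper means provider quirks (Backblaze B2 authentication differences, path-style requirements) stay localized instead of leaking into `S3StorageClient`.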
+ +**Acceptance Criteria**: +* Extend `S3StorageClient` to support S3-compatible endpoints +* Support provider-specific configurations: + - **DigitalOcean Spaces**: Custom endpoint (e.g., `https://nyc3.digitaloceanspaces.com`) + - **Backblaze B2**: Custom endpoint and authentication + - **Linode Object Storage**: Custom endpoint +* Store `s3_endpoint_url` per site for custom endpoints +* Handle provider-specific authentication differences +* Support provider-specific URL generation +* Configuration examples in documentation +* Unit tests for each supported service + +**Supported Services** (Initial): +* DigitalOcean Spaces +* Backblaze B2 +* Linode Object Storage +* (Others can be added as needed) + +**Configuration**: +* Per-service credentials in `.env` or per-site in database +* Endpoint URLs stored per-site in `s3_endpoint_url` field +* Provider type stored in `storage_provider` ('s3_compatible') + +**Technical Notes**: +* Most S3-compatible services work with boto3 using custom endpoints +* Some may require minor authentication adjustments +* URL generation may differ (e.g., DigitalOcean uses different domain structure) + +--- + +## Technical Considerations + +### Architecture Changes + +1. **Interface/Protocol Design**: + ```python + class StorageClient(Protocol): + def upload_file(...) -> UploadResult: ... + def file_exists(...) -> bool: ... + def list_files(...) -> List[str]: ... + ``` + +2. **Factory Pattern**: + ```python + def create_storage_client(site: SiteDeployment) -> StorageClient: + if site.storage_provider == 'bunny': + return BunnyStorageClient() + elif site.storage_provider in ('s3', 's3_compatible'): + return S3StorageClient(site) + else: + raise ValueError(f"Unknown provider: {site.storage_provider}") + ``` + +3. 
**Dependency Injection**: + - `DeploymentService` receives `StorageClient` from factory + - No hardcoded provider dependencies + +### Credential Management + +**Option A: Environment Variables (Recommended for AWS)** +- Global AWS credentials in `.env` +- Simple, secure, follows AWS best practices +- Works well for single-account deployments + +**Option B: Per-Site Credentials** +- Store credentials in database (encrypted) +- Required for multi-account or S3-compatible services +- More complex but more flexible + +**Decision Needed**: Which approach for initial implementation? + +### URL Generation Strategy + +**Bunny.net**: Uses CDN hostname (custom or bunny.net domain) +**AWS S3**: Uses bucket name + region or custom domain +**S3-Compatible**: Uses service-specific endpoint or custom domain + +All providers should support custom domain mapping for consistent URLs. + +### Backward Compatibility + +- All existing Bunny.net sites continue to work +- Default `storage_provider='bunny'` for existing records +- No breaking changes to existing APIs +- Migration is optional (sites can stay on Bunny.net) + +### Testing Strategy + +- Unit tests with mocked boto3/requests +- Integration tests with test S3 buckets (optional) +- Backward compatibility tests for Bunny.net +- URL generation tests for all providers + +## Dependencies + +- **boto3** library for AWS S3 operations +- Existing deployment infrastructure (Epic 4) +- Database migration tools + +## Open Questions + +1. **Credential Storage**: Per-site in DB vs. global env vars? (Recommendation: Start with env vars, add per-site later if needed) + +2. **S3-Compatible Priority**: Which services to support first? (Recommendation: DigitalOcean Spaces, then Backblaze B2) + +3. **Custom Domains**: How are custom domains configured? Manual setup or automated? (Recommendation: Manual for now, document process) + +4. **Bucket Provisioning**: Should we automate S3 bucket creation, or require manual setup? 
(Recommendation: Manual for now, similar to current Bunny.net approach) + +5. **Public Access**: How to ensure buckets are publicly readable? (Recommendation: Document requirements, validate in tests) + +6. **Migration Path**: Should we provide tools to migrate existing Bunny.net sites to S3? (Recommendation: Defer to future story) + +## Success Metrics + +- ✅ Deploy content to AWS S3 successfully +- ✅ Deploy content to at least one S3-compatible service +- ✅ All existing Bunny.net deployments continue working +- ✅ URL generation works correctly for all providers +- ✅ Zero breaking changes to existing functionality + + diff --git a/jobs/test_small.json b/jobs/test_small.json deleted file mode 100644 index d496fe6..0000000 --- a/jobs/test_small.json +++ /dev/null @@ -1,19 +0,0 @@ -{ - "jobs": [ - { - "project_id": 1, - "tiers": { - "tier1": { - "count": 1, - "min_word_count": 500, - "max_word_count": 800, - "min_h2_tags": 2, - "max_h2_tags": 3, - "min_h3_tags": 3, - "max_h3_tags": 6 - } - } - } - ] -} - diff --git a/src/generation/prompts/outline_generation.json b/src/generation/prompts/outline_generation.json index 105cd95..76068b2 100644 --- a/src/generation/prompts/outline_generation.json +++ b/src/generation/prompts/outline_generation.json @@ -1,5 +1,5 @@ { - "system_message": "You are an expert content outliner who creates well-structured, comprehensive article outlines that cover topics thoroughly and logically.", - "user_prompt": "Create an article outline for:\nTitle: {title}\nKeyword: {keyword}\n\nConstraints:\n- Between {min_h2} and {max_h2} H2 headings\n- Between {min_h3} and {max_h3} H3 subheadings total (distributed across H2 sections)\n\nEntities to incorporate: {entities}\nRelated searches to address: {related_searches}\n\nReturn ONLY valid JSON in this exact format:\n{{\"outline\": [{{\"h2\": \"Heading text\", \"h3\": [\"Subheading 1\", \"Subheading 2\"]}}, ...]}}\n\nEnsure the outline meets the minimum heading requirements and includes relevant 
entities and related searches. You can be creative in your headings and subheadings - they just need to be related to the topic {keyword}." + "system_message": "You are an expert content outliner who creates well-structured, comprehensive article outlines that cover topics thoroughly and logically. You are creative and thorough in your headings and subheadings.", + "user_prompt": "Create an article outline for:\nTitle: {title}\nKeyword: {keyword}\n\nConstraints:\n- Between {min_h2} and {max_h2} H2 headings\n- Between {min_h3} and {max_h3} H3 subheadings total (distributed across H2 sections)\n\nEntities to incorporate: {entities}\nRelated searches to address: {related_searches}\n\nReturn ONLY valid JSON in this exact format:\n{{\"outline\": [{{\"h2\": \"Heading text\", \"h3\": [\"Subheading 1\", \"Subheading 2\"]}}, ...]}}\n\nEnsure the outline meets the minimum heading requirements and includes relevant entities and related searches. You can be creative in your headings and subheadings - they just need to be related to the topic {keyword}. Do not make the 'definition of {keyword}' an H2 or H3 heading." }