Add Epic 6: Multi-Cloud Storage Support planning and merge images branch changes

main
PeninsulaInd 2025-12-10 11:37:56 -06:00
parent 62074cd995
commit 7e21482419
20 changed files with 1127 additions and 80 deletions


@ -0,0 +1,89 @@
# Image and Template Issues Analysis
## Problems Identified
### 1. Missing Image CSS in Templates
**Issue**: None of the templates (basic, modern, classic) have CSS for `<img>` tags.
**Impact**: Images render at their intrinsic size and break the layout, especially in the modern template, whose article width is constrained to 850px.
**Solution**: Add responsive image CSS to all templates:
```css
img {
max-width: 100%;
height: auto;
display: block;
margin: 1.5rem auto;
border-radius: 8px;
}
```
### 2. Template Storage Inconsistency
**Issue**: The `template_used` field is set only when `apply_template()` is called, so the database may show incorrect or missing template values if:
- Templates are applied at different times
- Some articles skip template application
- Articles are moved between sites with different templates
- Template application fails silently
**Evidence**: User reports articles showing "basic" when they're actually "modern".
**Solution**:
- Always apply templates before deployment
- Re-apply templates if `template_used` doesn't match site's `template_name`
- Add validation to ensure `template_used` matches site template
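The mismatch check described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `Article` and `Site` are hypothetical stand-ins for the real records, and `needs_template_reapply` is an assumed helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Hypothetical stand-in for the article record."""
    template_used: Optional[str]


@dataclass
class Site:
    """Hypothetical stand-in for the site record."""
    template_name: str


def needs_template_reapply(article: Article, site: Site) -> bool:
    """Return True when the recorded template is missing or differs from the site's."""
    return article.template_used != site.template_name
```

A deployment step could call this before uploading and re-run template application whenever it returns `True`.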
### 3. Images Lost During Interlink Injection
**Issue**: Processing order:
1. Images inserted into `content` → saved
2. Interlinks injected → BeautifulSoup parses/rewrites HTML → saved
3. Template applied → reads `content` → creates `formatted_html`
BeautifulSoup parsing may break image tags or lose them during HTML rewriting.
**Evidence**: User reports images were generated and uploaded (URLs in database) but don't appear in deployed articles.
**Solution Options**:
- **Option A**: Re-insert images after interlink injection (read from `hero_image_url` and `content_images` fields)
- **Option B**: Use more robust HTML parsing that preserves all tags
- **Option C**: Apply template immediately after image insertion, then inject interlinks into `formatted_html` instead of `content`
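Whichever option is chosen, it helps to detect which stored image URLs no longer appear in the HTML after interlink injection. The sketch below uses only the standard library `html.parser` module; `ImgSrcCollector` and `missing_images` are hypothetical names, and the expected URLs are assumed to come from the `hero_image_url` and `content_images` fields.

```python
from html.parser import HTMLParser
from typing import List


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... /> via handle_startendtag
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(content_html: str, expected_urls: List[str]) -> List[str]:
    """Return the expected image URLs that are absent from the HTML."""
    collector = ImgSrcCollector()
    collector.feed(content_html)
    collector.close()
    present = set(collector.sources)
    return [url for url in expected_urls if url not in present]
```

A non-empty result after the interlink step would indicate that images were dropped and need to be re-inserted (Option A).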
### 4. Image Size Not Constrained
**Issue**: Even if images are present, they're not constrained by template CSS, causing layout issues.
**Solution**: Add image CSS (see #1) and ensure images are inserted with proper attributes:
```html
<img src="..." alt="..." style="max-width: 100%; height: auto;" />
```
## Recommended Fixes
### Priority 1: Add Image CSS to All Templates
Add responsive image styling to:
- `src/templating/templates/basic.html`
- `src/templating/templates/modern.html`
- `src/templating/templates/classic.html`
### Priority 2: Fix Image Preservation
Modify `src/interlinking/content_injection.py` to preserve images:
- Use `html.parser` with `preserve_whitespace` or `html5lib` parser
- Or re-insert images after interlink injection using database fields
### Priority 3: Fix Template Tracking
- Add validation in deployment to ensure `template_used` matches site template
- Re-apply templates if mismatch detected
- Add script to backfill/correct `template_used` values
### Priority 4: Improve Image Insertion
- Add `max-width` style attribute when inserting images
- Ensure images are inserted with proper responsive attributes
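A minimal sketch of inserting images with the responsive style attribute, assuming a hypothetical helper `build_img_tag`; the real insertion code in `src/generation/image_injection.py` may be structured differently.

```python
from html import escape


def build_img_tag(src: str, alt: str) -> str:
    """Build an <img> tag with escaped attributes and a responsive inline style."""
    return (
        f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}" '
        'style="max-width: 100%; height: auto;" />'
    )
```

Escaping the attribute values keeps quotes in generated alt text from breaking the tag.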
## Code Locations
- Image insertion: `src/generation/image_injection.py`
- Interlink injection: `src/interlinking/content_injection.py` (lines 53-76)
- Template application: `src/generation/service.py` (lines 409-460)
- Template files: `src/templating/templates/*.html`
- Deployment: `src/deployment/deployment_service.py` (uses `formatted_html`)


@ -41,6 +41,7 @@ Each job object defines a complete content generation batch for a specific proje
| `auto_create_sites` | `boolean` | `false` | Whether to auto-create sites when pool is insufficient (Story 3.1) |
| `create_sites_for_keywords` | `Array<Object>` | `null` | Array of keyword site creation configs (Story 3.1) |
| `tiered_link_count_range` | `Object` | `null` | Configuration for tiered link counts (Story 3.2) |
| `image_theme_prompt` | `string` | `null` | Override image theme prompt for all images in this job (Story 7.1) |
## Tier Configuration
@ -212,6 +213,36 @@ Each tier in the `tiers` object defines content generation parameters for that s
### Implementation Status
**Implemented** - The `models` field is fully functional. Different models can be specified for title, outline, and content generation stages. If a job file contains a `models` configuration and you also use the `--model` CLI flag, the system will warn you that the CLI flag is being ignored in favor of the job config.
## Image Theme Configuration (Story 7.1)
### `image_theme_prompt`
- **Type**: `string` (optional)
- **Purpose**: Override the image theme prompt for all images (hero and content) generated in this job
- **Behavior**:
  - If provided, this string is used directly as the theme prompt for all image generation
  - If not provided, the system checks for a cached theme in the project database
  - If no cached theme exists, a new theme is generated using AI based on the project's keyword, entities, and related searches
- **Format**: A single string describing visual style, color scheme, lighting, environment, and overall aesthetic
- **Note**: This is the prompt sent directly to the image generation API (fal.ai FLUX.1 schnell), not split into system/user messages
### Example
```json
{
  "image_theme_prompt": "Modern industrial workspace, warm amber lighting, deep burgundy accents, professional photography style, clean minimalist aesthetic"
}
```
### Theme Prompt Priority
1. **Job override** (`image_theme_prompt` in job.json) - Highest priority
2. **Database cache** (`Project.image_theme_prompt`) - Used if no override
3. **AI generation** - Generated using `image_theme_generation.json` template if no cache exists
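The priority order above can be sketched as a small resolver. This is an illustration under assumptions: `resolve_theme_prompt` is a hypothetical name, and `generate_theme` stands in for whatever call produces a theme from the `image_theme_generation.json` template.

```python
from typing import Callable, Optional


def resolve_theme_prompt(
    job_override: Optional[str],
    cached_prompt: Optional[str],
    generate_theme: Callable[[], str],
) -> str:
    """Pick the theme prompt: job override, then database cache, then AI generation."""
    if job_override:
        return job_override
    if cached_prompt:
        return cached_prompt
    # Only reached when neither an override nor a cached theme exists
    return generate_theme()
```

Passing the generator as a callable means the AI call is made only when both higher-priority sources are empty.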
### Best Practices
- Use descriptive color schemes to avoid default blue tones
- Include lighting, environment, and style details
- Keep it concise (2-3 sentences recommended)
- Consider the industry/product when choosing colors and aesthetic
## Tiered Link Configuration (Story 3.2)
### `tiered_link_count_range`
@ -270,6 +301,7 @@ Each tier in the `tiers` object defines content generation parameters for that s
"min": 3,
"max": 5
},
"image_theme_prompt": "Modern industrial workspace, warm amber lighting, deep burgundy accents, professional photography style, clean minimalist aesthetic",
"tiers": {
"tier1": {
"count": 10,
@ -305,6 +337,7 @@ Each tier in the `tiers` object defines content generation parameters for that s
- `auto_create_sites` must be a boolean (if specified)
- `create_sites_for_keywords` must be an array of objects with `keyword` and `count` fields (if specified)
- `tiered_link_count_range` must have `min` >= 1 and `max` >= `min` (if specified)
- `image_theme_prompt` must be a non-empty string (if specified)
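A minimal sketch of the `image_theme_prompt` rule, assuming a hypothetical validator name and two assumptions not stated in the schema: an explicit `null` counts as "not specified", and a whitespace-only string counts as empty.

```python
from typing import Any, Mapping


def validate_image_theme_prompt(job: Mapping[str, Any]) -> None:
    """Raise ValueError if image_theme_prompt is specified but not a non-empty string."""
    value = job.get("image_theme_prompt")
    if value is None:
        return  # Assumption: missing or null means the field was not specified
    if not isinstance(value, str) or not value.strip():
        raise ValueError("image_theme_prompt must be a non-empty string")
```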
### Tier Level Validation
- `count` must be a positive integer
@ -362,6 +395,11 @@ uv run python main.py generate-batch --job-file jobs/example.json --username adm
- Integrated with tiered link generation system
- Added validation for link count ranges
### Story 7.1: Image Generation
- Added `image_theme_prompt` for overriding image theme prompts
- Allows manual control over visual style and color schemes
- Overrides database cache and AI generation when specified
## Future Extensions
The schema is designed to be extensible for future features:


@ -17,29 +17,40 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a
 - **Story 6.3**: 🔄 PLANNING (Database Schema Updates)
 - **Story 6.4**: 🔄 PLANNING (URL Generation for S3)
 - **Story 6.5**: 🔄 PLANNING (S3-Compatible Services Support)
+- **Story 6.6**: 🔄 PLANNING (Bucket Provisioning Script)
 ## Stories
 ### Story 6.1: Storage Provider Abstraction Layer
-**Estimated Effort**: 5 story points
+**Estimated Effort**: 3 story points
-**As a developer**, I want a unified storage interface that abstracts provider-specific details, so that the deployment service can work with any storage provider without code changes.
+**As a developer**, I want a simple way to support multiple storage providers without cluttering `DeploymentService` with if/elif chains, so that adding new providers (eventually 8+) is straightforward.
 **Acceptance Criteria**:
-* Create a `StorageClient` protocol/interface with common methods:
-  - `upload_file(file_path: str, content: str, content_type: str) -> UploadResult`
-  - `file_exists(file_path: str) -> bool`
-  - `list_files(prefix: str = '') -> List[str]`
-* Refactor `BunnyStorageClient` to implement the interface
-* Create a `StorageClientFactory` that returns the appropriate client based on provider type
-* Update `DeploymentService` to use the factory instead of hardcoding `BunnyStorageClient`
+* Create a simple factory function `create_storage_client(site: SiteDeployment)` that returns the appropriate client:
+  - `'bunny'` → `BunnyStorageClient()`
+  - `'s3'` → `S3StorageClient()`
+  - `'s3_compatible'` → `S3StorageClient()` (with custom endpoint)
+  - Future providers added here
+* Refactor `BunnyStorageClient.upload_file()` to accept `site: SiteDeployment` parameter:
+  - Change from: `upload_file(zone_name, zone_password, zone_region, file_path, content)`
+  - Change to: `upload_file(site: SiteDeployment, file_path: str, content: str)`
+  - Client extracts bunny-specific fields from `site` internally
+* Update `DeploymentService` to use factory and unified interface:
+  - Remove hardcoded `BunnyStorageClient` from `__init__`
+  - In `deploy_article()` and `deploy_boilerplate_page()`: create client per site
+  - Call: `client.upload_file(site, file_path, content)` (same signature for all providers)
+* Optional: Add `StorageClient` Protocol for type hints (helps with 8+ providers)
 * All existing Bunny.net deployments continue to work without changes
-* Unit tests verify interface compliance
+* Unit tests verify factory returns correct clients
 **Technical Notes**:
-* Use Python `Protocol` (typing) or ABC for interface definition
-* Factory pattern: `create_storage_client(site: SiteDeployment) -> StorageClient`
-* Maintain backward compatibility: default provider is "bunny" if not specified
+* Factory function is simple if/elif chain (one place to maintain)
+* All clients use same method signature: `upload_file(site, file_path, content)`
+* Each client extracts provider-specific fields from `site` object internally
+* Protocol is optional but recommended for type safety with many providers
+* Factory pattern keeps `DeploymentService` clean (no provider-specific logic)
+* Backward compatibility: default provider is "bunny" if not specified
 ---
@ -52,8 +63,12 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a
 * Create `S3StorageClient` implementing `StorageClient` interface
 * Use boto3 library for AWS S3 operations
 * Support standard AWS S3 regions
-* Authentication via AWS credentials (access key ID, secret access key)
-* Handle bucket permissions (public read access required)
+* Authentication via AWS credentials from environment variables
+* Automatically configure bucket for public READ access only (not write):
+  - Apply public-read ACL or bucket policy on first upload
+  - Ensure bucket allows public read access (disable block public access settings)
+  - Verify public read access is enabled before deployment
+  - **Security**: Never enable public write access - only read permissions
 * Upload files with correct content-type headers
 * Generate public URLs from bucket name and region
 * Support custom domain mapping (if configured)
@ -62,14 +77,14 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a
 * Unit tests with mocked boto3 calls
 **Configuration**:
-* AWS credentials from environment variables:
+* AWS credentials from environment variables (global):
   - `AWS_ACCESS_KEY_ID`
   - `AWS_SECRET_ACCESS_KEY`
   - `AWS_REGION` (default region, can be overridden per-site)
 * Per-site configuration stored in database:
-  - `bucket_name`: S3 bucket name
-  - `bucket_region`: AWS region (optional, uses default if not set)
-  - `custom_domain`: Optional custom domain for URL generation
+  - `s3_bucket_name`: S3 bucket name
+  - `s3_bucket_region`: AWS region (optional, uses default if not set)
+  - `s3_custom_domain`: Optional custom domain for URL generation (manual setup)
 **URL Generation**:
 * Default: `https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}`
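The default URL pattern above can be sketched as a small helper. `build_s3_public_url` is a hypothetical name, and the custom-domain branch is an assumption: this hunk does not show the custom-domain URL format, so the sketch simply serves the same path from the configured domain.

```python
from typing import Optional


def build_s3_public_url(
    bucket_name: str,
    region: str,
    file_path: str,
    custom_domain: Optional[str] = None,
) -> str:
    """Return the public URL for an uploaded object."""
    path = file_path.lstrip("/")  # Avoid a double slash when the path is absolute
    if custom_domain:
        # Assumed format; the document only states that custom domains are optional
        return f"https://{custom_domain}/{path}"
    return f"https://{bucket_name}.s3.{region}.amazonaws.com/{path}"
```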
@ -79,7 +94,12 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a
 **Technical Notes**:
 * boto3 session management (reuse sessions for performance)
 * Content-type detection (text/html for HTML files)
-* Public read ACL or bucket policy required for public URLs
+* Automatic public read access configuration (read-only, never write):
+  - Check and configure bucket policy for public read access only
+  - Disable "Block Public Access" settings for read access
+  - Apply public-read ACL to uploaded objects (not public-write)
+  - Validate public read access before deployment
+  - **Security**: Uploads require authenticated credentials, only reads are public
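A read-only bucket policy of the kind described above might look like the following sketch. `public_read_policy` is a hypothetical helper; the policy grants anonymous `s3:GetObject` only, so no write action is exposed. The returned JSON string is the kind of value boto3's `put_bucket_policy` accepts as its `Policy` argument; relaxing "Block Public Access" is a separate step not shown here.

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Return a JSON bucket policy that allows anonymous reads of objects only."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",  # Read only: no Put/Delete actions
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy)
```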
 ---
@ -139,30 +159,37 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a
 ### Story 6.5: S3-Compatible Services Support
 **Estimated Effort**: 5 story points
-**As a user**, I want to deploy to S3-compatible services (DigitalOcean Spaces, Backblaze B2, Linode Object Storage), so that I can use cost-effective alternatives to AWS.
+**As a user**, I want to deploy to S3-compatible services (Linode Object Storage, DreamHost Object Storage, DigitalOcean Spaces), so that I can use S3-compatible storage providers the same way I use Bunny.net.
 **Acceptance Criteria**:
 * Extend `S3StorageClient` to support S3-compatible endpoints
 * Support provider-specific configurations:
-  - **DigitalOcean Spaces**: Custom endpoint (e.g., `https://nyc3.digitaloceanspaces.com`)
-  - **Backblaze B2**: Custom endpoint and authentication
   - **Linode Object Storage**: Custom endpoint
+  - **DreamHost Object Storage**: Custom endpoint
+  - **DigitalOcean Spaces**: Custom endpoint (e.g., `https://nyc3.digitaloceanspaces.com`)
 * Store `s3_endpoint_url` per site for custom endpoints
 * Handle provider-specific authentication differences
 * Support provider-specific URL generation
 * Configuration examples in documentation
 * Unit tests for each supported service
-**Supported Services** (Initial):
-* DigitalOcean Spaces
-* Backblaze B2
+**Supported Services**:
+* AWS S3 (standard)
 * Linode Object Storage
-* (Others can be added as needed)
+* DreamHost Object Storage
+* DigitalOcean Spaces
+* Backblaze
+* Cloudflare
+* (Other S3-compatible services can be added as needed)
 **Configuration**:
-* Per-service credentials in `.env` or per-site in database
+* Per-service credentials in `.env` (global environment variables):
+  - `LINODE_ACCESS_KEY` / `LINODE_SECRET_KEY` (for Linode)
+  - `DREAMHOST_ACCESS_KEY` / `DREAMHOST_SECRET_KEY` (for DreamHost)
+  - `DO_SPACES_ACCESS_KEY` / `DO_SPACES_SECRET_KEY` (for DigitalOcean)
 * Endpoint URLs stored per-site in `s3_endpoint_url` field
 * Provider type stored in `storage_provider` ('s3_compatible')
+* Automatic public access configuration (same as AWS S3)
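Reading the per-service credentials listed above could be sketched as below. The environment variable names come from the configuration list; everything else is an assumption for illustration, including the service keys (`linode`, `dreamhost`, `do`), the `load_credentials` name, and how a site would indicate which service it uses.

```python
import os
from typing import Dict, Mapping, Tuple

# Service key -> (access key variable, secret key variable). Keys are hypothetical.
CREDENTIAL_ENV_VARS: Dict[str, Tuple[str, str]] = {
    "linode": ("LINODE_ACCESS_KEY", "LINODE_SECRET_KEY"),
    "dreamhost": ("DREAMHOST_ACCESS_KEY", "DREAMHOST_SECRET_KEY"),
    "do": ("DO_SPACES_ACCESS_KEY", "DO_SPACES_SECRET_KEY"),
}


def load_credentials(service: str, env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (access_key, secret_key) for a service, failing loudly if unset."""
    if service not in CREDENTIAL_ENV_VARS:
        raise ValueError(f"Unknown service: {service}")
    key_var, secret_var = CREDENTIAL_ENV_VARS[service]
    missing = [name for name in (key_var, secret_var) if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env[key_var], env[secret_var]
```

Accepting the environment as a parameter keeps the lookup easy to test without touching the real process environment.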
 **Technical Notes**:
 * Most S3-compatible services work with boto3 using custom endpoints
@ -171,61 +198,126 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a
 ---
+### Story 6.6: S3 Bucket Provisioning Script
+**Estimated Effort**: 3 story points
+**As a user**, I want a script to automatically create and configure S3 buckets with proper public access settings, so that I can quickly set up new storage targets without manual AWS console work.
+**Acceptance Criteria**:
+* Create CLI command: `provision-s3-bucket --name <bucket> --region <region> [--provider <s3|linode|dreamhost|do>]`
+* Automatically create bucket if it doesn't exist
+* Configure bucket for public read access only (not write):
+  - Apply bucket policy allowing public read (GET requests only)
+  - Disable "Block Public Access" settings for read access
+  - Set appropriate CORS headers if needed
+  - **Security**: Never enable public write access - uploads require authentication
+* Support multiple providers:
+  - AWS S3 (standard regions)
+  - Linode Object Storage
+  - DreamHost Object Storage
+  - DigitalOcean Spaces
+* Validate bucket configuration after creation
+* Option to link bucket to existing site deployment
+* Clear error messages for common issues (bucket name conflicts, permissions, etc.)
+* Documentation with examples for each provider
+**Usage Examples**:
+```bash
+# Create AWS S3 bucket
+provision-s3-bucket --name my-site-bucket --region us-east-1
+# Create Linode bucket
+provision-s3-bucket --name my-site-bucket --region us-east-1 --provider linode
+# Create and link to site
+provision-s3-bucket --name my-site-bucket --region us-east-1 --site-id 5
+```
+**Technical Notes**:
+* Uses boto3 for all providers (with custom endpoints for S3-compatible)
+* Bucket naming validation (AWS rules apply)
+* Idempotent: safe to run multiple times
+* Optional: Can be integrated into `provision-site` command later
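The bucket naming validation mentioned in the notes above can be sketched with a few checks. This covers only a subset of the AWS rules (length 3-63, lowercase letters, digits, dots and hyphens, alphanumeric first and last character, no adjacent dots, not shaped like an IPv4 address); reserved prefixes and suffixes are not checked, and `is_valid_bucket_name` is a hypothetical name.

```python
import re

# 3-63 characters, alphanumeric at both ends, lowercase letters/digits/dots/hyphens inside
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a bucket name against a subset of the AWS naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:
        return False  # Adjacent periods are not allowed
    if _IPV4_RE.fullmatch(name):
        return False  # Names formatted like IP addresses are not allowed
    return True
```

Running this check before any API call lets the script report naming problems without a round trip to the provider.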
+---
 ## Technical Considerations
 ### Architecture Changes
-1. **Interface/Protocol Design**:
+1. **Unified Method Signature**:
 ```python
-class StorageClient(Protocol):
-    def upload_file(...) -> UploadResult: ...
-    def file_exists(...) -> bool: ...
-    def list_files(...) -> List[str]: ...
+# All storage clients use the same signature
+class BunnyStorageClient:
+    def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult:
+        # Extract bunny-specific fields from site
+        zone_name = site.storage_zone_name
+        zone_password = site.storage_zone_password
+        # ... do upload
+class S3StorageClient:
+    def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult:
+        # Extract S3-specific fields from site
+        bucket_name = site.s3_bucket_name
+        # ... do upload
 ```
-2. **Factory Pattern**:
+2. **Simple Factory Function**:
 ```python
-def create_storage_client(site: SiteDeployment) -> StorageClient:
+def create_storage_client(site: SiteDeployment):
+    """Create appropriate storage client based on site provider"""
     if site.storage_provider == 'bunny':
         return BunnyStorageClient()
-    elif site.storage_provider in ('s3', 's3_compatible'):
-        return S3StorageClient(site)
+    elif site.storage_provider == 's3':
+        return S3StorageClient()
+    elif site.storage_provider == 's3_compatible':
+        return S3StorageClient()  # Same client, uses site.s3_endpoint_url
+    # Future: elif site.storage_provider == 'cloudflare': ...
     else:
         raise ValueError(f"Unknown provider: {site.storage_provider}")
 ```
-3. **Dependency Injection**:
-   - `DeploymentService` receives `StorageClient` from factory
-   - No hardcoded provider dependencies
+3. **Clean DeploymentService**:
+```python
+# In deploy_article():
+client = create_storage_client(site)  # One line, works for all providers
+client.upload_file(site, file_path, content)  # Same call for all
+```
+4. **Optional Protocol** (recommended for type safety with 8+ providers):
+```python
+from typing import Protocol
+class StorageClient(Protocol):
+    def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult: ...
+```
 ### Credential Management
-**Option A: Environment Variables (Recommended for AWS)**
-- Global AWS credentials in `.env`
-- Simple, secure, follows AWS best practices
+**Decision: Global Environment Variables**
+- All credentials stored in `.env` file (global)
+- AWS: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`
+- Linode: `LINODE_ACCESS_KEY`, `LINODE_SECRET_KEY`
+- DreamHost: `DREAMHOST_ACCESS_KEY`, `DREAMHOST_SECRET_KEY`
+- DigitalOcean: `DO_SPACES_ACCESS_KEY`, `DO_SPACES_SECRET_KEY`
+- Simple, secure, follows cloud provider best practices
 - Works well for single-account deployments
+- Per-site credentials can be added later if needed for multi-account scenarios
-**Option B: Per-Site Credentials**
-- Store credentials in database (encrypted)
-- Required for multi-account or S3-compatible services
-- More complex but more flexible
-**Decision Needed**: Which approach for initial implementation?
 ### URL Generation Strategy
 **Bunny.net**: Uses CDN hostname (custom or bunny.net domain)
-**AWS S3**: Uses bucket name + region or custom domain
-**S3-Compatible**: Uses service-specific endpoint or custom domain
+**AWS S3**: Uses bucket name + region or custom domain (manual setup)
+**S3-Compatible**: Uses service-specific endpoint or custom domain (manual setup)
-All providers should support custom domain mapping for consistent URLs.
+Custom domain mapping is supported but requires manual configuration (documented, not automated).
 ### Backward Compatibility
 - All existing Bunny.net sites continue to work
 - Default `storage_provider='bunny'` for existing records
 - No breaking changes to existing APIs
-- Migration is optional (sites can stay on Bunny.net)
+- No migration tools provided (sites can stay on Bunny.net or be manually reconfigured)
 ### Testing Strategy
@ -240,26 +332,42 @@ All providers should support custom domain mapping for consistent URLs.
 - Existing deployment infrastructure (Epic 4)
 - Database migration tools
-## Open Questions
+## Decisions Made
-1. **Credential Storage**: Per-site in DB vs. global env vars? (Recommendation: Start with env vars, add per-site later if needed)
+1. **Credential Storage**: ✅ Global environment variables (Option A)
+   - All credentials in `.env` file
+   - Simple, secure, follows cloud provider best practices
-2. **S3-Compatible Priority**: Which services to support first? (Recommendation: DigitalOcean Spaces, then Backblaze B2)
+2. **S3-Compatible Services**: ✅ Support Linode, DreamHost, and DigitalOcean
+   - All services supported equally - no priority/decision logic in this epic
+   - Provider selection happens elsewhere in the codebase
+   - This epic just enables S3-compatible services to work the same as Bunny.net
-3. **Custom Domains**: How are custom domains configured? Manual setup or automated? (Recommendation: Manual for now, document process)
+3. **Custom Domains**: ✅ Manual setup (deferred automation)
+   - Custom domains require manual configuration
+   - Documented process, no automation in this epic
-4. **Bucket Provisioning**: Should we automate S3 bucket creation, or require manual setup? (Recommendation: Manual for now, similar to current Bunny.net approach)
+4. **Bucket Provisioning**: ✅ Manual with optional script (Story 6.6)
+   - Primary: Manual bucket creation
+   - Optional: `provision-s3-bucket` CLI script for automated setup
-5. **Public Access**: How to ensure buckets are publicly readable? (Recommendation: Document requirements, validate in tests)
+5. **Public Access**: ✅ Automatic configuration (read-only)
+   - System automatically configures buckets for public READ access only
+   - Applies bucket policies for read access, disables block public access, sets public-read ACLs
+   - **Security**: Never enables public write access - all uploads require authenticated credentials
-6. **Migration Path**: Should we provide tools to migrate existing Bunny.net sites to S3? (Recommendation: Defer to future story)
+6. **Migration Path**: ✅ No migration tools
+   - No automated migration from Bunny.net to S3
+   - Sites can be manually reconfigured if needed
 ## Success Metrics
 - ✅ Deploy content to AWS S3 successfully
-- ✅ Deploy content to at least one S3-compatible service
+- ✅ Deploy content to S3-compatible services (Linode, DreamHost, DigitalOcean) successfully
 - ✅ All existing Bunny.net deployments continue working
 - ✅ URL generation works correctly for all providers
+- ✅ Buckets automatically configured for public read access (not write)
 - ✅ Zero breaking changes to existing functionality
+- ✅ Bucket provisioning script works for all supported providers


@ -32,6 +32,7 @@ Job files define batch content generation parameters using JSON format.
- `tiers` (required): Dictionary of tier configurations
- `deployment_targets` (optional): Array of site custom_hostnames or site_deployment_ids to cycle through
- `deployment_overflow` (optional): Strategy when batch size exceeds deployment_targets ("round_robin", "random_available", or "none"). Default: "round_robin"
- `image_theme_prompt` (optional): Override the image theme prompt for all images in this job. If not specified, uses the cached theme from the database or generates a new one using AI. This is a single string that describes the visual style, color scheme, lighting, and overall aesthetic for generated images.
### Tier Level
- `count` (required): Number of articles to generate for this tier
@ -155,6 +156,25 @@ If tier parameters are not specified, these defaults are used:
}
```
### Custom Image Theme
```json
{
  "jobs": [
    {
      "project_id": 1,
      "image_theme_prompt": "Modern industrial workspace, warm amber lighting, deep burgundy accents, professional photography style, clean minimalist aesthetic",
      "tiers": {
        "tier1": {
          "count": 5
        }
      }
    }
  ]
}
```
The `image_theme_prompt` overrides the default AI-generated theme for all images (hero and content) in this job. Use it to ensure consistent visual styling or to avoid default color schemes. If omitted, the system will use the cached theme from the project database, or generate a new one if none exists.
## Usage
Run batch generation with:


@ -0,0 +1,76 @
"""
Randomly assign templates to all domains

Usage:
    uv run python scripts/assign_templates_to_domains.py
"""

import sys
from pathlib import Path
import random

# Make the project root importable so the `src` package resolves when run as a script
project_root = Path(__file__).parent.parent
sys.path.insert(0, str(project_root))

from src.database.session import db_manager
from src.database.models import SiteDeployment
from src.templating.service import TemplateService


def assign_templates():
    """Randomly assign templates to all site deployments"""
    db_manager.initialize()
    session = db_manager.get_session()

    try:
        template_service = TemplateService()
        available_templates = template_service.get_available_templates()

        if not available_templates:
            print("Error: No templates found!")
            return

        print(f"Available templates: {', '.join(available_templates)}")

        sites = session.query(SiteDeployment).all()

        # Sites with a custom hostname are FQDN domains; the rest use b-cdn.net
        fqdn_sites = [s for s in sites if s.custom_hostname is not None]
        bcdn_sites = [s for s in sites if s.custom_hostname is None]

        print(f"\nTotal sites: {len(sites)}")
        print(f"  FQDN domains: {len(fqdn_sites)}")
        print(f"  b-cdn.net domains: {len(bcdn_sites)}")

        updated_fqdn = 0
        updated_bcdn = 0

        # Only sites still on the "basic" template are reassigned. The random
        # choice may pick "basic" again, which is still counted as an update.
        for site in fqdn_sites:
            if site.template_name == "basic":
                site.template_name = random.choice(available_templates)
                updated_fqdn += 1

        for site in bcdn_sites:
            if site.template_name == "basic":
                site.template_name = random.choice(available_templates)
                updated_bcdn += 1

        session.commit()

        print("\nUpdated templates:")
        print(f"  FQDN domains: {updated_fqdn}")
        print(f"  b-cdn.net domains: {updated_bcdn}")
        print(f"  Total: {updated_fqdn + updated_bcdn}")

    except Exception as e:
        session.rollback()
        print(f"Error: {e}")
        raise
    finally:
        session.close()


if __name__ == "__main__":
    assign_templates()


@ -0,0 +1,110 @
"""
Script to check image theme prompts in the database
"""

import sys
from pathlib import Path

# Make the project root importable so the `src` package resolves when run as a script
sys.path.insert(0, str(Path(__file__).parent.parent))

from src.database.session import db_manager
from src.database.repositories import ProjectRepository


def check_theme_prompts():
    """Check all projects and their image theme prompts"""
    db_manager.initialize()
    session = db_manager.get_session()

    try:
        project_repo = ProjectRepository(session)
        projects = project_repo.get_all()

        print("=" * 80)
        print("IMAGE THEME PROMPTS IN DATABASE")
        print("=" * 80)
        print()

        # Split projects by whether a theme prompt is cached
        projects_with_themes = []
        projects_without_themes = []

        for project in projects:
            if project.image_theme_prompt:
                projects_with_themes.append(project)
            else:
                projects_without_themes.append(project)

        print(f"Total projects: {len(projects)}")
        print(f"Projects WITH theme prompts: {len(projects_with_themes)}")
        print(f"Projects WITHOUT theme prompts: {len(projects_without_themes)}")
        print()

        if projects_with_themes:
            print("=" * 80)
            print("PROJECTS WITH THEME PROMPTS:")
            print("=" * 80)
            print()

            for project in projects_with_themes:
                print(f"Project ID: {project.id}")
                print(f"Name: {project.name}")
                print(f"Main Keyword: {project.main_keyword}")
                print("Theme Prompt:")
                print(f"  {project.image_theme_prompt}")
                print()
                print("-" * 80)
                print()

        if projects_without_themes:
            print("=" * 80)
            print("PROJECTS WITHOUT THEME PROMPTS:")
            print("=" * 80)
            print()

            for project in projects_without_themes:
                print(f"  ID {project.id}: {project.name} ({project.main_keyword})")
            print()

        # Check for common patterns
        if projects_with_themes:
            print("=" * 80)
            print("ANALYSIS:")
            print("=" * 80)
            print()

            blue_mentions = []
            for project in projects_with_themes:
                theme_lower = project.image_theme_prompt.lower()
                if 'blue' in theme_lower:
                    blue_mentions.append((project.id, project.name, project.image_theme_prompt))

            print(f"Projects mentioning 'blue': {len(blue_mentions)}/{len(projects_with_themes)}")

            if blue_mentions:
                print()
                print("Projects with 'blue' in theme:")
                for proj_id, name, theme in blue_mentions:
                    print(f"  ID {proj_id}: {name}")
                    print(f"    Theme: {theme}")
                    print()

            # Check for other common color mentions.
            # This is a plain substring match, so "red" also matches words such as "colored".
            colors = ['red', 'green', 'yellow', 'orange', 'purple', 'gray', 'grey', 'black', 'white']
            color_counts = {}
            for color in colors:
                count = sum(1 for p in projects_with_themes if color in p.image_theme_prompt.lower())
                if count > 0:
                    color_counts[color] = count

            if color_counts:
                print("Other color mentions:")
                for color, count in sorted(color_counts.items(), key=lambda x: x[1], reverse=True):
                    print(f"  {color}: {count} projects")
                print()

    finally:
        session.close()
        db_manager.close()

if __name__ == "__main__":
check_theme_prompts()
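
The color analysis in this script uses substring checks (`color in prompt.lower()`), so "red" is also counted in words such as "rendered" or "bored". A minimal sketch of a whole-word variant follows; `count_color_mentions` and the sample prompts are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter
from typing import Iterable, List


def count_color_mentions(prompts: Iterable[str], colors: List[str]) -> Counter:
    """Count how many prompts mention each color as a whole word, ignoring case."""
    patterns = {c: re.compile(rf"\b{re.escape(c)}\b", re.IGNORECASE) for c in colors}
    counts: Counter = Counter()
    for prompt in prompts:
        for color, pattern in patterns.items():
            if pattern.search(prompt):
                counts[color] += 1
    return counts


sample = ["Rendered in deep blue tones", "Warm red and orange palette", "Bored of grey"]
counts = count_color_mentions(sample, ["red", "blue", "orange", "grey"])
naive_red = sum(1 for p in sample if "red" in p.lower())  # substring check, as in the script
print(counts["red"], naive_red)
```

On these sample prompts the whole-word count for "red" is lower than the substring count, which is the discrepancy the word boundaries are meant to remove.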


@ -0,0 +1,44 @@
"""
Count how many domains have each template assigned

Usage:
    uv run python scripts/count_templates_by_domain.py
"""
import sys
from pathlib import Path
from collections import Counter

project_root = Path(__file__).parent.parent
sys.path.insert(0, str(project_root))

from src.database.session import db_manager
from src.database.models import SiteDeployment


def count_templates():
    """Count templates across all site deployments"""
    db_manager.initialize()
    session = db_manager.get_session()
    try:
        sites = session.query(SiteDeployment.template_name).all()
        template_counts = Counter()
        for (template_name,) in sites:
            template_counts[template_name] += 1

        print(f"\nTotal sites: {sum(template_counts.values())}")
        print("\nTemplate distribution:")
        print("-" * 40)
        for template, count in sorted(template_counts.items()):
            print(f" {template:20} : {count:4}")
        print("-" * 40)
    finally:
        session.close()


if __name__ == "__main__":
    count_templates()
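
If `template_name` can be NULL for some rows (an assumption; the column definition is not shown in this diff), both `sorted()` over mixed `None`/`str` keys and the `{template:20}` format spec would raise `TypeError`. A minimal sketch that maps missing names to a placeholder before counting; `template_distribution` and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def template_distribution(rows: Iterable[Tuple[Optional[str]]]) -> Counter:
    """Count single-column rows, replacing a missing template name with "(none)"."""
    return Counter(name if name is not None else "(none)" for (name,) in rows)


rows = [("modern",), ("basic",), (None,), ("modern",)]
dist = template_distribution(rows)
for template, count in sorted(dist.items()):
    print(f" {template:20} : {count:4}")
```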


@ -0,0 +1,66 @@
"""
List all Tier 1 articles for a project with their URLs, templates, and hero URLs

Usage:
    uv run python scripts/list_t1_articles.py [project_id]

If project_id is not provided, defaults to project 30.
"""
import sys
from pathlib import Path

project_root = Path(__file__).parent.parent
sys.path.insert(0, str(project_root))

from src.database.session import db_manager
from src.database.repositories import GeneratedContentRepository, ProjectRepository


def list_t1_articles(project_id: int = 30):
    """List all Tier 1 articles for a project"""
    session = db_manager.get_session()
    try:
        content_repo = GeneratedContentRepository(session)
        project_repo = ProjectRepository(session)

        project = project_repo.get_by_id(project_id)
        if not project:
            print(f"Project {project_id} not found")
            return

        articles = content_repo.get_by_project_and_tier(project_id, "tier1", require_site=False)
        if not articles:
            print(f"No Tier 1 articles found for project {project_id}")
            return

        print(f"\nProject {project_id}: {project.name}")
        print("=" * 140)
        print(f"{'Article URL':<60} {'Template':<20} {'Hero URL':<60}")
        print("=" * 140)
        for article in articles:
            article_url = article.deployed_url or "(Not deployed)"
            template = article.template_used or "(No template)"
            hero_url = article.hero_image_url or "(No hero image)"
            print(f"{article_url:<60} {template:<20} {hero_url:<60}")
        print("=" * 140)
        print(f"\nTotal Tier 1 articles: {len(articles)}")
    finally:
        session.close()


if __name__ == "__main__":
    project_id = 30
    if len(sys.argv) > 1:
        try:
            project_id = int(sys.argv[1])
        except ValueError:
            print(f"Invalid project_id: {sys.argv[1]}. Using default: 30")
    list_t1_articles(project_id)
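
The same argument-parsing block appears in more than one script in this commit. A minimal sketch of how it could be shared, assuming a helper named `parse_project_id` (hypothetical, not part of the commit):

```python
from typing import List


def parse_project_id(argv: List[str], default: int = 30) -> int:
    """Return the project id from argv[1], or the default if it is absent or not an integer."""
    if len(argv) > 1:
        try:
            return int(argv[1])
        except ValueError:
            print(f"Invalid project_id: {argv[1]}. Using default: {default}")
    return default


explicit = parse_project_id(["list_t1_articles.py", "42"])
invalid = parse_project_id(["list_t1_articles.py", "abc"])
missing = parse_project_id(["list_t1_articles.py"])
print(explicit, invalid, missing)
```

A script would then call `list_t1_articles(parse_project_id(sys.argv))`.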


@ -266,8 +266,7 @@ def test_image_generation(project_id: int):
except Exception as e: except Exception as e:
click.echo(f" [ERROR] {str(e)[:200]}") click.echo(f" [ERROR] {str(e)[:200]}")
click.echo("\n2. Content Images:")
click.echo(" (Skipped - T2 articles don't get content images by default)")
click.echo(f"\n\n{'='*60}") click.echo(f"\n\n{'='*60}")
click.echo("TEST COMPLETE") click.echo("TEST COMPLETE")


@ -0,0 +1,173 @@
"""
Test script to verify image reinsertion after interlink injection

Tests the new flow:
1. Get existing articles (2 T1, 2 T2) from project 30
2. Simulate interlink injection (already done, just read current content)
3. Re-insert images using _reinsert_images logic
4. Apply templates
5. Save formatted HTML locally to verify images display

Usage:
    uv run python scripts/test_image_reinsertion.py
"""
import re
import sys
import traceback
from html import unescape
from pathlib import Path

project_root = Path(__file__).parent.parent
sys.path.insert(0, str(project_root))

from src.database.session import db_manager
from src.database.repositories import GeneratedContentRepository, ProjectRepository, SiteDeploymentRepository
from src.generation.image_injection import insert_hero_after_h1, insert_content_images_after_h2s, generate_alt_text
from src.templating.service import TemplateService


def test_image_reinsertion(project_id: int = 30):
    """Test image reinsertion on existing articles"""
    session = db_manager.get_session()
    try:
        content_repo = GeneratedContentRepository(session)
        project_repo = ProjectRepository(session)
        site_repo = SiteDeploymentRepository(session)

        project = project_repo.get_by_id(project_id)
        if not project:
            print(f"Project {project_id} not found")
            return

        # Get 2 T1 and 2 T2 articles
        t1_articles = content_repo.get_by_project_and_tier(project_id, "tier1", require_site=False)
        t2_articles = content_repo.get_by_project_and_tier(project_id, "tier2", require_site=False)
        if len(t1_articles) < 2:
            print(f"Not enough T1 articles (found {len(t1_articles)}, need 2)")
            return
        if len(t2_articles) < 2:
            print(f"Not enough T2 articles (found {len(t2_articles)}, need 2)")
            return

        test_articles = t1_articles[:2] + t2_articles[:2]
        print(f"\nTesting image reinsertion for project {project_id}: {project.name}")
        print(f"Selected {len(test_articles)} articles:")
        for article in test_articles:
            has_hero = article.hero_image_url or "None"
            has_content = f"{len(article.content_images) if article.content_images else 0} images"
            existing_imgs = article.content.count("<img")
            print(f" - {article.tier}: {article.title[:50]}")
            print(f" Hero URL in DB: {has_hero}")
            print(f" Content images in DB: {has_content}")
            print(f" Existing <img> tags in content: {existing_imgs}")

        # Create output directory
        output_dir = Path("test_output")
        output_dir.mkdir(exist_ok=True)

        # Initialize template service
        template_service = TemplateService()

        # Process each article
        for article in test_articles:
            print(f"\nProcessing: {article.title[:50]}...")

            # Step 1: Get current content (after interlink injection)
            html = article.content
            print(f" Content length: {len(html)} chars")

            # Step 2: Re-insert images (simulating _reinsert_images)
            if article.hero_image_url or article.content_images:
                print(" Re-inserting images...")

                # Remove existing images first (to avoid duplicates)
                existing_count = html.count("<img")
                if existing_count > 0:
                    print(f" Removing {existing_count} existing image(s)...")
                    html = re.sub(r'<img[^>]*>', '', html)

                # Insert hero image if exists
                if article.hero_image_url:
                    alt_text = generate_alt_text(project)
                    html = insert_hero_after_h1(html, article.hero_image_url, alt_text)
                    print(f" Hero image inserted: {article.hero_image_url}")
                else:
                    print(" No hero image URL in database")

                # Insert content images if exist
                if article.content_images:
                    alt_texts = [generate_alt_text(project) for _ in article.content_images]
                    html = insert_content_images_after_h2s(html, article.content_images, alt_texts)
                    print(f" {len(article.content_images)} content images inserted")
            else:
                print(" No images to insert (hero_image_url and content_images both empty)")

            # Step 3: Apply template
            print(" Applying template...")
            try:
                # Get template name from site or use default
                template_name = template_service.select_template_for_content(
                    site_deployment_id=article.site_deployment_id,
                    site_deployment_repo=site_repo
                )

                # Generate meta description
                text = re.sub(r'<[^>]+>', '', html)
                text = unescape(text)
                words = text.split()[:25]
                meta_description = ' '.join(words) + '...'

                # Format content with template
                formatted_html = template_service.format_content(
                    content=html,
                    title=article.title,
                    meta_description=meta_description,
                    template_name=template_name,
                    canonical_url=article.deployed_url
                )
                print(f" Template '{template_name}' applied")

                # Step 4: Save to file
                safe_title = "".join(c for c in article.title if c.isalnum() or c in (' ', '-', '_')).rstrip()[:50]
                filename = f"{article.tier}_{article.id}_{safe_title}.html"
                filepath = output_dir / filename
                with open(filepath, 'w', encoding='utf-8') as f:
                    f.write(formatted_html)
                print(f" Saved to: {filepath}")

                # Check if images are in the HTML
                hero_count = formatted_html.count(article.hero_image_url) if article.hero_image_url else 0
                content_count = sum(formatted_html.count(url) for url in (article.content_images or []))
                print(f" Image check: Hero={hero_count}, Content={content_count}")
            except Exception as e:
                print(f" ERROR applying template: {e}")
                traceback.print_exc()

        print(f"\n✓ Test complete! Check files in {output_dir}/")
        print(" Open the HTML files in a browser to verify images display correctly.")
    finally:
        session.close()


if __name__ == "__main__":
    project_id = 30
    if len(sys.argv) > 1:
        try:
            project_id = int(sys.argv[1])
        except ValueError:
            print(f"Invalid project_id: {sys.argv[1]}. Using default: 30")
    test_image_reinsertion(project_id)
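
The meta description step in this script strips tags with an empty replacement, which can fuse the last word of one element with the first word of the next (for example `</h1><p>`). A minimal sketch of the same step with tags replaced by a space instead; `build_meta_description` is a hypothetical helper, and the space replacement is a deliberate deviation from the script.

```python
import re
from html import unescape


def build_meta_description(html: str, max_words: int = 25) -> str:
    """Strip tags, decode HTML entities, and keep the first max_words words."""
    text = re.sub(r"<[^>]+>", " ", html)  # a space keeps neighbouring elements apart
    text = unescape(text)
    words = text.split()[:max_words]
    return " ".join(words) + "..."


sample = '<h1>Tom &amp; Jerry</h1><img src="hero.jpg" alt="x"><p>Classic cartoon duo.</p>'
description = build_meta_description(sample, max_words=4)
print(description)
```

As in the script, the trailing "..." is appended even when the text is shorter than `max_words`.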


@ -23,6 +23,7 @@ from src.database.repositories import GeneratedContentRepository, SitePageReposi
 from src.deployment.bunny_storage import BunnyStorageClient, BunnyStorageError
 from src.deployment.deployment_service import DeploymentService
 from src.deployment.url_logger import URLLogger
+from src.templating.service import TemplateService
 from dotenv import load_dotenv
 import os
 import requests
@ -433,6 +434,15 @@ def provision_site(name: str, domain: str, storage_name: str, region: str,
 pull_zone_bcdn_hostname=pull_result.hostname
 )
+# Randomly assign template
+template_service = TemplateService()
+available_templates = template_service.get_available_templates()
+if available_templates:
+deployment.template_name = random.choice(available_templates)
+session.commit()
+session.refresh(deployment)
+click.echo(f" Template assigned: {deployment.template_name}")
 click.echo("\n" + "=" * 70)
 click.echo("Site provisioned successfully!")
 click.echo("=" * 70)
@ -540,6 +550,15 @@ def attach_domain(name: str, domain: str, storage_name: str,
 pull_zone_bcdn_hostname=pull_result.hostname
 )
+# Randomly assign template
+template_service = TemplateService()
+available_templates = template_service.get_available_templates()
+if available_templates:
+deployment.template_name = random.choice(available_templates)
+session.commit()
+session.refresh(deployment)
+click.echo(f" Template assigned: {deployment.template_name}")
 click.echo("\n" + "=" * 70)
 click.echo("Domain attached successfully!")
 click.echo("=" * 70)
@ -841,11 +860,20 @@ def sync_sites(admin_user: Optional[str], admin_password: Optional[str], dry_run
 custom_hostname=custom_hostname
 )
+# Randomly assign template
+template_service = TemplateService()
+available_templates = template_service.get_available_templates()
+if available_templates:
+deployment.template_name = random.choice(available_templates)
+session.commit()
+session.refresh(deployment)
 click.echo(f"IMPORTED: {check_hostname}")
 click.echo(f" Storage Zone: {storage_zone['Name']} (Region: {storage_zone.get('Region', 'Unknown')})")
 click.echo(f" Pull Zone: {pz['Name']} (ID: {pz['Id']})")
 if custom_hostname:
 click.echo(f" Custom Domain: {custom_hostname}")
+click.echo(f" Template: {deployment.template_name}")
 imported += 1
 except Exception as e:

Binary file not shown.


@ -401,7 +401,8 @@ class BatchProcessor:
 tier_config=tier_config,
 title=title,
 site_deployment_id=site_deployment_id,
-prefix=prefix
+prefix=prefix,
+theme_override=job.image_theme_prompt
 )
 # Update article with image URLs
@ -420,7 +421,8 @@ class BatchProcessor:
 title: str,
 content: str,
 site_deployment_id: Optional[int],
-prefix: str
+prefix: str,
+theme_override: Optional[str] = None
 ) -> tuple[str, Optional[str], List[str]]:
 """
 Generate images and insert into HTML content
@ -444,7 +446,8 @@ class BatchProcessor:
 image_generator = ImageGenerator(
 ai_client=self.generator.ai_client,
 prompt_manager=self.generator.prompt_manager,
-project_repo=self.project_repo
+project_repo=self.project_repo,
+theme_override=theme_override
 )
 storage_client = BunnyStorageClient()
@ -539,7 +542,8 @@ class BatchProcessor:
 tier_config: TierConfig,
 title: str,
 site_deployment_id: Optional[int],
-prefix: str
+prefix: str,
+theme_override: Optional[str] = None
 ) -> tuple[Optional[str], List[str]]:
 """
 Generate images and upload to storage, but don't insert into HTML.
@ -559,7 +563,8 @@ class BatchProcessor:
 image_generator = ImageGenerator(
 ai_client=self.generator.ai_client,
 prompt_manager=self.generator.prompt_manager,
-project_repo=self.project_repo
+project_repo=self.project_repo,
+theme_override=theme_override
 )
 storage_client = BunnyStorageClient()
@ -896,7 +901,8 @@ class BatchProcessor:
 thread_image_generator = ImageGenerator(
 ai_client=thread_generator.ai_client,
 prompt_manager=thread_generator.prompt_manager,
-project_repo=thread_project_repo
+project_repo=thread_project_repo,
+theme_override=job.image_theme_prompt
 )
 hero_url = None


@ -19,13 +19,56 @@ logger = logging.getLogger(__name__)
 def truncate_title(title: str, max_words: int = 4) -> str:
-"""Truncate title to max_words and convert to UPPERCASE"""
+"""Truncate a title to a maximum number of words and convert to uppercase.
+Takes the first max_words from the title, joins them with spaces, and converts
+the result to uppercase. Useful for creating short, prominent text overlays
+on images.
+Args:
+title: The title text to truncate. Can contain any number of words.
+max_words: Maximum number of words to keep from the beginning of the title.
+Defaults to 4.
+Returns:
+A string containing the first max_words of the title in UPPERCASE format.
+If the title has fewer words than max_words, returns the entire title
+in uppercase.
+Example:
+>>> truncate_title("The Quick Brown Fox Jumps Over", 4)
+'THE QUICK BROWN FOX'
+>>> truncate_title("Short Title", 4)
+'SHORT TITLE'
+"""
 words = title.split()[:max_words]
 return " ".join(words).upper()
 def slugify(text: str) -> str:
-"""Convert text to URL-friendly slug"""
+"""Convert text to a URL-friendly slug format.
+Transforms text into a lowercase slug suitable for use in URLs or filenames.
+Replaces each run of characters other than a-z and 0-9 with a single hyphen
+and removes leading and trailing hyphens.
+Args:
+text: The text string to convert to a slug. Can contain any characters.
+Returns:
+A lowercase string containing only the characters a-z, 0-9, and hyphens,
+with no leading or trailing hyphens and no consecutive hyphens.
+Example:
+>>> slugify("Hello World! 123")
+'hello-world-123'
+>>> slugify(" Test---String ")
+'test-string'
+>>> slugify("Special@#$Characters")
+'special-characters'
+"""
 text = text.lower()
 text = re.sub(r'[^a-z0-9]+', '-', text)
 text = text.strip('-')
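
The docstring examples for these two helpers can be checked directly. The sketch below copies the function bodies shown in this hunk; the final `return text` in `slugify` is an assumption, since the hunk ends before the function does.

```python
import re


def truncate_title(title: str, max_words: int = 4) -> str:
    """Keep the first max_words words of the title and uppercase them."""
    words = title.split()[:max_words]
    return " ".join(words).upper()


def slugify(text: str) -> str:
    """Lowercase the text and turn each run of other characters into one hyphen."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)
    text = text.strip("-")
    return text  # assumed: the return statement falls outside the hunk


print(truncate_title("The Quick Brown Fox Jumps Over", 4))
print(slugify("Hello World! 123"))
print(slugify("Café Olé"))  # non-ASCII letters are treated like punctuation
```

Because the character class is limited to `a-z0-9`, accented letters are replaced rather than transliterated, which may matter for non-English titles.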
@ -33,17 +76,57 @@ def slugify(text: str) -> str:
 class ImageGenerator:
-"""Generate images using fal.ai API"""
+"""Generate images using fal.ai FLUX.1 schnell API.
+This class handles image generation for projects, including hero images with
+text overlays and content images. It manages theme prompts, coordinates with
+AI services for prompt generation, and uses the fal.ai API for actual image
+creation. Images are generated asynchronously using a thread pool executor
+for concurrent processing.
+The generator maintains project-specific theme prompts that are either
+retrieved from the database or generated on-demand using AI. Hero images
+include text overlays with automatic wrapping and styling, while content
+images focus on specific entities and related search terms.
+"""
 def __init__(
 self,
 ai_client: AIClient,
 prompt_manager: PromptManager,
-project_repo: ProjectRepository
+project_repo: ProjectRepository,
+theme_override: Optional[str] = None
 ):
+"""Initialize the ImageGenerator with required dependencies.
+Sets up the image generator with AI client for prompt generation, prompt
+manager for formatting prompts, and project repository for database access.
+Configures the fal.ai API key from environment variables and creates a
+thread pool executor for concurrent image generation.
+Args:
+ai_client: Client for generating AI completions (used for theme prompts).
+prompt_manager: Manager for formatting and retrieving prompt templates.
+project_repo: Repository for accessing and updating project data.
+theme_override: Optional theme prompt that, when set, takes precedence
+over the project's cached or generated theme prompt.
+Note:
+The fal_client library expects the FAL_KEY environment variable, but this
+implementation uses FAL_API_KEY. The constructor automatically sets
+FAL_KEY from FAL_API_KEY if needed for compatibility. If neither is
+set, a warning is logged and image generation will fail.
+Attributes:
+ai_client: AI client instance for generating completions.
+prompt_manager: Prompt manager for template handling.
+project_repo: Project repository for database operations.
+theme_override: Optional theme prompt that overrides the project's theme.
+fal_key: API key for fal.ai service (from FAL_API_KEY or FAL_KEY env var).
+max_concurrent: Maximum number of concurrent image generation tasks (default: 5).
+executor: ThreadPoolExecutor for managing concurrent image generation.
+"""
 self.ai_client = ai_client
 self.prompt_manager = prompt_manager
 self.project_repo = project_repo
+self.theme_override = theme_override
 # fal_client library expects FAL_KEY, but we use FAL_API_KEY in our env
 # Set both for compatibility
 self.fal_key = os.getenv("FAL_API_KEY") or os.getenv("FAL_KEY")
@ -55,15 +138,44 @@ class ImageGenerator:
 self.executor = ThreadPoolExecutor(max_workers=self.max_concurrent)
 def get_theme_prompt(self, project_id: int) -> str:
-"""Get or generate theme prompt for project"""
+"""Get or generate a theme prompt for a project.
+Returns the theme override if one was passed to the constructor. Otherwise
+retrieves the cached theme prompt from the project if it exists, or
+generates a new one using AI based on the project's main keyword, entities,
+and related searches. The generated prompt is saved to the database for
+future use, ensuring consistency across image generations for the same project.
+Args:
+project_id: The unique identifier of the project to get/generate
+the theme prompt for.
+Returns:
+A string containing the theme prompt that describes the visual style
+and theme for images in this project.
+Raises:
+ValueError: If the project with the given project_id is not found
+in the database. The lookup happens before the override check, so
+this is raised even when a theme override is set.
+Note:
+The theme prompt is generated using the "image_theme_generation" prompt
+template with the project's main keyword, entities, and related searches.
+Once generated, it is persisted to the database and reused for all
+subsequent image generations for this project.
+"""
 project = self.project_repo.get_by_id(project_id)
 if not project:
 raise ValueError(f"Project {project_id} not found")
+# Check for override first (from job.json)
+if self.theme_override:
+return self.theme_override
+# Then check cached theme in database
 if project.image_theme_prompt:
 return project.image_theme_prompt
-# Generate theme prompt using AI
+# Finally, generate new theme using AI
 entities_str = ", ".join(project.entities or [])
 related_str = ", ".join(project.related_searches or [])
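
The lookup order in `get_theme_prompt` after this change is: job-level override, then the theme cached on the project, then AI generation. A minimal sketch of that precedence in isolation; `resolve_theme` and `fake_generate` are hypothetical stand-ins, and the real method also loads the project and persists a newly generated theme.

```python
from typing import Callable, List, Optional


def resolve_theme(override: Optional[str], cached: Optional[str], generate: Callable[[], str]) -> str:
    """Return the first available theme: override, then cached value, then a generated one."""
    if override:
        return override
    if cached:
        return cached
    return generate()


calls: List[str] = []


def fake_generate() -> str:
    """Stand-in for the AI call; records that it was invoked."""
    calls.append("generated")
    return "generated theme"


from_override = resolve_theme("job theme", "db theme", fake_generate)
from_cache = resolve_theme(None, "db theme", fake_generate)
from_ai = resolve_theme(None, None, fake_generate)
print(from_override, from_cache, from_ai, calls)
```

The generator callable is only invoked on the last call, which mirrors the intent of avoiding an AI request whenever an override or cached theme exists.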
@ -95,7 +207,33 @@ class ImageGenerator:
 width: int,
 height: int
 ) -> bytes:
-"""Overlay text on image using PIL"""
+"""Overlay text on an image with automatic wrapping and styling.
+Takes an image in bytes format and overlays centered text with automatic
+word wrapping, a semi-transparent dark background box for readability,
+and white text with a black outline for contrast. The text is positioned
+in the center of the image and wrapped to fit within 80% of the image width.
+Args:
+image_bytes: Raw image data in bytes format (JPEG, PNG, etc.).
+text: The text string to overlay on the image. Will be automatically
+wrapped to fit within the image boundaries.
+width: The width of the image in pixels. Used for calculating font
+size and text positioning.
+height: The height of the image in pixels. Used for vertical centering
+of the text.
+Returns:
+Image bytes in JPEG format with the text overlay applied. The image
+is converted to RGB mode if necessary and saved with 95% quality.
+Note:
+Font size is calculated as width // 15. If Arial font is not available,
+falls back to the default PIL font. The text is rendered with a
+semi-transparent black background (alpha=180) and white text with
+a black outline for maximum readability across different image backgrounds.
+Line spacing is set to 130% of the line height for comfortable reading.
+"""
 img = Image.open(io.BytesIO(image_bytes))
 if img.mode != 'RGBA':
 img = img.convert('RGBA')
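
The docstring describes wrapping the overlay text to 80% of the image width with a font size of `width // 15`. The wrapping code itself is not part of this hunk, so the following is only a sketch of a greedy word wrap under those two numbers. `wrap_to_width` and the character-width estimate in `estimate_width` are hypothetical; the real method presumably measures text with PIL.

```python
from typing import Callable, List


def wrap_to_width(text: str, max_width: float, measure: Callable[[str], float]) -> List[str]:
    """Greedily pack words into lines whose measured width stays within max_width."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and measure(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


width = 1280
font_size = width // 15        # formula from the docstring
max_text_width = width * 0.8   # the 80% wrapping limit from the docstring


def estimate_width(s: str) -> float:
    """Hypothetical metric: treat every character as 0.6 * font_size pixels wide."""
    return len(s) * font_size * 0.6


lines = wrap_to_width("THE QUICK BROWN FOX JUMPS", max_text_width, estimate_width)
print(font_size, lines)
```

A single word wider than the limit is kept on its own line rather than split.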
@ -183,7 +321,41 @@ class ImageGenerator:
 width: int = 1280,
 height: int = 720
 ) -> Optional[bytes]:
-"""Generate hero image with title text"""
+"""Generate a hero image with title text overlay.
+Creates a hero image using the project's theme prompt via the fal.ai
+FLUX.1 schnell API, then overlays the provided title text on the generated
+image. The image is generated with optimized settings for fast generation
+(4 inference steps) and downloaded from the API response URL.
+The workflow:
+1. Retrieves or generates the project's theme prompt
+2. Calls fal.ai API with the theme prompt to generate the base image
+3. Downloads the generated image from the API response URL
+4. Overlays the title text with automatic wrapping and styling
+5. Returns the final image as JPEG bytes
+Args:
+project_id: The unique identifier of the project. Used to retrieve
+the project's theme prompt for image generation.
+title: The title text to overlay on the hero image. Will be automatically
+wrapped and styled for readability.
+width: Desired width of the generated image in pixels. Defaults to 1280
+(standard HD width).
+height: Desired height of the generated image in pixels. Defaults to 720
+(standard HD height).
+Returns:
+Bytes containing the JPEG image data with title overlay, or None if
+generation fails. Failure can occur due to missing API key, API errors,
+network issues, or malformed API responses.
+Note:
+Uses fal.ai FLUX.1 schnell model with 4 inference steps and guidance
+scale of 3.5 for fast generation. The API response structure is
+handled flexibly to accommodate different response formats. All errors
+are logged with detailed information for debugging.
+"""
 if not self.fal_key:
 logger.error("FAL_API_KEY not set")
 return None
@ -254,7 +426,42 @@ class ImageGenerator:
 width: int = 512,
 height: int = 512
 ) -> Optional[bytes]:
-"""Generate content image with entity and related search"""
+"""Generate a content image focused on a specific entity and related search.
+Creates a content image that combines the project's theme prompt with
+specific focus on an entity and related search term. Unlike hero images,
+content images do not include text overlays and are optimized for smaller
+dimensions (default 512x512). The prompt explicitly requests a professional
+illustration style.
+The workflow:
+1. Retrieves or generates the project's theme prompt
+2. Constructs a focused prompt combining theme, entity, and related search
+3. Calls fal.ai API to generate the image
+4. Downloads and returns the image as JPEG bytes
+Args:
+project_id: The unique identifier of the project. Used to retrieve
+the project's theme prompt for consistent styling.
+entity: The main entity or subject to focus on in the image. This
+is incorporated into the generation prompt.
+related_search: A related search term to include in the image context.
+Combined with the entity to create a more specific image.
+width: Desired width of the generated image in pixels. Defaults to 512.
+height: Desired height of the generated image in pixels. Defaults to 512.
+Returns:
+Bytes containing the JPEG image data, or None if generation fails.
+Failure can occur due to missing API key, API errors, network issues,
+or malformed API responses.
+Note:
+The generated prompt format is: "{theme} Focus on {entity} and
+{related_search}, professional illustration style." Uses the same
+API settings as hero images (4 inference steps, guidance scale 3.5)
+but without text overlay processing. All errors are logged with
+detailed information for debugging.
+"""
 if not self.fal_key:
 logger.error("FAL_API_KEY not set")
 return None
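
The content-image docstring gives the prompt format as "{theme} Focus on {entity} and {related_search}, professional illustration style." A minimal sketch of that string assembly, with `build_content_prompt` and the sample values as hypothetical illustrations (the actual construction code is outside this hunk):

```python
def build_content_prompt(theme: str, entity: str, related_search: str) -> str:
    """Combine the project theme with an entity and a related search term."""
    return f"{theme} Focus on {entity} and {related_search}, professional illustration style."


prompt = build_content_prompt(
    theme="Clean industrial photography with cool lighting.",
    entity="hydraulic press",
    related_search="metal forming",
)
print(prompt)
```

Since the theme is inserted verbatim, a theme that does not end in punctuation would run straight into "Focus on".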


@ -120,6 +120,7 @@ class Job:
 failure_config: Optional[FailureConfig] = None
 interlinking: Optional[InterlinkingConfig] = None
 max_workers: Optional[int] = None
+image_theme_prompt: Optional[str] = None
 class JobConfig:
@ -319,6 +320,15 @@ class JobConfig:
 if not isinstance(max_workers, int) or max_workers < 1:
 raise ValueError("'max_workers' must be a positive integer")
+# Parse image_theme_prompt (optional override)
+image_theme_prompt = job_data.get("image_theme_prompt")
+if image_theme_prompt is not None:
+if not isinstance(image_theme_prompt, str):
+raise ValueError("'image_theme_prompt' must be a string")
+image_theme_prompt = image_theme_prompt.strip()
+if not image_theme_prompt:
+raise ValueError("'image_theme_prompt' cannot be empty")
 return Job(
 project_id=project_id,
 tiers=tiers,
@ -331,7 +341,8 @@ class JobConfig:
 anchor_text_config=anchor_text_config,
 failure_config=failure_config,
 interlinking=interlinking,
-max_workers=max_workers
+max_workers=max_workers,
+image_theme_prompt=image_theme_prompt
 )
 def _parse_tier(self, tier_name: str, tier_data: dict) -> TierConfig:
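
The new `image_theme_prompt` validation accepts a missing key, rejects non-strings, and rejects strings that are empty after stripping. A standalone sketch of the same rules, useful for seeing which job files pass; `parse_image_theme_prompt` is a hypothetical extraction of the inline code, not a function in this commit.

```python
from typing import Any, Dict, Optional


def parse_image_theme_prompt(job_data: Dict[str, Any]) -> Optional[str]:
    """Return the stripped override, None if the key is absent, or raise on bad input."""
    value = job_data.get("image_theme_prompt")
    if value is None:
        return None
    if not isinstance(value, str):
        raise ValueError("'image_theme_prompt' must be a string")
    value = value.strip()
    if not value:
        raise ValueError("'image_theme_prompt' cannot be empty")
    return value


absent = parse_image_theme_prompt({"project_id": 30})
present = parse_image_theme_prompt({"image_theme_prompt": "  Warm earth tones  "})

errors = []
for bad in ({"image_theme_prompt": "   "}, {"image_theme_prompt": 42}):
    try:
        parse_image_theme_prompt(bad)
    except ValueError as exc:
        errors.append(str(exc))
print(absent, present, errors)
```

Note that an explicit JSON `null` is treated the same as a missing key, because `dict.get` returns `None` in both cases.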


@ -424,11 +424,15 @@ class ContentGenerator:
 True if successful, False otherwise
 """
 try:
+# Refresh to ensure we have latest content (especially after image reinsertion)
 content_record = self.content_repo.get_by_id(content_id)
 if not content_record:
 print(f"Warning: Content {content_id} not found")
 return False
+# Force refresh from database to get latest content
+self.content_repo.session.refresh(content_record)
 if not meta_description:
 text = re.sub(r'<[^>]+>', '', content_record.content)
 text = unescape(text)
@ -452,11 +456,19 @@ class ContentGenerator:
 content_record.template_used = template_name
 self.content_repo.update(content_record)
+# Verify it was saved
+self.content_repo.session.refresh(content_record)
+if content_record.template_used != template_name:
+print(f"ERROR: template_used not saved! Expected '{template_name}', got '{content_record.template_used}'")
+return False
 print(f"Applied template '{template_name}' to content {content_id}")
 return True
 except Exception as e:
 print(f"Error applying template to content {content_id}: {e}")
+import traceback
+traceback.print_exc()
 return False
 def _clean_markdown_fences(self, content: str) -> str:


@ -100,11 +100,26 @@
 background-color: #e7f1ff;
 text-decoration: none;
 }
+img {
+max-width: 100%;
+height: auto;
+display: block;
+margin: 1.5rem auto;
+border-radius: 4px;
+box-shadow: 0 2px 8px rgba(0,0,0,0.1);
+}
+h1 + img {
+margin-top: 1rem;
+margin-bottom: 2rem;
+}
 @media (max-width: 768px) {
 nav ul {
 flex-wrap: wrap;
 gap: 1rem;
 }
+img {
+margin: 1rem auto;
+}
 }
 </style>
 </head>

@@ -106,6 +106,19 @@
             background-color: #f9f6f2;
             color: #5d4a37;
         }
+        img {
+            max-width: 100%;
+            height: auto;
+            display: block;
+            margin: 2rem auto;
+            border-radius: 4px;
+            border: 1px solid #e0d7c9;
+            box-shadow: 0 2px 6px rgba(0,0,0,0.08);
+        }
+        h1 + img {
+            margin-top: 1.5rem;
+            margin-bottom: 2.5rem;
+        }
         @media (max-width: 768px) {
             body {
                 padding: 10px;
@@ -132,6 +145,9 @@
                 flex-wrap: wrap;
                 gap: 1rem;
             }
+            img {
+                margin: 1.5rem auto;
+            }
         }
     </style>
 </head>

@@ -91,6 +91,16 @@
         nav a:hover {
             border-bottom-color: #000;
         }
+        img {
+            max-width: 100%;
+            height: auto;
+            display: block;
+            margin: 2rem auto;
+        }
+        h1 + img {
+            margin-top: 1.5rem;
+            margin-bottom: 2.5rem;
+        }
         @media (max-width: 768px) {
             body {
                 padding: 20px 15px;
@@ -108,6 +118,9 @@
                 flex-wrap: wrap;
                 gap: 1rem;
             }
+            img {
+                margin: 1.5rem auto;
+            }
         }
     </style>
 </head>

@@ -115,6 +115,18 @@
             text-decoration: none;
             transform: translateY(-2px);
         }
+        img {
+            max-width: 100%;
+            height: auto;
+            display: block;
+            margin: 2rem auto;
+            border-radius: 8px;
+            box-shadow: 0 4px 12px rgba(0,0,0,0.15);
+        }
+        h1 + img {
+            margin-top: 1.5rem;
+            margin-bottom: 2.5rem;
+        }
         @media (max-width: 768px) {
             body {
                 padding: 20px 10px;
@@ -138,6 +150,10 @@
                 flex-wrap: wrap;
                 gap: 1rem;
             }
+            img {
+                margin: 1.5rem auto;
+                border-radius: 6px;
+            }
         }
     </style>
 </head>