diff --git a/IMAGE_TEMPLATE_ISSUES_ANALYSIS.md b/IMAGE_TEMPLATE_ISSUES_ANALYSIS.md new file mode 100644 index 0000000..237b364 --- /dev/null +++ b/IMAGE_TEMPLATE_ISSUES_ANALYSIS.md @@ -0,0 +1,89 @@ +# Image and Template Issues Analysis + +## Problems Identified + +### 1. Missing Image CSS in Templates +**Issue**: None of the templates (basic, modern, classic) have CSS for `<img>` tags. + +**Impact**: Images display at full size, breaking layout especially in modern template with constrained article width (850px). + +**Solution**: Add responsive image CSS to all templates: +```css +img { + max-width: 100%; + height: auto; + display: block; + margin: 1.5rem auto; + border-radius: 8px; +} +``` + +### 2. Template Storage Inconsistency +**Issue**: `template_used` field is only set when `apply_template()` is called. If: +- Templates are applied at different times +- Some articles skip template application +- Articles are moved between sites with different templates +- Template application fails silently + +Then the database may show incorrect or missing template values. + +**Evidence**: User reports articles showing "basic" when they're actually "modern". + +**Solution**: +- Always apply templates before deployment +- Re-apply templates if `template_used` doesn't match site's `template_name` +- Add validation to ensure `template_used` matches site template + +### 3. Images Lost During Interlink Injection +**Issue**: Processing order: +1. Images inserted into `content` → saved +2. Interlinks injected → BeautifulSoup parses/rewrites HTML → saved +3. Template applied → reads `content` → creates `formatted_html` + +BeautifulSoup parsing may break image tags or lose them during HTML rewriting. + +**Evidence**: User reports images were generated and uploaded (URLs in database) but don't appear in deployed articles. 
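Before choosing among fixes, the image-loss hypothesis in problem 3 could be confirmed by comparing the `<img>` tags in `content` before and after the interlink step. The following is a minimal sketch using only the standard library; `ImgCollector`, `image_sources`, and `missing_images` are hypothetical helper names that do not exist in the repository, and the URLs in the usage note are placeholders.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via handle_startendtag.
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


def image_sources(html):
    """Return the src values of all <img> tags in document order."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


def missing_images(before_html, after_html):
    """Return image URLs present before a rewrite step but absent after it."""
    remaining = set(image_sources(after_html))
    return [src for src in image_sources(before_html) if src not in remaining]
```

Running `missing_images(content_before_interlinks, content_after_interlinks)` on an affected article would return the dropped URLs if the interlink step is where images disappear. An empty list would point to a later stage such as template application.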
+ +**Solution Options**: +- **Option A**: Re-insert images after interlink injection (read from `hero_image_url` and `content_images` fields) +- **Option B**: Use more robust HTML parsing that preserves all tags +- **Option C**: Apply template immediately after image insertion, then inject interlinks into `formatted_html` instead of `content` + +### 4. Image Size Not Constrained +**Issue**: Even if images are present, they're not constrained by template CSS, causing layout issues. + +**Solution**: Add image CSS (see #1) and ensure images are inserted with proper attributes: +```html +... +``` + +## Recommended Fixes + +### Priority 1: Add Image CSS to All Templates +Add responsive image styling to: +- `src/templating/templates/basic.html` +- `src/templating/templates/modern.html` +- `src/templating/templates/classic.html` + +### Priority 2: Fix Image Preservation +Modify `src/interlinking/content_injection.py` to preserve images: +- Use `html.parser` with `preserve_whitespace` or `html5lib` parser +- Or re-insert images after interlink injection using database fields + +### Priority 3: Fix Template Tracking +- Add validation in deployment to ensure `template_used` matches site template +- Re-apply templates if mismatch detected +- Add script to backfill/correct `template_used` values + +### Priority 4: Improve Image Insertion +- Add `max-width` style attribute when inserting images +- Ensure images are inserted with proper responsive attributes + +## Code Locations + +- Image insertion: `src/generation/image_injection.py` +- Interlink injection: `src/interlinking/content_injection.py` (line 53-76) +- Template application: `src/generation/service.py` (line 409-460) +- Template files: `src/templating/templates/*.html` +- Deployment: `src/deployment/deployment_service.py` (uses `formatted_html`) + diff --git a/docs/job-schema.md b/docs/job-schema.md index 9a1635a..58d0f61 100644 --- a/docs/job-schema.md +++ b/docs/job-schema.md @@ -41,6 +41,7 @@ Each job object 
defines a complete content generation batch for a specific proje | `auto_create_sites` | `boolean` | `false` | Whether to auto-create sites when pool is insufficient (Story 3.1) | | `create_sites_for_keywords` | `Array` | `null` | Array of keyword site creation configs (Story 3.1) | | `tiered_link_count_range` | `Object` | `null` | Configuration for tiered link counts (Story 3.2) | +| `image_theme_prompt` | `string` | `null` | Override image theme prompt for all images in this job (Story 7.1) | ## Tier Configuration @@ -212,6 +213,36 @@ Each tier in the `tiers` object defines content generation parameters for that s ### Implementation Status **Implemented** - The `models` field is fully functional. Different models can be specified for title, outline, and content generation stages. If a job file contains a `models` configuration and you also use the `--model` CLI flag, the system will warn you that the CLI flag is being ignored in favor of the job config. +## Image Theme Configuration (Story 7.1) + +### `image_theme_prompt` +- **Type**: `string` (optional) +- **Purpose**: Override the image theme prompt for all images (hero and content) generated in this job +- **Behavior**: + - If provided, this string is used directly as the theme prompt for all image generation + - If not provided, the system checks for a cached theme in the project database + - If no cached theme exists, a new theme is generated using AI based on the project's keyword, entities, and related searches +- **Format**: A single string describing visual style, color scheme, lighting, environment, and overall aesthetic +- **Note**: This is the prompt sent directly to the image generation API (fal.ai FLUX.1 schnell), not split into system/user messages + +### Example +```json +{ + "image_theme_prompt": "Modern industrial workspace, warm amber lighting, deep burgundy accents, professional photography style, clean minimalist aesthetic" +} +``` + +### Theme Prompt Priority +1. 
**Job override** (`image_theme_prompt` in job.json) - Highest priority +2. **Database cache** (`Project.image_theme_prompt`) - Used if no override +3. **AI generation** - Generated using `image_theme_generation.json` template if no cache exists + +### Best Practices +- Use descriptive color schemes to avoid default blue tones +- Include lighting, environment, and style details +- Keep it concise (2-3 sentences recommended) +- Consider the industry/product when choosing colors and aesthetic + ## Tiered Link Configuration (Story 3.2) ### `tiered_link_count_range` @@ -270,6 +301,7 @@ Each tier in the `tiers` object defines content generation parameters for that s "min": 3, "max": 5 }, + "image_theme_prompt": "Modern industrial workspace, warm amber lighting, deep burgundy accents, professional photography style, clean minimalist aesthetic", "tiers": { "tier1": { "count": 10, @@ -305,6 +337,7 @@ Each tier in the `tiers` object defines content generation parameters for that s - `auto_create_sites` must be a boolean (if specified) - `create_sites_for_keywords` must be an array of objects with `keyword` and `count` fields (if specified) - `tiered_link_count_range` must have `min` >= 1 and `max` >= `min` (if specified) +- `image_theme_prompt` must be a non-empty string (if specified) ### Tier Level Validation - `count` must be a positive integer @@ -362,6 +395,11 @@ uv run python main.py generate-batch --job-file jobs/example.json --username adm - Integrated with tiered link generation system - Added validation for link count ranges +### Story 7.1: Image Generation +- Added `image_theme_prompt` for overriding image theme prompts +- Allows manual control over visual style and color schemes +- Overrides database cache and AI generation when specified + ## Future Extensions The schema is designed to be extensible for future features: diff --git a/docs/prd/epic-6-multi-cloud-storage.md b/docs/prd/epic-6-multi-cloud-storage.md index 6543117..a50782e 100644 --- 
a/docs/prd/epic-6-multi-cloud-storage.md +++ b/docs/prd/epic-6-multi-cloud-storage.md @@ -17,29 +17,40 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a - **Story 6.3**: 🔄 PLANNING (Database Schema Updates) - **Story 6.4**: 🔄 PLANNING (URL Generation for S3) - **Story 6.5**: 🔄 PLANNING (S3-Compatible Services Support) +- **Story 6.6**: 🔄 PLANNING (Bucket Provisioning Script) ## Stories ### Story 6.1: Storage Provider Abstraction Layer -**Estimated Effort**: 5 story points +**Estimated Effort**: 3 story points -**As a developer**, I want a unified storage interface that abstracts provider-specific details, so that the deployment service can work with any storage provider without code changes. +**As a developer**, I want a simple way to support multiple storage providers without cluttering `DeploymentService` with if/elif chains, so that adding new providers (eventually 8+) is straightforward. **Acceptance Criteria**: -* Create a `StorageClient` protocol/interface with common methods: - - `upload_file(file_path: str, content: str, content_type: str) -> UploadResult` - - `file_exists(file_path: str) -> bool` - - `list_files(prefix: str = '') -> List[str]` -* Refactor `BunnyStorageClient` to implement the interface -* Create a `StorageClientFactory` that returns the appropriate client based on provider type -* Update `DeploymentService` to use the factory instead of hardcoding `BunnyStorageClient` +* Create a simple factory function `create_storage_client(site: SiteDeployment)` that returns the appropriate client: + - `'bunny'` → `BunnyStorageClient()` + - `'s3'` → `S3StorageClient()` + - `'s3_compatible'` → `S3StorageClient()` (with custom endpoint) + - Future providers added here +* Refactor `BunnyStorageClient.upload_file()` to accept `site: SiteDeployment` parameter: + - Change from: `upload_file(zone_name, zone_password, zone_region, file_path, content)` + - Change to: `upload_file(site: SiteDeployment, file_path: str, content: 
str)` + - Client extracts bunny-specific fields from `site` internally +* Update `DeploymentService` to use factory and unified interface: + - Remove hardcoded `BunnyStorageClient` from `__init__` + - In `deploy_article()` and `deploy_boilerplate_page()`: create client per site + - Call: `client.upload_file(site, file_path, content)` (same signature for all providers) +* Optional: Add `StorageClient` Protocol for type hints (helps with 8+ providers) * All existing Bunny.net deployments continue to work without changes -* Unit tests verify interface compliance +* Unit tests verify factory returns correct clients **Technical Notes**: -* Use Python `Protocol` (typing) or ABC for interface definition -* Factory pattern: `create_storage_client(site: SiteDeployment) -> StorageClient` -* Maintain backward compatibility: default provider is "bunny" if not specified +* Factory function is simple if/elif chain (one place to maintain) +* All clients use same method signature: `upload_file(site, file_path, content)` +* Each client extracts provider-specific fields from `site` object internally +* Protocol is optional but recommended for type safety with many providers +* Factory pattern keeps `DeploymentService` clean (no provider-specific logic) +* Backward compatibility: default provider is "bunny" if not specified --- @@ -52,8 +63,12 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a * Create `S3StorageClient` implementing `StorageClient` interface * Use boto3 library for AWS S3 operations * Support standard AWS S3 regions -* Authentication via AWS credentials (access key ID, secret access key) -* Handle bucket permissions (public read access required) +* Authentication via AWS credentials from environment variables +* Automatically configure bucket for public READ access only (not write): + - Apply public-read ACL or bucket policy on first upload + - Ensure bucket allows public read access (disable block public access settings) + - Verify 
public read access is enabled before deployment + - **Security**: Never enable public write access - only read permissions * Upload files with correct content-type headers * Generate public URLs from bucket name and region * Support custom domain mapping (if configured) @@ -62,14 +77,14 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a * Unit tests with mocked boto3 calls **Configuration**: -* AWS credentials from environment variables: +* AWS credentials from environment variables (global): - `AWS_ACCESS_KEY_ID` - `AWS_SECRET_ACCESS_KEY` - `AWS_REGION` (default region, can be overridden per-site) * Per-site configuration stored in database: - - `bucket_name`: S3 bucket name - - `bucket_region`: AWS region (optional, uses default if not set) - - `custom_domain`: Optional custom domain for URL generation + - `s3_bucket_name`: S3 bucket name + - `s3_bucket_region`: AWS region (optional, uses default if not set) + - `s3_custom_domain`: Optional custom domain for URL generation (manual setup) **URL Generation**: * Default: `https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}` @@ -79,7 +94,12 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a **Technical Notes**: * boto3 session management (reuse sessions for performance) * Content-type detection (text/html for HTML files) -* Public read ACL or bucket policy required for public URLs +* Automatic public read access configuration (read-only, never write): + - Check and configure bucket policy for public read access only + - Disable "Block Public Access" settings for read access + - Apply public-read ACL to uploaded objects (not public-write) + - Validate public read access before deployment + - **Security**: Uploads require authenticated credentials, only reads are public --- @@ -139,30 +159,37 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a ### Story 6.5: S3-Compatible Services Support **Estimated Effort**: 5 
story points -**As a user**, I want to deploy to S3-compatible services (DigitalOcean Spaces, Backblaze B2, Linode Object Storage), so that I can use cost-effective alternatives to AWS. +**As a user**, I want to deploy to S3-compatible services (Linode Object Storage, DreamHost Object Storage, DigitalOcean Spaces), so that I can use S3-compatible storage providers the same way I use Bunny.net. **Acceptance Criteria**: * Extend `S3StorageClient` to support S3-compatible endpoints * Support provider-specific configurations: - - **DigitalOcean Spaces**: Custom endpoint (e.g., `https://nyc3.digitaloceanspaces.com`) - - **Backblaze B2**: Custom endpoint and authentication - **Linode Object Storage**: Custom endpoint + - **DreamHost Object Storage**: Custom endpoint + - **DigitalOcean Spaces**: Custom endpoint (e.g., `https://nyc3.digitaloceanspaces.com`) * Store `s3_endpoint_url` per site for custom endpoints * Handle provider-specific authentication differences * Support provider-specific URL generation * Configuration examples in documentation * Unit tests for each supported service -**Supported Services** (Initial): -* DigitalOcean Spaces -* Backblaze B2 +**Supported Services**: +* AWS S3 (standard) * Linode Object Storage -* (Others can be added as needed) +* DreamHost Object Storage +* DigitalOcean Spaces +* Backblaze +* Cloudflare +* (Other S3-compatible services can be added as needed) **Configuration**: -* Per-service credentials in `.env` or per-site in database +* Per-service credentials in `.env` (global environment variables): + - `LINODE_ACCESS_KEY` / `LINODE_SECRET_KEY` (for Linode) + - `DREAMHOST_ACCESS_KEY` / `DREAMHOST_SECRET_KEY` (for DreamHost) + - `DO_SPACES_ACCESS_KEY` / `DO_SPACES_SECRET_KEY` (for DigitalOcean) * Endpoint URLs stored per-site in `s3_endpoint_url` field * Provider type stored in `storage_provider` ('s3_compatible') +* Automatic public access configuration (same as AWS S3) **Technical Notes**: * Most S3-compatible services work with 
boto3 using custom endpoints @@ -171,61 +198,126 @@ Currently, the system only supports Bunny.net storage, creating vendor lock-in a --- +### Story 6.6: S3 Bucket Provisioning Script +**Estimated Effort**: 3 story points + +**As a user**, I want a script to automatically create and configure S3 buckets with proper public access settings, so that I can quickly set up new storage targets without manual AWS console work. + +**Acceptance Criteria**: +* Create CLI command: `provision-s3-bucket --name --region [--provider ]` +* Automatically create bucket if it doesn't exist +* Configure bucket for public read access only (not write): + - Apply bucket policy allowing public read (GET requests only) + - Disable "Block Public Access" settings for read access + - Set appropriate CORS headers if needed + - **Security**: Never enable public write access - uploads require authentication +* Support multiple providers: + - AWS S3 (standard regions) + - Linode Object Storage + - DreamHost Object Storage + - DigitalOcean Spaces +* Validate bucket configuration after creation +* Option to link bucket to existing site deployment +* Clear error messages for common issues (bucket name conflicts, permissions, etc.) +* Documentation with examples for each provider + +**Usage Examples**: +```bash +# Create AWS S3 bucket +provision-s3-bucket --name my-site-bucket --region us-east-1 + +# Create Linode bucket +provision-s3-bucket --name my-site-bucket --region us-east-1 --provider linode + +# Create and link to site +provision-s3-bucket --name my-site-bucket --region us-east-1 --site-id 5 +``` + +**Technical Notes**: +* Uses boto3 for all providers (with custom endpoints for S3-compatible) +* Bucket naming validation (AWS rules apply) +* Idempotent: safe to run multiple times +* Optional: Can be integrated into `provision-site` command later + +--- + ## Technical Considerations ### Architecture Changes -1. **Interface/Protocol Design**: +1. 
**Unified Method Signature**: ```python - class StorageClient(Protocol): - def upload_file(...) -> UploadResult: ... - def file_exists(...) -> bool: ... - def list_files(...) -> List[str]: ... + # All storage clients use the same signature + class BunnyStorageClient: + def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult: + # Extract bunny-specific fields from site + zone_name = site.storage_zone_name + zone_password = site.storage_zone_password + # ... do upload + + class S3StorageClient: + def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult: + # Extract S3-specific fields from site + bucket_name = site.s3_bucket_name + # ... do upload ``` -2. **Factory Pattern**: +2. **Simple Factory Function**: ```python - def create_storage_client(site: SiteDeployment) -> StorageClient: + def create_storage_client(site: SiteDeployment): + """Create appropriate storage client based on site provider""" if site.storage_provider == 'bunny': return BunnyStorageClient() - elif site.storage_provider in ('s3', 's3_compatible'): - return S3StorageClient(site) + elif site.storage_provider == 's3': + return S3StorageClient() + elif site.storage_provider == 's3_compatible': + return S3StorageClient() # Same client, uses site.s3_endpoint_url + # Future: elif site.storage_provider == 'cloudflare': ... else: raise ValueError(f"Unknown provider: {site.storage_provider}") ``` -3. **Dependency Injection**: - - `DeploymentService` receives `StorageClient` from factory - - No hardcoded provider dependencies +3. **Clean DeploymentService**: + ```python + # In deploy_article(): + client = create_storage_client(site) # One line, works for all providers + client.upload_file(site, file_path, content) # Same call for all + ``` + +4. 
**Optional Protocol** (recommended for type safety with 8+ providers): + ```python + from typing import Protocol + + class StorageClient(Protocol): + def upload_file(self, site: SiteDeployment, file_path: str, content: str) -> UploadResult: ... + ``` ### Credential Management -**Option A: Environment Variables (Recommended for AWS)** -- Global AWS credentials in `.env` -- Simple, secure, follows AWS best practices +**Decision: Global Environment Variables** +- All credentials stored in `.env` file (global) +- AWS: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` +- Linode: `LINODE_ACCESS_KEY`, `LINODE_SECRET_KEY` +- DreamHost: `DREAMHOST_ACCESS_KEY`, `DREAMHOST_SECRET_KEY` +- DigitalOcean: `DO_SPACES_ACCESS_KEY`, `DO_SPACES_SECRET_KEY` +- Simple, secure, follows cloud provider best practices - Works well for single-account deployments - -**Option B: Per-Site Credentials** -- Store credentials in database (encrypted) -- Required for multi-account or S3-compatible services -- More complex but more flexible - -**Decision Needed**: Which approach for initial implementation? +- Per-site credentials can be added later if needed for multi-account scenarios ### URL Generation Strategy **Bunny.net**: Uses CDN hostname (custom or bunny.net domain) -**AWS S3**: Uses bucket name + region or custom domain -**S3-Compatible**: Uses service-specific endpoint or custom domain +**AWS S3**: Uses bucket name + region or custom domain (manual setup) +**S3-Compatible**: Uses service-specific endpoint or custom domain (manual setup) -All providers should support custom domain mapping for consistent URLs. +Custom domain mapping is supported but requires manual configuration (documented, not automated). 
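The URL strategy above can be sketched as a single helper. This is an illustration only: `build_public_url` and its parameters are hypothetical names, the default pattern is the `https://{bucket_name}.s3.{region}.amazonaws.com/{file_path}` form given in Story 6.2, and the virtual-hosted layout for S3-compatible endpoints (bucket name prepended to the endpoint host) is an assumption that may not hold for every provider.

```python
from urllib.parse import urlparse


def build_public_url(file_path, bucket_name, region,
                     custom_domain=None, endpoint_url=None):
    """Build a public URL for an uploaded file (hypothetical helper)."""
    path = file_path.lstrip("/")
    if custom_domain:
        # Manually configured custom domain takes precedence.
        return f"https://{custom_domain}/{path}"
    if endpoint_url:
        # Assumption: S3-compatible services accept virtual-hosted style,
        # i.e. the bucket name as a subdomain of the endpoint host.
        host = urlparse(endpoint_url).netloc
        return f"https://{bucket_name}.{host}/{path}"
    # Default AWS S3 pattern from Story 6.2.
    return f"https://{bucket_name}.s3.{region}.amazonaws.com/{path}"
```

Keeping the precedence (custom domain, then endpoint, then AWS default) in one function would let Bunny.net and S3 sites share the same call site; whether each S3-compatible provider actually serves that host layout should be checked against its documentation.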
### Backward Compatibility - All existing Bunny.net sites continue to work - Default `storage_provider='bunny'` for existing records - No breaking changes to existing APIs -- Migration is optional (sites can stay on Bunny.net) +- No migration tools provided (sites can stay on Bunny.net or be manually reconfigured) ### Testing Strategy @@ -240,26 +332,42 @@ All providers should support custom domain mapping for consistent URLs. - Existing deployment infrastructure (Epic 4) - Database migration tools -## Open Questions +## Decisions Made -1. **Credential Storage**: Per-site in DB vs. global env vars? (Recommendation: Start with env vars, add per-site later if needed) +1. **Credential Storage**: ✅ Global environment variables (Option A) + - All credentials in `.env` file + - Simple, secure, follows cloud provider best practices -2. **S3-Compatible Priority**: Which services to support first? (Recommendation: DigitalOcean Spaces, then Backblaze B2) +2. **S3-Compatible Services**: ✅ Support Linode, DreamHost, and DigitalOcean + - All services supported equally - no priority/decision logic in this epic + - Provider selection happens elsewhere in the codebase + - This epic just enables S3-compatible services to work the same as Bunny.net -3. **Custom Domains**: How are custom domains configured? Manual setup or automated? (Recommendation: Manual for now, document process) +3. **Custom Domains**: ✅ Manual setup (deferred automation) + - Custom domains require manual configuration + - Documented process, no automation in this epic -4. **Bucket Provisioning**: Should we automate S3 bucket creation, or require manual setup? (Recommendation: Manual for now, similar to current Bunny.net approach) +4. **Bucket Provisioning**: ✅ Manual with optional script (Story 6.6) + - Primary: Manual bucket creation + - Optional: `provision-s3-bucket` CLI script for automated setup -5. **Public Access**: How to ensure buckets are publicly readable? 
(Recommendation: Document requirements, validate in tests) +5. **Public Access**: ✅ Automatic configuration (read-only) + - System automatically configures buckets for public READ access only + - Applies bucket policies for read access, disables block public access, sets public-read ACLs + - **Security**: Never enables public write access - all uploads require authenticated credentials -6. **Migration Path**: Should we provide tools to migrate existing Bunny.net sites to S3? (Recommendation: Defer to future story) +6. **Migration Path**: ✅ No migration tools + - No automated migration from Bunny.net to S3 + - Sites can be manually reconfigured if needed ## Success Metrics - ✅ Deploy content to AWS S3 successfully -- ✅ Deploy content to at least one S3-compatible service +- ✅ Deploy content to S3-compatible services (Linode, DreamHost, DigitalOcean) successfully - ✅ All existing Bunny.net deployments continue working - ✅ URL generation works correctly for all providers +- ✅ Buckets automatically configured for public read access (not write) - ✅ Zero breaking changes to existing functionality +- ✅ Bucket provisioning script works for all supported providers diff --git a/jobs/README.md b/jobs/README.md index 5b8ef81..d4fad26 100644 --- a/jobs/README.md +++ b/jobs/README.md @@ -32,6 +32,7 @@ Job files define batch content generation parameters using JSON format. - `tiers` (required): Dictionary of tier configurations - `deployment_targets` (optional): Array of site custom_hostnames or site_deployment_ids to cycle through - `deployment_overflow` (optional): Strategy when batch size exceeds deployment_targets ("round_robin", "random_available", or "none"). Default: "round_robin" +- `image_theme_prompt` (optional): Override the image theme prompt for all images in this job. If not specified, uses the cached theme from the database or generates a new one using AI. 
This is a single string that describes the visual style, color scheme, lighting, and overall aesthetic for generated images. ### Tier Level - `count` (required): Number of articles to generate for this tier @@ -155,6 +156,25 @@ If tier parameters are not specified, these defaults are used: } ``` +### Custom Image Theme +```json +{ + "jobs": [ + { + "project_id": 1, + "image_theme_prompt": "Modern industrial workspace, warm amber lighting, deep burgundy accents, professional photography style, clean minimalist aesthetic", + "tiers": { + "tier1": { + "count": 5 + } + } + } + ] +} +``` + +The `image_theme_prompt` overrides the default AI-generated theme for all images (hero and content) in this job. Use it to ensure consistent visual styling or to avoid default color schemes. If omitted, the system will use the cached theme from the project database, or generate a new one if none exists. + ## Usage Run batch generation with: diff --git a/scripts/assign_templates_to_domains.py b/scripts/assign_templates_to_domains.py new file mode 100644 index 0000000..b63a983 --- /dev/null +++ b/scripts/assign_templates_to_domains.py @@ -0,0 +1,76 @@ +""" +Randomly assign templates to all domains + +Usage: + uv run python scripts/assign_templates_to_domains.py +""" + +import sys +from pathlib import Path +import random + +project_root = Path(__file__).parent.parent +sys.path.insert(0, str(project_root)) + +from src.database.session import db_manager +from src.database.models import SiteDeployment +from src.templating.service import TemplateService + + +def assign_templates(): + """Randomly assign templates to all site deployments""" + db_manager.initialize() + session = db_manager.get_session() + + try: + template_service = TemplateService() + available_templates = template_service.get_available_templates() + + if not available_templates: + print("Error: No templates found!") + return + + print(f"Available templates: {', '.join(available_templates)}") + + sites = 
session.query(SiteDeployment).all() + + fqdn_sites = [s for s in sites if s.custom_hostname is not None] + bcdn_sites = [s for s in sites if s.custom_hostname is None] + + print(f"\nTotal sites: {len(sites)}") + print(f" FQDN domains: {len(fqdn_sites)}") + print(f" b-cdn.net domains: {len(bcdn_sites)}") + + updated_fqdn = 0 + updated_bcdn = 0 + + for site in fqdn_sites: + if site.template_name == "basic": + site.template_name = random.choice(available_templates) + updated_fqdn += 1 + + for site in bcdn_sites: + if site.template_name == "basic": + site.template_name = random.choice(available_templates) + updated_bcdn += 1 + + session.commit() + + print(f"\nUpdated templates:") + print(f" FQDN domains: {updated_fqdn}") + print(f" b-cdn.net domains: {updated_bcdn}") + print(f" Total: {updated_fqdn + updated_bcdn}") + + except Exception as e: + session.rollback() + print(f"Error: {e}") + raise + finally: + session.close() + + +if __name__ == "__main__": + assign_templates() + + + diff --git a/scripts/check_theme_prompts.py b/scripts/check_theme_prompts.py new file mode 100644 index 0000000..01fd54a --- /dev/null +++ b/scripts/check_theme_prompts.py @@ -0,0 +1,110 @@ +""" +Script to check image theme prompts in the database +""" + +import sys +from pathlib import Path +sys.path.insert(0, str(Path(__file__).parent.parent)) + +from src.database.session import db_manager +from src.database.repositories import ProjectRepository + + +def check_theme_prompts(): + """Check all projects and their image theme prompts""" + db_manager.initialize() + session = db_manager.get_session() + + try: + project_repo = ProjectRepository(session) + projects = project_repo.get_all() + + print("=" * 80) + print("IMAGE THEME PROMPTS IN DATABASE") + print("=" * 80) + print() + + projects_with_themes = [] + projects_without_themes = [] + + for project in projects: + if project.image_theme_prompt: + projects_with_themes.append(project) + else: + projects_without_themes.append(project) + + 
print(f"Total projects: {len(projects)}") + print(f"Projects WITH theme prompts: {len(projects_with_themes)}") + print(f"Projects WITHOUT theme prompts: {len(projects_without_themes)}") + print() + + if projects_with_themes: + print("=" * 80) + print("PROJECTS WITH THEME PROMPTS:") + print("=" * 80) + print() + + for project in projects_with_themes: + print(f"Project ID: {project.id}") + print(f"Name: {project.name}") + print(f"Main Keyword: {project.main_keyword}") + print(f"Theme Prompt:") + print(f" {project.image_theme_prompt}") + print() + print("-" * 80) + print() + + if projects_without_themes: + print("=" * 80) + print("PROJECTS WITHOUT THEME PROMPTS:") + print("=" * 80) + print() + + for project in projects_without_themes: + print(f" ID {project.id}: {project.name} ({project.main_keyword})") + print() + + # Check for common patterns + if projects_with_themes: + print("=" * 80) + print("ANALYSIS:") + print("=" * 80) + print() + + blue_mentions = [] + for project in projects_with_themes: + theme_lower = project.image_theme_prompt.lower() + if 'blue' in theme_lower: + blue_mentions.append((project.id, project.name, project.image_theme_prompt)) + + print(f"Projects mentioning 'blue': {len(blue_mentions)}/{len(projects_with_themes)}") + if blue_mentions: + print() + print("Projects with 'blue' in theme:") + for proj_id, name, theme in blue_mentions: + print(f" ID {proj_id}: {name}") + print(f" Theme: {theme}") + print() + + # Check for other common color mentions + colors = ['red', 'green', 'yellow', 'orange', 'purple', 'gray', 'grey', 'black', 'white'] + color_counts = {} + for color in colors: + count = sum(1 for p in projects_with_themes if color in p.image_theme_prompt.lower()) + if count > 0: + color_counts[color] = count + + if color_counts: + print("Other color mentions:") + for color, count in sorted(color_counts.items(), key=lambda x: x[1], reverse=True): + print(f" {color}: {count} projects") + print() + + finally: + session.close() + 
db_manager.close() + + +if __name__ == "__main__": + check_theme_prompts() + diff --git a/scripts/count_templates_by_domain.py b/scripts/count_templates_by_domain.py new file mode 100644 index 0000000..3875740 --- /dev/null +++ b/scripts/count_templates_by_domain.py @@ -0,0 +1,44 @@ +""" +Count how many domains have each template assigned + +Usage: + uv run python scripts/count_templates_by_domain.py +""" + +import sys +from pathlib import Path +from collections import Counter + +project_root = Path(__file__).parent.parent +sys.path.insert(0, str(project_root)) + +from src.database.session import db_manager +from src.database.models import SiteDeployment + + +def count_templates(): + """Count templates across all site deployments""" + db_manager.initialize() + session = db_manager.get_session() + + try: + sites = session.query(SiteDeployment.template_name).all() + + template_counts = Counter() + for (template_name,) in sites: + template_counts[template_name] += 1 + + print(f"\nTotal sites: {sum(template_counts.values())}") + print("\nTemplate distribution:") + print("-" * 40) + for template, count in sorted(template_counts.items()): + print(f" {template:20} : {count:4}") + print("-" * 40) + + finally: + session.close() + + +if __name__ == "__main__": + count_templates() + diff --git a/scripts/list_t1_articles.py b/scripts/list_t1_articles.py new file mode 100644 index 0000000..5067556 --- /dev/null +++ b/scripts/list_t1_articles.py @@ -0,0 +1,66 @@ +""" +List all Tier 1 articles for a project with their URLs, templates, and hero URLs + +Usage: + uv run python scripts/list_t1_articles.py [project_id] + +If project_id is not provided, defaults to project 30. 
+""" + +import sys +from pathlib import Path + +project_root = Path(__file__).parent.parent +sys.path.insert(0, str(project_root)) + +from src.database.session import db_manager +from src.database.repositories import GeneratedContentRepository, ProjectRepository + + +def list_t1_articles(project_id: int = 30): + """List all Tier 1 articles for a project""" + session = db_manager.get_session() + try: + content_repo = GeneratedContentRepository(session) + project_repo = ProjectRepository(session) + + project = project_repo.get_by_id(project_id) + if not project: + print(f"Project {project_id} not found") + return + + articles = content_repo.get_by_project_and_tier(project_id, "tier1", require_site=False) + + if not articles: + print(f"No Tier 1 articles found for project {project_id}") + return + + print(f"\nProject {project_id}: {project.name}") + print("=" * 140) + print(f"{'Article URL':<60} {'Template':<20} {'Hero URL':<60}") + print("=" * 140) + + for article in articles: + article_url = article.deployed_url or "(Not deployed)" + template = article.template_used or "(No template)" + hero_url = article.hero_image_url or "(No hero image)" + + print(f"{article_url:<60} {template:<20} {hero_url:<60}") + + print("=" * 140) + print(f"\nTotal Tier 1 articles: {len(articles)}") + + finally: + session.close() + + +if __name__ == "__main__": + project_id = 30 + if len(sys.argv) > 1: + try: + project_id = int(sys.argv[1]) + except ValueError: + print(f"Invalid project_id: {sys.argv[1]}. Using default: 30") + + list_t1_articles(project_id) + diff --git a/scripts/test_image_generation.py b/scripts/test_image_generation.py index 933adf5..245e8ba 100644 --- a/scripts/test_image_generation.py +++ b/scripts/test_image_generation.py @@ -266,8 +266,7 @@ def test_image_generation(project_id: int): except Exception as e: click.echo(f" [ERROR] {str(e)[:200]}") - click.echo("\n2. 
Content Images:") - click.echo(" (Skipped - T2 articles don't get content images by default)") + click.echo(f"\n\n{'='*60}") click.echo("TEST COMPLETE") diff --git a/scripts/test_image_reinsertion.py b/scripts/test_image_reinsertion.py new file mode 100644 index 0000000..fe1989f --- /dev/null +++ b/scripts/test_image_reinsertion.py @@ -0,0 +1,173 @@ +""" +Test script to verify image reinsertion after interlink injection + +Tests the new flow: +1. Get existing articles (2 T1, 2 T2) from project 30 +2. Simulate interlink injection (already done, just read current content) +3. Re-insert images using _reinsert_images logic +4. Apply templates +5. Save formatted HTML locally to verify images display + +Usage: + uv run python scripts/test_image_reinsertion.py +""" + +import sys +from pathlib import Path + +project_root = Path(__file__).parent.parent +sys.path.insert(0, str(project_root)) + +from src.database.session import db_manager +from src.database.repositories import GeneratedContentRepository, ProjectRepository, SiteDeploymentRepository +from src.generation.image_injection import insert_hero_after_h1, insert_content_images_after_h2s, generate_alt_text +from src.templating.service import TemplateService + + +def test_image_reinsertion(project_id: int = 30): + """Test image reinsertion on existing articles""" + session = db_manager.get_session() + + try: + content_repo = GeneratedContentRepository(session) + project_repo = ProjectRepository(session) + site_repo = SiteDeploymentRepository(session) + + project = project_repo.get_by_id(project_id) + if not project: + print(f"Project {project_id} not found") + return + + # Get 2 T1 and 2 T2 articles + t1_articles = content_repo.get_by_project_and_tier(project_id, "tier1", require_site=False) + t2_articles = content_repo.get_by_project_and_tier(project_id, "tier2", require_site=False) + + if len(t1_articles) < 2: + print(f"Not enough T1 articles (found {len(t1_articles)}, need 2)") + return + + if len(t2_articles) < 2: + 
print(f"Not enough T2 articles (found {len(t2_articles)}, need 2)") + return + + test_articles = t1_articles[:2] + t2_articles[:2] + + print(f"\nTesting image reinsertion for project {project_id}: {project.name}") + print(f"Selected {len(test_articles)} articles:") + for article in test_articles: + has_hero = article.hero_image_url or "None" + has_content = f"{len(article.content_images) if article.content_images else 0} images" + existing_imgs = article.content.count("<img") + print(f" <img> tags in content: {existing_imgs}") + + # Create output directory + output_dir = Path("test_output") + output_dir.mkdir(exist_ok=True) + + # Initialize template service + template_service = TemplateService() + + # Process each article + for article in test_articles: + print(f"\nProcessing: {article.title[:50]}...") + + # Step 1: Get current content (after interlink injection) + html = article.content + print(f" Content length: {len(html)} chars") + + # Step 2: Re-insert images (simulating _reinsert_images) + if article.hero_image_url or article.content_images: + print(f" Re-inserting images...") + + # Remove existing images first (to avoid duplicates) + import re + existing_count = html.count("<img") + if existing_count > 0: + print(f" Removing {existing_count} existing image(s)...") + html = re.sub(r'<img[^>]*>', '', html) + + # Insert hero image if exists + if article.hero_image_url: + alt_text = generate_alt_text(project) + html = insert_hero_after_h1(html, article.hero_image_url, alt_text) + print(f" Hero image inserted: {article.hero_image_url}") + else: + print(f" No hero image URL in database") + + # Insert content images if exist + if article.content_images: + alt_texts = [generate_alt_text(project) for _ in article.content_images] + html = insert_content_images_after_h2s(html, article.content_images, alt_texts) + print(f" {len(article.content_images)} content images inserted") + else: + print(f" No images to insert (hero_image_url and content_images both empty)") + + # Step 3: Apply template + print(f" Applying template...")
+ try: + # Get template name from site or use default + template_name = template_service.select_template_for_content( + site_deployment_id=article.site_deployment_id, + site_deployment_repo=site_repo + ) + + # Generate meta description + import re + from html import unescape + text = re.sub(r'<[^>]+>', '', html) + text = unescape(text) + words = text.split()[:25] + meta_description = ' '.join(words) + '...' + + # Format content with template + formatted_html = template_service.format_content( + content=html, + title=article.title, + meta_description=meta_description, + template_name=template_name, + canonical_url=article.deployed_url + ) + + print(f" Template '{template_name}' applied") + + # Step 4: Save to file + safe_title = "".join(c for c in article.title if c.isalnum() or c in (' ', '-', '_')).rstrip()[:50] + filename = f"{article.tier}_{article.id}_{safe_title}.html" + filepath = output_dir / filename + + with open(filepath, 'w', encoding='utf-8') as f: + f.write(formatted_html) + + print(f" Saved to: {filepath}") + + # Check if images are in the HTML + hero_count = formatted_html.count(article.hero_image_url) if article.hero_image_url else 0 + content_count = sum(formatted_html.count(url) for url in (article.content_images or [])) + + print(f" Image check: Hero={hero_count}, Content={content_count}") + + except Exception as e: + print(f" ERROR applying template: {e}") + import traceback + traceback.print_exc() + + print(f"\n✓ Test complete! Check files in {output_dir}/") + print(f" Open the HTML files in a browser to verify images display correctly.") + + finally: + session.close() + + +if __name__ == "__main__": + project_id = 30 + if len(sys.argv) > 1: + try: + project_id = int(sys.argv[1]) + except ValueError: + print(f"Invalid project_id: {sys.argv[1]}. 
Using default: 30") + + test_image_reinsertion(project_id) + diff --git a/src/cli/commands.py b/src/cli/commands.py index fb20a6a..6e92e68 100644 --- a/src/cli/commands.py +++ b/src/cli/commands.py @@ -23,6 +23,7 @@ from src.database.repositories import GeneratedContentRepository, SitePageReposi from src.deployment.bunny_storage import BunnyStorageClient, BunnyStorageError from src.deployment.deployment_service import DeploymentService from src.deployment.url_logger import URLLogger +from src.templating.service import TemplateService from dotenv import load_dotenv import os import requests @@ -433,6 +434,15 @@ def provision_site(name: str, domain: str, storage_name: str, region: str, pull_zone_bcdn_hostname=pull_result.hostname ) + # Randomly assign template + template_service = TemplateService() + available_templates = template_service.get_available_templates() + if available_templates: + deployment.template_name = random.choice(available_templates) + session.commit() + session.refresh(deployment) + click.echo(f" Template assigned: {deployment.template_name}") + click.echo("\n" + "=" * 70) click.echo("Site provisioned successfully!") click.echo("=" * 70) @@ -540,6 +550,15 @@ def attach_domain(name: str, domain: str, storage_name: str, pull_zone_bcdn_hostname=pull_result.hostname ) + # Randomly assign template + template_service = TemplateService() + available_templates = template_service.get_available_templates() + if available_templates: + deployment.template_name = random.choice(available_templates) + session.commit() + session.refresh(deployment) + click.echo(f" Template assigned: {deployment.template_name}") + click.echo("\n" + "=" * 70) click.echo("Domain attached successfully!") click.echo("=" * 70) @@ -841,11 +860,20 @@ def sync_sites(admin_user: Optional[str], admin_password: Optional[str], dry_run custom_hostname=custom_hostname ) + # Randomly assign template + template_service = TemplateService() + available_templates = 
template_service.get_available_templates() + if available_templates: + deployment.template_name = random.choice(available_templates) + session.commit() + session.refresh(deployment) + click.echo(f"IMPORTED: {check_hostname}") click.echo(f" Storage Zone: {storage_zone['Name']} (Region: {storage_zone.get('Region', 'Unknown')})") click.echo(f" Pull Zone: {pz['Name']} (ID: {pz['Id']})") if custom_hostname: click.echo(f" Custom Domain: {custom_hostname}") + click.echo(f" Template: {deployment.template_name}") imported += 1 except Exception as e: diff --git a/src/generation/.service.py.swp b/src/generation/.service.py.swp new file mode 100644 index 0000000..4695a39 Binary files /dev/null and b/src/generation/.service.py.swp differ diff --git a/src/generation/batch_processor.py b/src/generation/batch_processor.py index 4a6be14..cca4db4 100644 --- a/src/generation/batch_processor.py +++ b/src/generation/batch_processor.py @@ -401,7 +401,8 @@ class BatchProcessor: tier_config=tier_config, title=title, site_deployment_id=site_deployment_id, - prefix=prefix + prefix=prefix, + theme_override=job.image_theme_prompt ) # Update article with image URLs @@ -420,7 +421,8 @@ class BatchProcessor: title: str, content: str, site_deployment_id: Optional[int], - prefix: str + prefix: str, + theme_override: Optional[str] = None ) -> tuple[str, Optional[str], List[str]]: """ Generate images and insert into HTML content @@ -444,7 +446,8 @@ class BatchProcessor: image_generator = ImageGenerator( ai_client=self.generator.ai_client, prompt_manager=self.generator.prompt_manager, - project_repo=self.project_repo + project_repo=self.project_repo, + theme_override=theme_override ) storage_client = BunnyStorageClient() @@ -539,7 +542,8 @@ class BatchProcessor: tier_config: TierConfig, title: str, site_deployment_id: Optional[int], - prefix: str + prefix: str, + theme_override: Optional[str] = None ) -> tuple[Optional[str], List[str]]: """ Generate images and upload to storage, but don't insert into 
HTML. @@ -559,7 +563,8 @@ class BatchProcessor: image_generator = ImageGenerator( ai_client=self.generator.ai_client, prompt_manager=self.generator.prompt_manager, - project_repo=self.project_repo + project_repo=self.project_repo, + theme_override=theme_override ) storage_client = BunnyStorageClient() @@ -896,7 +901,8 @@ class BatchProcessor: thread_image_generator = ImageGenerator( ai_client=thread_generator.ai_client, prompt_manager=thread_generator.prompt_manager, - project_repo=thread_project_repo + project_repo=thread_project_repo, + theme_override=job.image_theme_prompt ) hero_url = None diff --git a/src/generation/image_generator.py b/src/generation/image_generator.py index 507372f..9978a03 100644 --- a/src/generation/image_generator.py +++ b/src/generation/image_generator.py @@ -19,13 +19,56 @@ logger = logging.getLogger(__name__) def truncate_title(title: str, max_words: int = 4) -> str: - """Truncate title to max_words and convert to UPPERCASE""" + """Truncate a title to a maximum number of words and convert to uppercase. + + Takes the first max_words from the title, joins them with spaces, and converts + the result to uppercase. Useful for creating short, prominent text overlays + on images. + + Args: + title: The title text to truncate. Can contain any number of words. + max_words: Maximum number of words to keep from the beginning of the title. + Defaults to 4. + + Returns: + A string containing the first max_words of the title in UPPERCASE format. + If the title has fewer words than max_words, returns the entire title + in uppercase. + + Example: + >>> truncate_title("The Quick Brown Fox Jumps Over", 4) + 'THE QUICK BROWN FOX' + >>> truncate_title("Short Title", 4) + 'SHORT TITLE' + """ words = title.split()[:max_words] return " ".join(words).upper() def slugify(text: str) -> str: - """Convert text to URL-friendly slug""" + """Convert text to a URL-friendly slug format. + + Transforms text into a lowercase slug suitable for use in URLs or filenames. 
+ Replaces all non-alphanumeric characters with hyphens and removes leading/trailing + hyphens. Multiple consecutive non-alphanumeric characters are collapsed into + a single hyphen. + + Args: + text: The text string to convert to a slug. Can contain any characters. + + Returns: + A lowercase string containing only alphanumeric characters and hyphens, + with no leading or trailing hyphens. Multiple consecutive hyphens are + collapsed into a single hyphen. + + Example: + >>> slugify("Hello World! 123") + 'hello-world-123' + >>> slugify(" Test---String ") + 'test-string' + >>> slugify("Special@#$Characters") + 'special-characters' + """ text = text.lower() text = re.sub(r'[^a-z0-9]+', '-', text) text = text.strip('-') @@ -33,17 +76,57 @@ def slugify(text: str) -> str: class ImageGenerator: - """Generate images using fal.ai API""" + """Generate images using fal.ai FLUX.1 schnell API. + + This class handles image generation for projects, including hero images with + text overlays and content images. It manages theme prompts, coordinates with + AI services for prompt generation, and uses the fal.ai API for actual image + creation. Images are generated asynchronously using a thread pool executor + for concurrent processing. + + The generator maintains project-specific theme prompts that are either + retrieved from the database or generated on-demand using AI. Hero images + include text overlays with automatic wrapping and styling, while content + images focus on specific entities and related search terms. + """ def __init__( self, ai_client: AIClient, prompt_manager: PromptManager, - project_repo: ProjectRepository + project_repo: ProjectRepository, + theme_override: Optional[str] = None ): + """Initialize the ImageGenerator with required dependencies. + + Sets up the image generator with AI client for prompt generation, prompt + manager for formatting prompts, and project repository for database access. 
+ Configures the fal.ai API key from environment variables and creates a + thread pool executor for concurrent image generation. + + Args: + ai_client: Client for generating AI completions (used for theme prompts). + prompt_manager: Manager for formatting and retrieving prompt templates. + project_repo: Repository for accessing and updating project data. + + Note: + The fal_client library expects the FAL_KEY environment variable, but this + implementation uses FAL_API_KEY. The constructor automatically sets + FAL_KEY from FAL_API_KEY if needed for compatibility. If neither is + set, a warning is logged and image generation will fail. + + Attributes: + ai_client: AI client instance for generating completions. + prompt_manager: Prompt manager for template handling. + project_repo: Project repository for database operations. + fal_key: API key for fal.ai service (from FAL_API_KEY or FAL_KEY env var). + max_concurrent: Maximum number of concurrent image generation tasks (default: 5). + executor: ThreadPoolExecutor for managing concurrent image generation. + """ self.ai_client = ai_client self.prompt_manager = prompt_manager self.project_repo = project_repo + self.theme_override = theme_override # fal_client library expects FAL_KEY, but we use FAL_API_KEY in our env # Set both for compatibility self.fal_key = os.getenv("FAL_API_KEY") or os.getenv("FAL_KEY") @@ -55,15 +138,44 @@ class ImageGenerator: self.executor = ThreadPoolExecutor(max_workers=self.max_concurrent) def get_theme_prompt(self, project_id: int) -> str: - """Get or generate theme prompt for project""" + """Get or generate a theme prompt for a project. + + Returns the theme override passed to the constructor if one was supplied. Otherwise + retrieves the cached theme prompt from the project, or generates a new one using AI + from the project's main keyword, entities, and related searches. A generated prompt is + saved to the database for future use, keeping image generations for the project consistent.
+ + Args: + project_id: The unique identifier of the project to get/generate + the theme prompt for. + + Returns: + A string containing the theme prompt that describes the visual style + and theme for images in this project. + + Raises: + ValueError: If the project with the given project_id is not found + in the database. + + Note: + The theme prompt is generated using the "image_theme_generation" prompt + template with the project's main keyword, entities, and related searches. + Once generated, it is persisted to the database and reused for all + subsequent image generations for this project. + """ project = self.project_repo.get_by_id(project_id) if not project: raise ValueError(f"Project {project_id} not found") + # Check for override first (from job.json) + if self.theme_override: + return self.theme_override + + # Then check cached theme in database if project.image_theme_prompt: return project.image_theme_prompt - # Generate theme prompt using AI + # Finally, generate new theme using AI entities_str = ", ".join(project.entities or []) related_str = ", ".join(project.related_searches or []) @@ -95,7 +207,33 @@ class ImageGenerator: width: int, height: int ) -> bytes: - """Overlay text on image using PIL""" + """Overlay text on an image with automatic wrapping and styling. + + Takes an image in bytes format and overlays centered text with automatic + word wrapping, a semi-transparent dark background box for readability, + and white text with a black outline for contrast. The text is positioned + in the center of the image and wrapped to fit within 80% of the image width. + + Args: + image_bytes: Raw image data in bytes format (JPEG, PNG, etc.). + text: The text string to overlay on the image. Will be automatically + wrapped to fit within the image boundaries. + width: The width of the image in pixels. Used for calculating font + size and text positioning. + height: The height of the image in pixels. Used for vertical centering + of the text. 
+ + Returns: + Image bytes in JPEG format with the text overlay applied. The image + is converted to RGB mode if necessary and saved with 95% quality. + + Note: + Font size is calculated as width // 15. If Arial font is not available, + falls back to the default PIL font. The text is rendered with a + semi-transparent black background (alpha=180) and white text with + a black outline for maximum readability across different image backgrounds. + Line spacing is set to 130% of the line height for comfortable reading. + """ img = Image.open(io.BytesIO(image_bytes)) if img.mode != 'RGBA': img = img.convert('RGBA') @@ -183,7 +321,41 @@ class ImageGenerator: width: int = 1280, height: int = 720 ) -> Optional[bytes]: - """Generate hero image with title text""" + """Generate a hero image with title text overlay. + + Creates a hero image using the project's theme prompt via the fal.ai + FLUX.1 schnell API, then overlays the provided title text on the generated + image. The image is generated with optimized settings for fast generation + (4 inference steps) and downloaded from the API response URL. + + The workflow: + 1. Retrieves or generates the project's theme prompt + 2. Calls fal.ai API with the theme prompt to generate the base image + 3. Downloads the generated image from the API response URL + 4. Overlays the title text with automatic wrapping and styling + 5. Returns the final image as JPEG bytes + + Args: + project_id: The unique identifier of the project. Used to retrieve + the project's theme prompt for image generation. + title: The title text to overlay on the hero image. Will be automatically + wrapped and styled for readability. + width: Desired width of the generated image in pixels. Defaults to 1280 + (standard HD width). + height: Desired height of the generated image in pixels. Defaults to 720 + (standard HD height). + + Returns: + Bytes containing the JPEG image data with title overlay, or None if + generation fails. 
Failure can occur due to missing API key, API errors, + network issues, or malformed API responses. + + Note: + Uses fal.ai FLUX.1 schnell model with 4 inference steps and guidance + scale of 3.5 for fast generation. The API response structure is + handled flexibly to accommodate different response formats. All errors + are logged with detailed information for debugging. + """ if not self.fal_key: logger.error("FAL_API_KEY not set") return None @@ -254,7 +426,42 @@ class ImageGenerator: width: int = 512, height: int = 512 ) -> Optional[bytes]: - """Generate content image with entity and related search""" + """Generate a content image focused on a specific entity and related search. + + Creates a content image that combines the project's theme prompt with + specific focus on an entity and related search term. Unlike hero images, + content images do not include text overlays and are optimized for smaller + dimensions (default 512x512). The prompt explicitly requests a professional + illustration style. + + The workflow: + 1. Retrieves or generates the project's theme prompt + 2. Constructs a focused prompt combining theme, entity, and related search + 3. Calls fal.ai API to generate the image + 4. Downloads and returns the image as JPEG bytes + + Args: + project_id: The unique identifier of the project. Used to retrieve + the project's theme prompt for consistent styling. + entity: The main entity or subject to focus on in the image. This + is incorporated into the generation prompt. + related_search: A related search term to include in the image context. + Combined with the entity to create a more specific image. + width: Desired width of the generated image in pixels. Defaults to 512. + height: Desired height of the generated image in pixels. Defaults to 512. + + Returns: + Bytes containing the JPEG image data, or None if generation fails. + Failure can occur due to missing API key, API errors, network issues, + or malformed API responses. 
+ + Note: + The generated prompt format is: "{theme} Focus on {entity} and + {related_search}, professional illustration style." Uses the same + API settings as hero images (4 inference steps, guidance scale 3.5) + but without text overlay processing. All errors are logged with + detailed information for debugging. + """ if not self.fal_key: logger.error("FAL_API_KEY not set") return None diff --git a/src/generation/job_config.py b/src/generation/job_config.py index d84c40f..dbaa015 100644 --- a/src/generation/job_config.py +++ b/src/generation/job_config.py @@ -120,6 +120,7 @@ class Job: failure_config: Optional[FailureConfig] = None interlinking: Optional[InterlinkingConfig] = None max_workers: Optional[int] = None + image_theme_prompt: Optional[str] = None class JobConfig: @@ -319,6 +320,15 @@ class JobConfig: if not isinstance(max_workers, int) or max_workers < 1: raise ValueError("'max_workers' must be a positive integer") + # Parse image_theme_prompt (optional override) + image_theme_prompt = job_data.get("image_theme_prompt") + if image_theme_prompt is not None: + if not isinstance(image_theme_prompt, str): + raise ValueError("'image_theme_prompt' must be a string") + image_theme_prompt = image_theme_prompt.strip() + if not image_theme_prompt: + raise ValueError("'image_theme_prompt' cannot be empty") + return Job( project_id=project_id, tiers=tiers, @@ -331,7 +341,8 @@ class JobConfig: anchor_text_config=anchor_text_config, failure_config=failure_config, interlinking=interlinking, - max_workers=max_workers + max_workers=max_workers, + image_theme_prompt=image_theme_prompt ) def _parse_tier(self, tier_name: str, tier_data: dict) -> TierConfig: diff --git a/src/generation/service.py b/src/generation/service.py index 7cb4b46..b697875 100644 --- a/src/generation/service.py +++ b/src/generation/service.py @@ -424,11 +424,15 @@ class ContentGenerator: True if successful, False otherwise """ try: + # Refresh to ensure we have latest content (especially after image 
reinsertion) content_record = self.content_repo.get_by_id(content_id) if not content_record: print(f"Warning: Content {content_id} not found") return False + # Force refresh from database to get latest content + self.content_repo.session.refresh(content_record) + if not meta_description: text = re.sub(r'<[^>]+>', '', content_record.content) text = unescape(text) @@ -452,11 +456,19 @@ class ContentGenerator: content_record.template_used = template_name self.content_repo.update(content_record) + # Verify it was saved + self.content_repo.session.refresh(content_record) + if content_record.template_used != template_name: + print(f"ERROR: template_used not saved! Expected '{template_name}', got '{content_record.template_used}'") + return False + print(f"Applied template '{template_name}' to content {content_id}") return True except Exception as e: print(f"Error applying template to content {content_id}: {e}") + import traceback + traceback.print_exc() return False def _clean_markdown_fences(self, content: str) -> str: diff --git a/src/templating/templates/basic.html b/src/templating/templates/basic.html index 1138e2c..1a97033 100644 --- a/src/templating/templates/basic.html +++ b/src/templating/templates/basic.html @@ -100,11 +100,26 @@ background-color: #e7f1ff; text-decoration: none; } + img { + max-width: 100%; + height: auto; + display: block; + margin: 1.5rem auto; + border-radius: 4px; + box-shadow: 0 2px 8px rgba(0,0,0,0.1); + } + h1 + img { + margin-top: 1rem; + margin-bottom: 2rem; + } @media (max-width: 768px) { nav ul { flex-wrap: wrap; gap: 1rem; } + img { + margin: 1rem auto; + } } diff --git a/src/templating/templates/classic.html b/src/templating/templates/classic.html index fed126e..868ab37 100644 --- a/src/templating/templates/classic.html +++ b/src/templating/templates/classic.html @@ -106,6 +106,19 @@ background-color: #f9f6f2; color: #5d4a37; } + img { + max-width: 100%; + height: auto; + display: block; + margin: 2rem auto; + border-radius: 4px; + 
border: 1px solid #e0d7c9; + box-shadow: 0 2px 6px rgba(0,0,0,0.08); + } + h1 + img { + margin-top: 1.5rem; + margin-bottom: 2.5rem; + } @media (max-width: 768px) { body { padding: 10px; @@ -132,6 +145,9 @@ flex-wrap: wrap; gap: 1rem; } + img { + margin: 1.5rem auto; + } } diff --git a/src/templating/templates/minimal.html b/src/templating/templates/minimal.html index 84bfe72..8ab29a9 100644 --- a/src/templating/templates/minimal.html +++ b/src/templating/templates/minimal.html @@ -91,6 +91,16 @@ nav a:hover { border-bottom-color: #000; } + img { + max-width: 100%; + height: auto; + display: block; + margin: 2rem auto; + } + h1 + img { + margin-top: 1.5rem; + margin-bottom: 2.5rem; + } @media (max-width: 768px) { body { padding: 20px 15px; @@ -108,6 +118,9 @@ flex-wrap: wrap; gap: 1rem; } + img { + margin: 1.5rem auto; + } } diff --git a/src/templating/templates/modern.html b/src/templating/templates/modern.html index 46674fd..e8c1240 100644 --- a/src/templating/templates/modern.html +++ b/src/templating/templates/modern.html @@ -115,6 +115,18 @@ text-decoration: none; transform: translateY(-2px); } + img { + max-width: 100%; + height: auto; + display: block; + margin: 2rem auto; + border-radius: 8px; + box-shadow: 0 4px 12px rgba(0,0,0,0.15); + } + h1 + img { + margin-top: 1.5rem; + margin-bottom: 2.5rem; + } @media (max-width: 768px) { body { padding: 20px 10px; @@ -138,6 +150,10 @@ flex-wrap: wrap; gap: 1rem; } + img { + margin: 1.5rem auto; + border-radius: 6px; + } }
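
Reviewer note: the theme-prompt precedence this change introduces in `ImageGenerator.get_theme_prompt` (the job-level `image_theme_prompt` override first, then the project's cached prompt, then AI generation), together with the validation added in `JobConfig`, can be sketched in isolation. This is a minimal sketch, not the code in the diff: the function names `normalize_theme_override` and `resolve_theme_prompt` are hypothetical, and the `generate` callback stands in for the AI call and database save that the real method performs.

```python
from typing import Callable, Optional


def normalize_theme_override(raw: object) -> Optional[str]:
    """Hypothetical helper mirroring the job-config checks in the diff.

    None means "no override". Anything else must be a non-empty string
    once surrounding whitespace is stripped.
    """
    if raw is None:
        return None
    if not isinstance(raw, str):
        raise ValueError("'image_theme_prompt' must be a string")
    stripped = raw.strip()
    if not stripped:
        raise ValueError("'image_theme_prompt' cannot be empty")
    return stripped


def resolve_theme_prompt(
    override: Optional[str],
    cached: Optional[str],
    generate: Callable[[], str],
) -> str:
    """Hypothetical helper: pick the theme prompt in precedence order.

    1. The job-level override, if set.
    2. The prompt already cached on the project, if any.
    3. A newly generated prompt (the callback is only invoked here).
    """
    if override:
        return override
    if cached:
        return cached
    return generate()


if __name__ == "__main__":
    override = normalize_theme_override("  flat vector art, warm palette  ")
    print(resolve_theme_prompt(override, "cached theme", lambda: "generated theme"))
    print(resolve_theme_prompt(None, "cached theme", lambda: "generated theme"))
    print(resolve_theme_prompt(None, None, lambda: "generated theme"))
```

One consequence of this ordering worth confirming in review: when an override is supplied, the project's cached `image_theme_prompt` is neither read nor updated, so the override applies only to the job that carries it.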