Story 7.1: Generate and Insert Images into Articles

Status

PLANNING - Ready for Implementation

Story

As a developer, I want to automatically generate hero images for all article tiers and content images for T1 articles using AI-powered image generation, so that articles are visually enhanced without manual image creation.

Context

  • Epic 7 introduces AI-powered image generation to enhance article quality
  • fal.ai FLUX.1 schnell API provides fast text-to-image generation (<3 seconds)
  • Hero images needed for all tiers to improve visual appeal
  • Content images needed for T1 articles (1-3 images) to break up text
  • Images should be thematically consistent across a project
  • Images are optional enhancements (failures shouldn't block article generation)
  • Job JSON configuration allows per-tier customization

Acceptance Criteria

Core Image Generation Functionality

  • Hero images generated for all tiers (if enabled in job config)
  • Content images generated for T1 articles (1-3 by default, configurable)
  • Images generated using fal.ai FLUX.1 schnell API
  • Theme prompt generated once per project using AI, cached in database
  • Images uploaded to storage zone images/ subdirectory
  • Images inserted into HTML content before saving to database
  • Alt text includes 3 random entities and 2 random related searches
  • Graceful failure handling (log errors, continue without images)

Hero Image Specifications

  • Default size: 1280x720 (configurable per tier in job JSON)
  • Title truncated to its first 3-4 words, converted to UPPERCASE
  • Prompt format: {theme_prompt} Text: '{TITLE_UPPERCASE}' in clean simple uppercase letters, positioned in middle of image.
  • Naming: {main-keyword}.jpg (slugified)
  • Placement: Inserted immediately after first <h1> tag
  • Format: Always JPG
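
The hero prompt rules above can be sketched as two small helpers. This is a minimal sketch, not the project's implementation: the function names, the fixed cut at four words, and the stripping of apostrophes (so the title cannot close the quoted text early) are all assumptions.

```python
def truncate_title(title: str, max_words: int = 4) -> str:
    """Keep the first `max_words` words of the title and uppercase them.

    Apostrophes are dropped (an assumption) because the prompt wraps the
    title in single quotes.
    """
    words = title.replace("'", "").split()
    return " ".join(words[:max_words]).upper()


def build_hero_prompt(theme_prompt: str, title: str) -> str:
    """Fill the hero prompt format from the specification."""
    short_title = truncate_title(title)
    return (
        f"{theme_prompt} Text: '{short_title}' in clean simple uppercase "
        "letters, positioned in middle of image."
    )
```

Titles shorter than four words pass through unchanged apart from the uppercasing.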

Content Image Specifications

  • Default size: 512x512 (configurable per tier in job JSON)
  • Default count: T1 = 1-3 images (configurable min/max in job JSON)
  • Prompt format: {theme_prompt} Focus on {entity} and {related_search}, professional illustration style.
  • Naming: {main-keyword}-{random-entity}-{random-related-search}.jpg (all slugified)
  • Placement: Distributed after H2 sections (one image per H2, evenly distributed)
  • Format: Always JPG
  • Alt text: alt="entity1 related_search1 entity2 related_search2 entity3" (3 entities, 2 related searches)

Theme Prompt Generation

  • Generated once per project using AI (OpenRouter)
  • Cached in Project.image_theme_prompt field (TEXT, nullable)
  • Prompt template: src/generation/prompts/image_theme_generation.json
  • Reused for all images in the project for visual consistency
  • Example: "Professional industrial scene, modern manufacturing equipment, clean factory environment, high-quality industrial photography, dramatic lighting."

Job JSON Configuration

  • New image_config section per tier in job JSON
  • Structure:
    {
      "tiers": {
        "tier1": {
          "image_config": {
            "hero": {
              "width": 1280,
              "height": 720
            },
            "content": {
              "min_num_images": 1,
              "max_num_images": 3,
              "width": 512,
              "height": 512
            }
          }
        }
      }
    }
    
  • Defaults if not specified:
    • Hero: 1280x720 (all tiers)
    • Content: T1 = 1-3, T2/T3 = 0-0 (disabled)
  • If image_config missing entirely: Use tier defaults
  • If image_config present but hero/content missing: Use defaults for missing keys
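
One way to apply these defaults is to merge whatever the job JSON provides over a per-tier default. The sketch below assumes tier keys named tier1, tier2, tier3 as in the structure above. The function name is hypothetical, and so is the choice to merge individual keys inside hero and content; the real TierConfig parser may differ.

```python
HERO_DEFAULT = {"width": 1280, "height": 720}
CONTENT_ENABLED = {"min_num_images": 1, "max_num_images": 3, "width": 512, "height": 512}
CONTENT_DISABLED = {"min_num_images": 0, "max_num_images": 0, "width": 512, "height": 512}


def resolve_image_config(tier_name: str, tier_json: dict) -> dict:
    """Return a complete image config for one tier, filling in defaults."""
    raw = tier_json.get("image_config") or {}

    # Hero images use the same default for every tier.
    hero = {**HERO_DEFAULT, **(raw.get("hero") or {})}

    # Content images default to 1-3 for tier1 and are disabled elsewhere.
    content_default = CONTENT_ENABLED if tier_name == "tier1" else CONTENT_DISABLED
    content = {**content_default, **(raw.get("content") or {})}

    if not 0 <= content["min_num_images"] <= content["max_num_images"]:
        raise ValueError(
            f"{tier_name}: min_num_images must be between 0 and max_num_images"
        )
    return {"hero": hero, "content": content}
```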

Database Schema Updates

  • Add image_theme_prompt (TEXT, nullable) to Project model
  • Add hero_image_url (TEXT, nullable) to GeneratedContent model
  • Add content_images (JSON, nullable) to GeneratedContent model
    • Format: ["url1", "url2", "url3"] (array of image URLs)
  • Migration script required

Image Storage

  • Images uploaded to images/ subdirectory in storage zone
  • Check if file exists before upload; on a name collision, reuse the existing file instead of overwriting it
  • Return full public URL for database storage
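
The check-then-upload behavior can be sketched against a tiny storage interface. The exists/upload methods and the InMemoryStorage class are hypothetical stand-ins for illustration only; they are not the BunnyStorageClient API, which this story does not describe.

```python
from typing import Dict, Protocol


class ImageStorage(Protocol):
    """Hypothetical minimal interface a storage client would need."""

    def exists(self, path: str) -> bool: ...
    def upload(self, path: str, data: bytes) -> None: ...


class InMemoryStorage:
    """Test double that keeps uploaded files in a dict."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def exists(self, path: str) -> bool:
        return path in self.files

    def upload(self, path: str, data: bytes) -> None:
        self.files[path] = data


def upload_image_if_absent(
    storage: ImageStorage, filename: str, data: bytes, public_base_url: str
) -> str:
    """Upload to images/<filename> unless it already exists; return the public URL."""
    path = f"images/{filename}"
    if not storage.exists(path):
        storage.upload(path, data)
    return f"{public_base_url.rstrip('/')}/{path}"
```

Because an existing file wins, two articles that resolve to the same filename end up sharing one image.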

Error Handling

  • Log errors with prompt used for debugging
  • Continue article generation if image generation fails
  • Images are "nice to have", not required
  • Max 5 concurrent fal.ai API calls (same pattern as OpenRouter)
  • Retry logic similar to OpenRouter (3 attempts with backoff)
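
The retry and logging rules above might look like the following sketch. The helper name, the 1 second base delay, and the injectable sleep parameter (used to keep tests fast) are assumptions; the existing OpenRouter retry code is not shown in this story and may be structured differently.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def call_with_retry(
    generate: Callable[[str], bytes],
    prompt: str,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call generate(prompt) up to `attempts` times with exponential backoff.

    Returns None instead of raising when every attempt fails, so the caller
    can carry on and save the article without an image.
    """
    for attempt in range(1, attempts + 1):
        try:
            return generate(prompt)
        except Exception as exc:  # broad on purpose: images are optional
            logger.warning(
                "Image generation failed (attempt %d/%d) for prompt %r: %s",
                attempt, attempts, prompt, exc,
            )
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, ...
    return None
```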

Tasks / Subtasks

1. Create Image Generation Module

Effort: 3 story points

  • Create src/generation/image_generator.py module
  • Implement ImageGenerator class:
    • get_theme_prompt(project_id) -> str (get or generate)
    • generate_hero_image(project_id, title, width, height) -> bytes
    • generate_content_image(project_id, entity, related_search, width, height) -> bytes
  • Integrate fal.ai API client (fal-client package)
  • Use FAL_API_KEY from .env for authentication (note: env var is FAL_API_KEY, not FAL_KEY)
  • Implement concurrency control (max 5 concurrent calls)
  • Implement retry logic (3 attempts with exponential backoff)
  • Log errors with prompts used

2. Create Theme Prompt Generation

Effort: 2 story points

  • Create src/generation/prompts/image_theme_generation.json prompt template
  • Implement theme prompt generation using AI (OpenRouter)
  • Cache theme prompt in Project.image_theme_prompt field
  • Generate only once per project (reuse for all articles)
  • Update Project model with image_theme_prompt field
  • Create migration script for database schema update

3. Extend Job Configuration

Effort: 2 story points

  • Update TierConfig in src/generation/job_config.py:
    • Parse image_config from JSON
    • Provide defaults if missing
    • Validate min/max ranges
  • Support hero and content sub-configs
  • Handle missing image_config gracefully (use tier defaults)

4. Implement Image Upload & Storage

Effort: 2 story points

  • Extend BunnyStorageClient or create image upload method
  • Upload to images/ subdirectory in storage zone
  • Check if file exists before upload (use existing if collision)
  • Handle file naming (slugify main_keyword, entities, related_searches)
  • Return full public URL for database storage

5. Implement HTML Image Insertion

Effort: 3 story points

  • Create src/generation/image_injection.py module
  • Implement insert_hero_after_h1(html, hero_url, alt_text) -> str
  • Implement insert_content_images_after_h2s(html, image_urls, alt_texts) -> str
  • Parse HTML to find H1/H2 tags
  • Distribute content images evenly across H2 sections
  • Generate alt text: 3 random entities + 2 random related searches
  • Preserve HTML structure and formatting

6. Integrate into Article Generation Flow

Effort: 3 story points

  • Update BatchProcessor._generate_single_article():
    • Generate hero image after title is available (all tiers)
    • Generate content images after content generation (T1 only)
    • Insert images into HTML before saving
  • Update BatchProcessor._generate_single_article_thread_safe() (same changes)
  • Handle tier-specific image config
  • Store image URLs in database (hero_image_url, content_images JSON)
  • Continue on image generation failure (log but don't block)

7. Database Schema Updates

Effort: 2 story points

  • Update Project model: Add image_theme_prompt field
  • Update GeneratedContent model:
    • Add hero_image_url field
    • Add content_images field (JSON)
  • Create migration script scripts/migrate_add_image_fields.py
  • Update repository interfaces if needed

8. Title Truncation Logic

Effort: 1 story point

  • Implement title truncation: keep the first 3-4 words of the title
  • Convert to UPPERCASE
  • Handle edge cases (short titles, special characters)
  • Use in hero image prompt generation

9. Environment Variable Setup

Effort: 1 story point

  • Add FAL_API_KEY to env.example
  • Document in README or setup guide
  • Validate key exists before image generation

10. Unit Tests

Effort: 3 story points

  • Test ImageGenerator class (mock fal.ai API)
  • Test theme prompt generation and caching
  • Test title truncation logic
  • Test HTML image insertion
  • Test job config parsing
  • Test error handling and retry logic
  • Achieve >80% code coverage

11. Integration Tests

Effort: 2 story points

  • Test end-to-end image generation for small batch
  • Test hero image generation (all tiers)
  • Test content image generation (T1 only)
  • Test image insertion into HTML
  • Test database storage of image URLs
  • Test file collision handling
  • Test graceful failure (API errors)

Technical Notes

fal.ai API Integration

Installation:

pip install fal-client

Environment Variable:

FAL_API_KEY=your_fal_api_key_here

Note: The environment variable is FAL_API_KEY (not FAL_KEY).
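
To the best of my understanding, the fal-client package itself looks for its credentials in FAL_KEY (or the FAL_KEY_ID / FAL_KEY_SECRET pair), so a project that stores the key as FAL_API_KEY likely needs to hand it over before the first API call. The sketch below shows one way to validate and bridge the variable; the function name is hypothetical, and the claim about fal-client should be checked against the installed version.

```python
import os
from typing import MutableMapping


def configure_fal_key(env: MutableMapping[str, str] = os.environ) -> bool:
    """Validate FAL_API_KEY and expose it under the name fal-client reads.

    Returns False when the key is missing or blank so the caller can skip
    image generation instead of failing the whole article.
    """
    key = env.get("FAL_API_KEY", "").strip()
    if not key:
        return False
    # Assumption: fal-client reads FAL_KEY. An existing FAL_KEY is left alone.
    env.setdefault("FAL_KEY", key)
    return True
```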

API Usage:

import fal_client

result = fal_client.subscribe(
    "fal-ai/flux-1/schnell",
    arguments={
        "prompt": "Your prompt here",
        "image_size": {"width": 1280, "height": 720},
        "num_inference_steps": 4,
        "guidance_scale": 3.5,
        "output_format": "jpeg"
    },
    with_logs=True
)

# result["images"][0]["url"] contains the image URL
# Download image from URL or use data URI if sync_mode=True
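
Turning the result into bytes for upload can be done with the standard library alone, since urllib.request.urlopen handles both https URLs and data: URIs. This is a sketch under the assumption that the result has the shape shown in the comment above (result["images"][0]["url"]); the helper name and the scheme whitelist are not from the story.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

ALLOWED_SCHEMES = {"https", "http", "data"}


def image_bytes_from_result(result: dict, timeout: float = 30.0) -> bytes:
    """Fetch the first generated image as raw bytes."""
    url = result["images"][0]["url"]
    scheme = urlsplit(url).scheme
    if scheme not in ALLOWED_SCHEMES:
        # Refuse file:// and similar schemes rather than trusting the response.
        raise ValueError(f"Unexpected image URL scheme: {scheme!r}")
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```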

Concurrency:

  • Max 5 concurrent calls (same pattern as OpenRouter)
  • Use ThreadPoolExecutor or similar
  • Queue requests if limit exceeded
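
A ThreadPoolExecutor with five workers gives both the cap and the queueing: extra prompts wait until a worker is free. The sketch below assumes generate_one is a callable that handles its own errors (for example by returning None); the names are illustrative and the OpenRouter pattern it is meant to mirror is not shown in this story.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

MAX_CONCURRENT_CALLS = 5


def generate_all(
    prompts: Sequence[str],
    generate_one: Callable[[str], Optional[bytes]],
) -> List[Optional[bytes]]:
    """Run generate_one for every prompt with at most 5 calls in flight.

    Results come back in the same order as `prompts`.
    """
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_CALLS) as pool:
        return list(pool.map(generate_one, prompts))
```

Note that this caps calls per pool. If several article threads each create their own pool, a shared semaphore would be needed to keep the global limit at five.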

Theme Prompt Generation

Prompt Template:

{
  "system_message": "You are an expert at creating visual style descriptions for professional images.",
  "user_prompt": "Create a concise visual style description (2-3 sentences) for professional images related to: {main_keyword}\n\nEntities: {entities}\nRelated searches: {related_searches}\n\nReturn only the style description, focusing on visual elements like lighting, environment, color scheme, and overall aesthetic. This will be used as a base for all images in this project.\n\nExample format: 'Professional industrial scene, modern manufacturing equipment, clean factory environment, high-quality industrial photography, dramatic lighting.'"
}

Caching:

  • Store in Project.image_theme_prompt field
  • Generate once per project
  • Reuse for all articles in project
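
The caching rule is a get-or-generate check on the project row. In the sketch below, ProjectStub stands in for the real Project model, and the generate and save callables are placeholders for the OpenRouter call and the repository update; all three are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProjectStub:
    """Stand-in for the Project model; only the fields used here."""

    id: int
    main_keyword: str
    image_theme_prompt: Optional[str] = None


def get_theme_prompt(
    project: ProjectStub,
    generate: Callable[[ProjectStub], str],
    save: Callable[[ProjectStub], None],
) -> str:
    """Return the cached theme prompt, generating and storing it on first use."""
    if project.image_theme_prompt:
        return project.image_theme_prompt
    theme = generate(project).strip()
    project.image_theme_prompt = theme
    save(project)  # persist so later articles reuse the same theme
    return theme
```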

Image Naming Convention

Hero Image:

  • Format: {main-keyword}.jpg
  • Example: shaft-machining.jpg
  • Slugify main_keyword

Content Images:

  • Format: {main-keyword}-{entity}-{related-search}.jpg
  • Example: shaft-machining-cnc-lathe-precision-turning.jpg
  • Pick random entity from project.entities
  • Pick random related_search from project.related_searches
  • Slugify all parts
  • Handle missing entities/related_searches gracefully
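
The naming convention can be sketched with a small ASCII slugifier. The helper names are hypothetical, and so is the fallback of simply omitting a part when the project has no entities or related searches; the project may already have its own slugify utility.

```python
import random
import re
from typing import Sequence


def slugify(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics to one hyphen (ASCII only)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def hero_filename(main_keyword: str) -> str:
    return f"{slugify(main_keyword)}.jpg"


def content_filename(
    main_keyword: str,
    entities: Sequence[str],
    related_searches: Sequence[str],
    rng=random,
) -> str:
    """Build {main-keyword}-{entity}-{related-search}.jpg, skipping missing parts."""
    parts = [main_keyword]
    if entities:
        parts.append(rng.choice(entities))
    if related_searches:
        parts.append(rng.choice(related_searches))
    slugs = [slugify(part) for part in parts]
    return "-".join(slug for slug in slugs if slug) + ".jpg"
```

With both lists empty, the content filename falls back to the hero filename, so the collision rule in Image Storage would reuse the hero image.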

HTML Insertion Logic

Hero Image:

  • Find first <h1> tag in HTML
  • Insert <img> tag immediately after closing </h1>
  • Format: <img src="{hero_url}" alt="{alt_text}" />

Content Images:

  • Find all <h2> sections
  • Distribute images evenly across H2s
  • If 1 image: after first H2
  • If 2 images: after first and second H2
  • If 3 images: after first, second, third H2
  • Insert after closing </h2>, before next content
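
The two insertion functions can be sketched with regular expressions over the closing tags. This follows the listed cases literally (image N goes after the Nth H2) and makes two assumptions not stated in the story: images beyond the number of H2 sections are dropped, and HTML without an h1 is returned unchanged. A real HTML parser may be safer for malformed markup.

```python
import re
from html import escape
from typing import Sequence


def img_tag(url: str, alt_text: str) -> str:
    """Render the <img> tag with attribute values escaped."""
    return f'<img src="{escape(url, quote=True)}" alt="{escape(alt_text, quote=True)}" />'


def insert_hero_after_h1(html: str, hero_url: str, alt_text: str) -> str:
    """Insert the hero image right after the first closing </h1>."""
    match = re.search(r"</h1\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return html
    end = match.end()
    return html[:end] + img_tag(hero_url, alt_text) + html[end:]


def insert_content_images_after_h2s(
    html: str, image_urls: Sequence[str], alt_texts: Sequence[str]
) -> str:
    """Insert image N right after the Nth closing </h2>."""
    positions = [m.end() for m in re.finditer(r"</h2\s*>", html, flags=re.IGNORECASE)]
    placements = list(zip(positions, image_urls, alt_texts))
    # Work back to front so earlier offsets stay valid after each insertion.
    for pos, url, alt in reversed(placements):
        html = html[:pos] + img_tag(url, alt) + html[pos:]
    return html
```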

Alt Text Generation:

  • Pick 3 random entities from project.entities
  • Pick 2 random related_searches from project.related_searches
  • Format: alt="entity1 related_search1 entity2 related_search2 entity3"
  • Handle missing entities/searches gracefully
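
The alt text format interleaves the sampled entities and related searches. A minimal sketch follows; the function name is hypothetical, and "handle gracefully" is read here as "use as many items as are available", which is an assumption.

```python
import random
from typing import List, Sequence


def build_alt_text(
    entities: Sequence[str], related_searches: Sequence[str], rng=random
) -> str:
    """Return 'entity1 related_search1 entity2 related_search2 entity3'.

    Samples without replacement; shorter lists simply yield fewer parts.
    """
    chosen_entities = rng.sample(list(entities), min(3, len(entities)))
    chosen_searches = rng.sample(list(related_searches), min(2, len(related_searches)))

    parts: List[str] = []
    for i in range(max(len(chosen_entities), len(chosen_searches))):
        if i < len(chosen_entities):
            parts.append(chosen_entities[i])
        if i < len(chosen_searches):
            parts.append(chosen_searches[i])
    return " ".join(parts)
```

Passing a seeded random.Random instance as rng makes the output reproducible in tests.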

Job JSON Example

{
  "project_id": 1,
  "tiers": {
    "tier1": {
      "count": 5,
      "image_config": {
        "hero": {
          "width": 1280,
          "height": 720
        },
        "content": {
          "min_num_images": 1,
          "max_num_images": 3,
          "width": 512,
          "height": 512
        }
      }
    },
    "tier2": {
      "count": 20,
      "image_config": {
        "hero": {
          "width": 1280,
          "height": 720
        }
      }
    }
  }
}

Database Schema Updates

-- Add theme prompt to projects
ALTER TABLE projects ADD COLUMN image_theme_prompt TEXT NULL;

-- Add image fields to generated_content
ALTER TABLE generated_content ADD COLUMN hero_image_url TEXT NULL;
ALTER TABLE generated_content ADD COLUMN content_images JSON NULL;
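
For scripts/migrate_add_image_fields.py, an idempotent version of these statements could look like the sketch below. It assumes a SQLite database, which the story does not state; the column check via PRAGMA table_info is SQLite-specific and would need replacing on another engine. New columns are nullable by default, so the NULL keyword is omitted.

```python
import sqlite3
from typing import List

NEW_COLUMNS = [
    ("projects", "image_theme_prompt", "TEXT"),
    ("generated_content", "hero_image_url", "TEXT"),
    ("generated_content", "content_images", "JSON"),
]


def existing_columns(conn: sqlite3.Connection, table: str) -> set:
    """Column names currently present on `table` (SQLite only)."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def migrate(conn: sqlite3.Connection) -> List[str]:
    """Add any missing image columns; safe to run more than once."""
    added = []
    for table, column, col_type in NEW_COLUMNS:
        if column not in existing_columns(conn, table):
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
            added.append(f"{table}.{column}")
    conn.commit()
    return added
```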

Dependencies

  • fal.ai API account and FAL_API_KEY
  • fal-client Python package
  • OpenRouter API for theme prompt generation
  • Bunny.net storage for image hosting
  • Story 4.1: Deployment infrastructure (for image uploads)

Future Considerations

  • Image optimization (compression, WebP conversion)
  • CDN caching for images
  • Image format conversion (PNG, WebP options)
  • Custom image styles per project
  • Image regeneration on content updates
  • Image analytics (views, engagement)

Technical Debt Created

  • Image optimization deferred (current: full-size JPGs)
  • No CDN cache purging for images
  • No image format conversion (always JPG)
  • Theme prompt regeneration not implemented (manual update required)
  • Image deletion not implemented (orphaned images may accumulate)

Total Effort

24 story points

Effort Breakdown

  1. Image Generation Module (3 points)
  2. Theme Prompt Generation (2 points)
  3. Job Configuration Extension (2 points)
  4. Image Upload & Storage (2 points)
  5. HTML Image Insertion (3 points)
  6. Article Generation Integration (3 points)
  7. Database Schema Updates (2 points)
  8. Title Truncation Logic (1 point)
  9. Environment Variable Setup (1 point)
  10. Unit Tests (3 points)
  11. Integration Tests (2 points)

Questions & Clarifications

Question 1: Image Format

Status: ✓ RESOLVED

Decision: Always JPG

  • Consistent format across all images
  • Good balance of quality and file size
  • Future: Could add format options in job config

Question 2: Hero Image Title Truncation

Status: ✓ RESOLVED

Decision: Truncate to 3-4 words, UPPERCASE

  • Prevents overly long text in images
  • Improves readability
  • Example: "Overview of Shaft Machining" → "OVERVIEW OF SHAFT MACHINING" (four words, so nothing is cut; longer titles keep only their first 3-4 words)

Question 3: Content Image Count

Status: ✓ RESOLVED

Decision: T1 default 1-3 images, configurable per tier

  • Job JSON allows fine-grained control
  • Defaults: T1 = 1-3, T2/T3 = 0-0 (disabled)
  • Can override in job config

Question 4: Image Failure Handling

Status: ✓ RESOLVED

Decision: Log errors, continue without images

  • Images are enhancements, not required
  • Article generation continues even if images fail
  • Log includes prompt used for debugging

Notes

  • Images enhance SEO and user engagement
  • fal.ai provides fast generation (<3 seconds per image)
  • Theme consistency ensures visual coherence across project
  • Job JSON provides flexibility for different use cases
  • Graceful degradation if image generation fails
  • All API keys are read from the .env file only