Visual: The Integration Gap

What Currently Happens

┌─────────────────────────────────────────────────────────────┐
│  uv run python main.py generate-batch --job-file jobs/x.json │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│               BatchProcessor.process_job()                   │
│                                                              │
│  For each tier (tier1, tier2, tier3):                       │
│    For each article (1 to N):                               │
│      ┌──────────────────────────────────┐                   │
│      │  1. Generate title               │                   │
│      │  2. Generate outline             │                   │
│      │  3. Generate content             │                   │
│      │  4. Augment if too short         │                   │
│      │  5. Save to database             │                   │
│      └──────────────────────────────────┘                   │
│                                                              │
│  ⚠️  STOPS HERE! ⚠️                                          │
└─────────────────────────────────────────────────────────────┘

Result in database:
┌──────────────────────────────────────────────────────────────┐
│ generated_content table:                                     │
│  - Raw HTML (no links)                                       │
│  - No site_deployment_id (most articles)                     │
│  - No final URL                                              │
│  - No formatted_html                                         │
│                                                              │
│ article_links table:                                         │
│  - EMPTY (no records)                                        │
└──────────────────────────────────────────────────────────────┘

What SHOULD Happen

┌─────────────────────────────────────────────────────────────┐
│  uv run python main.py generate-batch --job-file jobs/x.json │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│               BatchProcessor.process_job()                   │
│                                                              │
│  For each tier (tier1, tier2, tier3):                       │
│    For each article (1 to N):                               │
│      ┌──────────────────────────────────┐                   │
│      │  1. Generate title               │                   │
│      │  2. Generate outline             │                   │
│      │  3. Generate content             │                   │
│      │  4. Augment if too short         │                   │
│      │  5. Save to database             │                   │
│      └──────────────────────────────────┘                   │
│                                                              │
│    ✨ NEW: After all articles in tier generated ✨          │
│      ┌──────────────────────────────────┐                   │
│      │  6. Assign sites (Story 3.1)     │ ← MISSING         │
│      │  7. Generate URLs (Story 3.1)    │ ← MISSING         │
│      │  8. Find tiered links (3.2)      │ ← MISSING         │
│      │  9. Inject interlinks (3.3)      │ ← MISSING         │
│      │ 10. Apply templates              │ ← MISSING         │
│      └──────────────────────────────────┘                   │
└─────────────────────────────────────────────────────────────┘

Result in database:
┌──────────────────────────────────────────────────────────────┐
│ generated_content table:                                     │
│  ✅ Final HTML with all links injected                       │
│  ✅ site_deployment_id assigned                              │
│  ✅ Final URL generated                                      │
│  ✅ formatted_html with template applied                     │
│                                                              │
│ article_links table:                                         │
│  ✅ Tiered links (T1→money site, T2→T1)                      │
│  ✅ Homepage links (all→/index.html)                         │
│  ✅ See Also links (all→all in batch)                        │
└──────────────────────────────────────────────────────────────┘
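The difference between the two database states can be checked mechanically. The following is a minimal sketch, not project code: `ArticleRow` is a hypothetical stand-in for a `generated_content` row, and the `url` field name is an assumption for wherever the final URL is stored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArticleRow:
    """Hypothetical stand-in for one generated_content row."""
    id: int
    site_deployment_id: Optional[int] = None
    url: Optional[str] = None             # assumed name for the final URL column
    formatted_html: Optional[str] = None


def missing_steps(article: ArticleRow, link_count: int) -> List[str]:
    """Return the post-processing results this article still lacks."""
    missing = []
    if article.site_deployment_id is None:
        missing.append("site_deployment_id")
    if not article.url:
        missing.append("url")
    if not article.formatted_html:
        missing.append("formatted_html")
    if link_count == 0:                   # no article_links rows for this article
        missing.append("article_links")
    return missing


# An article as the CLI leaves it today, and one after full post-processing.
raw = ArticleRow(id=1)
done = ArticleRow(
    id=2,
    site_deployment_id=7,
    url="https://example.com/post.html",
    formatted_html="<html>...</html>",
)

print(missing_steps(raw, link_count=0))
# ['site_deployment_id', 'url', 'formatted_html', 'article_links']
print(missing_steps(done, link_count=3))
# []
```

Running a check like this over a freshly generated batch would show every article in the first state until the post-processing steps are wired in.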

The Gap in Code

Current Code Structure

# src/generation/batch_processor.py

class BatchProcessor:
    def _process_tier(self, project_id, tier_name, tier_config, ...):
        """Process all articles for a tier"""
        
        # Generate each article
        for article_num in range(1, tier_config.count + 1):
            self._generate_single_article(...)
            self.stats["generated_articles"] += 1
        
        # ⚠️ Method ends here!
        # Nothing happens after article generation

What Needs to Be Added

# src/generation/batch_processor.py

import click  # if not already imported

from src.database.repositories import ArticleLinkRepository
from src.generation.url_generator import generate_urls_for_batch
from src.interlinking.content_injection import inject_interlinks
from src.interlinking.tiered_links import find_tiered_links


class BatchProcessor:
    def _process_tier(self, project_id, tier_name, tier_config, ...):
        """Process all articles for a tier"""
        
        # Generate each article
        for article_num in range(1, tier_config.count + 1):
            self._generate_single_article(...)
            self.stats["generated_articles"] += 1
        
        # ✨ NEW: Post-processing
        # (job and debug are assumed to be among the parameters elided above)
        click.echo(f"  {tier_name}: Post-processing {tier_config.count} articles...")
        self._post_process_tier(project_id, tier_name, job, debug)
    
    def _post_process_tier(self, project_id, tier_name, job, debug):
        """Apply URL generation, interlinking, and templating"""
        
        # Get all articles for this tier
        content_records = self.content_repo.get_by_project_and_tier(
            project_id, tier_name, status=["generated", "augmented"]
        )
        
        if not content_records:
            click.echo("    No articles to post-process")
            return
        
        project = self.project_repo.get_by_id(project_id)
        
        # Step 1: Assign sites (Story 3.1)
        # (Site assignment might already be done via deployment_targets)
        
        # Step 2: Generate URLs (Story 3.1)
        click.echo("    Generating URLs...")
        article_urls = generate_urls_for_batch(content_records, self.site_deployment_repo)
        
        # Step 3: Find tiered links (Story 3.2)
        click.echo("    Finding tiered links...")
        tiered_links = find_tiered_links(
            content_records, job, self.project_repo,
            self.content_repo, self.site_deployment_repo
        )
        
        # Step 4: Inject interlinks (Story 3.3)
        click.echo("    Injecting interlinks...")
        session = self.content_repo.session  # Reuse the repository's session
        link_repo = ArticleLinkRepository(session)
        inject_interlinks(
            content_records, article_urls, tiered_links,
            project, job, self.content_repo, link_repo
        )
        
        # Step 5: Apply templates
        click.echo("    Applying templates...")
        for content in content_records:
            self.generator.apply_template(content.id)
        
        click.echo(f"    Post-processing complete: {len(content_records)} articles ready")

Files That Need Changes

src/generation/batch_processor.py
  ├─ Add imports at top
  ├─ Add call to _post_process_tier() in _process_tier()
  └─ Add new method _post_process_tier()

src/database/repositories.py
  └─ May need to add get_by_project_and_tier() if it doesn't exist
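If `get_by_project_and_tier()` does turn out to be missing, its filter logic could look roughly like the sketch below. This is an illustration only: it uses the standard-library `sqlite3` module to stay self-contained, whereas the real repository presumably goes through its own session, and the `project_id`, `tier`, and `status` column names are assumptions about the `generated_content` schema.

```python
import sqlite3
from typing import List, Sequence


def get_by_project_and_tier(
    conn: sqlite3.Connection,
    project_id: int,
    tier: str,
    status: Sequence[str],
) -> List[sqlite3.Row]:
    """Fetch one project's articles for one tier, limited to the given statuses."""
    if not status:
        return []
    # One "?" per status value so the IN clause stays fully parameterized.
    placeholders = ", ".join("?" for _ in status)
    sql = (
        "SELECT * FROM generated_content "
        "WHERE project_id = ? AND tier = ? "
        f"AND status IN ({placeholders}) "
        "ORDER BY id"
    )
    return conn.execute(sql, (project_id, tier, *status)).fetchall()


# Tiny in-memory table with assumed columns, just to exercise the query.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE generated_content ("
    "id INTEGER PRIMARY KEY, project_id INTEGER, tier TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO generated_content (project_id, tier, status) VALUES (?, ?, ?)",
    [
        (1, "tier1", "generated"),
        (1, "tier1", "failed"),
        (1, "tier2", "augmented"),
        (2, "tier1", "generated"),
    ],
)

rows = get_by_project_and_tier(conn, 1, "tier1", ["generated", "augmented"])
print([row["id"] for row in rows])  # [1]
```

Filtering on status matters here because failed or partially generated articles should not receive URLs or links.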

Why Tests Still Pass

┌─────────────────────────────────────────┐
│  Unit Tests                             │
│  ✅ Test inject_interlinks() directly   │
│  ✅ Test find_tiered_links() directly   │
│  ✅ Test generate_urls_for_batch()      │
│                                         │
│  These call the functions directly,     │
│  so they work perfectly!                │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│  Integration Tests                      │
│  ✅ Create test database                │
│  ✅ Call functions in sequence          │
│  ✅ Verify results                      │
│                                         │
│  These simulate the workflow manually,  │
│  so they work perfectly!                │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│  Real CLI Usage                         │
│  ✅ Generates articles                  │
│  ❌ Never calls Story 3.1-3.3 functions │
│  ❌ Articles incomplete                 │
│                                         │
│  This is missing the integration!       │
└─────────────────────────────────────────┘
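A test that drives the batch entry point, rather than the individual functions, would have caught this. The sketch below shows the idea with a hypothetical, stripped-down stand-in (`BatchProcessorSketch`); the flag `post_process_enabled` exists only to contrast today's behavior with the fixed one and is not part of the real class.

```python
from unittest.mock import MagicMock


class BatchProcessorSketch:
    """Hypothetical, stripped-down stand-in for BatchProcessor."""

    def __init__(self, post_process_enabled: bool):
        self.post_process_enabled = post_process_enabled
        self.stats = {"generated_articles": 0}
        # Collaborators are mocked so only the call flow is exercised.
        self._generate_single_article = MagicMock()
        self._post_process_tier = MagicMock()

    def process_job(self, tiers: dict) -> None:
        for tier_name, count in tiers.items():
            self._process_tier(tier_name, count)

    def _process_tier(self, tier_name: str, count: int) -> None:
        for article_num in range(1, count + 1):
            self._generate_single_article(tier_name, article_num)
            self.stats["generated_articles"] += 1
        if self.post_process_enabled:
            self._post_process_tier(tier_name)


def post_processed_tiers(processor: BatchProcessorSketch) -> list:
    """Tier names that were handed to post-processing, in call order."""
    return [call.args[0] for call in processor._post_process_tier.call_args_list]


tiers = {"tier1": 2, "tier2": 3}

current = BatchProcessorSketch(post_process_enabled=False)  # today's behavior
current.process_job(tiers)

fixed = BatchProcessorSketch(post_process_enabled=True)     # after the fix
fixed.process_job(tiers)

print(current.stats["generated_articles"], post_processed_tiers(current))
# 5 []
print(fixed.stats["generated_articles"], post_processed_tiers(fixed))
# 5 ['tier1', 'tier2']
```

Both runs generate the same five articles, which is why article counts alone do not reveal the problem. An assertion that `post_processed_tiers(...)` equals `list(tiers)` fails for the current flow and passes once the call is added.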

Summary

The Analogy:

Imagine you built a perfect car engine:

  • All parts work perfectly
  • Each part tested individually
  • Each part fits together

But you never installed it in the car.

That's the current state:

  • Story 3.1-3.3 functions work perfectly
  • Tests prove it works
  • But the CLI never calls them
  • So users get articles with no links

The Fix: Install the engine (add roughly 50 lines to BatchProcessor, as sketched in "What Needs to Be Added")

Time: 30-60 minutes

Priority: High (if deploying), Medium (if still developing)