CLI Integration Explanation - Story 3.3

The Problem

Story 3.3's inject_interlinks() function, along with the Story 3.1 and 3.2 functions it builds on, is implemented and fully tested, but none of them are ever called in the actual batch generation workflow.

Current Workflow

When you run:

uv run python main.py generate-batch --job-file jobs/example.json

Here's what actually happens:

Step-by-Step Current Flow

1. CLI Command (src/cli/commands.py)
   └─> generate_batch() function called
       └─> Creates BatchProcessor
           └─> BatchProcessor.process_job()
   
2. BatchProcessor.process_job() (src/generation/batch_processor.py)
   └─> Reads job file
   └─> For each job:
       └─> _process_single_job()
           └─> Validates deployment targets
           └─> For each tier (tier1, tier2, tier3):
               └─> _process_tier()
   
3. _process_tier()
   └─> For each article (1 to count):
       └─> _generate_single_article()
           ├─> Generate title
           ├─> Generate outline  
           ├─> Generate content
           ├─> Augment if needed
           └─> SAVE to database
   
4. END! ⚠️
   
   Nothing happens after articles are generated!
   No URLs, no tiered links, no interlinking!

What's Missing

After all articles for a tier are generated, the Story 3.1-3.3 steps need to run:

# THIS CODE DOES NOT EXIST YET!
# Needs to be added at the end of _process_tier() or _process_single_job().
# Assumes project, session, site_repo, project_repo, bunny_client and
# content_generator are already available in the enclosing method.

# 1. Get all generated content for this batch
content_records = self.content_repo.get_by_project_and_tier(project_id, tier_name)

# 2. Assign sites (Story 3.1)
from src.generation.site_assignment import assign_sites_to_batch
assign_sites_to_batch(content_records, job, site_repo, bunny_client, project.main_keyword)

# 3. Generate URLs (Story 3.1)
from src.generation.url_generator import generate_urls_for_batch
article_urls = generate_urls_for_batch(content_records, site_repo)

# 4. Find tiered links (Story 3.2)
from src.interlinking.tiered_links import find_tiered_links
tiered_links = find_tiered_links(
    content_records, job, project_repo, self.content_repo, site_repo
)

# 5. Inject interlinks (Story 3.3)
from src.interlinking.content_injection import inject_interlinks
from src.database.repositories import ArticleLinkRepository
link_repo = ArticleLinkRepository(session)
inject_interlinks(
    content_records, article_urls, tiered_links,
    project, job, self.content_repo, link_repo
)

# 6. Apply templates (existing functionality)
for content in content_records:
    content_generator.apply_template(content.id)
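The six steps form a linear pipeline in which each stage consumes the previous stage's output. URLs need assigned sites, injection needs both URLs and tiered links, and templates wrap the finished, linked content. The sketch below shows only that data flow. Every function and the Article class are hypothetical stand-ins, not the real signatures in src.generation or src.interlinking. The stand-in for find_tiered_links simply links every article to whatever upper-tier URLs it is given.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Article:
    """Hypothetical stand-in for a GeneratedContent record."""
    id: int
    content: str
    site: Optional[str] = None
    url: Optional[str] = None
    template_applied: bool = False


def assign_sites(articles: List[Article], sites: List[str]) -> None:
    # Step 2 stand-in (Story 3.1): spread articles across sites round-robin.
    for i, article in enumerate(articles):
        article.site = sites[i % len(sites)]


def generate_urls(articles: List[Article]) -> Dict[int, str]:
    # Step 3 stand-in (Story 3.1): a URL can only be built once a site is assigned.
    urls = {}
    for article in articles:
        if article.site is None:
            raise ValueError(f"article {article.id} has no site; assign sites first")
        article.url = f"https://{article.site}/article-{article.id}.html"
        urls[article.id] = article.url
    return urls


def find_tiered_links(articles: List[Article], upper_tier_urls: List[str]) -> Dict[int, List[str]]:
    # Step 4 stand-in (Story 3.2): every article links to all upper-tier URLs.
    return {article.id: list(upper_tier_urls) for article in articles}


def inject_links(articles: List[Article], article_urls: Dict[int, str],
                 tiered_links: Dict[int, List[str]]) -> None:
    # Step 5 stand-in (Story 3.3): append tiered links plus a See Also list of siblings.
    for article in articles:
        tiered = "".join(f'<a href="{u}">{u}</a>' for u in tiered_links[article.id])
        siblings = "".join(
            f'<li><a href="{u}">{u}</a></li>'
            for other_id, u in article_urls.items()
            if other_id != article.id
        )
        article.content += f"{tiered}<h2>See Also</h2><ul>{siblings}</ul>"


def apply_template(article: Article) -> None:
    # Step 6 stand-in: wrap the linked body in a page template.
    article.content = f"<html><body>{article.content}</body></html>"
    article.template_applied = True


def post_process_tier(articles: List[Article], sites: List[str],
                      upper_tier_urls: List[str]) -> Dict[int, str]:
    """Run steps 2-6 in order; step 1 (loading records) is left to the caller."""
    assign_sites(articles, sites)
    article_urls = generate_urls(articles)
    tiered_links = find_tiered_links(articles, upper_tier_urls)
    inject_links(articles, article_urls, tiered_links)
    for article in articles:
        apply_template(article)
    return article_urls
```

Swapping steps 2 and 3 makes generate_urls raise. This is the kind of ordering mistake that wiring the real functions together has to avoid.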

Why This Matters

Current State

✓ Articles are generated
✗ Articles have NO internal links
✗ Articles have NO tiered links
✗ Articles have NO "See Also" section
✗ Articles have NO final URLs assigned
✗ Templates are NOT applied

Result: Articles sit in the database as raw HTML with no links, unusable for deployment

With Integration

✓ Articles are generated
✓ Sites are assigned to articles
✓ Final URLs are generated
✓ Tiered links are found
✓ All links are injected
✓ Templates are applied
✓ Articles are ready for deployment

Result: Complete, interlinked articles ready for Story 4.x deployment
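The "With Integration" checklist can double as a readiness check to run before Story 4.x deployment. Below is a minimal sketch. It assumes hypothetical fields site_id, url and template_applied on the content record; the real GeneratedContent model may name or store these differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentRecord:
    # Hypothetical fields; the real GeneratedContent model may differ.
    id: int
    content: str
    site_id: Optional[int] = None
    url: Optional[str] = None
    template_applied: bool = False


def missing_steps(record: ContentRecord) -> List[str]:
    """Return the post-generation steps that have not run for this record."""
    missing = []
    if record.site_id is None:
        missing.append("site assignment")
    if not record.url:
        missing.append("final URL")
    if "<a href=" not in record.content:
        missing.append("links")
    if "See Also" not in record.content:
        missing.append("See Also section")
    if not record.template_applied:
        missing.append("template")
    return missing
```

Under the current workflow, every freshly generated article would report all five steps as missing.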

Where to Add Integration

Option 1: End of _process_tier()

Add the integration code at line 162 of batch_processor.py, right after the article generation loop:

def _process_tier(self, project_id, tier_name, tier_config, ...):
    # ... existing article generation loop ...
    
    # NEW: Post-generation interlinking
    click.echo(f"  {tier_name}: Injecting interlinks for {tier_config.count} articles...")
    self._inject_tier_interlinks(project_id, tier_name, job, debug)

Then create the new method:

def _inject_tier_interlinks(self, project_id, tier_name, job, debug):
    """Inject interlinks for all articles in a tier"""
    # Get all articles for this tier
    content_records = self.content_repo.get_by_project_and_tier(
        project_id, tier_name
    )
    
    if not content_records:
        click.echo(f"    Warning: No articles found for {tier_name}")
        return
    
    # Steps 1-6 from above...
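One way to keep this method testable is to hand it the repository, the six steps and the echo function as dependencies. That lets a fake repository exercise the early-return path without a database. In the sketch below, FakeContentRepo, BatchProcessorSketch and run_steps are hypothetical names. The real method would call the Story 3.1-3.3 functions directly and use click.echo.

```python
from typing import Callable, Dict, List, Tuple


class FakeContentRepo:
    """Hypothetical in-memory stand-in for GeneratedContentRepository."""

    def __init__(self, records: Dict[Tuple[int, str], List[str]]):
        self._records = records

    def get_by_project_and_tier(self, project_id: int, tier_name: str) -> List[str]:
        return self._records.get((project_id, tier_name), [])


class BatchProcessorSketch:
    def __init__(self, content_repo, run_steps: Callable[[List[str]], None],
                 echo: Callable[[str], None] = print):
        self.content_repo = content_repo
        self.run_steps = run_steps  # stands in for steps 2-6
        self.echo = echo            # click.echo in the real CLI

    def _inject_tier_interlinks(self, project_id: int, tier_name: str) -> int:
        """Inject interlinks for all articles in a tier; return how many were processed."""
        records = self.content_repo.get_by_project_and_tier(project_id, tier_name)
        if not records:
            # Nothing to link: warn and skip instead of failing the whole job.
            self.echo(f"    Warning: No articles found for {tier_name}")
            return 0
        self.run_steps(records)
        return len(records)
```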

Option 2: End of _process_single_job()

Add the integration after ALL tiers are generated (processes the entire job at once):

def _process_single_job(self, job, job_idx, debug, continue_on_error):
    # ... existing tier processing ...
    
    # NEW: Process all tiers together
    click.echo("\nPost-processing: Injecting interlinks...")
    for tier_name in job.tiers.keys():
        self._inject_tier_interlinks(job.project_id, tier_name, job, debug)
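The practical difference between the two options is call order. The sketch below records that order for a three-tier job using a hypothetical run_job helper. It does not model what generation or interlinking actually do.

```python
from typing import List


def run_job(tiers: List[str], per_tier: bool) -> List[str]:
    """Record the order of generation and interlinking calls for one job."""
    calls: List[str] = []
    for tier in tiers:
        calls.append(f"generate:{tier}")
        if per_tier:
            # Option 1: interlink at the end of _process_tier()
            calls.append(f"interlink:{tier}")
    if not per_tier:
        # Option 2: interlink every tier at the end of _process_single_job()
        for tier in tiers:
            calls.append(f"interlink:{tier}")
    return calls


tiers = ["tier1", "tier2", "tier3"]
print(run_job(tiers, per_tier=True))
print(run_job(tiers, per_tier=False))
```

With Option 2, every tier's articles are in the database before any linking starts. That matters if a tier's links need articles from a tier that Option 1 would not have generated yet.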

Why It Wasn't Integrated Yet

Looking at the story implementations, it appears:

  1. Story 3.1 (URL Generation) - Functions exist but not integrated
  2. Story 3.2 (Tiered Links) - Functions exist but not integrated
  3. Story 3.3 (Content Injection) - Functions exist but not integrated

This suggests the stories focused on building the functionality with the expectation that Story 4.x (Deployment) would integrate everything together.

Impact of Missing Integration

Tests Still Pass ✓

  • Unit tests test functions in isolation
  • Integration tests call the functions directly, not through generate-batch
  • All 42 tests pass because the functions work correctly in isolation

But Real Usage Fails ✗

When you actually run generate-batch:

  • Articles are generated
  • They're saved to database
  • But they have no links, no URLs, nothing
  • Story 4.x deployment would fail because articles aren't ready
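The gap survives because no test drives the batch entry point and then inspects what it saved. Here is a minimal sketch of such an end-to-end check. It uses a hypothetical FakeBatchProcessor with an optional post_process hook in place of the real BatchProcessor; a real test would run generate-batch against a test database. The unwired processor fails the check, which is exactly the failure today's tests cannot see.

```python
from typing import Callable, List, Optional


class FakeBatchProcessor:
    """Hypothetical processor: generates articles, then optionally post-processes them."""

    def __init__(self, post_process: Optional[Callable[[List[dict]], None]] = None):
        self.post_process = post_process
        self.saved: List[dict] = []

    def process_tier(self, count: int) -> None:
        articles = [{"id": i, "content": f"<p>Article {i}</p>"} for i in range(count)]
        self.saved.extend(articles)  # "SAVE to database"
        if self.post_process is not None:
            self.post_process(articles)  # the missing integration step


def add_see_also(articles: List[dict]) -> None:
    # Stand-in for steps 2-6: mutates the same dicts that were saved.
    for article in articles:
        article["content"] += '<h2>See Also</h2><a href="/other.html">Other</a>'


def all_articles_linked(processor: FakeBatchProcessor) -> bool:
    # End-to-end check: inspect what the pipeline saved, not what a helper returns.
    return all("<a href=" in a["content"] for a in processor.saved)
```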

Effort to Fix

Time Estimate: 30-60 minutes

Tasks:

  1. Add imports to batch_processor.py (2 minutes)
  2. Create _inject_tier_interlinks() method (15 minutes)
  3. Add call at end of _process_tier() (2 minutes)
  4. Test with real job file (10 minutes)
  5. Debug any issues (10-20 minutes)

Complexity: Low - just wiring existing functions together

Testing the Integration

After adding integration:

# 1. Run batch generation
uv run python main.py generate-batch \
  --job-file jobs/test_small.json \
  --username admin \
  --password yourpass

# 2. Check database for links
uv run python -c "
from src.database.session import db_manager
from src.database.repositories import ArticleLinkRepository

session = db_manager.get_session()
link_repo = ArticleLinkRepository(session)
links = link_repo.get_all()
print(f'Total links: {len(links)}')
for link in links[:5]:
    print(f'  {link.link_type}: {link.anchor_text} -> {link.to_url or link.to_content_id}')
session.close()
"

# 3. Verify articles have links in content
uv run python -c "
from src.database.session import db_manager
from src.database.repositories import GeneratedContentRepository

session = db_manager.get_session()
content_repo = GeneratedContentRepository(session)
articles = content_repo.get_all(limit=1)
if articles:
    print('Sample article content:')
    print(articles[0].content[:500])
    print(f'Contains links: {\"<a href=\" in articles[0].content}')
    print(f'Has See Also: {\"See Also\" in articles[0].content}')
session.close()
"
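The substring checks in step 3 can be fooled. For example, "See Also" mentioned in body text counts as a section. A slightly stricter check parses the content with the standard library's html.parser. The check_article_html name is hypothetical, and the helper assumes the See Also section is marked up as a heading.

```python
from html.parser import HTMLParser
from typing import List

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class LinkAudit(HTMLParser):
    """Collect hrefs of <a> tags and the text of headings in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs: List[str] = []
        self.headings: List[str] = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)
        elif tag in HEADINGS:
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self.headings[-1] += data


def check_article_html(html: str) -> dict:
    audit = LinkAudit()
    audit.feed(html)
    audit.close()
    return {
        "link_count": len(audit.hrefs),
        "has_see_also": any(h.strip().lower() == "see also" for h in audit.headings),
    }
```

In the verification script above, print(check_article_html(articles[0].content)) could replace the two substring checks.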

Summary

The Good News:

  • All Story 3.3 code works as designed ✓
  • Tests prove the functions work in isolation ✓
  • No known bugs ✓

The Bad News:

  • Code isn't wired into CLI workflow ✗
  • Running generate-batch doesn't use Story 3.1-3.3 ✗
  • Articles are incomplete without integration ✗

The Fix:

  • Add ~50 lines of integration code
  • Wire existing functions into BatchProcessor
  • Test with real job file
  • Done! ✓

When to Fix:

  • Now (before Story 4.x) - RECOMMENDED
  • Or during Story 4.x (when deployment needs links)
  • Not urgent if not deploying yet

This explains why all tests pass but the feature "isn't done" yet: the plumbing exists, it's just not connected to the main pipeline.