diff --git a/CLI_INTEGRATION_EXPLANATION.md b/CLI_INTEGRATION_EXPLANATION.md
new file mode 100644
index 0000000..1802129
--- /dev/null
+++ b/CLI_INTEGRATION_EXPLANATION.md
@@ -0,0 +1,257 @@
+# CLI Integration Explanation - Story 3.3
+
+## The Problem
+
+Story 3.3's `inject_interlinks()` function, along with the Story 3.1 and 3.2 functions, is **implemented and passes its tests**, but none of these functions is **ever called** in the actual batch generation workflow.
+
+## Current Workflow
+
+When you run:
+```bash
+uv run python main.py generate-batch --job-file jobs/example.json
+```
+
+Here's what actually happens:
+
+### Step-by-Step Current Flow
+
+```
+1. CLI Command (src/cli/commands.py)
+   └─> generate_batch() function called
+       └─> Creates BatchProcessor
+           └─> BatchProcessor.process_job()
+
+2. BatchProcessor.process_job() (src/generation/batch_processor.py)
+   └─> Reads job file
+       └─> For each job:
+           └─> _process_single_job()
+               └─> Validates deployment targets
+               └─> For each tier (tier1, tier2, tier3):
+                   └─> _process_tier()
+
+3. _process_tier()
+   └─> For each article (1 to count):
+       └─> _generate_single_article()
+           ├─> Generate title
+           ├─> Generate outline
+           ├─> Generate content
+           ├─> Augment if needed
+           └─> SAVE to database
+
+4. END! ⚠️
+
+   Nothing happens after the articles are generated:
+   no URLs, no tiered links, no interlinking.
+```
+
+## What's Missing
+
+After all articles for a tier have been generated, the Story 3.1-3.3 steps need to run:
+
+```python
+# THIS CODE DOES NOT EXIST YET!
+# It needs to be added at the end of _process_tier() or _process_single_job().
+# site_repo, bunny_client, project, job_config, project_repo, session and
+# content_generator are assumed to be available in that scope.
+from src.database.repositories import ArticleLinkRepository
+from src.generation.site_assignment import assign_sites_to_batch
+from src.generation.url_generator import generate_urls_for_batch
+from src.interlinking.content_injection import inject_interlinks
+from src.interlinking.tiered_links import find_tiered_links
+
+# 1. Get all generated content for this tier
+content_records = self.content_repo.get_by_project_and_tier(project_id, tier_name)
+
+# 2. Assign sites (Story 3.1)
+assign_sites_to_batch(content_records, job, site_repo, bunny_client, project.main_keyword)
+
+# 3. Generate URLs (Story 3.1)
+article_urls = generate_urls_for_batch(content_records, site_repo)
+
+# 4. Find tiered links (Story 3.2)
+tiered_links = find_tiered_links(
+    content_records, job_config, project_repo, self.content_repo, site_repo
+)
+
+# 5. Inject interlinks (Story 3.3)
+link_repo = ArticleLinkRepository(session)
+inject_interlinks(
+    content_records, article_urls, tiered_links,
+    project, job_config, self.content_repo, link_repo
+)
+
+# 6. Apply templates (existing functionality)
+for content in content_records:
+    content_generator.apply_template(content.id)
+```
+
+## Why This Matters
+
+### Current State
+- ✓ Articles are generated
+- ✗ Articles have NO internal links
+- ✗ Articles have NO tiered links
+- ✗ Articles have NO "See Also" section
+- ✗ Articles have NO final URLs assigned
+- ✗ Templates are NOT applied
+
+**Result**: Articles sit in the database as raw HTML with no links, so they cannot be deployed.
+
+### With Integration
+- ✓ Articles are generated
+- ✓ Sites are assigned to articles
+- ✓ Final URLs are generated
+- ✓ Tiered links are found
+- ✓ All links are injected
+- ✓ Templates are applied
+- ✓ Articles are ready for deployment
+
+**Result**: Complete, interlinked articles ready for Story 4.x deployment
+
+## Where to Add Integration
+
+### Option 1: End of `_process_tier()` (RECOMMENDED)
+Add the integration code at line 162 of `batch_processor.py`, right after the article generation loop:
+
+```python
+def _process_tier(self, project_id, tier_name, tier_config, ...):
+    # ... existing article generation loop ...
+
+    # NEW: Post-generation interlinking
+    # (job and debug must be available here; they are assumed to be
+    # among the elided parameters above)
+    click.echo(f" {tier_name}: Injecting interlinks for {tier_config.count} articles...")
+    self._inject_tier_interlinks(project_id, tier_name, job, debug)
+```
+
+Then create the new method:
+```python
+def _inject_tier_interlinks(self, project_id, tier_name, job, debug):
+    """Inject interlinks for all articles in a tier."""
+    # Get all articles for this tier
+    content_records = self.content_repo.get_by_project_and_tier(
+        project_id, tier_name
+    )
+
+    if not content_records:
+        click.echo(f" Warning: No articles found for {tier_name}")
+        return
+
+    # Steps 2-6 from "What's Missing" above...
+```
+
+### Option 2: End of `_process_single_job()`
+Add the integration after ALL tiers have been generated, so the entire job is post-processed at once:
+
+```python
+def _process_single_job(self, job, job_idx, debug, continue_on_error):
+    # ... existing tier processing ...
+
+    # NEW: Post-process every tier once all of them have been generated
+    click.echo("\nPost-processing: Injecting interlinks...")
+    for tier_name in job.tiers:
+        self._inject_tier_interlinks(job.project_id, tier_name, job, debug)
+```
+
+## Why It Wasn't Integrated Yet
+
+Judging from the story implementations:
+
+1. **Story 3.1** (URL Generation): functions exist but are not integrated
+2. **Story 3.2** (Tiered Links): functions exist but are not integrated
+3. **Story 3.3** (Content Injection): functions exist but are not integrated
+
+This suggests the stories focused on **building the functionality**, with the expectation that **Story 4.x (Deployment)** would wire everything together.
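Because each story was built and tested on its own, nothing checks that the batch workflow actually runs these steps, or runs them in order. A small wiring test would catch that gap. The following is a minimal, self-contained sketch of the idea under stated assumptions: `run_post_generation`, `Step`, `record`, and `PIPELINE` are hypothetical names that do not exist in the codebase, and each step is reduced to a single `content_records` argument, whereas the real Story 3.1-3.3 functions take more parameters and pass results (such as `article_urls` and `tiered_links`) to later steps.

```python
from typing import Callable, Sequence

# Hypothetical: every step takes the tier's content records. The real
# Story 3.1-3.3 functions take more arguments and return values (URLs,
# tiered links) that later steps need; that data flow is omitted here.
Step = Callable[[Sequence[object]], None]


def run_post_generation(
    content_records: Sequence[object],
    steps: Sequence[tuple[str, Step]],
) -> list[str]:
    """Hypothetical helper: run named steps in order, return the names that ran."""
    if not content_records:
        # No articles were generated for this tier, so there is nothing to link.
        return []
    ran: list[str] = []
    for name, step in steps:
        step(content_records)
        ran.append(name)
    return ran


# Wiring check with stub steps that only record that they were called.
calls: list[str] = []


def record(name: str) -> Step:
    def step(records: Sequence[object]) -> None:
        calls.append(name)
    return step


PIPELINE = [
    ("assign_sites", record("assign_sites")),            # Story 3.1
    ("generate_urls", record("generate_urls")),          # Story 3.1
    ("find_tiered_links", record("find_tiered_links")),  # Story 3.2
    ("inject_interlinks", record("inject_interlinks")),  # Story 3.3
    ("apply_templates", record("apply_templates")),      # existing templating
]

assert run_post_generation(["article-1", "article-2"], PIPELINE) == calls
assert calls == [name for name, _ in PIPELINE]
assert run_post_generation([], PIPELINE) == []  # empty tier: no step runs
assert len(calls) == len(PIPELINE)
```

In the real project the stubs would become thin wrappers around `assign_sites_to_batch()`, `generate_urls_for_batch()`, `find_tiered_links()`, `inject_interlinks()`, and `apply_template()`, and the same order assertion would fail if the batch workflow stopped calling any of them.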
+
+## Impact of Missing Integration
+
+### Tests Still Pass ✓
+- Unit tests exercise the functions in isolation
+- Integration tests call the functions directly
+- All 42 tests pass because the **functions themselves work**
+
+### But Real Usage Fails ✗
+When you actually run `generate-batch`:
+- Articles are generated
+- They're saved to the database
+- But they have no links, no URLs, nothing
+- Story 4.x deployment would fail because the articles aren't ready
+
+## Effort to Fix
+
+**Time Estimate**: 30-60 minutes
+
+**Tasks**:
+1. Add imports to `batch_processor.py` (2 minutes)
+2. Create the `_inject_tier_interlinks()` method (15 minutes)
+3. Add the call at the end of `_process_tier()` (2 minutes)
+4. Test with a real job file (10 minutes)
+5. Debug any issues (10-20 minutes)
+
+**Complexity**: Low; this is just wiring existing functions together
+
+## Testing the Integration
+
+After adding the integration:
+
+```bash
+# 1. Run batch generation
+uv run python main.py generate-batch \
+  --job-file jobs/test_small.json \
+  --username admin \
+  --password yourpass
+
+# 2. Check the database for links
+uv run python -c "
+from src.database.session import db_manager
+from src.database.repositories import ArticleLinkRepository
+
+session = db_manager.get_session()
+link_repo = ArticleLinkRepository(session)
+links = link_repo.get_all()
+print(f'Total links: {len(links)}')
+for link in links[:5]:
+    print(f' {link.link_type}: {link.anchor_text} -> {link.to_url or link.to_content_id}')
+session.close()
+"
+
+# 3. Verify articles have links in their content
+uv run python -c "
+from src.database.session import db_manager
+from src.database.repositories import GeneratedContentRepository
+
+session = db_manager.get_session()
+content_repo = GeneratedContentRepository(session)
+articles = content_repo.get_all(limit=1)
+if articles:
+    print('Sample article content:')
+    print(articles[0].content[:500])
+    print(f'Contains links: {\"See Also` + `