# CLI Integration Explanation - Story 3.3
## The Problem
Story 3.3's `inject_interlinks()` function, along with the Story 3.1 and 3.2 functions, is **implemented and passes its tests**, but none of them is **ever called** in the actual batch generation workflow.
## Current Workflow
When you run:
```bash
uv run python main.py generate-batch --job-file jobs/example.json
```
Here's what actually happens:
### Step-by-Step Current Flow
```
1. CLI Command (src/cli/commands.py)
   └─> generate_batch() function called
       └─> Creates BatchProcessor
           └─> BatchProcessor.process_job()

2. BatchProcessor.process_job() (src/generation/batch_processor.py)
   └─> Reads job file
   └─> For each job:
       └─> _process_single_job()
           └─> Validates deployment targets
           └─> For each tier (tier1, tier2, tier3):
               └─> _process_tier()

3. _process_tier()
   └─> For each article (1 to count):
       └─> _generate_single_article()
           ├─> Generate title
           ├─> Generate outline
           ├─> Generate content
           ├─> Augment if needed
           └─> SAVE to database

4. END! ⚠️
   Nothing happens after articles are generated!
   No URLs, no tiered links, no interlinking!
```
## What's Missing
After all articles are generated for a tier, the Story 3.1-3.3 steps need to run on that batch:
```python
# THIS CODE DOES NOT EXIST YET!
# Needs to be added at the end of _process_tier() or _process_single_job()

# 1. Get all generated content for this batch
content_records = self.content_repo.get_by_project_and_tier(project_id, tier_name)

# 2. Assign sites (Story 3.1)
from src.generation.site_assignment import assign_sites_to_batch
assign_sites_to_batch(content_records, job, site_repo, bunny_client, project.main_keyword)

# 3. Generate URLs (Story 3.1)
from src.generation.url_generator import generate_urls_for_batch
article_urls = generate_urls_for_batch(content_records, site_repo)

# 4. Find tiered links (Story 3.2)
from src.interlinking.tiered_links import find_tiered_links
tiered_links = find_tiered_links(
    content_records, job_config, project_repo, content_repo, site_repo
)

# 5. Inject interlinks (Story 3.3)
from src.interlinking.content_injection import inject_interlinks
from src.database.repositories import ArticleLinkRepository
link_repo = ArticleLinkRepository(session)
inject_interlinks(
    content_records, article_urls, tiered_links,
    project, job_config, content_repo, link_repo
)

# 6. Apply templates (existing functionality)
for content in content_records:
    content_generator.apply_template(content.id)
```
## Why This Matters
### Current State
- ✓ Articles are generated
- ✗ Articles have NO internal links
- ✗ Articles have NO tiered links
- ✗ Articles have NO "See Also" section
- ✗ Articles have NO final URLs assigned
- ✗ Templates are NOT applied

**Result**: Articles sit in the database as raw HTML with no links, so they are unusable for deployment
### With Integration
- ✓ Articles are generated
- ✓ Sites are assigned to articles
- ✓ Final URLs are generated
- ✓ Tiered links are found
- ✓ All links are injected
- ✓ Templates are applied
- ✓ Articles are ready for deployment

**Result**: Complete, interlinked articles ready for Story 4.x deployment
## Where to Add Integration
### Option 1: End of `_process_tier()` (RECOMMENDED)
Add the integration code at line 162 (after the article generation loop):
```python
def _process_tier(self, project_id, tier_name, tier_config, ...):
    # ... existing article generation loop ...

    # NEW: Post-generation interlinking
    click.echo(f" {tier_name}: Injecting interlinks for {tier_config.count} articles...")
    self._inject_tier_interlinks(project_id, tier_name, job, debug)
```
Then create the new method:
```python
def _inject_tier_interlinks(self, project_id, tier_name, job, debug):
    """Inject interlinks for all articles in a tier"""
    # Get all articles for this tier
    content_records = self.content_repo.get_by_project_and_tier(
        project_id, tier_name
    )
    if not content_records:
        click.echo(f" Warning: No articles found for {tier_name}")
        return

    # Steps 1-6 from above...
```
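
The ordering of steps 1-6 can be sketched without touching the database. The block below is a minimal illustration, not the project's implementation: `PostGenerationSteps` and `run_post_generation` are hypothetical names, and the real Story 3.1-3.3 functions take additional arguments (repositories, job config, project) that are collapsed here into injected callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class PostGenerationSteps:
    """Hypothetical bundle of the Story 3.1-3.3 entry points, passed in as callables."""
    assign_sites: Callable[[List[Any]], None]
    generate_urls: Callable[[List[Any]], Dict[Any, str]]
    find_tiered_links: Callable[[List[Any]], Dict[str, Any]]
    inject_interlinks: Callable[[List[Any], Dict[Any, str], Dict[str, Any]], None]
    apply_template: Callable[[Any], None]


def run_post_generation(records: List[Any], steps: PostGenerationSteps) -> bool:
    """Run the post-generation steps in order; return False if there are no records."""
    if not records:
        return False
    steps.assign_sites(records)                                    # Story 3.1
    article_urls = steps.generate_urls(records)                    # Story 3.1
    tiered_links = steps.find_tiered_links(records)                # Story 3.2
    steps.inject_interlinks(records, article_urls, tiered_links)   # Story 3.3
    for record in records:
        steps.apply_template(record)                               # existing functionality
    return True


# Usage with recording stubs, so the call order is visible without a database.
calls: List[str] = []
stub_steps = PostGenerationSteps(
    assign_sites=lambda recs: calls.append("assign_sites"),
    generate_urls=lambda recs: calls.append("generate_urls") or {},
    find_tiered_links=lambda recs: calls.append("find_tiered_links") or {},
    inject_interlinks=lambda recs, urls, links: calls.append("inject_interlinks"),
    apply_template=lambda rec: calls.append("apply_template"),
)
ran = run_post_generation(["article-1", "article-2"], stub_steps)
print(ran, calls)
```

Passing the steps in as callables is only one way to structure this; inside `BatchProcessor` the same order could be written as direct calls to the imported functions.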
### Option 2: End of `_process_single_job()`
Add integration after ALL tiers are generated (processes entire job at once):
```python
def _process_single_job(self, job, job_idx, debug, continue_on_error):
    # ... existing tier processing ...

    # NEW: Process all tiers together
    click.echo(f"\nPost-processing: Injecting interlinks...")
    for tier_name in job.tiers.keys():
        self._inject_tier_interlinks(job.project_id, tier_name, job, debug)
```
## Why It Wasn't Integrated Yet
Looking at the story implementations, it appears:
1. **Story 3.1** (URL Generation) - Functions exist but not integrated
2. **Story 3.2** (Tiered Links) - Functions exist but not integrated
3. **Story 3.3** (Content Injection) - Functions exist but not integrated
This suggests the stories focused on **building the functionality** with the expectation that **Story 4.x (Deployment)** would integrate everything together.
## Impact of Missing Integration
### Tests Still Pass ✓
- Unit tests exercise the functions in isolation
- Integration tests call the functions directly, not through `generate-batch`
- All 42 tests pass because the **functions themselves work**
### But Real Usage Fails ✗
When you actually run `generate-batch`:
- Articles are generated
- They're saved to database
- But they have no links, no URLs, nothing
- Story 4.x deployment would fail because articles aren't ready
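
A wiring test is the kind of check that would expose this gap. The sketch below uses a stand-in class rather than the real `BatchProcessor` (whose constructor and signatures are not shown here), so `FakeBatchProcessor` and its simplified method arguments are assumptions made for illustration only.

```python
from unittest import mock


class FakeBatchProcessor:
    """Stand-in for BatchProcessor, reduced to the methods relevant to the wiring."""

    def __init__(self):
        self.generated = []

    def _generate_single_article(self, tier_name, index):
        self.generated.append(f"{tier_name}-{index}")

    def _inject_tier_interlinks(self, project_id, tier_name):
        raise NotImplementedError("real post-processing would run here")

    def _process_tier(self, project_id, tier_name, count):
        for index in range(1, count + 1):
            self._generate_single_article(tier_name, index)
        # The wiring under test: post-processing runs once, after the loop.
        self._inject_tier_interlinks(project_id, tier_name)


def test_process_tier_calls_interlinking():
    processor = FakeBatchProcessor()
    # Replace the post-processing step with a mock and check that it is reached.
    with mock.patch.object(processor, "_inject_tier_interlinks") as inject:
        processor._process_tier(project_id=1, tier_name="tier1", count=3)
    inject.assert_called_once_with(1, "tier1")
    assert processor.generated == ["tier1-1", "tier1-2", "tier1-3"]


test_process_tier_calls_interlinking()
```

If the call at the end of `_process_tier()` were missing, `assert_called_once_with` would fail even though every function-level test still passed.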
## Effort to Fix
**Time Estimate**: 30-60 minutes
**Tasks**:
1. Add imports to `batch_processor.py` (2 minutes)
2. Create `_inject_tier_interlinks()` method (15 minutes)
3. Add call at end of `_process_tier()` (2 minutes)
4. Test with real job file (10 minutes)
5. Debug any issues (10-20 minutes)
**Complexity**: Low - just wiring existing functions together
## Testing the Integration
After adding integration:
```bash
# 1. Run batch generation
uv run python main.py generate-batch \
    --job-file jobs/test_small.json \
    --username admin \
    --password yourpass

# 2. Check database for links
uv run python -c "
from src.database.session import db_manager
from src.database.repositories import ArticleLinkRepository
session = db_manager.get_session()
link_repo = ArticleLinkRepository(session)
links = link_repo.get_all()
print(f'Total links: {len(links)}')
for link in links[:5]:
    print(f' {link.link_type}: {link.anchor_text} -> {link.to_url or link.to_content_id}')
session.close()
"

# 3. Verify articles have links in content
uv run python -c "
from src.database.session import db_manager
from src.database.repositories import GeneratedContentRepository
session = db_manager.get_session()
content_repo = GeneratedContentRepository(session)
articles = content_repo.get_all(limit=1)
if articles:
    print('Sample article content:')
    print(articles[0].content[:500])
    print(f'Contains links: {\"<a href=\" in articles[0].content}')
    print(f'Has See Also: {\"See Also\" in articles[0].content}')
session.close()
"
```
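
The substring checks in step 3 can be made a little stricter by parsing the HTML. The following is a rough sketch using only the standard library; `LinkCounter` and `summarize_links` are hypothetical helpers, and the sample markup (including how the "See Also" section is rendered) is an assumption rather than the actual output of `inject_interlinks()`.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collects href values from <a> tags and the visible text of the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def summarize_links(html: str) -> dict:
    """Return the link count, the hrefs, and whether 'See Also' appears in the text."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    text = "".join(parser.text_parts)
    return {
        "link_count": len(parser.hrefs),
        "hrefs": parser.hrefs,
        "has_see_also": "see also" in text.lower(),
    }


# Sample markup is illustrative only.
sample = (
    '<p>Intro with a <a href="https://example.com/money">tiered link</a>.</p>'
    '<h3>See Also</h3><ul><li><a href="/other-article">Other</a></li></ul>'
)
report = summarize_links(sample)
print(report["link_count"], report["has_see_also"])
```

Running `summarize_links(articles[0].content)` in place of the two `in` checks would also report how many links were injected, which helps when comparing against the rows returned by `ArticleLinkRepository`.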
## Summary
**The Good News**:
- All Story 3.3 code is implemented ✓
- Tests show the functions work ✓
- No known bugs in the functions themselves ✓
**The Bad News**:
- Code isn't wired into CLI workflow ✗
- Running `generate-batch` doesn't use Story 3.1-3.3 ✗
- Articles are incomplete without integration ✗
**The Fix**:
- Add ~50 lines of integration code
- Wire existing functions into `BatchProcessor`
- Test with real job file
- Done! ✓
**When to Fix**:
- Now (before Story 4.x) - RECOMMENDED
- Or during Story 4.x (when deployment needs links)
- Not urgent if not deploying yet
---
*This explains why all tests pass but the feature "isn't done" yet - the plumbing exists, it's just not connected to the main pipeline.*