# Story 3.3: Content Interlinking Injection - Implementation Summary

## Status
✅ **COMPLETE & INTEGRATED** - All acceptance criteria met, all tests passing, CLI integration complete

**Date Completed**: October 21, 2025

## What Was Implemented

### Core Module: `src/interlinking/content_injection.py`

Main function: `inject_interlinks()` - Injects three types of links into article HTML:

1. **Tiered Links** (Money Site / Lower Tier Articles)
   - Tier 1: Links to money site URL
   - Tier 2+: Links to 2-4 random lower-tier articles
   - Uses tier-appropriate anchor text from `anchor_text_generator.py`
   - Supports job config overrides (default/override/append modes)
   - Searches for anchor text in content (case-insensitive)
   - Wraps first occurrence or inserts via fallback

2. **Homepage Links**
   - Links to `/index.html` on the article's domain
   - Uses "Home" as anchor text
   - Searches for "Home" in article content or inserts it

3. **"See Also" Section**
   - Added after last `</p>` tag
   - Links to ALL other articles in the batch
   - Each link uses article title as anchor text
   - Formatted as `<h3>` + `<ul>` list

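The "See Also" section described above could be assembled roughly as follows. This is a minimal standard-library sketch, not the module's actual code: the function names, the `(content_id, title, url)` tuple shape, and the "See Also" heading text are assumptions made for illustration.

```python
from html import escape


def build_see_also(articles, current_id):
    """Return an <h3> + <ul> block linking to every other article in the batch.

    `articles` is a hypothetical list of (content_id, title, url) tuples.
    """
    items = [
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for content_id, title, url in articles
        if content_id != current_id  # an article never links to itself
    ]
    if not items:
        return ""  # single-article batch: nothing to link to
    return "<h3>See Also</h3>\n<ul>\n" + "\n".join(items) + "\n</ul>"


def insert_after_last_paragraph(html, block):
    """Insert `block` right after the last </p>; append it if there is none."""
    if not block:
        return html
    marker = "</p>"
    idx = html.rfind(marker)
    if idx == -1:
        return html + "\n" + block
    cut = idx + len(marker)
    return html[:cut] + "\n" + block + html[cut:]
```

Returning an empty string for a one-article batch keeps the caller from emitting an empty heading, which lines up with the batch-size-1 case listed under the integration tests.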
### Template Updates: Navigation Menu

Added responsive navigation menu to all 4 templates (`src/templating/templates/`):
- **basic.html** - Clean, simple nav with blue accents
- **modern.html** - Gradient hover effects matching purple theme
- **classic.html** - Serif font, muted brown colors
- **minimal.html** - Uppercase, minimalist black & white

All templates now include:
```html
<nav>
  <ul>
    <li><a href="/index.html">Home</a></li>
    <li><a href="about.html">About</a></li>
    <li><a href="privacy.html">Privacy</a></li>
    <li><a href="contact.html">Contact</a></li>
  </ul>
</nav>
```

### Helper Functions

- `_inject_tiered_links()` - Handles money site (T1) and lower-tier (T2+) links
- `_inject_homepage_link()` - Injects "Home" link to `/index.html`
- `_inject_see_also_section()` - Builds "See Also" section with batch links
- `_get_anchor_texts_for_tier()` - Gets anchor text with job config overrides
- `_try_inject_link()` - Tries to find/wrap anchor text or falls back to insertion
- `_find_and_wrap_anchor_text()` - Case-insensitive search and wrap (first occurrence only)
- `_insert_link_into_random_paragraph()` - Fallback insertion into random paragraph
- `_extract_homepage_url()` - Extracts base domain URL
- `_extract_domain_name()` - Extracts domain name (removes www.)
- `_insert_before_closing_tags()` - Inserts content after last `</p>` tag

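As an illustration of the two URL helpers in the list above, here is one way the homepage URL and the bare domain name might be derived with `urllib.parse`. The function names and exact return formats are assumptions for this sketch; the real helpers may differ, for example in whether `/index.html` is appended inside the helper or by its caller.

```python
from urllib.parse import urlsplit


def homepage_url_for(article_url):
    """Build the homepage link target: scheme + host + /index.html."""
    parts = urlsplit(article_url)
    return f"{parts.scheme}://{parts.netloc}/index.html"


def domain_name_for(article_url):
    """Return the host name without a leading 'www.' prefix."""
    host = urlsplit(article_url).hostname or ""
    return host[4:] if host.startswith("www.") else host
```

Only a leading `www.` is stripped, so other subdomains such as `blog.example.com` are kept intact.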
### Database Integration

All injected links are recorded in `article_links` table:
- **Tiered links**: `link_type="tiered"`, `to_url` (money site or lower tier URL)
- **Homepage links**: `link_type="homepage"`, `to_url` (domain/index.html)
- **See Also links**: `link_type="wheel_see_also"`, `to_content_id` (internal)

Content is updated in `generated_content.content` field via `content_repo.update()`.

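The three record shapes above can be pictured with a small value object. This is only a sketch of the data, not the project's ORM model: the `LinkRecord` class and the `from_content_id` field name are hypothetical, while the `link_type` values and the `to_url` / `to_content_id` targets come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Which target field each link type is expected to fill, per the list above.
_TARGET_FIELD = {
    "tiered": "to_url",
    "homepage": "to_url",
    "wheel_see_also": "to_content_id",
}


@dataclass(frozen=True)
class LinkRecord:
    from_content_id: int  # hypothetical name for the linking article's ID
    link_type: str
    to_url: Optional[str] = None
    to_content_id: Optional[int] = None

    def __post_init__(self):
        field = _TARGET_FIELD.get(self.link_type)
        if field is None:
            raise ValueError(f"unknown link_type: {self.link_type!r}")
        if getattr(self, field) is None:
            raise ValueError(f"{self.link_type} links need {field}")
```

For example, `LinkRecord(43, "wheel_see_also", to_content_id=44)` describes an internal "See Also" link, while a homepage link would carry `to_url` instead.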
### Anchor Text Configuration

Supports three modes in job config:
```json
{
  "anchor_text_config": {
    "mode": "default|override|append",
    "custom_text": ["anchor 1", "anchor 2", ...]
  }
}
```

- **default**: Use tier-based anchors (T1: main keyword, T2: related searches, T3: main keyword, T4+: entities)
- **override**: Replace defaults with custom_text
- **append**: Add custom_text to defaults

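The three modes reduce to a small piece of list logic. The sketch below assumes the tier-based defaults have already been computed and that the `anchor_text_config` dict is passed in directly; `resolve_anchor_texts` is a hypothetical name, not the signature of `_get_anchor_texts_for_tier()`. How the real code treats an `override` with an empty `custom_text` is not stated in this summary, so the sketch simply returns the empty list.

```python
def resolve_anchor_texts(default_anchors, anchor_text_config=None):
    """Combine tier-based defaults with the job's custom anchor texts."""
    config = anchor_text_config or {}
    mode = config.get("mode", "default")  # a missing config behaves like "default"
    custom = list(config.get("custom_text") or [])

    if mode == "default":
        return list(default_anchors)
    if mode == "override":
        return custom
    if mode == "append":
        return list(default_anchors) + custom
    raise ValueError(f"unknown anchor text mode: {mode!r}")
```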
### Link Injection Strategy

1. **Search for anchor text** in content (case-insensitive, match within phrases)
2. **Wrap first occurrence** with `<a>` tag
3. **Skip existing links** (don't link text already inside `<a>` tags)
4. **Fallback to insertion** if anchor text not found
5. **Random placement** in fallback mode

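The five steps above can be sketched end to end. Note that the module itself uses BeautifulSoup; to keep this example dependency-free it swaps in a simple regex tokenizer from the standard library, so it illustrates the strategy rather than the actual implementation. All function names are hypothetical, and the tokenizer does not handle comments, `<script>` bodies, or `>` inside attribute values.

```python
import random
import re
from html import escape

_TAG = re.compile(r"(<[^>]+>)")
_PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)


def _text_runs(html):
    """Yield (offset, text) for each text run outside tags and outside <a>...</a>."""
    offset, depth = 0, 0
    for i, token in enumerate(_TAG.split(html)):
        if i % 2 == 1:  # odd indices hold the tags captured by the split group
            if re.match(r"<a[\s>]", token, re.IGNORECASE):
                depth += 1
            elif re.match(r"</a\s*>", token, re.IGNORECASE):
                depth = max(0, depth - 1)
        elif depth == 0 and token:
            yield offset, token
        offset += len(token)


def _link(url, inner_html):
    return f'<a href="{escape(url, quote=True)}">{inner_html}</a>'


def find_and_wrap(html, anchor_text, url):
    """Steps 1-3: wrap the first case-insensitive match that is not already linked."""
    if not anchor_text:
        return None
    pattern = re.compile(re.escape(anchor_text), re.IGNORECASE)
    for offset, text in _text_runs(html):
        match = pattern.search(text)
        if match:
            start, end = offset + match.start(), offset + match.end()
            # Keep the casing found in the article, not the configured casing.
            return html[:start] + _link(url, html[start:end]) + html[end:]
    return None


def insert_into_random_paragraph(html, anchor_text, url, rng=random):
    """Steps 4-5: add the link at a random whitespace position in a random <p>."""
    link = _link(url, escape(anchor_text))
    paragraphs = list(_PARAGRAPH.finditer(html))
    if not paragraphs:
        return html + f"<p>{link}</p>"
    target = rng.choice(paragraphs)
    body = target.group(1)
    spots = [off + m.start() for off, text in _text_runs(body)
             for m in re.finditer(r"\s", text)]
    pos = rng.choice(spots) if spots else len(body)
    new_body = body[:pos] + " " + link + body[pos:]
    return html[:target.start(1)] + new_body + html[target.end(1):]


def try_inject_link(html, anchor_text, url, rng=random):
    """Wrap existing text when possible, otherwise fall back to insertion."""
    wrapped = find_and_wrap(html, anchor_text, url)
    if wrapped is not None:
        return wrapped
    return insert_into_random_paragraph(html, anchor_text, url, rng)
```

Passing a seeded `random.Random` as `rng` makes the fallback placement reproducible, which is convenient in tests.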
### Testing

**Unit Tests** (33 tests in `tests/unit/test_content_injection.py`):
- Homepage URL extraction
- "See Also" section insertion
- Anchor text finding and wrapping (case-insensitive, within phrases)
- Link insertion into paragraphs
- Anchor text config modes (default, override, append)
- Tiered link injection (T1 money site, T2+ lower tier)
- Error handling

**Integration Tests** (9 tests in `tests/integration/test_content_injection_integration.py`):
- Full flow: T1 batch with money site links + See Also section
- Homepage link injection
- T2 batch linking to T1 articles
- Anchor text config overrides (override/append modes)
- Different batch sizes (1 article, 20 articles)
- ArticleLink database records (all link types)
- Internal vs external link handling

**All 42 tests pass**

## Key Design Decisions

1. **"Home" for homepage links**: Using "Home" as anchor text instead of domain name, now that all templates have navigation menus
2. **Homepage URL**: Points to `/index.html` (not just `/`)
3. **Random selection**: For T2+ articles, random selection of 2-4 lower-tier URLs to link to
4. **Case-insensitive matching**: "Shaft Machining" matches "shaft machining"
5. **First occurrence only**: Only link the first instance of anchor text to avoid over-optimization
6. **BeautifulSoup for HTML parsing**: Safe, preserves structure, handles malformed HTML
7. **Fallback insertion**: If anchor text not found, insert into random paragraph at random position
8. **See Also section**: Simpler than wheel_next/wheel_prev - all articles link to all others

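Decision 3 (together with the Tier 1 rule from the core module) can be illustrated with a short target-selection sketch. The function name, its parameters, and the behavior when fewer than two lower-tier URLs are available are assumptions made for this example.

```python
import random


def pick_tiered_targets(tier, money_site_url, lower_tier_urls, rng=random):
    """Tier 1 links to the money site; Tier 2+ links to 2-4 random lower-tier URLs."""
    if tier == 1:
        return [money_site_url]
    pool = list(dict.fromkeys(lower_tier_urls))  # de-duplicate, keep order
    if not pool:
        return []
    # Assumption: when the pool is smaller than the drawn count, use what exists.
    count = min(len(pool), rng.randint(2, 4))
    return rng.sample(pool, count)
```

`rng.sample` draws without replacement, so an article never links to the same lower-tier URL twice.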
## Files Modified

### Created
- `src/interlinking/content_injection.py` (410 lines)
- `tests/unit/test_content_injection.py` (363 lines)
- `tests/integration/test_content_injection_integration.py` (469 lines)

### Modified
- `src/templating/templates/basic.html` - Added navigation menu
- `src/templating/templates/modern.html` - Added navigation menu
- `src/templating/templates/classic.html` - Added navigation menu
- `src/templating/templates/minimal.html` - Added navigation menu

## Dependencies

- **BeautifulSoup4**: HTML parsing and manipulation
- **Story 3.1**: URL generation (uses `generate_urls_for_batch()`)
- **Story 3.2**: Tiered link finding (uses `find_tiered_links()`)
- **Existing**: `anchor_text_generator.py` for tier-based anchor text

## Usage Example

```python
from src.interlinking.content_injection import inject_interlinks
from src.interlinking.tiered_links import find_tiered_links
from src.generation.url_generator import generate_urls_for_batch

# 1. Generate URLs for batch
article_urls = generate_urls_for_batch(content_records, site_repo)

# 2. Find tiered links
tiered_links = find_tiered_links(content_records, job_config, project_repo, content_repo, site_repo)

# 3. Inject all interlinks
inject_interlinks(
    content_records,
    article_urls,
    tiered_links,
    project,
    job_config,
    content_repo,
    link_repo
)
```

## CLI Integration (Completed)

Story 3.3 is now **fully integrated** into the `generate-batch` CLI workflow:

### Integration Details
- **File Modified**: `src/generation/batch_processor.py`
- **New Method**: `_post_process_tier()` (80+ lines)
- **Integration Point**: Automatically runs after article generation for each tier

### Complete Pipeline
When you run `generate-batch`, articles now go through:
1. Content generation (title, outline, content)
2. Site assignment via `deployment_targets` (Story 2.5)
3. **NEW**: Automatic site assignment for unassigned articles (Story 3.1)
4. **NEW**: URL generation (Story 3.1)
5. **NEW**: Tiered link finding (Story 3.2)
6. **NEW**: Content interlinking injection (Story 3.3)
7. **NEW**: Template application

### CLI Output
```
tier1: Generating 5 articles
[1/5] Generating title...
[1/5] Generating outline...
[1/5] Generating content...
[1/5] Saved (ID: 43, Status: generated)
...
tier1: Assigning sites to 2 articles...
Assigned 2 articles to sites
tier1: Post-processing 5 articles...
Generating URLs...
Generated 5 URLs
Finding tiered links...
Found tiered links for tier 1
Injecting interlinks... ← Story 3.3!
Interlinks injected successfully ← Story 3.3!
Applying templates...
Applied templates to 5/5 articles
tier1: Post-processing complete
```

### Verification
Tested and confirmed:
- ✅ Articles assigned to sites automatically
- ✅ URLs generated for all articles
- ✅ Tiered links injected (money site for T1)
- ✅ Homepage links injected (`/index.html`)
- ✅ "See Also" sections with batch links
- ✅ Templates applied
- ✅ All link records in database

## Next Steps

Story 3.3 is complete and integrated. Ready for:
- **Story 4.x**: Deployment (final HTML with all links is ready)
- **Future**: Analytics dashboard using `article_links` table
- **Future**: Create About, Privacy, Contact pages to match nav menu links

## Notes

- Homepage links use "Home" anchor text, pointing to `/index.html`
- All 4 templates now have consistent navigation structure
- Link relationships are fully tracked in the `article_links` table for analytics
- Simple, maintainable code covered by 42 tests (33 unit, 9 integration)