
# Story 3.3: Content Interlinking Injection - Implementation Summary
## Status
**COMPLETE** - All acceptance criteria met, all tests passing
## What Was Implemented
### Core Module: `src/interlinking/content_injection.py`
Main function: `inject_interlinks()` - Injects three types of links into article HTML:
1. **Tiered Links** (Money Site / Lower Tier Articles)
   - Tier 1: Links to money site URL
   - Tier 2+: Links to 2-4 random lower-tier articles
   - Uses tier-appropriate anchor text from `anchor_text_generator.py`
   - Supports job config overrides (default/override/append modes)
   - Searches for anchor text in content (case-insensitive)
   - Wraps first occurrence or inserts via fallback
2. **Homepage Links**
   - Links to `/index.html` on the article's domain
   - Uses "Home" as anchor text
   - Searches for "Home" in article content or inserts it
3. **"See Also" Section**
   - Added after last `</p>` tag
   - Links to ALL other articles in the batch
   - Each link uses article title as anchor text
   - Formatted as `<h3>` + `<ul>` list
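
The "See Also" markup described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `build_see_also_html`, `insert_after_last_paragraph`, the `(title, url)` input shape, and the literal heading text "See Also" are assumptions. The real `_inject_see_also_section()` also records `wheel_see_also` rows in the database, which this sketch omits.

```python
import html


def build_see_also_html(current_url: str, batch: list[tuple[str, str]]) -> str:
    """Hypothetical helper: <h3> + <ul> linking to every OTHER article in the batch.

    `batch` is assumed to be a list of (title, url) pairs; titles become anchor text.
    """
    items = [
        f'<li><a href="{html.escape(url, quote=True)}">{html.escape(title)}</a></li>'
        for title, url in batch
        if url != current_url  # skip the article itself
    ]
    if not items:
        return ""  # a one-article batch has nothing to link to
    return "<h3>See Also</h3>\n<ul>\n" + "\n".join(items) + "\n</ul>"


def insert_after_last_paragraph(content: str, block: str) -> str:
    """Hypothetical helper: place `block` right after the last </p> (append if none)."""
    if not block:
        return content
    idx = content.lower().rfind("</p>")
    if idx == -1:
        return content + "\n" + block
    cut = idx + len("</p>")
    return content[:cut] + "\n" + block + content[cut:]


if __name__ == "__main__":
    batch = [
        ("Shaft Machining Basics", "https://a.example.com/shaft.html"),
        ("Lathe Tooling", "https://b.example.com/lathe.html"),
    ]
    section = build_see_also_html("https://a.example.com/shaft.html", batch)
    print(insert_after_last_paragraph("<p>Intro</p><footer>x</footer>", section))
```

Inserting after the last `</p>` rather than before `</body>` keeps the section inside the article body even when a template wraps the content in extra markup.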
### Template Updates: Navigation Menu
Added responsive navigation menu to all 4 templates (`src/templating/templates/`):
- **basic.html** - Clean, simple nav with blue accents
- **modern.html** - Gradient hover effects matching purple theme
- **classic.html** - Serif font, muted brown colors
- **minimal.html** - Uppercase, minimalist black & white

All templates now include:
```html
<nav>
  <ul>
    <li><a href="/index.html">Home</a></li>
    <li><a href="about.html">About</a></li>
    <li><a href="privacy.html">Privacy</a></li>
    <li><a href="contact.html">Contact</a></li>
  </ul>
</nav>
```
### Helper Functions
- `_inject_tiered_links()` - Handles money site (T1) and lower-tier (T2+) links
- `_inject_homepage_link()` - Injects "Home" link to `/index.html`
- `_inject_see_also_section()` - Builds "See Also" section with batch links
- `_get_anchor_texts_for_tier()` - Gets anchor text with job config overrides
- `_try_inject_link()` - Tries to find/wrap anchor text or falls back to insertion
- `_find_and_wrap_anchor_text()` - Case-insensitive search and wrap (first occurrence only)
- `_insert_link_into_random_paragraph()` - Fallback insertion into random paragraph
- `_extract_homepage_url()` - Extracts base domain URL
- `_extract_domain_name()` - Extracts domain name (removes www.)
- `_insert_before_closing_tags()` - Inserts content after last `</p>` tag
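
The two URL helpers can be approximated with `urllib.parse`. This is a sketch under assumptions: the names are hypothetical stand-ins for `_extract_homepage_url()` and `_extract_domain_name()`, and folding `/index.html` into the homepage URL follows the "Homepage URL" design decision rather than confirmed code.

```python
from urllib.parse import urlsplit


def extract_homepage_url(article_url: str) -> str:
    """Hypothetical: scheme + host of the article URL, pointing at /index.html."""
    parts = urlsplit(article_url)
    return f"{parts.scheme}://{parts.netloc}/index.html"


def extract_domain_name(article_url: str) -> str:
    """Hypothetical: lowercase host name with any leading 'www.' removed."""
    host = urlsplit(article_url).hostname or ""  # .hostname drops port, lowercases
    return host[4:] if host.startswith("www.") else host


if __name__ == "__main__":
    url = "https://www.example.com/blog/shaft-machining.html"
    print(extract_homepage_url(url))
    print(extract_domain_name(url))
```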
### Database Integration
All injected links are recorded in `article_links` table:
- **Tiered links**: `link_type="tiered"`, `to_url` (money site or lower tier URL)
- **Homepage links**: `link_type="homepage"`, `to_url` (domain/index.html)
- **See Also links**: `link_type="wheel_see_also"`, `to_content_id` (internal)

Content is updated in the `generated_content.content` field via `content_repo.update()`.
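
To make the field usage above concrete, here is a hypothetical record shape with a validity check. `ArticleLinkRecord`, `from_content_id`, and the validation rules are illustrative assumptions, not the project's actual model; only `link_type`, `to_url`, and `to_content_id` come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Link types that target a URL vs. another generated_content row (per the list above).
URL_LINK_TYPES = {"tiered", "homepage"}
CONTENT_LINK_TYPES = {"wheel_see_also"}


@dataclass(frozen=True)
class ArticleLinkRecord:
    """Hypothetical shape of one article_links row."""

    from_content_id: int  # assumed field: the article containing the link
    link_type: str
    to_url: Optional[str] = None
    to_content_id: Optional[int] = None

    def __post_init__(self) -> None:
        # Tiered and homepage links store a URL; See Also links store the
        # content id of another article in the same batch.
        if self.link_type in URL_LINK_TYPES:
            if not self.to_url:
                raise ValueError(f"{self.link_type} link requires to_url")
        elif self.link_type in CONTENT_LINK_TYPES:
            if self.to_content_id is None:
                raise ValueError(f"{self.link_type} link requires to_content_id")
        else:
            raise ValueError(f"unknown link_type: {self.link_type!r}")
```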
### Anchor Text Configuration
Supports three modes in job config, selected by `mode` (`default`, `override`, or `append`):
```json
{
  "anchor_text_config": {
    "mode": "override",
    "custom_text": ["anchor 1", "anchor 2"]
  }
}
```
- **default**: Use tier-based anchors (T1: main keyword, T2: related searches, T3: main keyword, T4+: entities)
- **override**: Replace defaults with custom_text
- **append**: Add custom_text to defaults
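
The three modes reduce to a small resolution step. A minimal sketch, assuming the config arrives as a plain dict shaped like the JSON above; `resolve_anchor_texts` is a hypothetical stand-in for `_get_anchor_texts_for_tier()`, and it deliberately does not deduplicate or handle an empty `custom_text`, since the source does not say how those cases behave.

```python
from typing import Optional, Sequence


def resolve_anchor_texts(defaults: Sequence[str], config: Optional[dict]) -> list[str]:
    """Hypothetical: apply an anchor_text_config dict to a tier's default anchors."""
    if not config:
        return list(defaults)  # no config block: tier defaults only
    mode = config.get("mode", "default")
    custom = list(config.get("custom_text") or [])
    if mode == "default":
        return list(defaults)  # custom_text is ignored
    if mode == "override":
        return custom  # custom_text replaces the defaults
    if mode == "append":
        return list(defaults) + custom  # defaults first, then custom_text
    raise ValueError(f"unknown anchor_text_config mode: {mode!r}")
```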
### Link Injection Strategy
1. **Search for anchor text** in content (case-insensitive, match within phrases)
2. **Wrap first occurrence** with `<a>` tag
3. **Skip existing links** (don't link text already inside `<a>` tags)
4. **Fallback to insertion** if anchor text not found
5. **Random placement** in fallback mode
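
The five steps can be sketched with `re` from the standard library. This is a simplified stand-in, not the module's implementation: the real code uses BeautifulSoup (see Key Design Decisions), whereas this version assumes reasonably well-formed HTML and only matches anchor text that is not split by tags. The function names loosely mirror `_find_and_wrap_anchor_text()`, `_insert_link_into_random_paragraph()`, and `_try_inject_link()` but are hypothetical.

```python
import html
import random
import re

_TAG_RE = re.compile(r"(<[^>]+>)")  # capturing group: split keeps the tags
_PARA_RE = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)


def _link_state(tag: str, inside_link: bool) -> bool:
    """Track whether the scan is currently inside an <a> element."""
    if re.match(r"<a[\s>]", tag, re.IGNORECASE):
        return True
    if re.match(r"</a\s*>", tag, re.IGNORECASE):
        return False
    return inside_link


def _anchor(url: str, text: str) -> str:
    return f'<a href="{html.escape(url, quote=True)}">{text}</a>'


def find_and_wrap_anchor_text(content: str, anchor: str, url: str) -> tuple[str, bool]:
    """Steps 1-3: wrap the first case-insensitive match that is not already linked."""
    if not anchor:
        return content, False
    pattern = re.compile(re.escape(anchor), re.IGNORECASE)
    parts = _TAG_RE.split(content)  # alternating text / tag pieces
    inside_link = False
    for i, part in enumerate(parts):
        if _TAG_RE.fullmatch(part):
            inside_link = _link_state(part, inside_link)
            continue
        if inside_link:
            continue  # never nest a link inside an existing <a>
        match = pattern.search(part)
        if match:
            # Keep the article's original casing inside the new link.
            parts[i] = part[: match.start()] + _anchor(url, match.group(0)) + part[match.end():]
            return "".join(parts), True
    return content, False


def insert_link_into_random_paragraph(content: str, anchor: str, url: str, rng: random.Random) -> str:
    """Steps 4-5: insert the link at a random word boundary of a random <p>."""
    paragraphs = list(_PARA_RE.finditer(content))
    if not paragraphs:
        return content
    start, end = rng.choice(paragraphs).span(1)
    inner = content[start:end]
    # Candidate offsets: spaces in plain text, outside tags and existing links.
    candidates, offset, inside_link = [], 0, False
    for part in _TAG_RE.split(inner):
        if _TAG_RE.fullmatch(part):
            inside_link = _link_state(part, inside_link)
        elif not inside_link:
            candidates.extend(offset + j for j, ch in enumerate(part) if ch == " ")
        offset += len(part)
    pos = rng.choice(candidates) if candidates else len(inner)
    new_inner = inner[:pos] + " " + _anchor(url, html.escape(anchor)) + inner[pos:]
    return content[:start] + new_inner + content[end:]


def try_inject_link(content: str, anchor: str, url: str, rng: random.Random) -> str:
    """Wrap existing anchor text if present, otherwise fall back to insertion."""
    updated, found = find_and_wrap_anchor_text(content, anchor, url)
    return updated if found else insert_link_into_random_paragraph(content, anchor, url, rng)
```

Passing an explicit `random.Random` keeps the fallback reproducible in tests while still placing links randomly in production.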
### Testing
**Unit Tests** (33 tests in `tests/unit/test_content_injection.py`):
- Homepage URL extraction
- "See Also" section insertion
- Anchor text finding and wrapping (case-insensitive, within phrases)
- Link insertion into paragraphs
- Anchor text config modes (default, override, append)
- Tiered link injection (T1 money site, T2+ lower tier)
- Error handling

**Integration Tests** (9 tests in `tests/integration/test_content_injection_integration.py`):
- Full flow: T1 batch with money site links + See Also section
- Homepage link injection
- T2 batch linking to T1 articles
- Anchor text config overrides (override/append modes)
- Different batch sizes (1 article, 20 articles)
- ArticleLink database records (all link types)
- Internal vs external link handling

**All 42 tests pass**
## Key Design Decisions
1. **"Home" for homepage links**: Homepage links use "Home" as anchor text instead of the domain name, matching the Home entry that every template's navigation menu now includes
2. **Homepage URL**: Points to `/index.html` (not just `/`)
3. **Random selection**: For T2+ articles, random selection of 2-4 lower-tier URLs to link to
4. **Case-insensitive matching**: "Shaft Machining" matches "shaft machining"
5. **First occurrence only**: Only link the first instance of anchor text to avoid over-optimization
6. **BeautifulSoup for HTML parsing**: Safe, preserves structure, handles malformed HTML
7. **Fallback insertion**: If anchor text not found, insert into random paragraph at random position
8. **See Also section**: Simpler than wheel_next/wheel_prev linking; every article in the batch links to every other article
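
Decision 3 (and the Tier 2+ rule under Tiered Links) amounts to sampling a random count of distinct URLs. A minimal sketch; `select_lower_tier_urls` is hypothetical, and capping the count at the number of available lower-tier URLs is an assumption, since the source does not say what happens when fewer than two exist.

```python
import random
from typing import Sequence


def select_lower_tier_urls(
    urls: Sequence[str], rng: random.Random, low: int = 2, high: int = 4
) -> list[str]:
    """Hypothetical: pick between `low` and `high` distinct URLs at random.

    Returns fewer (possibly all, possibly none) when not enough URLs exist.
    """
    count = min(rng.randint(low, high), len(urls))  # randint is inclusive on both ends
    return rng.sample(list(urls), count)  # sampling without replacement
```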
## Files Modified
### Created
- `src/interlinking/content_injection.py` (410 lines)
- `tests/unit/test_content_injection.py` (363 lines)
- `tests/integration/test_content_injection_integration.py` (469 lines)
### Modified
- `src/templating/templates/basic.html` - Added navigation menu
- `src/templating/templates/modern.html` - Added navigation menu
- `src/templating/templates/classic.html` - Added navigation menu
- `src/templating/templates/minimal.html` - Added navigation menu
## Dependencies
- **BeautifulSoup4**: HTML parsing and manipulation
- **Story 3.1**: URL generation (uses `generate_urls_for_batch()`)
- **Story 3.2**: Tiered link finding (uses `find_tiered_links()`)
- **Existing**: `anchor_text_generator.py` for tier-based anchor text
## Usage Example
```python
from src.interlinking.content_injection import inject_interlinks
from src.interlinking.tiered_links import find_tiered_links
from src.generation.url_generator import generate_urls_for_batch

# Assumes content_records, job_config, project, and the repositories
# (site_repo, project_repo, content_repo, link_repo) are already loaded.

# 1. Generate URLs for batch
article_urls = generate_urls_for_batch(content_records, site_repo)

# 2. Find tiered links
tiered_links = find_tiered_links(content_records, job_config, project_repo, content_repo, site_repo)

# 3. Inject all interlinks (updates content and records article_links rows)
inject_interlinks(
    content_records,
    article_urls,
    tiered_links,
    project,
    job_config,
    content_repo,
    link_repo,
)
```
## Next Steps
Story 3.3 is complete and ready for:
- **Story 4.x**: Deployment (will use final HTML with all links)
- **Future**: Analytics dashboard using `article_links` table
- **Future**: Create About, Privacy, Contact pages to match nav menu links
## Notes
- Homepage links use "Home" anchor text, pointing to `/index.html`
- All 4 templates now have consistent navigation structure
- Link relationships fully tracked in database for analytics
- Simple, maintainable code, covered by 42 tests (33 unit, 9 integration)