
Story 3.3: Content Interlinking Injection - Implementation Summary

Status

COMPLETE - All acceptance criteria met, all tests passing

What Was Implemented

Core Module: src/interlinking/content_injection.py

Main function: inject_interlinks() - Injects three types of links into article HTML:

  1. Tiered Links (Money Site / Lower Tier Articles)

    • Tier 1: Links to money site URL
    • Tier 2+: Links to 2-4 random lower-tier articles
    • Uses tier-appropriate anchor text from anchor_text_generator.py
    • Supports job config overrides (default/override/append modes)
    • Searches for anchor text in content (case-insensitive)
    • Wraps the first occurrence in an <a> tag, or inserts the link into a random paragraph if no match is found
  2. Homepage Links

    • Links to /index.html on the article's domain
    • Uses "Home" as anchor text
    • Searches for "Home" in article content or inserts it
  3. "See Also" Section

    • Added after the last </p> tag
    • Links to ALL other articles in the batch
    • Each link uses article title as anchor text
    • Formatted as <h3> + <ul> list
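The "See Also" step can be sketched in a few lines. This is a dependency-free illustration built on plain string operations rather than the BeautifulSoup code the module actually uses. The function names and the (content_id, title, url) tuple shape are hypothetical, and returning an empty section for a one-article batch is an assumption of this sketch.

```python
import html


def build_see_also_section(current_id, batch):
    """Return an <h3> + <ul> block linking to every other article in the batch.

    batch: list of (content_id, title, url) tuples (hypothetical shape).
    """
    items = [
        f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
        for content_id, title, url in batch
        if content_id != current_id  # never link an article to itself
    ]
    if not items:
        return ""  # assumption: a one-article batch gets no section
    return "<h3>See Also</h3>\n<ul>\n" + "\n".join(items) + "\n</ul>"


def insert_after_last_paragraph(content, block):
    """Insert block right after the last </p>; append at the end if there is none."""
    if not block:
        return content
    idx = content.lower().rfind("</p>")
    if idx == -1:
        return content + "\n" + block
    end = idx + len("</p>")
    return content[:end] + "\n" + block + content[end:]
```

Anchoring on the last </p> rather than the end of the string keeps the section above any trailing non-paragraph markup such as a footer div.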

Template Updates: Navigation Menu

Added a responsive navigation menu to all 4 templates (src/templating/templates/):

  • basic.html - Clean, simple nav with blue accents
  • modern.html - Gradient hover effects matching purple theme
  • classic.html - Serif font, muted brown colors
  • minimal.html - Uppercase, minimalist black & white

All templates now include:

<nav>
  <ul>
    <li><a href="/index.html">Home</a></li>
    <li><a href="about.html">About</a></li>
    <li><a href="privacy.html">Privacy</a></li>
    <li><a href="contact.html">Contact</a></li>
  </ul>
</nav>

Helper Functions

  • _inject_tiered_links() - Handles money site (T1) and lower-tier (T2+) links
  • _inject_homepage_link() - Injects "Home" link to /index.html
  • _inject_see_also_section() - Builds "See Also" section with batch links
  • _get_anchor_texts_for_tier() - Gets anchor text with job config overrides
  • _try_inject_link() - Tries to find/wrap anchor text or falls back to insertion
  • _find_and_wrap_anchor_text() - Case-insensitive search and wrap (first occurrence only)
  • _insert_link_into_random_paragraph() - Fallback insertion into random paragraph
  • _extract_homepage_url() - Extracts base domain URL
  • _extract_domain_name() - Extracts domain name (removes www.)
  • _insert_before_closing_tags() - Inserts content after the last </p> tag
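A minimal sketch of what _extract_homepage_url() and _extract_domain_name() are described as doing, using only urllib.parse. The names extract_homepage_url and extract_domain_name are illustrative stand-ins, and raising ValueError for relative URLs is an assumption of this sketch, not documented behavior.

```python
from urllib.parse import urlsplit


def extract_homepage_url(article_url):
    """Return scheme://host[:port]/index.html for the article's domain."""
    parts = urlsplit(article_url)
    if not parts.scheme or not parts.netloc:
        # Assumption: relative URLs are rejected rather than guessed at.
        raise ValueError(f"not an absolute URL: {article_url!r}")
    return f"{parts.scheme}://{parts.netloc}/index.html"


def extract_domain_name(url):
    """Return the lowercase host name with a leading 'www.' removed."""
    host = urlsplit(url).hostname or ""  # .hostname drops the port and lowercases
    return host[4:] if host.startswith("www.") else host
```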

Database Integration

All injected links are recorded in article_links table:

  • Tiered links: link_type="tiered", to_url (money site or lower tier URL)
  • Homepage links: link_type="homepage", to_url (domain/index.html)
  • See Also links: link_type="wheel_see_also", to_content_id (internal)
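To make the three link types concrete, here is a hypothetical record type mirroring the fields listed above. It is not the project's ArticleLink model: from_content_id and the validation rules are assumptions. Only the link_type values and the to_url / to_content_id split come from this summary.

```python
from dataclasses import dataclass
from typing import Optional

# link_type values from the summary; which target field each one uses follows the list above.
URL_TARGET_TYPES = {"tiered", "homepage"}
CONTENT_TARGET_TYPES = {"wheel_see_also"}


@dataclass(frozen=True)
class ArticleLinkRecord:
    """Hypothetical stand-in for one article_links row (not the real ArticleLink model)."""
    from_content_id: int
    link_type: str
    to_url: Optional[str] = None
    to_content_id: Optional[int] = None

    def __post_init__(self):
        # Reject records whose target field does not match their link type.
        if self.link_type in URL_TARGET_TYPES:
            if not self.to_url:
                raise ValueError(f"{self.link_type} links need to_url")
        elif self.link_type in CONTENT_TARGET_TYPES:
            if self.to_content_id is None:
                raise ValueError("wheel_see_also links need to_content_id")
        else:
            raise ValueError(f"unknown link_type: {self.link_type!r}")
```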

The modified HTML is saved to the generated_content.content field via content_repo.update().

Anchor Text Configuration

Supports three modes in job config:

{
  "anchor_text_config": {
    "mode": "default|override|append",
    "custom_text": ["anchor 1", "anchor 2", ...]
  }
}
  • default: Use tier-based anchors (T1: main keyword, T2: related searches, T3: main keyword, T4+: entities)
  • override: Replace defaults with custom_text
  • append: Add custom_text to defaults
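The three modes can be expressed as a small resolver. This sketch assumes job_config is the parsed JSON dict shown above and that default_anchors has already been produced for the article's tier (for example by anchor_text_generator.py). resolve_anchor_texts is a hypothetical name, and dropping duplicates in append mode is a choice made here, not documented behavior.

```python
def resolve_anchor_texts(default_anchors, job_config):
    """Apply anchor_text_config (default/override/append) to tier-based anchors."""
    config = (job_config or {}).get("anchor_text_config") or {}
    mode = config.get("mode", "default")
    custom = list(config.get("custom_text") or [])

    if mode == "default":
        return list(default_anchors)
    if mode == "override":
        return custom  # custom_text replaces the tier defaults entirely
    if mode == "append":
        merged = list(default_anchors)
        for text in custom:
            if text not in merged:  # sketch's choice: skip exact duplicates
                merged.append(text)
        return merged
    raise ValueError(f"unknown anchor_text_config mode: {mode!r}")
```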

Link Injection Rules

  1. Search for anchor text in content (case-insensitive; matches may fall inside longer phrases)
  2. Wrap the first occurrence with an <a> tag
  3. Skip existing links (never link text that is already inside an <a> tag)
  4. Fall back to insertion if the anchor text is not found
  5. In fallback mode, place the link at a random position
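These rules can be approximated without a parser. The sketch below is a stand-in for _find_and_wrap_anchor_text(), _insert_link_into_random_paragraph() and _try_inject_link(), built on several assumptions: it uses regular expressions over tags and text runs instead of BeautifulSoup, so it only suits well-formed article HTML; trying each anchor in order, then falling back with the first one, is an assumption; and the function names are hypothetical.

```python
import html
import random
import re

TAG_RE = re.compile(r"(<[^>]+>)")  # capturing group: after split, tags sit at odd indices
A_OPEN_RE = re.compile(r"<a\b", re.IGNORECASE)
A_CLOSE_RE = re.compile(r"</a\s*>", re.IGNORECASE)
P_RE = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)


def _link(text, url):
    return f'<a href="{html.escape(url)}">{text}</a>'


def find_and_wrap_anchor_text(content, anchor, url):
    """Wrap the first case-insensitive match of anchor outside existing <a> tags."""
    if not anchor.strip():
        return content, False
    pattern = re.compile(re.escape(anchor), re.IGNORECASE)
    tokens = TAG_RE.split(content)
    depth = 0  # number of open <a> elements around the current token
    for i, token in enumerate(tokens):
        if i % 2:  # a tag: track <a> nesting, never search inside markup
            if A_OPEN_RE.match(token):
                depth += 1
            elif A_CLOSE_RE.match(token):
                depth = max(0, depth - 1)
        elif depth == 0:  # a text run that is not already linked
            match = pattern.search(token)
            if match:
                # Keep the article's own casing, e.g. "Shaft Machining".
                tokens[i] = token[:match.start()] + _link(match.group(0), url) + token[match.end():]
                return "".join(tokens), True
    return content, False


def insert_link_into_random_paragraph(content, anchor, url, rng=random):
    """Fallback: insert the link at a random word gap of a random <p>."""
    paragraphs = list(P_RE.finditer(content))
    if not paragraphs:
        return content, False
    start, end = rng.choice(paragraphs).span(1)  # bounds of the paragraph's inner HTML
    inner = content[start:end]
    gaps = [len(inner)]  # the end of the paragraph is always a safe spot
    offset, depth = 0, 0
    for i, token in enumerate(TAG_RE.split(inner)):
        if i % 2:
            if A_OPEN_RE.match(token):
                depth += 1
            elif A_CLOSE_RE.match(token):
                depth = max(0, depth - 1)
        elif depth == 0:  # positions right after whitespace, outside links
            gaps.extend(offset + m.end() for m in re.finditer(r"\s+", token))
        offset += len(token)
    pos = rng.choice(gaps)
    link = _link(html.escape(anchor, quote=False), url)
    snippet = " " + link if pos == len(inner) else link + " "
    return content[:start + pos] + snippet + content[start + pos:], True


def try_inject_link(content, anchors, url, rng=random):
    """Try each anchor in order; if none is found, fall back to random insertion."""
    if not anchors:
        return content, None
    for anchor in anchors:
        new_content, found = find_and_wrap_anchor_text(content, anchor, url)
        if found:
            return new_content, anchor
    # Assumption: the fallback uses the first anchor in the list.
    new_content, inserted = insert_link_into_random_paragraph(content, anchors[0], url, rng)
    return new_content, (anchors[0] if inserted else None)
```

Passing a seeded random.Random as rng makes the fallback placement reproducible in tests.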

Testing

Unit Tests (33 tests in tests/unit/test_content_injection.py):

  • Homepage URL extraction
  • "See Also" section insertion
  • Anchor text finding and wrapping (case-insensitive, within phrases)
  • Link insertion into paragraphs
  • Anchor text config modes (default, override, append)
  • Tiered link injection (T1 money site, T2+ lower tier)
  • Error handling

Integration Tests (9 tests in tests/integration/test_content_injection_integration.py):

  • Full flow: T1 batch with money site links + See Also section
  • Homepage link injection
  • T2 batch linking to T1 articles
  • Anchor text config overrides (override/append modes)
  • Different batch sizes (1 article, 20 articles)
  • ArticleLink database records (all link types)
  • Internal vs external link handling

All 42 tests pass (33 unit + 9 integration).

Key Design Decisions

  1. "Home" for homepage links: Homepage links use "Home" as anchor text rather than the domain name, matching the "Home" entry now present in every template's navigation menu
  2. Homepage URL: Points to /index.html (not just /)
  3. Random selection: For T2+ articles, random selection of 2-4 lower-tier URLs to link to
  4. Case-insensitive matching: "Shaft Machining" matches "shaft machining"
  5. First occurrence only: Only link the first instance of anchor text to avoid over-optimization
  6. BeautifulSoup for HTML parsing: Edits the parsed document tree rather than raw strings, preserving structure and tolerating malformed HTML
  7. Fallback insertion: If anchor text not found, insert into random paragraph at random position
  8. See Also section: Simpler than wheel_next/wheel_prev links; every article links to every other article in the batch
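Tier-based target selection (T1 to the money site, T2+ to 2-4 random lower-tier URLs, e.g. Tier 1 articles for a Tier 2 article) might look like the following. It is a hypothetical helper: whether this selection happens inside inject_interlinks() or upstream in find_tiered_links() is not stated here, and min_links/max_links are illustrative parameters.

```python
import random


def select_tiered_targets(tier, money_site_url, lower_tier_urls, rng=random,
                          min_links=2, max_links=4):
    """Pick link targets for one article.

    Tier 1 links to the money site; tier 2+ links to a random subset of lower-tier URLs.
    """
    if tier == 1:
        return [money_site_url]
    if not lower_tier_urls:
        return []  # assumption: nothing to link to yet
    count = rng.randint(min_links, max_links)  # inclusive on both ends
    # Never ask for more URLs than exist; sample() picks without repeats.
    return rng.sample(lower_tier_urls, min(count, len(lower_tier_urls)))
```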

Files Modified

Created

  • src/interlinking/content_injection.py (410 lines)
  • tests/unit/test_content_injection.py (363 lines)
  • tests/integration/test_content_injection_integration.py (469 lines)

Modified

  • src/templating/templates/basic.html - Added navigation menu
  • src/templating/templates/modern.html - Added navigation menu
  • src/templating/templates/classic.html - Added navigation menu
  • src/templating/templates/minimal.html - Added navigation menu

Dependencies

  • BeautifulSoup4: HTML parsing and manipulation
  • Story 3.1: URL generation (uses generate_urls_for_batch())
  • Story 3.2: Tiered link finding (uses find_tiered_links())
  • Existing: anchor_text_generator.py for tier-based anchor text

Usage Example

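from src.interlinking.content_injection import inject_interlinks
from src.interlinking.tiered_links import find_tiered_links
from src.generation.url_generator import generate_urls_for_batch

# Assumes content_records, project, job_config and the repositories
# (site_repo, project_repo, content_repo, link_repo) are already loaded.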

# 1. Generate URLs for batch
article_urls = generate_urls_for_batch(content_records, site_repo)

# 2. Find tiered links
tiered_links = find_tiered_links(content_records, job_config, project_repo, content_repo, site_repo)

# 3. Inject all interlinks
inject_interlinks(
    content_records,
    article_urls,
    tiered_links,
    project,
    job_config,
    content_repo,
    link_repo
)

Next Steps

Story 3.3 is complete and ready for:

  • Story 4.x: Deployment (will use final HTML with all links)
  • Future: Analytics dashboard using article_links table
  • Future: Create About, Privacy, Contact pages to match nav menu links

Notes

  • Homepage links use "Home" anchor text, pointing to /index.html
  • All 4 templates now have consistent navigation structure
  • Link relationships fully tracked in database for analytics
  • Covered by 42 unit and integration tests