Story 4.5: Create URL and Link Reporting Script - Implementation Summary

Status: COMPLETE
Story Points: 3
Date Completed: October 22, 2025

Overview

Implemented a CLI command, get-links, that exports article URLs as CSV, optionally with link details (anchor text and destination URLs), filtered by project and tier. The data model was also extended to store anchor text directly in the database, so reports read it from the article_links table instead of parsing article HTML, and each link's exact anchor text is recorded.

Implementation

Core Features Implemented

  1. CLI Command: get-links

    • Location: src/cli/commands.py
    • Exports article URLs in CSV format
    • Required arguments:
      • --project-id / -p: Project ID to filter
      • --tier / -t: Tier filter (supports "1", "2", or "2+" for ranges)
    • Optional flags:
      • --with-anchor-text: Include anchor text used for tiered links
      • --with-destination-url: Include destination URL that the article links to
    • Output: CSV to stdout (can be redirected to file)
  2. Database Enhancement: anchor_text Field

    • Added anchor_text column to article_links table
    • Migration script: scripts/migrate_add_anchor_text.py
    • Updated ArticleLink model with new field
    • Updated ArticleLinkRepository.create() to accept anchor_text parameter
  3. Content Injection Updates

    • Modified src/interlinking/content_injection.py to capture and store actual anchor text used
    • Updated _try_inject_link() to return the anchor text that was successfully injected
    • All link creation calls now include anchor_text:
      • Tiered links (money site and lower tier)
      • Homepage links
      • See Also section links

Files Modified

Database Layer

  • src/database/models.py - Added anchor_text field to ArticleLink model
  • src/database/repositories.py - Updated ArticleLinkRepository.create()
  • scripts/migrate_add_anchor_text.py - New migration script

Business Logic

  • src/interlinking/content_injection.py:
    • Modified _try_inject_link() signature to return anchor text
    • Updated _inject_tiered_links() to capture anchor text
    • Updated _inject_homepage_link() to capture anchor text
    • Updated _inject_see_also_section() to store article titles as anchor text
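
The signature change described above (an injection helper that reports which anchor text it actually used) can be sketched roughly as follows. This is not the project's _try_inject_link(); the function name try_inject_link, its parameters, and the plain regex matching are assumptions made for illustration, and the sketch does not guard against matches inside existing tags or attributes.

```python
import html
import re
from typing import Optional, Sequence, Tuple


def try_inject_link(
    content: str, candidates: Sequence[str], url: str
) -> Tuple[str, Optional[str]]:
    """Wrap the first matching candidate phrase in a link.

    Hypothetical helper: returns (updated_content, anchor_text), where
    anchor_text is the phrase exactly as it appears in the content,
    or None when no candidate matched and nothing was changed.
    """
    for candidate in candidates:
        # Case-insensitive whole-phrase match; re.escape keeps any
        # punctuation in the candidate literal.
        pattern = re.compile(r"\b" + re.escape(candidate) + r"\b", re.IGNORECASE)
        match = pattern.search(content)
        if match is None:
            continue
        anchor_text = match.group(0)
        link = f'<a href="{html.escape(url, quote=True)}">{anchor_text}</a>'
        updated = content[: match.start()] + link + content[match.end():]
        return updated, anchor_text
    return content, None


updated, anchor = try_inject_link(
    "<p>We offer Expert Services daily.</p>",
    ["missing phrase", "expert services"],
    "https://www.moneysite.com",
)
print(anchor)
```

Returning the matched text rather than the candidate phrase means the caller can pass it straight to the repository's create() call as anchor_text, so the stored value reflects what ended up in the article.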

CLI

  • src/cli/commands.py:
    • Added get-links command
    • Simplified implementation (no HTML parsing needed)
    • Direct database read for anchor text

Tests

  • tests/integration/test_get_links_command.py - New comprehensive test suite (9 tests)

Documentation

  • docs/prd/epic-4-deployment.md - Updated Story 4.5 status to COMPLETE
  • docs/stories/story-3.2-find-tiered-links.md - Updated ArticleLink schema to include anchor_text field
  • docs/architecture/data-models.md - Added ArticleLink model documentation with anchor_text field
  • STORY_3.2_IMPLEMENTATION_SUMMARY.md - Updated schema to include anchor_text field

Usage Examples

Basic usage - get all tier 1 URLs

python main.py get-links --project-id 1 --tier 1

Get tier 2 and above with anchor text and destinations

python main.py get-links --project-id 1 --tier 2+ --with-anchor-text --with-destination-url

Export to file

python main.py get-links --project-id 1 --tier 1 --with-anchor-text > tier1_links.csv
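
The --tier values used in these commands ("1", "2", "2+") could be interpreted along the following lines. This is a minimal sketch, not the code in src/cli/commands.py: the function name parse_tier_spec and the choice to return a predicate over numeric tiers are assumptions made for illustration.

```python
import re
from typing import Callable


def parse_tier_spec(spec: str) -> Callable[[int], bool]:
    """Turn a tier specifier such as "1" or "2+" into a predicate.

    Hypothetical helper: "N" matches only tier N, "N+" matches tier N
    and every higher tier. Anything else raises ValueError.
    """
    match = re.fullmatch(r"(\d+)(\+?)", spec.strip())
    if match is None:
        raise ValueError(f"Invalid tier format: {spec!r} (expected e.g. '1' or '2+')")
    base = int(match.group(1))
    if base < 1:
        raise ValueError(f"Tier must be 1 or greater, got {base}")
    if match.group(2):
        return lambda tier: tier >= base
    return lambda tier: tier == base


matches_two_plus = parse_tier_spec("2+")
print([tier for tier in (1, 2, 3, 4) if matches_two_plus(tier)])
```

Rejecting malformed specifiers up front corresponds to the "invalid tier format" error case listed in the tests.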

CSV Output Format

Basic (no flags):

article_url,tier,title
https://example.com/article1.html,tier1,Article Title 1

With anchor text:

article_url,tier,title,anchor_text
https://example.com/article1.html,tier1,Article Title 1,expert services

With destination URL:

article_url,tier,title,destination_url
https://example.com/article1.html,tier1,Article Title 1,https://www.moneysite.com

With both flags:

article_url,tier,title,anchor_text,destination_url
https://example.com/article1.html,tier1,Article Title 1,expert services,https://www.moneysite.com
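
The four header variants above follow from appending optional columns to a fixed base set. A rough sketch of that logic with the standard csv module is shown below; the function name write_links_csv and the dictionary-shaped rows are assumptions for illustration, not the command's actual internals.

```python
import csv
import io
from typing import Iterable, Mapping, Optional, TextIO

BASE_COLUMNS = ["article_url", "tier", "title"]


def write_links_csv(
    rows: Iterable[Mapping[str, Optional[str]]],
    out: TextIO,
    with_anchor_text: bool = False,
    with_destination_url: bool = False,
) -> None:
    """Write rows as CSV, adding optional columns only when requested."""
    columns = list(BASE_COLUMNS)
    if with_anchor_text:
        columns.append("anchor_text")
    if with_destination_url:
        columns.append("destination_url")
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing or NULL values (e.g. anchor_text on links created
        # before the migration) are written as empty fields.
        writer.writerow({col: row.get(col) or "" for col in columns})


rows = [
    {
        "article_url": "https://example.com/article1.html",
        "tier": "tier1",
        "title": "Article Title 1",
        "anchor_text": "expert services",
        "destination_url": "https://www.moneysite.com",
    }
]
buffer = io.StringIO()
write_links_csv(rows, buffer, with_anchor_text=True, with_destination_url=True)
print(buffer.getvalue(), end="")
```

Using a csv writer rather than joining strings by hand keeps titles or anchor text that contain commas or quotes correctly escaped.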

Testing

Test Coverage: 9 integration tests, all passing

Test Cases:

  1. Basic tier 1 export (no optional flags)
  2. Tier range filter (2+)
  3. Export with anchor text
  4. Export with destination URL
  5. Export with both flags
  6. Tier 2 export resolves to_content_id to the deployed URL
  7. Error handling - invalid project
  8. Error handling - invalid tier format
  9. Error handling - no deployed articles

Database Enhancement Benefits

The addition of the anchor_text field to the article_links table provides:

  1. Performance: No HTML parsing required - direct database read
  2. Data Integrity: Know exactly what anchor text was used for each link
  3. Auditability: Track link relationships and their anchor text
  4. Simplicity: Cleaner code without BeautifulSoup HTML parsing in CLI

Migration

To apply the database changes to existing databases:

python scripts/migrate_add_anchor_text.py

To roll back:

python scripts/migrate_add_anchor_text.py rollback

Note: Existing links will have NULL anchor_text. Re-run content injection to populate this field for existing content.
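
For readers who want a feel for what such a migration involves, here is a minimal sketch. It assumes a SQLite database, which this summary does not confirm, and it is not the contents of scripts/migrate_add_anchor_text.py; the function name add_anchor_text_column and the demo table layout are illustrative only.

```python
import sqlite3


def add_anchor_text_column(conn: sqlite3.Connection) -> bool:
    """Add a nullable anchor_text column to article_links if missing.

    Returns True when the column was added and False when it already
    existed, so the migration is safe to run more than once.
    """
    # PRAGMA table_info yields one row per column; index 1 is the name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(article_links)")}
    if "anchor_text" in existing:
        return False
    conn.execute("ALTER TABLE article_links ADD COLUMN anchor_text TEXT")
    conn.commit()
    return True


# Demo against an in-memory database with a simplified, assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article_links (id INTEGER PRIMARY KEY, to_content_id INTEGER)")
conn.execute("INSERT INTO article_links (to_content_id) VALUES (42)")
print(add_anchor_text_column(conn))
print(conn.execute("SELECT anchor_text FROM article_links").fetchall())
```

Because the new column is nullable with no default, rows that existed before the migration read back as NULL, which is the behavior the note above describes.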

Acceptance Criteria - Verification

  • A new CLI command get-links is created
  • The script accepts a mandatory project_id
  • The script accepts a tier specifier supporting single tier and ranges (e.g., "2+")
  • Optional flag --with-anchor-text includes the anchor text
  • Optional flag --with-destination-url includes the destination URL
  • The script queries the database to retrieve link information
  • The output is well-formatted CSV printed to stdout

Known Limitations

  • Only reports tiered links (homepage and See Also links are excluded)
  • Existing article_links records created before migration will have NULL anchor_text
  • CSV output goes to stdout only (user must redirect to file)

Future Enhancements

Potential improvements for future stories:

  • Add --link-type flag to filter by link type (tiered, homepage, wheel_see_also)
  • Add --output flag to write directly to file
  • Add JSON output format option
  • Add summary statistics (total links, link types breakdown)