Story 4.5: Create URL and Link Reporting Script - Implementation Summary
Status: ✅ COMPLETE
Story Points: 3
Date Completed: October 22, 2025
Overview
Implemented a CLI command that exports article URLs, optionally with link details (anchor text and destination URL), filtered by project and tier. The data model was also enhanced to store anchor text directly in the database, for better performance and data integrity.
Implementation
Core Features Implemented
- CLI Command: get-links
  - Location: src/cli/commands.py
  - Exports article URLs in CSV format
  - Required arguments:
    - --project-id/-p: Project ID to filter
    - --tier/-t: Tier filter (supports "1", "2", or "2+" for ranges)
  - Optional flags:
    - --with-anchor-text: Include the anchor text used for tiered links
    - --with-destination-url: Include the destination URL that the article links to
  - Output: CSV to stdout (can be redirected to a file)
- Database Enhancement: anchor_text Field
  - Added anchor_text column to article_links table
  - Migration script: scripts/migrate_add_anchor_text.py
  - Updated ArticleLink model with new field
  - Updated ArticleLinkRepository.create() to accept anchor_text parameter
- Content Injection Updates
  - Modified src/interlinking/content_injection.py to capture and store the actual anchor text used
  - Updated _try_inject_link() to return the anchor text that was successfully injected
  - All link creation calls now include anchor_text:
    - Tiered links (money site and lower tier)
    - Homepage links
    - See Also section links
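The --tier specifier can be turned into a simple predicate over tier numbers. The sketch below is a minimal, hypothetical version: parse_tier_spec and tier_number are illustrative names (not functions from src/cli/commands.py), and it assumes tiers are stored as labels such as "tier1" and "tier2", as they appear in the CSV output.

```python
import re
from typing import Callable

# Hypothetical grammar for --tier: "N" selects exactly tier N, "N+" selects tier N and above.
_TIER_SPEC = re.compile(r"(\d+)(\+?)")


def parse_tier_spec(spec: str) -> Callable[[int], bool]:
    """Turn a --tier value such as '1', '2' or '2+' into a predicate on tier numbers."""
    match = _TIER_SPEC.fullmatch(spec.strip())
    if match is None or int(match.group(1)) < 1:
        raise ValueError(f"Invalid tier format: {spec!r} (expected e.g. '1', '2' or '2+')")
    base = int(match.group(1))
    if match.group(2) == "+":
        return lambda tier: tier >= base
    return lambda tier: tier == base


def tier_number(label: str) -> int:
    """Convert a stored tier label such as 'tier2' into 2 (the label format is assumed)."""
    if not (label.startswith("tier") and label[4:].isdigit()):
        raise ValueError(f"Unexpected tier label: {label!r}")
    return int(label[4:])
```

For example, `parse_tier_spec("2+")` accepts tiers 2, 3 and so on, while `parse_tier_spec("1")` accepts only tier 1. Raising ValueError for malformed input gives the command one place to report the "invalid tier format" error case.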
Files Modified
Database Layer
- src/database/models.py - Added anchor_text field to ArticleLink model
- src/database/repositories.py - Updated ArticleLinkRepository.create()
- scripts/migrate_add_anchor_text.py - New migration script
Business Logic
- src/interlinking/content_injection.py:
  - Modified _try_inject_link() signature to return anchor text
  - Updated _inject_tiered_links() to capture anchor text
  - Updated _inject_homepage_link() to capture anchor text
  - Updated _inject_see_also_section() to store article titles as anchor text
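Having _try_inject_link() hand back the phrase it actually linked is what lets each ArticleLink row record its real anchor text. A minimal sketch of that idea follows; try_inject_link, its candidate-phrase list and the regex-based replacement are assumptions for illustration, not the actual signature or logic in content_injection.py.

```python
import re
from typing import Optional, Sequence, Tuple


def try_inject_link(html: str, candidates: Sequence[str], url: str) -> Tuple[str, Optional[str]]:
    """Link the first candidate phrase found in the HTML (hypothetical helper).

    Returns (new_html, anchor_text). anchor_text is None when nothing was injected,
    so the caller knows whether a link record should be created at all.
    """
    for phrase in candidates:
        # Whole-word, case-insensitive match. Naive: it does not skip text that is
        # already inside a tag attribute or an existing <a> element.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        match = pattern.search(html)
        if match:
            anchor = match.group(0)  # keep the casing as it appears in the article
            link = f'<a href="{url}">{anchor}</a>'
            return html[: match.start()] + link + html[match.end():], anchor
    return html, None
```

The caller would create the link record only when the returned anchor text is not None and pass it through as anchor_text (for example to ArticleLinkRepository.create(), whose full parameter list is not shown in this summary).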
CLI
- src/cli/commands.py:
  - Added get-links command
  - Simplified implementation (no HTML parsing needed)
  - Direct database read for anchor text
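Because anchor text now lives in article_links, the command can gather everything it needs with a single query. The sketch below assumes a SQLite database and a simplified, hypothetical schema: only to_content_id and anchor_text come from this summary, while deployed_url, from_content_id, to_url and link_type are illustrative column names. Destinations resolve to_content_id to the target article's deployed URL and fall back to a stored URL for money-site links; tier filtering is omitted for brevity.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, simplified schema; the real models live in src/database/models.py.
SCHEMA = """
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    tier TEXT NOT NULL,            -- e.g. 'tier1', 'tier2'
    title TEXT NOT NULL,
    deployed_url TEXT              -- NULL until the article is deployed
);
CREATE TABLE article_links (
    id INTEGER PRIMARY KEY,
    from_content_id INTEGER NOT NULL REFERENCES articles(id),
    to_content_id INTEGER REFERENCES articles(id),  -- link to another article
    to_url TEXT,                                     -- link to an external URL (money site)
    link_type TEXT NOT NULL,                         -- 'tiered', 'homepage', 'wheel_see_also'
    anchor_text TEXT                                 -- NULL for pre-migration rows
);
"""

# One row per tiered link from a deployed article in the given project.
QUERY = """
SELECT a.deployed_url AS article_url,
       a.tier,
       a.title,
       l.anchor_text,
       COALESCE(target.deployed_url, l.to_url) AS destination_url
FROM articles AS a
JOIN article_links AS l
  ON l.from_content_id = a.id AND l.link_type = 'tiered'
LEFT JOIN articles AS target
  ON target.id = l.to_content_id
WHERE a.project_id = ? AND a.deployed_url IS NOT NULL
ORDER BY a.id, l.id
"""


def fetch_tiered_links(conn: sqlite3.Connection, project_id: int) -> List[Tuple]:
    """Read anchor text and destinations directly from the database (no HTML parsing)."""
    return conn.execute(QUERY, (project_id,)).fetchall()
```

Each returned tuple is (article_url, tier, title, anchor_text, destination_url). Homepage and See Also links drop out through the link_type condition in the join.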
Tests
- tests/integration/test_get_links_command.py - New comprehensive test suite (9 tests)
Documentation
- docs/prd/epic-4-deployment.md - Updated Story 4.5 status to COMPLETE
- docs/stories/story-3.2-find-tiered-links.md - Updated ArticleLink schema to include anchor_text field
- docs/architecture/data-models.md - Added ArticleLink model documentation with anchor_text field
- STORY_3.2_IMPLEMENTATION_SUMMARY.md - Updated schema to include anchor_text field
Usage Examples
Basic usage - get all tier 1 URLs
python main.py get-links --project-id 1 --tier 1
Get tier 2 and above with anchor text and destinations
python main.py get-links --project-id 1 --tier 2+ --with-anchor-text --with-destination-url
Export to file
python main.py get-links --project-id 1 --tier 1 --with-anchor-text > tier1_links.csv
CSV Output Format
Basic (no flags):
article_url,tier,title
https://example.com/article1.html,tier1,Article Title 1
With anchor text:
article_url,tier,title,anchor_text
https://example.com/article1.html,tier1,Article Title 1,expert services
With destination URL:
article_url,tier,title,destination_url
https://example.com/article1.html,tier1,Article Title 1,https://www.moneysite.com
With both flags:
article_url,tier,title,anchor_text,destination_url
https://example.com/article1.html,tier1,Article Title 1,expert services,https://www.moneysite.com
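Writing the output with Python's csv module keeps the column order shown in the examples above and automatically quotes any title or anchor text that contains a comma. write_links_csv and its dict-based rows are a hypothetical sketch, not the command's actual code.

```python
import csv
import sys
from typing import Iterable, Mapping, Optional, TextIO


def write_links_csv(
    rows: Iterable[Mapping[str, Optional[str]]],
    with_anchor_text: bool = False,
    with_destination_url: bool = False,
    out: Optional[TextIO] = None,
) -> None:
    """Write get-links style CSV; optional columns are appended in a fixed order."""
    if out is None:
        # Looked up at call time, so a reassigned sys.stdout (e.g. in tests) is honoured.
        out = sys.stdout
    columns = ["article_url", "tier", "title"]
    if with_anchor_text:
        columns.append("anchor_text")
    if with_destination_url:
        columns.append("destination_url")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Missing values (e.g. NULL anchor_text on pre-migration links) become empty cells.
        writer.writerow([row.get(col) or "" for col in columns])
```

Calling `write_links_csv(rows, with_anchor_text=True)` prints to stdout, so the `> tier1_links.csv` redirection from the usage examples works unchanged.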
Testing
Test Coverage: 9 integration tests, all passing
Test Cases:
- Basic tier 1 export (no optional flags)
- Tier range filter (2+)
- Export with anchor text
- Export with destination URL
- Export with both flags
- Tier 2 export resolves to_content_id to the deployed URL
- Error handling - invalid project
- Error handling - invalid tier format
- Error handling - no deployed articles
Database Enhancement Benefits
The addition of the anchor_text field to the article_links table provides:
- Performance: No HTML parsing required - direct database read
- Data Integrity: Know exactly what anchor text was used for each link
- Auditability: Track link relationships and their anchor text
- Simplicity: Cleaner code without BeautifulSoup HTML parsing in CLI
Migration
To apply the database changes to existing databases:
python scripts/migrate_add_anchor_text.py
To rollback:
python scripts/migrate_add_anchor_text.py rollback
Note: Existing links will have NULL anchor_text. Re-run content injection to populate this field for existing content.
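The migration only needs to add one nullable column, so it can be made safe to run more than once. A minimal sketch follows, assuming a SQLite database; the real scripts/migrate_add_anchor_text.py may differ, and has_column(), main() and its db_path argument are hypothetical.

```python
import sqlite3
from typing import List


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    # The table name is interpolated directly, so only pass trusted, hard-coded names.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def migrate(conn: sqlite3.Connection) -> None:
    """Add a nullable anchor_text column; does nothing if it already exists."""
    if not has_column(conn, "article_links", "anchor_text"):
        # Existing rows get NULL, which is why older links need content re-injection.
        conn.execute("ALTER TABLE article_links ADD COLUMN anchor_text TEXT")
        conn.commit()


def rollback(conn: sqlite3.Connection) -> None:
    """Drop the column again; ALTER TABLE ... DROP COLUMN needs SQLite 3.35.0 or newer."""
    if sqlite3.sqlite_version_info < (3, 35, 0):
        raise RuntimeError("DROP COLUMN requires SQLite 3.35.0 or newer")
    if has_column(conn, "article_links", "anchor_text"):
        conn.execute("ALTER TABLE article_links DROP COLUMN anchor_text")
        conn.commit()


def main(argv: List[str], db_path: str) -> None:
    """Hypothetical entry point: 'rollback' as the first argument reverses the migration."""
    conn = sqlite3.connect(db_path)
    try:
        if argv[1:2] == ["rollback"]:
            rollback(conn)
        else:
            migrate(conn)
    finally:
        conn.close()
```

Checking for the column first means an accidental second run is harmless instead of failing with a "duplicate column" error.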
Acceptance Criteria - Verification
✅ A new CLI command get-links is created
✅ The script accepts a mandatory project_id
✅ The script accepts a tier specifier supporting single tier and ranges (e.g., "2+")
✅ Optional flag --with-anchor-text includes the anchor text
✅ Optional flag --with-destination-url includes the destination URL
✅ The script queries the database to retrieve link information
✅ The output is well-formatted CSV printed to stdout
Known Limitations
- Only reports tiered links (homepage and See Also links are excluded)
- Existing article_links records created before the migration will have NULL anchor_text
- CSV output goes to stdout only (the user must redirect it to a file)
Future Enhancements
Potential improvements for future stories:
- Add --link-type flag to filter by link type (tiered, homepage, wheel_see_also)
- Add --output flag to write directly to a file
- Add JSON output format option
- Add summary statistics (total links, link types breakdown)