Story 3.3: Content Interlinking Injection
Status
Pending - Ready to Implement
Summary
This story injects three types of links into article HTML:
- Tiered Links - T1 articles link to money site, T2+ link to lower-tier articles
- Homepage Links - Link to the site's homepage (base domain)
- "See Also" Section - Links to all other articles in the batch
Uses existing anchor_text_generator.py for tier-based anchor text with support for job config overrides (default/override/append modes).
Story
As a developer, I want to inject all required links (batch "wheel", home page, and tiered/money site) into each new article's HTML content, so that the articles are fully interlinked and ready for deployment.
Context
- Story 3.1 generates final URLs for all articles in the batch
- Story 3.2 finds the required tiered links (money site or lower-tier URLs)
- Articles have raw HTML content from Epic 2 (h2, h3, p tags)
- Project contains anchor text lists for each tier
- Articles need wheel links (the "See Also" section), homepage links, and tiered links
Acceptance Criteria
Core Functionality
- A function takes raw HTML content, URL list, tiered links, and project data
- Wheel Links: Each article gets a "See Also" section that links to every other article in the batch (this replaces the earlier circular next/previous wheel design)
- Homepage Links: Each article gets a link to its site's homepage
- Tiered Links: Articles get links based on their tier
- Tier 1: Links to money site using T1 anchor text
- Tier 2+: Links to lower-tier articles using appropriate tier anchor text
Input Requirements
- Raw HTML content (from Epic 2)
- List of article URLs with titles (from Story 3.1)
- Tiered links object (from Story 3.2)
- Project data (for anchor text lists)
- Batch tier information
Output Requirements
- Final HTML content with all links injected
- Updated content stored in database
- Link relationships recorded in the article_links table
Implementation Details
Anchor Text Generation
RESOLVED: Use existing src/interlinking/anchor_text_generator.py with job config overrides
- Default tier-based anchor text:
- Tier 1: Uses main keyword variations
- Tier 2: Uses related searches
- Tier 3: Uses main keyword variations
- Tier 4+: Uses entities
- Job config overrides via anchor_text_config:
- mode: "default" - Use tier-based defaults
- mode: "override" - Replace defaults with custom_text list
- mode: "append" - Add custom_text to tier-based defaults
- Import and use the get_anchor_text_for_tier() function
Homepage URL Generation
RESOLVED: Strip the whole path from the article URL, keeping only the scheme and domain
- Example: https://site.com/article-slug.html → https://site.com/
- Use base domain as homepage URL
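The base-domain rule above can be sketched with the standard library. This is a minimal sketch, not the project's implementation: the name extract_base_url comes from this story, but its body, and the choice to raise on non-absolute URLs, are assumptions.

```python
from urllib.parse import urlsplit


def extract_base_url(article_url: str) -> str:
    """Return scheme + host with a trailing slash, dropping path, query and fragment."""
    parts = urlsplit(article_url)
    if not parts.scheme or not parts.netloc:
        # Assumption: relative or malformed URLs are treated as an error.
        raise ValueError(f"Not an absolute URL: {article_url!r}")
    return f"{parts.scheme}://{parts.netloc}/"


print(extract_base_url("https://site.com/article-slug.html"))  # https://site.com/
```

Using urlsplit rather than string slicing keeps nested paths and query strings from leaking into the homepage URL.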
Link Placement Strategy
Tiered Links (Money Site / Lower Tier)
- First Priority: Find anchor text already in the document
- Search for anchor text in HTML content
- Add link to FIRST match only (prevent duplicate links)
- Case-insensitive matching
- Fallback: If anchor text not found in document
- Insert anchor text into a sentence in the article
- Make it a link to the target URL
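One possible way to implement the "first match only, case-insensitive" rule is sketched below. The function name wrap_first_occurrence is used later in this story; the body is an assumption. It splits the HTML on tags with a regular expression, so it only searches visible text and skips text that already sits inside an existing link. A real HTML parser may be a better fit for malformed input.

```python
import re
from html import escape

# The capturing group keeps the tags in the split result: [text, tag, text, tag, ...]
_TAG_SPLIT = re.compile(r"(<[^>]+>)")


def wrap_first_occurrence(html_content: str, anchor_text: str, target_url: str) -> str:
    """Link the first case-insensitive match of anchor_text found in visible text.

    Returns the content unchanged when there is no match.
    """
    if not anchor_text:
        return html_content
    pattern = re.compile(re.escape(anchor_text), re.IGNORECASE)
    pieces = _TAG_SPLIT.split(html_content)
    inside_anchor = False
    for i, piece in enumerate(pieces):
        if i % 2 == 1:  # odd indices are tags, never searched
            if re.match(r"<a[\s>]", piece, re.IGNORECASE):
                inside_anchor = True
            elif re.match(r"</a\s*>", piece, re.IGNORECASE):
                inside_anchor = False
            continue
        if inside_anchor:
            continue  # do not nest a link inside an existing link
        match = pattern.search(piece)
        if match:
            href = escape(target_url, quote=True)
            linked = f'<a href="{href}">{match.group(0)}</a>'  # keeps original casing
            pieces[i] = piece[: match.start()] + linked + piece[match.end():]
            return "".join(pieces)
    return html_content
```

Note that this sketch will not match anchor text that is split across tags (for example, partly inside a bold element) or stored as HTML entities.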
Wheel Links (See Also Section)
- Add a "See Also" section after the last paragraph
- Format as heading + unordered list
- Include ALL other articles in the batch (excluding current article)
- Each list item is an article title as a link
- Example:
<h3>See Also</h3>
<ul>
<li><a href="url1">Article Title 1</a></li>
<li><a href="url2">Article Title 2</a></li>
<li><a href="url3">Article Title 3</a></li>
</ul>
Homepage Links
- Same as tiered links: find anchor text in content or insert it
- Link to site homepage (base domain)
Implementation Approach
Function Signature
def inject_interlinks(
    content_records: List[GeneratedContent],
    article_urls: List[Dict],  # [{content_id, title, url}, ...]
    tiered_links: Dict,  # From Story 3.2
    project: Project,
    content_repo: GeneratedContentRepository,
    link_repo: ArticleLinkRepository,
) -> None:
    """Inject all links and update the content in the database."""
    ...
Processing Flow
- For each article in the batch:
a. Load its raw HTML content
b. Generate tier-appropriate anchor text using get_anchor_text_for_tier()
c. Inject tiered links (money site or lower tier)
d. Inject homepage link
e. Inject wheel links ("See Also" section)
f. Update content in database
g. Record all links in the article_links table
Link Injection Details
Tiered Link Injection
# Get anchor text for this tier
from src.interlinking.anchor_text_generator import get_anchor_text_for_tier

# Get default tier-based anchor text
default_anchors = get_anchor_text_for_tier(tier, project, count=5)

# Apply job config overrides if present
config = job_config.anchor_text_config
if config and config.mode == "override":
    anchor_texts = config.custom_text or default_anchors
elif config and config.mode == "append":
    anchor_texts = default_anchors + (config.custom_text or [])
else:  # "default" mode or no config
    anchor_texts = default_anchors

# Link the FIRST anchor text that already appears in the content
for anchor_text in anchor_texts:
    if anchor_text.lower() in html_content.lower():  # case-insensitive match
        # Wrap FIRST occurrence with link
        html_content = wrap_first_occurrence(html_content, anchor_text, target_url)
        break
else:
    # No anchor text matched: insert one (the first candidate) with its link into a paragraph
    if anchor_texts:
        html_content = insert_link_into_content(html_content, anchor_texts[0], target_url)
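The fallback helper insert_link_into_content is named above but not specified. A minimal sketch follows, assuming the link is added as a short extra sentence at the end of the first paragraph. The sentence template ("Related reading: ...") and the choice of the first paragraph are hypothetical, not decisions made in this story.

```python
import re
from html import escape

_CLOSING_P = re.compile(r"</p\s*>", re.IGNORECASE)


def insert_link_into_content(html_content: str, anchor_text: str, target_url: str) -> str:
    """Insert a linked anchor text before the first closing </p> tag."""
    href = escape(target_url, quote=True)
    link = f'<a href="{href}">{escape(anchor_text)}</a>'
    sentence = f"Related reading: {link}."  # hypothetical template
    match = _CLOSING_P.search(html_content)
    if match is None:
        # No paragraph found (e.g. empty content): add a new paragraph instead.
        return f"{html_content}<p>{sentence}</p>"
    return html_content[: match.start()] + " " + sentence + html_content[match.start():]
```

A production version would likely vary the sentence wording and pick a paragraph deeper in the article so the inserted text reads naturally.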
Homepage Link Injection
# Derive homepage URL
homepage_url = extract_base_url(article_url) # https://site.com/article.html → https://site.com/
# Use main keyword as anchor text
anchor_text = project.main_keyword
# Find or insert link (same strategy as tiered links)
Wheel Link Injection
# Build "See Also" section with ALL other articles in batch
other_articles = [a for a in article_urls if a['content_id'] != current_article.id]
see_also_html = "<h3>See Also</h3>\n<ul>\n"
for article in other_articles:
    see_also_html += f' <li><a href="{article["url"]}">{article["title"]}</a></li>\n'
see_also_html += "</ul>\n"
# Append after last paragraph (before closing tags)
html_content = insert_before_closing_tags(html_content, see_also_html)
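The helper insert_before_closing_tags is referenced above without a definition. The sketch below is one assumed interpretation: if the content ends with a run of closing wrapper tags, the fragment goes just before that run; otherwise (bare h2/h3/p content from Epic 2) it is appended. The list of wrapper tags is a guess.

```python
import re

# Trailing run of closing wrapper tags, e.g. "</article></body></html>", with optional whitespace.
_TRAILING_CLOSERS = re.compile(
    r"(?:\s*</(?:article|section|main|div|body|html)\s*>)+\s*\Z",
    re.IGNORECASE,
)


def insert_before_closing_tags(html_content: str, fragment: str) -> str:
    """Place fragment after the last content element but before trailing wrapper closers."""
    match = _TRAILING_CLOSERS.search(html_content)
    if match is None:
        # Bare fragment with no wrapper: simply append.
        return html_content.rstrip() + "\n" + fragment
    return html_content[: match.start()] + "\n" + fragment + html_content[match.start():]
```

For example, calling it on `<article><p>A</p></article>` places the "See Also" block inside the article element rather than after it.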
Database Updates
- Update GeneratedContent.content with final HTML
- Create ArticleLink records for all injected links:
- link_type="tiered" for money site / lower tier links
- link_type="homepage" for homepage links
- link_type="wheel_see_also" for "See Also" section links
- Track both internal (to_content_id) and external (to_url) links
Note: The "See Also" section replaces the previous wheel_next/wheel_prev concept. Each article links to all other articles in the batch via the "See Also" section.
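To make the link bookkeeping concrete, here is a rough in-memory sketch of the records that would be written. The link_type values and the to_content_id / to_url fields come from this story; the LinkRecord class, the from_content_id field, and the rule that exactly one target is set per record are assumptions, not the actual ArticleLink model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

LINK_TYPES = {"tiered", "homepage", "wheel_see_also"}


@dataclass(frozen=True)
class LinkRecord:  # hypothetical stand-in for an ArticleLink row
    from_content_id: int
    link_type: str
    to_content_id: Optional[int] = None  # internal target (another article)
    to_url: Optional[str] = None         # external target (money site, homepage)

    def __post_init__(self) -> None:
        if self.link_type not in LINK_TYPES:
            raise ValueError(f"Unknown link_type: {self.link_type!r}")
        # Assumption: a link is either internal or external, never both.
        if (self.to_content_id is None) == (self.to_url is None):
            raise ValueError("Set exactly one of to_content_id or to_url")


def see_also_records(current_id: int, article_urls: List[Dict]) -> List[LinkRecord]:
    """One wheel_see_also record per other article in the batch."""
    return [
        LinkRecord(current_id, "wheel_see_also", to_content_id=a["content_id"])
        for a in article_urls
        if a["content_id"] != current_id
    ]
```

In a batch of N articles this yields N - 1 "See Also" records per article, plus one homepage record and the tiered records.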
Tasks / Subtasks
1. Create Content Injection Module
Effort: 3 story points
- Create src/interlinking/content_injection.py
- Implement inject_interlinks() main function
- Implement "See Also" section builder (all batch articles)
- Implement homepage URL extraction (base domain)
- Implement tiered link injection with anchor text matching
2. Anchor Text Processing
Effort: 2 story points
- Import get_anchor_text_for_tier() from existing module
- Apply job config anchor_text_config overrides (default/override/append)
- Implement case-insensitive anchor text search in HTML
- Wrap first occurrence of anchor text with link
- Implement fallback: insert anchor text + link if not found in content
3. HTML Link Injection
Effort: 2 story points
- Implement safe HTML parsing (avoid breaking existing tags)
- Implement link insertion before closing article/body tags
- Ensure proper link formatting (<a href="...">text</a>)
- Handle edge cases (empty content, malformed HTML)
- Preserve HTML structure and formatting
4. Database Integration
Effort: 2 story points
- Update GeneratedContent.content with final HTML
- Create ArticleLink records for all links
- Handle both internal (content_id) and external (URL) links
- Ensure proper link type categorization
5. Unit Tests
Effort: 3 story points
- Test "See Also" section generation (all batch articles)
- Test homepage URL extraction (remove the path after the domain's /)
- Test tiered link injection for T1 (money site) and T2+ (lower tier)
- Test anchor text config modes: default, override, append
- Test case-insensitive anchor text matching (first occurrence only)
- Test fallback anchor text insertion when not found in content
- Test HTML structure preservation after link injection
- Test database record creation (ArticleLink for all link types)
- Test with different tier configurations (T1, T2, T3, T4+)
6. Integration Tests
Effort: 2 story points
- Test full flow: Story 3.1 URLs → Story 3.2 tiered links → Story 3.3 injection
- Test with different batch sizes (5, 10, 20 articles)
- Test with various HTML content structures
- Verify link relationships in article_links table
- Test with different tiers and project configurations
- Verify final HTML is deployable (well-formed)
Dependencies
- Story 3.1: URL generation must be complete
- Story 3.2: Tiered link finding must be complete
- Story 2.3: Generated content must exist
- Story 1.x: Project and database models must exist
Future Considerations
- Story 4.x will use the final HTML content for deployment
- Analytics dashboard will use article_links data
- Future: Advanced link placement strategies
- Future: Link density optimization
Total Effort
14 story points
Technical Notes
Existing Code to Use
# Use existing anchor text generator
from src.interlinking.anchor_text_generator import get_anchor_text_for_tier

# Example usage - Default tier-based
default_anchors = get_anchor_text_for_tier("tier1", project, count=5)
anchor_texts = default_anchors
# Returns: ["shaft machining", "learn about shaft machining", "shaft machining guide", ...]

# Example usage - With job config override
if job_config.anchor_text_config:
    if job_config.anchor_text_config.mode == "override":
        anchor_texts = job_config.anchor_text_config.custom_text
        # Returns: ["click here for more info", "learn more about this topic", ...]
    elif job_config.anchor_text_config.mode == "append":
        anchor_texts = default_anchors + job_config.anchor_text_config.custom_text
        # Returns: ["shaft machining", "learn about...", "click here...", ...]
Anchor Text Configuration (Job Config)
Job configuration supports three modes for anchor text:
{
  "anchor_text_config": {
    "mode": "default|override|append",
    "custom_text": ["anchor 1", "anchor 2", ...]
  }
}
Modes:
- default: Use tier-based anchor text from anchor_text_generator.py
- override: Replace tier-based anchors with custom_text list
- append: Add custom_text to tier-based anchors
Example - Override Mode:
{
  "anchor_text_config": {
    "mode": "override",
    "custom_text": [
      "click here for more info",
      "learn more about this topic",
      "discover the best practices"
    ]
  }
}
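As an illustration of how the three modes could be resolved from a parsed job config, here is a small sketch. It assumes the config arrives as a plain dict (the snippets in this story use attribute access instead) and the helper name resolve_anchor_texts is hypothetical. The fallback to defaults when override has no custom_text mirrors the tiered link injection snippet.

```python
import json
from typing import List, Optional

VALID_MODES = ("default", "override", "append")


def resolve_anchor_texts(default_anchors: List[str], config: Optional[dict]) -> List[str]:
    """Combine tier-based defaults with the job's anchor_text_config."""
    if not config:
        return list(default_anchors)
    mode = config.get("mode", "default")
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown anchor text mode: {mode!r}")
    custom = config.get("custom_text") or []
    if mode == "override":
        # Fall back to the defaults when no custom text is supplied.
        return list(custom) or list(default_anchors)
    if mode == "append":
        return list(default_anchors) + list(custom)
    return list(default_anchors)


job_config = json.loads(
    '{"anchor_text_config": {"mode": "override",'
    ' "custom_text": ["click here for more info", "learn more about this topic"]}}'
)
defaults = ["shaft machining", "shaft machining guide"]
print(resolve_anchor_texts(defaults, job_config["anchor_text_config"]))
# ['click here for more info', 'learn more about this topic']
```

Rejecting unknown modes early makes a typo in the job config fail loudly instead of silently falling back to defaults.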
Link Injection Rules
- One link per anchor text - Only link the FIRST occurrence
- Case-insensitive search - Match "Shaft Machining" with "shaft machining"
- Preserve HTML structure - Don't break existing tags
- Fallback insertion - If anchor text not in content, insert it naturally
- Config overrides - Job config can override/append to tier-based defaults
"See Also" Section Format
<!-- Appended after last paragraph -->
<h3>See Also</h3>
<ul>
<li><a href="https://site1.com/article1.html">Article Title 1</a></li>
<li><a href="https://site2.com/article2.html">Article Title 2</a></li>
<li><a href="https://site3.com/article3.html">Article Title 3</a></li>
</ul>
Homepage URL Examples
https://example.com/article-slug.html → https://example.com/
https://site.b-cdn.net/my-article.html → https://site.b-cdn.net/
https://www.custom.com/path/to/article.html → https://www.custom.com/
Notes
This story reuses the existing tier-based anchor text generation. There is no need to implement anchor text logic from scratch: import and use the existing functions, which already handle the tier-specific edge cases.