# Story 3.3: Content Interlinking Injection
## Status
✅ **COMPLETE** - Implemented, Integrated, and Tested
## Summary
This story injects three types of links into article HTML:
1. **Tiered Links** - T1 articles link to money site, T2+ link to lower-tier articles
2. **Homepage Links** - Link to the site's homepage (base domain)
3. **"See Also" Section** - Links to all other articles in the batch
Uses existing `anchor_text_generator.py` for tier-based anchor text with support for job config overrides (default/override/append modes).
## Story
**As a developer**, I want to inject all required links (batch "wheel", home page, and tiered/money site) into each new article's HTML content, so that the articles are fully interlinked and ready for deployment.
## Context
- Story 3.1 generates final URLs for all articles in the batch
- Story 3.2 finds the required tiered links (money site or lower-tier URLs)
- Articles have raw HTML content from Epic 2 (h2, h3, p tags)
- Project contains anchor text lists for each tier
- Articles need wheel links ("See Also" section), homepage links, and tiered links
## Acceptance Criteria
### Core Functionality
- A function takes raw HTML content, URL list, tiered links, and project data
- **Wheel Links:** Each article gets a "See Also" section that links to every other article in the batch
  - This replaces the earlier circular "next"/"previous" design (wheel_next/wheel_prev)
- **Homepage Links:** Each article gets a link to its site's homepage
- **Tiered Links:** Articles get links based on their tier
  - Tier 1: Links to money site using T1 anchor text
  - Tier 2+: Links to lower-tier articles using appropriate tier anchor text
### Input Requirements
- Raw HTML content (from Epic 2)
- List of article URLs with titles (from Story 3.1)
- Tiered links object (from Story 3.2)
- Project data (for anchor text lists)
- Batch tier information
### Output Requirements
- Final HTML content with all links injected
- Updated content stored in database
- Link relationships recorded in `article_links` table
## Implementation Details
### Anchor Text Generation
**RESOLVED:** Use existing `src/interlinking/anchor_text_generator.py` with job config overrides
- **Default tier-based anchor text:**
  - Tier 1: Uses main keyword variations
  - Tier 2: Uses related searches
  - Tier 3: Uses main keyword variations
  - Tier 4+: Uses entities
- **Job config overrides via `anchor_text_config`:**
  - `mode: "default"` - Use tier-based defaults
  - `mode: "override"` - Replace defaults with `custom_text` list
  - `mode: "append"` - Add `custom_text` to tier-based defaults
- Import and use `get_anchor_text_for_tier()` function
### Homepage URL Generation
**RESOLVED:** Strip the path from the article URL, keeping only the scheme and host
- Example: `https://site.com/article-slug.html` → `https://site.com/`
- Use base domain as homepage URL
### Link Placement Strategy
#### Tiered Links (Money Site / Lower Tier)
1. **First Priority:** Find anchor text already in the document
   - Search for anchor text in HTML content
   - Add link to FIRST match only (prevent duplicate links)
   - Case-insensitive matching
2. **Fallback:** If anchor text not found in document
   - Insert anchor text into a sentence in the article
   - Make it a link to the target URL
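
The first-priority rule can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes `wrap_first_occurrence()` takes and returns an HTML string, splits the markup with a regular expression instead of a full HTML parser, and skips text that already sits inside an `<a>` element so links are never nested.

```python
import re
from html import escape

# Splits markup into alternating text and tag chunks (tags land at odd indexes)
_TAG_RE = re.compile(r"(<[^>]+>)")


def wrap_first_occurrence(html_content: str, anchor_text: str, target_url: str) -> str:
    """Link the first case-insensitive match of anchor_text found in a text node."""
    pattern = re.compile(re.escape(anchor_text), re.IGNORECASE)
    parts = _TAG_RE.split(html_content)
    inside_anchor = False
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Tag chunk: track <a>...</a> so existing links are left untouched
            tag = part.lower()
            if tag.startswith("<a ") or tag == "<a>":
                inside_anchor = True
            elif tag.startswith("</a>"):
                inside_anchor = False
            continue
        if inside_anchor:
            continue
        match = pattern.search(part)
        if match:
            # Keep the original casing of the matched text inside the link
            link = f'<a href="{escape(target_url, quote=True)}">{match.group(0)}</a>'
            parts[i] = part[:match.start()] + link + part[match.end():]
            break
    return "".join(parts)
```

If no match is found, the content is returned unchanged, so a caller can compare input and output to decide whether the fallback is needed.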
#### Wheel Links (See Also Section)
- Add a "See Also" section after the last paragraph
- Format as heading + unordered list
- Include ALL other articles in the batch (excluding current article)
- Each list item is an article title as a link
- Example:
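```html
<!-- heading level shown as h3 for illustration -->
<h3>See Also</h3>
<ul>
  <li><a href="ARTICLE_URL">Article Title</a></li>
</ul>
```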
#### Homepage Links
- Same as tiered links: find anchor text in content or insert it
- Link to site homepage (base domain)
## Implementation Approach
### Function Signature
```python
from typing import Dict, List


def inject_interlinks(
    content_records: List[GeneratedContent],
    article_urls: List[Dict],  # [{content_id, title, url}, ...]
    tiered_links: Dict,  # From Story 3.2
    project: Project,
    content_repo: GeneratedContentRepository,
    link_repo: ArticleLinkRepository,
) -> None:  # Updates content in database
    ...
```
### Processing Flow
1. For each article in the batch:
   a. Load its raw HTML content
   b. Generate tier-appropriate anchor text using `get_anchor_text_for_tier()`
   c. Inject tiered links (money site or lower tier)
   d. Inject homepage link
   e. Inject wheel links ("See Also" section)
   f. Update content in database
   g. Record all links in `article_links` table
### Link Injection Details
#### Tiered Link Injection
```python
# Get anchor text for this tier
from src.interlinking.anchor_text_generator import get_anchor_text_for_tier

# Get default tier-based anchor text
default_anchors = get_anchor_text_for_tier(tier, project, count=5)

# Apply job config overrides if present
if job_config.anchor_text_config:
    if job_config.anchor_text_config.mode == "override":
        anchor_texts = job_config.anchor_text_config.custom_text or default_anchors
    elif job_config.anchor_text_config.mode == "append":
        anchor_texts = default_anchors + (job_config.anchor_text_config.custom_text or [])
    else:  # "default"
        anchor_texts = default_anchors
else:
    anchor_texts = default_anchors

# Link the first anchor text that already appears in the content
for anchor_text in anchor_texts:
    if anchor_text.lower() in html_content.lower():  # case-insensitive match
        # Wrap FIRST occurrence with link
        html_content = wrap_first_occurrence(html_content, anchor_text, target_url)
        break
else:
    # No anchor text found in the content: insert anchor text + link into a paragraph
    html_content = insert_link_into_content(html_content, anchor_texts[0], target_url)
```
#### Homepage Link Injection
```python
# Derive homepage URL
homepage_url = extract_base_url(article_url) # https://site.com/article.html → https://site.com/
# Use main keyword as anchor text
anchor_text = project.main_keyword
# Find or insert link (same strategy as tiered links)
```
#### Wheel Link Injection
```python
# Build "See Also" section with ALL other articles in batch
other_articles = [a for a in article_urls if a['content_id'] != current_article.id]
see_also_html = "<h3>See Also</h3>\n<ul>\n"
for article in other_articles:
    see_also_html += f'  <li><a href="{article["url"]}">{article["title"]}</a></li>\n'
see_also_html += "</ul>\n"
# Append after last paragraph (before closing tags)
html_content = insert_before_closing_tags(html_content, see_also_html)
```
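
One possible shape for the `insert_before_closing_tags()` helper used above is sketched here. It is an assumption about the helper's behavior, not the project's code: it looks only for a trailing run of closing `article`, `body`, or `html` tags and appends the fragment at the end when none is present.

```python
import re

# Matches a trailing run of closing wrapper tags, e.g. "</article></body></html>"
_CLOSING_RE = re.compile(r"(?:\s*</(?:article|body|html)\s*>)+\s*$", re.IGNORECASE)


def insert_before_closing_tags(html_content: str, fragment: str) -> str:
    """Place fragment after the article body but before any trailing wrapper tags."""
    match = _CLOSING_RE.search(html_content)
    if match is None:
        # Bare fragment (h2/h3/p only): append at the end
        return html_content.rstrip() + "\n" + fragment
    return html_content[:match.start()] + "\n" + fragment + html_content[match.start():]
```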
### Database Updates
- Update `GeneratedContent.content` with final HTML
- Create `ArticleLink` records for all injected links:
  - `link_type="tiered"` for money site / lower tier links
  - `link_type="homepage"` for homepage links
  - `link_type="wheel_see_also"` for "See Also" section links
- Track both internal (`to_content_id`) and external (`to_url`) links
**Note:** The "See Also" section replaces the previous wheel_next/wheel_prev concept. Each article links to all other articles in the batch via the "See Also" section.
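
To make the link types concrete, here is a hedged sketch of the records one article might produce. `LinkRecord`, `build_link_records()`, and the `from_content_id` field are hypothetical stand-ins for the real `ArticleLink` model, whose exact columns are not shown in this story. The sketch also treats the tiered target as an external URL, which may not hold for T2+ articles.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LinkRecord:
    """Hypothetical stand-in for an ArticleLink row."""
    from_content_id: int
    link_type: str
    to_content_id: Optional[int] = None  # internal target
    to_url: Optional[str] = None         # external target


def build_link_records(
    content_id: int,
    tiered_url: str,
    homepage_url: str,
    other_articles: List[Dict],
) -> List[LinkRecord]:
    """Return one record per injected link for a single article."""
    records = [
        LinkRecord(content_id, "tiered", to_url=tiered_url),
        LinkRecord(content_id, "homepage", to_url=homepage_url),
    ]
    # One "See Also" record per other article in the batch
    records += [
        LinkRecord(content_id, "wheel_see_also", to_content_id=a["content_id"])
        for a in other_articles
    ]
    return records
```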
## Tasks / Subtasks
### 1. Create Content Injection Module
**Effort:** 3 story points
- [ ] Create `src/interlinking/content_injection.py`
- [ ] Implement `inject_interlinks()` main function
- [ ] Implement "See Also" section builder (all batch articles)
- [ ] Implement homepage URL extraction (base domain)
- [ ] Implement tiered link injection with anchor text matching
### 2. Anchor Text Processing
**Effort:** 2 story points
- [ ] Import `get_anchor_text_for_tier()` from existing module
- [ ] Apply job config `anchor_text_config` overrides (default/override/append)
- [ ] Implement case-insensitive anchor text search in HTML
- [ ] Wrap first occurrence of anchor text with link
- [ ] Implement fallback: insert anchor text + link if not found in content
### 3. HTML Link Injection
**Effort:** 2 story points
- [ ] Implement safe HTML parsing (avoid breaking existing tags)
- [ ] Implement link insertion before closing article/body tags
- [ ] Ensure proper link formatting (`<a href="url">text</a>`)
- [ ] Handle edge cases (empty content, malformed HTML)
- [ ] Preserve HTML structure and formatting
### 4. Database Integration
**Effort:** 2 story points
- [ ] Update `GeneratedContent.content` with final HTML
- [ ] Create `ArticleLink` records for all links
- [ ] Handle both internal (content_id) and external (URL) links
- [ ] Ensure proper link type categorization
### 5. Unit Tests
**Effort:** 3 story points
- [ ] Test "See Also" section generation (all batch articles)
- [ ] Test homepage URL extraction (strip the path, keep the base domain)
- [ ] Test tiered link injection for T1 (money site) and T2+ (lower tier)
- [ ] Test anchor text config modes: default, override, append
- [ ] Test case-insensitive anchor text matching (first occurrence only)
- [ ] Test fallback anchor text insertion when not found in content
- [ ] Test HTML structure preservation after link injection
- [ ] Test database record creation (ArticleLink for all link types)
- [ ] Test with different tier configurations (T1, T2, T3, T4+)
### 6. Integration Tests
**Effort:** 2 story points
- [ ] Test full flow: Story 3.1 URLs → Story 3.2 tiered links → Story 3.3 injection
- [ ] Test with different batch sizes (5, 10, 20 articles)
- [ ] Test with various HTML content structures
- [ ] Verify link relationships in `article_links` table
- [ ] Test with different tiers and project configurations
- [ ] Verify final HTML is deployable (well-formed)
## Dependencies
- Story 3.1: URL generation must be complete
- Story 3.2: Tiered link finding must be complete
- Story 2.3: Generated content must exist
- Story 1.x: Project and database models must exist
## Future Considerations
- Story 4.x will use the final HTML content for deployment
- Analytics dashboard will use `article_links` data
- Future: Advanced link placement strategies
- Future: Link density optimization
## Total Effort
14 story points
## Technical Notes
### Existing Code to Use
```python
# Use existing anchor text generator
from src.interlinking.anchor_text_generator import get_anchor_text_for_tier

# Example usage - Default tier-based
default_anchors = get_anchor_text_for_tier("tier1", project, count=5)
anchor_texts = default_anchors
# Returns: ["shaft machining", "learn about shaft machining", "shaft machining guide", ...]

# Example usage - With job config override
if job_config.anchor_text_config:
    if job_config.anchor_text_config.mode == "override":
        anchor_texts = job_config.anchor_text_config.custom_text
        # Returns: ["click here for more info", "learn more about this topic", ...]
    elif job_config.anchor_text_config.mode == "append":
        anchor_texts = default_anchors + job_config.anchor_text_config.custom_text
        # Returns: ["shaft machining", "learn about...", "click here...", ...]
```
### Anchor Text Configuration (Job Config)
Job configuration supports three modes for anchor text:
```json
{
"anchor_text_config": {
"mode": "default|override|append",
"custom_text": ["anchor 1", "anchor 2", ...]
}
}
```
**Modes:**
- `default`: Use tier-based anchor text from `anchor_text_generator.py`
- `override`: Replace tier-based anchors with `custom_text` list
- `append`: Add `custom_text` to tier-based anchors
**Example - Override Mode:**
```json
{
"anchor_text_config": {
"mode": "override",
"custom_text": [
"click here for more info",
"learn more about this topic",
"discover the best practices"
]
}
}
```
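
The three modes can be resolved in one small function. The sketch below assumes the config arrives as the parsed JSON dict shown above (the real job config object may expose attributes instead). `resolve_anchor_texts()` is a hypothetical name, and falling back to defaults for an unknown mode or an empty override list is an assumption.

```python
from typing import Dict, List, Optional


def resolve_anchor_texts(
    default_anchors: List[str],
    anchor_text_config: Optional[Dict],
) -> List[str]:
    """Combine tier-based defaults with job config overrides."""
    if not anchor_text_config:
        return list(default_anchors)
    mode = anchor_text_config.get("mode", "default")
    custom = anchor_text_config.get("custom_text") or []
    if mode == "override":
        # An empty custom list falls back to the tier-based defaults
        return list(custom) or list(default_anchors)
    if mode == "append":
        return list(default_anchors) + list(custom)
    return list(default_anchors)
```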
### Link Injection Rules
1. **One link per anchor text** - Only link the FIRST occurrence
2. **Case-insensitive search** - Match "Shaft Machining" with "shaft machining"
3. **Preserve HTML structure** - Don't break existing tags
4. **Fallback insertion** - If anchor text not in content, insert it naturally
5. **Config overrides** - Job config can override/append to tier-based defaults
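
Rule 4 leaves the insertion strategy open. One simple option, shown as a sketch only, is to append a short linked sentence to the first paragraph. The `template` parameter and its default wording are hypothetical placeholders, and the function signature mirrors the `insert_link_into_content()` call used earlier in this story.

```python
import re
from html import escape

_FIRST_P_CLOSE = re.compile(r"</p\s*>", re.IGNORECASE)


def insert_link_into_content(
    html_content: str,
    anchor_text: str,
    target_url: str,
    template: str = "Read more about {link}.",  # hypothetical wording
) -> str:
    """Fallback: add a linked sentence at the end of the first paragraph."""
    link = f'<a href="{escape(target_url, quote=True)}">{escape(anchor_text)}</a>'
    sentence = template.format(link=link)
    match = _FIRST_P_CLOSE.search(html_content)
    if match is None:
        # No paragraph to extend: add the sentence as its own paragraph
        return html_content + f"<p>{sentence}</p>"
    return html_content[:match.start()] + " " + sentence + html_content[match.start():]
```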
### "See Also" Section Format
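```html
<!-- heading level shown as h3 for illustration -->
<h3>See Also</h3>
<ul>
  <li><a href="ARTICLE_URL">Article Title</a></li>
</ul>
```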
### Homepage URL Examples
```
https://example.com/article-slug.html → https://example.com/
https://site.b-cdn.net/my-article.html → https://site.b-cdn.net/
https://www.custom.com/path/to/article.html → https://www.custom.com/
```
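
The mappings above can be reproduced with the standard library. This is a minimal sketch of an `extract_base_url()` helper built on `urllib.parse.urlsplit`; raising `ValueError` for non-absolute URLs is an assumption about the desired error handling.

```python
from urllib.parse import urlsplit


def extract_base_url(article_url: str) -> str:
    """Return scheme and host with a trailing slash, dropping path, query, and fragment."""
    parts = urlsplit(article_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Not an absolute URL: {article_url!r}")
    return f"{parts.scheme}://{parts.netloc}/"
```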
## Notes
This story reuses the existing tier-based anchor text generation in `anchor_text_generator.py`. There is no need to implement anchor text logic from scratch: import `get_anchor_text_for_tier()` and apply the job config overrides on top of its output.