# Story 3.3: Content Interlinking Injection

## Status
✅ **COMPLETE** - Implemented, Integrated, and Tested

## Summary
This story injects three types of links into article HTML:
1. **Tiered Links** - T1 articles link to money site, T2+ link to lower-tier articles
2. **Homepage Links** - Link to the site's homepage (base domain)
3. **"See Also" Section** - Links to all other articles in the batch

Uses existing `anchor_text_generator.py` for tier-based anchor text with support for job config overrides (default/override/append modes).

## Story
**As a developer**, I want to inject all required links (batch "wheel", home page, and tiered/money site) into each new article's HTML content, so that the articles are fully interlinked and ready for deployment.

## Context
- Story 3.1 generates final URLs for all articles in the batch
- Story 3.2 finds the required tiered links (money site or lower-tier URLs)
- Articles have raw HTML content from Epic 2 (h2, h3, p tags)
- Project contains anchor text lists for each tier
- Articles need wheel links (next/previous), homepage links, and tiered links

## Acceptance Criteria

### Core Functionality
- A function takes raw HTML content, URL list, tiered links, and project data
- **Wheel Links:** Each article gets "next" and "previous" links to other articles in the batch
  - Last article's "next" links to first article (circular)
  - First article's "previous" links to last article (circular)
- **Homepage Links:** Each article gets a link to its site's homepage
- **Tiered Links:** Articles get links based on their tier
  - Tier 1: Links to money site using T1 anchor text
  - Tier 2+: Links to lower-tier articles using appropriate tier anchor text

### Input Requirements
- Raw HTML content (from Epic 2)
- List of article URLs with titles (from Story 3.1)
- Tiered links object (from Story 3.2)
- Project data (for anchor text lists)
- Batch tier information

### Output Requirements
- Final HTML content with all links injected
- Updated content stored
  in database
- Link relationships recorded in `article_links` table

## Implementation Details

### Anchor Text Generation
**RESOLVED:** Use existing `src/interlinking/anchor_text_generator.py` with job config overrides
- **Default tier-based anchor text:**
  - Tier 1: Uses main keyword variations
  - Tier 2: Uses related searches
  - Tier 3: Uses main keyword variations
  - Tier 4+: Uses entities
- **Job config overrides via `anchor_text_config`:**
  - `mode: "default"` - Use tier-based defaults
  - `mode: "override"` - Replace defaults with `custom_text` list
  - `mode: "append"` - Add `custom_text` to tier-based defaults
- Import and use `get_anchor_text_for_tier()` function

### Homepage URL Generation
**RESOLVED:** Remove everything after the domain's `/` (the slug or path) from the article URL
- Example: `https://site.com/article-slug.html` → `https://site.com/`
- Use base domain as homepage URL

### Link Placement Strategy

#### Tiered Links (Money Site / Lower Tier)
1. **First Priority:** Find anchor text already in the document
   - Search for anchor text in HTML content
   - Add link to FIRST match only (prevent duplicate links)
   - Case-insensitive matching
2. **Fallback:** If anchor text not found in document
   - Insert anchor text into a sentence in the article
   - Make it a link to the target URL

#### Wheel Links (See Also Section)
- Add a "See Also" section after the last paragraph
- Format as heading + unordered list
- Include ALL other articles in the batch (excluding current article)
- Each list item is an article title as a link
- Example:

```html

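<h3>See Also</h3>
<ul>
  <li><a href="ARTICLE_URL">Article Title</a></li>
  <!-- one list item per other article in the batch -->
</ul>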

```

#### Homepage Links
- Same as tiered links: find anchor text in content or insert it
- Link to site homepage (base domain)

## Implementation Approach

### Function Signature
```python
from typing import Dict, List

def inject_interlinks(
    content_records: List[GeneratedContent],
    article_urls: List[Dict],  # [{content_id, title, url}, ...]
    tiered_links: Dict,  # From Story 3.2
    project: Project,
    content_repo: GeneratedContentRepository,
    link_repo: ArticleLinkRepository
) -> None:  # Updates content in database
    ...
```

### Processing Flow
1. For each article in the batch:
   a. Load its raw HTML content
   b. Generate tier-appropriate anchor text using `get_anchor_text_for_tier()`
   c. Inject tiered links (money site or lower tier)
   d. Inject homepage link
   e. Inject wheel links ("See Also" section)
   f. Update content in database
   g. Record all links in `article_links` table

### Link Injection Details

#### Tiered Link Injection
```python
# Get anchor text for this tier
from src.interlinking.anchor_text_generator import get_anchor_text_for_tier

# Get default tier-based anchor text
default_anchors = get_anchor_text_for_tier(tier, project, count=5)

# Apply job config overrides if present
if job_config.anchor_text_config:
    if job_config.anchor_text_config.mode == "override":
        anchor_texts = job_config.anchor_text_config.custom_text or default_anchors
    elif job_config.anchor_text_config.mode == "append":
        anchor_texts = default_anchors + (job_config.anchor_text_config.custom_text or [])
    else:  # "default"
        anchor_texts = default_anchors
else:
    anchor_texts = default_anchors

# Try each anchor text in turn
for anchor_text in anchor_texts:
    if anchor_text.lower() in html_content.lower():  # case-insensitive
        # Wrap FIRST occurrence with link
        html_content = wrap_first_occurrence(html_content, anchor_text, target_url)
        break
else:
    # No anchor text matched (loop ended without break):
    # insert anchor text + link into a paragraph
    html_content = insert_link_into_content(html_content, anchor_text, target_url)
```

#### Homepage Link Injection
```python
# Derive homepage URL
homepage_url = extract_base_url(article_url)  # https://site.com/article.html → https://site.com/

# Use main keyword as anchor text
anchor_text = project.main_keyword

# Find or insert link (same strategy as tiered links)
```

#### Wheel Link Injection
```python
# Build "See Also" section with ALL other articles in batch
other_articles = [a for a in article_urls if a['content_id'] != current_article.id]
items = "".join(f'<li><a href="{a["url"]}">{a["title"]}</a></li>' for a in other_articles)
see_also_html = f"<h3>See Also</h3><ul>{items}</ul>\n\n"

# Append after last paragraph (before closing tags)
html_content = insert_before_closing_tags(html_content, see_also_html)
```

### Database Updates
- Update `GeneratedContent.content` with final HTML
- Create `ArticleLink` records for all injected links:
  - `link_type="tiered"` for money site / lower tier links
  - `link_type="homepage"` for homepage links
  - `link_type="wheel_see_also"` for "See Also" section links
- Track both internal (`to_content_id`) and external (`to_url`) links

**Note:** The "See Also" section replaces the previous wheel_next/wheel_prev concept. Each article links to all other articles in the batch via the "See Also" section.

## Tasks / Subtasks

### 1. Create Content Injection Module
**Effort:** 3 story points
- [ ] Create `src/interlinking/content_injection.py`
- [ ] Implement `inject_interlinks()` main function
- [ ] Implement "See Also" section builder (all batch articles)
- [ ] Implement homepage URL extraction (base domain)
- [ ] Implement tiered link injection with anchor text matching

### 2. Anchor Text Processing
**Effort:** 2 story points
- [ ] Import `get_anchor_text_for_tier()` from existing module
- [ ] Apply job config `anchor_text_config` overrides (default/override/append)
- [ ] Implement case-insensitive anchor text search in HTML
- [ ] Wrap first occurrence of anchor text with link
- [ ] Implement fallback: insert anchor text + link if not found in content

### 3. HTML Link Injection
**Effort:** 2 story points
- [ ] Implement safe HTML parsing (avoid breaking existing tags)
- [ ] Implement link insertion before closing article/body tags
- [ ] Ensure proper link formatting (`<a href="url">text</a>`)
- [ ] Handle edge cases (empty content, malformed HTML)
- [ ] Preserve HTML structure and formatting

### 4. Database Integration
**Effort:** 2 story points
- [ ] Update `GeneratedContent.content` with final HTML
- [ ] Create `ArticleLink` records for all links
- [ ] Handle both internal (content_id) and external (URL) links
- [ ] Ensure proper link type categorization

### 5. Unit Tests
**Effort:** 3 story points
- [ ] Test "See Also" section generation (all batch articles)
- [ ] Test homepage URL extraction (remove slug after `/`)
- [ ] Test tiered link injection for T1 (money site) and T2+ (lower tier)
- [ ] Test anchor text config modes: default, override, append
- [ ] Test case-insensitive anchor text matching (first occurrence only)
- [ ] Test fallback anchor text insertion when not found in content
- [ ] Test HTML structure preservation after link injection
- [ ] Test database record creation (ArticleLink for all link types)
- [ ] Test with different tier configurations (T1, T2, T3, T4+)

### 6. Integration Tests
**Effort:** 2 story points
- [ ] Test full flow: Story 3.1 URLs → Story 3.2 tiered links → Story 3.3 injection
- [ ] Test with different batch sizes (5, 10, 20 articles)
- [ ] Test with various HTML content structures
- [ ] Verify link relationships in `article_links` table
- [ ] Test with different tiers and project configurations
- [ ] Verify final HTML is deployable (well-formed)

## Dependencies
- Story 3.1: URL generation must be complete
- Story 3.2: Tiered link finding must be complete
- Story 2.3: Generated content must exist
- Story 1.x: Project and database models must exist

## Future Considerations
- Story 4.x will use the final HTML content for deployment
- Analytics dashboard will use `article_links` data
- Future: Advanced link placement strategies
- Future: Link density optimization

## Total Effort
14 story points

## Technical Notes

### Existing Code to Use
```python
# Use existing anchor text generator
from src.interlinking.anchor_text_generator import get_anchor_text_for_tier

# Example usage - Default tier-based
default_anchors = get_anchor_text_for_tier("tier1", project, count=5)
# Returns: ["shaft machining", "learn about shaft machining", "shaft machining guide", ...]
anchor_texts = default_anchors

# Example usage - With job config override
if job_config.anchor_text_config:
    if job_config.anchor_text_config.mode == "override":
        anchor_texts = job_config.anchor_text_config.custom_text
        # Returns: ["click here for more info", "learn more about this topic", ...]
    elif job_config.anchor_text_config.mode == "append":
        anchor_texts = default_anchors + job_config.anchor_text_config.custom_text
        # Returns: ["shaft machining", "learn about...", "click here...", ...]
```

### Anchor Text Configuration (Job Config)
Job configuration supports three modes for anchor text:
```json
{
  "anchor_text_config": {
    "mode": "default|override|append",
    "custom_text": ["anchor 1", "anchor 2", ...]
  }
}
```

**Modes:**
- `default`: Use tier-based anchor text from `anchor_text_generator.py`
- `override`: Replace tier-based anchors with `custom_text` list
- `append`: Add `custom_text` to tier-based anchors

**Example - Override Mode:**
```json
{
  "anchor_text_config": {
    "mode": "override",
    "custom_text": [
      "click here for more info",
      "learn more about this topic",
      "discover the best practices"
    ]
  }
}
```

### Link Injection Rules
1. **One link per anchor text** - Only link the FIRST occurrence
2. **Case-insensitive search** - Match "Shaft Machining" with "shaft machining"
3. **Preserve HTML structure** - Don't break existing tags
4. **Fallback insertion** - If anchor text not in content, insert it naturally
5. **Config overrides** - Job config can override/append to tier-based defaults

### "See Also" Section Format
```html

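<h3>See Also</h3>
<ul>
  <li><a href="ARTICLE_URL">Article Title</a></li>
  <!-- one list item per other article in the batch -->
</ul>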

```

### Homepage URL Examples
```
https://example.com/article-slug.html → https://example.com/
https://site.b-cdn.net/my-article.html → https://site.b-cdn.net/
https://www.custom.com/path/to/article.html → https://www.custom.com/
```

## Notes
This story uses existing tier-based anchor text generation. No need to implement anchor text logic from scratch - just import and use the existing functions that handle all edge cases automatically.
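
### Helper Sketches (Illustrative)
The pseudocode in this story calls `extract_base_url()` and `wrap_first_occurrence()` without showing their bodies. The block below is a minimal sketch of how they might look, not the project's actual implementation. It assumes the homepage is always scheme + host, and that "safe" wrapping means linking only text that sits outside tags and outside existing links. Both function bodies and the regex-based tag splitting are assumptions made for illustration; a real HTML parser would be more robust against malformed markup.

```python
import re
from html import escape
from urllib.parse import urlsplit


def extract_base_url(article_url: str) -> str:
    """Return scheme + host with a trailing slash, dropping path, query, and fragment."""
    parts = urlsplit(article_url)
    return f"{parts.scheme}://{parts.netloc}/"


def wrap_first_occurrence(html: str, anchor_text: str, target_url: str) -> str:
    """Link the first case-insensitive match of anchor_text found outside tags and existing links.

    Returns the HTML unchanged when no linkable match exists, so the caller
    can fall back to inserting the anchor text.
    """
    pattern = re.compile(re.escape(anchor_text), re.IGNORECASE)
    # Splitting on a capturing group keeps the tags in the result:
    # even indexes hold text, odd indexes hold tags.
    pieces = re.split(r"(<[^>]+>)", html)
    inside_link = False
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            # Track whether we are inside an existing <a>...</a> element.
            if re.match(r"<a\b", piece, re.IGNORECASE):
                inside_link = True
            elif re.match(r"</a\s*>", piece, re.IGNORECASE):
                inside_link = False
            continue
        if inside_link:
            continue
        match = pattern.search(piece)
        if match:
            # Keep the casing found in the article, only add the link around it.
            link = f'<a href="{escape(target_url, quote=True)}">{match.group(0)}</a>'
            pieces[i] = piece[:match.start()] + link + piece[match.end():]
            break
    return "".join(pieces)
```

Because `wrap_first_occurrence()` returns its input unchanged when nothing matches, a caller can compare the result with the original HTML to decide whether the fallback insertion is needed. This avoids a separate case-insensitive substring check, which could also match text inside tag attributes.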