Story 3.2: Find Tiered Links

Status

Accepted

Story

As a developer, I want a module that finds all required tiered links (money site or lower-tier) based on the current batch's tier, so I have them ready for injection.

Context

  • Story 3.1 generates URLs for articles in the current batch
  • Articles are organized in tiers (T1, T2, T3, etc.) where higher tiers link to lower tiers
  • Tier 1 articles link to the money site (client's actual website)
  • Tier 2+ articles link to random articles from the tier immediately below
  • All articles in a batch are from the same project and tier
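  • URLs are generated on the fly from GeneratedContent records (they are not yet stored in the database)
  • This story does not record which article links to which; it only creates the article_links table, which Story 3.3 (wheel and homepage links) and Story 4.2 (tiered links) will populate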

Acceptance Criteria

Core Functionality

  • A function accepts a batch of GeneratedContent records and job configuration
  • It determines the tier of the batch (all articles in a batch share the same tier)
  • If Tier 1:
    • It retrieves the money_site_url from the project settings
    • Returns a single money site URL
  • If Tier 2 or higher:
    • It queries the GeneratedContent table for articles from the tier immediately below (e.g., T2 queries T1)
    • Filters to the same project only
    • Selects random articles from the lower tier
    • Generates URLs for those articles using generate_urls_for_batch()
    • Returns list of lower-tier URLs
  • Function signature: find_tiered_links(content_records: List[GeneratedContent], job_config, project_repo, content_repo, site_repo) -> Dict
  • By default, select between 2 and 4 (inclusive) random lower-tier URLs
  • Job config supports optional tiered_link_count_range: {min: int, max: int}
  • If min == max, always returns exactly that many links when enough lower-tier articles exist (e.g., {min: 8, max: 8} returns 8 links)
  • If min < max, returns random count between min and max (inclusive)
  • Default if not specified: {min: 2, max: 4}

Return Format

  • Tier 1 batches: {tier: 1, money_site_url: "https://example.com"}
  • Tier 2+ batches: {tier: N, lower_tier_urls: ["https://...", "https://..."], lower_tier: N-1}
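The two return shapes can be written down as `TypedDict`s so that a type checker can catch key typos. This is a sketch only. The names `Tier1Links`, `LowerTierLinks`, `TieredLinks`, and `target_urls` are hypothetical and are not part of the story's `-> Dict` signature.

```python
from typing import List, TypedDict, Union


class Tier1Links(TypedDict):
    tier: int  # always 1
    money_site_url: str


class LowerTierLinks(TypedDict):
    tier: int  # N >= 2
    lower_tier: int  # N - 1
    lower_tier_urls: List[str]


TieredLinks = Union[Tier1Links, LowerTierLinks]


def target_urls(result: TieredLinks) -> List[str]:
    """Flatten either result shape into the list of URLs to inject."""
    # The presence of money_site_url distinguishes the tier 1 shape
    if "money_site_url" in result:
        return [result["money_site_url"]]
    return list(result["lower_tier_urls"])
```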

Error Handling

  • Tier 2+ with no lower-tier articles: Raise ValueError and abort the batch
    • Error message: "Cannot generate tier {N} batch: no tier {N-1} articles found in project {project_id}"
  • Tier 1 with no money_site_url: Raise ValueError and abort the batch
    • Error message: "Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
  • Fewer lower-tier articles than the requested min: Log warning and continue
    • Warning: "Only {count} tier {N-1} articles available, requested min {min}. Using all available."
    • Returns all available lower-tier URLs, even if fewer than min
  • Empty content_records list: Raise ValueError
  • Mixed tiers in content_records: Raise ValueError
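The batch-level checks can be sketched independently of the database. `Record` is a hypothetical stand-in for GeneratedContent that carries only the fields these checks read. `require_money_site_url` and `require_lower_tier` are illustrative helpers whose error strings follow the messages specified above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical stand-in for GeneratedContent (only the fields used here)."""
    id: int
    project_id: int
    tier: int


def validate_batch(records: List[Record]) -> int:
    """Return the shared tier of a batch; raise ValueError if empty or mixed."""
    if not records:
        raise ValueError("content_records cannot be empty")
    tiers = {record.tier for record in records}
    if len(tiers) > 1:
        raise ValueError(f"All articles in batch must be same tier, found: {sorted(tiers)}")
    return tiers.pop()


def require_money_site_url(project_id: int, money_site_url: Optional[str]) -> str:
    """Tier 1 guard: a missing or empty money_site_url aborts the batch."""
    if not money_site_url:
        raise ValueError(
            f"Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
        )
    return money_site_url


def require_lower_tier(tier: int, project_id: int, lower_tier_count: int) -> None:
    """Tier 2+ guard: zero lower-tier articles aborts the batch."""
    if lower_tier_count == 0:
        raise ValueError(
            f"Cannot generate tier {tier} batch: no tier {tier - 1} articles "
            f"found in project {project_id}"
        )
```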

Logging

  • INFO: Log tier detection (e.g., "Batch is tier 2, querying tier 1 articles")
  • INFO: Log link selection (e.g., "Selected 3 random tier 1 URLs from 15 available")
  • WARNING: If fewer articles available than requested minimum
  • ERROR: If no lower-tier articles found or money_site_url missing
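Here is a sketch of the selection step that emits the specified INFO and WARNING messages. It passes lazy `%`-style arguments to the standard `logging` module. The logger name `tiered_links` and the function `select_lower_tier` are hypothetical; the real module would use `logging.getLogger(__name__)`.

```python
import logging
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical logger name; the real module would use logging.getLogger(__name__)
logger = logging.getLogger("tiered_links")


def select_lower_tier(
    articles: Sequence[T],
    lower_tier: int,
    min_count: int,
    max_count: int,
    rng: random.Random,
) -> List[T]:
    """Pick min..max random articles, or all of them (with a warning) if fewer than min exist."""
    available = len(articles)
    if available < min_count:
        logger.warning(
            "Only %d tier %d articles available, requested min %d. Using all available.",
            available, lower_tier, min_count,
        )
        selected = list(articles)
    else:
        # Never ask random.sample for more items than exist
        count = min(rng.randint(min_count, max_count), available)
        selected = rng.sample(list(articles), count)
    logger.info(
        "Selected %d random tier %d URLs from %d available",
        len(selected), lower_tier, available,
    )
    return selected
```

In a unit test, `unittest.TestCase.assertLogs` can confirm that the warning fires when fewer articles exist than the minimum.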

Tasks / Subtasks

1. Create Article Links Table

Effort: 2 story points

  • Create migration script for article_links table:
    • id (primary key, auto-increment)
    • from_content_id (foreign key to generated_content.id, indexed)
    • to_content_id (foreign key to generated_content.id, indexed)
    • to_url (text, nullable - for money site URLs that aren't in our DB)
    • link_type (varchar: "tiered", "wheel_next", "wheel_prev", "homepage")
    • created_at (timestamp)
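  • Add unique constraint on (from_content_id, to_content_id, link_type) to prevent duplicate internal links (SQL treats NULLs as distinct in UNIQUE constraints, so this does not deduplicate money site links, where to_content_id is NULL)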
  • Create ArticleLink model in src/database/models.py
  • Test migration on development database

2. Create Article Links Repository

Effort: 2 story points

  • Create IArticleLinkRepository interface in src/database/interfaces.py:
    • create(from_content_id, to_content_id, to_url, link_type) -> ArticleLink
    • get_by_source_article(from_content_id) -> List[ArticleLink]
    • get_by_target_article(to_content_id) -> List[ArticleLink]
    • get_by_link_type(link_type) -> List[ArticleLink]
    • delete(link_id) -> bool
  • Implement ArticleLinkRepository in src/database/repositories.py
  • Handle both internal links (to_content_id) and external links (to_url for money site)
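The interface from this task can be sketched with `abc.ABC`. The method signatures come from the story. Everything else is hypothetical: the `ArticleLink` dataclass (a plain stand-in for the SQLAlchemy model, without `created_at`), `LINK_TYPES`, and `InMemoryArticleLinkRepository`. The in-memory class is shown only as a test double for code that depends on the interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional

LINK_TYPES = {"tiered", "wheel_next", "wheel_prev", "homepage"}


@dataclass
class ArticleLink:
    """Plain stand-in for the SQLAlchemy ArticleLink model."""
    id: int
    from_content_id: int
    to_content_id: Optional[int]
    to_url: Optional[str]
    link_type: str


class IArticleLinkRepository(ABC):
    @abstractmethod
    def create(self, from_content_id: int, to_content_id: Optional[int],
               to_url: Optional[str], link_type: str) -> ArticleLink: ...

    @abstractmethod
    def get_by_source_article(self, from_content_id: int) -> List[ArticleLink]: ...

    @abstractmethod
    def get_by_target_article(self, to_content_id: int) -> List[ArticleLink]: ...

    @abstractmethod
    def get_by_link_type(self, link_type: str) -> List[ArticleLink]: ...

    @abstractmethod
    def delete(self, link_id: int) -> bool: ...


class InMemoryArticleLinkRepository(IArticleLinkRepository):
    """Hypothetical test double that mimics the table's constraints."""

    def __init__(self) -> None:
        self._links: Dict[int, ArticleLink] = {}
        self._next_id = 1

    def create(self, from_content_id: int, to_content_id: Optional[int],
               to_url: Optional[str], link_type: str) -> ArticleLink:
        if link_type not in LINK_TYPES:
            raise ValueError(f"Unknown link_type: {link_type}")
        # Internal links use to_content_id; money site links use to_url
        if to_content_id is None and to_url is None:
            raise ValueError("Either to_content_id or to_url is required")
        # Like the SQL UNIQUE constraint, a NULL to_content_id never conflicts
        key = (from_content_id, to_content_id, link_type)
        if to_content_id is not None and any(
            (link.from_content_id, link.to_content_id, link.link_type) == key
            for link in self._links.values()
        ):
            raise ValueError(f"Duplicate link: {key}")
        link = ArticleLink(self._next_id, from_content_id, to_content_id, to_url, link_type)
        self._links[link.id] = link
        self._next_id += 1
        return link

    def get_by_source_article(self, from_content_id: int) -> List[ArticleLink]:
        return [link for link in self._links.values() if link.from_content_id == from_content_id]

    def get_by_target_article(self, to_content_id: int) -> List[ArticleLink]:
        return [link for link in self._links.values() if link.to_content_id == to_content_id]

    def get_by_link_type(self, link_type: str) -> List[ArticleLink]:
        return [link for link in self._links.values() if link.link_type == link_type]

    def delete(self, link_id: int) -> bool:
        return self._links.pop(link_id, None) is not None
```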

3. Extend Job Configuration Schema

Effort: 1 story point

  • Add tiered_link_count_range: Optional[Dict] to job config schema
  • Default: {min: 2, max: 4} if not specified
  • Validation: min >= 1, max >= min
  • Example: {"tiered_link_count_range": {"min": 3, "max": 6}}
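The sketch below reads and validates the range. It assumes the job config has been loaded from JSON into a plain dict, as the implementation example's `job_config.get(...)` suggests. The function name `get_link_count_range` is hypothetical, and so is the decision to reject booleans and non-integer values.

```python
from typing import Any, Mapping, Tuple

DEFAULT_TIERED_LINK_COUNT_RANGE = {"min": 2, "max": 4}


def get_link_count_range(job_config: Mapping[str, Any]) -> Tuple[int, int]:
    """Return (min, max) from tiered_link_count_range, applying the default and validation."""
    link_range = job_config.get("tiered_link_count_range")
    if link_range is None:
        link_range = DEFAULT_TIERED_LINK_COUNT_RANGE
    if not isinstance(link_range, Mapping):
        raise ValueError(f"tiered_link_count_range must be an object, got {link_range!r}")

    min_count, max_count = link_range.get("min"), link_range.get("max")
    # bool is a subclass of int in Python, so exclude it explicitly
    for name, value in (("min", min_count), ("max", max_count)):
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"tiered_link_count_range.{name} must be an integer, got {value!r}")

    if min_count < 1:
        raise ValueError(f"tiered_link_count_range.min must be >= 1, got {min_count}")
    if max_count < min_count:
        raise ValueError(
            f"tiered_link_count_range.max ({max_count}) must be >= min ({min_count})"
        )
    return min_count, max_count
```

Applied to the Job Configuration Example in the Technical Notes (`{"min": 3, "max": 5}`), this returns `(3, 5)`. A config without the key falls back to `(2, 4)`.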

4. Add Money Site URL to Project

Effort: 1 story point

  • Add money_site_url field to Project model (nullable string, indexed)
  • Create migration script to add column to existing projects table
  • Update ProjectRepository.create() to accept money_site_url parameter
  • Test migration on development database

5. Implement Tiered Link Finder Module

Effort: 3 story points

  • Create new module: src/interlinking/tiered_links.py
  • Implement find_tiered_links() function:
    • Validate content_records is not empty
    • Validate all records are same tier
    • Detect tier from first record
    • Handle Tier 1 case (money site)
    • Handle Tier 2+ case (lower-tier articles)
    • Apply link count range configuration
    • Generate URLs using url_generator.generate_urls_for_batch()
    • Return formatted result
  • Implement _select_random_count(min_count: int, max_count: int) -> int helper
  • Implement _validate_batch_tier(content_records: List[GeneratedContent]) -> int helper

6. Unit Tests

Effort: 4 story points

  • Test ArticleLink model creation and relationships
  • Test ArticleLinkRepository CRUD operations
  • Test duplicate link prevention (unique constraint)
  • Test Tier 1 batch returns money_site_url
  • Test Tier 1 batch with missing money_site_url raises error
  • Test Tier 2 batch queries Tier 1 articles from same project only
  • Test Tier 3 batch queries Tier 2 articles
  • Test random selection with default range (2-4)
  • Test custom link count range from job config
  • Test exact count (min == max)
  • Test empty content_records raises error
  • Test mixed tiers in batch raises error
  • Test no lower-tier articles available raises error
  • Test fewer lower-tier articles than min logs warning and continues
  • Mock GeneratedContent, Project, and URL generation
  • Achieve >85% code coverage

7. Integration Tests

Effort: 2 story points

  • Test article_links table migration and constraints
  • Test full flow with real database: create T1 articles, then query for T2 batch
  • Test with multiple projects to verify same-project filtering
  • Test URL generation integration with Story 3.1 url_generator
  • Test with different link count configurations
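  • Verify lower-tier article selection is randomized (repeated runs over the same pool should select different subsets, with every article eventually chosen)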
  • Test storing links in article_links table (for Story 3.3/4.2 usage)
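Randomness can only be checked statistically. One such check repeats the selection many times with a seeded generator and confirms that every article is picked at a rate near `k / pool size`. In this sketch, `random.sample` stands in for the real selection path, and the ±25% tolerance band is an arbitrary choice.

```python
import random
from collections import Counter
from typing import List


def selection_counts(pool: List[int], k: int, runs: int, seed: int) -> Counter:
    """Count how often each item is chosen over repeated random.sample draws."""
    rng = random.Random(seed)  # seeded so the check is reproducible
    counts: Counter = Counter()
    for _ in range(runs):
        counts.update(rng.sample(pool, k))
    return counts


pool = list(range(15))  # e.g. IDs of 15 tier 1 articles
counts = selection_counts(pool, k=3, runs=3000, seed=42)
expected = 3000 * 3 / len(pool)  # 600 picks per article on average
```

A truly biased selector, for example one that always takes the first k articles, would fail the "every article picked" condition immediately.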

Technical Notes

Article Links Table Schema

CREATE TABLE article_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_content_id INTEGER NOT NULL,
    to_content_id INTEGER NULL,
    to_url TEXT NULL,
    link_type VARCHAR(20) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    UNIQUE (from_content_id, to_content_id, link_type),
    CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
);

CREATE INDEX idx_article_links_from ON article_links(from_content_id);
CREATE INDEX idx_article_links_to ON article_links(to_content_id);
CREATE INDEX idx_article_links_type ON article_links(link_type);
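The constraints can be exercised against an in-memory SQLite database with Python's standard `sqlite3` module. The `article_links` DDL is copied from above, and the one-column `generated_content` table is a minimal stand-in. Two behaviors are worth noting. First, SQLite only enforces foreign keys, and therefore `ON DELETE CASCADE`, after `PRAGMA foreign_keys = ON`. Second, SQL `UNIQUE` treats NULLs as distinct, so two identical money site rows (both with `to_content_id = NULL`) are both accepted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE generated_content (id INTEGER PRIMARY KEY);  -- minimal stand-in
CREATE TABLE article_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_content_id INTEGER NOT NULL,
    to_content_id INTEGER NULL,
    to_url TEXT NULL,
    link_type VARCHAR(20) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    UNIQUE (from_content_id, to_content_id, link_type),
    CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
);
"""


def try_insert(conn: sqlite3.Connection, from_id, to_id, to_url, link_type) -> bool:
    """Insert one link; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO article_links (from_content_id, to_content_id, to_url, link_type)"
            " VALUES (?, ?, ?, ?)",
            (from_id, to_id, to_url, link_type),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO generated_content (id) VALUES (?)", [(1,), (2,)])

results = {
    "internal link": try_insert(conn, 2, 1, None, "tiered"),
    "duplicate internal link": try_insert(conn, 2, 1, None, "tiered"),
    "no target at all": try_insert(conn, 2, None, None, "tiered"),
    "money site link": try_insert(conn, 1, None, "https://www.moneysite.com", "tiered"),
    "duplicate money site link": try_insert(conn, 1, None, "https://www.moneysite.com", "tiered"),
}
```

If duplicate money site links matter, one option is a partial unique index such as `CREATE UNIQUE INDEX idx_article_links_url_unique ON article_links(from_content_id, to_url, link_type) WHERE to_content_id IS NULL`. The index name is hypothetical, and this is a suggestion outside the scope of this story. SQLite and PostgreSQL both support partial indexes.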

Link Types:

  • tiered: Link from tier N article to tier N-1 article (or money site for tier 1)
  • wheel_next: Link to next article in batch wheel
  • wheel_prev: Link to previous article in batch wheel
  • homepage: Link to site homepage

Usage:

  • For tier 1 articles linking to money site: to_content_id = NULL, to_url = money_site_url
  • For tier 2+ linking to lower tiers: to_content_id = lower_tier_article.id, to_url = NULL
  • For wheel/homepage links: to_content_id = other_article.id, to_url = NULL
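
from datetime import datetime
from typing import Optional

from sqlalchemy import CheckConstraint, DateTime, ForeignKey, Integer, String, Text, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column

# Base is the existing declarative base in src/database/models.py
class ArticleLink(Base):
    __tablename__ = "article_links"
    # Table-level constraints matching the SQL schema
    __table_args__ = (
        UniqueConstraint("from_content_id", "to_content_id", "link_type"),
        CheckConstraint("to_content_id IS NOT NULL OR to_url IS NOT NULL"),
    )

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    from_content_id: Mapped[int] = mapped_column(
        Integer,
        ForeignKey('generated_content.id', ondelete='CASCADE'),
        nullable=False,
        index=True
    )
    # NULL for money site links (target stored in to_url instead)
    to_content_id: Mapped[Optional[int]] = mapped_column(
        Integer,
        ForeignKey('generated_content.id', ondelete='CASCADE'),
        nullable=True,
        index=True
    )
    to_url: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    link_type: Mapped[str] = mapped_column(String(20), nullable=False, index=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)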

Project Model Extension

# Add to Project model in src/database/models.py
class Project(Base):
    # ... existing fields ...
    money_site_url: Mapped[Optional[str]] = mapped_column(String(500), nullable=True, index=True)

-- Migration script to add money_site_url to projects table
ALTER TABLE projects ADD COLUMN money_site_url VARCHAR(500) NULL;
CREATE INDEX idx_projects_money_site_url ON projects(money_site_url);

Article Links Repository Usage

# Story 3.3: Record wheel link
link_repo.create(
    from_content_id=article_a.id,
    to_content_id=article_b.id,
    to_url=None,
    link_type="wheel_next"
)

# Story 4.2: Record tier 1 article linking to money site
link_repo.create(
    from_content_id=tier1_article.id,
    to_content_id=None,
    to_url="https://www.moneysite.com",
    link_type="tiered"
)

# Story 4.2: Record tier 2 article linking to tier 1 article
link_repo.create(
    from_content_id=tier2_article.id,
    to_content_id=tier1_article.id,
    to_url=None,
    link_type="tiered"
)

# Query all outbound links from an article
outbound_links = link_repo.get_by_source_article(article.id)

# Query all articles that link TO a specific article
inbound_links = link_repo.get_by_target_article(article.id)

Job Configuration Example

{
  "job_name": "Test Batch",
  "project_id": 2,
  "tiered_link_count_range": {
    "min": 3,
    "max": 5
  },
  "tiers": [
    {
      "tier": 2,
      "article_count": 20
    }
  ]
}

Function Signature

def find_tiered_links(
    content_records: List[GeneratedContent],
    job_config: JobConfig,
    project_repo: IProjectRepository,
    content_repo: IGeneratedContentRepository,
    site_repo: ISiteDeploymentRepository
) -> Dict:
    """
    Find tiered links for a batch of articles
    
    Args:
        content_records: Batch of articles (all same tier, same project)
        job_config: Job configuration with optional link count range
        project_repo: For retrieving money_site_url
        content_repo: For querying lower-tier articles
        site_repo: For URL generation
    
    Returns:
        Tier 1: {tier: 1, money_site_url: "https://..."}
        Tier 2+: {tier: N, lower_tier_urls: [...], lower_tier: N-1}
    
    Raises:
        ValueError: If batch is invalid or required data is missing
    """
    pass

Implementation Example

import logging
import random
from typing import Any, Dict, List

from src.database.models import GeneratedContent
from src.generation.url_generator import generate_urls_for_batch

logger = logging.getLogger(__name__)

DEFAULT_LINK_COUNT_RANGE = {"min": 2, "max": 4}


def find_tiered_links(content_records, job_config, project_repo, content_repo, site_repo) -> Dict[str, Any]:
    if not content_records:
        raise ValueError("content_records cannot be empty")

    tier = _validate_batch_tier(content_records)
    project_id = content_records[0].project_id

    logger.info(f"Finding tiered links for tier {tier} batch (project {project_id})")

    if tier == 1:
        logger.info("Batch is tier 1, using the project's money_site_url")
        project = project_repo.get_by_id(project_id)
        if not project or not project.money_site_url:
            message = f"Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
            logger.error(message)
            raise ValueError(message)
        return {
            "tier": 1,
            "money_site_url": project.money_site_url
        }

    lower_tier = tier - 1
    logger.info(f"Batch is tier {tier}, querying tier {lower_tier} articles")

    # Same project only; the repository also skips articles without a site assignment
    lower_tier_articles = content_repo.get_by_project_and_tier(project_id, lower_tier)

    if not lower_tier_articles:
        message = (
            f"Cannot generate tier {tier} batch: no tier {lower_tier} articles "
            f"found in project {project_id}"
        )
        logger.error(message)
        raise ValueError(message)

    # Assumes job_config behaves like a dict; min/max are validated by the job config schema (Task 3)
    link_range = job_config.get("tiered_link_count_range") or DEFAULT_LINK_COUNT_RANGE
    min_count = link_range["min"]
    max_count = link_range["max"]

    available_count = len(lower_tier_articles)

    if available_count < min_count:
        logger.warning(
            f"Only {available_count} tier {lower_tier} articles available, "
            f"requested min {min_count}. Using all available."
        )
        selected_articles = lower_tier_articles
    else:
        # The random count may exceed what exists, so cap it at available_count
        desired_count = _select_random_count(min_count, max_count)
        actual_count = min(desired_count, available_count)
        selected_articles = random.sample(lower_tier_articles, actual_count)

    logger.info(
        f"Selected {len(selected_articles)} random tier {lower_tier} URLs "
        f"from {available_count} available"
    )

    url_mappings = generate_urls_for_batch(selected_articles, site_repo)
    lower_tier_urls = [mapping["url"] for mapping in url_mappings]

    return {
        "tier": tier,
        "lower_tier": lower_tier,
        "lower_tier_urls": lower_tier_urls
    }


def _select_random_count(min_count: int, max_count: int) -> int:
    # random.randint is inclusive at both ends, so min == max always returns min
    return random.randint(min_count, max_count)


def _validate_batch_tier(content_records: List[GeneratedContent]) -> int:
    tiers = {record.tier for record in content_records}
    if len(tiers) > 1:
        raise ValueError(f"All articles in batch must be same tier, found: {sorted(tiers)}")
    return int(next(iter(tiers)))

Database Queries Needed

def get_by_project_and_tier(self, project_id: int, tier: int) -> List[GeneratedContent]:
    """
    Get all articles for a specific project and tier
    
    Returns articles that have site_deployment_id set (from Story 3.1)
    """
    return self.session.query(GeneratedContent)\
        .filter(
            GeneratedContent.project_id == project_id,
            GeneratedContent.tier == tier,
            GeneratedContent.site_deployment_id.isnot(None)
        )\
        .all()
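SQLAlchemy renders `.isnot(None)` as `IS NOT NULL`, so the filter above corresponds to the plain SQL in this `sqlite3` sketch. The four-column `generated_content` table and its rows are hypothetical test data. They are chosen to show which records the filter keeps: same project, same tier, and already assigned to a site.

```python
import sqlite3
from typing import List

# Hypothetical minimal table: only the columns the filter touches
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE generated_content ("
    " id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL,"
    " tier INTEGER NOT NULL, site_deployment_id INTEGER NULL)"
)
conn.executemany(
    "INSERT INTO generated_content VALUES (?, ?, ?, ?)",
    [
        (1, 2, 1, 10),    # project 2, tier 1, assigned to a site -> kept
        (2, 2, 1, None),  # no site assignment yet -> excluded
        (3, 3, 1, 11),    # different project -> excluded
        (4, 2, 2, 10),    # different tier -> excluded
        (5, 2, 1, 12),    # kept
    ],
)


def get_ids_by_project_and_tier(conn: sqlite3.Connection, project_id: int, tier: int) -> List[int]:
    """Same filter as get_by_project_and_tier, returning IDs only."""
    rows = conn.execute(
        "SELECT id FROM generated_content"
        " WHERE project_id = ? AND tier = ? AND site_deployment_id IS NOT NULL"
        " ORDER BY id",
        (project_id, tier),
    ).fetchall()
    return [row[0] for row in rows]
```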

Return Value Examples

# Tier 1 batch
{
    "tier": 1,
    "money_site_url": "https://www.mymoneysite.com"
}

# Tier 2 batch
{
    "tier": 2,
    "lower_tier": 1,
    "lower_tier_urls": [
        "https://site1.b-cdn.net/article-title-1.html",
        "https://www.customdomain.com/article-title-2.html",
        "https://site2.b-cdn.net/article-title-3.html"
    ]
}

# Tier 3 batch with custom range (8 links)
{
    "tier": 3,
    "lower_tier": 2,
    "lower_tier_urls": [
        "https://site3.b-cdn.net/...",
        "https://site4.b-cdn.net/...",
        # ... 6 more URLs
    ]
}

Dependencies

  • Story 3.1: Site assignment and URL generation must be complete
  • Story 2.3: GeneratedContent records exist in database
  • Story 1.x: Project and GeneratedContent tables exist

Future Considerations

  • Story 3.3 will use the tiered links found by this module for actual content injection
  • Story 3.3 will populate article_links table with wheel and homepage link relationships
  • Story 4.2 will use article_links table to log tiered link relationships after deployment
  • Future: Intelligent link distribution (ensure even link spread across lower-tier articles)
  • Future: Analytics dashboard showing link structure and tier relationships using article_links table

This story creates the article_links table infrastructure. The actual population of link relationships will happen in:

  • Story 3.3: Stores wheel and homepage links when injecting them into content
  • Story 4.2: Stores tiered links when logging final URLs after deployment
  • The table enables future analytics on link distribution, tier structure, and interlinking patterns

Total Effort

15 story points