# Story 3.2: Find Tiered Links

## Status

Complete - QA Approved

## Story

**As a developer**, I want a module that finds all required tiered links (money site or lower-tier) based on the current batch's tier, so I have them ready for injection.

## Context

- Story 3.1 generates URLs for articles in the current batch
- Articles are organized in tiers (T1, T2, T3, etc.), where higher tiers link to lower tiers
- Tier 1 articles link to the money site (the client's actual website)
- Tier 2+ articles link to random articles from the tier immediately below
- All articles in a batch are from the same project and tier
- URLs are generated on the fly from `GeneratedContent` records (not stored in the DB yet)
- The link relationships (which article links to which) will be tracked in Story 4.2

## Acceptance Criteria

### Core Functionality

- A function accepts a batch of `GeneratedContent` records and the job configuration
- It determines the tier of the batch (all articles in a batch are the same tier)
- **If Tier 1:**
  - Retrieves the `money_site_url` from the project settings
  - Returns a single money site URL
- **If Tier 2 or higher:**
  - Queries the `GeneratedContent` table for articles from the tier immediately below (e.g., T2 queries T1)
  - Filters to the same project only
  - Selects random articles from the lower tier
  - Generates URLs for those articles using `generate_urls_for_batch()`
  - Returns a list of lower-tier URLs
- Function signature: `find_tiered_links(content_records: List[GeneratedContent], job_config, project_repo, content_repo, site_repo) -> Dict`

### Link Count Configuration

- By default: select 2-4 random lower-tier URLs (random count between 2 and 4, inclusive)
- Job config supports an optional `tiered_link_count_range: {min: int, max: int}`
  - If min == max, always returns exactly that many links when enough are available (e.g., `{min: 8, max: 8}` returns 8 links)
  - If min < max, returns a random count between min and max (inclusive)
- Default if not specified: `{min: 2, max: 4}`

### Return Format

- **Tier 1 batches:** `{tier: 1, money_site_url: "https://example.com"}`
- **Tier 2+ batches:** `{tier: N, lower_tier_urls: ["https://...", "https://..."], lower_tier: N-1}`

### Error Handling

- **Tier 2+ with no lower-tier articles:** Raise `ValueError` and stop processing the batch
  - Error message: "Cannot generate tier {N} batch: no tier {N-1} articles found in project {project_id}"
- **Tier 1 with no money_site_url:** Raise `ValueError` and stop processing the batch
  - Error message: "Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
- **Fewer lower-tier URLs than the requested min:** Log a warning and continue
  - Warning: "Only {count} tier {N-1} articles available, requested min {min}. Using all available."
  - Returns all available lower-tier URLs, even if fewer than min
- **Empty content_records list:** Raise `ValueError`
- **Mixed tiers in content_records:** Raise `ValueError`

### Logging

- INFO: Log tier detection (e.g., "Batch is tier 2, querying tier 1 articles")
- INFO: Log link selection (e.g., "Selected 3 random tier 1 URLs from 15 available")
- WARNING: If fewer articles are available than the requested minimum
- ERROR: If no lower-tier articles are found or money_site_url is missing

## Tasks / Subtasks

### 1. Create Article Links Table

**Effort:** 2 story points

- [ ] Create migration script for the `article_links` table:
  - `id` (primary key, auto-increment)
  - `from_content_id` (foreign key to generated_content.id, indexed)
  - `to_content_id` (foreign key to generated_content.id, nullable, indexed)
  - `to_url` (text, nullable; for money site URLs that aren't in our DB)
  - `link_type` (varchar: "tiered", "wheel_next", "wheel_prev", "homepage")
  - `created_at` (timestamp)
- [ ] Add unique constraint on (from_content_id, to_content_id, link_type) to prevent duplicates
- [ ] Create `ArticleLink` model in `src/database/models.py`
- [ ] Test migration on development database

### 2. Create Article Links Repository

**Effort:** 2 story points

- [ ] Create `IArticleLinkRepository` interface in `src/database/interfaces.py`:
  - `create(from_content_id, to_content_id, to_url, link_type) -> ArticleLink`
  - `get_by_source_article(from_content_id) -> List[ArticleLink]`
  - `get_by_target_article(to_content_id) -> List[ArticleLink]`
  - `get_by_link_type(link_type) -> List[ArticleLink]`
  - `delete(link_id) -> bool`
- [ ] Implement `ArticleLinkRepository` in `src/database/repositories.py`
- [ ] Handle both internal links (`to_content_id`) and external links (`to_url` for the money site)

### 3. Extend Job Configuration Schema

**Effort:** 1 story point

- [ ] Add `tiered_link_count_range: Optional[Dict]` to the job config schema
- [ ] Default: `{min: 2, max: 4}` if not specified
- [ ] Validation: min >= 1, max >= min
- [ ] Example: `{"tiered_link_count_range": {"min": 3, "max": 6}}`

### 4. Add Money Site URL to Project

**Effort:** 1 story point

- [ ] Add `money_site_url` field to the Project model (nullable string, indexed)
- [ ] Create migration script to add the column to the existing projects table
- [ ] Update `ProjectRepository.create()` to accept a `money_site_url` parameter
- [ ] Test migration on development database

### 5. Implement Tiered Link Finder

**Effort:** 3 story points

- [ ] Create new module: `src/interlinking/tiered_links.py`
- [ ] Implement `find_tiered_links()` function:
  - Validate content_records is not empty
  - Validate all records are the same tier
  - Detect tier from the first record
  - Handle Tier 1 case (money site)
  - Handle Tier 2+ case (lower-tier articles)
  - Apply link count range configuration
  - Generate URLs using `url_generator.generate_urls_for_batch()`
  - Return formatted result
- [ ] Implement `_select_random_count(min_count: int, max_count: int) -> int` helper
- [ ] Implement `_validate_batch_tier(content_records: List[GeneratedContent]) -> int` helper

### 6. Unit Tests

**Effort:** 4 story points

- [ ] Test ArticleLink model creation and relationships
- [ ] Test ArticleLinkRepository CRUD operations
- [ ] Test duplicate link prevention (unique constraint)
- [ ] Test Tier 1 batch returns money_site_url
- [ ] Test Tier 1 batch with missing money_site_url raises error
- [ ] Test Tier 2 batch queries Tier 1 articles from the same project only
- [ ] Test Tier 3 batch queries Tier 2 articles
- [ ] Test random selection with default range (2-4)
- [ ] Test custom link count range from job config
- [ ] Test exact count (min == max)
- [ ] Test empty content_records raises error
- [ ] Test mixed tiers in batch raises error
- [ ] Test no lower-tier articles available raises error
- [ ] Test fewer lower-tier articles than min logs warning and continues
- [ ] Mock GeneratedContent, Project, and URL generation
- [ ] Achieve >85% code coverage

### 7. Integration Tests

**Effort:** 2 story points

- [ ] Test article_links table migration and constraints
- [ ] Test full flow with a real database: create T1 articles, then query for a T2 batch
- [ ] Test with multiple projects to verify same-project filtering
- [ ] Test URL generation integration with the Story 3.1 url_generator
- [ ] Test with different link count configurations
- [ ] Verify lower-tier article selection is truly random
- [ ] Test storing links in the article_links table (for Story 3.3/4.2 usage)

## Technical Notes

### Article Links Table Schema

```sql
CREATE TABLE article_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_content_id INTEGER NOT NULL,
    to_content_id INTEGER NULL,
    to_url TEXT NULL,
    anchor_text TEXT NULL,
    link_type VARCHAR(20) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    UNIQUE (from_content_id, to_content_id, link_type),
    CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
);

CREATE INDEX idx_article_links_from ON article_links(from_content_id);
CREATE INDEX idx_article_links_to ON article_links(to_content_id);
CREATE INDEX idx_article_links_type ON article_links(link_type);
```

**Note:** The `anchor_text` field was added in Story 4.5 to store the actual anchor text used for each link, improving query performance and data integrity.

**Link Types:**

- `tiered`: Link from a tier N article to a tier N-1 article (or to the money site for tier 1)
- `wheel_next`: Link to the next article in the batch wheel
- `wheel_prev`: Link to the previous article in the batch wheel
- `homepage`: Link to the site homepage

**Usage:**

- For tier 1 articles linking to the money site: `to_content_id = NULL`, `to_url = money_site_url`
- For tier 2+ linking to lower tiers: `to_content_id = lower_tier_article.id`, `to_url = NULL`
- For wheel/homepage links: `to_content_id = other_article.id`, `to_url = NULL`

### ArticleLink Model

```python
from datetime import datetime
from typing import Optional

from sqlalchemy import (
    CheckConstraint,
    DateTime,
    ForeignKey,
    Integer,
    String,
    Text,
    UniqueConstraint,
)
from sqlalchemy.orm import Mapped, mapped_column

# Base is the declarative base already defined in src/database/models.py


class ArticleLink(Base):
    __tablename__ = "article_links"
    # Mirror the UNIQUE and CHECK constraints from the SQL schema
    __table_args__ = (
        UniqueConstraint("from_content_id", "to_content_id", "link_type"),
        CheckConstraint("to_content_id IS NOT NULL OR to_url IS NOT NULL"),
    )

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    from_content_id: Mapped[int] = mapped_column(
        Integer,
        ForeignKey('generated_content.id', ondelete='CASCADE'),
        nullable=False,
        index=True
    )
    # NULL for external (money site) links, which use to_url instead
    to_content_id: Mapped[Optional[int]] = mapped_column(
        Integer,
        ForeignKey('generated_content.id', ondelete='CASCADE'),
        nullable=True,
        index=True
    )
    to_url: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    anchor_text: Mapped[Optional[str]] = mapped_column(Text, nullable=True)  # Added in Story 4.5
    link_type: Mapped[str] = mapped_column(String(20), nullable=False, index=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
```

### Project Model Extension

```python
# Add to Project model in src/database/models.py
class Project(Base):
    # ... existing fields ...
    money_site_url: Mapped[Optional[str]] = mapped_column(String(500), nullable=True, index=True)
```

```sql
-- Migration script to add money_site_url to projects table
ALTER TABLE projects ADD COLUMN money_site_url VARCHAR(500) NULL;
CREATE INDEX idx_projects_money_site_url ON projects(money_site_url);
```

### ArticleLink Repository Usage Examples

```python
# Story 3.3: Record wheel link
link_repo.create(
    from_content_id=article_a.id,
    to_content_id=article_b.id,
    to_url=None,
    anchor_text="Next Article",
    link_type="wheel_next"
)

# Story 4.2: Record tier 1 article linking to money site
link_repo.create(
    from_content_id=tier1_article.id,
    to_content_id=None,
    to_url="https://www.moneysite.com",
    anchor_text="expert services",  # Added in Story 4.5
    link_type="tiered"
)

# Story 4.2: Record tier 2 article linking to tier 1 article
link_repo.create(
    from_content_id=tier2_article.id,
    to_content_id=tier1_article.id,
    to_url=None,
    anchor_text="learn more",  # Added in Story 4.5
    link_type="tiered"
)

# Query all outbound links from an article
outbound_links = link_repo.get_by_source_article(article.id)

# Query all articles that link TO a specific article
inbound_links = link_repo.get_by_target_article(article.id)
```

### Job Configuration Example

```json
{
  "job_name": "Test Batch",
  "project_id": 2,
  "tiered_link_count_range": {
    "min": 3,
    "max": 5
  },
  "tiers": [
    {
      "tier": 2,
      "article_count": 20
    }
  ]
}
```

### Function Signature

```python
def find_tiered_links(
    content_records: List[GeneratedContent],
    job_config: JobConfig,
    project_repo: IProjectRepository,
    content_repo: IGeneratedContentRepository,
    site_repo: ISiteDeploymentRepository
) -> Dict:
    """
    Find tiered links for a batch of articles.

    Args:
        content_records: Batch of articles (all same tier, same project)
        job_config: Job configuration with optional link count range
        project_repo: For retrieving money_site_url
        content_repo: For querying lower-tier articles
        site_repo: For URL generation

    Returns:
        Tier 1: {tier: 1, money_site_url: "https://..."}
        Tier 2+: {tier: N, lower_tier_urls: [...], lower_tier: N-1}

    Raises:
        ValueError: If batch is invalid or required data is missing
    """
    pass
```

### Implementation Example

```python
import logging
import random
from typing import Dict, List

from src.database.models import GeneratedContent
from src.generation.url_generator import generate_urls_for_batch

logger = logging.getLogger(__name__)

DEFAULT_LINK_COUNT_RANGE = {"min": 2, "max": 4}


def find_tiered_links(content_records, job_config, project_repo, content_repo, site_repo) -> Dict:
    if not content_records:
        raise ValueError("content_records cannot be empty")

    tier = _validate_batch_tier(content_records)
    project_id = content_records[0].project_id
    logger.info(f"Finding tiered links for tier {tier} batch (project {project_id})")

    # Tier 1: link to the project's money site
    if tier == 1:
        project = project_repo.get_by_id(project_id)
        if not project or not project.money_site_url:
            message = f"Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
            logger.error(message)
            raise ValueError(message)
        return {
            "tier": 1,
            "money_site_url": project.money_site_url
        }

    # Tier 2+: link to random articles from the tier immediately below
    lower_tier = tier - 1
    logger.info(f"Batch is tier {tier}, querying tier {lower_tier} articles")
    lower_tier_articles = content_repo.get_by_project_and_tier(project_id, lower_tier)

    if not lower_tier_articles:
        message = (
            f"Cannot generate tier {tier} batch: "
            f"no tier {lower_tier} articles found in project {project_id}"
        )
        logger.error(message)
        raise ValueError(message)

    # Assumes job_config is dict-like; uses the default when the key is absent or None
    link_range = job_config.get("tiered_link_count_range") or DEFAULT_LINK_COUNT_RANGE
    min_count = link_range["min"]
    max_count = link_range["max"]

    available_count = len(lower_tier_articles)

    if available_count < min_count:
        logger.warning(
            f"Only {available_count} tier {lower_tier} articles available, "
            f"requested min {min_count}. Using all available."
        )
        selected_articles = lower_tier_articles
    else:
        desired_count = _select_random_count(min_count, max_count)
        actual_count = min(desired_count, available_count)
        selected_articles = random.sample(lower_tier_articles, actual_count)

    logger.info(
        f"Selected {len(selected_articles)} random tier {lower_tier} URLs "
        f"from {available_count} available"
    )

    url_mappings = generate_urls_for_batch(selected_articles, site_repo)
    lower_tier_urls = [mapping["url"] for mapping in url_mappings]

    return {
        "tier": tier,
        "lower_tier": lower_tier,
        "lower_tier_urls": lower_tier_urls
    }


def _select_random_count(min_count: int, max_count: int) -> int:
    # random.randint is inclusive on both ends, so min == max always yields exactly min
    return random.randint(min_count, max_count)


def _validate_batch_tier(content_records: List[GeneratedContent]) -> int:
    tiers = {record.tier for record in content_records}
    if len(tiers) > 1:
        raise ValueError(f"All articles in batch must be same tier, found: {tiers}")
    return int(next(iter(tiers)))
```

### Database Queries Needed

```python
# Method on the GeneratedContent repository (content_repo)
def get_by_project_and_tier(self, project_id: int, tier: int) -> List[GeneratedContent]:
    """
    Get all articles for a specific project and tier.

    Returns only articles that have site_deployment_id set (assigned in Story 3.1).
    """
    return (
        self.session.query(GeneratedContent)
        .filter(
            GeneratedContent.project_id == project_id,
            GeneratedContent.tier == tier,
            GeneratedContent.site_deployment_id.isnot(None),
        )
        .all()
    )
```

### Return Value Examples

```python
# Tier 1 batch
{
    "tier": 1,
    "money_site_url": "https://www.mymoneysite.com"
}

# Tier 2 batch
{
    "tier": 2,
    "lower_tier": 1,
    "lower_tier_urls": [
        "https://site1.b-cdn.net/article-title-1.html",
        "https://www.customdomain.com/article-title-2.html",
        "https://site2.b-cdn.net/article-title-3.html"
    ]
}

# Tier 3 batch with custom range (8 links)
{
    "tier": 3,
    "lower_tier": 2,
    "lower_tier_urls": [
        "https://site3.b-cdn.net/...",
        "https://site4.b-cdn.net/...",
        # ... 6 more URLs
    ]
}
```

## Dependencies

- Story 3.1: Site assignment and URL generation must be complete
- Story 2.3: `GeneratedContent` records exist in the database
- Story 1.x: Project and GeneratedContent tables exist

## Future Considerations

- Story 3.3 will use the tiered links found by this module for actual content injection
- Story 3.3 will populate the `article_links` table with wheel and homepage link relationships
- Story 4.2 will use the `article_links` table to log tiered link relationships after deployment
- Future: Intelligent link distribution (ensure an even link spread across lower-tier articles)
- Future: Analytics dashboard showing link structure and tier relationships using the `article_links` table

## Link Relationship Tracking

This story creates the `article_links` table infrastructure. The actual population of link relationships will happen in:

- **Story 3.3**: Stores wheel and homepage links when injecting them into content
- **Story 4.2**: Stores tiered links when logging final URLs after deployment

The table enables future analytics on link distribution, tier structure, and interlinking patterns.

## Total Effort

15 story points (2 + 2 + 1 + 1 + 3 + 4 + 2 across Tasks 1-7)
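The *Link Count Configuration* rules and the Task 3 validation (default `{min: 2, max: 4}`, min >= 1, max >= min) can be checked once when a job is loaded. The following is a minimal sketch, assuming `job_config` is a plain dict; `resolve_link_count_range` is a hypothetical helper name, not one of the functions listed in Task 5.

```python
from collections.abc import Mapping
from typing import Any, Tuple

# Default from the acceptance criteria: 2-4 links when no range is configured
DEFAULT_LINK_COUNT_RANGE = {"min": 2, "max": 4}


def resolve_link_count_range(job_config: Mapping[str, Any]) -> Tuple[int, int]:
    """Hypothetical helper: return (min, max) after applying the default and validation."""
    raw = job_config.get("tiered_link_count_range")
    if raw is None:
        raw = DEFAULT_LINK_COUNT_RANGE
    if not isinstance(raw, Mapping):
        raise ValueError(f"tiered_link_count_range must be an object, got {raw!r}")

    min_count, max_count = raw.get("min"), raw.get("max")
    if not isinstance(min_count, int) or not isinstance(max_count, int):
        raise ValueError(f"tiered_link_count_range needs integer 'min' and 'max', got {raw!r}")
    if min_count < 1:
        raise ValueError(f"tiered_link_count_range.min must be >= 1, got {min_count}")
    if max_count < min_count:
        raise ValueError(
            f"tiered_link_count_range.max ({max_count}) must be >= min ({min_count})"
        )
    return min_count, max_count


# Usage
print(resolve_link_count_range({}))  # (2, 4)
print(resolve_link_count_range({"tiered_link_count_range": {"min": 8, "max": 8}}))  # (8, 8)
```

Validating at job-load time means a bad range fails before any articles are generated, and `find_tiered_links()` can then trust the `min` and `max` values it reads.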
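The selection step of the *Implementation Example* (a random count in `[min, max]`, capped by availability, with the warning path when fewer than `min` articles exist) can be isolated as a pure function so the Task 6 randomness tests are deterministic. This is a sketch under two assumptions: `select_lower_tier_items` is a hypothetical name, and it takes an injectable `random.Random` instance that tests can seed. It works on any sequence, not only `GeneratedContent` records.

```python
import logging
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def select_lower_tier_items(
    items: Sequence[T],
    min_count: int,
    max_count: int,
    lower_tier: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Hypothetical helper: pick a random subset sized by the link count range."""
    if rng is None:
        rng = random.Random()
    available = len(items)

    # Shortfall: keep every available item and warn (per the Error Handling rules)
    if available < min_count:
        logger.warning(
            "Only %d tier %d articles available, requested min %d. Using all available.",
            available, lower_tier, min_count,
        )
        return list(items)

    # randint is inclusive on both ends; cap the draw at what is available
    desired = rng.randint(min_count, max_count)
    selected = rng.sample(list(items), min(desired, available))
    logger.info(
        "Selected %d random tier %d URLs from %d available",
        len(selected), lower_tier, available,
    )
    return selected


# Usage: a seeded RNG makes the selection reproducible in unit tests
picked = select_lower_tier_items(list(range(15)), 2, 4, lower_tier=1, rng=random.Random(42))
print(picked)
```

Passing the RNG in, rather than calling the module-level `random` functions, lets unit tests pin results with a seed while production code uses an unseeded instance.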
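The *Article Links Table Schema* can be exercised with Python's built-in `sqlite3` module, which is handy for the Task 7 constraint tests. The sketch uses the schema as written plus a stand-in `generated_content` table with only an `id` column (an assumption; the real table has more columns). One behavior worth knowing: SQLite, like PostgreSQL by default, treats NULLs as distinct in a UNIQUE constraint, so `UNIQUE (from_content_id, to_content_id, link_type)` does not block duplicate money-site rows, where `to_content_id` is NULL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE generated_content (id INTEGER PRIMARY KEY);  -- minimal stand-in
CREATE TABLE article_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_content_id INTEGER NOT NULL,
    to_content_id INTEGER NULL,
    to_url TEXT NULL,
    anchor_text TEXT NULL,
    link_type VARCHAR(20) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    UNIQUE (from_content_id, to_content_id, link_type),
    CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
);
"""


def try_insert(conn, from_id, to_id, to_url, link_type):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO article_links (from_content_id, to_content_id, to_url, link_type) "
            "VALUES (?, ?, ?, ?)",
            (from_id, to_id, to_url, link_type),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO generated_content (id) VALUES (?)", [(1,), (2,)])

print(try_insert(conn, 2, 1, None, "tiered"))                         # True
print(try_insert(conn, 2, 1, None, "tiered"))                         # False: UNIQUE
print(try_insert(conn, 1, None, "https://www.moneysite.com", "tiered"))  # True
print(try_insert(conn, 1, None, "https://www.moneysite.com", "tiered"))  # True: NULLs distinct
print(try_insert(conn, 1, None, None, "tiered"))                      # False: CHECK
print(try_insert(conn, 1, 99, None, "tiered"))                        # False: FOREIGN KEY
```

If duplicate money-site rows should also be rejected, one option outside this story's scope is a partial unique index on `(from_content_id, to_url, link_type) WHERE to_content_id IS NULL`; both SQLite and PostgreSQL support partial indexes.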
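Code that consumes `IArticleLinkRepository` (Stories 3.3 and 4.2) can be unit tested against an in-memory fake instead of a database. This is a hypothetical test double, assuming the method names from Task 2 plus the `anchor_text` argument used in the usage examples. It mirrors the schema's CHECK rule and its UNIQUE behavior, so duplicates are rejected only when `to_content_id` is set; `LinkRecord` and `InMemoryArticleLinkRepository` are illustrative names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count
from typing import Dict, List, Optional

VALID_LINK_TYPES = {"tiered", "wheel_next", "wheel_prev", "homepage"}


@dataclass
class LinkRecord:
    id: int
    from_content_id: int
    to_content_id: Optional[int]
    to_url: Optional[str]
    link_type: str
    anchor_text: Optional[str] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class InMemoryArticleLinkRepository:
    """Hypothetical fake with the Task 2 method names, for unit tests only."""

    def __init__(self) -> None:
        self._links: Dict[int, LinkRecord] = {}
        self._ids = count(1)  # auto-increment ids starting at 1

    def create(self, from_content_id, to_content_id, to_url, link_type, anchor_text=None) -> LinkRecord:
        # Same rule as the schema's CHECK constraint
        if to_content_id is None and to_url is None:
            raise ValueError("either to_content_id or to_url is required")
        if link_type not in VALID_LINK_TYPES:
            raise ValueError(f"unknown link_type: {link_type}")
        # Same effect as the UNIQUE constraint: NULL targets never collide
        if to_content_id is not None and any(
            link.from_content_id == from_content_id
            and link.to_content_id == to_content_id
            and link.link_type == link_type
            for link in self._links.values()
        ):
            raise ValueError("duplicate link")
        link = LinkRecord(next(self._ids), from_content_id, to_content_id, to_url, link_type, anchor_text)
        self._links[link.id] = link
        return link

    def get_by_source_article(self, from_content_id: int) -> List[LinkRecord]:
        return [l for l in self._links.values() if l.from_content_id == from_content_id]

    def get_by_target_article(self, to_content_id: int) -> List[LinkRecord]:
        return [l for l in self._links.values() if l.to_content_id == to_content_id]

    def get_by_link_type(self, link_type: str) -> List[LinkRecord]:
        return [l for l in self._links.values() if l.link_type == link_type]

    def delete(self, link_id: int) -> bool:
        return self._links.pop(link_id, None) is not None
```

Raising `ValueError` for duplicates is a simplification; the real repository would surface the database's integrity error, so tests that depend on the exact exception type belong in the Task 7 integration suite.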