Story 3.2: Find Tiered Links
Status
Complete - QA Approved
Story
As a developer, I want a module that finds all required tiered links (money site or lower-tier) based on the current batch's tier, so I have them ready for injection.
Context
- Story 3.1 generates URLs for articles in the current batch
- Articles are organized in tiers (T1, T2, T3, etc.) where higher tiers link to lower tiers
- Tier 1 articles link to the money site (client's actual website)
- Tier 2+ articles link to random articles from the tier immediately below
- All articles in a batch are from the same project and tier
- URLs are generated on-the-fly from GeneratedContent records (not stored in DB yet)
- The link relationships (which article links to which) will be tracked in Story 4.2
Acceptance Criteria
Core Functionality
- A function accepts a batch of GeneratedContent records and job configuration
- It determines the tier of the batch (all articles in batch are same tier)
- If Tier 1:
  - It retrieves the money_site_url from the project settings
  - Returns a single money site URL
- If Tier 2 or higher:
  - It queries GeneratedContent table for articles from the tier immediately below (e.g., T2 queries T1)
  - Filters to same project only
  - Selects random articles from the lower tier
  - Generates URLs for those articles using generate_urls_for_batch()
  - Returns list of lower-tier URLs
- Function signature: find_tiered_links(content_records: List[GeneratedContent], job_config, project_repo, content_repo, site_repo) -> Dict
Link Count Configuration
- By default: select 2-4 random lower-tier URLs (random count between 2 and 4)
- Job config supports optional tiered_link_count_range: {min: int, max: int}
- If min == max, always returns exactly that many links (e.g., {min: 8, max: 8} returns 8 links)
- If min < max, returns random count between min and max (inclusive)
- Default if not specified: {min: 2, max: 4}
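The range rules above can be sketched as follows. This is a minimal illustration, assuming the job config behaves like a dict; resolve_link_count_range and select_random_count are hypothetical names (the story only names a _select_random_count helper), and the optional rng parameter is an addition that makes the behavior easy to test.

```python
import random
from typing import Mapping, Optional, Tuple

# Default taken from the story: 2-4 links when the job config is silent.
DEFAULT_LINK_COUNT_RANGE = {"min": 2, "max": 4}


def resolve_link_count_range(job_config: Mapping) -> Tuple[int, int]:
    """Return (min, max), falling back to the default range.

    Assumes job_config is dict-like; the real JobConfig type may differ.
    """
    link_range = job_config.get("tiered_link_count_range") or DEFAULT_LINK_COUNT_RANGE
    return link_range["min"], link_range["max"]


def select_random_count(
    min_count: int, max_count: int, rng: Optional[random.Random] = None
) -> int:
    """Pick a count in [min_count, max_count], both ends inclusive.

    random.randint is inclusive on both ends, so min == max always
    yields exactly that value.
    """
    if rng is None:
        rng = random.Random()
    return rng.randint(min_count, max_count)
```

With {min: 8, max: 8} the helper can only ever return 8, which matches the exact-count rule without a special case.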
Return Format
- Tier 1 batches: {tier: 1, money_site_url: "https://example.com"}
- Tier 2+ batches: {tier: N, lower_tier_urls: ["https://...", "https://..."], lower_tier: N-1}
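The two result shapes could be given explicit types so that callers (for example the injection step in Story 3.3) can branch on them safely. The sketch below is one possible way to do that; Tier1Links, LowerTierLinks, and urls_to_inject are hypothetical names that do not appear in the story.

```python
from typing import List, TypedDict, Union


class Tier1Links(TypedDict):
    """Shape returned for tier 1 batches (hypothetical type name)."""
    tier: int
    money_site_url: str


class LowerTierLinks(TypedDict):
    """Shape returned for tier 2+ batches (hypothetical type name)."""
    tier: int
    lower_tier: int
    lower_tier_urls: List[str]


TieredLinks = Union[Tier1Links, LowerTierLinks]


def urls_to_inject(result: TieredLinks) -> List[str]:
    """Flatten either result shape into a plain list of target URLs."""
    if result["tier"] == 1:
        return [result["money_site_url"]]
    return list(result["lower_tier_urls"])
```

A consumer can then treat both cases uniformly, e.g. urls_to_inject({"tier": 1, "money_site_url": "https://example.com"}) yields a one-element list.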
Error Handling
- Tier 2+ with no lower-tier articles: Raise error and quit
- Error message: "Cannot generate tier {N} batch: no tier {N-1} articles found in project {project_id}"
- Tier 1 with no money_site_url: Raise error and quit
- Error message: "Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
- Fewer lower-tier URLs than min requested: Log warning and continue
- Warning: "Only {count} tier {N-1} articles available, requested min {min}. Using all available."
- Returns all available lower-tier URLs even if less than min
- Empty content_records list: Raise ValueError
- Mixed tiers in content_records: Raise ValueError
Logging
- INFO: Log tier detection (e.g., "Batch is tier 2, querying tier 1 articles")
- INFO: Log link selection (e.g., "Selected 3 random tier 1 URLs from 15 available")
- WARNING: If fewer articles available than requested minimum
- ERROR: If no lower-tier articles found or money_site_url missing
Tasks / Subtasks
1. Create Article Links Table
Effort: 2 story points
- Create migration script for article_links table:
  - id (primary key, auto-increment)
  - from_content_id (foreign key to generated_content.id, indexed)
  - to_content_id (foreign key to generated_content.id, indexed)
  - to_url (text, nullable - for money site URLs that aren't in our DB)
  - link_type (varchar: "tiered", "wheel_next", "wheel_prev", "homepage")
  - created_at (timestamp)
- Add unique constraint on (from_content_id, to_content_id, link_type) to prevent duplicates
- Create ArticleLink model in src/database/models.py
- Test migration on development database
2. Create Article Links Repository
Effort: 2 story points
- Create IArticleLinkRepository interface in src/database/interfaces.py:
  - create(from_content_id, to_content_id, to_url, link_type) -> ArticleLink
  - get_by_source_article(from_content_id) -> List[ArticleLink]
  - get_by_target_article(to_content_id) -> List[ArticleLink]
  - get_by_link_type(link_type) -> List[ArticleLink]
  - delete(link_id) -> bool
- Implement ArticleLinkRepository in src/database/repositories.py
- Handle both internal links (to_content_id) and external links (to_url for money site)
3. Extend Job Configuration Schema
Effort: 1 story point
- Add tiered_link_count_range: Optional[Dict] to job config schema
- Default: {min: 2, max: 4} if not specified
- Validation: min >= 1, max >= min
- Example: {"tiered_link_count_range": {"min": 3, "max": 6}}
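The validation rules for this field (min >= 1, max >= min, default when absent) might look like the sketch below. validate_link_count_range is a hypothetical helper name, and raising ValueError is an assumption; the story does not say how the job config schema reports invalid values.

```python
from typing import Any, Dict, Optional


def validate_link_count_range(link_range: Optional[Dict[str, Any]]) -> Dict[str, int]:
    """Validate tiered_link_count_range and apply the documented default."""
    if link_range is None:
        # Field is optional; the story's default is 2-4 links.
        return {"min": 2, "max": 4}

    min_count = link_range.get("min")
    max_count = link_range.get("max")

    for name, value in (("min", min_count), ("max", max_count)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"tiered_link_count_range.{name} must be an integer")

    if min_count < 1:
        raise ValueError("tiered_link_count_range.min must be >= 1")
    if max_count < min_count:
        raise ValueError("tiered_link_count_range.max must be >= min")

    return {"min": min_count, "max": max_count}
```

Validating once at config load time keeps find_tiered_links() free of range checks.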
4. Add Money Site URL to Project
Effort: 1 story point
- Add money_site_url field to Project model (nullable string, indexed)
- Create migration script to add column to existing projects table
- Update ProjectRepository.create() to accept money_site_url parameter
- Test migration on development database
5. Implement Tiered Link Finder
Effort: 3 story points
- Create new module: src/interlinking/tiered_links.py
- Implement find_tiered_links() function:
  - Validate content_records is not empty
  - Validate all records are same tier
  - Detect tier from first record
  - Handle Tier 1 case (money site)
  - Handle Tier 2+ case (lower-tier articles)
  - Apply link count range configuration
  - Generate URLs using url_generator.generate_urls_for_batch()
  - Return formatted result
- Implement _select_random_count(min_count: int, max_count: int) -> int helper
- Implement _validate_batch_tier(content_records: List[GeneratedContent]) -> int helper
6. Unit Tests
Effort: 4 story points
- Test ArticleLink model creation and relationships
- Test ArticleLinkRepository CRUD operations
- Test duplicate link prevention (unique constraint)
- Test Tier 1 batch returns money_site_url
- Test Tier 1 batch with missing money_site_url raises error
- Test Tier 2 batch queries Tier 1 articles from same project only
- Test Tier 3 batch queries Tier 2 articles
- Test random selection with default range (2-4)
- Test custom link count range from job config
- Test exact count (min == max)
- Test empty content_records raises error
- Test mixed tiers in batch raises error
- Test no lower-tier articles available raises error
- Test fewer lower-tier articles than min logs warning and continues
- Mock GeneratedContent, Project, and URL generation
- Achieve >85% code coverage
7. Integration Tests
Effort: 2 story points
- Test article_links table migration and constraints
- Test full flow with real database: create T1 articles, then query for T2 batch
- Test with multiple projects to verify same-project filtering
- Test URL generation integration with Story 3.1 url_generator
- Test with different link count configurations
- Verify lower-tier article selection is truly random
- Test storing links in article_links table (for Story 3.3/4.2 usage)
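One way to approach the "truly random" check is to draw many samples and confirm that every lower-tier article is picked at a roughly equal rate. The sketch below uses plain integers as stand-ins for article IDs and a seeded random.Random so the run is repeatable; selection_counts is a hypothetical helper, and the figures (15 articles, 3 links per draw, 2000 draws) are illustrative only.

```python
import random
from collections import Counter
from typing import Counter as CounterType, List


def selection_counts(
    article_ids: List[int], sample_size: int, draws: int, seed: int = 0
) -> CounterType[int]:
    """Count how often each article is chosen across repeated random draws."""
    rng = random.Random(seed)  # seeded so the test is deterministic
    counts: CounterType[int] = Counter()
    for _ in range(draws):
        # random.sample picks without replacement, as the implementation does.
        counts.update(rng.sample(article_ids, sample_size))
    return counts


article_ids = list(range(1, 16))  # 15 stand-in lower-tier articles
counts = selection_counts(article_ids, sample_size=3, draws=2000)

# Each article is expected about draws * sample_size / len(article_ids) times.
expected = 2000 * 3 / len(article_ids)
```

An integration test could then assert that every ID appears and that each count stays within a generous tolerance of the expected value, rather than asserting exact numbers.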
Technical Notes
Article Links Table Schema
CREATE TABLE article_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_content_id INTEGER NOT NULL,
    to_content_id INTEGER NULL,
    to_url TEXT NULL,
    anchor_text TEXT NULL,
    link_type VARCHAR(20) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    UNIQUE (from_content_id, to_content_id, link_type),
    CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
);
CREATE INDEX idx_article_links_from ON article_links(from_content_id);
CREATE INDEX idx_article_links_to ON article_links(to_content_id);
CREATE INDEX idx_article_links_type ON article_links(link_type);
Note: The anchor_text field was added in Story 4.5 to store the actual anchor text used for each link, improving query performance and data integrity.
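The constraints in this schema can be exercised against an in-memory SQLite database. The sketch below assumes SQLite (suggested by the AUTOINCREMENT syntax) and uses a one-column generated_content table as a stand-in for the real one; try_insert is a hypothetical helper.

```python
import sqlite3

SCHEMA = """
CREATE TABLE generated_content (id INTEGER PRIMARY KEY);
CREATE TABLE article_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_content_id INTEGER NOT NULL,
    to_content_id INTEGER NULL,
    to_url TEXT NULL,
    anchor_text TEXT NULL,
    link_type VARCHAR(20) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    UNIQUE (from_content_id, to_content_id, link_type),
    CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
);
"""


def try_insert(conn, from_id, to_id, to_url, link_type):
    """Insert one link row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO article_links "
                "(from_content_id, to_content_id, to_url, link_type) "
                "VALUES (?, ?, ?, ?)",
                (from_id, to_id, to_url, link_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO generated_content (id) VALUES (?)", [(1,), (2,)])
conn.commit()

first_internal = try_insert(conn, 2, 1, None, "tiered")      # accepted
duplicate_internal = try_insert(conn, 2, 1, None, "tiered")  # rejected by UNIQUE
no_target = try_insert(conn, 2, None, None, "tiered")        # rejected by CHECK
money_a = try_insert(conn, 1, None, "https://example.com", "tiered")  # accepted
money_b = try_insert(conn, 1, None, "https://example.com", "tiered")  # also accepted
```

The last two inserts both succeed because SQLite treats NULL values as distinct in UNIQUE constraints, so the constraint as written does not block repeated money-site rows (where to_content_id is NULL). Whether that needs a separate guard is a project decision.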
Link Types:
- tiered: Link from tier N article to tier N-1 article (or money site for tier 1)
- wheel_next: Link to next article in batch wheel
- wheel_prev: Link to previous article in batch wheel
- homepage: Link to site homepage
Usage:
- For tier 1 articles linking to money site: to_content_id = NULL, to_url = money_site_url
- For tier 2+ linking to lower tiers: to_content_id = lower_tier_article.id, to_url = NULL
- For wheel/homepage links: to_content_id = other_article.id, to_url = NULL
ArticleLink Model
from datetime import datetime
from typing import Optional

from sqlalchemy import DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.orm import Mapped, mapped_column

# Base is the declarative base already defined in src/database/models.py


class ArticleLink(Base):
    __tablename__ = "article_links"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    from_content_id: Mapped[int] = mapped_column(
        Integer,
        ForeignKey('generated_content.id', ondelete='CASCADE'),
        nullable=False,
        index=True
    )
    to_content_id: Mapped[Optional[int]] = mapped_column(
        Integer,
        ForeignKey('generated_content.id', ondelete='CASCADE'),
        nullable=True,
        index=True
    )
    to_url: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    anchor_text: Mapped[Optional[str]] = mapped_column(Text, nullable=True)  # Added in Story 4.5
    link_type: Mapped[str] = mapped_column(String(20), nullable=False, index=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
Project Model Extension
# Add to Project model in src/database/models.py
class Project(Base):
    # ... existing fields ...
    money_site_url: Mapped[Optional[str]] = mapped_column(String(500), nullable=True, index=True)

-- Migration script to add money_site_url to projects table
ALTER TABLE projects ADD COLUMN money_site_url VARCHAR(500) NULL;
CREATE INDEX idx_projects_money_site_url ON projects(money_site_url);
ArticleLink Repository Usage Examples
# Story 3.3: Record wheel link
link_repo.create(
    from_content_id=article_a.id,
    to_content_id=article_b.id,
    to_url=None,
    anchor_text="Next Article",
    link_type="wheel_next"
)

# Story 4.2: Record tier 1 article linking to money site
link_repo.create(
    from_content_id=tier1_article.id,
    to_content_id=None,
    to_url="https://www.moneysite.com",
    anchor_text="expert services",  # Added in Story 4.5
    link_type="tiered"
)

# Story 4.2: Record tier 2 article linking to tier 1 article
link_repo.create(
    from_content_id=tier2_article.id,
    to_content_id=tier1_article.id,
    to_url=None,
    anchor_text="learn more",  # Added in Story 4.5
    link_type="tiered"
)

# Query all outbound links from an article
outbound_links = link_repo.get_by_source_article(article.id)

# Query all articles that link TO a specific article
inbound_links = link_repo.get_by_target_article(article.id)
Job Configuration Example
{
  "job_name": "Test Batch",
  "project_id": 2,
  "tiered_link_count_range": {
    "min": 3,
    "max": 5
  },
  "tiers": [
    {
      "tier": 2,
      "article_count": 20
    }
  ]
}
Function Signature
def find_tiered_links(
    content_records: List[GeneratedContent],
    job_config: JobConfig,
    project_repo: IProjectRepository,
    content_repo: IGeneratedContentRepository,
    site_repo: ISiteDeploymentRepository
) -> Dict:
    """
    Find tiered links for a batch of articles

    Args:
        content_records: Batch of articles (all same tier, same project)
        job_config: Job configuration with optional link count range
        project_repo: For retrieving money_site_url
        content_repo: For querying lower-tier articles
        site_repo: For URL generation

    Returns:
        Tier 1: {tier: 1, money_site_url: "https://..."}
        Tier 2+: {tier: N, lower_tier_urls: [...], lower_tier: N-1}

    Raises:
        ValueError: If batch is invalid or required data is missing
    """
    pass
Implementation Example
import random
import logging
from typing import List, Dict

from src.database.models import GeneratedContent
from src.generation.url_generator import generate_urls_for_batch

logger = logging.getLogger(__name__)


def find_tiered_links(content_records, job_config, project_repo, content_repo, site_repo):
    if not content_records:
        raise ValueError("content_records cannot be empty")

    tier = _validate_batch_tier(content_records)
    project_id = content_records[0].project_id
    logger.info(f"Finding tiered links for tier {tier} batch (project {project_id})")

    if tier == 1:
        project = project_repo.get_by_id(project_id)
        if not project or not project.money_site_url:
            raise ValueError(
                f"Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
            )
        return {
            "tier": 1,
            "money_site_url": project.money_site_url
        }

    lower_tier = tier - 1
    logger.info(f"Batch is tier {tier}, querying tier {lower_tier} articles")
    lower_tier_articles = content_repo.get_by_project_and_tier(project_id, lower_tier)
    if not lower_tier_articles:
        raise ValueError(
            f"Cannot generate tier {tier} batch: no tier {lower_tier} articles found in project {project_id}"
        )

    link_range = job_config.get("tiered_link_count_range", {"min": 2, "max": 4})
    min_count = link_range["min"]
    max_count = link_range["max"]
    available_count = len(lower_tier_articles)
    desired_count = random.randint(min_count, max_count)

    if available_count < min_count:
        logger.warning(
            f"Only {available_count} tier {lower_tier} articles available, "
            f"requested min {min_count}. Using all available."
        )
        selected_articles = lower_tier_articles
    else:
        actual_count = min(desired_count, available_count)
        selected_articles = random.sample(lower_tier_articles, actual_count)

    logger.info(
        f"Selected {len(selected_articles)} random tier {lower_tier} URLs "
        f"from {available_count} available"
    )

    url_mappings = generate_urls_for_batch(selected_articles, site_repo)
    lower_tier_urls = [mapping["url"] for mapping in url_mappings]
    return {
        "tier": tier,
        "lower_tier": lower_tier,
        "lower_tier_urls": lower_tier_urls
    }


def _validate_batch_tier(content_records: List[GeneratedContent]) -> int:
    tiers = set(record.tier for record in content_records)
    if len(tiers) > 1:
        raise ValueError(f"All articles in batch must be same tier, found: {tiers}")
    return int(list(tiers)[0])
Database Queries Needed
def get_by_project_and_tier(self, project_id: int, tier: int) -> List[GeneratedContent]:
    """
    Get all articles for a specific project and tier
    Returns articles that have site_deployment_id set (from Story 3.1)
    """
    return self.session.query(GeneratedContent)\
        .filter(
            GeneratedContent.project_id == project_id,
            GeneratedContent.tier == tier,
            GeneratedContent.site_deployment_id.isnot(None)
        )\
        .all()
Return Value Examples
# Tier 1 batch
{
    "tier": 1,
    "money_site_url": "https://www.mymoneysite.com"
}

# Tier 2 batch
{
    "tier": 2,
    "lower_tier": 1,
    "lower_tier_urls": [
        "https://site1.b-cdn.net/article-title-1.html",
        "https://www.customdomain.com/article-title-2.html",
        "https://site2.b-cdn.net/article-title-3.html"
    ]
}

# Tier 3 batch with custom range (8 links)
{
    "tier": 3,
    "lower_tier": 2,
    "lower_tier_urls": [
        "https://site3.b-cdn.net/...",
        "https://site4.b-cdn.net/...",
        # ... 6 more URLs
    ]
}
Dependencies
- Story 3.1: Site assignment and URL generation must be complete
- Story 2.3: GeneratedContent records exist in database
- Story 1.x: Project and GeneratedContent tables exist
Future Considerations
- Story 3.3 will use the tiered links found by this module for actual content injection
- Story 3.3 will populate article_links table with wheel and homepage link relationships
- Story 4.2 will use article_links table to log tiered link relationships after deployment
- Future: Intelligent link distribution (ensure even link spread across lower-tier articles)
- Future: Analytics dashboard showing link structure and tier relationships using article_links table
Link Relationship Tracking
This story creates the article_links table infrastructure. The actual population of link relationships will happen in:
- Story 3.3: Stores wheel and homepage links when injecting them into content
- Story 4.2: Stores tiered links when logging final URLs after deployment
- The table enables future analytics on link distribution, tier structure, and interlinking patterns
Total Effort
15 story points