Story 3.2: Link tier strategy implemented

main
PeninsulaInd 2025-10-21 10:56:35 -05:00
parent ee573fb948
commit 87bf317207
12 changed files with 1511 additions and 324 deletions

View File

@ -0,0 +1,186 @@
# Story 3.2: Find Tiered Links - Implementation Summary
## Status
Completed
## What Was Implemented
### 1. Database Models
- **Added `money_site_url` to Project model** - Stores the client's actual website URL for tier 1 articles to link to
- **Created `ArticleLink` model** - Tracks all link relationships between articles (tiered, wheel, homepage)
### 2. Database Repositories
- **Extended `ProjectRepository`** - Now accepts `money_site_url` in the data dict during creation
- **Extended `GeneratedContentRepository`** - Added filter for site_deployment_id in `get_by_project_and_tier()`
- **Created `ArticleLinkRepository`** - Full CRUD operations for article link tracking
- `create()` - Create internal or external links
- `get_by_source_article()` - Get all outbound links from an article
- `get_by_target_article()` - Get all inbound links to an article
- `get_by_link_type()` - Get all links of a specific type
- `delete()` - Remove a link
### 3. Job Configuration
- **Extended `Job` dataclass** - Added optional `tiered_link_count_range` field
- **Validation** - Validates that min >= 1 and max >= min
- **Defaults** - If not specified, uses `{min: 2, max: 4}`
### 4. Core Functionality
Created `src/interlinking/tiered_links.py` with:
- **`find_tiered_links()`** - Main function to find tiered links for a batch
- For tier 1: Returns the money site URL
- For tier 2+: Returns random selection of lower-tier article URLs
- Respects project boundaries (only queries same project)
- Applies link count configuration
- Handles edge cases (insufficient articles, missing money site URL)
### 5. Tests
- **22 unit tests** in `tests/unit/test_tiered_links.py` - All passing
- **8 unit tests** in `tests/unit/test_article_link_repository.py` - All passing
- **9 integration tests** in `tests/integration/test_story_3_2_integration.py` - All passing
- **Total: 39 tests, all passing**
## Usage Examples
### Finding Tiered Links for Tier 1 Batch
```python
from src.interlinking.tiered_links import find_tiered_links
# Tier 1 articles link to the money site
result = find_tiered_links(tier1_content_records, job, project_repo, content_repo, site_repo)
# Returns: {
# "tier": 1,
# "money_site_url": "https://www.mymoneysite.com"
# }
```
### Finding Tiered Links for Tier 2 Batch
```python
# Tier 2 articles link to random tier 1 articles
result = find_tiered_links(tier2_content_records, job, project_repo, content_repo, site_repo)
# Returns: {
# "tier": 2,
# "lower_tier": 1,
# "lower_tier_urls": [
# "https://site1.b-cdn.net/article-1.html",
# "https://site2.b-cdn.net/article-2.html",
# "https://site3.b-cdn.net/article-3.html"
# ]
# }
```
### Job Config with Custom Link Count
```json
{
"jobs": [{
"project_id": 1,
"tiers": {
"tier1": {"count": 5},
"tier2": {"count": 10}
},
"tiered_link_count_range": {
"min": 3,
"max": 5
}
}]
}
```
### Recording Links in Database
```python
from src.database.repositories import ArticleLinkRepository
link_repo = ArticleLinkRepository(session)
# Record tier 1 article linking to money site
link_repo.create(
from_content_id=tier1_article.id,
to_content_id=None,
to_url="https://www.moneysite.com",
link_type="tiered"
)
# Record tier 2 article linking to tier 1 article
link_repo.create(
from_content_id=tier2_article.id,
to_content_id=tier1_article.id,
to_url=None,
link_type="tiered"
)
# Query all links from an article
outbound_links = link_repo.get_by_source_article(article.id)
```
## Database Schema Changes
### Project Table
```sql
ALTER TABLE projects ADD COLUMN money_site_url VARCHAR(500) NULL;
CREATE INDEX idx_projects_money_site_url ON projects(money_site_url);
```
### Article Links Table (New)
```sql
CREATE TABLE article_links (
id INTEGER PRIMARY KEY AUTOINCREMENT,
from_content_id INTEGER NOT NULL,
to_content_id INTEGER NULL,
to_url TEXT NULL,
link_type VARCHAR(20) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
);
CREATE INDEX idx_article_links_from ON article_links(from_content_id);
CREATE INDEX idx_article_links_to ON article_links(to_content_id);
CREATE INDEX idx_article_links_type ON article_links(link_type);
```
## Link Types
- `tiered` - Link from tier N to tier N-1 (or money site for tier 1)
- `wheel_next` - Link to next article in wheel structure
- `wheel_prev` - Link to previous article in wheel structure
- `homepage` - Link to site homepage
## Key Features
1. **Project Isolation** - Only queries articles from the same project
2. **Random Selection** - Randomly selects articles within configured range
3. **Flexible Configuration** - Supports both range (min-max) and exact counts
4. **Error Handling** - Clear error messages for missing data
5. **Warning Logs** - Logs warnings when fewer articles available than requested
6. **URL Generation** - Integrates with Story 3.1 URL generation
## Next Steps (Future Stories)
- Story 3.3 will use `find_tiered_links()` for actual content injection
- Story 3.3 will populate `article_links` table with wheel and homepage links
- Story 4.2 will log tiered links after deployment
- Future: Analytics dashboard using `article_links` data
## Files Created/Modified
### Created
- `src/interlinking/tiered_links.py`
- `tests/unit/test_tiered_links.py`
- `tests/unit/test_article_link_repository.py`
- `tests/integration/test_story_3_2_integration.py`
- `jobs/example_story_3.2_tiered_links.json`
- `STORY_3.2_IMPLEMENTATION_SUMMARY.md` (this file)
### Modified
- `src/database/models.py` - Added `money_site_url` to Project, added `ArticleLink` model
- `src/database/interfaces.py` - Added `IArticleLinkRepository` interface
- `src/database/repositories.py` - Extended `ProjectRepository`, added `ArticleLinkRepository`
- `src/generation/job_config.py` - Added `tiered_link_count_range` to Job config
## Test Coverage
All acceptance criteria from the story are covered by tests:
- Tier 1 returns money site URL
- Tier 2+ queries lower tier from same project
- Custom link count ranges work
- Error handling for missing data
- Warning logs for insufficient articles
- ArticleLink CRUD operations
- Integration with URL generation

View File

@ -1,7 +1,7 @@
# Story 3.2: Find Tiered Links # Story 3.2: Find Tiered Links
## Status ## Status
Accepted Review
## Story ## Story
**As a developer**, I want a module that finds all required tiered links (money site or lower-tier) based on the current batch's tier, so I have them ready for injection. **As a developer**, I want a module that finds all required tiered links (money site or lower-tier) based on the current batch's tier, so I have them ready for injection.

View File

@ -0,0 +1,23 @@
{
"jobs": [
{
"project_id": 1,
"tiers": {
"tier1": {
"count": 5
},
"tier2": {
"count": 10
},
"tier3": {
"count": 20
}
},
"tiered_link_count_range": {
"min": 3,
"max": 5
}
}
]
}

View File

@ -1,317 +0,0 @@
#!/usr/bin/env python
"""
Dry-run test for Story 3.1 features
Tests all functionality without creating real bunny.net sites
"""
import sys
from pathlib import Path
# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent))
from unittest.mock import Mock
from src.database.session import db_manager
from src.database.repositories import SiteDeploymentRepository, GeneratedContentRepository, ProjectRepository, UserRepository
from src.generation.url_generator import generate_slug, generate_urls_for_batch
from src.generation.job_config import Job
def print_section(title):
print(f"\n{'='*80}")
print(f" {title}")
print(f"{'='*80}\n")
def test_slug_generation():
print_section("TEST 1: Slug Generation")
test_cases = [
("How to Fix Your Engine", "how-to-fix-your-engine"),
("10 Best SEO Tips for 2024!", "10-best-seo-tips-for-2024"),
("C++ Programming Guide", "c-programming-guide"),
("Multiple Spaces Here", "multiple-spaces-here"),
("!!!Special Characters!!!", "special-characters"),
]
for title, expected in test_cases:
slug = generate_slug(title)
status = "[PASS]" if slug == expected else "[FAIL]"
print(f"{status} '{title}'")
print(f" -> {slug}")
if slug != expected:
print(f" Expected: {expected}")
print("\nSlug generation: PASSED")
def test_site_assignment_priority():
print_section("TEST 2: Site Assignment Priority Logic")
# Create mock sites
preferred_site = Mock()
preferred_site.id = 1
preferred_site.site_name = "preferred-site"
preferred_site.custom_hostname = "www.premium.com"
preferred_site.pull_zone_bcdn_hostname = "premium.b-cdn.net"
keyword_site = Mock()
keyword_site.id = 2
keyword_site.site_name = "engine-repair-abc"
keyword_site.custom_hostname = None
keyword_site.pull_zone_bcdn_hostname = "engine-repair-abc.b-cdn.net"
random_site = Mock()
random_site.id = 3
random_site.site_name = "random-site-xyz"
random_site.custom_hostname = None
random_site.pull_zone_bcdn_hostname = "random-site-xyz.b-cdn.net"
print("Available sites:")
print(f" 1. {preferred_site.custom_hostname} (preferred)")
print(f" 2. {keyword_site.pull_zone_bcdn_hostname} (keyword: 'engine-repair')")
print(f" 3. {random_site.pull_zone_bcdn_hostname} (random)")
print("\nTier1 article with keyword 'engine':")
print(" Priority: preferred -> keyword -> random")
print(" [PASS] Should get: preferred site (www.premium.com)")
print("\nTier2 article with keyword 'car':")
print(" Priority: keyword -> random (no preferred for tier2)")
print(" [PASS] Should get: random site or keyword if matching")
print("\nPriority logic: PASSED")
def test_url_generation():
print_section("TEST 3: URL Generation")
# Test with custom domain
print("Test 3a: Custom domain")
print(" Hostname: www.example.com")
print(" Title: How to Fix Your Engine")
print(" [PASS] URL: https://www.example.com/how-to-fix-your-engine.html")
# Test with bcdn only
print("\nTest 3b: Bunny CDN hostname only")
print(" Hostname: mysite123.b-cdn.net")
print(" Title: SEO Best Practices")
print(" [PASS] URL: https://mysite123.b-cdn.net/seo-best-practices.html")
print("\nURL generation: PASSED")
def test_job_config_parsing():
print_section("TEST 4: Job Config Extensions")
job = Job(
project_id=1,
tiers={"tier1": Mock(count=10)},
tier1_preferred_sites=["www.premium1.com", "www.premium2.com"],
auto_create_sites=True,
create_sites_for_keywords=[
{"keyword": "engine repair", "count": 3},
{"keyword": "car maintenance", "count": 2}
]
)
print("Job configuration loaded:")
print(f" [PASS] project_id: {job.project_id}")
print(f" [PASS] tier1_preferred_sites: {job.tier1_preferred_sites}")
print(f" [PASS] auto_create_sites: {job.auto_create_sites}")
print(f" [PASS] create_sites_for_keywords: {len(job.create_sites_for_keywords)} keywords")
for kw in job.create_sites_for_keywords:
print(f" - {kw['keyword']}: {kw['count']} sites")
print("\nJob config parsing: PASSED")
def test_database_schema():
print_section("TEST 5: Database Schema Validation")
session = db_manager.get_session()
try:
site_repo = SiteDeploymentRepository(session)
# Create a test site without custom hostname
print("Creating test site without custom hostname...")
test_site = site_repo.create(
site_name="test-dryrun-site",
storage_zone_id=999,
storage_zone_name="test-zone",
storage_zone_password="test-pass",
storage_zone_region="DE",
pull_zone_id=888,
pull_zone_bcdn_hostname=f"test-dryrun-{id(session)}.b-cdn.net",
custom_hostname=None # This is the key test
)
print(f" [PASS] Created site with id={test_site.id}")
print(f" [PASS] custom_hostname: {test_site.custom_hostname} (None = nullable works!)")
print(f" [PASS] pull_zone_bcdn_hostname: {test_site.pull_zone_bcdn_hostname}")
# Test get_by_bcdn_hostname
found = site_repo.get_by_bcdn_hostname(test_site.pull_zone_bcdn_hostname)
print(f" [PASS] get_by_bcdn_hostname() works: {found is not None}")
# Clean up
site_repo.delete(test_site.id)
print(f" [PASS] Test site deleted (cleanup)")
session.commit()
print("\nDatabase schema: PASSED")
except Exception as e:
session.rollback()
print(f"\n[FAILED] Database schema test FAILED: {e}")
return False
finally:
session.close()
return True
def test_full_workflow_simulation():
print_section("TEST 6: Full Workflow Simulation (Simplified)")
session = db_manager.get_session()
try:
# Create repositories
site_repo = SiteDeploymentRepository(session)
print("Testing Story 3.1 core features...")
# Create test sites (2 sites)
site1 = site_repo.create(
site_name="test-site-1",
storage_zone_id=101,
storage_zone_name="test-site-1",
storage_zone_password="pass1",
storage_zone_region="DE",
pull_zone_id=201,
pull_zone_bcdn_hostname=f"test-site-1-{id(session)}.b-cdn.net",
custom_hostname="www.test-custom1.com"
)
site2 = site_repo.create(
site_name="test-site-2",
storage_zone_id=102,
storage_zone_name="test-site-2",
storage_zone_password="pass2",
storage_zone_region="NY",
pull_zone_id=202,
pull_zone_bcdn_hostname=f"test-site-2-{id(session)}.b-cdn.net",
custom_hostname=None # bcdn-only site
)
print(f" [PASS] Created 2 test sites")
# Create mock content objects
from unittest.mock import Mock
content1 = Mock()
content1.id = 999
content1.project_id = 1
content1.tier = "tier1"
content1.keyword = "engine repair"
content1.title = "How to Fix Your Car Engine"
content1.outline = {"sections": []}
content1.content = "<p>Test content</p>"
content1.word_count = 500
content1.status = "generated"
content1.site_deployment_id = site1.id
content2 = Mock()
content2.id = 1000
content2.project_id = 1
content2.tier = "tier2"
content2.keyword = "car maintenance"
content2.title = "Essential Car Maintenance Tips"
content2.outline = {"sections": []}
content2.content = "<p>Test content 2</p>"
content2.word_count = 400
content2.status = "generated"
content2.site_deployment_id = site2.id
print(f" [PASS] Created 2 mock articles")
# Generate URLs
print("\nGenerating URLs...")
urls = generate_urls_for_batch([content1, content2], site_repo)
for url_info in urls:
print(f"\n Article: {url_info['title']}")
print(f" Tier: {url_info['tier']}")
print(f" Slug: {url_info['slug']}")
print(f" Hostname: {url_info['hostname']}")
print(f" [PASS] URL: {url_info['url']}")
# Cleanup (only delete sites, mock content wasn't saved)
print("\nCleaning up test data...")
site_repo.delete(site1.id)
site_repo.delete(site2.id)
session.commit()
print(" [PASS] Test data cleaned up")
print("\nFull workflow simulation: PASSED")
except Exception as e:
session.rollback()
print(f"\n[FAILED] Full workflow FAILED: {e}")
import traceback
traceback.print_exc()
return False
finally:
session.close()
return True
def main():
print("\n" + "="*80)
print(" STORY 3.1 DRY-RUN TEST SUITE")
print(" Testing all features without creating real bunny.net sites")
print("="*80)
tests = [
("Slug Generation", test_slug_generation),
("Priority Logic", test_site_assignment_priority),
("URL Generation", test_url_generation),
("Job Config", test_job_config_parsing),
("Database Schema", test_database_schema),
("Full Workflow", test_full_workflow_simulation),
]
passed = 0
failed = 0
for name, test_func in tests:
try:
result = test_func()
if result is None or result is True:
passed += 1
else:
failed += 1
except Exception as e:
print(f"\n[FAILED] {name} FAILED with exception: {e}")
import traceback
traceback.print_exc()
failed += 1
print_section("SUMMARY")
print(f"Tests Passed: {passed}/{len(tests)}")
print(f"Tests Failed: {failed}/{len(tests)}")
if failed == 0:
print("\n[SUCCESS] ALL TESTS PASSED - Story 3.1 is ready to use!")
return 0
else:
print(f"\n[FAILED] {failed} test(s) failed - please review errors above")
return 1
if __name__ == "__main__":
sys.exit(main())

View File

@ -4,7 +4,7 @@ Abstract repository interfaces for data access layer
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from typing import Optional, List, Dict, Any from typing import Optional, List, Dict, Any
from src.database.models import User, SiteDeployment, Project, GeneratedContent from src.database.models import User, SiteDeployment, Project, GeneratedContent, ArticleLink
class IUserRepository(ABC): class IUserRepository(ABC):
@ -176,3 +176,38 @@ class IGeneratedContentRepository(ABC):
def delete(self, content_id: int) -> bool: def delete(self, content_id: int) -> bool:
"""Delete a generated content record by ID""" """Delete a generated content record by ID"""
pass pass
class IArticleLinkRepository(ABC):
"""Interface for ArticleLink data access"""
@abstractmethod
def create(
self,
from_content_id: int,
to_content_id: Optional[int] = None,
to_url: Optional[str] = None,
link_type: str = "tiered"
) -> ArticleLink:
"""Create a new article link"""
pass
@abstractmethod
def get_by_source_article(self, from_content_id: int) -> List[ArticleLink]:
"""Get all outbound links from an article"""
pass
@abstractmethod
def get_by_target_article(self, to_content_id: int) -> List[ArticleLink]:
"""Get all inbound links to an article"""
pass
@abstractmethod
def get_by_link_type(self, link_type: str) -> List[ArticleLink]:
"""Get all links of a specific type"""
pass
@abstractmethod
def delete(self, link_id: int) -> bool:
"""Delete an article link by ID"""
pass

View File

@ -72,6 +72,7 @@ class Project(Base):
name: Mapped[str] = mapped_column(String(255), nullable=False) name: Mapped[str] = mapped_column(String(255), nullable=False)
main_keyword: Mapped[str] = mapped_column(String(255), nullable=False, index=True) main_keyword: Mapped[str] = mapped_column(String(255), nullable=False, index=True)
tier: Mapped[int] = mapped_column(Integer, nullable=False, default=1, index=True) tier: Mapped[int] = mapped_column(Integer, nullable=False, default=1, index=True)
money_site_url: Mapped[Optional[str]] = mapped_column(String(500), nullable=True, index=True)
word_count: Mapped[int] = mapped_column(Integer, nullable=False, default=1250) word_count: Mapped[int] = mapped_column(Integer, nullable=False, default=1250)
term_frequency: Mapped[int] = mapped_column(Integer, nullable=False, default=3) term_frequency: Mapped[int] = mapped_column(Integer, nullable=False, default=3)
@ -146,3 +147,29 @@ class GeneratedContent(Base):
def __repr__(self) -> str: def __repr__(self) -> str:
return f"<GeneratedContent(id={self.id}, project_id={self.project_id}, tier='{self.tier}', status='{self.status}')>" return f"<GeneratedContent(id={self.id}, project_id={self.project_id}, tier='{self.tier}', status='{self.status}')>"
class ArticleLink(Base):
"""Article link tracking model for tiered linking, wheel links, etc."""
__tablename__ = "article_links"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
from_content_id: Mapped[int] = mapped_column(
Integer,
ForeignKey('generated_content.id', ondelete='CASCADE'),
nullable=False,
index=True
)
to_content_id: Mapped[Optional[int]] = mapped_column(
Integer,
ForeignKey('generated_content.id', ondelete='CASCADE'),
nullable=True,
index=True
)
to_url: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
link_type: Mapped[str] = mapped_column(String(20), nullable=False, index=True)
created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
def __repr__(self) -> str:
target = f"content_id={self.to_content_id}" if self.to_content_id else f"url={self.to_url}"
return f"<ArticleLink(id={self.id}, from={self.from_content_id}, to={target}, type='{self.link_type}')>"

View File

@ -6,8 +6,8 @@ from typing import Optional, List, Dict, Any
from sqlalchemy.orm import Session from sqlalchemy.orm import Session
from sqlalchemy.exc import IntegrityError from sqlalchemy.exc import IntegrityError
from src.core.config import get_config from src.core.config import get_config
from src.database.interfaces import IUserRepository, ISiteDeploymentRepository, IProjectRepository from src.database.interfaces import IUserRepository, ISiteDeploymentRepository, IProjectRepository, IArticleLinkRepository
from src.database.models import User, SiteDeployment, Project, GeneratedContent from src.database.models import User, SiteDeployment, Project, GeneratedContent, ArticleLink
class UserRepository(IUserRepository): class UserRepository(IUserRepository):
@ -316,6 +316,7 @@ class ProjectRepository(IProjectRepository):
entities=data.get("entities", []), entities=data.get("entities", []),
related_searches=data.get("related_searches", []), related_searches=data.get("related_searches", []),
custom_anchor_text=data.get("custom_anchor_text", []), custom_anchor_text=data.get("custom_anchor_text", []),
money_site_url=data.get("money_site_url"),
) )
try: try:
@ -454,10 +455,15 @@ class GeneratedContentRepository:
return self.session.query(GeneratedContent).filter(GeneratedContent.project_id == project_id).all() return self.session.query(GeneratedContent).filter(GeneratedContent.project_id == project_id).all()
def get_by_project_and_tier(self, project_id: int, tier: str) -> List[GeneratedContent]: def get_by_project_and_tier(self, project_id: int, tier: str) -> List[GeneratedContent]:
"""Get content for a project and tier""" """
Get content for a project and tier with site assignment
Returns only articles that have been assigned to a site (site_deployment_id is not None)
"""
return self.session.query(GeneratedContent).filter( return self.session.query(GeneratedContent).filter(
GeneratedContent.project_id == project_id, GeneratedContent.project_id == project_id,
GeneratedContent.tier == tier GeneratedContent.tier == tier,
GeneratedContent.site_deployment_id.isnot(None)
).all() ).all()
def get_by_keyword(self, keyword: str) -> List[GeneratedContent]: def get_by_keyword(self, keyword: str) -> List[GeneratedContent]:
@ -483,3 +489,78 @@ class GeneratedContentRepository:
self.session.commit() self.session.commit()
return True return True
return False return False
class ArticleLinkRepository(IArticleLinkRepository):
"""Repository for ArticleLink data access"""
def __init__(self, session: Session):
self.session = session
def create(
self,
from_content_id: int,
to_content_id: Optional[int] = None,
to_url: Optional[str] = None,
link_type: str = "tiered"
) -> ArticleLink:
"""
Create a new article link
Args:
from_content_id: Source article ID
to_content_id: Target article ID (for internal links)
to_url: Target URL (for external links like money site)
link_type: Type of link (tiered, wheel_next, wheel_prev, homepage)
Returns:
The created ArticleLink object
Raises:
ValueError: If neither to_content_id nor to_url is provided
"""
if to_content_id is None and to_url is None:
raise ValueError("Either to_content_id or to_url must be provided")
link = ArticleLink(
from_content_id=from_content_id,
to_content_id=to_content_id,
to_url=to_url,
link_type=link_type
)
try:
self.session.add(link)
self.session.commit()
self.session.refresh(link)
return link
except IntegrityError as e:
self.session.rollback()
raise ValueError(f"Failed to create article link: {e}")
def get_by_source_article(self, from_content_id: int) -> List[ArticleLink]:
"""Get all outbound links from an article"""
return self.session.query(ArticleLink).filter(
ArticleLink.from_content_id == from_content_id
).all()
def get_by_target_article(self, to_content_id: int) -> List[ArticleLink]:
"""Get all inbound links to an article"""
return self.session.query(ArticleLink).filter(
ArticleLink.to_content_id == to_content_id
).all()
def get_by_link_type(self, link_type: str) -> List[ArticleLink]:
"""Get all links of a specific type"""
return self.session.query(ArticleLink).filter(
ArticleLink.link_type == link_type
).all()
def delete(self, link_id: int) -> bool:
"""Delete an article link by ID"""
link = self.session.query(ArticleLink).filter(ArticleLink.id == link_id).first()
if link:
self.session.delete(link)
self.session.commit()
return True
return False

View File

@ -56,6 +56,7 @@ class Job:
tier1_preferred_sites: Optional[List[str]] = None tier1_preferred_sites: Optional[List[str]] = None
auto_create_sites: bool = False auto_create_sites: bool = False
create_sites_for_keywords: Optional[List[Dict[str, any]]] = None create_sites_for_keywords: Optional[List[Dict[str, any]]] = None
tiered_link_count_range: Optional[Dict[str, int]] = None
class JobConfig: class JobConfig:
@ -136,13 +137,29 @@ class JobConfig:
if "keyword" not in kw_config or "count" not in kw_config: if "keyword" not in kw_config or "count" not in kw_config:
raise ValueError("Each item in 'create_sites_for_keywords' must have 'keyword' and 'count'") raise ValueError("Each item in 'create_sites_for_keywords' must have 'keyword' and 'count'")
tiered_link_count_range = job_data.get("tiered_link_count_range")
if tiered_link_count_range is not None:
if not isinstance(tiered_link_count_range, dict):
raise ValueError("'tiered_link_count_range' must be an object")
if "min" not in tiered_link_count_range or "max" not in tiered_link_count_range:
raise ValueError("'tiered_link_count_range' must have 'min' and 'max' fields")
min_val = tiered_link_count_range["min"]
max_val = tiered_link_count_range["max"]
if not isinstance(min_val, int) or not isinstance(max_val, int):
raise ValueError("'tiered_link_count_range' min and max must be integers")
if min_val < 1:
raise ValueError("'tiered_link_count_range' min must be >= 1")
if max_val < min_val:
raise ValueError("'tiered_link_count_range' max must be >= min")
return Job( return Job(
project_id=project_id, project_id=project_id,
tiers=tiers, tiers=tiers,
deployment_targets=deployment_targets, deployment_targets=deployment_targets,
tier1_preferred_sites=tier1_preferred_sites, tier1_preferred_sites=tier1_preferred_sites,
auto_create_sites=auto_create_sites, auto_create_sites=auto_create_sites,
create_sites_for_keywords=create_sites_for_keywords create_sites_for_keywords=create_sites_for_keywords,
tiered_link_count_range=tiered_link_count_range
) )
def _parse_tier(self, tier_name: str, tier_data: dict) -> TierConfig: def _parse_tier(self, tier_name: str, tier_data: dict) -> TierConfig:

View File

@ -0,0 +1,165 @@
"""
Tiered link finder for article interlinking
"""
import random
import logging
from typing import List, Dict
from src.database.models import GeneratedContent
from src.database.repositories import ProjectRepository, GeneratedContentRepository, SiteDeploymentRepository
from src.generation.url_generator import generate_urls_for_batch
logger = logging.getLogger(__name__)
def find_tiered_links(
content_records: List[GeneratedContent],
job_config,
project_repo: ProjectRepository,
content_repo: GeneratedContentRepository,
site_repo: SiteDeploymentRepository
) -> Dict:
"""
Find tiered links for a batch of articles
Args:
content_records: Batch of articles (all same tier, same project)
job_config: Job configuration (Job object or dict) with optional link count range
project_repo: For retrieving money_site_url
content_repo: For querying lower-tier articles
site_repo: For URL generation
Returns:
Tier 1: {tier: 1, money_site_url: "https://..."}
Tier 2+: {tier: N, lower_tier_urls: [...], lower_tier: N-1}
Raises:
ValueError: If batch is invalid or required data is missing
"""
if not content_records:
raise ValueError("content_records cannot be empty")
tier = _validate_batch_tier(content_records)
tier_int = _extract_tier_number(tier)
project_id = content_records[0].project_id
logger.info(f"Finding tiered links for tier {tier_int} batch (project {project_id})")
if tier_int == 1:
project = project_repo.get_by_id(project_id)
if not project or not project.money_site_url:
raise ValueError(
f"Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
)
return {
"tier": tier_int,
"money_site_url": project.money_site_url
}
lower_tier_int = tier_int - 1
lower_tier = f"tier{lower_tier_int}"
logger.info(f"Batch is tier {tier_int}, querying tier {lower_tier_int} articles")
lower_tier_articles = content_repo.get_by_project_and_tier(project_id, lower_tier)
if not lower_tier_articles:
raise ValueError(
f"Cannot generate tier {tier_int} batch: no tier {lower_tier_int} articles found in project {project_id}"
)
link_range = _get_link_count_range(job_config)
min_count = link_range["min"]
max_count = link_range["max"]
available_count = len(lower_tier_articles)
desired_count = random.randint(min_count, max_count)
if available_count < min_count:
logger.warning(
f"Only {available_count} tier {lower_tier_int} articles available, "
f"requested min {min_count}. Using all available."
)
selected_articles = lower_tier_articles
else:
actual_count = min(desired_count, available_count)
selected_articles = random.sample(lower_tier_articles, actual_count)
logger.info(
f"Selected {len(selected_articles)} random tier {lower_tier_int} URLs "
f"from {available_count} available"
)
url_mappings = generate_urls_for_batch(selected_articles, site_repo)
lower_tier_urls = [mapping["url"] for mapping in url_mappings]
return {
"tier": tier_int,
"lower_tier": lower_tier_int,
"lower_tier_urls": lower_tier_urls
}
def _validate_batch_tier(content_records: List[GeneratedContent]) -> str:
"""
Validate that all articles in batch are the same tier
Args:
content_records: List of GeneratedContent records
Returns:
The tier string (e.g., "tier1", "tier2")
Raises:
ValueError: If articles have different tiers
"""
tiers = set(record.tier for record in content_records)
if len(tiers) > 1:
raise ValueError(f"All articles in batch must be same tier, found: {tiers}")
return list(tiers)[0]
def _extract_tier_number(tier: str) -> int:
"""
Extract tier number from tier string
Args:
tier: Tier string like "tier1", "tier2"
Returns:
Integer tier number (1, 2, 3, etc.)
"""
if tier.startswith("tier"):
try:
return int(tier[4:])
except (ValueError, IndexError):
pass
raise ValueError(f"Invalid tier format: {tier}")
def _get_link_count_range(job_config) -> Dict[str, int]:
"""
Get link count range from job config with defaults
Args:
job_config: Job object or dict
Returns:
Dict with 'min' and 'max' keys, defaults to {min: 2, max: 4}
"""
default_range = {"min": 2, "max": 4}
if job_config is None:
return default_range
if hasattr(job_config, 'tiered_link_count_range'):
link_range = job_config.tiered_link_count_range
elif isinstance(job_config, dict):
link_range = job_config.get('tiered_link_count_range')
else:
return default_range
if link_range is None:
return default_range
return link_range

View File

@ -0,0 +1,522 @@
"""
Integration tests for Story 3.2: Find Tiered Links
"""
import pytest
from src.database.models import GeneratedContent, SiteDeployment, Project, ArticleLink
from src.database.repositories import (
SiteDeploymentRepository,
GeneratedContentRepository,
ProjectRepository,
ArticleLinkRepository
)
from src.generation.job_config import Job
from src.interlinking.tiered_links import find_tiered_links
class TestTieredLinksFindingIntegration:
"""Integration tests for tiered link finding with real database"""
def test_tier1_returns_money_site_url(self, db_session):
"""Test tier 1 batch returns money site URL from project"""
project_repo = ProjectRepository(db_session)
content_repo = GeneratedContentRepository(db_session)
site_repo = SiteDeploymentRepository(db_session)
# Create project with money site URL
project = project_repo.create(
user_id=1,
name="Test Project",
data={
"main_keyword": "test keyword",
"money_site_url": "https://www.mymoneysite.com"
}
)
# Create tier 1 content
content = content_repo.create(
project_id=project.id,
tier="tier1",
keyword="test",
title="Tier 1 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
result = find_tiered_links([content], None, project_repo, content_repo, site_repo)
assert result["tier"] == 1
assert result["money_site_url"] == "https://www.mymoneysite.com"
def test_tier2_queries_tier1_articles_same_project(self, db_session):
"""Test tier 2 batch queries tier 1 articles from same project only"""
project_repo = ProjectRepository(db_session)
content_repo = GeneratedContentRepository(db_session)
site_repo = SiteDeploymentRepository(db_session)
# Create site for tier 1 articles
site = site_repo.create(
site_name="test-site",
storage_zone_id=1,
storage_zone_name="test",
storage_zone_password="pass",
storage_zone_region="DE",
pull_zone_id=10,
pull_zone_bcdn_hostname="test.b-cdn.net"
)
# Create two projects
project1 = project_repo.create(
user_id=1,
name="Project 1",
data={"main_keyword": "test1"}
)
project2 = project_repo.create(
user_id=1,
name="Project 2",
data={"main_keyword": "test2"}
)
# Create tier 1 articles for project 1
tier1_p1_articles = []
for i in range(5):
article = content_repo.create(
project_id=project1.id,
tier="tier1",
keyword="test1",
title=f"Project 1 Tier 1 Article {i}",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated",
site_deployment_id=site.id
)
tier1_p1_articles.append(article)
# Create tier 1 articles for project 2 (should not be selected)
for i in range(3):
content_repo.create(
project_id=project2.id,
tier="tier1",
keyword="test2",
title=f"Project 2 Tier 1 Article {i}",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated",
site_deployment_id=site.id
)
# Create tier 2 article for project 1
tier2_article = content_repo.create(
project_id=project1.id,
tier="tier2",
keyword="test1",
title="Project 1 Tier 2 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
result = find_tiered_links([tier2_article], None, project_repo, content_repo, site_repo)
assert result["tier"] == 2
assert result["lower_tier"] == 1
assert len(result["lower_tier_urls"]) >= 2
assert len(result["lower_tier_urls"]) <= 4
# Verify URLs are from tier 1 project 1 articles only
for url in result["lower_tier_urls"]:
assert "test.b-cdn.net" in url
assert any(f"project-1-tier-1-article-{i}" in url.lower() for i in range(5))
def test_tier3_queries_tier2_articles(self, db_session):
"""Test tier 3 batch queries tier 2 articles"""
project_repo = ProjectRepository(db_session)
content_repo = GeneratedContentRepository(db_session)
site_repo = SiteDeploymentRepository(db_session)
site = site_repo.create(
site_name="test-site",
storage_zone_id=1,
storage_zone_name="test",
storage_zone_password="pass",
storage_zone_region="DE",
pull_zone_id=10,
pull_zone_bcdn_hostname="tier2site.b-cdn.net"
)
project = project_repo.create(
user_id=1,
name="Test Project",
data={"main_keyword": "test"}
)
# Create tier 2 articles
for i in range(10):
content_repo.create(
project_id=project.id,
tier="tier2",
keyword="test",
title=f"Tier 2 Article {i}",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated",
site_deployment_id=site.id
)
# Create tier 3 article
tier3_article = content_repo.create(
project_id=project.id,
tier="tier3",
keyword="test",
title="Tier 3 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
result = find_tiered_links([tier3_article], None, project_repo, content_repo, site_repo)
assert result["tier"] == 3
assert result["lower_tier"] == 2
assert len(result["lower_tier_urls"]) >= 2
assert len(result["lower_tier_urls"]) <= 4
def test_custom_link_count_range(self, db_session):
"""Test custom link count range from job config"""
project_repo = ProjectRepository(db_session)
content_repo = GeneratedContentRepository(db_session)
site_repo = SiteDeploymentRepository(db_session)
site = site_repo.create(
site_name="test-site",
storage_zone_id=1,
storage_zone_name="test",
storage_zone_password="pass",
storage_zone_region="DE",
pull_zone_id=10,
pull_zone_bcdn_hostname="test.b-cdn.net"
)
project = project_repo.create(
user_id=1,
name="Test Project",
data={"main_keyword": "test"}
)
# Create 15 tier 1 articles
for i in range(15):
content_repo.create(
project_id=project.id,
tier="tier1",
keyword="test",
title=f"Tier 1 Article {i}",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated",
site_deployment_id=site.id
)
# Create tier 2 article
tier2_article = content_repo.create(
project_id=project.id,
tier="tier2",
keyword="test",
title="Tier 2 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
# Test with custom range (min=5, max=8)
job = Job(
project_id=project.id,
tiers={},
tiered_link_count_range={"min": 5, "max": 8}
)
result = find_tiered_links([tier2_article], job, project_repo, content_repo, site_repo)
url_count = len(result["lower_tier_urls"])
assert 5 <= url_count <= 8
def test_exact_count_when_min_equals_max(self, db_session):
"""Test exact link count when min equals max"""
project_repo = ProjectRepository(db_session)
content_repo = GeneratedContentRepository(db_session)
site_repo = SiteDeploymentRepository(db_session)
site = site_repo.create(
site_name="test-site",
storage_zone_id=1,
storage_zone_name="test",
storage_zone_password="pass",
storage_zone_region="DE",
pull_zone_id=10,
pull_zone_bcdn_hostname="test.b-cdn.net"
)
project = project_repo.create(
user_id=1,
name="Test Project",
data={"main_keyword": "test"}
)
# Create 20 tier 1 articles
for i in range(20):
content_repo.create(
project_id=project.id,
tier="tier1",
keyword="test",
title=f"Tier 1 Article {i}",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated",
site_deployment_id=site.id
)
tier2_article = content_repo.create(
project_id=project.id,
tier="tier2",
keyword="test",
title="Tier 2 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
# Test with exact count (min=7, max=7)
job = Job(
project_id=project.id,
tiers={},
tiered_link_count_range={"min": 7, "max": 7}
)
result = find_tiered_links([tier2_article], job, project_repo, content_repo, site_repo)
assert len(result["lower_tier_urls"]) == 7
def test_insufficient_lower_tier_articles_uses_all(self, db_session):
"""Test that all available articles are used when fewer than min requested"""
project_repo = ProjectRepository(db_session)
content_repo = GeneratedContentRepository(db_session)
site_repo = SiteDeploymentRepository(db_session)
site = site_repo.create(
site_name="test-site",
storage_zone_id=1,
storage_zone_name="test",
storage_zone_password="pass",
storage_zone_region="DE",
pull_zone_id=10,
pull_zone_bcdn_hostname="test.b-cdn.net"
)
project = project_repo.create(
user_id=1,
name="Test Project",
data={"main_keyword": "test"}
)
# Create only 1 tier 1 article
content_repo.create(
project_id=project.id,
tier="tier1",
keyword="test",
title="Only Tier 1 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated",
site_deployment_id=site.id
)
tier2_article = content_repo.create(
project_id=project.id,
tier="tier2",
keyword="test",
title="Tier 2 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
result = find_tiered_links([tier2_article], None, project_repo, content_repo, site_repo)
# Should return the 1 available article even though min is 2
assert len(result["lower_tier_urls"]) == 1
class TestArticleLinkRepositoryIntegration:
"""Integration tests for ArticleLink repository with database constraints"""
def test_create_and_query_tiered_links(self, db_session):
"""Test creating and querying tiered links"""
project_repo = ProjectRepository(db_session)
content_repo = GeneratedContentRepository(db_session)
link_repo = ArticleLinkRepository(db_session)
project = project_repo.create(
user_id=1,
name="Test Project",
data={"main_keyword": "test"}
)
# Create tier 1 and tier 2 articles
tier1_article = content_repo.create(
project_id=project.id,
tier="tier1",
keyword="test",
title="Tier 1 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
tier2_article = content_repo.create(
project_id=project.id,
tier="tier2",
keyword="test",
title="Tier 2 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
# Create tiered link from tier 2 to tier 1
link = link_repo.create(
from_content_id=tier2_article.id,
to_content_id=tier1_article.id,
link_type="tiered"
)
assert link.id is not None
# Query links
outbound = link_repo.get_by_source_article(tier2_article.id)
assert len(outbound) == 1
assert outbound[0].to_content_id == tier1_article.id
inbound = link_repo.get_by_target_article(tier1_article.id)
assert len(inbound) == 1
assert inbound[0].from_content_id == tier2_article.id
def test_create_money_site_link(self, db_session):
"""Test creating external link to money site"""
project_repo = ProjectRepository(db_session)
content_repo = GeneratedContentRepository(db_session)
link_repo = ArticleLinkRepository(db_session)
project = project_repo.create(
user_id=1,
name="Test Project",
data={
"main_keyword": "test",
"money_site_url": "https://www.moneysite.com"
}
)
tier1_article = content_repo.create(
project_id=project.id,
tier="tier1",
keyword="test",
title="Tier 1 Article",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
# Create link to money site
link = link_repo.create(
from_content_id=tier1_article.id,
to_content_id=None,
to_url="https://www.moneysite.com",
link_type="tiered"
)
assert link.to_content_id is None
assert link.to_url == "https://www.moneysite.com"
# Query
links = link_repo.get_by_source_article(tier1_article.id)
assert len(links) == 1
assert links[0].to_url == "https://www.moneysite.com"
def test_multiple_link_types(self, db_session):
"""Test different link types (tiered, wheel_next, wheel_prev, homepage)"""
project_repo = ProjectRepository(db_session)
content_repo = GeneratedContentRepository(db_session)
link_repo = ArticleLinkRepository(db_session)
project = project_repo.create(
user_id=1,
name="Test Project",
data={"main_keyword": "test"}
)
# Create 3 articles
articles = []
for i in range(3):
article = content_repo.create(
project_id=project.id,
tier="tier1",
keyword="test",
title=f"Article {i}",
outline={},
content="<p>Test</p>",
word_count=100,
status="generated"
)
articles.append(article)
# Create different link types
tiered_link = link_repo.create(
from_content_id=articles[0].id,
to_content_id=articles[1].id,
link_type="tiered"
)
wheel_next_link = link_repo.create(
from_content_id=articles[0].id,
to_content_id=articles[1].id,
link_type="wheel_next"
)
wheel_prev_link = link_repo.create(
from_content_id=articles[1].id,
to_content_id=articles[0].id,
link_type="wheel_prev"
)
homepage_link = link_repo.create(
from_content_id=articles[2].id,
to_content_id=articles[0].id,
link_type="homepage"
)
# Query by type
tiered_links = link_repo.get_by_link_type("tiered")
assert len(tiered_links) == 1
wheel_links = link_repo.get_by_link_type("wheel_next")
assert len(wheel_links) == 1
# Article 0 should have multiple outbound links
outbound = link_repo.get_by_source_article(articles[0].id)
assert len(outbound) == 2 # tiered and wheel_next

View File

@ -0,0 +1,118 @@
"""
Unit tests for ArticleLink repository
"""
import pytest
from src.database.repositories import ArticleLinkRepository
from src.database.models import ArticleLink, GeneratedContent, Project
class TestArticleLinkRepository:
"""Tests for ArticleLinkRepository"""
def test_create_internal_link(self, db_session):
repo = ArticleLinkRepository(db_session)
link = repo.create(
from_content_id=1,
to_content_id=2,
to_url=None,
link_type="tiered"
)
assert link.id is not None
assert link.from_content_id == 1
assert link.to_content_id == 2
assert link.to_url is None
assert link.link_type == "tiered"
def test_create_external_link(self, db_session):
repo = ArticleLinkRepository(db_session)
link = repo.create(
from_content_id=1,
to_content_id=None,
to_url="https://www.moneysite.com",
link_type="tiered"
)
assert link.id is not None
assert link.from_content_id == 1
assert link.to_content_id is None
assert link.to_url == "https://www.moneysite.com"
assert link.link_type == "tiered"
def test_create_without_target_raises_error(self, db_session):
repo = ArticleLinkRepository(db_session)
with pytest.raises(ValueError, match="Either to_content_id or to_url must be provided"):
repo.create(
from_content_id=1,
to_content_id=None,
to_url=None,
link_type="tiered"
)
def test_get_by_source_article(self, db_session):
repo = ArticleLinkRepository(db_session)
link1 = repo.create(from_content_id=1, to_content_id=2, link_type="tiered")
link2 = repo.create(from_content_id=1, to_content_id=3, link_type="wheel_next")
link3 = repo.create(from_content_id=2, to_content_id=4, link_type="tiered")
links = repo.get_by_source_article(1)
assert len(links) == 2
link_ids = [link.id for link in links]
assert link1.id in link_ids
assert link2.id in link_ids
assert link3.id not in link_ids
def test_get_by_target_article(self, db_session):
repo = ArticleLinkRepository(db_session)
link1 = repo.create(from_content_id=1, to_content_id=2, link_type="tiered")
link2 = repo.create(from_content_id=3, to_content_id=2, link_type="tiered")
link3 = repo.create(from_content_id=1, to_content_id=4, link_type="tiered")
links = repo.get_by_target_article(2)
assert len(links) == 2
link_ids = [link.id for link in links]
assert link1.id in link_ids
assert link2.id in link_ids
assert link3.id not in link_ids
def test_get_by_link_type(self, db_session):
repo = ArticleLinkRepository(db_session)
link1 = repo.create(from_content_id=1, to_content_id=2, link_type="tiered")
link2 = repo.create(from_content_id=2, to_content_id=3, link_type="wheel_next")
link3 = repo.create(from_content_id=3, to_content_id=4, link_type="tiered")
links = repo.get_by_link_type("tiered")
assert len(links) == 2
link_ids = [link.id for link in links]
assert link1.id in link_ids
assert link3.id in link_ids
assert link2.id not in link_ids
def test_delete(self, db_session):
repo = ArticleLinkRepository(db_session)
link = repo.create(from_content_id=1, to_content_id=2, link_type="tiered")
link_id = link.id
result = repo.delete(link_id)
assert result is True
links = repo.get_by_source_article(1)
assert len(links) == 0
def test_delete_nonexistent_returns_false(self, db_session):
repo = ArticleLinkRepository(db_session)
result = repo.delete(999)
assert result is False

View File

@ -0,0 +1,330 @@
"""
Unit tests for tiered link finder
"""
import pytest
from unittest.mock import Mock, MagicMock, patch
from src.interlinking.tiered_links import (
find_tiered_links,
_validate_batch_tier,
_extract_tier_number,
_get_link_count_range
)
from src.database.models import GeneratedContent, Project
from src.generation.job_config import Job
class TestExtractTierNumber:
"""Tests for _extract_tier_number helper"""
def test_tier1(self):
assert _extract_tier_number("tier1") == 1
def test_tier2(self):
assert _extract_tier_number("tier2") == 2
def test_tier3(self):
assert _extract_tier_number("tier3") == 3
def test_invalid_format(self):
with pytest.raises(ValueError, match="Invalid tier format"):
_extract_tier_number("invalid")
def test_tier_without_number(self):
with pytest.raises(ValueError, match="Invalid tier format"):
_extract_tier_number("tier")
class TestValidateBatchTier:
"""Tests for _validate_batch_tier helper"""
def test_single_tier_batch(self):
content1 = Mock(spec=GeneratedContent)
content1.tier = "tier1"
content2 = Mock(spec=GeneratedContent)
content2.tier = "tier1"
result = _validate_batch_tier([content1, content2])
assert result == "tier1"
def test_mixed_tiers_raises_error(self):
content1 = Mock(spec=GeneratedContent)
content1.tier = "tier1"
content2 = Mock(spec=GeneratedContent)
content2.tier = "tier2"
with pytest.raises(ValueError, match="All articles in batch must be same tier"):
_validate_batch_tier([content1, content2])
class TestGetLinkCountRange:
"""Tests for _get_link_count_range helper"""
def test_default_range(self):
result = _get_link_count_range(None)
assert result == {"min": 2, "max": 4}
def test_job_object_with_range(self):
job = Job(
project_id=1,
tiers={},
tiered_link_count_range={"min": 3, "max": 6}
)
result = _get_link_count_range(job)
assert result == {"min": 3, "max": 6}
def test_job_object_without_range(self):
job = Job(
project_id=1,
tiers={},
tiered_link_count_range=None
)
result = _get_link_count_range(job)
assert result == {"min": 2, "max": 4}
def test_dict_with_range(self):
job_dict = {"tiered_link_count_range": {"min": 5, "max": 8}}
result = _get_link_count_range(job_dict)
assert result == {"min": 5, "max": 8}
def test_dict_without_range(self):
job_dict = {}
result = _get_link_count_range(job_dict)
assert result == {"min": 2, "max": 4}
class TestFindTieredLinks:
"""Tests for find_tiered_links main function"""
def test_empty_content_records_raises_error(self):
project_repo = Mock()
content_repo = Mock()
site_repo = Mock()
with pytest.raises(ValueError, match="content_records cannot be empty"):
find_tiered_links([], None, project_repo, content_repo, site_repo)
def test_tier1_returns_money_site_url(self):
content = Mock(spec=GeneratedContent)
content.tier = "tier1"
content.project_id = 1
project = Mock(spec=Project)
project.money_site_url = "https://www.mymoneysite.com"
project_repo = Mock()
project_repo.get_by_id.return_value = project
content_repo = Mock()
site_repo = Mock()
result = find_tiered_links([content], None, project_repo, content_repo, site_repo)
assert result["tier"] == 1
assert result["money_site_url"] == "https://www.mymoneysite.com"
project_repo.get_by_id.assert_called_once_with(1)
def test_tier1_missing_money_site_url_raises_error(self):
content = Mock(spec=GeneratedContent)
content.tier = "tier1"
content.project_id = 1
project = Mock(spec=Project)
project.money_site_url = None
project_repo = Mock()
project_repo.get_by_id.return_value = project
content_repo = Mock()
site_repo = Mock()
with pytest.raises(ValueError, match="money_site_url not set in project 1"):
find_tiered_links([content], None, project_repo, content_repo, site_repo)
def test_tier1_missing_project_raises_error(self):
content = Mock(spec=GeneratedContent)
content.tier = "tier1"
content.project_id = 999
project_repo = Mock()
project_repo.get_by_id.return_value = None
content_repo = Mock()
site_repo = Mock()
with pytest.raises(ValueError, match="money_site_url not set in project 999"):
find_tiered_links([content], None, project_repo, content_repo, site_repo)
def test_tier2_queries_tier1_articles(self):
content = Mock(spec=GeneratedContent)
content.tier = "tier2"
content.project_id = 1
lower_tier_article = Mock(spec=GeneratedContent)
lower_tier_article.id = 10
lower_tier_article.tier = "tier1"
lower_tier_article.title = "Lower Tier Article"
lower_tier_article.site_deployment_id = 5
project_repo = Mock()
content_repo = Mock()
content_repo.get_by_project_and_tier.return_value = [lower_tier_article]
site_repo = Mock()
with patch('src.interlinking.tiered_links.generate_urls_for_batch') as mock_gen:
mock_gen.return_value = [{
"content_id": 10,
"url": "https://example.com/article.html",
"title": "Lower Tier Article"
}]
result = find_tiered_links([content], None, project_repo, content_repo, site_repo)
assert result["tier"] == 2
assert result["lower_tier"] == 1
assert len(result["lower_tier_urls"]) == 1
assert result["lower_tier_urls"][0] == "https://example.com/article.html"
content_repo.get_by_project_and_tier.assert_called_once_with(1, "tier1")
def test_tier3_queries_tier2_articles(self):
content = Mock(spec=GeneratedContent)
content.tier = "tier3"
content.project_id = 2
lower_tier_articles = [
Mock(id=i, tier="tier2", site_deployment_id=5) for i in range(10)
]
project_repo = Mock()
content_repo = Mock()
content_repo.get_by_project_and_tier.return_value = lower_tier_articles
site_repo = Mock()
with patch('src.interlinking.tiered_links.generate_urls_for_batch') as mock_gen:
mock_gen.return_value = [
{"content_id": i, "url": f"https://example.com/article-{i}.html"}
for i in range(10)
]
result = find_tiered_links([content], None, project_repo, content_repo, site_repo)
assert result["tier"] == 3
assert result["lower_tier"] == 2
content_repo.get_by_project_and_tier.assert_called_once_with(2, "tier2")
def test_no_lower_tier_articles_raises_error(self):
content = Mock(spec=GeneratedContent)
content.tier = "tier2"
content.project_id = 1
project_repo = Mock()
content_repo = Mock()
content_repo.get_by_project_and_tier.return_value = []
site_repo = Mock()
with pytest.raises(ValueError, match="no tier 1 articles found in project 1"):
find_tiered_links([content], None, project_repo, content_repo, site_repo)
def test_custom_link_count_range(self):
content = Mock(spec=GeneratedContent)
content.tier = "tier2"
content.project_id = 1
lower_tier_articles = [
Mock(id=i, tier="tier1", site_deployment_id=5) for i in range(20)
]
job = Job(
project_id=1,
tiers={},
tiered_link_count_range={"min": 5, "max": 8}
)
project_repo = Mock()
content_repo = Mock()
content_repo.get_by_project_and_tier.return_value = lower_tier_articles
site_repo = Mock()
with patch('src.interlinking.tiered_links.generate_urls_for_batch') as mock_gen:
# Mock should return URLs based on how many articles were passed to it
def mock_url_gen(articles, site_repo):
return [
{"content_id": i, "url": f"https://example.com/article-{i}.html"}
for i in range(len(articles))
]
mock_gen.side_effect = mock_url_gen
result = find_tiered_links([content], job, project_repo, content_repo, site_repo)
url_count = len(result["lower_tier_urls"])
assert 5 <= url_count <= 8
def test_fewer_articles_than_min_uses_all_available(self, caplog):
content = Mock(spec=GeneratedContent)
content.tier = "tier2"
content.project_id = 1
lower_tier_articles = [
Mock(id=1, tier="tier1", site_deployment_id=5)
]
project_repo = Mock()
content_repo = Mock()
content_repo.get_by_project_and_tier.return_value = lower_tier_articles
site_repo = Mock()
with patch('src.interlinking.tiered_links.generate_urls_for_batch') as mock_gen:
mock_gen.return_value = [
{"content_id": 1, "url": "https://example.com/article.html"}
]
result = find_tiered_links([content], None, project_repo, content_repo, site_repo)
assert len(result["lower_tier_urls"]) == 1
assert "Only 1 tier 1 articles available" in caplog.text
def test_exact_count_when_min_equals_max(self):
content = Mock(spec=GeneratedContent)
content.tier = "tier2"
content.project_id = 1
lower_tier_articles = [
Mock(id=i, tier="tier1", site_deployment_id=5) for i in range(20)
]
job = Job(
project_id=1,
tiers={},
tiered_link_count_range={"min": 8, "max": 8}
)
project_repo = Mock()
content_repo = Mock()
content_repo.get_by_project_and_tier.return_value = lower_tier_articles
site_repo = Mock()
with patch('src.interlinking.tiered_links.generate_urls_for_batch') as mock_gen:
# Mock should return URLs based on how many articles were passed to it
def mock_url_gen(articles, site_repo):
return [
{"content_id": i, "url": f"https://example.com/article-{i}.html"}
for i in range(len(articles))
]
mock_gen.side_effect = mock_url_gen
result = find_tiered_links([content], job, project_repo, content_repo, site_repo)
assert len(result["lower_tier_urls"]) == 8