# Story 3.2: Find Tiered Links

## Status
Complete - QA Approved

## Story
**As a developer**, I want a module that finds all required tiered links (money site or lower-tier) based on the current batch's tier, so I have them ready for injection.

## Context
- Story 3.1 generates URLs for articles in the current batch
- Articles are organized in tiers (T1, T2, T3, etc.) where higher tiers link to lower tiers
- Tier 1 articles link to the money site (client's actual website)
- Tier 2+ articles link to random articles from the tier immediately below
- All articles in a batch are from the same project and tier
- URLs are generated on-the-fly from `GeneratedContent` records (not stored in DB yet)
- The link relationships (which article links to which) will be tracked in Story 4.2

## Acceptance Criteria

### Core Functionality
- A function accepts a batch of `GeneratedContent` records and job configuration
- It determines the tier of the batch (all articles in batch are same tier)
- **If Tier 1:**
  - It retrieves the `money_site_url` from the project settings
  - Returns a single money site URL
- **If Tier 2 or higher:**
  - It queries `GeneratedContent` table for articles from the tier immediately below (e.g., T2 queries T1)
  - Filters to same project only
  - Selects random articles from the lower tier
  - Generates URLs for those articles using `generate_urls_for_batch()`
  - Returns list of lower-tier URLs
- Function signature: `find_tiered_links(content_records: List[GeneratedContent], job_config, project_repo, content_repo, site_repo) -> Dict`

### Link Count Configuration
- By default: select 2-4 random lower-tier URLs (random count between 2 and 4)
- Job config supports optional `tiered_link_count_range: {min: int, max: int}`
  - If min == max, always returns exactly that many links (e.g., `{min: 8, max: 8}` returns 8 links)
  - If min < max, returns random count between min and max (inclusive)
  - Default if not specified: `{min: 2, max: 4}`

### Return Format
- **Tier 1 batches:** `{tier: 1, money_site_url: "https://example.com"}`
- **Tier 2+ batches:** `{tier: N, lower_tier_urls: ["https://...", "https://..."], lower_tier: N-1}`

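
One way to make the two return shapes explicit is with `TypedDict`. The sketch below is illustrative only: the class names `Tier1Links` and `LowerTierLinks`, the `TieredLinks` alias, and the `describe` helper are hypothetical and do not appear in the story, which simply specifies `Dict`.

```python
from typing import List, TypedDict, Union


class Tier1Links(TypedDict):
    """Shape returned for tier 1 batches."""
    tier: int
    money_site_url: str


class LowerTierLinks(TypedDict):
    """Shape returned for tier 2+ batches."""
    tier: int
    lower_tier: int
    lower_tier_urls: List[str]


TieredLinks = Union[Tier1Links, LowerTierLinks]


def describe(result: TieredLinks) -> str:
    """Summarise a result; the presence of money_site_url marks a tier 1 batch."""
    if "money_site_url" in result:
        return f"tier 1 -> {result['money_site_url']}"
    return (
        f"tier {result['tier']} -> {len(result['lower_tier_urls'])} "
        f"tier {result['lower_tier']} URLs"
    )
```

Callers that consume the result can branch on the key that is present, as `describe` does, instead of re-deriving the tier.
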
### Error Handling
- **Tier 2+ with no lower-tier articles:** Raise error and quit
  - Error message: "Cannot generate tier {N} batch: no tier {N-1} articles found in project {project_id}"
- **Tier 1 with no money_site_url:** Raise error and quit
  - Error message: "Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
- **Fewer lower-tier URLs than min requested:** Log warning and continue
  - Warning: "Only {count} tier {N-1} articles available, requested min {min}. Using all available."
  - Returns all available lower-tier URLs even if less than min
- **Empty content_records list:** Raise ValueError
- **Mixed tiers in content_records:** Raise ValueError

### Logging
- INFO: Log tier detection (e.g., "Batch is tier 2, querying tier 1 articles")
- INFO: Log link selection (e.g., "Selected 3 random tier 1 URLs from 15 available")
- WARNING: If fewer articles available than requested minimum
- ERROR: If no lower-tier articles found or money_site_url missing

## Tasks / Subtasks

### 1. Create Article Links Table
**Effort:** 2 story points

- [ ] Create migration script for `article_links` table:
  - `id` (primary key, auto-increment)
  - `from_content_id` (foreign key to generated_content.id, indexed)
  - `to_content_id` (foreign key to generated_content.id, indexed)
  - `to_url` (text, nullable - for money site URLs that aren't in our DB)
  - `link_type` (varchar: "tiered", "wheel_next", "wheel_prev", "homepage")
  - `created_at` (timestamp)
- [ ] Add unique constraint on (from_content_id, to_content_id, link_type) to prevent duplicates
- [ ] Create `ArticleLink` model in `src/database/models.py`
- [ ] Test migration on development database

### 2. Create Article Links Repository
**Effort:** 2 story points

- [ ] Create `IArticleLinkRepository` interface in `src/database/interfaces.py`:
  - `create(from_content_id, to_content_id, to_url, link_type) -> ArticleLink`
  - `get_by_source_article(from_content_id) -> List[ArticleLink]`
  - `get_by_target_article(to_content_id) -> List[ArticleLink]`
  - `get_by_link_type(link_type) -> List[ArticleLink]`
  - `delete(link_id) -> bool`
- [ ] Implement `ArticleLinkRepository` in `src/database/repositories.py`
- [ ] Handle both internal links (to_content_id) and external links (to_url for money site)

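
For unit tests that should not touch a database, the interface above can be satisfied by a small in-memory stand-in. The sketch below is an assumption-laden illustration, not the repository from `src/database/repositories.py`: `FakeArticleLink` and `InMemoryArticleLinkRepository` are hypothetical names, and the "at least one target" rule is borrowed from the `CHECK` constraint in this story's table schema.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass
class FakeArticleLink:
    """Plain stand-in for the ArticleLink model (hypothetical, test-only)."""
    id: int
    from_content_id: int
    to_content_id: Optional[int]
    to_url: Optional[str]
    link_type: str


class InMemoryArticleLinkRepository:
    """Dictionary-backed test double exposing the interface methods listed above."""

    def __init__(self) -> None:
        self._links: Dict[int, FakeArticleLink] = {}
        self._next_id = count(1)

    def create(self, from_content_id: int, to_content_id: Optional[int],
               to_url: Optional[str], link_type: str) -> FakeArticleLink:
        # Every link needs an internal target or an external URL.
        if to_content_id is None and to_url is None:
            raise ValueError("either to_content_id or to_url must be provided")
        link = FakeArticleLink(next(self._next_id), from_content_id,
                               to_content_id, to_url, link_type)
        self._links[link.id] = link
        return link

    def get_by_source_article(self, from_content_id: int) -> List[FakeArticleLink]:
        return [l for l in self._links.values() if l.from_content_id == from_content_id]

    def get_by_target_article(self, to_content_id: int) -> List[FakeArticleLink]:
        return [l for l in self._links.values() if l.to_content_id == to_content_id]

    def get_by_link_type(self, link_type: str) -> List[FakeArticleLink]:
        return [l for l in self._links.values() if l.link_type == link_type]

    def delete(self, link_id: int) -> bool:
        # True when a link was removed, False when the id was unknown.
        return self._links.pop(link_id, None) is not None
```

Because the stand-in keeps the same method names and argument order as the interface, code written against `IArticleLinkRepository` can be exercised with it unchanged.
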
### 3. Extend Job Configuration Schema
**Effort:** 1 story point

- [ ] Add `tiered_link_count_range: Optional[Dict]` to job config schema
- [ ] Default: `{min: 2, max: 4}` if not specified
- [ ] Validation: min >= 1, max >= min
- [ ] Example: `{"tiered_link_count_range": {"min": 3, "max": 6}}`

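
The validation rules in this task could look roughly like the following. It is a sketch under stated assumptions: `validate_link_count_range` is a hypothetical function name, the range is assumed to arrive as a mapping with `min` and `max` keys, and raising `ValueError` is a choice made here, not something the story prescribes for config validation.

```python
from typing import Dict, Mapping, Optional


def validate_link_count_range(link_range: Optional[Mapping[str, int]]) -> Dict[str, int]:
    """Apply the default and enforce min >= 1 and max >= min."""
    if link_range is None:
        # Default from this story when the job config omits the key.
        return {"min": 2, "max": 4}

    min_count = link_range["min"]
    max_count = link_range["max"]

    if min_count < 1:
        raise ValueError(f"tiered_link_count_range.min must be >= 1, got {min_count}")
    if max_count < min_count:
        raise ValueError(
            f"tiered_link_count_range.max ({max_count}) must be >= min ({min_count})"
        )
    return {"min": min_count, "max": max_count}
```

Validating once when the job config is loaded means `find_tiered_links()` can trust the range it receives.
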
### 4. Add Money Site URL to Project
**Effort:** 1 story point

- [ ] Add `money_site_url` field to Project model (nullable string, indexed)
- [ ] Create migration script to add column to existing projects table
- [ ] Update ProjectRepository.create() to accept money_site_url parameter
- [ ] Test migration on development database

### 5. Implement Tiered Link Finder
**Effort:** 3 story points

- [ ] Create new module: `src/interlinking/tiered_links.py`
- [ ] Implement `find_tiered_links()` function:
  - Validate content_records is not empty
  - Validate all records are same tier
  - Detect tier from first record
  - Handle Tier 1 case (money site)
  - Handle Tier 2+ case (lower-tier articles)
  - Apply link count range configuration
  - Generate URLs using `url_generator.generate_urls_for_batch()`
  - Return formatted result
- [ ] Implement `_select_random_count(min_count: int, max_count: int) -> int` helper
- [ ] Implement `_validate_batch_tier(content_records: List[GeneratedContent]) -> int` helper

### 6. Unit Tests
**Effort:** 4 story points

- [ ] Test ArticleLink model creation and relationships
- [ ] Test ArticleLinkRepository CRUD operations
- [ ] Test duplicate link prevention (unique constraint)
- [ ] Test Tier 1 batch returns money_site_url
- [ ] Test Tier 1 batch with missing money_site_url raises error
- [ ] Test Tier 2 batch queries Tier 1 articles from same project only
- [ ] Test Tier 3 batch queries Tier 2 articles
- [ ] Test random selection with default range (2-4)
- [ ] Test custom link count range from job config
- [ ] Test exact count (min == max)
- [ ] Test empty content_records raises error
- [ ] Test mixed tiers in batch raises error
- [ ] Test no lower-tier articles available raises error
- [ ] Test fewer lower-tier articles than min logs warning and continues
- [ ] Mock GeneratedContent, Project, and URL generation
- [ ] Achieve >85% code coverage

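
As an illustration of the "mixed tiers" and mocking items above, a test for the tier-validation helper might be sketched as follows. The helper body is a copy of the one in this story's implementation example, repeated so the snippet is self-contained; `make_record` and the use of `SimpleNamespace` in place of real `GeneratedContent` rows are assumptions made for the example.

```python
import unittest
from types import SimpleNamespace


def _validate_batch_tier(content_records):
    # Copy of the helper from this story's implementation example.
    tiers = {record.tier for record in content_records}
    if len(tiers) > 1:
        raise ValueError(f"All articles in batch must be same tier, found: {tiers}")
    return int(next(iter(tiers)))


def make_record(tier, project_id=1):
    """Hypothetical factory: a lightweight stand-in for a GeneratedContent row."""
    return SimpleNamespace(tier=tier, project_id=project_id)


class ValidateBatchTierTests(unittest.TestCase):
    def test_same_tier_returns_that_tier(self):
        records = [make_record(2), make_record(2)]
        self.assertEqual(_validate_batch_tier(records), 2)

    def test_mixed_tiers_raise_value_error(self):
        records = [make_record(1), make_record(2)]
        with self.assertRaises(ValueError):
            _validate_batch_tier(records)
```

Saved in a test module, these cases can be run with `python -m unittest`; the same stand-in records work for the other tier-related tests in the list.
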
### 7. Integration Tests
**Effort:** 2 story points

- [ ] Test article_links table migration and constraints
- [ ] Test full flow with real database: create T1 articles, then query for T2 batch
- [ ] Test with multiple projects to verify same-project filtering
- [ ] Test URL generation integration with Story 3.1 url_generator
- [ ] Test with different link count configurations
- [ ] Verify lower-tier article selection is truly random
- [ ] Test storing links in article_links table (for Story 3.3/4.2 usage)

## Technical Notes

### Article Links Table Schema
```sql
CREATE TABLE article_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_content_id INTEGER NOT NULL,
    to_content_id INTEGER NULL,
    to_url TEXT NULL,
    anchor_text TEXT NULL,
    link_type VARCHAR(20) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    UNIQUE (from_content_id, to_content_id, link_type),
    CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
);

CREATE INDEX idx_article_links_from ON article_links(from_content_id);
CREATE INDEX idx_article_links_to ON article_links(to_content_id);
CREATE INDEX idx_article_links_type ON article_links(link_type);
```

**Note:** The `anchor_text` field was added in Story 4.5 to store the actual anchor text used for each link, improving query performance and data integrity.

**Link Types:**
- `tiered`: Link from tier N article to tier N-1 article (or money site for tier 1)
- `wheel_next`: Link to next article in batch wheel
- `wheel_prev`: Link to previous article in batch wheel
- `homepage`: Link to site homepage

**Usage:**
- For tier 1 articles linking to money site: `to_content_id = NULL`, `to_url = money_site_url`
- For tier 2+ linking to lower tiers: `to_content_id = lower_tier_article.id`, `to_url = NULL`
- For wheel/homepage links: `to_content_id = other_article.id`, `to_url = NULL`

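
To see how the `UNIQUE` and `CHECK` constraints behave for these usage patterns, the schema can be exercised against an in-memory SQLite database. This is a sketch that assumes SQLite (which the `AUTOINCREMENT` keyword suggests); the table is trimmed to the columns the constraints touch, and `try_insert` plus the one-column `generated_content` table are hypothetical scaffolding.

```python
import sqlite3

SCHEMA = """
CREATE TABLE generated_content (id INTEGER PRIMARY KEY);
CREATE TABLE article_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_content_id INTEGER NOT NULL,
    to_content_id INTEGER NULL,
    to_url TEXT NULL,
    link_type VARCHAR(20) NOT NULL,
    FOREIGN KEY (from_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    FOREIGN KEY (to_content_id) REFERENCES generated_content(id) ON DELETE CASCADE,
    UNIQUE (from_content_id, to_content_id, link_type),
    CHECK (to_content_id IS NOT NULL OR to_url IS NOT NULL)
);
"""


def try_insert(conn, from_id, to_id, to_url, link_type) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO article_links (from_content_id, to_content_id, to_url, link_type) "
            "VALUES (?, ?, ?, ?)",
            (from_id, to_id, to_url, link_type),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO generated_content (id) VALUES (?)", [(1,), (2,)])

results = {
    "internal_link": try_insert(conn, 2, 1, None, "tiered"),
    "duplicate_internal_link": try_insert(conn, 2, 1, None, "tiered"),
    "money_site_link": try_insert(conn, 1, None, "https://www.moneysite.com", "tiered"),
    "repeated_money_site_link": try_insert(conn, 1, None, "https://www.moneysite.com", "tiered"),
    "no_target_at_all": try_insert(conn, 1, None, None, "tiered"),
}
print(results)
```

The duplicate internal link and the row with no target are rejected. The repeated money-site link is accepted, because SQLite treats `NULL` values as distinct within a `UNIQUE` constraint; if duplicate external links need to be prevented, that would have to be handled separately (for example in the repository).
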
### ArticleLink Model
```python
from datetime import datetime
from typing import Optional

from sqlalchemy import DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.orm import Mapped, mapped_column

# Base is the project's existing declarative base in src/database/models.py


class ArticleLink(Base):
    __tablename__ = "article_links"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    from_content_id: Mapped[int] = mapped_column(
        Integer,
        ForeignKey('generated_content.id', ondelete='CASCADE'),
        nullable=False,
        index=True
    )
    to_content_id: Mapped[Optional[int]] = mapped_column(
        Integer,
        ForeignKey('generated_content.id', ondelete='CASCADE'),
        nullable=True,
        index=True
    )
    to_url: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    anchor_text: Mapped[Optional[str]] = mapped_column(Text, nullable=True)  # Added in Story 4.5
    link_type: Mapped[str] = mapped_column(String(20), nullable=False, index=True)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow, nullable=False)
```

### Project Model Extension
```python
# Add to Project model in src/database/models.py
class Project(Base):
    # ... existing fields ...
    money_site_url: Mapped[Optional[str]] = mapped_column(String(500), nullable=True, index=True)
```

```sql
-- Migration script to add money_site_url to projects table
ALTER TABLE projects ADD COLUMN money_site_url VARCHAR(500) NULL;
CREATE INDEX idx_projects_money_site_url ON projects(money_site_url);
```

### ArticleLink Repository Usage Examples
```python
# Story 3.3: Record wheel link
link_repo.create(
    from_content_id=article_a.id,
    to_content_id=article_b.id,
    to_url=None,
    anchor_text="Next Article",
    link_type="wheel_next"
)

# Story 4.2: Record tier 1 article linking to money site
link_repo.create(
    from_content_id=tier1_article.id,
    to_content_id=None,
    to_url="https://www.moneysite.com",
    anchor_text="expert services",  # Added in Story 4.5
    link_type="tiered"
)

# Story 4.2: Record tier 2 article linking to tier 1 article
link_repo.create(
    from_content_id=tier2_article.id,
    to_content_id=tier1_article.id,
    to_url=None,
    anchor_text="learn more",  # Added in Story 4.5
    link_type="tiered"
)

# Query all outbound links from an article
outbound_links = link_repo.get_by_source_article(article.id)

# Query all articles that link TO a specific article
inbound_links = link_repo.get_by_target_article(article.id)
```

### Job Configuration Example
```json
{
  "job_name": "Test Batch",
  "project_id": 2,
  "tiered_link_count_range": {
    "min": 3,
    "max": 5
  },
  "tiers": [
    {
      "tier": 2,
      "article_count": 20
    }
  ]
}
```

### Function Signature
```python
def find_tiered_links(
    content_records: List[GeneratedContent],
    job_config: JobConfig,
    project_repo: IProjectRepository,
    content_repo: IGeneratedContentRepository,
    site_repo: ISiteDeploymentRepository
) -> Dict:
    """
    Find tiered links for a batch of articles

    Args:
        content_records: Batch of articles (all same tier, same project)
        job_config: Job configuration with optional link count range
        project_repo: For retrieving money_site_url
        content_repo: For querying lower-tier articles
        site_repo: For URL generation

    Returns:
        Tier 1: {tier: 1, money_site_url: "https://..."}
        Tier 2+: {tier: N, lower_tier_urls: [...], lower_tier: N-1}

    Raises:
        ValueError: If batch is invalid or required data is missing
    """
    pass
```

### Implementation Example
```python
import logging
import random
from typing import Dict, List

from src.database.models import GeneratedContent
from src.generation.url_generator import generate_urls_for_batch

logger = logging.getLogger(__name__)


def find_tiered_links(content_records, job_config, project_repo, content_repo, site_repo) -> Dict:
    if not content_records:
        raise ValueError("content_records cannot be empty")

    tier = _validate_batch_tier(content_records)
    project_id = content_records[0].project_id

    logger.info(f"Finding tiered links for tier {tier} batch (project {project_id})")

    # Tier 1 links to the money site, so no lower-tier lookup is needed.
    if tier == 1:
        project = project_repo.get_by_id(project_id)
        if not project or not project.money_site_url:
            message = f"Cannot generate tier 1 batch: money_site_url not set in project {project_id}"
            logger.error(message)  # ERROR log required by the acceptance criteria
            raise ValueError(message)
        return {
            "tier": 1,
            "money_site_url": project.money_site_url
        }

    lower_tier = tier - 1
    logger.info(f"Batch is tier {tier}, querying tier {lower_tier} articles")

    lower_tier_articles = content_repo.get_by_project_and_tier(project_id, lower_tier)

    if not lower_tier_articles:
        message = (
            f"Cannot generate tier {tier} batch: no tier {lower_tier} articles "
            f"found in project {project_id}"
        )
        logger.error(message)  # ERROR log required by the acceptance criteria
        raise ValueError(message)

    # Assumes job_config supports dict-style .get(); falls back to the default range.
    link_range = job_config.get("tiered_link_count_range", {"min": 2, "max": 4})
    min_count = link_range["min"]
    max_count = link_range["max"]

    available_count = len(lower_tier_articles)
    desired_count = random.randint(min_count, max_count)  # inclusive on both ends

    if available_count < min_count:
        # Not enough articles to honour the minimum: warn and use everything.
        logger.warning(
            f"Only {available_count} tier {lower_tier} articles available, "
            f"requested min {min_count}. Using all available."
        )
        selected_articles = lower_tier_articles
    else:
        # Cap at what exists; random.sample picks without repeats.
        actual_count = min(desired_count, available_count)
        selected_articles = random.sample(lower_tier_articles, actual_count)

    logger.info(
        f"Selected {len(selected_articles)} random tier {lower_tier} URLs "
        f"from {available_count} available"
    )

    url_mappings = generate_urls_for_batch(selected_articles, site_repo)
    lower_tier_urls = [mapping["url"] for mapping in url_mappings]

    return {
        "tier": tier,
        "lower_tier": lower_tier,
        "lower_tier_urls": lower_tier_urls
    }


def _validate_batch_tier(content_records: List[GeneratedContent]) -> int:
    # Collect the distinct tiers; a valid batch has exactly one.
    tiers = {record.tier for record in content_records}
    if len(tiers) > 1:
        raise ValueError(f"All articles in batch must be same tier, found: {tiers}")
    return int(next(iter(tiers)))
```

### Database Queries Needed
```python
def get_by_project_and_tier(self, project_id: int, tier: int) -> List[GeneratedContent]:
    """
    Get all articles for a specific project and tier

    Returns articles that have site_deployment_id set (from Story 3.1)
    """
    return (
        self.session.query(GeneratedContent)
        .filter(
            GeneratedContent.project_id == project_id,
            GeneratedContent.tier == tier,
            GeneratedContent.site_deployment_id.isnot(None)
        )
        .all()
    )
```

### Return Value Examples
```python
# Tier 1 batch
{
    "tier": 1,
    "money_site_url": "https://www.mymoneysite.com"
}

# Tier 2 batch
{
    "tier": 2,
    "lower_tier": 1,
    "lower_tier_urls": [
        "https://site1.b-cdn.net/article-title-1.html",
        "https://www.customdomain.com/article-title-2.html",
        "https://site2.b-cdn.net/article-title-3.html"
    ]
}

# Tier 3 batch with custom range (8 links)
{
    "tier": 3,
    "lower_tier": 2,
    "lower_tier_urls": [
        "https://site3.b-cdn.net/...",
        "https://site4.b-cdn.net/...",
        # ... 6 more URLs
    ]
}
```

## Dependencies
- Story 3.1: Site assignment and URL generation must be complete
- Story 2.3: GeneratedContent records exist in database
- Story 1.x: Project and GeneratedContent tables exist

## Future Considerations
- Story 3.3 will use the tiered links found by this module for actual content injection
- Story 3.3 will populate article_links table with wheel and homepage link relationships
- Story 4.2 will use article_links table to log tiered link relationships after deployment
- Future: Intelligent link distribution (ensure even link spread across lower-tier articles)
- Future: Analytics dashboard showing link structure and tier relationships using article_links table

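
The "even link spread" idea could, for instance, prefer lower-tier articles that have received the fewest links so far. The sketch below is purely illustrative of that future direction and is not part of this story's scope: `pick_least_linked` and the `inbound_counts` counter are hypothetical, and a real version would presumably derive counts from the `article_links` table.

```python
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def pick_least_linked(candidates: Sequence[Hashable], inbound_counts: Counter,
                      k: int, rng: Optional[random.Random] = None) -> List[Hashable]:
    """Choose up to k candidates with the fewest inbound links, ties broken randomly."""
    rng = rng if rng is not None else random.Random()
    pool = list(candidates)
    rng.shuffle(pool)                            # randomise order among equal counts
    pool.sort(key=lambda c: inbound_counts[c])   # stable sort keeps the shuffled tie order
    chosen = pool[:k]
    inbound_counts.update(chosen)                # record the newly assigned links
    return chosen
```

Called repeatedly with the same counter, this spreads links so that no candidate gets a second link before every candidate has received its first.
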
## Link Relationship Tracking
This story creates the `article_links` table infrastructure. The actual population of link relationships will happen in:
- **Story 3.3**: Stores wheel and homepage links when injecting them into content
- **Story 4.2**: Stores tiered links when logging final URLs after deployment

The table also enables future analytics on link distribution, tier structure, and interlinking patterns.

## Total Effort
15 story points (the sum of the seven task estimates above)