Fixed NOT TESTED: now actually listens to # of links. Also makes See Also smaller.
parent
b168d33e2d
commit
083a8cacdd
|
|
@ -0,0 +1,247 @@
|
||||||
|
# Deploy-Batch Analysis for test_shaft_machining.json
|
||||||
|
|
||||||
|
## Quick Answers to Your Questions
|
||||||
|
|
||||||
|
### 1. What should the anchor text be at each level?
|
||||||
|
|
||||||
|
**Tier 1 Articles (5 articles):**
|
||||||
|
- **Money Site Links:** Uses `main_keyword` variations from project
|
||||||
|
- "shaft machining"
|
||||||
|
- "learn about shaft machining"
|
||||||
|
- "shaft machining guide"
|
||||||
|
- "best shaft machining"
|
||||||
|
- "shaft machining tips"
|
||||||
|
- System tries to find these phrases in content; picks first one that matches
|
||||||
|
|
||||||
|
- **Home Link:** Now in navigation menu (not injected into content)
|
||||||
|
|
||||||
|
- **See Also Links:** Uses article titles as anchor text
|
||||||
|
|
||||||
|
**Tier 2 Articles (20 articles):**
|
||||||
|
- **Lower Tier Links:** Uses `related_searches` from CORA data
|
||||||
|
- Depends on what related searches were in the shaft_machining.xlsx file
|
||||||
|
- If no related searches exist, falls back to main_keyword variations
|
||||||
|
|
||||||
|
- **Home Link:** Now in navigation menu (not injected into content)
|
||||||
|
|
||||||
|
- **See Also Links:** Uses article titles as anchor text
|
||||||
|
|
||||||
|
**Configuration:**
|
||||||
|
- Anchor text rules come from `master.config.json` → `interlinking.tier_anchor_text_rules`
|
||||||
|
- Can be overridden in job config with `anchor_text_config`
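
The "picks first one that matches" rule above can be sketched roughly as follows. This is a minimal illustration, not the project's actual matcher: the function name `pick_anchor_text` and the case-insensitive whole-word matching are assumptions made for the example.

```python
import re
from typing import List, Optional


def pick_anchor_text(content: str, candidates: List[str]) -> Optional[str]:
    """Return the first candidate phrase that occurs in the content, or None.

    Hypothetical helper: matching here is case-insensitive and whole-word,
    which may differ from the real implementation.
    """
    for phrase in candidates:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, content, flags=re.IGNORECASE):
            return phrase
    return None


# main_keyword variations listed for tier 1 above
variations = [
    "shaft machining",
    "learn about shaft machining",
    "shaft machining guide",
    "best shaft machining",
    "shaft machining tips",
]
article = "<p>Our shaft machining guide covers turning and grinding.</p>"
chosen = pick_anchor_text(article, variations)
print(chosen)
```

Because candidates are tried in order, a short phrase placed first in the list will usually win over longer variations that contain it.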
|
||||||
|
|
||||||
|
### 2. How many links should be in each article?
|
||||||
|
|
||||||
|
**Tier 1 Articles:**
|
||||||
|
- 1 link to money site (https://fzemanufacturing.com/capabilities/shaft-machining-services)
|
||||||
|
- 4 "See Also" links (to the other 4 tier1 articles)
|
||||||
|
- **Total: 5 links per tier1 article** (plus Home in nav menu)
|
||||||
|
|
||||||
|
**Tier 2 Articles:**
|
||||||
|
- 2-4 links to tier1 articles (random selection, count is `interlinking.links_per_article_min` to `max`)
|
||||||
|
- 4-5 "See Also" links (random selection from the other 19 tier2 articles; count is `interlinking.see_also_min` to `see_also_max`, default 4-5)
- **Total: 6-9 links per tier2 article** (plus Home in nav menu)
|
||||||
|
|
||||||
|
**Your JSON Configuration:**
|
||||||
|
```json
"interlinking": {
  "links_per_article_min": 2,
  "links_per_article_max": 4
}
```
|
||||||
|
This controls the tiered links (tier2 → tier1). Each tier2 article will get between 2-4 random tier1 articles to link to.
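
As a rough sketch of that selection, assuming a helper named `select_tier1_targets` (hypothetical) and placeholder `example.com` URLs: a count is drawn between the configured min and max, then that many distinct tier1 URLs are sampled. Capping the count at the pool size is an assumption added for safety.

```python
import random
from typing import List


def select_tier1_targets(
    tier1_urls: List[str],
    min_links: int,
    max_links: int,
    rng: random.Random,
) -> List[str]:
    """Pick a random number of distinct tier1 URLs for one tier2 article."""
    count = rng.randint(min_links, max_links)  # inclusive on both ends
    count = min(count, len(tier1_urls))        # assumption: never exceed the pool
    return rng.sample(tier1_urls, count)       # sampling without replacement


# Placeholder URLs, for illustration only
tier1_urls = [f"https://example.com/tier1-article-{i}.html" for i in range(5)]
targets = select_tier1_targets(tier1_urls, 2, 4, random.Random(0))
print(len(targets), targets)
```

Passing an explicit `random.Random` instance keeps the sketch reproducible in tests; the real code may use the module-level functions instead.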
|
||||||
|
|
||||||
|
### 3. Should "Home" be a link?
|
||||||
|
|
||||||
|
**YES** - Home is a link in the navigation menu at the top of every page.
|
||||||
|
|
||||||
|
**How it works:**
|
||||||
|
- The HTML template (`basic.html`) includes a `<nav>` menu with Home link
|
||||||
|
- Template line 113: `<li><a href="/index.html">Home</a></li>`
|
||||||
|
- This is part of the template wrapper, not injected into article content
|
||||||
|
|
||||||
|
**Old behavior (now removed):**
|
||||||
|
- Previously, system searched article content for "Home" and tried to link it
|
||||||
|
- This was redundant since Home is already in the nav menu
|
||||||
|
- Code has been updated to remove this injection
|
||||||
|
|
||||||
|
## Step-by-Step: What Happens During deploy-batch
|
||||||
|
|
||||||
|
### Step 1: Load Articles from Database
|
||||||
|
```
- Project 1 has generated content already
- Tier 1: 5 articles
- Tier 2: 20 articles
- Each article has: title, content (HTML), site_deployment_id
```
|
||||||
|
|
||||||
|
### Step 2: URL Generation (already done during generate-batch)
|
||||||
|
```
Tier 1 URLs (round-robin between getcnc.info and textbullseye.com):
- Article 0: https://getcnc.info/{slug}.html
- Article 1: https://www.textbullseye.com/{slug}.html
- Article 2: https://getcnc.info/{slug}.html
- Article 3: https://www.textbullseye.com/{slug}.html
- Article 4: https://getcnc.info/{slug}.html

Tier 2 URLs (round-robin):
- Articles 0-19 distributed across both domains
```
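
The round-robin distribution shown above amounts to indexing the target list modulo its length. A minimal sketch, assuming a hypothetical helper `assign_domains` (the real URL generator also builds slugs, which are omitted here):

```python
from typing import List


def assign_domains(article_count: int, domains: List[str]) -> List[str]:
    """Assign each article index to a domain in round-robin order."""
    if not domains:
        raise ValueError("at least one deployment target is required")
    return [domains[i % len(domains)] for i in range(article_count)]


domains = ["getcnc.info", "www.textbullseye.com"]
tier1_assignment = assign_domains(5, domains)
for index, domain in enumerate(tier1_assignment):
    # {slug} is left as a literal placeholder, as in the listing above
    print(f"Article {index}: https://{domain}/{{slug}}.html")
```

With two domains, even-numbered articles land on the first target and odd-numbered articles on the second, so 20 tier2 articles split 10/10.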
|
||||||
|
|
||||||
|
### Step 3: Tiered Links (already injected during generate-batch)
|
||||||
|
|
||||||
|
**For Tier 1:**
|
||||||
|
- Target: Money site URL from project database
|
||||||
|
- Anchor text: main_keyword variations
|
||||||
|
- Links already in `generated_content.content` HTML
|
||||||
|
|
||||||
|
**For Tier 2:**
|
||||||
|
- Target: Random selection of tier1 URLs (2-4 per article)
|
||||||
|
- Anchor text: related_searches from project
|
||||||
|
- Links already in HTML
|
||||||
|
|
||||||
|
### Step 4: Homepage Links
|
||||||
|
- Home link is in the navigation menu (template)
|
||||||
|
- No longer injected into article content
|
||||||
|
|
||||||
|
### Step 5: See Also Section (already injected)
|
||||||
|
- HTML section with links to other articles in same tier
|
||||||
|
|
||||||
|
### Step 6: Template Application (already done)
|
||||||
|
- HTML wrapped in template from `src/templating/templates/basic.html`
|
||||||
|
- Navigation menu added
|
||||||
|
- Stored in `generated_content.formatted_html`
|
||||||
|
|
||||||
|
### Step 7: Upload to Bunny.net
|
||||||
|
```
For each article:
1. Get site deployment credentials
2. Upload formatted_html to storage zone
3. File path: /{slug}.html
4. Log URL to deployment_logs/
5. Update database: deployed_url, status='deployed'

For each site's boilerplate pages:
1. Upload index.html (if exists)
2. Upload about.html
3. Upload contact.html
4. Upload privacy.html
```
|
||||||
|
|
||||||
|
## Database Link Tracking
|
||||||
|
|
||||||
|
All links are tracked in `article_links` table:
|
||||||
|
|
||||||
|
**Tier 1 Article Example (ID: 43):**
|
||||||
|
```
| from_content_id | to_content_id | to_url                           | anchor_text               | link_type      |
|-----------------|---------------|----------------------------------|---------------------------|----------------|
| 43              | NULL          | https://fzemanufacturing.com/... | "shaft machining"         | tiered         |
| 43              | 44            | NULL                             | "Understanding CNC..."    | wheel_see_also |
| 43              | 45            | NULL                             | "Advanced Shaft..."       | wheel_see_also |
| 43              | 46            | NULL                             | "Precision Machining..."  | wheel_see_also |
| 43              | 47            | NULL                             | "Modern Shaft..."         | wheel_see_also |
```
|
||||||
|
|
||||||
|
**Tier 2 Article Example (ID: 48):**
|
||||||
|
```
| from_content_id | to_content_id | to_url                                    | anchor_text                | link_type      |
|-----------------|---------------|-------------------------------------------|----------------------------|----------------|
| 48              | NULL          | https://getcnc.info/{slug1}.html          | "cnc machining services"   | tiered         |
| 48              | NULL          | https://www.textbullseye.com/{slug2}.html | "precision shaft work"     | tiered         |
| 48              | NULL          | https://getcnc.info/{slug3}.html          | "shaft turning operations" | tiered         |
| 48              | 49            | NULL                                      | "Tier 2 Article 2 Title"   | wheel_see_also |
| ...             | ...           | ...                                       | ...                        | ...            |
| 48              | 67            | NULL                                      | "Tier 2 Article 20 Title"  | wheel_see_also |
```
|
||||||
|
|
||||||
|
**Note:** Home link is no longer tracked in the database since it's in the template, not injected into content.
|
||||||
|
|
||||||
|
## Your Specific JSON File Analysis
|
||||||
|
|
||||||
|
```json
{
  "jobs": [
    {
      "project_id": 1,
      "deployment_targets": [
        "getcnc.info",
        "www.textbullseye.com"
      ],
      "tiers": {
        "tier1": {
          "count": 5,
          "min_word_count": 1500,
          "max_word_count": 2000,
          "models": {
            "title": "openai/gpt-4o-mini",
            "outline": "openai/gpt-4o-mini",
            "content": "anthropic/claude-3.5-sonnet"
          }
        },
        "tier2": {
          "count": 20,
          "models": {
            "title": "openai/gpt-4o-mini",
            "outline": "openai/gpt-4o-mini",
            "content": "openai/gpt-4o-mini"
          },
          "interlinking": {
            "links_per_article_min": 2,
            "links_per_article_max": 4
          }
        }
      }
    }
  ]
}
```
|
||||||
|
|
||||||
|
**What This Configuration Does:**
|
||||||
|
|
||||||
|
1. **Tier 1 (5 articles):**
|
||||||
|
- Uses Claude Sonnet for content, GPT-4o-mini for titles/outlines
|
||||||
|
- 1500-2000 words per article
|
||||||
|
- Distributed across getcnc.info and textbullseye.com
|
||||||
|
- Each links to: money site (1) + See Also (4) = 5 total links (plus Home in nav menu)
|
||||||
|
|
||||||
|
2. **Tier 2 (20 articles):**
|
||||||
|
- Uses GPT-4o-mini for everything (cheaper)
|
||||||
|
- Default word count (1100-1500)
|
||||||
|
- Each links to: 2-4 tier1 articles + See Also (4-5, randomly chosen from the other 19) = 6-9 total links (plus Home in nav menu)
|
||||||
|
- Distributed across both domains
|
||||||
|
|
||||||
|
3. **Missing Configurations (using defaults):**
|
||||||
|
- `tier1.interlinking`: Not specified → uses defaults (but tier1 always gets 1 money site link anyway)
|
||||||
|
- `anchor_text_config`: Not specified → uses master.config.json rules
|
||||||
|
|
||||||
|
## All JSON Fields That Affect Behavior
|
||||||
|
|
||||||
|
See `MASTER_JSON.json` for the complete reference. Key fields:
|
||||||
|
|
||||||
|
**Top-level job fields:**
|
||||||
|
- `project_id` - Which project's data to use
|
||||||
|
- `deployment_targets` - Which domains to deploy to
|
||||||
|
- `models` - Which AI models to use
|
||||||
|
- `tiered_link_count_range` - How many tiered links (job-level default)
|
||||||
|
- `anchor_text_config` - Override anchor text generation
|
||||||
|
- `interlinking` - Job-level interlinking defaults
|
||||||
|
|
||||||
|
**Tier-level fields:**
|
||||||
|
- `count` - Number of articles
|
||||||
|
- `min_word_count`, `max_word_count` - Content length
|
||||||
|
- `min_h2_tags`, `max_h2_tags`, `min_h3_tags`, `max_h3_tags` - Outline structure
|
||||||
|
- `models` - Tier-specific model overrides
|
||||||
|
- `interlinking` - Tier-specific interlinking overrides
|
||||||
|
|
||||||
|
**Fields in master.config.json:**
|
||||||
|
- `interlinking.tier_anchor_text_rules` - Defines anchor text sources per tier
|
||||||
|
- `interlinking.include_home_link` - Global default for Home links
|
||||||
|
- `interlinking.wheel_links` - Enable/disable See Also sections
|
||||||
|
|
||||||
|
**Fields in project database:**
|
||||||
|
- `main_keyword` - Used for tier1 anchor text
|
||||||
|
- `related_searches` - Used for tier2 anchor text
|
||||||
|
- `entities` - Used for tier3+ anchor text
|
||||||
|
- `money_site_url` - Destination for tier1 links
|
||||||
|
|
||||||
|
|
@ -0,0 +1,161 @@
|
||||||
|
# Job Configuration Field Reference
|
||||||
|
|
||||||
|
## Quick Field List
|
||||||
|
|
||||||
|
### Job Level (applies to all tiers)
|
||||||
|
```
project_id                - Required, integer
deployment_targets        - Array of domain strings
tier1_preferred_sites     - Array of domain strings (subset of deployment_targets)
auto_create_sites         - Boolean (NOT IMPLEMENTED - parsed but doesn't work)
create_sites_for_keywords - Array of {keyword, count} objects (NOT IMPLEMENTED - parsed but doesn't work)
models                    - {title, outline, content} with model strings
tiered_link_count_range   - {min, max} integers
anchor_text_config        - {mode, custom_text}
failure_config            - {max_consecutive_failures, skip_on_failure}
interlinking              - {links_per_article_min, links_per_article_max, see_also_min, see_also_max}
tiers                     - Required, object with tier1/tier2/tier3
```
|
||||||
|
|
||||||
|
### Tier Level (per tier configuration)
|
||||||
|
```
count          - Required, integer (number of articles)
min_word_count - Integer
max_word_count - Integer
min_h2_tags    - Integer
max_h2_tags    - Integer
min_h3_tags    - Integer
max_h3_tags    - Integer
models         - {title, outline, content} - overrides job-level
interlinking   - {links_per_article_min, links_per_article_max, see_also_min, see_also_max} - overrides job-level
```
|
||||||
|
|
||||||
|
## Field Behaviors
|
||||||
|
|
||||||
|
**deployment_targets**: Sites to deploy to (round-robin distribution)
|
||||||
|
|
||||||
|
**tier1_preferred_sites**: If set, tier1 only uses these sites
|
||||||
|
|
||||||
|
**models**: Use format "provider/model-name" (e.g., "openai/gpt-4o-mini")
|
||||||
|
|
||||||
|
**anchor_text_config**: Job-level only, applies to ALL tiers (no tier-specific option)
|
||||||
|
- "default" = Use master.config.json tier rules
|
||||||
|
- "override" = Replace with custom_text for all tiers
|
||||||
|
- "append" = Add custom_text to tier rules for all tiers
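
The three modes could be resolved along these lines. This is a sketch under assumptions: the function `resolve_anchor_texts` is hypothetical, and `custom_text` is treated as a list of strings, which the field list does not confirm.

```python
from typing import List, Optional


def resolve_anchor_texts(
    tier_rule_texts: List[str],
    mode: str = "default",
    custom_text: Optional[List[str]] = None,
) -> List[str]:
    """Combine tier-rule anchor texts with custom_text according to mode."""
    custom = list(custom_text or [])  # assumption: custom_text is a list of strings
    if mode == "default":
        return list(tier_rule_texts)           # master.config.json rules only
    if mode == "override":
        return custom                          # custom_text replaces the rules
    if mode == "append":
        return list(tier_rule_texts) + custom  # rules first, then custom_text
    raise ValueError(f"unknown anchor_text_config mode: {mode!r}")


rules = ["shaft machining", "shaft machining guide"]
print(resolve_anchor_texts(rules, "append", ["precision shafts"]))
```

Since the setting is job-level only, the same resolved list would apply to every tier in the job.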
|
||||||
|
|
||||||
|
**tiered_link_count_range**: How many links to lower tier
|
||||||
|
- Tier1: Always 1 link to money site (this setting ignored)
|
||||||
|
- Tier2+: Random between min and max links to lower tier
|
||||||
|
|
||||||
|
**interlinking.links_per_article_min/max**: Same as tiered_link_count_range
|
||||||
|
|
||||||
|
**interlinking.see_also_min/max**: How many See Also links (default 4-5)
|
||||||
|
- Randomly selects this many articles from same tier for See Also section
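
The selection rule can be illustrated in isolation. The sketch below mirrors the `min(random.randint(...), len(other_articles))` logic from `_inject_see_also_section` in this commit; the standalone function `select_see_also` and the sample IDs are for illustration only.

```python
import random
from typing import Dict, List, Optional


def select_see_also(
    other_articles: List[Dict],
    see_also_min: int = 4,
    see_also_max: int = 5,
    rng: Optional[random.Random] = None,
) -> List[Dict]:
    """Pick a random subset of same-tier articles for the See Also section."""
    rng = rng or random.Random()
    # Never ask for more articles than are available in the tier
    count = min(rng.randint(see_also_min, see_also_max), len(other_articles))
    return rng.sample(other_articles, count)


tier1_others = [{"content_id": i} for i in range(44, 48)]  # 4 other tier1 articles
tier2_others = [{"content_id": i} for i in range(49, 68)]  # 19 other tier2 articles
tier1_pick = select_see_also(tier1_others, rng=random.Random(1))
tier2_pick = select_see_also(tier2_others, rng=random.Random(1))
print(len(tier1_pick), len(tier2_pick))
```

With the defaults, a tier of 5 articles always yields 4 See Also links (the cap applies), while a tier of 20 yields 4 or 5.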
|
||||||
|
|
||||||
|
## Defaults
|
||||||
|
|
||||||
|
If not specified, these defaults apply:
|
||||||
|
|
||||||
|
### Tier1 Defaults
|
||||||
|
```json
{
  "min_word_count": 2000,
  "max_word_count": 2500,
  "min_h2_tags": 3,
  "max_h2_tags": 5,
  "min_h3_tags": 5,
  "max_h3_tags": 10
}
```
|
||||||
|
|
||||||
|
### Tier2 Defaults
|
||||||
|
```json
{
  "min_word_count": 1100,
  "max_word_count": 1500,
  "min_h2_tags": 2,
  "max_h2_tags": 4,
  "min_h3_tags": 3,
  "max_h3_tags": 8
}
```
|
||||||
|
|
||||||
|
### Tier3 Defaults
|
||||||
|
```json
{
  "min_word_count": 850,
  "max_word_count": 1350,
  "min_h2_tags": 2,
  "max_h2_tags": 3,
  "min_h3_tags": 2,
  "max_h3_tags": 6
}
```
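
One way to picture how these defaults interact with a job file is a plain dictionary merge in which explicitly configured keys win. This is only a sketch: `TIER_DEFAULTS` and `resolve_tier_settings` are hypothetical names, and the real parser may apply defaults differently.

```python
from typing import Any, Dict

# Values copied from the default blocks above
TIER_DEFAULTS: Dict[str, Dict[str, int]] = {
    "tier1": {"min_word_count": 2000, "max_word_count": 2500,
              "min_h2_tags": 3, "max_h2_tags": 5,
              "min_h3_tags": 5, "max_h3_tags": 10},
    "tier2": {"min_word_count": 1100, "max_word_count": 1500,
              "min_h2_tags": 2, "max_h2_tags": 4,
              "min_h3_tags": 3, "max_h3_tags": 8},
    "tier3": {"min_word_count": 850, "max_word_count": 1350,
              "min_h2_tags": 2, "max_h2_tags": 3,
              "min_h3_tags": 2, "max_h3_tags": 6},
}


def resolve_tier_settings(tier_name: str, tier_config: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay the tier's explicit settings on top of that tier's defaults."""
    defaults = TIER_DEFAULTS.get(tier_name, {})
    return {**defaults, **tier_config}  # later keys override earlier ones


tier1_settings = resolve_tier_settings(
    "tier1", {"count": 5, "min_word_count": 1500, "max_word_count": 2000}
)
tier2_settings = resolve_tier_settings("tier2", {"count": 20})
print(tier1_settings["min_word_count"], tier2_settings["min_word_count"])
```

Under this reading, a tier that sets only `count` inherits every word-count and heading default for its tier.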
|
||||||
|
|
||||||
|
## Minimal Working Example
|
||||||
|
|
||||||
|
```json
{
  "jobs": [{
    "project_id": 1,
    "deployment_targets": ["example.com"],
    "tiers": {
      "tier1": {"count": 5},
      "tier2": {"count": 20}
    }
  }]
}
```
|
||||||
|
|
||||||
|
## Your Current Example
|
||||||
|
|
||||||
|
```json
{
  "jobs": [{
    "project_id": 1,
    "deployment_targets": ["getcnc.info", "www.textbullseye.com"],
    "tiers": {
      "tier1": {
        "count": 5,
        "min_word_count": 1500,
        "max_word_count": 2000,
        "models": {
          "title": "openai/gpt-4o-mini",
          "outline": "openai/gpt-4o-mini",
          "content": "anthropic/claude-3.5-sonnet"
        }
      },
      "tier2": {
        "count": 20,
        "models": {
          "title": "openai/gpt-4o-mini",
          "outline": "openai/gpt-4o-mini",
          "content": "openai/gpt-4o-mini"
        },
        "interlinking": {
          "links_per_article_min": 2,
          "links_per_article_max": 4
        }
      }
    }
  }]
}
```
|
||||||
|
|
||||||
|
## Result Behavior
|
||||||
|
|
||||||
|
**Tier 1 Articles (5):**
|
||||||
|
- 1 link to money site
|
||||||
|
- 4 See Also links to other tier1 articles
|
||||||
|
- Home link in nav menu
|
||||||
|
|
||||||
|
**Tier 2 Articles (20):**
|
||||||
|
- 2-4 links to random tier1 articles
|
||||||
|
- 4-5 See Also links to randomly selected tier2 articles (default `see_also_min`/`see_also_max`)
|
||||||
|
- Home link in nav menu
|
||||||
|
|
||||||
|
**Anchor Text:**
|
||||||
|
- Tier1: Uses main_keyword from project
|
||||||
|
- Tier2: Uses related_searches from project
|
||||||
|
- Can override with anchor_text_config
|
||||||
|
|
||||||
|
|
@ -0,0 +1,34 @@
|
||||||
|
import sqlite3

conn = sqlite3.connect('content_automation.db')
cursor = conn.cursor()

# Get first tier2 article. Values are bound as parameters: a double-quoted
# "tier2" is an identifier in standard SQL and only works in SQLite by accident.
cursor.execute(
    'SELECT id FROM generated_content WHERE project_id=? AND tier=? LIMIT 1',
    (1, 'tier2'),
)
first_row = cursor.fetchone()
if first_row is None:
    conn.close()
    raise SystemExit('No tier2 articles found for project 1')
tier2_id = first_row[0]

# Count links by type
cursor.execute('''
    SELECT link_type, COUNT(*)
    FROM article_links
    WHERE from_content_id=?
    GROUP BY link_type
''', (tier2_id,))

print(f'Tier2 article {tier2_id} link counts:')
for row in cursor.fetchall():
    print(f'  {row[0]}: {row[1]}')

# Show the actual tiered links
cursor.execute('''
    SELECT to_url, anchor_text
    FROM article_links
    WHERE from_content_id=? AND link_type='tiered'
''', (tier2_id,))

print(f'\nTiered links for article {tier2_id}:')
for i, row in enumerate(cursor.fetchall(), 1):
    print(f'  {i}. {row[1]} -> {row[0][:60]}...')

conn.close()
|
||||||
|
|
||||||
|
|
@ -365,6 +365,17 @@ class BatchProcessor:
|
||||||
click.echo(f" {tier_name}: No articles with site assignments to post-process")
|
click.echo(f" {tier_name}: No articles with site assignments to post-process")
|
||||||
return
|
return
|
||||||
|
|
||||||
|
# Skip articles already post-processed (idempotency check)
|
||||||
|
unprocessed = [a for a in content_records if not a.formatted_html]
|
||||||
|
|
||||||
|
if not unprocessed:
|
||||||
|
click.echo(f" {tier_name}: All {len(content_records)} articles already post-processed, skipping")
|
||||||
|
return
|
||||||
|
|
||||||
|
if len(unprocessed) < len(content_records):
|
||||||
|
click.echo(f" {tier_name}: Skipping {len(content_records) - len(unprocessed)} already processed articles")
|
||||||
|
|
||||||
|
content_records = unprocessed
|
||||||
click.echo(f" {tier_name}: Post-processing {len(content_records)} articles...")
|
click.echo(f" {tier_name}: Post-processing {len(content_records)} articles...")
|
||||||
|
|
||||||
# Step 1: Generate URLs (Story 3.1)
|
# Step 1: Generate URLs (Story 3.1)
|
||||||
|
|
|
||||||
|
|
@ -63,6 +63,8 @@ class InterlinkingConfig:
|
||||||
links_per_article_min: int = 2
|
links_per_article_min: int = 2
|
||||||
links_per_article_max: int = 4
|
links_per_article_max: int = 4
|
||||||
include_home_link: bool = True
|
include_home_link: bool = True
|
||||||
|
see_also_min: int = 4
|
||||||
|
see_also_max: int = 5
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
|
|
@ -265,16 +267,24 @@ class JobConfig:
|
||||||
min_links = interlinking_data.get("links_per_article_min", 2)
|
min_links = interlinking_data.get("links_per_article_min", 2)
|
||||||
max_links = interlinking_data.get("links_per_article_max", 4)
|
max_links = interlinking_data.get("links_per_article_max", 4)
|
||||||
include_home = interlinking_data.get("include_home_link", True)
|
include_home = interlinking_data.get("include_home_link", True)
|
||||||
|
see_also_min = interlinking_data.get("see_also_min", 4)
|
||||||
|
see_also_max = interlinking_data.get("see_also_max", 5)
|
||||||
if not isinstance(min_links, int) or min_links < 0:
|
if not isinstance(min_links, int) or min_links < 0:
|
||||||
raise ValueError("'interlinking' links_per_article_min must be a non-negative integer")
|
raise ValueError("'interlinking' links_per_article_min must be a non-negative integer")
|
||||||
if not isinstance(max_links, int) or max_links < min_links:
|
if not isinstance(max_links, int) or max_links < min_links:
|
||||||
raise ValueError("'interlinking' links_per_article_max must be >= links_per_article_min")
|
raise ValueError("'interlinking' links_per_article_max must be >= links_per_article_min")
|
||||||
if not isinstance(include_home, bool):
|
if not isinstance(include_home, bool):
|
||||||
raise ValueError("'interlinking' include_home_link must be a boolean")
|
raise ValueError("'interlinking' include_home_link must be a boolean")
|
||||||
|
if not isinstance(see_also_min, int) or see_also_min < 0:
|
||||||
|
raise ValueError("'interlinking' see_also_min must be a non-negative integer")
|
||||||
|
if not isinstance(see_also_max, int) or see_also_max < see_also_min:
|
||||||
|
raise ValueError("'interlinking' see_also_max must be >= see_also_min")
|
||||||
interlinking = InterlinkingConfig(
|
interlinking = InterlinkingConfig(
|
||||||
links_per_article_min=min_links,
|
links_per_article_min=min_links,
|
||||||
links_per_article_max=max_links,
|
links_per_article_max=max_links,
|
||||||
include_home_link=include_home
|
include_home_link=include_home,
|
||||||
|
see_also_min=see_also_min,
|
||||||
|
see_also_max=see_also_max
|
||||||
)
|
)
|
||||||
|
|
||||||
return Job(
|
return Job(
|
||||||
|
|
|
||||||
|
|
@ -64,14 +64,11 @@ def inject_interlinks(
|
||||||
html, content, tiered_links, project, job_config, link_repo
|
html, content, tiered_links, project, job_config, link_repo
|
||||||
)
|
)
|
||||||
|
|
||||||
# Inject homepage link
|
# Note: Home link is now in the navigation menu (template), no need to inject into content
|
||||||
html = _inject_homepage_link(
|
|
||||||
html, content, article_url, project, link_repo
|
|
||||||
)
|
|
||||||
|
|
||||||
# Inject See Also section
|
# Inject See Also section
|
||||||
html = _inject_see_also_section(
|
html = _inject_see_also_section(
|
||||||
html, content, article_urls, link_repo
|
html, content, article_urls, link_repo, job_config
|
||||||
)
|
)
|
||||||
|
|
||||||
# Update content in database
|
# Update content in database
|
||||||
|
|
@ -199,9 +196,10 @@ def _inject_see_also_section(
|
||||||
html: str,
|
html: str,
|
||||||
content: GeneratedContent,
|
content: GeneratedContent,
|
||||||
article_urls: List[Dict],
|
article_urls: List[Dict],
|
||||||
link_repo: ArticleLinkRepository
|
link_repo: ArticleLinkRepository,
|
||||||
|
job_config=None
|
||||||
) -> str:
|
) -> str:
|
||||||
"""Inject See Also section with all other batch articles"""
|
"""Inject See Also section with random selection of batch articles"""
|
||||||
# Get all other articles (excluding current)
|
# Get all other articles (excluding current)
|
||||||
other_articles = [a for a in article_urls if a['content_id'] != content.id]
|
other_articles = [a for a in article_urls if a['content_id'] != content.id]
|
||||||
|
|
||||||
|
|
@ -209,9 +207,18 @@ def _inject_see_also_section(
|
||||||
logger.info(f"No other articles for See Also section in content {content.id}")
|
logger.info(f"No other articles for See Also section in content {content.id}")
|
||||||
return html
|
return html
|
||||||
|
|
||||||
|
# Get See Also link count (default 4-5)
|
||||||
|
see_also_config = _get_see_also_config(job_config)
|
||||||
|
min_links = see_also_config['min']
|
||||||
|
max_links = see_also_config['max']
|
||||||
|
|
||||||
|
# Select random articles
|
||||||
|
count = min(random.randint(min_links, max_links), len(other_articles))
|
||||||
|
selected_articles = random.sample(other_articles, count)
|
||||||
|
|
||||||
# Build See Also HTML
|
# Build See Also HTML
|
||||||
see_also_html = "<h3>See Also</h3>\n<ul>\n"
|
see_also_html = "<h3>See Also</h3>\n<ul>\n"
|
||||||
for article in other_articles:
|
for article in selected_articles:
|
||||||
see_also_html += f' <li><a href="{article["url"]}">{article["title"]}</a></li>\n'
|
see_also_html += f' <li><a href="{article["url"]}">{article["title"]}</a></li>\n'
|
||||||
see_also_html += "</ul>\n"
|
see_also_html += "</ul>\n"
|
||||||
|
|
||||||
|
|
@ -219,7 +226,7 @@ def _inject_see_also_section(
|
||||||
html = _insert_before_closing_tags(html, see_also_html)
|
html = _insert_before_closing_tags(html, see_also_html)
|
||||||
|
|
||||||
# Record links
|
# Record links
|
||||||
for article in other_articles:
|
for article in selected_articles:
|
||||||
link_repo.create(
|
link_repo.create(
|
||||||
from_content_id=content.id,
|
from_content_id=content.id,
|
||||||
to_content_id=article['content_id'],
|
to_content_id=article['content_id'],
|
||||||
|
|
@ -228,10 +235,41 @@ def _inject_see_also_section(
|
||||||
link_type="wheel_see_also"
|
link_type="wheel_see_also"
|
||||||
)
|
)
|
||||||
|
|
||||||
logger.info(f"Injected See Also section with {len(other_articles)} links for content {content.id}")
|
logger.info(f"Injected See Also section with {len(selected_articles)} links for content {content.id}")
|
||||||
return html
|
return html
|
||||||
|
|
||||||
|
|
||||||
|
def _get_see_also_config(job_config) -> Dict[str, int]:
|
||||||
|
"""Get See Also link count config, default 4-5"""
|
||||||
|
default_config = {"min": 4, "max": 5}
|
||||||
|
|
||||||
|
if job_config is None:
|
||||||
|
return default_config
|
||||||
|
|
||||||
|
# Check for see_also_min/max in interlinking config
|
||||||
|
interlinking = None
|
||||||
|
if hasattr(job_config, 'interlinking'):
|
||||||
|
interlinking = job_config.interlinking
|
||||||
|
elif isinstance(job_config, dict):
|
||||||
|
interlinking = job_config.get('interlinking')
|
||||||
|
|
||||||
|
if not interlinking:
|
||||||
|
return default_config
|
||||||
|
|
||||||
|
# Get min/max from interlinking config
|
||||||
|
if isinstance(interlinking, dict):
|
||||||
|
min_val = interlinking.get('see_also_min')
|
||||||
|
max_val = interlinking.get('see_also_max')
|
||||||
|
else:
|
||||||
|
min_val = getattr(interlinking, 'see_also_min', None)
|
||||||
|
max_val = getattr(interlinking, 'see_also_max', None)
|
||||||
|
|
||||||
|
if min_val is not None and max_val is not None:
|
||||||
|
return {"min": min_val, "max": max_val}
|
||||||
|
|
||||||
|
return default_config
|
||||||
|
|
||||||
|
|
||||||
def _get_anchor_texts_for_tier(
|
def _get_anchor_texts_for_tier(
|
||||||
tier: str,
|
tier: str,
|
||||||
project: Project,
|
project: Project,
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,34 @@
|
||||||
|
from src.database.session import db_manager
from src.database.repositories import GeneratedContentRepository, ProjectRepository, SiteDeploymentRepository
from src.interlinking.tiered_links import find_tiered_links
from src.generation.job_config import JobConfig

session = db_manager.get_session()
content_repo = GeneratedContentRepository(session)
project_repo = ProjectRepository(session)
site_repo = SiteDeploymentRepository(session)

# Get tier2 articles
tier2_articles = content_repo.get_by_project_and_tier(1, "tier2")
print(f"Found {len(tier2_articles)} tier2 articles")

# Load job config
job_config = JobConfig("jobs/test_shaft_machining.json")
job = job_config.get_jobs()[0]

print(f"\nJob config:")
print(f"  tiered_link_count_range: {job.tiered_link_count_range}")
print(f"  interlinking: {job.interlinking}")

# Test the function
print(f"\nCalling find_tiered_links()...")
result = find_tiered_links(tier2_articles, job, project_repo, content_repo, site_repo)

print(f"\nResult:")
print(f"  tier: {result.get('tier')}")
print(f"  lower_tier: {result.get('lower_tier')}")
print(f"  Number of URLs selected: {len(result.get('lower_tier_urls', []))}")
print(f"  URLs: {result.get('lower_tier_urls', [])}")

session.close()
|
||||||
|
|
||||||