Fixed (NOT TESTED): tiered links now actually respect the configured number of links. Also makes the See Also section smaller.
parent b168d33e2d
commit 083a8cacdd
|
|
@ -0,0 +1,247 @@
|
|||
# Deploy-Batch Analysis for test_shaft_machining.json
|
||||
|
||||
## Quick Answers to Your Questions
|
||||
|
||||
### 1. What should the anchor text be at each level?
|
||||
|
||||
**Tier 1 Articles (5 articles):**
|
||||
- **Money Site Links:** Uses `main_keyword` variations from project
|
||||
- "shaft machining"
|
||||
- "learn about shaft machining"
|
||||
- "shaft machining guide"
|
||||
- "best shaft machining"
|
||||
- "shaft machining tips"
|
||||
- System tries to find these phrases in content; picks first one that matches
|
||||
|
||||
- **Home Link:** Now in navigation menu (not injected into content)
|
||||
|
||||
- **See Also Links:** Uses article titles as anchor text
|
||||
|
||||
**Tier 2 Articles (20 articles):**
|
||||
- **Lower Tier Links:** Uses `related_searches` from CORA data
|
||||
- Depends on what related searches were in the shaft_machining.xlsx file
|
||||
- If no related searches exist, falls back to main_keyword variations
|
||||
|
||||
- **Home Link:** Now in navigation menu (not injected into content)
|
||||
|
||||
- **See Also Links:** Uses article titles as anchor text
|
||||
|
||||
**Configuration:**
|
||||
- Anchor text rules come from `master.config.json` → `interlinking.tier_anchor_text_rules`
|
||||
- Can be overridden in job config with `anchor_text_config`
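
The anchor-text rule described above (try each candidate phrase, take the first one found in the content, and fall back from related searches to `main_keyword` variations) could be sketched roughly as follows. The function names, the variation templates, and the case-insensitive matching are assumptions made for illustration; they are not taken from the project's code.

```python
import re
from typing import List, Optional


def build_keyword_variations(main_keyword: str) -> List[str]:
    """Hypothetical helper: reproduces the variation list shown above."""
    return [
        main_keyword,
        f"learn about {main_keyword}",
        f"{main_keyword} guide",
        f"best {main_keyword}",
        f"{main_keyword} tips",
    ]


def candidates_for_tier(
    tier: str, main_keyword: str, related_searches: List[str]
) -> List[str]:
    """Tier 2 prefers related searches; everything else uses keyword variations."""
    if tier == "tier2" and related_searches:
        return related_searches
    # Tier 1, or tier 2 with no related searches: fall back to variations.
    return build_keyword_variations(main_keyword)


def pick_anchor_text(content: str, candidates: List[str]) -> Optional[str]:
    """Return the first candidate phrase that occurs in the content, else None."""
    for phrase in candidates:
        # Assumption: matching ignores case.
        if re.search(re.escape(phrase), content, flags=re.IGNORECASE):
            return phrase
    return None


html = "<p>Our shaft machining guide covers turning and grinding.</p>"
anchor = pick_anchor_text(html, candidates_for_tier("tier1", "shaft machining", []))
print(anchor)
```

Because the first match wins, the bare keyword is chosen here even though the longer phrase "shaft machining guide" also appears in the content.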
|
||||
|
||||
### 2. How many links should be in each article?
|
||||
|
||||
**Tier 1 Articles:**
|
||||
- 1 link to money site (https://fzemanufacturing.com/capabilities/shaft-machining-services)
|
||||
- 4 "See Also" links (to the other 4 tier1 articles)
|
||||
- **Total: 5 links per tier1 article** (plus Home in nav menu)
|
||||
|
||||
**Tier 2 Articles:**
|
||||
- 2-4 links to tier1 articles (random selection, count is `interlinking.links_per_article_min` to `max`)
|
||||
- 4-5 "See Also" links (random selection from the other 19 tier2 articles; count is `interlinking.see_also_min` to `see_also_max`, default 4-5)
- **Total: 6-9 links per tier2 article** (plus Home in nav menu)
|
||||
|
||||
**Your JSON Configuration:**
|
||||
```json
|
||||
"interlinking": {
|
||||
"links_per_article_min": 2,
|
||||
"links_per_article_max": 4
|
||||
}
|
||||
```
|
||||
This controls the tiered links (tier2 → tier1). Each tier2 article will get between 2-4 random tier1 articles to link to.
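
As a rough illustration of how a min/max pair like this can drive the selection, the sketch below draws a random count in the configured range and samples that many distinct tier1 URLs. The function name `select_tier1_targets`, the `example.com` URLs, and the cap at the number of available articles are assumptions, not the project's actual implementation.

```python
import random
from typing import List, Optional


def select_tier1_targets(
    tier1_urls: List[str],
    min_links: int = 2,
    max_links: int = 4,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick a random number of distinct tier1 URLs between min_links and max_links."""
    rng = rng or random.Random()
    # Assumption: never ask for more URLs than exist in the pool.
    count = min(rng.randint(min_links, max_links), len(tier1_urls))
    return rng.sample(tier1_urls, count)


# Hypothetical tier1 URLs, used only for this example.
tier1_urls = [f"https://example.com/tier1-article-{i}.html" for i in range(5)]
targets = select_tier1_targets(tier1_urls, min_links=2, max_links=4)
print(len(targets), targets)
```

`random.sample` returns distinct items, so under this sketch a tier2 article never links to the same tier1 article twice.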
|
||||
|
||||
### 3. Should "Home" be a link?
|
||||
|
||||
**YES** - Home is a link in the navigation menu at the top of every page.
|
||||
|
||||
**How it works:**
|
||||
- The HTML template (`basic.html`) includes a `<nav>` menu with Home link
|
||||
- Template line 113: `<li><a href="/index.html">Home</a></li>`
|
||||
- This is part of the template wrapper, not injected into article content
|
||||
|
||||
**Old behavior (now removed):**
|
||||
- Previously, system searched article content for "Home" and tried to link it
|
||||
- This was redundant since Home is already in the nav menu
|
||||
- Code has been updated to remove this injection
|
||||
|
||||
## Step-by-Step: What Happens During deploy-batch
|
||||
|
||||
### Step 1: Load Articles from Database
|
||||
```
|
||||
- Project 1 has generated content already
|
||||
- Tier 1: 5 articles
|
||||
- Tier 2: 20 articles
|
||||
- Each article has: title, content (HTML), site_deployment_id
|
||||
```
|
||||
|
||||
### Step 2: URL Generation (already done during generate-batch)
|
||||
```
|
||||
Tier 1 URLs (round-robin between getcnc.info and textbullseye.com):
|
||||
- Article 0: https://getcnc.info/{slug}.html
|
||||
- Article 1: https://www.textbullseye.com/{slug}.html
|
||||
- Article 2: https://getcnc.info/{slug}.html
|
||||
- Article 3: https://www.textbullseye.com/{slug}.html
|
||||
- Article 4: https://getcnc.info/{slug}.html
|
||||
|
||||
Tier 2 URLs (round-robin):
|
||||
- Articles 0-19 distributed across both domains
|
||||
```
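
The round-robin assignment listed above can be sketched in a few lines. The function name `assign_urls_round_robin` and the slugs are hypothetical; only the two domains and the `https://{domain}/{slug}.html` pattern come from the listing.

```python
from typing import List


def assign_urls_round_robin(slugs: List[str], domains: List[str]) -> List[str]:
    """Assign each article to a domain in turn, cycling through the domain list."""
    return [
        f"https://{domains[i % len(domains)]}/{slug}.html"
        for i, slug in enumerate(slugs)
    ]


domains = ["getcnc.info", "www.textbullseye.com"]
slugs = [f"article-{i}" for i in range(5)]  # placeholder slugs
urls = assign_urls_round_robin(slugs, domains)
for url in urls:
    print(url)
```

With two domains and five articles, articles 0, 2 and 4 land on the first domain and articles 1 and 3 on the second, matching the Tier 1 listing.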
|
||||
|
||||
### Step 3: Tiered Links (already injected during generate-batch)
|
||||
|
||||
**For Tier 1:**
|
||||
- Target: Money site URL from project database
|
||||
- Anchor text: main_keyword variations
|
||||
- Links already in `generated_content.content` HTML
|
||||
|
||||
**For Tier 2:**
|
||||
- Target: Random selection of tier1 URLs (2-4 per article)
|
||||
- Anchor text: related_searches from project
|
||||
- Links already in HTML
|
||||
|
||||
### Step 4: Homepage Links
|
||||
- Home link is in the navigation menu (template)
|
||||
- No longer injected into article content
|
||||
|
||||
### Step 5: See Also Section (already injected)
|
||||
- HTML section with links to other articles in same tier
|
||||
|
||||
### Step 6: Template Application (already done)
|
||||
- HTML wrapped in template from `src/templating/templates/basic.html`
|
||||
- Navigation menu added
|
||||
- Stored in `generated_content.formatted_html`
|
||||
|
||||
### Step 7: Upload to Bunny.net
|
||||
```
|
||||
For each article:
|
||||
1. Get site deployment credentials
|
||||
2. Upload formatted_html to storage zone
|
||||
3. File path: /{slug}.html
|
||||
4. Log URL to deployment_logs/
|
||||
5. Update database: deployed_url, status='deployed'
|
||||
|
||||
For each site's boilerplate pages:
|
||||
1. Upload index.html (if exists)
|
||||
2. Upload about.html
|
||||
3. Upload contact.html
|
||||
4. Upload privacy.html
|
||||
```
|
||||
|
||||
## Database Link Tracking
|
||||
|
||||
All links are tracked in `article_links` table:
|
||||
|
||||
**Tier 1 Article Example (ID: 43):**
|
||||
```
|
||||
| from_content_id | to_content_id | to_url | anchor_text | link_type |
|
||||
|-----------------|---------------|--------|-------------|-----------|
|
||||
| 43 | NULL | https://fzemanufacturing.com/... | "shaft machining" | tiered |
|
||||
| 43 | 44 | NULL | "Understanding CNC..." | wheel_see_also |
|
||||
| 43 | 45 | NULL | "Advanced Shaft..." | wheel_see_also |
|
||||
| 43 | 46 | NULL | "Precision Machining..." | wheel_see_also |
|
||||
| 43 | 47 | NULL | "Modern Shaft..." | wheel_see_also |
|
||||
```
|
||||
|
||||
**Tier 2 Article Example (ID: 48):**
|
||||
```
|
||||
| from_content_id | to_content_id | to_url | anchor_text | link_type |
|
||||
|-----------------|---------------|--------|-------------|-----------|
|
||||
| 48 | NULL | https://getcnc.info/{slug1}.html | "cnc machining services" | tiered |
|
||||
| 48 | NULL | https://www.textbullseye.com/{slug2}.html | "precision shaft work" | tiered |
|
||||
| 48 | NULL | https://getcnc.info/{slug3}.html | "shaft turning operations" | tiered |
|
||||
| 48 | 49 | NULL | "Tier 2 Article 2 Title" | wheel_see_also |
|
||||
| ... | ... | ... | ... | ... |
|
||||
| 48 | 67 | NULL | "Tier 2 Article 20 Title" | wheel_see_also |
|
||||
```
|
||||
|
||||
**Note:** Home link is no longer tracked in the database since it's in the template, not injected into content.
|
||||
|
||||
## Your Specific JSON File Analysis
|
||||
|
||||
```json
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"project_id": 1,
|
||||
"deployment_targets": [
|
||||
"getcnc.info",
|
||||
"www.textbullseye.com"
|
||||
],
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5,
|
||||
"min_word_count": 1500,
|
||||
"max_word_count": 2000,
|
||||
"models": {
|
||||
"title": "openai/gpt-4o-mini",
|
||||
"outline": "openai/gpt-4o-mini",
|
||||
"content": "anthropic/claude-3.5-sonnet"
|
||||
}
|
||||
},
|
||||
"tier2": {
|
||||
"count": 20,
|
||||
"models": {
|
||||
"title": "openai/gpt-4o-mini",
|
||||
"outline": "openai/gpt-4o-mini",
|
||||
"content": "openai/gpt-4o-mini"
|
||||
},
|
||||
"interlinking": {
|
||||
"links_per_article_min": 2,
|
||||
"links_per_article_max": 4
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**What This Configuration Does:**
|
||||
|
||||
1. **Tier 1 (5 articles):**
|
||||
- Uses Claude Sonnet for content, GPT-4o-mini for titles/outlines
|
||||
- 1500-2000 words per article
|
||||
- Distributed across getcnc.info and textbullseye.com
|
||||
- Each links to: money site (1) + See Also (4) = 5 total links (plus Home in nav menu)
|
||||
|
||||
2. **Tier 2 (20 articles):**
|
||||
- Uses GPT-4o-mini for everything (cheaper)
|
||||
- Default word count (1100-1500)
|
||||
- Each links to: 2-4 tier1 articles + See Also (4-5, default `see_also_min`/`see_also_max`) = 6-9 total links (plus Home in nav menu)
|
||||
- Distributed across both domains
|
||||
|
||||
3. **Missing Configurations (using defaults):**
|
||||
- `tier1.interlinking`: Not specified → uses defaults (but tier1 always gets 1 money site link anyway)
|
||||
- `anchor_text_config`: Not specified → uses master.config.json rules
|
||||
|
||||
## All JSON Fields That Affect Behavior
|
||||
|
||||
See `MASTER_JSON.json` for the complete reference. Key fields:
|
||||
|
||||
**Top-level job fields:**
|
||||
- `project_id` - Which project's data to use
|
||||
- `deployment_targets` - Which domains to deploy to
|
||||
- `models` - Which AI models to use
|
||||
- `tiered_link_count_range` - How many tiered links (job-level default)
|
||||
- `anchor_text_config` - Override anchor text generation
|
||||
- `interlinking` - Job-level interlinking defaults
|
||||
|
||||
**Tier-level fields:**
|
||||
- `count` - Number of articles
|
||||
- `min_word_count`, `max_word_count` - Content length
|
||||
- `min_h2_tags`, `max_h2_tags`, `min_h3_tags`, `max_h3_tags` - Outline structure
|
||||
- `models` - Tier-specific model overrides
|
||||
- `interlinking` - Tier-specific interlinking overrides
|
||||
|
||||
**Fields in master.config.json:**
|
||||
- `interlinking.tier_anchor_text_rules` - Defines anchor text sources per tier
|
||||
- `interlinking.include_home_link` - Global default for Home links
|
||||
- `interlinking.wheel_links` - Enable/disable See Also sections
|
||||
|
||||
**Fields in project database:**
|
||||
- `main_keyword` - Used for tier1 anchor text
|
||||
- `related_searches` - Used for tier2 anchor text
|
||||
- `entities` - Used for tier3+ anchor text
|
||||
- `money_site_url` - Destination for tier1 links
|
||||
|
||||
|
|
@ -0,0 +1,161 @@
|
|||
# Job Configuration Field Reference
|
||||
|
||||
## Quick Field List
|
||||
|
||||
### Job Level (applies to all tiers)
|
||||
```
|
||||
project_id - Required, integer
|
||||
deployment_targets - Array of domain strings
|
||||
tier1_preferred_sites - Array of domain strings (subset of deployment_targets)
|
||||
auto_create_sites - Boolean (NOT IMPLEMENTED - parsed but doesn't work)
|
||||
create_sites_for_keywords - Array of {keyword, count} objects (NOT IMPLEMENTED - parsed but doesn't work)
|
||||
models - {title, outline, content} with model strings
|
||||
tiered_link_count_range - {min, max} integers
|
||||
anchor_text_config - {mode, custom_text}
|
||||
failure_config - {max_consecutive_failures, skip_on_failure}
|
||||
interlinking - {links_per_article_min, links_per_article_max, see_also_min, see_also_max}
|
||||
tiers - Required, object with tier1/tier2/tier3
|
||||
```
|
||||
|
||||
### Tier Level (per tier configuration)
|
||||
```
|
||||
count - Required, integer (number of articles)
|
||||
min_word_count - Integer
|
||||
max_word_count - Integer
|
||||
min_h2_tags - Integer
|
||||
max_h2_tags - Integer
|
||||
min_h3_tags - Integer
|
||||
max_h3_tags - Integer
|
||||
models - {title, outline, content} - overrides job-level
|
||||
interlinking - {links_per_article_min, links_per_article_max, see_also_min, see_also_max} - overrides job-level
|
||||
```
|
||||
|
||||
## Field Behaviors
|
||||
|
||||
**deployment_targets**: Sites to deploy to (round-robin distribution)
|
||||
|
||||
**tier1_preferred_sites**: If set, tier1 only uses these sites
|
||||
|
||||
**models**: Use format "provider/model-name" (e.g., "openai/gpt-4o-mini")
|
||||
|
||||
**anchor_text_config**: Job-level only, applies to ALL tiers (no tier-specific option)
|
||||
- "default" = Use master.config.json tier rules
|
||||
- "override" = Replace with custom_text for all tiers
|
||||
- "append" = Add custom_text to tier rules for all tiers
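
The three modes can be pictured as a small merge step over the tier's rule-derived anchor texts. This is only a sketch: the function name `apply_anchor_text_config` is hypothetical, and it assumes `custom_text` is a list of strings, which the field list does not confirm.

```python
from typing import List, Optional


def apply_anchor_text_config(
    tier_rule_texts: List[str],
    mode: str = "default",
    custom_text: Optional[List[str]] = None,
) -> List[str]:
    """Combine tier-rule anchor texts with custom text according to the mode."""
    custom = list(custom_text or [])
    if mode == "default":
        # Use the master.config.json tier rules unchanged.
        return list(tier_rule_texts)
    if mode == "override":
        # Ignore the tier rules and use only the custom text.
        return custom
    if mode == "append":
        # Keep the tier rules and add the custom text after them.
        return list(tier_rule_texts) + custom
    raise ValueError(f"Unknown anchor_text_config mode: {mode!r}")


rules = ["shaft machining", "shaft machining guide"]
print(apply_anchor_text_config(rules, "append", ["precision shafts"]))
```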
|
||||
|
||||
**tiered_link_count_range**: How many links to lower tier
|
||||
- Tier1: Always 1 link to money site (this setting ignored)
|
||||
- Tier2+: Random between min and max links to lower tier
|
||||
|
||||
**interlinking.links_per_article_min/max**: Same as tiered_link_count_range
|
||||
|
||||
**interlinking.see_also_min/max**: How many See Also links (default 4-5)
|
||||
- Randomly selects this many articles from same tier for See Also section
|
||||
|
||||
## Defaults
|
||||
|
||||
If not specified, these defaults apply:
|
||||
|
||||
### Tier1 Defaults
|
||||
```json
|
||||
{
|
||||
"min_word_count": 2000,
|
||||
"max_word_count": 2500,
|
||||
"min_h2_tags": 3,
|
||||
"max_h2_tags": 5,
|
||||
"min_h3_tags": 5,
|
||||
"max_h3_tags": 10
|
||||
}
|
||||
```
|
||||
|
||||
### Tier2 Defaults
|
||||
```json
|
||||
{
|
||||
"min_word_count": 1100,
|
||||
"max_word_count": 1500,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 4,
|
||||
"min_h3_tags": 3,
|
||||
"max_h3_tags": 8
|
||||
}
|
||||
```
|
||||
|
||||
### Tier3 Defaults
|
||||
```json
|
||||
{
|
||||
"min_word_count": 850,
|
||||
"max_word_count": 1350,
|
||||
"min_h2_tags": 2,
|
||||
"max_h2_tags": 3,
|
||||
"min_h3_tags": 2,
|
||||
"max_h3_tags": 6
|
||||
}
|
||||
```
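
One way to picture how these defaults combine with a job file is a per-tier dictionary merge, where any field present in the tier config replaces the default. The sketch below copies the default values from the blocks above; the names `TIER_DEFAULTS` and `resolve_tier_settings` are hypothetical and the real loader may work differently.

```python
from typing import Any, Dict

# Default values copied from the tier default blocks above.
TIER_DEFAULTS: Dict[str, Dict[str, int]] = {
    "tier1": {"min_word_count": 2000, "max_word_count": 2500,
              "min_h2_tags": 3, "max_h2_tags": 5,
              "min_h3_tags": 5, "max_h3_tags": 10},
    "tier2": {"min_word_count": 1100, "max_word_count": 1500,
              "min_h2_tags": 2, "max_h2_tags": 4,
              "min_h3_tags": 3, "max_h3_tags": 8},
    "tier3": {"min_word_count": 850, "max_word_count": 1350,
              "min_h2_tags": 2, "max_h2_tags": 3,
              "min_h3_tags": 2, "max_h3_tags": 6},
}


def resolve_tier_settings(tier_name: str, tier_config: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the tier's defaults, then apply whatever the job file specifies."""
    merged: Dict[str, Any] = dict(TIER_DEFAULTS.get(tier_name, {}))
    merged.update(tier_config)
    return merged


tier1 = resolve_tier_settings(
    "tier1", {"count": 5, "min_word_count": 1500, "max_word_count": 2000}
)
tier2 = resolve_tier_settings("tier2", {"count": 20})
print(tier1["min_word_count"], tier1["min_h2_tags"], tier2["min_word_count"])
```

Under this sketch, a tier1 block that overrides only the word counts still inherits the default heading ranges, and a tier2 block with only `count` gets every default.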
|
||||
|
||||
## Minimal Working Example
|
||||
|
||||
```json
|
||||
{
|
||||
"jobs": [{
|
||||
"project_id": 1,
|
||||
"deployment_targets": ["example.com"],
|
||||
"tiers": {
|
||||
"tier1": {"count": 5},
|
||||
"tier2": {"count": 20}
|
||||
}
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
## Your Current Example
|
||||
|
||||
```json
|
||||
{
|
||||
"jobs": [{
|
||||
"project_id": 1,
|
||||
"deployment_targets": ["getcnc.info", "www.textbullseye.com"],
|
||||
"tiers": {
|
||||
"tier1": {
|
||||
"count": 5,
|
||||
"min_word_count": 1500,
|
||||
"max_word_count": 2000,
|
||||
"models": {
|
||||
"title": "openai/gpt-4o-mini",
|
||||
"outline": "openai/gpt-4o-mini",
|
||||
"content": "anthropic/claude-3.5-sonnet"
|
||||
}
|
||||
},
|
||||
"tier2": {
|
||||
"count": 20,
|
||||
"models": {
|
||||
"title": "openai/gpt-4o-mini",
|
||||
"outline": "openai/gpt-4o-mini",
|
||||
"content": "openai/gpt-4o-mini"
|
||||
},
|
||||
"interlinking": {
|
||||
"links_per_article_min": 2,
|
||||
"links_per_article_max": 4
|
||||
}
|
||||
}
|
||||
}
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
## Result Behavior
|
||||
|
||||
**Tier 1 Articles (5):**
|
||||
- 1 link to money site
|
||||
- 4 See Also links to other tier1 articles
|
||||
- Home link in nav menu
|
||||
|
||||
**Tier 2 Articles (20):**
|
||||
- 2-4 links to random tier1 articles
|
||||
- 4-5 See Also links, randomly chosen from the other tier2 articles (default `see_also_min`/`see_also_max`)
|
||||
- Home link in nav menu
|
||||
|
||||
**Anchor Text:**
|
||||
- Tier1: Uses main_keyword from project
|
||||
- Tier2: Uses related_searches from project
|
||||
- Can override with anchor_text_config
|
||||
|
||||
|
|
@ -0,0 +1,34 @@
|
|||
import sqlite3
|
||||
|
||||
conn = sqlite3.connect('content_automation.db')
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Get first tier2 article
|
||||
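# Use bound parameters and standard single-quoted SQL semantics instead of
# relying on SQLite's legacy handling of double-quoted string literals.
row = cursor.execute(
    "SELECT id FROM generated_content WHERE project_id = ? AND tier = ? LIMIT 1",
    (1, "tier2"),
).fetchone()
if row is None:
    raise SystemExit("No tier2 articles found for project 1")
tier2_id = row[0]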
|
||||
|
||||
# Count links by type
|
||||
cursor.execute('''
|
||||
SELECT link_type, COUNT(*)
|
||||
FROM article_links
|
||||
WHERE from_content_id=?
|
||||
GROUP BY link_type
|
||||
''', (tier2_id,))
|
||||
|
||||
print(f'Tier2 article {tier2_id} link counts:')
|
||||
for row in cursor.fetchall():
    print(f' {row[0]}: {row[1]}')
|
||||
|
||||
# Show the actual tiered links
|
||||
cursor.execute('''
|
||||
SELECT to_url, anchor_text
|
||||
FROM article_links
|
||||
WHERE from_content_id=? AND link_type='tiered'
|
||||
''', (tier2_id,))
|
||||
|
||||
print(f'\nTiered links for article {tier2_id}:')
|
||||
for i, row in enumerate(cursor.fetchall(), 1):
    print(f' {i}. {row[1]} -> {row[0][:60]}...')
|
||||
|
||||
conn.close()
|
||||
|
||||
|
|
@ -365,6 +365,17 @@ class BatchProcessor:
|
|||
click.echo(f" {tier_name}: No articles with site assignments to post-process")
|
||||
return
|
||||
|
||||
# Skip articles already post-processed (idempotency check)
|
||||
unprocessed = [a for a in content_records if not a.formatted_html]
|
||||
|
||||
if not unprocessed:
|
||||
click.echo(f" {tier_name}: All {len(content_records)} articles already post-processed, skipping")
|
||||
return
|
||||
|
||||
if len(unprocessed) < len(content_records):
|
||||
click.echo(f" {tier_name}: Skipping {len(content_records) - len(unprocessed)} already processed articles")
|
||||
|
||||
content_records = unprocessed
|
||||
click.echo(f" {tier_name}: Post-processing {len(content_records)} articles...")
|
||||
|
||||
# Step 1: Generate URLs (Story 3.1)
|
||||
|
|
|
|||
|
|
@ -63,6 +63,8 @@ class InterlinkingConfig:
|
|||
links_per_article_min: int = 2
|
||||
links_per_article_max: int = 4
|
||||
include_home_link: bool = True
|
||||
see_also_min: int = 4
|
||||
see_also_max: int = 5
|
||||
|
||||
|
||||
@dataclass
|
||||
|
|
@ -265,16 +267,24 @@ class JobConfig:
|
|||
min_links = interlinking_data.get("links_per_article_min", 2)
|
||||
max_links = interlinking_data.get("links_per_article_max", 4)
|
||||
include_home = interlinking_data.get("include_home_link", True)
|
||||
see_also_min = interlinking_data.get("see_also_min", 4)
|
||||
see_also_max = interlinking_data.get("see_also_max", 5)
|
||||
if not isinstance(min_links, int) or min_links < 0:
|
||||
raise ValueError("'interlinking' links_per_article_min must be a non-negative integer")
|
||||
if not isinstance(max_links, int) or max_links < min_links:
|
||||
raise ValueError("'interlinking' links_per_article_max must be >= links_per_article_min")
|
||||
if not isinstance(include_home, bool):
|
||||
raise ValueError("'interlinking' include_home_link must be a boolean")
|
||||
if not isinstance(see_also_min, int) or see_also_min < 0:
|
||||
raise ValueError("'interlinking' see_also_min must be a non-negative integer")
|
||||
if not isinstance(see_also_max, int) or see_also_max < see_also_min:
|
||||
raise ValueError("'interlinking' see_also_max must be >= see_also_min")
|
||||
interlinking = InterlinkingConfig(
|
||||
links_per_article_min=min_links,
|
||||
links_per_article_max=max_links,
|
||||
include_home_link=include_home
|
||||
include_home_link=include_home,
|
||||
see_also_min=see_also_min,
|
||||
see_also_max=see_also_max
|
||||
)
|
||||
|
||||
return Job(
|
||||
|
|
|
|||
|
|
@ -64,14 +64,11 @@ def inject_interlinks(
|
|||
html, content, tiered_links, project, job_config, link_repo
|
||||
)
|
||||
|
||||
# Inject homepage link
|
||||
html = _inject_homepage_link(
|
||||
html, content, article_url, project, link_repo
|
||||
)
|
||||
# Note: Home link is now in the navigation menu (template), no need to inject into content
|
||||
|
||||
# Inject See Also section
|
||||
html = _inject_see_also_section(
|
||||
html, content, article_urls, link_repo
|
||||
html, content, article_urls, link_repo, job_config
|
||||
)
|
||||
|
||||
# Update content in database
|
||||
|
|
@ -199,9 +196,10 @@ def _inject_see_also_section(
|
|||
html: str,
|
||||
content: GeneratedContent,
|
||||
article_urls: List[Dict],
|
||||
link_repo: ArticleLinkRepository
|
||||
link_repo: ArticleLinkRepository,
|
||||
job_config=None
|
||||
) -> str:
|
||||
"""Inject See Also section with all other batch articles"""
|
||||
"""Inject See Also section with random selection of batch articles"""
|
||||
# Get all other articles (excluding current)
|
||||
other_articles = [a for a in article_urls if a['content_id'] != content.id]
|
||||
|
||||
|
|
@ -209,9 +207,18 @@ def _inject_see_also_section(
|
|||
logger.info(f"No other articles for See Also section in content {content.id}")
|
||||
return html
|
||||
|
||||
# Get See Also link count (default 4-5)
|
||||
see_also_config = _get_see_also_config(job_config)
|
||||
min_links = see_also_config['min']
|
||||
max_links = see_also_config['max']
|
||||
|
||||
# Select random articles
|
||||
count = min(random.randint(min_links, max_links), len(other_articles))
|
||||
selected_articles = random.sample(other_articles, count)
|
||||
|
||||
# Build See Also HTML
|
||||
see_also_html = "<h3>See Also</h3>\n<ul>\n"
|
||||
for article in other_articles:
|
||||
for article in selected_articles:
|
||||
see_also_html += f' <li><a href="{article["url"]}">{article["title"]}</a></li>\n'
|
||||
see_also_html += "</ul>\n"
|
||||
|
||||
|
|
@ -219,7 +226,7 @@ def _inject_see_also_section(
|
|||
html = _insert_before_closing_tags(html, see_also_html)
|
||||
|
||||
# Record links
|
||||
for article in other_articles:
|
||||
for article in selected_articles:
|
||||
link_repo.create(
|
||||
from_content_id=content.id,
|
||||
to_content_id=article['content_id'],
|
||||
|
|
@ -228,10 +235,41 @@ def _inject_see_also_section(
|
|||
link_type="wheel_see_also"
|
||||
)
|
||||
|
||||
logger.info(f"Injected See Also section with {len(other_articles)} links for content {content.id}")
|
||||
logger.info(f"Injected See Also section with {len(selected_articles)} links for content {content.id}")
|
||||
return html
|
||||
|
||||
|
||||
def _get_see_also_config(job_config) -> Dict[str, int]:
    """Get See Also link count config, default 4-5"""
    default_config = {"min": 4, "max": 5}

    if job_config is None:
        return default_config

    # Check for see_also_min/max in interlinking config
    interlinking = None
    if hasattr(job_config, 'interlinking'):
        interlinking = job_config.interlinking
    elif isinstance(job_config, dict):
        interlinking = job_config.get('interlinking')

    if not interlinking:
        return default_config

    # Get min/max from interlinking config
    if isinstance(interlinking, dict):
        min_val = interlinking.get('see_also_min')
        max_val = interlinking.get('see_also_max')
    else:
        min_val = getattr(interlinking, 'see_also_min', None)
        max_val = getattr(interlinking, 'see_also_max', None)

    if min_val is not None and max_val is not None:
        return {"min": min_val, "max": max_val}

    return default_config
|
||||
|
||||
|
||||
def _get_anchor_texts_for_tier(
|
||||
tier: str,
|
||||
project: Project,
|
||||
|
|
|
|||
|
|
@ -0,0 +1,34 @@
|
|||
from src.database.session import db_manager
|
||||
from src.database.repositories import GeneratedContentRepository, ProjectRepository, SiteDeploymentRepository
|
||||
from src.interlinking.tiered_links import find_tiered_links
|
||||
from src.generation.job_config import JobConfig
|
||||
|
||||
session = db_manager.get_session()
|
||||
content_repo = GeneratedContentRepository(session)
|
||||
project_repo = ProjectRepository(session)
|
||||
site_repo = SiteDeploymentRepository(session)
|
||||
|
||||
# Get tier2 articles
|
||||
tier2_articles = content_repo.get_by_project_and_tier(1, "tier2")
|
||||
print(f"Found {len(tier2_articles)} tier2 articles")
|
||||
|
||||
# Load job config
|
||||
job_config = JobConfig("jobs/test_shaft_machining.json")
|
||||
job = job_config.get_jobs()[0]
|
||||
|
||||
print(f"\nJob config:")
|
||||
print(f" tiered_link_count_range: {job.tiered_link_count_range}")
|
||||
print(f" interlinking: {job.interlinking}")
|
||||
|
||||
# Test the function
|
||||
print(f"\nCalling find_tiered_links()...")
|
||||
result = find_tiered_links(tier2_articles, job, project_repo, content_repo, site_repo)
|
||||
|
||||
print(f"\nResult:")
|
||||
print(f" tier: {result.get('tier')}")
|
||||
print(f" lower_tier: {result.get('lower_tier')}")
|
||||
print(f" Number of URLs selected: {len(result.get('lower_tier_urls', []))}")
|
||||
print(f" URLs: {result.get('lower_tier_urls', [])}")
|
||||
|
||||
session.close()
|
||||
|
||||