# Deploy-Batch Analysis for test_shaft_machining.json
## Quick Answers to Your Questions
### 1. What should the anchor text be at each level?
**Tier 1 Articles (5 articles):**
- **Money Site Links:** Uses `main_keyword` variations from the project:
  - "shaft machining"
  - "learn about shaft machining"
  - "shaft machining guide"
  - "best shaft machining"
  - "shaft machining tips"
  - The system searches the content for these phrases and uses the first one that matches
- **Home Link:** Now in navigation menu (not injected into content)
- **See Also Links:** Uses article titles as anchor text
**Tier 2 Articles (20 articles):**
- **Lower Tier Links:** Uses `related_searches` from CORA data
  - Depends on which related searches were in the shaft_machining.xlsx file
  - If no related searches exist, falls back to `main_keyword` variations
- **Home Link:** Now in navigation menu (not injected into content)
- **See Also Links:** Uses article titles as anchor text
**Configuration:**
- Anchor text rules come from `master.config.json` → `interlinking.tier_anchor_text_rules`
- Can be overridden in job config with `anchor_text_config`
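
A minimal Python sketch of the selection rule described above, assuming candidates are tried in list order and matched case-insensitively on word boundaries. The template list and both function names (`anchor_candidates`, `pick_anchor_text`) are hypothetical; the real rules come from `tier_anchor_text_rules`, and this sketch only covers tiers 1 and 2.

```python
from __future__ import annotations

import re

# Hypothetical templates reproducing the tier1 variations listed above; the real
# rules live in master.config.json -> interlinking.tier_anchor_text_rules.
MAIN_KEYWORD_TEMPLATES = ["{kw}", "learn about {kw}", "{kw} guide", "best {kw}", "{kw} tips"]


def anchor_candidates(tier: int, main_keyword: str, related_searches: list[str]) -> list[str]:
    """Return anchor text candidates for tier 1 or tier 2 (assumed rule set)."""
    variations = [t.format(kw=main_keyword) for t in MAIN_KEYWORD_TEMPLATES]
    if tier == 1:
        return variations
    # Tier 2: prefer CORA related searches, fall back to main_keyword variations.
    return related_searches or variations


def pick_anchor_text(content_html: str, candidates: list[str]) -> str | None:
    """Return the first candidate (in list order) found in the content, else None."""
    for phrase in candidates:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, content_html, re.IGNORECASE):
            return phrase
    return None
```

For example, `pick_anchor_text("<p>Shaft Machining tips</p>", ["best shaft machining", "shaft machining"])` skips the first phrase (not present) and returns `"shaft machining"`. Note that because the bare keyword is the first tier1 candidate, it wins whenever the keyword appears at all.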
### 2. How many links should be in each article?
**Tier 1 Articles:**
- 1 link to money site (https://fzemanufacturing.com/capabilities/shaft-machining-services)
- 4 "See Also" links (to the other 4 tier1 articles)
- **Total: 5 links per tier1 article** (plus Home in nav menu)
**Tier 2 Articles:**
- 2-4 links to tier1 articles (randomly selected; the count ranges from `interlinking.links_per_article_min` to `interlinking.links_per_article_max`)
- 19 "See Also" links (to the other 19 tier2 articles)
- **Total: 21-23 links per tier2 article** (plus Home in nav menu)
**Your JSON Configuration:**
```json
"interlinking": {
  "links_per_article_min": 2,
  "links_per_article_max": 4
}
```
This controls the tiered links (tier2 → tier1): each tier2 article links to between 2 and 4 randomly chosen tier1 articles.
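
The tier2 → tier1 selection and the per-article totals can be sketched as below. This assumes distinct targets drawn with `random.sample` and a count drawn uniformly from the inclusive min..max range; `pick_tier1_targets` and `total_links` are hypothetical names, not functions from the codebase.

```python
import random


def pick_tier1_targets(tier1_urls, links_min, links_max, rng=random):
    """Choose between links_min and links_max distinct tier1 URLs at random."""
    count = rng.randint(links_min, links_max)  # randint is inclusive on both ends
    count = min(count, len(tier1_urls))        # cannot exceed the tier1 articles available
    return rng.sample(tier1_urls, count)


def total_links(tiered_links: int, tier_size: int) -> int:
    """Tiered links plus one See Also link to every other article in the same tier."""
    return tiered_links + (tier_size - 1)
```

With 5 tier1 and 20 tier2 articles this reproduces the figures above: `total_links(1, 5)` is 5 for tier1, and tier2 ranges from `total_links(2, 20)` = 21 to `total_links(4, 20)` = 23. Passing a seeded `random.Random` as `rng` makes the selection reproducible in tests.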
### 3. Should "Home" be a link?
**YES** - Home is a link in the navigation menu at the top of every page.
**How it works:**
- The HTML template (`basic.html`) includes a `<nav>` menu with Home link
- Template line 113: `<li><a href="/index.html">Home</a></li>`
- This is part of the template wrapper, not injected into article content
**Old behavior (now removed):**
- Previously, the system searched article content for "Home" and tried to link it
- This was redundant since Home is already in the nav menu
- Code has been updated to remove this injection
## Step-by-Step: What Happens During deploy-batch
### Step 1: Load Articles from Database
```
- Project 1 has generated content already
- Tier 1: 5 articles
- Tier 2: 20 articles
- Each article has: title, content (HTML), site_deployment_id
```
### Step 2: URL Generation (already done during generate-batch)
```
Tier 1 URLs (round-robin between getcnc.info and textbullseye.com):
- Article 0: https://getcnc.info/{slug}.html
- Article 1: https://www.textbullseye.com/{slug}.html
- Article 2: https://getcnc.info/{slug}.html
- Article 3: https://www.textbullseye.com/{slug}.html
- Article 4: https://getcnc.info/{slug}.html
Tier 2 URLs (round-robin):
- Articles 0-19 distributed across both domains
```
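
A sketch of the round-robin assignment shown above, assuming the rotation restarts at the first deployment target for each tier (which matches the tier1 listing). The `slugify` rule is a hypothetical stand-in for whatever slug function generate-batch actually uses.

```python
import re
from itertools import cycle


def slugify(title: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def assign_urls(titles, domains):
    """Pair each article with the next domain in round-robin order."""
    domain_cycle = cycle(domains)  # getcnc.info, www.textbullseye.com, getcnc.info, ...
    return [f"https://{next(domain_cycle)}/{slugify(t)}.html" for t in titles]
```

Under this assumption the 5 tier1 articles land 3/2 on getcnc.info and www.textbullseye.com, and the 20 tier2 articles split evenly, 10 per domain.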
### Step 3: Tiered Links (already injected during generate-batch)
**For Tier 1:**
- Target: Money site URL from project database
- Anchor text: main_keyword variations
- Links already in `generated_content.content` HTML
**For Tier 2:**
- Target: Random selection of tier1 URLs (2-4 per article)
- Anchor text: related_searches from project
- Links already in HTML
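
Injection itself amounts to wrapping one occurrence of the chosen phrase in an `<a>` tag. This sketch assumes only the first case-insensitive match is linked and that the article's original casing is kept; `inject_link` is hypothetical, and a production version would also need to skip matches inside existing tags or links, which this simple regex does not do.

```python
import html
import re


def inject_link(content_html: str, anchor_text: str, url: str) -> str:
    """Wrap the first occurrence of anchor_text (case-insensitive) in an <a> tag."""
    pattern = re.compile(r"\b" + re.escape(anchor_text) + r"\b", re.IGNORECASE)

    def to_link(match: re.Match) -> str:
        # Keep the visible text exactly as it appears in the article.
        return f'<a href="{html.escape(url, quote=True)}">{match.group(0)}</a>'

    return pattern.sub(to_link, content_html, count=1)
```

If the phrase is absent, the content is returned unchanged, which is why the anchor is picked from phrases already found in the content.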
### Step 4: Homepage Links
- Home link is in the navigation menu (template)
- No longer injected into article content
### Step 5: See Also Section (already injected)
- HTML section with links to other articles in same tier
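
A sketch of building that section, assuming each tier is available as `(content_id, title, url)` tuples. The function name and the `see-also` markup are hypothetical; only the "link to every other article in the tier, titled by its title" rule comes from this document.

```python
from __future__ import annotations

import html


def build_see_also(current_id: int, tier_articles: list[tuple[int, str, str]]) -> str:
    """Return a See Also block linking to every other article in the same tier."""
    items = [
        f'<li><a href="{html.escape(url, quote=True)}">{html.escape(title)}</a></li>'
        for content_id, title, url in tier_articles
        if content_id != current_id  # never link an article to itself
    ]
    return (
        '<div class="see-also">\n<h3>See Also</h3>\n<ul>\n'
        + "\n".join(items)
        + "\n</ul>\n</div>"
    )
```

For the 5-article tier1 this yields 4 list items per article, and 19 for each tier2 article.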
### Step 6: Template Application (already done)
- HTML wrapped in template from `src/templating/templates/basic.html`
- Navigation menu added
- Stored in `generated_content.formatted_html`
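
A rough sketch of the wrapping step using the standard library's `string.Template`. The template text is a heavily trimmed, hypothetical stand-in for `basic.html`: only the Home `<li>` line is quoted from the real template (line 113), and the `$title`/`$content` placeholders are assumptions.

```python
import html
from string import Template

# Hypothetical stand-in for src/templating/templates/basic.html.
BASIC_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<nav><ul>
<li><a href="/index.html">Home</a></li>
</ul></nav>
<main>
<h1>$title</h1>
$content
</main>
</body>
</html>""")


def format_article(title: str, content_html: str) -> str:
    """Wrap article HTML in the page template (the value stored as formatted_html)."""
    return BASIC_TEMPLATE.substitute(title=html.escape(title), content=content_html)
```

Because the Home link lives in the template, every page gets it without touching the article body, which is why it no longer appears in `article_links`.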
### Step 7: Upload to Bunny.net
```
For each article:
1. Get site deployment credentials
2. Upload formatted_html to storage zone
3. File path: /{slug}.html
4. Log URL to deployment_logs/
5. Update database: deployed_url, status='deployed'
For each site's boilerplate pages:
1. Upload index.html (if exists)
2. Upload about.html
3. Upload contact.html
4. Upload privacy.html
```
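
Bunny.net's Edge Storage API accepts an upload as an HTTP `PUT` to `https://storage.bunnycdn.com/{storage_zone}/{path}` with the storage zone password in an `AccessKey` header; zones in other regions use a region-specific hostname, so check the endpoint for your zone. The sketch below only builds that request with the standard library. `build_upload_request` and its parameters are hypothetical, not the project's deployment code.

```python
import urllib.request


def build_upload_request(storage_zone: str, access_key: str, slug: str,
                         formatted_html: str,
                         endpoint: str = "storage.bunnycdn.com") -> urllib.request.Request:
    """Build (but do not send) a PUT that stores formatted_html at /{slug}.html."""
    url = f"https://{endpoint}/{storage_zone}/{slug}.html"
    return urllib.request.Request(
        url,
        data=formatted_html.encode("utf-8"),
        headers={"AccessKey": access_key, "Content-Type": "application/octet-stream"},
        method="PUT",
    )
```

Sending it would be `urllib.request.urlopen(req)`; the public `deployed_url` written to the database is the site's own domain (e.g. `https://getcnc.info/{slug}.html`), not the storage endpoint.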
## Database Link Tracking
All links are tracked in the `article_links` table:
**Tier 1 Article Example (ID: 43):**
| from_content_id | to_content_id | to_url | anchor_text | link_type |
|-----------------|---------------|--------|-------------|-----------|
| 43 | NULL | https://fzemanufacturing.com/... | "shaft machining" | tiered |
| 43 | 44 | NULL | "Understanding CNC..." | wheel_see_also |
| 43 | 45 | NULL | "Advanced Shaft..." | wheel_see_also |
| 43 | 46 | NULL | "Precision Machining..." | wheel_see_also |
| 43 | 47 | NULL | "Modern Shaft..." | wheel_see_also |
**Tier 2 Article Example (ID: 48):**
| from_content_id | to_content_id | to_url | anchor_text | link_type |
|-----------------|---------------|--------|-------------|-----------|
| 48 | NULL | https://getcnc.info/{slug1}.html | "cnc machining services" | tiered |
| 48 | NULL | https://www.textbullseye.com/{slug2}.html | "precision shaft work" | tiered |
| 48 | NULL | https://getcnc.info/{slug3}.html | "shaft turning operations" | tiered |
| 48 | 49 | NULL | "Tier 2 Article 2 Title" | wheel_see_also |
| ... | ... | ... | ... | ... |
| 48 | 67 | NULL | "Tier 2 Article 20 Title" | wheel_see_also |
**Note:** Home link is no longer tracked in the database since it's in the template, not injected into content.
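
To sanity-check link counts, the rows can be grouped by `link_type`. This sketch uses an in-memory SQLite table whose column types are assumptions (the real schema may differ), loads the tier1 example for ID 43 with placeholder anchor text for the See Also rows, and counts outgoing links with a hypothetical `link_counts` helper.

```python
import sqlite3

# Assumed column types; the real article_links schema may differ.
SCHEMA = """
CREATE TABLE article_links (
    from_content_id INTEGER NOT NULL,
    to_content_id   INTEGER,          -- NULL when the link is stored by URL (tiered)
    to_url          TEXT,             -- NULL when the target is another stored article
    anchor_text     TEXT,
    link_type       TEXT NOT NULL     -- e.g. 'tiered' or 'wheel_see_also'
)
"""


def link_counts(conn: sqlite3.Connection, content_id: int) -> dict:
    """Count outgoing links for one article, grouped by link_type."""
    rows = conn.execute(
        "SELECT link_type, COUNT(*) FROM article_links "
        "WHERE from_content_id = ? GROUP BY link_type",
        (content_id,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
rows = [(43, None, "https://fzemanufacturing.com/capabilities/shaft-machining-services",
         "shaft machining", "tiered")]
# Placeholder anchor text; real rows store the target article titles.
rows += [(43, to_id, None, f"Article {to_id} title", "wheel_see_also") for to_id in (44, 45, 46, 47)]
conn.executemany("INSERT INTO article_links VALUES (?, ?, ?, ?, ?)", rows)
```

For ID 43, `link_counts(conn, 43)` gives one `tiered` and four `wheel_see_also` rows, matching the 5-link total for tier1.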
## Your Specific JSON File Analysis
```json
{
  "jobs": [
    {
      "project_id": 1,
      "deployment_targets": [
        "getcnc.info",
        "www.textbullseye.com"
      ],
      "tiers": {
        "tier1": {
          "count": 5,
          "min_word_count": 1500,
          "max_word_count": 2000,
          "models": {
            "title": "openai/gpt-4o-mini",
            "outline": "openai/gpt-4o-mini",
            "content": "anthropic/claude-3.5-sonnet"
          }
        },
        "tier2": {
          "count": 20,
          "models": {
            "title": "openai/gpt-4o-mini",
            "outline": "openai/gpt-4o-mini",
            "content": "openai/gpt-4o-mini"
          },
          "interlinking": {
            "links_per_article_min": 2,
            "links_per_article_max": 4
          }
        }
      }
    }
  ]
}
```
**What This Configuration Does:**
1. **Tier 1 (5 articles):**
   - Uses Claude Sonnet for content, GPT-4o-mini for titles/outlines
   - 1500-2000 words per article
   - Distributed across getcnc.info and textbullseye.com
   - Each links to: money site (1) + See Also (4) = 5 total links (plus Home in nav menu)
2. **Tier 2 (20 articles):**
   - Uses GPT-4o-mini for everything (cheaper)
   - Default word count (1100-1500)
   - Each links to: 2-4 tier1 articles + See Also (19) = 21-23 total links (plus Home in nav menu)
   - Distributed across both domains
3. **Missing Configurations (using defaults):**
   - `tier1.interlinking`: Not specified → uses defaults (but tier1 always gets 1 money site link anyway)
   - `anchor_text_config`: Not specified → uses master.config.json rules
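
How missing fields fall back can be sketched as a layered merge, assuming the precedence defaults < job-level `interlinking` < tier-level `interlinking`. The word-count fallback mirrors the 1100-1500 default noted above; the interlinking defaults and the `effective_tier_settings` name are hypothetical placeholders, not the project's real values.

```python
# Word-count fallback taken from the "Default word count (1100-1500)" note above.
DEFAULT_WORD_COUNT = {"min_word_count": 1100, "max_word_count": 1500}
# Hypothetical placeholder; the project's real interlinking defaults are not documented here.
DEFAULT_INTERLINKING = {"links_per_article_min": 1, "links_per_article_max": 3}


def effective_tier_settings(job: dict, tier_name: str) -> dict:
    """Merge defaults < job-level < tier-level settings for one tier (assumed precedence)."""
    tier = job["tiers"][tier_name]
    interlinking = {
        **DEFAULT_INTERLINKING,
        **job.get("interlinking", {}),   # job-level defaults, absent in this JSON
        **tier.get("interlinking", {}),  # tier-level overrides win
    }
    word_counts = {key: tier.get(key, default) for key, default in DEFAULT_WORD_COUNT.items()}
    return {"count": tier["count"], **word_counts, "interlinking": interlinking}
```

Applied to the job above, tier1 keeps its explicit 1500-2000 words but falls back to the default interlinking, while tier2 keeps its 2-4 links and falls back to 1100-1500 words.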
## All JSON Fields That Affect Behavior
See `MASTER_JSON.json` for the complete reference. Key fields:
**Top-level job fields:**
- `project_id` - Which project's data to use
- `deployment_targets` - Which domains to deploy to
- `models` - Which AI models to use
- `tiered_link_count_range` - How many tiered links (job-level default)
- `anchor_text_config` - Override anchor text generation
- `interlinking` - Job-level interlinking defaults
**Tier-level fields:**
- `count` - Number of articles
- `min_word_count`, `max_word_count` - Content length
- `min_h2_tags`, `max_h2_tags`, `min_h3_tags`, `max_h3_tags` - Outline structure
- `models` - Tier-specific model overrides
- `interlinking` - Tier-specific interlinking overrides
**Fields in master.config.json:**
- `interlinking.tier_anchor_text_rules` - Defines anchor text sources per tier
- `interlinking.include_home_link` - Global default for Home links
- `interlinking.wheel_links` - Enable/disable See Also sections
**Fields in project database:**
- `main_keyword` - Used for tier1 anchor text
- `related_searches` - Used for tier2 anchor text
- `entities` - Used for tier3+ anchor text
- `money_site_url` - Destination for tier1 links