added branded+ anchor text with -bp flag

main
PeninsulaInd 2026-01-21 16:23:29 -06:00
parent 3210dc5739
commit 4d3a78d255
2 changed files with 242 additions and 17 deletions


@@ -0,0 +1,116 @@
# Branded+ Anchor Text Implementation Plan
## Overview
Enhance the `ingest-cora` command to support "branded+" anchor text generation, which combines brand names with related searches. Add a brand mapping (`brands.json`) that maps company domains to their brand names, and update the anchor text calculation logic to allocate branded, branded+, and regular terms, in that order.
## Components
### 1. Brand Mapping Storage
- **File**: `brands.json` (root directory)
- **Format**: JSON mapping normalized domains to brand name arrays
  ```json
  {
    "gullco.com": ["Gullco", "Gullco International"]
  }
  ```
- **Location**: Project root for easy editing
- **Normalization**: Store only normalized domains (no `www.`, no scheme)
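Because lookups only succeed when keys are already normalized, a small check for hand-edited entries can catch mistakes early. `find_brand_map_problems` is a hypothetical helper, not part of this commit; it assumes the object-of-lists shape shown in the example above.

```python
import json
from typing import List


def find_brand_map_problems(raw_json: str) -> List[str]:
    """Return human-readable problems found in a brands.json document (hypothetical helper)."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems: List[str] = []
    for domain, brands in data.items():
        # Keys must already be normalized: no scheme, no "www.", no path or trailing slash
        if "://" in domain or domain.startswith("www.") or "/" in domain:
            problems.append(f"{domain!r}: key is not a normalized domain")
        # Values must be non-empty lists of non-empty strings
        if not isinstance(brands, list) or not brands:
            problems.append(f"{domain!r}: value must be a non-empty list")
        elif not all(isinstance(b, str) and b.strip() for b in brands):
            problems.append(f"{domain!r}: every brand must be a non-empty string")
    return problems
```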
### 2. Brand Lookup Helper (Inline)
- **File**: `src/cli/commands.py` (add helper function)
- **Function**: `_get_brands_for_url(url: str) -> List[str]`
  - Extract domain from URL (remove scheme, `www.`, trailing slash)
  - Load `brands.json` from project root
  - Look up normalized domain
  - Return brand names list, or empty list if not found or file missing
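The helper can be sketched as below. Unlike the committed `_get_brands_for_url`, which opens `Path("brands.json")` and therefore resolves against the current working directory rather than the project root, this hypothetical `get_brands_for_url` takes the root explicitly (the `project_root` parameter is an assumption). It keeps the same "empty list on any problem" contract.

```python
import json
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def get_brands_for_url(url: str, project_root: Path) -> List[str]:
    """Return brand names for url from <project_root>/brands.json, or [] on any failure."""
    if not url:
        return []
    # Same normalization as the committed helper: netloc minus a leading "www."
    domain = urlparse(url).netloc
    if domain.startswith("www."):
        domain = domain[4:]
    try:
        # Resolve against the explicit root instead of the current working directory
        with open(project_root / "brands.json", "r", encoding="utf-8") as f:
            brands_data = json.load(f)
    except (OSError, json.JSONDecodeError):
        # Missing, unreadable, or malformed file: behave as if no mapping exists
        return []
    if not isinstance(brands_data, dict):
        return []
    brands = brands_data.get(domain, [])
    # Ignore hand-edited entries that are not lists of strings
    if not isinstance(brands, list):
        return []
    return [b for b in brands if isinstance(b, str)]
```

Inside the CLI, the root could be derived from the module location, e.g. `Path(__file__).resolve().parents[2]` for `src/cli/commands.py`; that depth is an assumption about the repository layout.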
### 3. Branded+ Anchor Text Generation
- **File**: `src/cli/commands.py` (modify `create_job_file_for_project`)
- **Patterns**: Generate two variations for each (brand, related search) pair:
  - `"{brand} {term}"` (e.g., "Gullco welder")
  - `"{term} by {brand}"` (e.g., "welder by Gullco")
- **Logic**: For each brand name and each related search, generate both patterns
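Pulled out of `create_job_file_for_project`, the nested loop from this commit becomes the function below (`generate_branded_plus_terms` is a hypothetical name). It emits both patterns per pair, so the pool always holds `2 * len(brand_names) * len(related_searches)` candidates: 8 for two brands and two related searches.

```python
from typing import List


def generate_branded_plus_terms(brand_names: List[str], related_searches: List[str]) -> List[str]:
    """Build every branded+ variation: both patterns for each (brand, term) pair."""
    terms: List[str] = []
    for brand in brand_names:
        for term in related_searches:
            terms.append(f"{brand} {term}")     # pattern "{brand} {term}"
            terms.append(f"{term} by {brand}")  # pattern "{term} by {brand}"
    return terms
```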
### 4. CLI Command Updates
- **File**: `src/cli/commands.py` (modify `ingest_cora`)
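- **New flag**: `--tier1-branded-plus-ratio` / `-bp` (float, optional)
  - Branded+ is configured only when this flag is provided; no extra prompt is shown
  - The value is the fraction (0.0-1.0) of the slots left after branded terms are allocated; values outside (0.0, 1.0] print a warning and disable branded+
  - Brand names come from the branded prompt, or from `brands.json` when no branded text was entered
- **Brand text prompt update**:
  - Show default brands from the brand mapping if the URL is found
  - Allow Enter to accept the defaults
  - Format: "Enter branded anchor text (company name) for tier1 [default: 'Gullco, Gullco International'] (press Enter for default):"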
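Stripped of the Click I/O, the Enter-for-default handling reduces to a small pure function. `resolve_branded_text` is a hypothetical name; it assumes the prompt returns an empty string on a bare Enter, which is what the committed `click.prompt(..., default="")` call does.

```python
from typing import List, Optional, Tuple


def resolve_branded_text(raw_input: str, default_brands: List[str]) -> Tuple[Optional[str], List[str]]:
    """Turn the prompt answer into (branded_text, brand_names), using defaults on a bare Enter."""
    text = raw_input.strip()
    # Bare Enter with a brands.json hit: fall back to the mapped brands
    if not text and default_brands:
        text = ", ".join(default_brands)
    # Split the comma-separated answer into individual names, dropping blanks
    brand_names = [part.strip() for part in text.split(",") if part.strip()]
    # Nothing usable (empty answer, or only commas): skip branded anchor text
    if not brand_names:
        return None, []
    return text, brand_names
```

The final check differs from the commit: there, an answer such as `", ,"` is non-empty, so it reaches `create_job_file_for_project` with an empty `branded_texts` list, and `branded_texts[i % len(branded_texts)]` raises `ZeroDivisionError` as soon as `branded_count` is positive.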
### 5. Anchor Text Calculation Logic
- **File**: `src/cli/commands.py` (modify `create_job_file_for_project`)
- **Calculation order**:
  1. Get available terms (`custom_anchor_text` or `related_searches`)
  2. Calculate branded count: `int(total * tier1_branded_ratio)` (truncated)
  3. Calculate remaining: `total - branded_count`
  4. Calculate branded+ count: `int(remaining * branded_plus_ratio)` (if enabled)
  5. Calculate regular count: `remaining - branded_plus_count`
- **Generation**:
  - Branded terms: Use provided brand names (cycled)
  - Branded+ terms: Generate from brands + `related_searches` (both patterns)
  - Regular terms: Use remaining `related_searches`/keyword variations
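The five calculation steps reduce to a few lines of arithmetic. `allocate_counts` is a hypothetical name; the sketch treats a `None` ratio as "disabled" and, for brevity, ignores the commit's extra requirement that branded text (or brand names) also be present.

```python
from typing import Optional, Tuple


def allocate_counts(
    total: int,
    branded_ratio: Optional[float],
    branded_plus_ratio: Optional[float],
) -> Tuple[int, int, int]:
    """Split total anchor slots into (branded, branded+, regular), truncating with int()."""
    # Step 2: branded share of all slots (0 when branded is disabled)
    branded = int(total * branded_ratio) if branded_ratio is not None else 0
    # Step 3: slots left after branded
    remaining = total - branded
    # Step 4: branded+ share of the remaining slots (0 when the flag is absent)
    branded_plus = int(remaining * branded_plus_ratio) if branded_plus_ratio is not None else 0
    # Step 5: regular terms fill whatever is left
    regular = remaining - branded_plus
    return branded, branded_plus, regular
```

Because of the truncation, rounding losses always land in the regular bucket: `allocate_counts(7, 0.5, 0.5)` gives `(3, 2, 2)`.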
### 6. Function Signature Updates
- **File**: `src/cli/commands.py`
- **`create_job_file_for_project`**:
  - Add `tier1_branded_plus_ratio: Optional[float] = None`
  - Add `brand_names: Optional[List[str]] = None` (for branded+ generation)
- **`ingest_cora`**:
  - Add `tier1_branded_plus_ratio: Optional[float] = None` parameter
  - Pass brand names to `create_job_file_for_project`
## Implementation Details
### Brand Lookup Flow
1. Normalize `money_site_url`: remove scheme (http://, https://), remove www. prefix, remove trailing slash
2. Look up normalized domain in brands.json
3. Return list of brand names or empty list if not found
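Step 1 is where scheme-less input slips through: the committed helper reads `urlparse(url).netloc`, which is empty for a value such as `gullco.com` typed without `https://`. A more defensive normalization is sketched below; `normalize_domain` is hypothetical, and it assumes that lowercasing hosts and ignoring ports is acceptable for `brands.json` keys.

```python
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Reduce a money-site URL to the bare domain used as a brands.json key."""
    url = url.strip()
    # urlparse only fills netloc when a scheme or a leading "//" is present,
    # so prefix "//" for inputs like "gullco.com/"
    if "://" not in url:
        url = "//" + url
    # .hostname drops any port and user info and lowercases the host
    host = urlparse(url).hostname or ""
    # Strip one leading "www."; the path (and trailing slash) is never part of the host
    return host[4:] if host.startswith("www.") else host
```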
### Branded+ Generation Example
- Brands: ["Gullco", "Gullco International"]
- Related searches: ["welder", "automatic welder"]
- Generated terms:
  - "Gullco welder"
  - "welder by Gullco"
  - "Gullco automatic welder"
  - "automatic welder by Gullco"
  - "Gullco International welder"
  - "welder by Gullco International"
  - "Gullco International automatic welder"
  - "automatic welder by Gullco International"
### Anchor Text Distribution Example
- Total available terms: 10
- `tier1_branded_ratio`: 0.4 → `int(10 * 0.4)` = 4 branded terms
- Remaining: 10 - 4 = 6
- `tier1_branded_plus_ratio`: 0.67 → `int(6 * 0.67)` = 4 branded+ terms
- Regular: 6 - 4 = 2 terms
- Final list: [4 branded, 4 branded+, 2 regular]
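Given the counts, the final list is the concatenation of the three groups. `assemble_anchor_terms` is a hypothetical helper; the injectable `random.Random` is an assumption added for reproducibility, and unlike the commit it always samples regular terms, so they come out shuffled even when all of them are used.

```python
import random
from typing import List, Optional, Tuple


def assemble_anchor_terms(
    branded_texts: List[str],
    branded_plus_pool: List[str],
    keyword_variations: List[str],
    counts: Tuple[int, int, int],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Build [branded..., branded+..., regular...] from (branded, branded+, regular) counts."""
    rng = rng if rng is not None else random.Random()
    branded_count, branded_plus_count, regular_count = counts
    terms: List[str] = []
    # Branded: cycle through the brand names (skipped if there are none)
    if branded_texts:
        terms.extend(branded_texts[i % len(branded_texts)] for i in range(branded_count))
    # Branded+: random distinct picks, capped at the pool size
    k = max(0, min(branded_plus_count, len(branded_plus_pool)))
    terms.extend(rng.sample(branded_plus_pool, k))
    # Regular: random distinct keyword variations, capped at what exists
    k = max(0, min(regular_count, len(keyword_variations)))
    terms.extend(rng.sample(keyword_variations, k))
    return terms
```

Because each sample is capped at its pool, the list can be shorter than the total when the branded+ pool is small. The committed code behaves the same way: it takes the whole pool but still subtracts the full `branded_plus_count` from the regular slots.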
## Files to Modify
1. `src/cli/commands.py` - Add branded+ logic, brand lookup helper, update prompts, calculation
2. `brands.json` - New file for brand mappings (create with example entry)
## Testing Considerations
- Test with brand mapping present and absent
- Test with Enter (default) and custom brand input
- Test branded+ calculation with various ratios
- Test URL normalization (with/without www., http/https)
- Test with multiple brand names per URL
- Test with no related searches (fallback behavior)


@@ -3,7 +3,7 @@ CLI command definitions using Click
 """
 import random
 import click
-from typing import Optional
+from typing import Optional, List
 from src.core.config import get_config, get_bunny_account_api_key, get_concurrent_workers
 from src.auth.service import AuthService
 from src.database.session import db_manager
@@ -37,12 +37,49 @@ from datetime import datetime
 load_dotenv()
+def _get_brands_for_url(url: str) -> List[str]:
+    """
+    Look up brand names for a given URL from brands.json
+    Args:
+        url: Money site URL (e.g., "https://www.gullco.com")
+    Returns:
+        List of brand names, or empty list if not found or file missing
+    """
+    try:
+        from urllib.parse import urlparse
+        # Normalize URL: remove scheme, www., trailing slash
+        parsed = urlparse(url)
+        domain = parsed.netloc
+        # Remove www. prefix if present
+        if domain.startswith('www.'):
+            domain = domain[4:]
+        # Load brands.json from project root
+        brands_file = Path("brands.json")
+        if not brands_file.exists():
+            return []
+        with open(brands_file, 'r', encoding='utf-8') as f:
+            brands_data = json.load(f)
+        # Look up normalized domain
+        return brands_data.get(domain, [])
+    except Exception:
+        return []
 def create_job_file_for_project(
     project_id: int,
     project_name: str,
     session,
     tier1_branded_ratio: Optional[float] = None,
     tier1_branded_text: Optional[str] = None,
+    tier1_branded_plus_ratio: Optional[float] = None,
+    brand_names: Optional[List[str]] = None,
     random_deployment_targets: Optional[int] = None
 ) -> Optional[str]:
     """
@@ -54,6 +91,8 @@ def create_job_file_for_project(
         session: Database session
         tier1_branded_ratio: Optional ratio of branded anchor text for tier1 (0.0-1.0)
         tier1_branded_text: Optional branded anchor text (company name) for tier1
+        tier1_branded_plus_ratio: Optional ratio of branded+ anchor text for tier1 (0.0-1.0, applied to remaining slots after branded)
+        brand_names: Optional list of brand names for branded+ generation
         random_deployment_targets: Optional number of random deployment targets to select (default: random 2-3)
     Returns:
@@ -107,8 +146,8 @@ def create_job_file_for_project(
         }
     }
-    # Add anchor_text_config if branded ratio and text are provided
-    if tier1_branded_ratio is not None and tier1_branded_text:
+    # Add anchor_text_config if branded ratio/text or branded+ ratio is provided
+    if (tier1_branded_ratio is not None and tier1_branded_text) or (tier1_branded_plus_ratio is not None and brand_names):
         # Get project to retrieve main_keyword for non-branded terms
         project_repo = ProjectRepository(session)
         project = project_repo.get_by_id(project_id)
@@ -128,24 +167,58 @@ def create_job_file_for_project(
         # Use the ACTUAL count of available terms
         actual_count = len(keyword_variations)
-        # Calculate branded and keyword counts based on actual available terms
-        branded_count = int(actual_count * tier1_branded_ratio)
-        keyword_count = actual_count - branded_count
+        # Calculate branded and remaining counts based on actual available terms
+        branded_count = 0
+        if tier1_branded_ratio is not None and tier1_branded_text:
+            branded_count = int(actual_count * tier1_branded_ratio)
+        remaining_count = actual_count - branded_count
         # Parse comma-separated branded anchor texts
-        branded_texts = [text.strip() for text in tier1_branded_text.split(',') if text.strip()]
+        branded_texts = []
+        if tier1_branded_text:
+            branded_texts = [text.strip() for text in tier1_branded_text.split(',') if text.strip()]
-        # Create anchor text list with branded terms (cycling through multiple if provided) and custom anchor text from CORA
+        # Create anchor text list starting with branded terms
         anchor_terms = []
         for i in range(branded_count):
             branded_text = branded_texts[i % len(branded_texts)] # Cycle through branded texts
             anchor_terms.append(branded_text)
-        # Randomize keyword selection if we're not using all available terms
-        if keyword_count < actual_count:
-            selected_keywords = random.sample(keyword_variations, keyword_count)
-        else:
-            selected_keywords = keyword_variations
-        anchor_terms.extend(selected_keywords)
+        # Generate branded+ terms if enabled
+        branded_plus_count = 0
+        if tier1_branded_plus_ratio is not None and brand_names and len(brand_names) > 0:
+            branded_plus_count = int(remaining_count * tier1_branded_plus_ratio)
+            # Generate branded+ terms from brands + related_searches
+            # Use related_searches from project, or fallback to keyword_variations
+            related_searches = project.related_searches if project.related_searches else keyword_variations
+            branded_plus_terms = []
+            for brand in brand_names:
+                for term in related_searches:
+                    branded_plus_terms.append(f"{brand} {term}")
+                    branded_plus_terms.append(f"{term} by {brand}")
+            # Randomly select the needed number of branded+ terms
+            if len(branded_plus_terms) > 0:
+                if branded_plus_count > len(branded_plus_terms):
+                    selected_branded_plus = branded_plus_terms
+                else:
+                    selected_branded_plus = random.sample(branded_plus_terms, branded_plus_count)
+                anchor_terms.extend(selected_branded_plus)
+        # Calculate regular count from remaining slots
+        regular_count = remaining_count - branded_plus_count
+        # Add regular terms
+        if regular_count > 0:
+            # Randomize keyword selection if we're not using all available terms
+            if regular_count < len(keyword_variations):
+                selected_keywords = random.sample(keyword_variations, regular_count)
+            else:
+                selected_keywords = keyword_variations[:regular_count]
+            anchor_terms.extend(selected_keywords)
         tier1_config["anchor_text_config"] = {
             "mode": "explicit",
             "terms": anchor_terms
@@ -1005,10 +1078,11 @@ def sync_sites(admin_user: Optional[str], admin_password: Optional[str], dry_run
 @click.option('--money-site-url', '-m', help='Money site URL (e.g., https://example.com)')
 @click.option('--custom-anchors', '-a', help='Comma-separated list of custom anchor text (optional)')
 @click.option('--tier1-branded-ratio', '-t', default=None, type=float, help='Ratio of branded anchor text for tier1 (optional, only prompts if provided)')
+@click.option('--tier1-branded-plus-ratio', '-bp', default=None, type=float, help='Ratio of branded+ anchor text for tier1 (optional, applied to remaining slots after branded)')
 @click.option('--random-deployment-targets', '-r', type=int, help='Number of random deployment targets to select (default: random 2-3)')
 @click.option('--username', '-u', help='Username for authentication')
 @click.option('--password', '-p', help='Password for authentication')
-def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom_anchors: Optional[str], tier1_branded_ratio: float, random_deployment_targets: Optional[int], username: Optional[str], password: Optional[str]):
+def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom_anchors: Optional[str], tier1_branded_ratio: float, tier1_branded_plus_ratio: Optional[float], random_deployment_targets: Optional[int], username: Optional[str], password: Optional[str]):
     """Ingest a CORA .xlsx report and create a new project"""
     try:
         if not username or not password:
@@ -1079,15 +1153,48 @@ def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom
         # Handle tier1 branded anchor text if ratio is specified
         tier1_branded_text = None
+        brand_names = None
         if tier1_branded_ratio is not None and tier1_branded_ratio > 0:
+            # Look up default brands from brand mapping
+            default_brands = _get_brands_for_url(money_site_url)
+            default_prompt = ""
+            if default_brands:
+                default_prompt = f" [default: '{', '.join(default_brands)}'] (press Enter for default)"
             tier1_branded_text = click.prompt(
-                "\nEnter branded anchor text (company name) for tier1 (comma-separated for multiple, e.g., 'AGI Fabricators, AGI')",
-                type=str
+                f"\nEnter branded anchor text (company name) for tier1 (comma-separated for multiple, e.g., 'AGI Fabricators, AGI'){default_prompt}",
+                type=str,
+                default=""
             ).strip()
+            # Use defaults if Enter was pressed and defaults exist
+            if not tier1_branded_text and default_brands:
+                tier1_branded_text = ", ".join(default_brands)
+                click.echo(f"Using default brands: {tier1_branded_text}")
             if not tier1_branded_text:
                 click.echo("Warning: Empty branded anchor text provided, skipping tier1 branded anchor text configuration.", err=True)
                 tier1_branded_text = None
                 tier1_branded_ratio = None
+            else:
+                # Parse brand names for branded+ generation
+                brand_names = [text.strip() for text in tier1_branded_text.split(',') if text.strip()]
+        # Handle branded+ ratio if flag is provided
+        if tier1_branded_plus_ratio is not None:
+            # Validate the provided ratio
+            if tier1_branded_plus_ratio <= 0 or tier1_branded_plus_ratio > 1:
+                click.echo("Warning: Invalid branded+ ratio provided, skipping branded+ configuration.", err=True)
+                tier1_branded_plus_ratio = None
+            elif not brand_names:
+                # If brand names weren't set from branded prompt, try to get them from brand lookup
+                default_brands = _get_brands_for_url(money_site_url)
+                if default_brands:
+                    brand_names = default_brands
+                    click.echo(f"Using brand names from mapping for branded+: {', '.join(brand_names)}")
+                else:
+                    click.echo("Warning: No brand names available for branded+ (set --tier1-branded-ratio or add to brands.json). Skipping branded+ configuration.", err=True)
+                    tier1_branded_plus_ratio = None
         job_file = create_job_file_for_project(
             project.id,
@@ -1095,6 +1202,8 @@ def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom
             session,
             tier1_branded_ratio=tier1_branded_ratio,
             tier1_branded_text=tier1_branded_text,
+            tier1_branded_plus_ratio=tier1_branded_plus_ratio,
+            brand_names=brand_names,
             random_deployment_targets=random_deployment_targets
         )
         if job_file: