added branded + anchor text with -bp flag

parent 3210dc5739
commit 4d3a78d255

@@ -0,0 +1,116 @@
# Branded+ Anchor Text Implementation Plan

## Overview

Enhance the `ingest-cora` command to support "branded+" anchor text, which combines brand names with related searches (e.g., "Gullco welder"). Add a brand mapping that stores each money site's domain with its associated brand names. Update the anchor text calculation so that branded, branded+, and regular terms are allocated in that order.

## Components

### 1. Brand Mapping Storage

- **File**: `brands.json` (project root)
- **Format**: JSON object mapping normalized domains to arrays of brand names

```json
{
  "gullco.com": ["Gullco", "Gullco International"]
}
```

- **Location**: Project root, for easy editing
- **Normalization**: Keys are normalized domains only (no scheme, no `www.` prefix)

### 2. Brand Lookup Helper (Inline)

- **File**: `src/cli/commands.py` (add helper function)
- **Function**: `_get_brands_for_url(url: str) -> List[str]`
  - Extract the domain from the URL (strip the scheme, the `www.` prefix, and any trailing slash)
  - Load `brands.json` from the project root
  - Look up the normalized domain
  - Return the list of brand names, or an empty list if the domain is not mapped or the file is missing
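The file-handling half of this helper can be sketched as below. This is a minimal sketch, not the committed `_get_brands_for_url`. The name `load_brand_map` and its `path` parameter are hypothetical; the parameter lets the loader be pointed at any file, for example in tests. One detail to keep in mind: a bare `Path("brands.json")`, as used in the diff, resolves against the current working directory rather than the project root. The command therefore finds the file only when it is run from the root.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_brand_map(path: Path = Path("brands.json")) -> Dict[str, List[str]]:
    """Load the domain -> brand names mapping (hypothetical helper).

    Returns an empty dict when the file is missing, unreadable, or not
    shaped like {"domain": ["Brand", ...]}, so callers can treat "no
    mapping" and "no file" the same way.
    """
    try:
        with path.open("r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):  # missing file, bad encoding, or invalid JSON
        return {}
    if not isinstance(data, dict):
        return {}
    # Keep only list-valued entries and drop blank or non-string brand names.
    return {
        domain: [name.strip() for name in names if isinstance(name, str) and name.strip()]
        for domain, names in data.items()
        if isinstance(names, list)
    }
```

Returning an empty mapping instead of raising follows the plan's rule of falling back to an empty list when the file is missing.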

### 3. Branded+ Anchor Text Generation

- **File**: `src/cli/commands.py` (modify `create_job_file_for_project`)
- **Patterns**: Generate two variations per brand/related-search pair:
  - `"{brand} {term}"` (e.g., "Gullco welder")
  - `"{term} by {brand}"` (e.g., "welder by Gullco")
- **Logic**: For each brand name and each related search, generate both patterns

### 4. CLI Command Updates

- **File**: `src/cli/commands.py` (modify `ingest_cora`)
- **New flag**: `--tier1-branded-plus-ratio` / `-bp` (float, optional)
  - Branded+ is enabled only when this flag is provided
  - The value is the fraction of the slots remaining after branded terms. It must be greater than 0 and at most 1; any other value is ignored with a warning
- **Brand text prompt update**:
  - Show the default brands from the brand mapping if the URL is found
  - Allow Enter to accept the defaults
  - Format: "Enter branded anchor text (company name) for tier1 [default: 'Gullco, Gullco International'] (press Enter for default):"
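The default-handling rule for this prompt can be isolated as pure functions, so it can be tested without simulating keyboard input. This is a minimal sketch; `build_branded_prompt` and `resolve_branded_text` are hypothetical names, not functions from the commit. `click.prompt` appends `": "` to the prompt text by default, so the sketch omits the trailing colon shown in the format above.

```python
from typing import List, Optional


def build_branded_prompt(default_brands: List[str]) -> str:
    """Prompt text that advertises the mapped brands when there are any."""
    text = "Enter branded anchor text (company name) for tier1"
    if default_brands:
        joined = ", ".join(default_brands)
        text += f" [default: '{joined}'] (press Enter for default)"
    return text


def resolve_branded_text(answer: str, default_brands: List[str]) -> Optional[str]:
    """Map the raw prompt answer to comma-separated brand text.

    Pressing Enter (an empty or whitespace-only answer) selects the defaults.
    If there are no defaults either, None signals "skip branded anchor text".
    """
    text = answer.strip()
    if not text and default_brands:
        text = ", ".join(default_brands)
    return text or None
```

In the command, the answer could come from `click.prompt(build_branded_prompt(defaults), default="", show_default=False)`. Passing `show_default=False` stops Click from adding its own marker for the empty default.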

### 5. Anchor Text Calculation Logic

- **File**: `src/cli/commands.py` (modify `create_job_file_for_project`)
- **Calculation order** (ratio products are truncated with `int()`):
  1. Get the available terms (`custom_anchor_text` or `related_searches`)
  2. Branded count: `total * tier1_branded_ratio`
  3. Remaining: `total - branded_count`
  4. Branded+ count: `remaining * tier1_branded_plus_ratio` (if enabled)
  5. Regular count: `remaining - branded_plus_count`
- **Generation**:
  - Branded terms: the provided brand names, cycled to fill every branded slot
  - Branded+ terms: generated from brands + related searches (both patterns), then randomly sampled
  - Regular terms: randomly selected from the available keyword variations
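Given the three counts, the generation step can be sketched roughly as follows. This is an illustration under assumptions, not the committed code. `assemble_anchor_terms` is a hypothetical name, the branded+ pool is assumed to be generated beforehand, and the optional seeded `random.Random` exists only to make runs reproducible. Branded slots cycle through the brand names. Branded+ and regular terms are sampled without replacement, each capped at the size of its pool.

```python
import random
from typing import List, Optional


def assemble_anchor_terms(
    branded_texts: List[str],
    branded_plus_pool: List[str],
    keyword_variations: List[str],
    branded_count: int,
    branded_plus_count: int,
    regular_count: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Build the tier1 anchor list in order: branded, branded+, regular."""
    rng = rng or random.Random()
    terms: List[str] = []

    # Branded: cycle through the brand names until every branded slot is filled.
    if branded_texts:
        terms.extend(branded_texts[i % len(branded_texts)] for i in range(branded_count))

    # Branded+: sample without replacement, capped at the pool size.
    k = min(branded_plus_count, len(branded_plus_pool))
    terms.extend(rng.sample(branded_plus_pool, k))

    # Regular: sample without replacement from the keyword variations.
    k = min(regular_count, len(keyword_variations))
    terms.extend(rng.sample(keyword_variations, k))
    return terms
```

One behavior to check against this sketch concerns how the diff computes `regular_count`. It uses the requested `branded_plus_count`, not the number of branded+ terms actually selected. When the branded+ pool is smaller than requested, the unfilled slots are therefore not handed back to regular terms, and the final list is shorter than `actual_count`. Deriving the regular count from the number of branded+ terms actually selected would keep the total intact.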

### 6. Function Signature Updates

- **File**: `src/cli/commands.py`
- **`create_job_file_for_project`**:
  - Add `tier1_branded_plus_ratio: Optional[float] = None`
  - Add `brand_names: Optional[List[str]] = None` (for branded+ generation)
- **`ingest_cora`**:
  - Add a `tier1_branded_plus_ratio: Optional[float]` parameter (filled by the Click option)
  - Pass brand names to `create_job_file_for_project`

## Implementation Details

### Brand Lookup Flow

1. Normalize `money_site_url`: remove the scheme (`http://`, `https://`), the `www.` prefix, and any trailing slash
2. Look up the normalized domain in `brands.json`
3. Return the list of brand names, or an empty list if not found
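One way to implement step 1 is sketched below. It assumes mapping keys are lowercase bare hostnames such as `gullco.com`. The sketch differs from the `urlparse(url).netloc` approach in the diff in two ways worth considering. First, `urlparse` only fills `netloc` when the input has a scheme or a leading `//`, so a scheme-less value such as `gullco.com` yields an empty `netloc`. Second, `netloc` keeps any port and the original letter case, whereas `.hostname` drops the port and lowercases. `normalize_domain` and `lookup_brands` are hypothetical names.

```python
from typing import Dict, List
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Reduce a money-site URL to the bare domain used as a brands.json key."""
    url = url.strip()
    # urlparse only populates netloc after a scheme or a leading "//",
    # so add "//" for inputs such as "gullco.com" or "www.gullco.com/".
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    # .hostname excludes scheme, port, and path, and is lowercased.
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def lookup_brands(url: str, brand_map: Dict[str, List[str]]) -> List[str]:
    """Return brand names for the URL's domain, or [] if it is not mapped."""
    return list(brand_map.get(normalize_domain(url), []))
```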

### Branded+ Generation Example

- Brands: `["Gullco", "Gullco International"]`
- Related searches: `["welder", "automatic welder"]`
- Generated terms:
  - "Gullco welder"
  - "welder by Gullco"
  - "Gullco automatic welder"
  - "automatic welder by Gullco"
  - "Gullco International welder"
  - "welder by Gullco International"
  - "Gullco International automatic welder"
  - "automatic welder by Gullco International"
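The nested loop that produces this list can be sketched as follows. The function name `generate_branded_plus` is hypothetical. The loop order (brands outer, related searches inner) matches the diff and reproduces the example above. The pool holds `2 × len(brands) × len(related_searches)` terms, so two brands and two searches give eight.

```python
from typing import List


def generate_branded_plus(brands: List[str], related_searches: List[str]) -> List[str]:
    """Pair every brand with every related search in both branded+ patterns."""
    terms: List[str] = []
    for brand in brands:
        for term in related_searches:
            terms.append(f"{brand} {term}")     # "{brand} {term}" pattern
            terms.append(f"{term} by {brand}")  # "{term} by {brand}" pattern
    return terms
```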

### Anchor Text Distribution Example

- Total available terms: 10
- `tier1_branded_ratio`: 0.4 → 4 branded terms
- Remaining: 6
- `tier1_branded_plus_ratio`: 0.67 → 4 branded+ terms (6 × 0.67 = 4.02, truncated to 4)
- Regular: 2 terms
- Final list: [4 branded, 4 branded+, 2 regular]
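The arithmetic in this example can be checked with a short sketch of calculation steps 2-5. `split_anchor_counts` is a hypothetical name. Like the diff, it truncates with `int()`, and regular terms absorb whatever the truncation leaves over.

```python
from typing import Optional, Tuple


def split_anchor_counts(
    total: int,
    branded_ratio: Optional[float],
    branded_plus_ratio: Optional[float],
) -> Tuple[int, int, int]:
    """Return (branded, branded_plus, regular) slot counts for `total` terms."""
    branded = int(total * branded_ratio) if branded_ratio else 0
    remaining = total - branded
    # The branded+ ratio applies to the slots left after branded, not to total.
    branded_plus = int(remaining * branded_plus_ratio) if branded_plus_ratio else 0
    regular = remaining - branded_plus
    return branded, branded_plus, regular
```

Because regular is computed as the remainder, the three counts always sum to `total`. Truncation does have one floating-point side effect: `int(100 * 0.29)` is 28, because the product evaluates to 28.999999999999996. `round()` avoids that if it matters.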

## Files to Modify

1. `src/cli/commands.py`: add the branded+ logic and brand lookup helper; update the prompts and the calculation
2. `brands.json`: new file for brand mappings (created with an example entry)

## Testing Considerations

- Test with the brand mapping present and absent
- Test with Enter (default) and with custom brand input
- Test the branded+ calculation with various ratios
- Test URL normalization (with/without `www.`, http/https)
- Test with multiple brand names per URL
- Test with no related searches (fallback behavior)

@@ -3,7 +3,7 @@ CLI command definitions using Click
 """
 import random
 import click
-from typing import Optional
+from typing import Optional, List
 from src.core.config import get_config, get_bunny_account_api_key, get_concurrent_workers
 from src.auth.service import AuthService
 from src.database.session import db_manager

@@ -37,12 +37,49 @@ from datetime import datetime
 load_dotenv()
 
 
+def _get_brands_for_url(url: str) -> List[str]:
+    """
+    Look up brand names for a given URL from brands.json
+
+    Args:
+        url: Money site URL (e.g., "https://www.gullco.com")
+
+    Returns:
+        List of brand names, or empty list if not found or file missing
+    """
+    try:
+        from urllib.parse import urlparse
+
+        # Normalize URL: remove scheme, www., trailing slash
+        parsed = urlparse(url)
+        domain = parsed.netloc
+
+        # Remove www. prefix if present
+        if domain.startswith('www.'):
+            domain = domain[4:]
+
+        # Load brands.json from project root
+        brands_file = Path("brands.json")
+        if not brands_file.exists():
+            return []
+
+        with open(brands_file, 'r', encoding='utf-8') as f:
+            brands_data = json.load(f)
+
+        # Look up normalized domain
+        return brands_data.get(domain, [])
+    except Exception:
+        return []
+
+
 def create_job_file_for_project(
     project_id: int,
     project_name: str,
     session,
     tier1_branded_ratio: Optional[float] = None,
     tier1_branded_text: Optional[str] = None,
+    tier1_branded_plus_ratio: Optional[float] = None,
+    brand_names: Optional[List[str]] = None,
     random_deployment_targets: Optional[int] = None
 ) -> Optional[str]:
     """

@@ -54,6 +91,8 @@ def create_job_file_for_project(
         session: Database session
         tier1_branded_ratio: Optional ratio of branded anchor text for tier1 (0.0-1.0)
         tier1_branded_text: Optional branded anchor text (company name) for tier1
+        tier1_branded_plus_ratio: Optional ratio of branded+ anchor text for tier1 (0.0-1.0, applied to remaining slots after branded)
+        brand_names: Optional list of brand names for branded+ generation
         random_deployment_targets: Optional number of random deployment targets to select (default: random 2-3)
 
     Returns:

@@ -107,8 +146,8 @@ def create_job_file_for_project(
         }
     }
 
-    # Add anchor_text_config if branded ratio and text are provided
-    if tier1_branded_ratio is not None and tier1_branded_text:
+    # Add anchor_text_config if branded ratio/text or branded+ ratio is provided
+    if (tier1_branded_ratio is not None and tier1_branded_text) or (tier1_branded_plus_ratio is not None and brand_names):
         # Get project to retrieve main_keyword for non-branded terms
         project_repo = ProjectRepository(session)
         project = project_repo.get_by_id(project_id)

@@ -128,24 +167,58 @@ def create_job_file_for_project(
         # Use the ACTUAL count of available terms
         actual_count = len(keyword_variations)
 
-        # Calculate branded and keyword counts based on actual available terms
+        # Calculate branded and remaining counts based on actual available terms
+        branded_count = 0
+        if tier1_branded_ratio is not None and tier1_branded_text:
             branded_count = int(actual_count * tier1_branded_ratio)
-        keyword_count = actual_count - branded_count
+        remaining_count = actual_count - branded_count
 
         # Parse comma-separated branded anchor texts
+        branded_texts = []
+        if tier1_branded_text:
             branded_texts = [text.strip() for text in tier1_branded_text.split(',') if text.strip()]
 
-        # Create anchor text list with branded terms (cycling through multiple if provided) and custom anchor text from CORA
+        # Create anchor text list starting with branded terms
         anchor_terms = []
         for i in range(branded_count):
             branded_text = branded_texts[i % len(branded_texts)] # Cycle through branded texts
             anchor_terms.append(branded_text)
-        # Randomize keyword selection if we're not using all available terms
-        if keyword_count < actual_count:
-            selected_keywords = random.sample(keyword_variations, keyword_count)
+
+        # Generate branded+ terms if enabled
+        branded_plus_count = 0
+        if tier1_branded_plus_ratio is not None and brand_names and len(brand_names) > 0:
+            branded_plus_count = int(remaining_count * tier1_branded_plus_ratio)
+
+            # Generate branded+ terms from brands + related_searches
+            # Use related_searches from project, or fallback to keyword_variations
+            related_searches = project.related_searches if project.related_searches else keyword_variations
+
+            branded_plus_terms = []
+            for brand in brand_names:
+                for term in related_searches:
+                    branded_plus_terms.append(f"{brand} {term}")
+                    branded_plus_terms.append(f"{term} by {brand}")
+
+            # Randomly select the needed number of branded+ terms
+            if len(branded_plus_terms) > 0:
+                if branded_plus_count > len(branded_plus_terms):
+                    selected_branded_plus = branded_plus_terms
                 else:
-            selected_keywords = keyword_variations
+                    selected_branded_plus = random.sample(branded_plus_terms, branded_plus_count)
+                anchor_terms.extend(selected_branded_plus)
+
+        # Calculate regular count from remaining slots
+        regular_count = remaining_count - branded_plus_count
+
+        # Add regular terms
+        if regular_count > 0:
+            # Randomize keyword selection if we're not using all available terms
+            if regular_count < len(keyword_variations):
+                selected_keywords = random.sample(keyword_variations, regular_count)
+            else:
+                selected_keywords = keyword_variations[:regular_count]
             anchor_terms.extend(selected_keywords)
 
         tier1_config["anchor_text_config"] = {
             "mode": "explicit",
             "terms": anchor_terms

@@ -1005,10 +1078,11 @@ def sync_sites(admin_user: Optional[str], admin_password: Optional[str], dry_run
 @click.option('--money-site-url', '-m', help='Money site URL (e.g., https://example.com)')
 @click.option('--custom-anchors', '-a', help='Comma-separated list of custom anchor text (optional)')
 @click.option('--tier1-branded-ratio', '-t', default=None, type=float, help='Ratio of branded anchor text for tier1 (optional, only prompts if provided)')
+@click.option('--tier1-branded-plus-ratio', '-bp', default=None, type=float, help='Ratio of branded+ anchor text for tier1 (optional, applied to remaining slots after branded)')
 @click.option('--random-deployment-targets', '-r', type=int, help='Number of random deployment targets to select (default: random 2-3)')
 @click.option('--username', '-u', help='Username for authentication')
 @click.option('--password', '-p', help='Password for authentication')
-def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom_anchors: Optional[str], tier1_branded_ratio: float, random_deployment_targets: Optional[int], username: Optional[str], password: Optional[str]):
+def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom_anchors: Optional[str], tier1_branded_ratio: float, tier1_branded_plus_ratio: Optional[float], random_deployment_targets: Optional[int], username: Optional[str], password: Optional[str]):
     """Ingest a CORA .xlsx report and create a new project"""
     try:
         if not username or not password:

@@ -1079,15 +1153,48 @@ def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom
 
         # Handle tier1 branded anchor text if ratio is specified
         tier1_branded_text = None
+        brand_names = None
         if tier1_branded_ratio is not None and tier1_branded_ratio > 0:
+            # Look up default brands from brand mapping
+            default_brands = _get_brands_for_url(money_site_url)
+            default_prompt = ""
+            if default_brands:
+                default_prompt = f" [default: '{', '.join(default_brands)}'] (press Enter for default)"
+
             tier1_branded_text = click.prompt(
-                "\nEnter branded anchor text (company name) for tier1 (comma-separated for multiple, e.g., 'AGI Fabricators, AGI')",
-                type=str
+                f"\nEnter branded anchor text (company name) for tier1 (comma-separated for multiple, e.g., 'AGI Fabricators, AGI'){default_prompt}",
+                type=str,
+                default=""
             ).strip()
+
+            # Use defaults if Enter was pressed and defaults exist
+            if not tier1_branded_text and default_brands:
+                tier1_branded_text = ", ".join(default_brands)
+                click.echo(f"Using default brands: {tier1_branded_text}")
+
             if not tier1_branded_text:
                 click.echo("Warning: Empty branded anchor text provided, skipping tier1 branded anchor text configuration.", err=True)
                 tier1_branded_text = None
                 tier1_branded_ratio = None
+            else:
+                # Parse brand names for branded+ generation
+                brand_names = [text.strip() for text in tier1_branded_text.split(',') if text.strip()]
+
+        # Handle branded+ ratio if flag is provided
+        if tier1_branded_plus_ratio is not None:
+            # Validate the provided ratio
+            if tier1_branded_plus_ratio <= 0 or tier1_branded_plus_ratio > 1:
+                click.echo("Warning: Invalid branded+ ratio provided, skipping branded+ configuration.", err=True)
+                tier1_branded_plus_ratio = None
+            elif not brand_names:
+                # If brand names weren't set from branded prompt, try to get them from brand lookup
+                default_brands = _get_brands_for_url(money_site_url)
+                if default_brands:
+                    brand_names = default_brands
+                    click.echo(f"Using brand names from mapping for branded+: {', '.join(brand_names)}")
+                else:
+                    click.echo("Warning: No brand names available for branded+ (set --tier1-branded-ratio or add to brands.json). Skipping branded+ configuration.", err=True)
+                    tier1_branded_plus_ratio = None
 
         job_file = create_job_file_for_project(
             project.id,

@@ -1095,6 +1202,8 @@ def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom
             session,
             tier1_branded_ratio=tier1_branded_ratio,
             tier1_branded_text=tier1_branded_text,
+            tier1_branded_plus_ratio=tier1_branded_plus_ratio,
+            brand_names=brand_names,
             random_deployment_targets=random_deployment_targets
         )
         if job_file: