Branded+ Anchor Text Implementation Plan
Overview
Enhance the ingest-cora command to support "branded+" anchor text generation, which combines brand names with related searches. Add a brand mapping system that maps company domains to their brand names, and update the anchor text calculation logic to allocate branded, branded+, and regular terms in that order.
Components
1. Brand Mapping Storage
- File: brands.json (project root directory)
- Format: JSON object mapping normalized domains to arrays of brand names, e.g. { "gullco.com": ["Gullco", "Gullco International"] }
- Location: project root, for easy editing
- Normalization: store only normalized domains (no www. prefix, no scheme)
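A loader can reject malformed entries before they ever reach the lookup. The sketch below checks an already-parsed brands.json against the storage rules above. validate_brand_mapping is a hypothetical helper that the plan does not name, and treating any "/" in a key (a scheme, path, or trailing slash) as an error goes slightly beyond the stated rules.

```python
from typing import List


def validate_brand_mapping(mapping: object) -> List[str]:
    """Return human-readable problems in a parsed brands.json; an empty list means it looks valid."""
    if not isinstance(mapping, dict):
        return ["top level must be a JSON object mapping domains to brand lists"]
    problems: List[str] = []
    for domain, brands in mapping.items():
        # Keys must already be normalized: no www. prefix and no scheme.
        # "://" contains "/", so the slash check also catches schemes; rejecting
        # paths and trailing slashes is an extra assumption beyond the plan.
        if "/" in domain or domain.startswith("www."):
            problems.append(f"{domain!r}: key is not a normalized domain")
        # Values must be lists of non-empty brand-name strings.
        if not isinstance(brands, list) or not all(isinstance(b, str) and b.strip() for b in brands):
            problems.append(f"{domain!r}: value must be a list of non-empty strings")
    return problems
```

For the example entry above, validate_brand_mapping returns an empty list.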
2. Brand Lookup Helper (Inline)
- File: src/cli/commands.py (add helper function)
- Function: _get_brands_for_url(url: str) -> List[str]
  - Extract the domain from the URL (remove the scheme, www. prefix, and trailing slash)
  - Load brands.json from the project root
  - Look up the normalized domain
  - Return the list of brand names, or an empty list if the domain is not found or the file is missing
3. Branded+ Anchor Text Generation
- File: src/cli/commands.py (modify create_job_file_for_project)
- Patterns: generate two variations per related search:
  - "{brand} {term}" (e.g., "Gullco welder")
  - "{term} by {brand}" (e.g., "welder by Gullco")
- Logic: for each brand name and each related search, generate both patterns
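The nested loop below is a sketch of that logic; it reproduces the ordering in the Branded+ Generation Example (brand in the outer loop, related search in the inner loop, both patterns per pair). generate_branded_plus_terms is a hypothetical name, and since the plan does not say whether duplicates should be removed, none are.

```python
from typing import Iterable, List


def generate_branded_plus_terms(brands: Iterable[str], related_searches: Iterable[str]) -> List[str]:
    """Build "{brand} {term}" and "{term} by {brand}" for every brand/term pair."""
    terms = list(related_searches)  # materialize so it can be re-read for each brand
    generated: List[str] = []
    for brand in brands:
        for term in terms:
            generated.append(f"{brand} {term}")
            generated.append(f"{term} by {brand}")
    return generated
```

The output length is always 2 × (number of brands) × (number of related searches), so it is empty when there are no related searches.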
4. CLI Command Updates
- File: src/cli/commands.py (modify ingest_cora)
- New flag: --tier1-branded-plus-ratio (float, optional)
  - Only prompts for branded+ if this flag is provided
  - Prompts for a percentage (0.0-1.0) of the slots remaining after branded terms
- Brand text prompt update:
  - Show default brands from the brand mapping if the URL is found
  - Allow Enter to accept the defaults
  - Format: "Enter branded anchor text (company name) for tier1 [default: 'Gullco, Gullco International'] (press Enter for default):"
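The default display and Enter-to-accept behavior can be kept out of the CLI framework in two small pure functions, which also makes them easy to unit-test. Both names are hypothetical; splitting custom input on commas mirrors the comma-separated default display but is an assumption, as is the prompt wording when no mapping exists.

```python
from typing import List, Sequence

PROMPT_BASE = "Enter branded anchor text (company name) for tier1"


def build_brand_prompt(default_brands: Sequence[str]) -> str:
    """Build the tier1 brand prompt, advertising mapped brands as the default when present."""
    if default_brands:
        joined = ", ".join(default_brands)
        return f"{PROMPT_BASE} [default: '{joined}'] (press Enter for default): "
    # Assumption: without a mapping, fall back to a plain prompt with no default.
    return f"{PROMPT_BASE}: "


def parse_brand_input(raw: str, default_brands: Sequence[str]) -> List[str]:
    """Empty input (just Enter) accepts the defaults; otherwise split on commas."""
    if not raw.strip():
        return list(default_brands)
    # Assumption: custom input uses the same comma separator the default display uses.
    return [part.strip() for part in raw.split(",") if part.strip()]
```

In ingest_cora, the string from build_brand_prompt would go to whatever prompt call the command already uses, and the raw reply to parse_brand_input.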
5. Anchor Text Calculation Logic
- File: src/cli/commands.py (modify create_job_file_for_project)
- Calculation order:
  - Get available terms (custom_anchor_text or related_searches)
  - Calculate branded count: total * tier1_branded_ratio
  - Calculate remaining: total - branded_count
  - Calculate branded+ count: remaining * branded_plus_ratio (if enabled)
  - Calculate regular count: remaining - branded_plus_count
- Generation:
  - Branded terms: use the provided brand names (cycled)
  - Branded+ terms: generate from brands + related_searches (both patterns)
  - Regular terms: use the remaining related_searches/keyword variations
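A sketch of the calculation order, assuming counts are rounded to the nearest integer with Python's round() (ties go to the even number). The plan does not fix a rounding rule; int() truncation gives the same numbers in the distribution example. calculate_anchor_counts and AnchorCounts are hypothetical names, and raising on a ratio outside 0.0-1.0 is an assumption.

```python
from typing import NamedTuple, Optional


class AnchorCounts(NamedTuple):
    branded: int
    branded_plus: int
    regular: int


def calculate_anchor_counts(
    total: int,
    tier1_branded_ratio: float,
    tier1_branded_plus_ratio: Optional[float] = None,
) -> AnchorCounts:
    """Split total anchor slots into branded, branded+, and regular counts, in that order."""
    for name, ratio in (
        ("tier1_branded_ratio", tier1_branded_ratio),
        ("tier1_branded_plus_ratio", tier1_branded_plus_ratio),
    ):
        if ratio is not None and not 0.0 <= ratio <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {ratio}")
    # Assumption: round to nearest; the plan leaves the rounding rule open.
    branded = round(total * tier1_branded_ratio)
    remaining = total - branded
    # Branded+ only gets slots when its ratio was supplied (i.e. the flag was given).
    if tier1_branded_plus_ratio is not None:
        branded_plus = round(remaining * tier1_branded_plus_ratio)
    else:
        branded_plus = 0
    regular = remaining - branded_plus
    return AnchorCounts(branded, branded_plus, regular)
```

Because each ratio is at most 1.0, no count can exceed the slots left for it, so regular never goes negative.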
6. Function Signature Updates
- File: src/cli/commands.py
- create_job_file_for_project:
  - Add tier1_branded_plus_ratio: Optional[float] = None
  - Add brand_names: Optional[List[str]] = None (for branded+ generation)
- ingest_cora:
  - Add a tier1_branded_plus_ratio: Optional[float] = None parameter
  - Pass brand names to create_job_file_for_project
Implementation Details
Brand Lookup Flow
- Normalize money_site_url: remove the scheme (http://, https://), the www. prefix, and any trailing slash
- Look up the normalized domain in brands.json
- Return the list of brand names, or an empty list if not found
Branded+ Generation Example
- Brands: ["Gullco", "Gullco International"]
- Related searches: ["welder", "automatic welder"]
- Generated terms:
- "Gullco welder"
- "welder by Gullco"
- "Gullco automatic welder"
- "automatic welder by Gullco"
- "Gullco International welder"
- "welder by Gullco International"
- "Gullco International automatic welder"
- "automatic welder by Gullco International"
Anchor Text Distribution Example
- Total available terms: 10
- tier1_branded_ratio: 0.4 → 4 branded terms (10 × 0.4 = 4)
- Remaining: 6 (10 - 4)
- tier1_branded_plus_ratio: 0.67 → 4 branded+ terms (6 × 0.67 = 4.02 → 4)
- Regular: 2 terms (6 - 4)
- Final list: [4 branded, 4 branded+, 2 regular]
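Assembling the final list in the order above might look like this. take_cycled and assemble_anchor_list are hypothetical names. The plan only says that branded names are cycled, so cycling the branded+ and regular pools as well, and returning fewer terms when a pool is empty, are assumptions; the no-related-searches fallback is still an open question.

```python
from itertools import cycle, islice
from typing import List, Sequence


def take_cycled(pool: Sequence[str], n: int) -> List[str]:
    """Take n items from pool, wrapping around when it is shorter than n; [] if pool is empty."""
    # cycle() over an empty pool yields nothing, so islice stops immediately.
    return list(islice(cycle(pool), max(n, 0)))


def assemble_anchor_list(
    brand_names: Sequence[str],
    branded_plus_terms: Sequence[str],
    regular_terms: Sequence[str],
    n_branded: int,
    n_branded_plus: int,
    n_regular: int,
) -> List[str]:
    """Concatenate the three groups in the plan's order: branded, branded+, regular."""
    return (
        take_cycled(brand_names, n_branded)
        + take_cycled(branded_plus_terms, n_branded_plus)
        + take_cycled(regular_terms, n_regular)
    )
```

With the brands and branded+ terms from the Branded+ Generation Example, the two related searches as the regular pool, and counts of 4, 4 and 2, this yields the two brand names twice, then the first four branded+ terms, then "welder" and "automatic welder".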
Files to Modify
- src/cli/commands.py: add the branded+ logic and the brand lookup helper; update the prompts and the count calculation
- brands.json: new file for brand mappings (create it with an example entry)
Testing Considerations
- Test with brand mapping present and absent
- Test with Enter (default) and custom brand input
- Test branded+ calculation with various ratios
- Test URL normalization (with/without www., http/https)
- Test with multiple brand names per URL
- Test with no related searches (fallback behavior)