Add tier1 branded anchor text ratio flag to ingest-cora command

- Add --tier1-branded-ratio flag (default: 0.75) to ingest-cora command
- Prompt for branded anchor text when ratio is specified
- Generate explicit anchor_text_config in tier1 job with specified ratio
- Update documentation in CLI_COMMAND_REFERENCE.md, job-schema.md, and gui-planning.md
2026-01-16 15:53:07 -06:00 · 6e2977c500
parent ba306b9e10
commit 6e2977c500
4 changed files with 1649 additions and 21 deletions
--- a/docs/CLI_COMMAND_REFERENCE.md
+++ b/docs/CLI_COMMAND_REFERENCE.md
@@ -350,6 +350,9 @@ Ingest a CORA .xlsx report and create a new project
 - `--custom-anchors`, `-a`
  - Type: STRING | Comma-separated list of custom anchor text (optional)
 - `--tier1-branded-ratio`
  - Type: FLOAT | Ratio of branded anchor text for tier1, from 0.0 to 1.0 (default: 0.75). Whenever the ratio is greater than 0, which includes the default, the command prompts for the branded anchor text (company name) and configures the tier1 job with explicit anchor text terms at that ratio. Pass `0` to skip the prompt.
 - `--username`, `-u`
  - Type: STRING | Username for authentication
@@ -362,6 +365,14 @@ Ingest a CORA .xlsx report and create a new project
 uv run python main.py ingest-cora --file path/to/file.xlsx --name "My Project"
 ```
 **Example with branded anchor text ratio:**
 ```bash
 uv run python main.py ingest-cora --file path/to/file.xlsx --name "My Project" --tier1-branded-ratio 0.75
 ```
 Because the flag defaults to 0.75, `ingest-cora` prompts for the branded anchor text (company name) on every run unless you pass `--tier1-branded-ratio 0`. The generated job file then includes a tier1 `anchor_text_config` in explicit mode, in which the specified share of terms is branded and the remainder are main keyword variations.
 ---
 ### `ingest-simple`
--- a/docs/gui-planning.md
+++ b/docs/gui-planning.md
--- a/docs/job-schema.md
+++ b/docs/job-schema.md
@@ -142,11 +142,13 @@ Each tier in the `tiers` object defines content generation parameters for that s
 - **Type**: `boolean` (optional, default: `false`)
 - **Purpose**: Auto-create sites when available pool is insufficient
 - **Behavior**: Creates generic sites using project keyword as prefix
 - **Status**: ⚠️ **NOT IMPLEMENTED** - Parsed but does not function
 ### `create_sites_for_keywords`
 - **Type**: `Array<Object>` (optional)
 - **Purpose**: Pre-create sites for specific keywords before assignment
 - **Structure**: Each object must have `keyword` (string) and `count` (integer)
 - **Status**: ⚠️ **NOT IMPLEMENTED** - Parsed but does not function
 #### Keyword Site Creation Object
 | Field | Type | Required | Description |
@@ -190,14 +192,7 @@ Each tier in the `tiers` object defines content generation parameters for that s
 | `outline` | `string` | Model to use for outline generation |
 | `content` | `string` | Model to use for content generation |
-### Available Models (from master.config.json)
+
 - `anthropic/claude-sonnet-4.5` (Claude Sonnet 4.5)
 - `anthropic/claude-3.5-sonnet` (Claude 3.5 Sonnet)
 - `openai/gpt-4o` (GPT-4 Optimized)
 - `openai/gpt-4o-mini` (GPT-4 Mini)
 - `meta-llama/llama-3.1-70b-instruct` (Llama 3.1 70B)
 - `meta-llama/llama-3.1-8b-instruct` (Llama 3.1 8B)
 - `google/gemini-2.5-flash` (Gemini 2.5 Flash)
 ### Example
 ```json
@@ -249,6 +244,9 @@ Each tier in the `tiers` object defines content generation parameters for that s
 - **Type**: `Object` (optional)
 - **Purpose**: Configures how many tiered links to generate per article
 - **Default**: `{"min": 2, "max": 4}` if not specified
 - **Behavior**: 
  - Tier1: Always 1 link to money site (this setting ignored)
  - Tier2+: Random between min and max links to lower tier
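The behavior above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `pick_tiered_link_count` is hypothetical, and treating the range as inclusive on both ends is an assumption.

```python
import random
from typing import Optional


def pick_tiered_link_count(
    tier: int,
    link_range: Optional[dict] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Return how many tiered links an article in `tier` should receive."""
    rng = rng or random.Random()
    if tier == 1:
        # Tier1 always links once to the money site; the range is ignored.
        return 1
    # Fall back to the documented default when no range is configured.
    link_range = link_range or {"min": 2, "max": 4}
    # Assumption: both ends of the range are inclusive.
    return rng.randint(link_range["min"], link_range["max"])
```

Passing a seeded `random.Random` keeps the selection reproducible when testing.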
 #### Tiered Link Range Object
 | Field | Type | Required | Description |
@@ -266,6 +264,141 @@ Each tier in the `tiers` object defines content generation parameters for that s
 }
 ```
 ## Interlinking Configuration (Story 3.3)
 ### `interlinking`
 - **Type**: `Object` (optional)
 - **Purpose**: Configures internal linking behavior within articles
 - **Can be set at**: Job level (all tiers) or tier level (specific tier)
 - **Tier-level override**: Tier-level config overrides job-level for that tier
 #### Interlinking Object Fields
 | Field | Type | Description |
 |-------|------|-------------|
 | `links_per_article_min` | `integer` | Minimum number of tiered links (same as `tiered_link_count_range.min`) |
 | `links_per_article_max` | `integer` | Maximum number of tiered links (same as `tiered_link_count_range.max`) |
 | `see_also_min` | `integer` | Minimum number of "See Also" links to same-tier articles (default: 4) |
 | `see_also_max` | `integer` | Maximum number of "See Also" links to same-tier articles (default: 5) |
 ### Example
 ```json
 {
  "interlinking": {
    "links_per_article_min": 2,
    "links_per_article_max": 4,
    "see_also_min": 4,
    "see_also_max": 5
  }
 }
 ```
 **Behavior:**
 - `links_per_article_min/max`: Controls how many links to lower tier articles
 - `see_also_min/max`: Controls how many "See Also" links to randomly selected articles from the same tier
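As a rough sketch of the "See Also" selection described above: the helper name `pick_see_also` is hypothetical, and excluding the current article and capping the count at the size of the available pool are assumptions rather than documented behavior.

```python
import random
from typing import List, Optional


def pick_see_also(
    current: str,
    same_tier: List[str],
    see_also_min: int = 4,
    see_also_max: int = 5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick random same-tier articles to list under "See Also"."""
    rng = rng or random.Random()
    # Assumption: an article never links to itself.
    candidates = [article for article in same_tier if article != current]
    wanted = rng.randint(see_also_min, see_also_max)
    # Assumption: if the tier is small, return as many links as are available.
    return rng.sample(candidates, min(wanted, len(candidates)))
```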
 ## Anchor Text Configuration (Story 8.1)
 ### `anchor_text_config`
 - **Type**: `Object` (optional)
 - **Purpose**: Controls anchor text selection for tiered links
 - **Can be set at**: Job level (all tiers) or tier level (specific tier)
 - **Tier-level override**: Tier-level config overrides job-level for that tier
 #### Anchor Text Config Modes
 Explicit mode is well suited to branded anchor text: repeat the company name in the term list as many times as needed to reach the target percentage.
 | Mode | Description |
 |------|-------------|
 | `default` | Use master.config.json tier rules (main_keyword for tier1, related_searches for tier2+) |
 | `override` | Replace tier rules with `custom_text` array |
 | `append` | Add `custom_text` array to tier rules |
 | `explicit` | Use only explicitly provided terms (no algorithm-generated terms) |
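One way to read the four modes in the table is as a small dispatch over the term sources. The sketch below is illustrative only: `resolve_anchor_terms` and its parameters are hypothetical names, and `tier_rule_terms` stands in for whatever the master.config.json tier rules would produce.

```python
from typing import List, Optional


def resolve_anchor_terms(
    mode: str,
    tier_rule_terms: List[str],
    custom_text: Optional[List[str]] = None,
    explicit_terms: Optional[List[str]] = None,
) -> List[str]:
    """Return the candidate anchor terms for one tier under the given mode."""
    custom_text = custom_text or []
    if mode == "default":
        # Use the tier rules unchanged.
        return list(tier_rule_terms)
    if mode == "override":
        # Replace the tier rules with the custom terms.
        return list(custom_text)
    if mode == "append":
        # Keep the tier rules and add the custom terms after them.
        return list(tier_rule_terms) + list(custom_text)
    if mode == "explicit":
        # Only the explicitly provided terms; nothing is generated.
        return list(explicit_terms or [])
    raise ValueError(f"Unknown anchor text mode: {mode!r}")
```

How the real system treats an empty `custom_text` in override mode is not stated in this document; the sketch simply returns an empty list.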
 #### Anchor Text Config Object (Job Level)
 | Field | Type | Description |
 |-------|------|-------------|
 | `mode` | `string` | One of: "default", "override", "append", "explicit" |
 | `custom_text` | `Array<string>` | Custom anchor text terms (for override/append modes) |
 | `tier1` | `Array<string>` | Explicit terms for tier1 (for explicit mode) |
 | `tier2` | `Array<string>` | Explicit terms for tier2 (for explicit mode) |
 | `tier3` | `Array<string>` | Explicit terms for tier3 (for explicit mode) |
 | `tier4_plus` | `Array<string>` | Explicit terms for tier4+ (for explicit mode) |
 #### Anchor Text Config Object (Tier Level)
 | Field | Type | Description |
 |-------|------|-------------|
 | `mode` | `string` | One of: "default", "override", "append", "explicit" |
 | `custom_text` | `Array<string>` | Custom anchor text terms (for override/append modes) |
 | `terms` | `Array<string>` | Explicit terms for this tier (for explicit mode) |
 ### Examples
 **Default mode (use tier rules):**
 ```json
 {
  "anchor_text_config": {
    "mode": "default"
  }
 }
 ```
 **Override mode (replace with custom text):**
 ```json
 {
  "anchor_text_config": {
    "mode": "override",
    "custom_text": ["custom term 1", "custom term 2"]
  }
 }
 ```
 **Explicit mode (job level):**
 ```json
 {
  "anchor_text_config": {
    "mode": "explicit",
    "tier1": ["high volume", "precision machining", "custom manufacturing"],
    "tier2": ["high volume production", "bulk manufacturing", "large scale"]
  }
 }
 ```
 **Explicit mode (tier level override):**
 ```json
 {
  "tiers": {
    "tier1": {
      "count": 12,
      "anchor_text_config": {
        "mode": "explicit",
        "terms": ["high volume", "precision"]
      }
    }
  }
 }
 ```
 **Explicit mode with branded anchor text ratio (generated via ingest-cora):**
 When using `ingest-cora` with `--tier1-branded-ratio`, the system automatically generates an explicit anchor text list with the specified ratio of branded terms. For example, with a 75% ratio and branded text "Acme Corp", the generated config might look like:
 ```json
 {
  "tiers": {
    "tier1": {
      "count": 10,
      "anchor_text_config": {
        "mode": "explicit",
        "terms": ["Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "main keyword", "learn about main keyword", "main keyword guide", "best main keyword", "main keyword tips"]
      }
    }
  }
 }
 ```
 This achieves 75% branded (15/20) and 25% keyword-based (5/20) anchor text selection.
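The arithmetic behind this example can be checked with a short sketch. The helper `build_branded_terms` is a hypothetical name, the 20-term total mirrors the example above, and rounding the branded count to the nearest integer is an assumption (for a ratio of 0.75 it gives the same result as truncation).

```python
from typing import List


def build_branded_terms(
    branded_text: str,
    keyword_terms: List[str],
    ratio: float,
    total_terms: int = 20,
) -> List[str]:
    """Build an explicit term list with roughly `ratio` branded entries."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    branded_count = round(total_terms * ratio)
    keyword_count = total_terms - branded_count
    # Repeat the brand name, then fill the remainder with keyword variations.
    return [branded_text] * branded_count + list(keyword_terms[:keyword_count])


variations = [
    "main keyword",
    "learn about main keyword",
    "main keyword guide",
    "best main keyword",
    "main keyword tips",
]
terms = build_branded_terms("Acme Corp", variations, 0.75)
branded_share = terms.count("Acme Corp") / len(terms)  # 15 / 20
```

If fewer keyword variations are supplied than the remainder needs, the list comes out shorter than `total_terms` and the branded share rises above the requested ratio.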
 **Behavior:**
 - The system first looks for the provided terms in the article content and inserts a term only when it is not already present
 - When using "explicit" mode, only the provided terms are used (no algorithm-generated terms)
 - Tier-level explicit config takes precedence over job-level for that tier
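The precedence rule can be sketched as a lookup that prefers the tier-level block and otherwise falls back to the job level. The function name `effective_anchor_config` is hypothetical, and mapping tiers beyond tier3 to the job-level `tier4_plus` key is an assumption based on the field table above.

```python
from typing import Any, Dict


def effective_anchor_config(job: Dict[str, Any], tier_name: str) -> Dict[str, Any]:
    """Return the anchor_text_config that applies to `tier_name`."""
    tier_cfg = job.get("tiers", {}).get(tier_name, {}).get("anchor_text_config")
    if tier_cfg is not None:
        # A tier-level config overrides the job-level config for that tier.
        return tier_cfg
    job_cfg = job.get("anchor_text_config")
    if job_cfg is None:
        return {"mode": "default"}
    if job_cfg.get("mode") == "explicit":
        # Job-level explicit config keeps one list per tier.
        # Assumption: tier4 and above share the "tier4_plus" list.
        key = tier_name if tier_name in ("tier1", "tier2", "tier3") else "tier4_plus"
        return {"mode": "explicit", "terms": list(job_cfg.get(key, []))}
    return job_cfg


job = {
    "anchor_text_config": {
        "mode": "explicit",
        "tier1": ["high volume", "precision machining"],
        "tier2": ["bulk manufacturing"],
    },
    "tiers": {
        "tier1": {
            "count": 12,
            "anchor_text_config": {"mode": "explicit", "terms": ["Acme Corp"]},
        },
        "tier2": {"count": 30},
    },
}
tier1_cfg = effective_anchor_config(job, "tier1")  # tier-level block wins
tier2_cfg = effective_anchor_config(job, "tier2")  # falls back to job level
```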
 ## Complete Example
 ```json
--- a/src/cli/commands.py
+++ b/src/cli/commands.py
@@ -9,6 +9,7 @@ from src.auth.service import AuthService
 from src.database.session import db_manager
 from src.database.repositories import UserRepository, SiteDeploymentRepository, ProjectRepository
 from src.database.models import User
 from src.interlinking.anchor_text_generator import AnchorTextGenerator
 from src.deployment.bunnynet import (
    BunnyNetClient,
    BunnyNetAPIError,
@@ -36,7 +37,13 @@ from datetime import datetime
 load_dotenv()
-def create_job_file_for_project(project_id: int, project_name: str, session) -> Optional[str]:
+def create_job_file_for_project(
    project_id: int, 
    project_name: str, 
    session,
    tier1_branded_ratio: Optional[float] = None,
    tier1_branded_text: Optional[str] = None
 ) -> Optional[str]:
    """
    Create a job JSON file for a newly created project.
@@ -44,6 +51,8 @@ def create_job_file_for_project(project_id: int, project_name: str, session) ->
        project_id: The ID of the created project
        project_name: The name of the project (for filename)
        session: Database session
        tier1_branded_ratio: Optional ratio of branded anchor text for tier1 (0.0-1.0)
        tier1_branded_text: Optional branded anchor text (company name) for tier1
    Returns:
        Path to created file, or None if creation failed
@@ -81,13 +90,8 @@ def create_job_file_for_project(project_id: int, project_name: str, session) ->
            base_filename = f"{sanitized_name}-{date_suffix}.json"
            filepath = jobs_dir / base_filename
-        job_template = {
+        # Build tier1 configuration
-            "jobs": [
+        tier1_config = {
                {
                    "project_id": project_id,
                    "deployment_targets": selected_domains,
                    "tiers": {
                        "tier1": {
            "count": t1_count,
            "min_word_count": 1250,
            "max_word_count": 2000,
@@ -96,7 +100,40 @@ def create_job_file_for_project(project_id: int, project_name: str, session) ->
                "outline": "openai/gpt-4o-mini",
                "content": "x-ai/grok-4-fast"
            }
-                        },
+        }
        # Add anchor_text_config if branded ratio and text are provided
        if tier1_branded_ratio is not None and tier1_branded_text:
            # Get project to retrieve main_keyword for non-branded terms
            project_repo = ProjectRepository(session)
            project = project_repo.get_by_id(project_id)
            if project and project.main_keyword:
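                # Calculate term distribution (use 20 terms for good distribution).
                # Clamp the ratio to 0.0-1.0 and round, since int() truncates
                # float products such as 0.29 * 20 = 5.8 down to 5.
                total_terms = 20
                ratio = min(max(tier1_branded_ratio, 0.0), 1.0)
                branded_count = round(total_terms * ratio)
                keyword_count = total_terms - branded_count
                # Generate keyword variations for non-branded terms; request at
                # least keyword_count so low ratios are not short of terms
                anchor_generator = AnchorTextGenerator()
                keyword_variations = anchor_generator._generate_from_keyword(
                    project, max(keyword_count, 10)
                )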
                # Create anchor text list with branded terms and keyword variations
                anchor_terms = [tier1_branded_text] * branded_count
                anchor_terms.extend(keyword_variations[:keyword_count])
                tier1_config["anchor_text_config"] = {
                    "mode": "explicit",
                    "terms": anchor_terms
                }
        job_template = {
            "jobs": [
                {
                    "project_id": project_id,
                    "deployment_targets": selected_domains,
                    "tiers": {
                        "tier1": tier1_config,
                        "tier2": {
                            "count": t2_count,
                            "min_word_count": 1000,
@@ -943,9 +980,10 @@ def sync_sites(admin_user: Optional[str], admin_password: Optional[str], dry_run
@click.option('--name', '-n', required=True, help='Project name')
@click.option('--money-site-url', '-m', help='Money site URL (e.g., https://example.com)')
@click.option('--custom-anchors', '-a', help='Comma-separated list of custom anchor text (optional)')
@click.option('--tier1-branded-ratio', default=0.75, type=click.FloatRange(0.0, 1.0), help='Ratio of branded anchor text for tier1, 0.0-1.0 (default: 0.75)')
@click.option('--username', '-u', help='Username for authentication')
@click.option('--password', '-p', help='Password for authentication')
-def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom_anchors: Optional[str], username: Optional[str], password: Optional[str]):
+def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom_anchors: Optional[str], tier1_branded_ratio: float, username: Optional[str], password: Optional[str]):
    """Ingest a CORA .xlsx report and create a new project"""
    try:
        if not username or not password:
@@ -1014,7 +1052,25 @@ def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom
            if project.custom_anchor_text:
                click.echo(f"Custom Anchor Text: {', '.join(project.custom_anchor_text)}")
-            job_file = create_job_file_for_project(project.id, project.name, session)
+            # Prompt for tier1 branded anchor text unless the ratio is 0 (the flag defaults to 0.75)
            tier1_branded_text = None
            if tier1_branded_ratio is not None and tier1_branded_ratio > 0:
                tier1_branded_text = click.prompt(
                    "\nEnter branded anchor text (company name) for tier1",
                    type=str
                ).strip()
                if not tier1_branded_text:
                    click.echo("Warning: Empty branded anchor text provided, skipping tier1 branded anchor text configuration.", err=True)
                    tier1_branded_text = None
                    tier1_branded_ratio = None
            job_file = create_job_file_for_project(
                project.id, 
                project.name, 
                session,
                tier1_branded_ratio=tier1_branded_ratio,
                tier1_branded_text=tier1_branded_text
            )
            if job_file:
                click.echo(f"Job file created: {job_file}")
@@ -1193,6 +1249,133 @@ def list_projects(username: Optional[str], password: Optional[str]):
        raise click.Abort()
@app.command("create-job")
@click.option('--project-id', '-p', required=True, type=int, help='Project ID to create job file for')
@click.option('--deployment-targets', '-d', multiple=True, help='Deployment target hostnames (can specify multiple times)')
@click.option('--tier1-count', default=10, type=int, help='Number of tier1 articles (default: 10)')
@click.option('--tier2-count', default=30, type=int, help='Number of tier2 articles (default: 30)')
@click.option('--output', '-o', type=click.Path(), help='Output file path (default: jobs/{project_name}.json)')
@click.option('--username', '-u', help='Username for authentication')
@click.option('--password', '-pwd', help='Password for authentication')
 def create_job(
    project_id: int,
    deployment_targets: tuple,
    tier1_count: int,
    tier2_count: int,
    output: Optional[str],
    username: Optional[str],
    password: Optional[str]
 ):
    """Create a job file from an existing project ID"""
    try:
        if not username or not password:
            username, password = prompt_admin_credentials()
        session = db_manager.get_session()
        try:
            user_repo = UserRepository(session)
            auth_service = AuthService(user_repo)
            user = auth_service.authenticate_user(username, password)
            if not user:
                click.echo("Error: Authentication failed", err=True)
                raise click.Abort()
            project_repo = ProjectRepository(session)
            project = project_repo.get_by_id(project_id)
            if not project:
                click.echo(f"Error: Project {project_id} not found", err=True)
                raise click.Abort()
            deployment_targets_list = list(deployment_targets) if deployment_targets else None
            if not deployment_targets_list:
                site_repo = SiteDeploymentRepository(session)
                sites = site_repo.get_all()
                available_domains = [
                    site.custom_hostname 
                    for site in sites 
                    if site.custom_hostname is not None
                ]
                if available_domains:
                    click.echo(f"Available sites: {', '.join(available_domains[:5])}{'...' if len(available_domains) > 5 else ''}")
                    click.echo("Note: No deployment_targets specified. You can add them manually to the job file.")
            sanitized_name = "".join(c if c.isalnum() or c in ('-', '_') else '-' for c in project.name.lower()).strip('-')
            sanitized_name = '-'.join(sanitized_name.split())
            jobs_dir = Path("jobs")
            jobs_dir.mkdir(exist_ok=True)
            if output:
                filepath = Path(output)
            else:
                base_filename = f"{sanitized_name}.json"
                filepath = jobs_dir / base_filename
                if filepath.exists():
                    date_suffix = datetime.now().strftime("%y%m%d")
                    base_filename = f"{sanitized_name}-{date_suffix}.json"
                    filepath = jobs_dir / base_filename
            job_template = {
                "jobs": [
                    {
                        "project_id": project_id,
                        "tiers": {
                            "tier1": {
                                "count": tier1_count,
                                "min_word_count": 1250,
                                "max_word_count": 2000,
                                "models": {
                                    "title": "openai/gpt-4o-mini",
                                    "outline": "openai/gpt-4o-mini",
                                    "content": "anthropic/claude-3.5-sonnet"
                                }
                            },
                            "tier2": {
                                "count": tier2_count,
                                "min_word_count": 1000,
                                "max_word_count": 1250,
                                "models": {
                                    "title": "openai/gpt-4o-mini",
                                    "outline": "openai/gpt-4o-mini",
                                    "content": "openai/gpt-4o-mini"
                                },
                                "interlinking": {
                                    "links_per_article_min": 3,
                                    "links_per_article_max": 6
                                }
                            }
                        }
                    }
                ]
            }
            if deployment_targets_list:
                job_template["jobs"][0]["deployment_targets"] = deployment_targets_list
            with open(filepath, 'w', encoding='utf-8') as f:
                json.dump(job_template, f, indent=2)
            click.echo(f"\nJob file created: {filepath}")
            click.echo(f"Project: {project.name} (ID: {project_id})")
            click.echo(f"Tier1: {tier1_count} articles")
            click.echo(f"Tier2: {tier2_count} articles")
            if deployment_targets_list:
                click.echo(f"Deployment targets: {', '.join(deployment_targets_list)}")
            click.echo(f"\nTo run this job:")
            click.echo(f"  uv run python main.py generate-batch --job-file {filepath} -u {username} --password <password>")
        finally:
            session.close()
    except Exception as e:
        click.echo(f"Error creating job file: {e}", err=True)
        raise click.Abort()
@app.command("generate-batch")
@click.option('--job-file', '-j', required=True, type=click.Path(exists=True), 
              help='Path to job JSON file')