Add tier1 branded anchor text ratio flag to ingest-cora command

- Add --tier1-branded-ratio flag (default: 0.75) to ingest-cora command
- Prompt for branded anchor text when ratio is specified
- Generate explicit anchor_text_config in tier1 job with specified ratio
- Update documentation in CLI_COMMAND_REFERENCE.md, job-schema.md, and gui-planning.md
main
PeninsulaInd 2026-01-16 15:53:07 -06:00
parent ba306b9e10
commit 6e2977c500
4 changed files with 1649 additions and 21 deletions


@ -350,6 +350,9 @@ Ingest a CORA .xlsx report and create a new project
- `--custom-anchors`, `-a`
- Type: STRING | Comma-separated list of custom anchor text (optional)
- `--tier1-branded-ratio`
- Type: FLOAT | Ratio of branded anchor text for tier1 (default: 0.75). When the ratio is greater than 0, the command prompts for the branded anchor text (company name) and configures the tier1 job with explicit anchor text terms that achieve the specified ratio.
- `--username`, `-u`
- Type: STRING | Username for authentication
@ -362,6 +365,14 @@ Ingest a CORA .xlsx report and create a new project
uv run python main.py ingest-cora --file path/to/file.xlsx --name "My Project"
```
**Example with branded anchor text ratio:**
```bash
uv run python main.py ingest-cora --file path/to/file.xlsx --name "My Project" --tier1-branded-ratio 0.75
```
When using `--tier1-branded-ratio`, you will be prompted to enter the branded anchor text (company name). The generated job file will include tier1 anchor_text_config with explicit mode, where the specified percentage of terms are branded and the remainder are main keyword variations.
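To check that a generated job file carries the requested ratio, a short script along these lines can count the branded share of the tier1 terms. This is a minimal sketch: `branded_share` is a hypothetical helper, not part of the CLI, and it assumes the job layout produced by this commit (`jobs[0]` → `tiers` → `tier1` → `anchor_text_config` → `terms`).

```python
import json


def branded_share(job_json: str, branded_text: str) -> float:
    """Return the fraction of tier1 explicit terms equal to branded_text.

    Hypothetical helper for illustration only; assumes the layout
    jobs[0] -> tiers -> tier1 -> anchor_text_config -> terms.
    """
    job = json.loads(job_json)
    config = job["jobs"][0]["tiers"]["tier1"].get("anchor_text_config", {})
    terms = config.get("terms", [])
    if not terms:
        return 0.0  # no explicit terms were generated
    return terms.count(branded_text) / len(terms)


# Sample job content shaped like the generated file (values are made up)
sample = json.dumps({
    "jobs": [{
        "project_id": 1,
        "tiers": {"tier1": {"count": 10, "anchor_text_config": {
            "mode": "explicit",
            "terms": ["Acme Corp"] * 15 + ["main keyword"] * 5,
        }}},
    }]
})
print(branded_share(sample, "Acme Corp"))  # 0.75
```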
---
### `ingest-simple`

1301
docs/gui-planning.md 100644

File diff suppressed because it is too large


@ -142,11 +142,13 @@ Each tier in the `tiers` object defines content generation parameters for that s
- **Type**: `boolean` (optional, default: `false`)
- **Purpose**: Auto-create sites when available pool is insufficient
- **Behavior**: Creates generic sites using project keyword as prefix
- **Status**: ⚠️ **NOT IMPLEMENTED** - Parsed but does not function
### `create_sites_for_keywords`
- **Type**: `Array<Object>` (optional)
- **Purpose**: Pre-create sites for specific keywords before assignment
- **Structure**: Each object must have `keyword` (string) and `count` (integer)
- **Status**: ⚠️ **NOT IMPLEMENTED** - Parsed but does not function
#### Keyword Site Creation Object
| Field | Type | Required | Description |
@ -190,14 +192,7 @@ Each tier in the `tiers` object defines content generation parameters for that s
| `outline` | `string` | Model to use for outline generation |
| `content` | `string` | Model to use for content generation |
### Available Models (from master.config.json)
- `anthropic/claude-sonnet-4.5` (Claude Sonnet 4.5)
- `anthropic/claude-3.5-sonnet` (Claude 3.5 Sonnet)
- `openai/gpt-4o` (GPT-4 Optimized)
- `openai/gpt-4o-mini` (GPT-4 Mini)
- `meta-llama/llama-3.1-70b-instruct` (Llama 3.1 70B)
- `meta-llama/llama-3.1-8b-instruct` (Llama 3.1 8B)
- `google/gemini-2.5-flash` (Gemini 2.5 Flash)
### Example
```json
@ -249,6 +244,9 @@ Each tier in the `tiers` object defines content generation parameters for that s
- **Type**: `Object` (optional)
- **Purpose**: Configures how many tiered links to generate per article
- **Default**: `{"min": 2, "max": 4}` if not specified
- **Behavior**:
- Tier1: Always 1 link to money site (this setting ignored)
- Tier2+: Random between min and max links to lower tier
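The behavior above can be sketched as follows. This is an illustration, not the project's implementation: `pick_tiered_link_count` is a hypothetical name, and treating "between min and max" as inclusive on both ends is an assumption.

```python
import random
from typing import Dict, Optional

# Default taken from the schema text above
DEFAULT_RANGE = {"min": 2, "max": 4}


def pick_tiered_link_count(
    tier: int,
    link_range: Optional[Dict[str, int]] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical helper: number of tiered links for one article."""
    if tier == 1:
        # Tier1 always gets exactly one link to the money site; the range is ignored
        return 1
    merged = {**DEFAULT_RANGE, **(link_range or {})}
    if merged["min"] > merged["max"]:
        raise ValueError("min must not exceed max")
    rng = rng or random.Random()
    # randint is inclusive on both ends (assumed to match "between min and max")
    return rng.randint(merged["min"], merged["max"])


print(pick_tiered_link_count(1, {"min": 5, "max": 9}))  # 1
print(pick_tiered_link_count(2))  # some value from 2 to 4
```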
#### Tiered Link Range Object
| Field | Type | Required | Description |
@ -266,6 +264,141 @@ Each tier in the `tiers` object defines content generation parameters for that s
}
```
## Interlinking Configuration (Story 3.3)
### `interlinking`
- **Type**: `Object` (optional)
- **Purpose**: Configures internal linking behavior within articles
- **Can be set at**: Job level (all tiers) or tier level (specific tier)
- **Tier-level override**: Tier-level config overrides job-level for that tier
#### Interlinking Object Fields
| Field | Type | Description |
|-------|------|-------------|
| `links_per_article_min` | `integer` | Minimum number of tiered links (same as `tiered_link_count_range.min`) |
| `links_per_article_max` | `integer` | Maximum number of tiered links (same as `tiered_link_count_range.max`) |
| `see_also_min` | `integer` | Minimum number of "See Also" links to same-tier articles (default: 4) |
| `see_also_max` | `integer` | Maximum number of "See Also" links to same-tier articles (default: 5) |
### Example
```json
{
"interlinking": {
"links_per_article_min": 2,
"links_per_article_max": 4,
"see_also_min": 4,
"see_also_max": 5
}
}
```
**Behavior:**
- `links_per_article_min/max`: Controls how many links to lower tier articles
- `see_also_min/max`: Controls how many "See Also" links to randomly selected articles from the same tier
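One way to picture the job-level and tier-level precedence is a layered merge. The sketch below is hypothetical: `resolve_interlinking` is not a function from the codebase, the field-by-field merge is an assumption (the real code may replace the whole object instead), and the 2 to 4 default for `links_per_article_*` is assumed from the `tiered_link_count_range` default.

```python
from typing import Any, Dict

# Defaults as documented: 4-5 "See Also" links; 2-4 tiered links is assumed
INTERLINKING_DEFAULTS = {
    "links_per_article_min": 2,
    "links_per_article_max": 4,
    "see_also_min": 4,
    "see_also_max": 5,
}


def resolve_interlinking(job: Dict[str, Any], tier_name: str) -> Dict[str, int]:
    """Hypothetical helper: effective interlinking settings for one tier."""
    job_level = job.get("interlinking") or {}
    tier = job.get("tiers", {}).get(tier_name) or {}
    tier_level = tier.get("interlinking") or {}
    # Later dicts win: defaults < job level < tier level
    return {**INTERLINKING_DEFAULTS, **job_level, **tier_level}


job = {
    "interlinking": {"see_also_min": 2, "see_also_max": 3},
    "tiers": {
        "tier1": {"count": 10},
        "tier2": {
            "count": 30,
            "interlinking": {"links_per_article_min": 3, "links_per_article_max": 6},
        },
    },
}
print(resolve_interlinking(job, "tier2"))
```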
## Anchor Text Configuration (Story 8.1)
### `anchor_text_config`
- **Type**: `Object` (optional)
- **Purpose**: Controls anchor text selection for tiered links
- **Can be set at**: Job level (all tiers) or tier level (specific tier)
- **Tier-level override**: Tier-level config overrides job-level for that tier
#### Anchor Text Config Modes
Explicit mode works well for branded anchor text: repeat the company name in the term list as many times as needed to reach the desired percentage.
| Mode | Description |
|------|-------------|
| `default` | Use master.config.json tier rules (main_keyword for tier1, related_searches for tier2+) |
| `override` | Replace tier rules with `custom_text` array |
| `append` | Add `custom_text` array to tier rules |
| `explicit` | Use only explicitly provided terms (no algorithm-generated terms) |
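The four modes in the table can be summarized as a small dispatch. This is a sketch under assumptions: `select_anchor_terms` is a hypothetical name, and `tier_rule_terms` stands in for whatever the master.config.json tier rules would produce for the tier.

```python
from typing import List, Optional


def select_anchor_terms(
    mode: str,
    tier_rule_terms: List[str],
    custom_text: Optional[List[str]] = None,
    explicit_terms: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: candidate anchor terms for one tier."""
    custom = list(custom_text or [])
    if mode == "default":
        return list(tier_rule_terms)           # tier rules only
    if mode == "override":
        return custom                          # custom_text replaces tier rules
    if mode == "append":
        return list(tier_rule_terms) + custom  # tier rules plus custom_text
    if mode == "explicit":
        return list(explicit_terms or [])      # only the provided terms
    raise ValueError(f"unknown anchor text mode: {mode}")


rules = ["main keyword"]
print(select_anchor_terms("append", rules, custom_text=["custom term 1"]))
print(select_anchor_terms("explicit", rules, explicit_terms=["Acme Corp", "Acme Corp"]))
```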
#### Anchor Text Config Object (Job Level)
| Field | Type | Description |
|-------|------|-------------|
| `mode` | `string` | One of: "default", "override", "append", "explicit" |
| `custom_text` | `Array<string>` | Custom anchor text terms (for override/append modes) |
| `tier1` | `Array<string>` | Explicit terms for tier1 (for explicit mode) |
| `tier2` | `Array<string>` | Explicit terms for tier2 (for explicit mode) |
| `tier3` | `Array<string>` | Explicit terms for tier3 (for explicit mode) |
| `tier4_plus` | `Array<string>` | Explicit terms for tier4+ (for explicit mode) |
#### Anchor Text Config Object (Tier Level)
| Field | Type | Description |
|-------|------|-------------|
| `mode` | `string` | One of: "default", "override", "append", "explicit" |
| `custom_text` | `Array<string>` | Custom anchor text terms (for override/append modes) |
| `terms` | `Array<string>` | Explicit terms for this tier (for explicit mode) |
### Examples
**Default mode (use tier rules):**
```json
{
"anchor_text_config": {
"mode": "default"
}
}
```
**Override mode (replace with custom text):**
```json
{
"anchor_text_config": {
"mode": "override",
"custom_text": ["custom term 1", "custom term 2"]
}
}
```
**Explicit mode (job level):**
```json
{
"anchor_text_config": {
"mode": "explicit",
"tier1": ["high volume", "precision machining", "custom manufacturing"],
"tier2": ["high volume production", "bulk manufacturing", "large scale"]
}
}
```
**Explicit mode (tier level override):**
```json
{
"tiers": {
"tier1": {
"count": 12,
"anchor_text_config": {
"mode": "explicit",
"terms": ["high volume", "precision"]
}
}
}
}
```
**Explicit mode with branded anchor text ratio (generated via ingest-cora):**
When using `ingest-cora` with `--tier1-branded-ratio`, the system automatically generates an explicit anchor text list with the specified ratio of branded terms. For example, with a 75% ratio and branded text "Acme Corp", the generated config might look like:
```json
{
"tiers": {
"tier1": {
"count": 10,
"anchor_text_config": {
"mode": "explicit",
"terms": ["Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "Acme Corp", "main keyword", "learn about main keyword", "main keyword guide", "best main keyword", "main keyword tips"]
}
}
}
}
```
This achieves 75% branded (15/20) and 25% keyword-based (5/20) anchor text selection.
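The arithmetic behind this list can be reproduced in isolation. The sketch below mirrors the commit's calculation (20 terms in total, `int()` truncation, keyword variations sliced to fill the remainder); `build_branded_terms` is a hypothetical standalone function, the range check is an addition not present in the commit, and the variation strings are placeholders.

```python
from typing import List


def build_branded_terms(
    branded_text: str,
    keyword_variations: List[str],
    ratio: float,
    total_terms: int = 20,
) -> List[str]:
    """Hypothetical helper mirroring the term distribution in this commit."""
    if not 0.0 <= ratio <= 1.0:
        # Added guard (assumption): the commit itself does not validate the range
        raise ValueError("ratio must be between 0.0 and 1.0")
    branded_count = int(total_terms * ratio)  # truncates, as in the commit
    keyword_count = total_terms - branded_count
    return [branded_text] * branded_count + list(keyword_variations[:keyword_count])


# Ten placeholder variations, matching the count requested in the commit
variations = [f"main keyword variation {i}" for i in range(10)]

terms = build_branded_terms("Acme Corp", variations, 0.75)
print(len(terms), terms.count("Acme Corp"))  # 20 15

# With a low ratio, ten variations cannot fill the remaining 15 slots,
# so the list is shorter and the branded share is higher than requested.
low = build_branded_terms("Acme Corp", variations, 0.25)
print(len(low), low.count("Acme Corp"))  # 15 5
```

If only ten variations are available, any ratio below 0.5 yields fewer than 20 terms, so the effective branded share drifts above the requested value.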
**Behavior:**
- The system first looks for each provided term in the content and inserts the term if it is not found
- When using "explicit" mode, only the provided terms are used (no algorithm-generated terms)
- Tier-level explicit config takes precedence over job-level for that tier
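The find-then-insert behavior could look roughly like the sketch below. Everything here is an assumption made for illustration: `link_anchor` is a hypothetical name, the case-insensitive match and the append-at-end fallback are guesses, and the actual insertion strategy in the codebase may differ.

```python
import re


def link_anchor(content: str, term: str, url: str) -> str:
    """Hypothetical helper: link the first occurrence of term, else append it."""
    match = re.search(re.escape(term), content, flags=re.IGNORECASE)
    if match:
        # Term already appears in the content: wrap the existing text in a link
        linked = f'<a href="{url}">{match.group(0)}</a>'
        return content[:match.start()] + linked + content[match.end():]
    # Term not found: insert it as a new link (placement here is an assumption)
    return f'{content} <a href="{url}">{term}</a>'


text = "We offer precision machining services."
print(link_anchor(text, "Precision Machining", "https://example.com"))
print(link_anchor(text, "Acme Corp", "https://example.com"))
```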
## Complete Example
```json


@ -9,6 +9,7 @@ from src.auth.service import AuthService
from src.database.session import db_manager
from src.database.repositories import UserRepository, SiteDeploymentRepository, ProjectRepository
from src.database.models import User
from src.interlinking.anchor_text_generator import AnchorTextGenerator
from src.deployment.bunnynet import (
    BunnyNetClient,
    BunnyNetAPIError,
@ -36,7 +37,13 @@ from datetime import datetime
load_dotenv()
def create_job_file_for_project(project_id: int, project_name: str, session) -> Optional[str]: def create_job_file_for_project(
project_id: int,
project_name: str,
session,
tier1_branded_ratio: Optional[float] = None,
tier1_branded_text: Optional[str] = None
) -> Optional[str]:
""" """
Create a job JSON file for a newly created project. Create a job JSON file for a newly created project.
@ -44,6 +51,8 @@ def create_job_file_for_project(project_id: int, project_name: str, session) ->
project_id: The ID of the created project
project_name: The name of the project (for filename)
session: Database session
tier1_branded_ratio: Optional ratio of branded anchor text for tier1 (0.0-1.0)
tier1_branded_text: Optional branded anchor text (company name) for tier1
Returns:
Path to created file, or None if creation failed
@ -81,13 +90,8 @@ def create_job_file_for_project(project_id: int, project_name: str, session) ->
base_filename = f"{sanitized_name}-{date_suffix}.json"
filepath = jobs_dir / base_filename
job_template = { # Build tier1 configuration
"jobs": [ tier1_config = {
{
"project_id": project_id,
"deployment_targets": selected_domains,
"tiers": {
"tier1": {
"count": t1_count, "count": t1_count,
"min_word_count": 1250, "min_word_count": 1250,
"max_word_count": 2000, "max_word_count": 2000,
@ -96,7 +100,40 @@ def create_job_file_for_project(project_id: int, project_name: str, session) ->
"outline": "openai/gpt-4o-mini", "outline": "openai/gpt-4o-mini",
"content": "x-ai/grok-4-fast" "content": "x-ai/grok-4-fast"
} }
}, }
    # Add anchor_text_config if branded ratio and text are provided
    if tier1_branded_ratio is not None and tier1_branded_text:
        # Get project to retrieve main_keyword for non-branded terms
        project_repo = ProjectRepository(session)
        project = project_repo.get_by_id(project_id)

        if project and project.main_keyword:
            # Generate keyword variations for non-branded terms
            anchor_generator = AnchorTextGenerator()
            keyword_variations = anchor_generator._generate_from_keyword(project, 10)

            # Calculate term distribution (use 20 terms for good distribution)
            total_terms = 20
            branded_count = int(total_terms * tier1_branded_ratio)
            keyword_count = total_terms - branded_count

            # Create anchor text list with branded terms and keyword variations
            anchor_terms = [tier1_branded_text] * branded_count
            anchor_terms.extend(keyword_variations[:keyword_count])

            tier1_config["anchor_text_config"] = {
                "mode": "explicit",
                "terms": anchor_terms
            }

    job_template = {
        "jobs": [
            {
                "project_id": project_id,
                "deployment_targets": selected_domains,
                "tiers": {
                    "tier1": tier1_config,
"tier2": { "tier2": {
"count": t2_count, "count": t2_count,
"min_word_count": 1000, "min_word_count": 1000,
@ -943,9 +980,10 @@ def sync_sites(admin_user: Optional[str], admin_password: Optional[str], dry_run
@click.option('--name', '-n', required=True, help='Project name')
@click.option('--money-site-url', '-m', help='Money site URL (e.g., https://example.com)')
@click.option('--custom-anchors', '-a', help='Comma-separated list of custom anchor text (optional)')
@click.option('--tier1-branded-ratio', default=0.75, type=float, help='Ratio of branded anchor text for tier1 (default: 0.75)')
@click.option('--username', '-u', help='Username for authentication')
@click.option('--password', '-p', help='Password for authentication')
def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom_anchors: Optional[str], username: Optional[str], password: Optional[str]): def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom_anchors: Optional[str], tier1_branded_ratio: float, username: Optional[str], password: Optional[str]):
"""Ingest a CORA .xlsx report and create a new project""" """Ingest a CORA .xlsx report and create a new project"""
try:
if not username or not password:
@ -1014,7 +1052,25 @@ def ingest_cora(file_path: str, name: str, money_site_url: Optional[str], custom
if project.custom_anchor_text:
click.echo(f"Custom Anchor Text: {', '.join(project.custom_anchor_text)}")
job_file = create_job_file_for_project(project.id, project.name, session) # Handle tier1 branded anchor text if ratio is specified
            tier1_branded_text = None
            if tier1_branded_ratio is not None and tier1_branded_ratio > 0:
                tier1_branded_text = click.prompt(
                    "\nEnter branded anchor text (company name) for tier1",
                    type=str
                ).strip()
                if not tier1_branded_text:
                    click.echo("Warning: Empty branded anchor text provided, skipping tier1 branded anchor text configuration.", err=True)
                    tier1_branded_text = None
                    tier1_branded_ratio = None

            job_file = create_job_file_for_project(
                project.id,
                project.name,
                session,
                tier1_branded_ratio=tier1_branded_ratio,
                tier1_branded_text=tier1_branded_text
            )
if job_file:
click.echo(f"Job file created: {job_file}")
@ -1193,6 +1249,133 @@ def list_projects(username: Optional[str], password: Optional[str]):
raise click.Abort()
@app.command("create-job")
@click.option('--project-id', '-p', required=True, type=int, help='Project ID to create job file for')
@click.option('--deployment-targets', '-d', multiple=True, help='Deployment target hostnames (can specify multiple times)')
@click.option('--tier1-count', default=10, type=int, help='Number of tier1 articles (default: 10)')
@click.option('--tier2-count', default=30, type=int, help='Number of tier2 articles (default: 30)')
@click.option('--output', '-o', type=click.Path(), help='Output file path (default: jobs/{project_name}.json)')
@click.option('--username', '-u', help='Username for authentication')
@click.option('--password', '-pwd', help='Password for authentication')
def create_job(
    project_id: int,
    deployment_targets: tuple,
    tier1_count: int,
    tier2_count: int,
    output: Optional[str],
    username: Optional[str],
    password: Optional[str]
):
    """Create a job file from an existing project ID"""
    try:
        if not username or not password:
            username, password = prompt_admin_credentials()

        session = db_manager.get_session()
        try:
            user_repo = UserRepository(session)
            auth_service = AuthService(user_repo)
            user = auth_service.authenticate_user(username, password)
            if not user:
                click.echo("Error: Authentication failed", err=True)
                raise click.Abort()

            project_repo = ProjectRepository(session)
            project = project_repo.get_by_id(project_id)
            if not project:
                click.echo(f"Error: Project {project_id} not found", err=True)
                raise click.Abort()

            deployment_targets_list = list(deployment_targets) if deployment_targets else None
            if not deployment_targets_list:
                site_repo = SiteDeploymentRepository(session)
                sites = site_repo.get_all()
                available_domains = [
                    site.custom_hostname
                    for site in sites
                    if site.custom_hostname is not None
                ]
                if available_domains:
                    click.echo(f"Available sites: {', '.join(available_domains[:5])}{'...' if len(available_domains) > 5 else ''}")
                    click.echo("Note: No deployment_targets specified. You can add them manually to the job file.")

            sanitized_name = "".join(c if c.isalnum() or c in ('-', '_') else '-' for c in project.name.lower()).strip('-')
            sanitized_name = '-'.join(sanitized_name.split())

            jobs_dir = Path("jobs")
            jobs_dir.mkdir(exist_ok=True)

            if output:
                filepath = Path(output)
            else:
                base_filename = f"{sanitized_name}.json"
                filepath = jobs_dir / base_filename
                if filepath.exists():
                    date_suffix = datetime.now().strftime("%y%m%d")
                    base_filename = f"{sanitized_name}-{date_suffix}.json"
                    filepath = jobs_dir / base_filename

            job_template = {
                "jobs": [
                    {
                        "project_id": project_id,
                        "tiers": {
                            "tier1": {
                                "count": tier1_count,
                                "min_word_count": 1250,
                                "max_word_count": 2000,
                                "models": {
                                    "title": "openai/gpt-4o-mini",
                                    "outline": "openai/gpt-4o-mini",
                                    "content": "anthropic/claude-3.5-sonnet"
                                }
                            },
                            "tier2": {
                                "count": tier2_count,
                                "min_word_count": 1000,
                                "max_word_count": 1250,
                                "models": {
                                    "title": "openai/gpt-4o-mini",
                                    "outline": "openai/gpt-4o-mini",
                                    "content": "openai/gpt-4o-mini"
                                },
                                "interlinking": {
                                    "links_per_article_min": 3,
                                    "links_per_article_max": 6
                                }
                            }
                        }
                    }
                ]
            }

            if deployment_targets_list:
                job_template["jobs"][0]["deployment_targets"] = deployment_targets_list

            with open(filepath, 'w', encoding='utf-8') as f:
                json.dump(job_template, f, indent=2)

            click.echo(f"\nJob file created: {filepath}")
            click.echo(f"Project: {project.name} (ID: {project_id})")
            click.echo(f"Tier1: {tier1_count} articles")
            click.echo(f"Tier2: {tier2_count} articles")
            if deployment_targets_list:
                click.echo(f"Deployment targets: {', '.join(deployment_targets_list)}")
            click.echo(f"\nTo run this job:")
            click.echo(f"  uv run python main.py generate-batch --job-file {filepath} -u {username} --password <password>")

        finally:
            session.close()

    except Exception as e:
        click.echo(f"Error creating job file: {e}", err=True)
        raise click.Abort()
@app.command("generate-batch")
@click.option('--job-file', '-j', required=True, type=click.Path(exists=True),
              help='Path to job JSON file')