# Epic 2: Content Ingestion & Generation

## Epic Goal
Implement the core workflow for ingesting CORA data and using AI to generate content and format it into HTML that meets the quality and SEO standards defined in the master configuration.

## Stories

### Story 2.1: CORA Report Data Ingestion
**As a User**, I want to run a script that ingests a CORA .xlsx file, so that a new project is created in the database with the necessary SEO data, including keywords, entities, related searches, and optional anchor text overrides.

**Acceptance Criteria**
- A CLI command is available to accept the path to a CORA .xlsx file.
- The script correctly extracts the specified data points from the spreadsheet (main keyword, entities, related searches, etc.).
- The script must also check for and store any optional, explicitly defined anchor text provided in the spreadsheet.
- A new project record is created in the database, associated with the authenticated user.
- The extracted SEO data is stored correctly in the new project record.
- The script handles errors gracefully if the file is not found or is in an incorrect format.
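A minimal sketch of the CLI entry point and the two failure cases above, using only the Python standard library. The command shape, `CoraFileError`, `validate_cora_path`, and `main` are hypothetical names. Spreadsheet extraction is left out because the CORA column layout is not specified here. The ZIP check works because an .xlsx workbook is a ZIP container; a real workbook parser (for example openpyxl) would still need to reject files that are valid ZIPs but corrupt workbooks.

```python
import argparse
import sys
import zipfile
from pathlib import Path


class CoraFileError(Exception):
    """Hypothetical error raised when the CORA input file is missing or unusable."""


def validate_cora_path(path_str: str) -> Path:
    """Fail fast on the 'not found' and 'incorrect format' cases."""
    path = Path(path_str)
    if not path.is_file():
        raise CoraFileError(f"File not found: {path}")
    if path.suffix.lower() != ".xlsx":
        raise CoraFileError(f"Expected an .xlsx file, got: {path.name}")
    # An .xlsx workbook is a ZIP container; anything else cannot be parsed.
    if not zipfile.is_zipfile(path):
        raise CoraFileError(f"Not a valid .xlsx workbook: {path.name}")
    return path


def main(argv=None) -> int:
    """Hypothetical CLI entry point; returns a process exit code."""
    parser = argparse.ArgumentParser(description="Ingest a CORA .xlsx report.")
    parser.add_argument("cora_file", help="path to the CORA .xlsx file")
    args = parser.parse_args(argv)
    try:
        path = validate_cora_path(args.cora_file)
    except CoraFileError as exc:
        # Report a clear one-line message instead of a traceback.
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    # Extraction and project creation would follow here (not shown).
    print(f"Ingesting {path}")
    return 0
```

Wired to a console entry point via `sys.exit(main())`, a missing or malformed file becomes exit code 1 with a readable message rather than an unhandled exception.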

### Story 2.2: Configurable Content Rule Engine
**As an Admin**, I want to define specific content structure and quality rules in the master configuration, so that all AI-generated content consistently meets my SEO and quality standards.

**Acceptance Criteria**
- The system must load a `content_rules` object from the master JSON configuration file.
- The rule engine must validate that the `<h1>` tag contains the main keyword from the project's data.
- The engine must validate that at least one `<h2>` tag starts with the main keyword.
- The engine must validate that the other `<h2>` tags incorporate entities and related searches from the project's data.
- The engine must validate that at least one `<h3>` tag starts with the main keyword and that the remaining `<h3>` tags contain a mix of the keyword, entities, and related searches.
- The engine must validate that the content includes a dedicated FAQ section in which each question is an `<h3>` tag.
- The engine must enforce that the answer text for each FAQ `<h3>` begins by restating the question.
- For any AI-generated images, the engine must validate that the alt text contains the main keyword and associated entities.
- For interlinks, the engine must use the explicitly provided anchor text from the project data if it exists.
- If no explicit anchor text is provided, the engine must generate a default anchor text using a combination of the linked article's main keyword, entities, and related searches.
- The anchor text for the link to the home page must be the custom FQDN if one is mapped; otherwise, it must be the main keyword of the site/bucket.
- The anchor text for the link to a randomly selected existing article should be that article's main keyword.
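One way the heading and alt-text rules could be checked, sketched with the standard library's `html.parser`. `HeadingCollector` and `validate_content` are hypothetical names; matching is case-insensitive substring and prefix matching (an assumption), "associated entities" is read as "at least one entity" (also an assumption), and the FAQ rules and the "mix of entities and related searches" rules for the remaining headings are omitted for brevity.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects <h1>-<h3> text and <img> alt attributes from generated HTML."""

    def __init__(self):
        super().__init__()
        self.headings = []  # (tag, text) pairs in document order
        self.alts = []      # alt text of every <img>
        self._current = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._current, self._buf = tag, []
        elif tag == "img":
            self.alts.append(dict(attrs).get("alt") or "")

    def handle_data(self, data):
        if self._current:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._buf).strip()))
            self._current = None


def validate_content(html_text, keyword, entities):
    """Return a list of rule violations; an empty list means the checks passed."""
    parser = HeadingCollector()
    parser.feed(html_text)
    parser.close()
    kw = keyword.lower()
    by_tag = {t: [txt.lower() for tag, txt in parser.headings if tag == t]
              for t in ("h1", "h2", "h3")}
    errors = []
    if not any(kw in text for text in by_tag["h1"]):
        errors.append("<h1> must contain the main keyword")
    if not any(text.startswith(kw) for text in by_tag["h2"]):
        errors.append("at least one <h2> must start with the main keyword")
    if not any(text.startswith(kw) for text in by_tag["h3"]):
        errors.append("at least one <h3> must start with the main keyword")
    ents = [e.lower() for e in entities]
    for alt in parser.alts:
        low = alt.lower()
        # Assumption: the entity rule is satisfied by at least one entity.
        if kw not in low or (ents and not any(e in low for e in ents)):
            errors.append(f"image alt text lacks keyword/entity: {alt!r}")
    return errors
```

Returning a list of violations, rather than raising on the first one, lets the generation step report every failed rule at once.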
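The anchor-text precedence rules above reduce to a few small functions. This is a sketch under assumptions: `ArticleSeo` is a hypothetical container for the ingested SEO data, and the default combination used here (main keyword plus one randomly chosen entity or related search) is only one possible reading of "a combination of" those fields.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleSeo:
    """Hypothetical per-article SEO data taken from the CORA ingest."""
    main_keyword: str
    entities: List[str] = field(default_factory=list)
    related_searches: List[str] = field(default_factory=list)
    anchor_text: Optional[str] = None  # explicit override from the spreadsheet


def interlink_anchor(target: ArticleSeo, rng: random.Random) -> str:
    # Rule 1: explicitly provided anchor text always wins.
    if target.anchor_text:
        return target.anchor_text
    # Rule 2 (assumed combination): keyword plus one entity or related search.
    extras = target.entities + target.related_searches
    if extras:
        return f"{target.main_keyword} {rng.choice(extras)}"
    return target.main_keyword


def home_anchor(site_main_keyword: str, custom_fqdn: Optional[str]) -> str:
    # Home link: the mapped custom FQDN if present, else the site's main keyword.
    return custom_fqdn or site_main_keyword


def random_article_anchor(article: ArticleSeo) -> str:
    # Link to a randomly selected existing article: that article's main keyword.
    return article.main_keyword
```

Passing the `random.Random` instance in explicitly keeps the generated anchors reproducible when a seed is supplied.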

### Story 2.3: AI-Powered Content Generation
**As a User**, I want to execute a job for a project that uses AI to generate a title, an outline, and full-text content, so that the core content is created automatically.

**Acceptance Criteria**
- A script can be initiated for a specific project ID.
- The script uses the project's SEO data to prompt an AI model for a title, an outline, and the main body content.
- The content generation process must apply the rules loaded by the content rule engine (Story 2.2) and validate its output against them.
- The generated title, outline, and text are stored and associated with the project in the database.
- The process logs its progress (e.g., "Generating title...", "Generating content...").
- The script handles API errors from the AI service gracefully.
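A sketch of the generation sequence with progress logging and retries, assuming a hypothetical `call(prompt) -> str` wrapper around the AI client that raises `AIServiceError` (also hypothetical) on failure. The prompts are placeholders, exponential backoff is an assumed retry policy, and validation against the Story 2.2 rules would run on the returned body (not shown).

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("content_generation")


class AIServiceError(Exception):
    """Hypothetical error raised by the AI client wrapper for failed calls."""


def call_with_retry(call: Callable[[str], str], prompt: str, *, attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call the model, retrying failures with exponential backoff.

    With the default base_delay the waits are 1s, 2s, 4s, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call(prompt)
        except AIServiceError as exc:
            if attempt == attempts:
                raise  # out of retries: surface the error to the job runner
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("AI call failed (%s); retry %d/%d in %.1fs",
                           exc, attempt, attempts - 1, delay)
            sleep(delay)


def generate_article(call: Callable[[str], str], seo: dict, **retry_opts) -> dict:
    """Generate title, outline and body in order, logging each step."""
    keyword = seo["main_keyword"]
    logger.info("Generating title...")
    title = call_with_retry(call, f"Write an SEO title about {keyword}.", **retry_opts)
    logger.info("Generating outline...")
    outline = call_with_retry(
        call, f"Write an outline for an article titled {title!r}.", **retry_opts)
    logger.info("Generating content...")
    body = call_with_retry(
        call, f"Write the article following this outline:\n{outline}", **retry_opts)
    return {"title": title, "outline": outline, "body": body}
```

Injecting `sleep` as a parameter lets tests exercise the retry path without real delays.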

### Story 2.4: HTML Formatting with Multiple Templates
**As a developer**, I want a module that takes the generated text content and formats it into a standard HTML file using one of a few predefined CSS templates, assigning one template per bucket/subdomain, so that all deployed content has a consistent look and feel per site.

**Acceptance Criteria**
- A directory of multiple, predefined HTML/CSS templates exists.
- The master JSON configuration file maps a specific template to each deployment target (e.g., S3 bucket, subdomain).
- A function accepts the generated content and a target identifier (e.g., bucket name).
- The function correctly selects and applies the appropriate template based on the configuration mapping.
- The content is structured into a valid HTML document with the selected CSS.
- The final HTML content is stored and associated with the project in the database.

**Dependencies**
- Story 2.5 (optional): If no `site_deployment_id` is assigned, a template is selected at random.
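Template selection and rendering could look like the sketch below. The config shape `{"templates": {"<target id>": "<template name>"}}`, the `$title`/`$content` placeholders (Python's `string.Template` syntax), and the choice to raise an error for an unmapped target rather than fall back to a random template are all assumptions.

```python
import html
import random
from string import Template
from typing import List, Optional


def select_template(config: dict, target_id: Optional[str], available: List[str],
                    rng: Optional[random.Random] = None) -> str:
    """Pick the template name for a deployment target.

    `available` is assumed to list the template names in the templates directory.
    """
    if target_id is None:
        # No site_deployment_id assigned: fall back to a random template.
        return (rng or random.Random()).choice(available)
    template = config.get("templates", {}).get(target_id)
    if template is None:
        raise ValueError(f"No template mapped for deployment target {target_id!r}")
    if template not in available:
        raise ValueError(f"Mapped template {template!r} is not in the templates directory")
    return template


def render(template_text: str, title: str, body_html: str) -> str:
    """Fill $title and $content; the title is escaped, the body is trusted HTML."""
    return Template(template_text).substitute(title=html.escape(title),
                                              content=body_html)
```

`Template.substitute` raises `KeyError` if a template contains a placeholder that is not supplied, which surfaces malformed templates early.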

### Story 2.5: Deployment Target Assignment
**As a developer**, I want to assign deployment targets to generated content during the content generation process, so that each article knows which site/bucket it will be deployed to and can use the appropriate template.

**Acceptance Criteria**
- The job configuration file supports an optional `deployment_targets` array containing site `custom_hostname` values or `site_deployment_id` values.
- The job configuration file supports an optional `deployment_overflow` strategy ("round_robin", "random_available", or "none").
- During content generation, each article is assigned a `site_deployment_id` based on its index in the batch:
  - If `deployment_targets` is specified, assign targets in list order.
  - If the batch size exceeds the number of targets, apply the `deployment_overflow` strategy (round-robin by default).
  - If no `deployment_targets` are specified, `site_deployment_id` remains null and Story 2.4 selects a template at random.
- The `site_deployment_id` is stored in the `GeneratedContent` record at creation time.
- Invalid site references in `deployment_targets` fail gracefully with a clear error message.
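A sketch of the assignment rules, assuming targets arrive either as `custom_hostname` strings or as integer `site_deployment_id` values and that the caller supplies the known sites. The source does not say what `random_available` draws from or what `none` does with extra articles; here `random_available` picks from a caller-supplied pool of available site IDs (falling back to the target list) and `none` leaves extra articles unassigned. Both are assumptions, as are all function names.

```python
import random
from typing import Dict, List, Optional, Set, Union

STRATEGIES = {"round_robin", "random_available", "none"}


class DeploymentTargetError(ValueError):
    """Raised when deployment_targets references an unknown site."""


def resolve_targets(targets: List[Union[str, int]], ids_by_hostname: Dict[str, int],
                    known_ids: Set[int]) -> List[int]:
    """Map custom_hostname strings or site_deployment_id ints to site_deployment_ids."""
    resolved = []
    for ref in targets:
        if isinstance(ref, int) and ref in known_ids:
            resolved.append(ref)
        elif isinstance(ref, str) and ref in ids_by_hostname:
            resolved.append(ids_by_hostname[ref])
        else:
            raise DeploymentTargetError(f"Unknown site in deployment_targets: {ref!r}")
    return resolved


def assign_deployments(batch_size: int, targets: Optional[List[int]],
                       overflow: str = "round_robin",
                       available: Optional[List[int]] = None,
                       rng: Optional[random.Random] = None) -> List[Optional[int]]:
    """Return one site_deployment_id (or None) per article index in the batch."""
    if overflow not in STRATEGIES:
        raise ValueError(f"Unknown deployment_overflow strategy: {overflow!r}")
    if not targets:
        return [None] * batch_size  # Story 2.4 then picks a random template
    rng = rng or random.Random()
    assigned: List[Optional[int]] = []
    for i in range(batch_size):
        if i < len(targets):
            assigned.append(targets[i])  # within the list: assign in order
        elif overflow == "round_robin":
            assigned.append(targets[i % len(targets)])
        elif overflow == "random_available":
            # Assumption: draw from the available-site pool, else the target list.
            assigned.append(rng.choice(available or targets))
        else:  # "none"
            assigned.append(None)  # Assumption: extra articles stay unassigned
    return assigned
```

Resolving and validating references before any article is generated means a typo in `deployment_targets` fails the job up front instead of partway through a batch.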