# Epic 2: Content Ingestion & Generation
## Epic Goal
Implement the core workflow for ingesting CORA data and using AI to generate and format content into HTML that adheres to specific quality and SEO standards.
## Stories
### Story 2.1: CORA Report Data Ingestion
**As a User**, I want to run a script that ingests a CORA .xlsx file, so that a new project is created in the database with the necessary SEO data, including keywords, entities, related searches, and optional anchor text overrides.
**Acceptance Criteria**
- A CLI command is available to accept the path to a CORA .xlsx file.
- The script correctly extracts the specified data points from the spreadsheet (main keyword, entities, related searches, etc.).
- The script must also check for and store any optional, explicitly defined anchor text provided in the spreadsheet.
- A new project record is created in the database, associated with the authenticated user.
- The extracted SEO data is stored correctly in the new project record.
- The script handles errors gracefully if the file is not found or is in an incorrect format.
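
The CLI entry point and its error handling could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the command name `ingest-cora`, the `IngestError` class, and the `validate_cora_path` helper are hypothetical, and the spreadsheet extraction and database write are left out because the stories do not specify the sheet layout or schema. It relies only on the fact that `.xlsx` workbooks are ZIP containers.

```python
import argparse
import sys
import zipfile
from pathlib import Path


class IngestError(Exception):
    """Hypothetical error type for files that cannot be ingested."""


def validate_cora_path(path: Path) -> Path:
    """Check that the path points at something that looks like an .xlsx file."""
    if not path.is_file():
        raise IngestError(f"File not found: {path}")
    if path.suffix.lower() != ".xlsx":
        raise IngestError(f"Expected a .xlsx file, got: {path.name}")
    # .xlsx workbooks are ZIP containers, so a non-ZIP file is a wrong format.
    if not zipfile.is_zipfile(path):
        raise IngestError(f"Not a valid .xlsx workbook: {path.name}")
    return path


def main(argv=None) -> int:
    """Parse the CLI arguments and return a process exit code."""
    parser = argparse.ArgumentParser(
        prog="ingest-cora",  # assumed command name
        description="Create a project from a CORA .xlsx report.",
    )
    parser.add_argument("xlsx_path", type=Path, help="path to the CORA .xlsx file")
    args = parser.parse_args(argv)
    try:
        path = validate_cora_path(args.xlsx_path)
    except IngestError as exc:
        # Report the problem and exit non-zero instead of printing a traceback.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    # Data extraction and the database write would follow here.
    print(f"Ready to ingest {path}")
    return 0
```

A wrapper script would typically end with `sys.exit(main())` so that the shell sees the exit code.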
### Story 2.2: Configurable Content Rule Engine
**As an Admin**, I want to define specific content structure and quality rules in the master configuration, so that all AI-generated content consistently meets my SEO and quality standards.
**Acceptance Criteria**
- The system must load a "content_rules" object from the master JSON configuration file.
- The rule engine must validate that the `<h1>` tag contains the main keyword from the project's data.
- The engine must validate that at least one `<h2>` tag starts with the main keyword.
- The engine must validate that other `<h2>` tags incorporate entities and related searches from the project's data.
- The engine must validate that at least one `<h3>` tag starts with the main keyword, and that others contain a mix of the keyword, entities, and related searches.
- The engine must validate a dedicated FAQ section where each question is an `<h3>` tag.
- The engine must enforce that the answer text for each FAQ begins by restating the question.
- For any AI-generated images, the engine must validate that the alt text contains the main keyword and associated entities.
- For interlinks, the engine must use the explicitly provided anchor text from the project data if it exists.
- If no explicit anchor text is provided, the engine must generate a default anchor text using a combination of the linked article's main keyword, entities, and related searches.
- The anchor text for the link to the home page must be the custom FQDN if one is mapped; otherwise, it must be the main keyword of the site/bucket.
- The anchor text for the link to the existing random article should be that article's main keyword.
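
A few of the checks above can be sketched with the Python standard library. Everything named here is an assumption made for illustration: the function names are hypothetical, the tag to inspect is passed in as a parameter rather than fixed, and because the story does not say how the default anchor text "combination" is built, this sketch simply takes the first candidate from the keyword, entities, and related searches.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional


def load_content_rules(config_text: str) -> dict:
    """Return the "content_rules" object from the master JSON config text."""
    config = json.loads(config_text)
    if "content_rules" not in config:
        raise KeyError("master configuration has no 'content_rules' object")
    return config["content_rules"]


class TagTextCollector(HTMLParser):
    """Collects the text inside every occurrence of one tag."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.texts: List[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data


def tag_texts(html: str, tag: str) -> List[str]:
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return [text.strip() for text in collector.texts]


def any_starts_with(texts: List[str], keyword: str) -> bool:
    """True if at least one text starts with the keyword, ignoring case."""
    prefix = keyword.casefold()
    return any(text.casefold().startswith(prefix) for text in texts)


def default_anchor_candidates(keyword: str, entities: List[str],
                              related_searches: List[str]) -> List[str]:
    """De-duplicated pool of fallback anchor texts, keyword first."""
    seen, candidates = set(), []
    for text in [keyword, *entities, *related_searches]:
        key = text.strip().casefold()
        if key and key not in seen:
            seen.add(key)
            candidates.append(text.strip())
    return candidates


def resolve_anchor_text(explicit: Optional[str], keyword: str,
                        entities: List[str], related_searches: List[str]) -> str:
    """Prefer explicit anchor text; otherwise fall back to a generated default."""
    if explicit and explicit.strip():
        return explicit.strip()
    candidates = default_anchor_candidates(keyword, entities, related_searches)
    if not candidates:
        raise ValueError("no anchor text available")
    return candidates[0]  # assumed selection strategy


def home_anchor_text(custom_fqdn: Optional[str], site_keyword: str) -> str:
    """Use the custom FQDN when one is mapped, else the site's main keyword."""
    return custom_fqdn or site_keyword
```

For example, `any_starts_with(tag_texts(html, "h2"), main_keyword)` would express a "starts with the main keyword" rule for whichever heading level the configuration names.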
### Story 2.3: AI-Powered Content Generation
**As a User**, I want to execute a job for a project that uses AI to generate a title, an outline, and full-text content, so that the core content is created automatically.
**Acceptance Criteria**
- A script can be initiated for a specific project ID.
- The script uses the project's SEO data to prompt an AI model for a title, an outline, and the main body content.
- The content generation process must apply and validate against the rules defined and loaded by the content rule engine (Story 2.2).
- The generated title, outline, and text are stored and associated with the project in the database.
- The process logs its progress (e.g., "Generating title...", "Generating content...").
- The script can handle potential API errors from the AI service.
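
The progress logging and API error handling could look roughly like this. It is a sketch under assumptions, not a prescribed design: the AI client is represented by a hypothetical `generate(prompt)` callable, `AIServiceError` stands in for whatever exception the real client raises, the prompts are placeholders, and retrying with exponential backoff is one possible way to "handle" errors that the story does not mandate.

```python
import logging
import time
from typing import Callable, Dict

logger = logging.getLogger("content_generation")


class AIServiceError(Exception):
    """Hypothetical stand-in for errors raised by the AI client."""


def call_with_retry(generate: Callable[[str], str], prompt: str,
                    attempts: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call the AI service, retrying with exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return generate(prompt)
        except AIServiceError as exc:
            if attempt == attempts:
                raise  # out of retries: let the caller decide what to do
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("AI call failed (%s); retrying in %.1fs", exc, delay)
            sleep(delay)
    raise ValueError("attempts must be at least 1")


def generate_content(seo: Dict[str, str], generate: Callable[[str], str],
                     **retry_options) -> Dict[str, str]:
    """Generate title, outline, and body in order, logging each step."""
    keyword = seo["main_keyword"]

    logger.info("Generating title...")
    title = call_with_retry(generate, f"Write a title about: {keyword}",
                            **retry_options)

    logger.info("Generating outline...")
    outline = call_with_retry(generate, f"Write an outline for: {title}",
                              **retry_options)

    logger.info("Generating content...")
    content = call_with_retry(generate, f"Write the article for:\n{outline}",
                              **retry_options)

    # Rule validation (Story 2.2) and the database write would follow here.
    return {"title": title, "outline": outline, "content": content}
```

Passing the client in as a callable keeps the sketch independent of any particular AI provider and makes the retry path easy to exercise with a fake.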
### Story 2.4: HTML Formatting with Multiple Templates
**As a Developer**, I want a module that takes the generated text content and formats it into a standard HTML file using one of a few predefined CSS templates, assigning one template per bucket/subdomain, so that all deployed content has a consistent look and feel per site.
**Acceptance Criteria**
- A directory of multiple, predefined HTML/CSS templates exists.
- The master JSON configuration file maps a specific template to each deployment target (e.g., S3 bucket, subdomain).
- A function accepts the generated content and a target identifier (e.g., bucket name).
- The function correctly selects and applies the appropriate template based on the configuration mapping.
- The content is structured into a valid HTML document with the selected CSS.
- The final HTML content is stored and associated with the project in the database.
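
Template selection and rendering could be sketched as below. The names are assumptions for illustration only: the `"templates"` key in the configuration, the `select_template` and `render_html` functions, and the choice to store each template as a CSS file that is inlined into a shared page skeleton are all hypothetical. The body is assumed to be HTML that has already been generated and is trusted.

```python
import html
from pathlib import Path
from string import Template

# Shared page skeleton; only the CSS differs per deployment target.
PAGE = Template("""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>$title</title>
<style>
$css
</style>
</head>
<body>
$body
</body>
</html>
""")


def select_template(config: dict, target: str) -> str:
    """Look up the template file mapped to a bucket or subdomain."""
    try:
        return config["templates"][target]  # assumed config layout
    except KeyError:
        raise KeyError(f"no template mapped for target: {target}") from None


def render_html(title: str, body_html: str, target: str,
                config: dict, template_dir: Path) -> str:
    """Wrap the generated body in a full HTML document for the target."""
    css_file = template_dir / select_template(config, target)
    css = css_file.read_text(encoding="utf-8")
    # The title is plain text, so escape it; the body is already HTML.
    return PAGE.substitute(title=html.escape(title), css=css, body=body_html)
```

With a configuration such as `{"templates": {"bucket-a": "clean.css"}}`, calling `render_html(title, body, "bucket-a", config, Path("templates"))` would return the document string to store on the project record.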