From 31b958029b8dbffb3cbde486c046aa71e37c6dd8 Mon Sep 17 00:00:00 2001 From: PeninsulaInd Date: Fri, 17 Oct 2025 23:34:54 -0500 Subject: [PATCH] Initial commit: Project structure and planning documents --- docs/architecture.md | 23 ++++++++ docs/architecture/components.md | 35 ++++++++++++ docs/architecture/data-models.md | 56 ++++++++++++++++++ docs/architecture/error-handling.md | 48 ++++++++++++++++ docs/architecture/overview.md | 58 +++++++++++++++++++ docs/architecture/source-tree.md | 59 +++++++++++++++++++ docs/architecture/tech-stack.md | 19 +++++++ docs/architecture/testing-strategy.md | 13 +++++ docs/architecture/workflows.md | 27 +++++++++ docs/prd.md | 18 ++++++ docs/prd/epic-1-foundation.md | 76 +++++++++++++++++++++++++ docs/prd/epic-2-content-generation.md | 56 ++++++++++++++++++ docs/prd/epic-3-pre-deployment.md | 37 ++++++++++++ docs/prd/epic-4-deployment.md | 23 ++++++++ docs/prd/functional-requirements.md | 31 ++++++++++ docs/prd/goals-and-context.md | 17 ++++++ docs/prd/non-functional-requirements.md | 22 +++++++ docs/prd/technical-assumptions.md | 23 ++++++++ 18 files changed, 641 insertions(+) create mode 100644 docs/architecture.md create mode 100644 docs/architecture/components.md create mode 100644 docs/architecture/data-models.md create mode 100644 docs/architecture/error-handling.md create mode 100644 docs/architecture/overview.md create mode 100644 docs/architecture/source-tree.md create mode 100644 docs/architecture/tech-stack.md create mode 100644 docs/architecture/testing-strategy.md create mode 100644 docs/architecture/workflows.md create mode 100644 docs/prd.md create mode 100644 docs/prd/epic-1-foundation.md create mode 100644 docs/prd/epic-2-content-generation.md create mode 100644 docs/prd/epic-3-pre-deployment.md create mode 100644 docs/prd/epic-4-deployment.md create mode 100644 docs/prd/functional-requirements.md create mode 100644 docs/prd/goals-and-context.md create mode 100644 docs/prd/non-functional-requirements.md create 
mode 100644 docs/prd/technical-assumptions.md diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..bc90eb1 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,23 @@ +# Content Automation & Syndication Platform Architecture Document + +> **Note**: This document has been sharded into focused sections for better maintainability. See the [Documentation Index](index.md) for navigation. + +## Quick Navigation + +- [Architecture Overview](architecture/overview.md) +- [Technology Stack](architecture/tech-stack.md) +- [Data Models](architecture/data-models.md) +- [System Components](architecture/components.md) +- [Core Workflows](architecture/workflows.md) +- [Source Tree Structure](architecture/source-tree.md) +- [Testing Strategy](architecture/testing-strategy.md) +- [Error Handling Strategy](architecture/error-handling.md) + +## Change Log + +*To be maintained as the project evolves* + +## Next Steps + +This architecture provides a solid foundation. The next logical step is to begin development, following the epics outlined in the PRD. I recommend starting with Epic 1: Foundation & Core Services. This will establish the project structure, database connectivity, and authentication systems that all other features depend on. The development agent should use this document as the primary technical guide for implementation. + diff --git a/docs/architecture/components.md b/docs/architecture/components.md new file mode 100644 index 0000000..fc2c2f8 --- /dev/null +++ b/docs/architecture/components.md @@ -0,0 +1,35 @@ +# System Components + +The application will be structured into the following distinct modules within the source directory. + +## Core Modules + +### cli +Contains all Click command definitions. This is the user-facing entry point. It orchestrates calls to other modules but contains no business logic itself. 
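A minimal sketch of how this module might be laid out with Click (the `job run` command mirrors the `content-tool job run` example given under the CLI Command Pattern; all module and option names here are illustrative, not final):

```python
import click

@click.group()
def cli():
    """content-tool: top-level entry point for all commands."""

@cli.group()
def job():
    """Commands for running content generation jobs."""

@job.command(name="run")
@click.option("--file", "file_path", required=True,
              help="Path to a CORA .xlsx report.")
def run(file_path: str):
    """Orchestration only: each step delegates to its own module."""
    click.echo(f"Ingesting {file_path}...")
    # project_id = ingestion.process_cora_file(file_path)
    # html_list = generation.generate_content(project_id)
    # ... interlinking, deployment, API handoff ...
```

The command functions stay thin on purpose: they parse arguments, call services, and echo progress, keeping all business logic in the other modules.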
+ +### core +Handles application-wide concerns like configuration loading (master.json, .env), logging setup, and core application state. + +### database +Defines the SQLAlchemy models, repository interfaces (ABCs), and concrete repository implementations for SQLite. Manages the database session. + +### auth +Manages user authentication, password hashing, and role-based access control logic. Used by both the CLI and the API. + +### ingestion +Responsible for parsing the CORA .xlsx files and creating new Project entries in the database. + +### generation +Interacts with the AI service API. It takes project data, constructs prompts, and retrieves the generated text. Includes the Content Rule Engine for validation. + +### templating +Takes raw generated text and applies the appropriate HTML/CSS template based on the project's configuration. + +### interlinking +Contains the logic for generating the batch URL map and injecting the "wheel" of links into the generated HTML. + +### deployment +Implements the Strategy Pattern for multi-cloud deployments. Contains a base DeploymentStrategy and concrete classes for each provider (AWS S3, Azure Blob, etc.). + +### api +The FastAPI application. Defines API models (using Pydantic), routes, and dependencies. It will use the auth and database modules to service requests. diff --git a/docs/architecture/data-models.md b/docs/architecture/data-models.md new file mode 100644 index 0000000..5f3a53c --- /dev/null +++ b/docs/architecture/data-models.md @@ -0,0 +1,56 @@ +# Data Models + +The following data models will be implemented using SQLAlchemy. + +## 1. User + +**Purpose**: Stores user credentials and role information. + +**Key Attributes**: +- `id`: Integer, Primary Key +- `username`: String, Unique, Not Null +- `hashed_password`: String, Not Null +- `role`: String, Not Null ("Admin" or "User") + +**Relationships**: A User can have many Projects. + +## 2. 
Project + +**Purpose**: Represents a single content generation job initiated from a CORA report. + +**Key Attributes**: +- `id`: Integer, Primary Key +- `user_id`: Integer, Foreign Key to User +- `project_name`: String, Not Null +- `cora_data`: JSON (stores extracted keywords, entities, etc.) +- `status`: String (e.g., "Pending", "Generating", "Complete") + +**Relationships**: A Project belongs to one User and has many GeneratedContents. + +## 3. GeneratedContent + +**Purpose**: Stores the AI-generated content and its final deployed state. + +**Key Attributes**: +- `id`: Integer, Primary Key +- `project_id`: Integer, Foreign Key to Project +- `title`: Text +- `outline`: Text +- `body_text`: Text +- `final_html`: Text +- `deployed_url`: String, Unique +- `tier`: String (for link classification) + +**Relationships**: Belongs to one Project. + +## 4. FqdnMapping + +**Purpose**: Maps cloud storage buckets to fully qualified domain names for URL generation. + +**Key Attributes**: +- `id`: Integer, Primary Key +- `bucket_name`: String, Not Null +- `provider`: String, Not Null (e.g., "aws", "bunny", "azure") +- `fqdn`: String, Not Null + +**Relationships**: None. diff --git a/docs/architecture/error-handling.md b/docs/architecture/error-handling.md new file mode 100644 index 0000000..d0d340b --- /dev/null +++ b/docs/architecture/error-handling.md @@ -0,0 +1,48 @@ +# Error Handling Strategy + +This section defines a unified strategy for error handling and logging across the application. The primary goals are to ensure that errors are handled gracefully, and that logs are structured, informative, and actionable. + +## General Approach + +The application will use a centralized logging configuration initialized at startup. All logging will be structured (JSON format) to facilitate machine parsing and analysis, especially in preparation for future integration with log aggregation services (e.g., AWS CloudWatch, Datadog). 
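A sketch of such a formatter using only the stdlib `logging` module; the field set mirrors the example under Logging Standards, while the class and helper names are illustrative:

```python
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    """Render every record as a single-line JSON object (sketch, not final)."""

    converter = time.gmtime  # emit UTC timestamps to match the trailing "Z"

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.name,
            # correlation_id / project_id / user_id travel via extra={"context": ...}
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)

def configure_logging(level: int = logging.INFO) -> None:
    """Centralized setup, called once at application startup."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=level, handlers=[handler], force=True)
```

Because each line is a self-contained JSON object, logs can be shipped to CloudWatch or Datadog later without changing any call sites.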
+ +## Logging Standards + +- **Library**: Python's built-in logging module will be used. It is powerful, flexible, and requires no external dependencies. +- **Format**: All logs will be output as a single-line JSON object. This ensures consistency and makes them easy to parse and query. + +### Example Log Format +```json +{ + "timestamp": "2025-10-18T03:07:43Z", + "level": "INFO", + "message": "Content generation for project 123 complete.", + "module": "src.generation.service", + "context": { + "correlation_id": "job-run-xyz-789", + "project_id": 123, + "user_id": 1 + } +} +``` + +### Log Levels +- **DEBUG**: Detailed information, typically of interest only when diagnosing problems. (e.g., API request bodies, specific function inputs). +- **INFO**: Confirmation that things are working as expected. (e.g., "Job run started", "File deployed successfully"). +- **WARNING**: An indication that something unexpected happened, or indicative of some problem in the near future (e.g., 'disk space low'). The software is still working as expected. (e.g., "Retrying API call, 2 of 3"). +- **ERROR**: Due to a more serious problem, the software has not been able to perform some function. This is for handled exceptions. The error should be logged with a full stack trace. +- **CRITICAL**: A serious error, indicating that the program itself may be unable to continue running. + +### Required Context +To make logs useful, the following context must be included wherever possible: + +- **Correlation ID**: + - For CLI jobs, a unique ID will be generated at the start of each job run and passed through all subsequent function calls. + - For the Internal API, a FastAPI middleware will generate a unique request ID and attach it to every log message within that request's lifecycle. +- **Service Context**: The logger configuration will automatically include the module and function name (`%(name)s`, `%(funcName)s`) where the log was emitted. 
+- **User Context**: For any authenticated action (CLI or API), the user_id must be included in the log's context. + +## Error Handling Patterns + +- **Exception Handling**: All expected errors (e.g., file not found, API connection error) will be caught in try...except blocks. When an exception is caught and handled, it will be logged at the ERROR level, including the full stack trace. +- **API Errors**: The FastAPI application will use a global exception handler to catch any unhandled errors, log them with a CRITICAL level, and return a standardized 500 Internal Server Error response to the client, ensuring no internal details or stack traces are leaked. diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md new file mode 100644 index 0000000..5c58ad7 --- /dev/null +++ b/docs/architecture/overview.md @@ -0,0 +1,58 @@ +# Architecture Overview + +## Introduction + +This document outlines the overall project architecture for the Content Automation & Syndication Platform, based on the requirements detailed in the PRD v1.0. Its primary goal is to serve as the guiding architectural blueprint for AI-driven development, ensuring consistency and adherence to chosen patterns and technologies. The architecture is designed as a Python-based monolithic application with a Command-Line Interface (CLI), focusing on modularity to support future scalability, including an eventual migration from SQLite to PostgreSQL. + +**Starter Template or Existing Project**: N/A - This is a greenfield project to be built from scratch. + +## Change Log + +*To be maintained as the project evolves* + +## High-Level Architecture + +### Technical Summary + +The system is designed as a modular monolith written in Python, operating primarily through a command-line interface. The core architecture emphasizes a clean separation of concerns, with distinct modules for data ingestion, AI content generation, HTML templating, interlinking, and multi-cloud deployment. 
A crucial design principle is the implementation of a database-agnostic Repository Pattern to abstract data access, ensuring the initial SQLite database can be seamlessly replaced with a more robust solution like PostgreSQL in the future. An internal REST API, built with FastAPI, will facilitate communication with other internal systems, starting with the link-building machine. + +### High-Level Project Diagram + +```mermaid +graph TD + subgraph User Interaction + User(User/Admin) -- Runs CLI Commands --> CLI(CLI Interface) + end + + subgraph "Python Application (Monolith)" + CLI -- Invokes --> CoreModules(Core Modules) + CoreModules --> D(Database Module) + CoreModules --> AI(AI Generation Module) + CoreModules --> T(Templating Module) + CoreModules --> IL(Interlinking Module) + CoreModules --> DP(Deployment Module) + CoreModules --> API(Internal REST API) + end + + subgraph "Data & Configuration" + D -- "Reads/Writes" --> DB[(SQLite Database)] + CoreModules -- Reads --> ENV(Secrets /.env) + CoreModules -- Reads --> JSON(Config /master.json) + end + + subgraph "External Services" + AI -- "Sends Prompts / Gets Content" --> AIService(AI Service API) + DP -- "Deploys HTML" --> Cloud(Multi-Cloud Storage) + API -- "Sends Job Data" --> LBM(Link-Building Machine) + end + + style User fill:#d3f3d3 + style LBM fill:#f3e5d3 +``` + +## Architectural and Design Patterns + +- **Modular Monolith**: The application will be a single deployable unit, but its internal structure will be divided into loosely-coupled modules with well-defined responsibilities. This provides the simplicity of a monolith with the organizational benefits of microservices. +- **Repository Pattern**: This is a non-negotiable requirement from the PRD. We will create a data access layer that separates business logic from data persistence details. This involves defining repository interfaces (abstract base classes) and concrete implementations for SQLite, making a future switch to PostgreSQL trivial. 
+- **CLI Command Pattern**: We will use a library like Click or Typer to structure the command-line interface, providing clear, self-documenting commands for all user interactions (e.g., content-tool user add, content-tool job run). +- **Strategy Pattern**: For the multi-cloud deployment module, we will use the Strategy Pattern. A unified DeploymentStrategy interface will be defined, with concrete implementations for each cloud provider (S3, Azure Blob, Bunny.net, etc.). This makes adding new providers straightforward. diff --git a/docs/architecture/source-tree.md b/docs/architecture/source-tree.md new file mode 100644 index 0000000..35f7bbc --- /dev/null +++ b/docs/architecture/source-tree.md @@ -0,0 +1,59 @@ +# Source Tree Structure + +The project will follow a standard Python monorepo structure to organize the different modules. + +``` +content-automation-platform/ +├── .env.example # Example environment variables +├── .gitignore +├── master.config.json # Master configuration file +├── requirements.txt # Python dependencies +├── README.md +├── main.py # Main CLI entry point +│ +└── src/ + ├── __init__.py + ├── api/ + │ ├── __init__.py + │ ├── main.py # FastAPI app instance + │ ├── routes.py # API endpoint definitions + │ └── schemas.py # Pydantic models for API + ├── auth/ + │ ├── __init__.py + │ └── service.py # Hashing, validation, role checks + ├── cli/ + │ ├── __init__.py + │ └── commands.py # Click command groups + ├── core/ + │ ├── __init__.py + │ └── config.py # Config loading and validation + ├── database/ + │ ├── __init__.py + │ ├── models.py # SQLAlchemy models + │ ├── repositories.py # Concrete repository implementations + │ └── interfaces.py # Abstract repository interfaces + ├── deployment/ + │ ├── __init__.py + │ ├── strategies/ # Folder for individual cloud strategies + │ └── manager.py # Manages which strategy to use + ├── generation/ + │ ├── __init__.py + │ ├── service.py # AI API interaction + │ └── rule_engine.py # Content validation rules + 
├── ingestion/ + │ ├── __init__.py + │ └── parser.py # CORA .xlsx file parsing + ├── interlinking/ + │ ├── __init__.py + │ └── service.py # Link map generation and injection + ├── templating/ + │ ├── __init__.py + │ ├── service.py # Applies templates to content + │ └── templates/ # Directory for HTML/CSS templates + │ +└── tests/ + ├── __init__.py + ├── conftest.py # Pytest fixtures + ├── unit/ # Unit tests for individual modules + └── integration/ # Integration tests for workflows +``` diff --git a/docs/architecture/tech-stack.md b/docs/architecture/tech-stack.md new file mode 100644 index 0000000..d13533e --- /dev/null +++ b/docs/architecture/tech-stack.md @@ -0,0 +1,19 @@ +# Technology Stack + +This table represents the definitive technology selection. All development must adhere to these choices and versions to ensure consistency. + +| Component | Technology | Version | Purpose | +|-----------|------------|---------|---------| +| Language | Python | 3.11+ | Core application development | +| Web Framework | FastAPI | Latest | Internal REST API | +| CLI Framework | Click/Typer | Latest | Command-line interface | +| Database ORM | SQLAlchemy | Latest | Database abstraction and models | +| Database | SQLite | Built-in | Initial database (MVP) | +| Authentication | Passlib | Latest | Password hashing | +| Cloud - AWS | Boto3 | Latest | S3 integration | +| Cloud - Azure | Azure SDK | Latest | Blob Storage integration | +| Cloud - Bunny | Custom/Requests | Latest | Bunny.net integration | +| Testing | Pytest | Latest | Unit and integration testing | +| Configuration | Pydantic | Latest | Configuration validation | +| Logging | Python logging | Built-in | Structured logging | +| Environment | python-dotenv | Latest | Environment variable management | diff --git a/docs/architecture/testing-strategy.md b/docs/architecture/testing-strategy.md new file mode 100644 index 0000000..11f724c --- /dev/null +++ b/docs/architecture/testing-strategy.md @@ -0,0 +1,13 @@ +# Test 
Strategy and Standards + +## Philosophy +We will follow a "test-after" approach for the MVP, focusing on robust unit and integration tests. The goal is to ensure each module functions correctly in isolation and that the core workflows operate end-to-end. + +## Unit Tests +Each module in `src/` will have a corresponding test file in `tests/unit/`. We will mock external dependencies like AI services and cloud provider APIs. Business logic (rule engine, interlinking calculations, data parsing) will have high unit test coverage. + +## Integration Tests +Located in `tests/integration/`, these tests will cover the interactions between our modules. For example, a test will verify that ingesting a file correctly creates database entries via the repository. We will use an in-memory SQLite database for these tests to ensure they are fast and isolated. For cloud services, we will use mocking libraries like moto for S3 to test the deployment logic without actual network calls. + +## Execution +All tests will be run via pytest. diff --git a/docs/architecture/workflows.md b/docs/architecture/workflows.md new file mode 100644 index 0000000..4fac485 --- /dev/null +++ b/docs/architecture/workflows.md @@ -0,0 +1,27 @@ +# Core Workflows + +This sequence diagram illustrates the primary workflow for a single content generation job. 
+ +```mermaid +sequenceDiagram + participant User + participant CLI + participant Ingestion + participant Generation + participant Interlinking + participant Deployment + participant API + + User->>CLI: run job --file report.xlsx + CLI->>Ingestion: process_cora_file("report.xlsx") + Ingestion-->>CLI: project_id + CLI->>Generation: generate_content(project_id) + Generation-->>CLI: raw_html_list + CLI->>Interlinking: inject_links(raw_html_list) + Interlinking-->>CLI: final_html_list + CLI->>Deployment: deploy_batch(final_html_list) + Deployment-->>CLI: deployed_urls + CLI->>API: send_to_link_builder(job_data, deployed_urls) + API-->>CLI: success + CLI-->>User: Job Complete! URLs logged. +``` diff --git a/docs/prd.md b/docs/prd.md new file mode 100644 index 0000000..d18d3d2 --- /dev/null +++ b/docs/prd.md @@ -0,0 +1,18 @@ +# Content Automation & Syndication Platform Product Requirements Document (PRD) + +> **Note**: This document has been sharded into focused sections for better maintainability. See the [Documentation Index](index.md) for navigation. 
+ +## Quick Navigation + +- [Goals and Background Context](prd/goals-and-context.md) +- [Functional Requirements](prd/functional-requirements.md) +- [Non-Functional Requirements](prd/non-functional-requirements.md) +- [Technical Assumptions](prd/technical-assumptions.md) +- [Epic 1: Foundation & Core Services](prd/epic-1-foundation.md) +- [Epic 2: Content Ingestion & Generation](prd/epic-2-content-generation.md) +- [Epic 3: Pre-Deployment, URL Generation & Interlinking](prd/epic-3-pre-deployment.md) +- [Epic 4: Cloud Deployment & Handoff](prd/epic-4-deployment.md) + +## Change Log + +*To be maintained as the project evolves* diff --git a/docs/prd/epic-1-foundation.md b/docs/prd/epic-1-foundation.md new file mode 100644 index 0000000..a091918 --- /dev/null +++ b/docs/prd/epic-1-foundation.md @@ -0,0 +1,76 @@ +# Epic 1: Foundation & Core Services + +## Epic Goal +Establish the project's foundational infrastructure, including the database, user authentication, the internal API, a system for managing custom domain mappings, and the initial CI/CD pipeline. + +## Stories + +### Story 1.1: Project Initialization & Configuration +**As a developer**, I want to set up the monorepo structure, initialize the Python project with core dependencies, and establish the .env and JSON configuration file handling, so that there is a stable foundation for all future development. + +**Acceptance Criteria** +- A monorepo directory structure is created. +- A primary Python application is initialized with necessary libraries (e.g., Flask/FastAPI, SQLAlchemy). +- A .env.example file is created with placeholders for all required secrets (database path, cloud API keys). +- The application can successfully load and parse a master JSON configuration file. +- The project is initialized as a Git repository with an initial commit. + +**User Tasks** +After this story is complete, copy .env.example to .env and populate it with the actual API keys and secrets for all required cloud services. 
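The configuration handling in Story 1.1 could take roughly this shape. The tech stack specifies Pydantic for validation and python-dotenv for `.env` handling; plain dataclasses stand in here to keep the sketch dependency-free, and the field names (`database_path`, `content_rules`) are illustrative, not the final schema:

```python
import json
import os
from dataclasses import dataclass

@dataclass
class AppConfig:
    database_path: str
    content_rules: dict

def load_config(json_path: str) -> AppConfig:
    """Load non-secret settings from master JSON; secrets from the environment."""
    with open(json_path) as f:
        raw = json.load(f)
    # Secrets (API keys, DB path) come from .env / the environment,
    # never from the JSON file checked into the repository.
    database_path = os.environ.get("DATABASE_PATH", "app.db")
    return AppConfig(database_path=database_path,
                     content_rules=raw.get("content_rules", {}))
```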
+ +### Story 1.2: Database Setup & User Model +**As a developer**, I want to implement the SQLite database connection and create the data model for Users using a database-agnostic ORM/data access layer, so that user information can be stored and managed securely. + +**Acceptance Criteria** +- The application successfully connects to a local SQLite database file specified in the .env file. +- A User model is created that includes fields for username and a hashed password. +- The data access layer (Repository Pattern) is implemented for the User model. +- A script can be run to create the initial database schema from the models. +- The User model correctly defines "Admin" and "User" roles. + +### Story 1.3: User Authentication System +**As an Admin**, I want a secure system for user login, so that only authenticated users can access the application's functions. + +**Acceptance Criteria** +- A function exists to securely hash and store user passwords. +- An authentication mechanism is implemented that validates a given username and password against the stored hash. +- Upon successful login, the system can identify the user's role (Admin or User). +- Authentication fails for incorrect credentials. + +### Story 1.4: Internal API Foundation +**As a developer**, I want to create a basic, secured REST API endpoint, so that the foundation for inter-service communication is established. + +**Acceptance Criteria** +- A basic REST API (using Flask/FastAPI) is created within the application. +- A simple health check endpoint (e.g., /health) is available and returns a 200 OK status. +- API endpoints require authentication to be accessed. +- The API is designed to be extensible for future internal use. + +### Story 1.5: Command-Line User Management +**As an Admin**, I want command-line tools to add and remove users, so that I can manage system access for the MVP. 
+ +**Acceptance Criteria** +- A CLI command exists to create a new user with a specified username, password, and role (Admin or User). +- A CLI command exists to delete a user by their username. +- The commands can only be run by an authenticated "Admin". +- Appropriate feedback is provided to the console upon success or failure of the commands. + +### Story 1.6: FQDN Bucket Mapping Management +**As an Admin**, I want a database table and CLI commands to map cloud bucket names to custom FQDNs, so that the system can generate correct public URLs for interlinking. + +**Acceptance Criteria** +- A new database table, fqdn_mappings, is created with columns for bucket_name, provider, and fqdn. +- A CLI command exists for an Admin to add a new mapping (e.g., add-fqdn-mapping --bucket my-bunny-bucket --provider bunny --fqdn www.mycustomdomain.com). +- A CLI command exists for an Admin to remove a mapping. +- A CLI command exists for an Admin to list all current mappings. +- The commands are protected and can only be run by an authenticated "Admin". + +### Story 1.7: CI/CD Pipeline Setup +**As a developer**, I want a basic CI/CD pipeline configured for the project, so that code changes are automatically tested and can be deployed consistently. + +**Acceptance Criteria** +- A CI/CD configuration file (e.g., .github/workflows/ci.yml) is created in the repository. +- The pipeline automatically triggers on pushes or pull requests to the main branch. +- A pipeline stage successfully installs all project dependencies from requirements.txt. +- A subsequent stage runs all unit and integration tests. +- The pipeline accurately reports a success or failure status based on the test results. 
diff --git a/docs/prd/epic-2-content-generation.md b/docs/prd/epic-2-content-generation.md new file mode 100644 index 0000000..fc1f710 --- /dev/null +++ b/docs/prd/epic-2-content-generation.md @@ -0,0 +1,56 @@ +# Epic 2: Content Ingestion & Generation + +## Epic Goal +Implement the core workflow for ingesting CORA data and using AI to generate and format content into HTML that adheres to specific quality and SEO standards. + +## Stories + +### Story 2.1: CORA Report Data Ingestion +**As a User**, I want to run a script that ingests a CORA .xlsx file, so that a new project is created in the database with the necessary SEO data, including keywords, entities, related searches, and optional anchor text overrides. + +**Acceptance Criteria** +- A CLI command is available to accept the path to a CORA .xlsx file. +- The script correctly extracts the specified data points from the spreadsheet (main keyword, entities, related searches, etc.). +- The script must also check for and store any optional, explicitly defined anchor text provided in the spreadsheet. +- A new project record is created in the database, associated with the authenticated user. +- The extracted SEO data is stored correctly in the new project record. +- The script handles errors gracefully if the file is not found or is in an incorrect format. + +### Story 2.2: Configurable Content Rule Engine +**As an Admin**, I want to define specific content structure and quality rules in the master configuration, so that all AI-generated content consistently meets my SEO and quality standards. + +**Acceptance Criteria** +- The system must load a "content_rules" object from the master JSON configuration file. +- The rule engine must validate that the
`<title>` tag contains the main keyword from the project's data.
+- The engine must validate that at least one `<h2>` tag starts with the main keyword.
+- The engine must validate that other `<h2>` tags incorporate entities and related searches from the project's data.
+- The engine must validate that at least one `<p>` tag starts with the main keyword, and that others contain a mix of the keyword, entities, and related searches.
+- The engine must validate a dedicated FAQ section where each question is an `<h3>` tag.
+- The engine must enforce that the answer text for each FAQ `<h3>`
begins by restating the question. +- For any AI-generated images, the engine must validate that the alt text contains the main keyword and associated entities. +- For interlinks, the engine must use the explicitly provided anchor text from the project data if it exists. +- If no explicit anchor text is provided, the engine must generate a default anchor text using a combination of the linked article's main keyword, entities, and related searches. +- The anchor text for the link to the home page must be the custom FQDN if one is mapped; otherwise, it must be the main keyword of the site/bucket. +- The anchor text for the link to the existing random article should be that article's main keyword. + +### Story 2.3: AI-Powered Content Generation +**As a User**, I want to execute a job for a project that uses AI to generate a title, an outline, and full-text content, so that the core content is created automatically. + +**Acceptance Criteria** +- A script can be initiated for a specific project ID. +- The script uses the project's SEO data to prompt an AI model for a title, an outline, and the main body content. +- The content generation process must apply and validate against the rules defined and loaded by the content rule engine (Story 2.2). +- The generated title, outline, and text are stored and associated with the project in the database. +- The process logs its progress (e.g., "Generating title...", "Generating content..."). +- The script can handle potential API errors from the AI service. + +### Story 2.4: HTML Formatting with Multiple Templates +**As a developer**, I want a module that takes the generated text content and formats it into a standard HTML file using one of a few predefined CSS templates, assigning one template per bucket/subdomain, so that all deployed content has a consistent look and feel per site. + +**Acceptance Criteria** +- A directory of multiple, predefined HTML/CSS templates exists. 
+- The master JSON configuration file maps a specific template to each deployment target (e.g., S3 bucket, subdomain). +- A function accepts the generated content and a target identifier (e.g., bucket name). +- The function correctly selects and applies the appropriate template based on the configuration mapping. +- The content is structured into a valid HTML document with the selected CSS. +- The final HTML content is stored and associated with the project in the database. diff --git a/docs/prd/epic-3-pre-deployment.md b/docs/prd/epic-3-pre-deployment.md new file mode 100644 index 0000000..5e82c03 --- /dev/null +++ b/docs/prd/epic-3-pre-deployment.md @@ -0,0 +1,37 @@ +# Epic 3: Pre-Deployment, URL Generation & Interlinking + +## Epic Goal +To validate cloud storage targets, pre-calculate all final content URLs for a batch, and inject the required interlinks into the generated HTML content before deployment. + +## Stories + +### Story 3.1: Cloud Bucket Validation and Creation +**As a developer**, I want a script that can check if a cloud storage bucket exists and create it if it doesn't, so that I can guarantee a valid deployment target before generating final URLs. + +**Acceptance Criteria** +- The script accepts a target bucket name and cloud provider. +- It first checks if the bucket already exists and is accessible with our credentials. +- If the bucket does not exist, it attempts to create it. +- The script returns a success status and the bucket's base URL if the bucket is ready. +- The script returns a clear error and halts the process if the bucket name is taken or creation fails. + +### Story 3.2: Batch URL Generation and Mapping +**As a developer**, I want a module that generates the complete list of final URLs for all new articles in a batch, so that I have a map of all links needed for the interlinking process. + +**Acceptance Criteria** +- The module takes a list of all generated article titles for a project batch. 
+- It generates a predictable filename for each article (e.g., from the title). +- Using the validated bucket base URL (from Story 3.1), it constructs the URL path for every new article. +- When constructing the final URL, the module MUST first check the fqdn_mappings table. If a mapping exists for the target bucket, the custom FQDN is used as the base URL. Otherwise, the default provider base URL is used. +- The module queries the target to find the URL of one random existing article (if any exist). +- The module identifies the URL for the bucket's home page (index file). +- It returns a complete "link map" object containing all new URLs, the existing article URL, and the home page URL. + +### Story 3.3: Content Interlinking Injection +**As a User**, I want the system to automatically insert a "wheel" of links into each new article, so that all content in the batch is interconnected for SEO purposes. + +**Acceptance Criteria** +- A script takes the generated HTML content (from Epic 2) and the "link map" (from Story 3.2). +- For each article, it correctly injects links to the next and previous articles in the batch, creating a "wheel" (e.g., Article 1 links to 2, 2 links to 1 & 3, 3 links to 2 & 4...). +- Each article must also contain a link to the bucket's home page and the randomly selected existing article URL from the link map. +- The script produces the final, interlinked HTML content, ready for deployment. diff --git a/docs/prd/epic-4-deployment.md b/docs/prd/epic-4-deployment.md new file mode 100644 index 0000000..2e02be5 --- /dev/null +++ b/docs/prd/epic-4-deployment.md @@ -0,0 +1,23 @@ +# Epic 4: Cloud Deployment & Handoff + +## Epic Goal +To deploy the finalized, interlinked HTML files to their cloud targets, log the results, and hand off the data to the link-building machine. 
+ +## Stories + +### Story 4.1: Deploy Finalized HTML Content +**As a User**, I want to deploy the fully interlinked HTML files to their pre-validated cloud storage locations, so that the content goes live. + +**Acceptance Criteria** +- A script takes the final, interlinked HTML content for an article and its corresponding final URL. +- It uses the unified cloud deployment module to upload the content to the correct location. +- The deployment is verified as successful. + +### Story 4.2: URL Logging & API Handoff +**As a developer**, I want to log all the pre-determined URLs to the database and transmit the job data via the internal API, so that the workflow is completed and tracked. + +**Acceptance Criteria** +- After successful deployment, the list of pre-determined public URLs from the "link map" is saved to the database. +- The content Tier is correctly recorded for each URL. +- The URLs are appended to the local .txt file. +- The necessary job data, including the list of new URLs, is successfully transmitted to the link-building machine's API endpoint. diff --git a/docs/prd/functional-requirements.md b/docs/prd/functional-requirements.md new file mode 100644 index 0000000..29e3a37 --- /dev/null +++ b/docs/prd/functional-requirements.md @@ -0,0 +1,31 @@ +# Functional Requirements + +## FR1: CORA Data Ingestion +The system must ingest and extract specified data from a CORA .xlsx file to create a new project entry in the database. + +## FR2: AI Content Generation +The system shall use AI to generate titles, outlines, and full-text content based on the ingested data. + +## FR3: HTML Formatting +The system must format the generated text content into a standardized HTML/CSS template. + +## FR4: Multi-Cloud Deployment +A module will automatically deploy the generated HTML files to multiple cloud storage providers, including AWS S3, Bunny.net, Azure Blob Storage, DigitalOcean Spaces, Linode Object Storage, Backblaze B2, and Cloudflare Pages.
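Supporting seven providers behind one module implies a common interface with per-provider adapters. The sketch below is one possible shape, not the finalised design: `StorageProvider`, `S3Provider`, and `deploy` are hypothetical names, and the boto3 client is injected rather than constructed so the adapter can be exercised without real credentials.

```python
from typing import Protocol


class StorageProvider(Protocol):
    """Minimal contract every cloud backend must satisfy."""

    def upload(self, bucket: str, key: str, html: str) -> str:
        """Upload the HTML document and return its public URL."""
        ...


class S3Provider:
    """Adapter for AWS S3; a boto3 S3 client is injected for testability."""

    def __init__(self, client):
        self.client = client

    def upload(self, bucket: str, key: str, html: str) -> str:
        self.client.put_object(
            Bucket=bucket,
            Key=key,
            Body=html.encode("utf-8"),
            ContentType="text/html",
        )
        return f"https://{bucket}.s3.amazonaws.com/{key}"


def deploy(provider: StorageProvider, bucket: str, key: str, html: str) -> str:
    # Callers see only the protocol, never provider-specific SDK details,
    # so Bunny.net, Azure, etc. can be added as further adapters.
    return provider.upload(bucket, key, html)
```

Each additional provider (Bunny.net, Azure Blob Storage, and so on) would be another small class satisfying the same protocol, leaving the calling code unchanged.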
+ +## FR5: Interlinking Module +An interlinking module must be able to query a target subdomain, identify existing content, and inject new interlinks into the HTML. + +## FR6: URL Logging +All generated URLs must be logged to both the central SQLite database and a local .txt file for tracking. The database record must include the content Tier (e.g., distinguishing links to other owned properties). + +## FR7: Internal API +An internal API endpoint must be available to transmit job data to the separate link-building machine. + +## FR8: User Authentication +The system will include a basic user authentication system (username/password) with each user having one of two roles: "Admin" and "User". + +## FR9: Admin User Management +Users with the "Admin" role can manage users (add/remove) and edit the master JSON configuration files. + +## FR10: User Content Management +Users with the "User" role can execute content generation jobs and have the ability to edit the generated content. diff --git a/docs/prd/goals-and-context.md b/docs/prd/goals-and-context.md new file mode 100644 index 0000000..54ccac3 --- /dev/null +++ b/docs/prd/goals-and-context.md @@ -0,0 +1,17 @@ +# Goals and Background Context + +## Goals + +- Reduce the time spent on Tier 1 content creation and deployment by at least 80%. +- Increase the volume of content that can be produced weekly. +- Ensure a consistent, repeatable process for content syndication. +- Enable successful generation and deployment of a content batch with a single command. +- Provide easy tracking for all generated URLs for any given project. + +## Background Context + +The current content creation workflow is manual, slow, and error-prone, limiting the scale of content strategy execution. The proposed solution is a Python-based application that automates the entire process, from ingesting SEO data and generating AI content to deploying the final HTML to multiple cloud providers and logging the results. 
This automated system will significantly increase efficiency, consistency, and output capacity. + +## Change Log + +*To be maintained as the project evolves* diff --git a/docs/prd/non-functional-requirements.md b/docs/prd/non-functional-requirements.md new file mode 100644 index 0000000..847c54a --- /dev/null +++ b/docs/prd/non-functional-requirements.md @@ -0,0 +1,22 @@ +# Non-Functional Requirements + +## NFR1: Python Development +The application and all scripts must be developed in Python. + +## NFR2: SQLite Database +The initial database must be SQLite. + +## NFR3: Database-Agnostic Architecture +The data access layer must be architected (e.g., using the Repository Pattern) to be database-agnostic, simplifying a future migration to a more scalable database like PostgreSQL. + +## NFR4: Environment Variables +All secrets, credentials, and API keys must be stored in and accessed from a .env file, not hardcoded. + +## NFR5: Command-Line Interface +The MVP will be a set of command-line tools; a web-based user interface is explicitly out of scope. + +## NFR6: Backup Protection +The interlinking script must create backups of any existing HTML files before modifying them to prevent data loss. + +## NFR7: Cloud Integration Libraries +Cloud integrations will use specific libraries: Boto3 for AWS S3 and the Azure SDK for Blob Storage. diff --git a/docs/prd/technical-assumptions.md b/docs/prd/technical-assumptions.md new file mode 100644 index 0000000..348d6d4 --- /dev/null +++ b/docs/prd/technical-assumptions.md @@ -0,0 +1,23 @@ +# Technical Assumptions + +## Repository Structure: Monorepo + +The project will use a Monorepo to simplify dependency management and code sharing between modules. + +## Service Architecture + +The application will be built as a single Python application (Monolith) with distinct modules. It will expose an internal REST API (using Flask or FastAPI) for communication. 
While initially consumed only by the link-building machine, this API should be designed with standard practices to allow for future consumption by other internal applications. + +## Testing Requirements + +The project will require both Unit tests for individual functions and Integration tests to ensure modules and cloud services work together correctly. + +## Additional Technical Assumptions and Requests + +- **Primary Language**: Python. +- **Database**: SQLite for the MVP. +- **Data Access**: A database-agnostic data access layer (Repository Pattern) must be used to facilitate a future migration to PostgreSQL. +- **Cloud Integration**: The system will use Boto3 for AWS S3 and the Azure SDK for Blob Storage, and will refactor existing code for Bunny.net and the other specified providers. +- **Configuration**: A central JSON file will manage the workflow for each project. +- **Credentials**: All secrets and API keys will be stored in and accessed from a .env file. +- **API Design for Future Use**: The internal REST API must be designed with clean, well-documented endpoints and data contracts, anticipating potential future use by other internal systems on the network.
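A well-documented data contract is the core of that API design. One minimal sketch of the job-handoff payload, assuming illustrative field names (`project_id`, `tier`, `urls` are guesses, not the finalised schema):

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class JobHandoff:
    """Payload transmitted to the link-building machine's API endpoint."""

    project_id: int
    tier: str                               # content Tier recorded per FR6
    urls: list = field(default_factory=list)  # final public URLs from the link map

    def to_json(self) -> str:
        # A stable, documented serialisation is the contract itself;
        # sorted keys keep the wire format deterministic for consumers.
        return json.dumps(asdict(self), sort_keys=True)
```

Whether the endpoint is served by Flask or FastAPI, pinning the payload shape down early (and versioning it) is what lets other internal systems consume the API later without coordination overhead.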