# Story 2.8: Simple Spreadsheet Ingestion

## Overview

Implement a simplified spreadsheet ingestion path that allows users to quickly create projects from basic data without requiring a full CORA report. This addresses the need for faster project setup when a full CORA run (20-25 minutes) is unnecessary.

## Story Details

**As a User**, I want to ingest a simple spreadsheet with minimal required data, so that I can quickly create a project for content generation without waiting for a full CORA analysis.

## Context

A full CORA run takes 20-25 minutes and includes extensive metrics. Sometimes users only need to add information from a few cells they pasted into a spreadsheet. Eventually this will be entered via a webform, but for now a simpler spreadsheet format is needed.

## Acceptance Criteria

### 1. CLI Command to Ingest Simple Spreadsheets

**Status:** PENDING

A CLI command exists to accept simple .xlsx file paths:

- Command: `ingest-simple`
- Options: `--file`, `--name` (optional, overrides spreadsheet), `--money-site-url`, `--username`, `--password`
- Requires user authentication (any authenticated user can create projects)
- Returns success message with project details

### 2. Spreadsheet Format

**Status:** PENDING

The parser accepts a simple single-sheet spreadsheet format:

- **First row**: Headers (column names)
- **Second row**: Data values

**Required columns:**

- `main_keyword`: Single phrase keyword (e.g., "shaft machining")
- `project_name`: Name for the project
- `related_searches`: Comma-delimited list (e.g., "term1, term2, term3")
- `entities`: Comma-delimited list (e.g., "entity1, entity2, entity3")

**Optional columns:**

- `word_count`: Integer (default: 1500)
- `term_frequency`: Integer (default: 3)

### 3. Data Parsing

**Status:** PENDING

The parser correctly extracts and processes data:

- Parses comma-delimited `related_searches` into an array
- Parses comma-delimited `entities` into an array
- Applies defaults for optional fields (word_count=1500, term_frequency=3)
- Sets all structure metrics (title_exact_match, h1_exact, h2_total, etc.) to `None`
- Validates that required fields are present

### 4. Database Storage

**Status:** PENDING

Project records are created with all data:

- User association (user_id foreign key)
- Main keyword and project name
- Word count and term frequency (with defaults)
- Entities and related searches as JSON arrays
- Structure metrics as `NULL` (not required for simple ingestion)
- Money site URL (prompted if not provided)
- Timestamps (created_at, updated_at)

### 5. Error Handling

**Status:** PENDING

Graceful error handling for:

- File not found errors
- Invalid Excel file format
- Missing required columns (main_keyword, project_name)
- Empty or invalid comma-delimited lists (treated as empty arrays)
- Authentication failures
- Database errors

## Implementation Details

### Files to Create/Modify

#### 1. `src/ingestion/parser.py` - UPDATED

Add `SimpleSpreadsheetParser` class:

```python
from typing import Any, Dict, List


class SimpleSpreadsheetParser:
    """Parser for simple single-sheet spreadsheets with basic project data"""

    def __init__(self, file_path: str): ...

    def _parse_comma_delimited(self, value: Any) -> List[str]: ...

    def parse(self) -> Dict[str, Any]: ...
```

**Key Features:**

- Reads first sheet of workbook
- First row as headers (case-insensitive)
- Second row as data values
- Parses comma-delimited strings into arrays
- Applies defaults for optional fields
- Returns data structure compatible with `ProjectRepository.create()`

#### 2. `src/cli/commands.py` - UPDATED

Add `ingest-simple` command:

```python
@app.command()
@click.option('--file', '-f', required=True)
@click.option('--name', '-n', help='Override project_name from spreadsheet')
@click.option('--money-site-url', '-m')
@click.option('--username', '-u')
@click.option('--password', '-p')
def ingest_simple(file, name, money_site_url, username, password): ...
```

**Features:**

- Authenticate user
- Parse simple spreadsheet
- Display parsed data summary
- Prompt for money_site_url if not provided
- Create project via ProjectRepository
- Show success summary

### Data Model

Uses existing `Project` model - no database changes required. Structure metrics will be `NULL` for simple ingestion projects.

### Spreadsheet Example

**Simple Format:**

| main_keyword | project_name | related_searches | entities | word_count | term_frequency |
|-------------|--------------|------------------|----------|------------|----------------|
| best coffee makers | Coffee Project | best espresso machines, coffee maker reviews, top coffee makers | coffee, espresso, brewing | 1500 | 3 |

**Minimal Format (uses defaults):**

| main_keyword | project_name | related_searches | entities |
|-------------|--------------|------------------|----------|
| shaft machining | Machining Project | CNC machining, precision machining | machining, lathe, milling |

## CLI Usage

**Basic:**

```bash
python main.py ingest-simple \
  --file simple_project.xlsx \
  --username admin \
  --password pass
```

**With Overrides:**

```bash
python main.py ingest-simple \
  --file simple_project.xlsx \
  --name "Custom Project Name" \
  --money-site-url https://example.com \
  --username admin \
  --password pass
```

**Expected Output:**

```
Authenticated as: admin (Admin)
Parsing simple spreadsheet: simple_project.xlsx
Main Keyword: best coffee makers
Project Name: Coffee Project
Word Count: 1500
Term Frequency: 3
Entities: 3
Related Searches: 3
Entities: coffee, espresso, brewing
Related Searches: best espresso machines, coffee maker reviews, top coffee makers
Enter money site URL (required for tiered linking): https://moneysite.com
Creating project: Coffee Project
Money Site URL: https://moneysite.com
Success: Project 'Coffee Project' created (ID: 1)
Main Keyword: best coffee makers
Money Site URL: https://moneysite.com
Word Count: 1500
Term Frequency: 3
Entities: 3
Related Searches: 3
```

## Error Handling Examples

**Missing Required Column:**

```
Error parsing spreadsheet: Required field 'main_keyword' not found
```

**Invalid File:**

```
Error parsing spreadsheet: Failed to open Excel file: [details]
```

**Empty Spreadsheet:**

```
Error parsing spreadsheet: No headers found in spreadsheet
```

## Dependencies

- Story 2.1 (CORA ingestion) - Reuses ProjectRepository and Project model
- Existing authentication system
- Existing database models

## Future Enhancements

- Support for multiple projects per spreadsheet (multiple data rows)
- CSV format support (in addition to Excel)
- Web form interface (deferred to future story)
- Validation of comma-delimited format with better error messages
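
## Illustrative Parsing Sketch

The parsing rules in Acceptance Criteria 2, 3 and 5 can be sketched as two small functions that work on an already-read header row and data row. This is a minimal sketch, not the actual `SimpleSpreadsheetParser`: the function names `parse_comma_delimited` and `build_project_data`, the constant names, and the decision to treat only `main_keyword` and `project_name` as hard failures (per the Error Handling criteria) are assumptions made for illustration. Only the three structure metrics named in this story are listed; the real `Project` model has more. Reading the workbook itself is left out so the sketch has no third-party dependencies.

```python
from typing import Any, Dict, List, Sequence

# Assumption: only these two fields raise an error when missing;
# missing list columns fall back to empty arrays.
REQUIRED_FIELDS = ("main_keyword", "project_name")
DEFAULTS = {"word_count": 1500, "term_frequency": 3}
# Only the metrics named in the story; the real list is longer.
STRUCTURE_METRICS = ("title_exact_match", "h1_exact", "h2_total")


def parse_comma_delimited(value: Any) -> List[str]:
    """Split a cell value on commas, trimming whitespace and dropping empties."""
    if value is None:
        return []
    return [item.strip() for item in str(value).split(",") if item.strip()]


def build_project_data(headers: Sequence[Any], values: Sequence[Any]) -> Dict[str, Any]:
    """Combine a header row and a data row into a project data dict."""
    # Map lower-cased header names to cell values (case-insensitive headers).
    row: Dict[str, Any] = {}
    for index, header in enumerate(headers):
        key = str(header).strip().lower() if header is not None else ""
        if key:
            row[key] = values[index] if index < len(values) else None
    if not row:
        raise ValueError("No headers found in spreadsheet")

    # Required fields must be present and non-blank.
    for field in REQUIRED_FIELDS:
        cell = row.get(field)
        if cell is None or not str(cell).strip():
            raise ValueError(f"Required field '{field}' not found")

    data: Dict[str, Any] = {
        "main_keyword": str(row["main_keyword"]).strip(),
        "project_name": str(row["project_name"]).strip(),
        "related_searches": parse_comma_delimited(row.get("related_searches")),
        "entities": parse_comma_delimited(row.get("entities")),
    }

    # Optional integer fields fall back to their defaults when blank or absent.
    for field, default in DEFAULTS.items():
        cell = row.get(field)
        blank = cell is None or str(cell).strip() == ""
        data[field] = default if blank else int(cell)

    # Structure metrics are not available from a simple spreadsheet.
    for metric in STRUCTURE_METRICS:
        data[metric] = None
    return data
```

Called with the headers and values from the minimal format example, `build_project_data` would return `word_count` and `term_frequency` at their defaults and `entities` as a three-item list. A non-numeric `word_count` cell raises `ValueError` from `int()`, which a real parser would likely want to wrap in a friendlier message.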