Story 2.8: Simple Spreadsheet Ingestion
Overview
Implement a simplified spreadsheet ingestion path that allows users to quickly create projects from basic data without requiring a full CORA report. This addresses the need for faster project setup when a full CORA run (20-25 minutes) is unnecessary.
Story Details
As a User, I want to ingest a simple spreadsheet with minimal required data, so that I can quickly create a project for content generation without waiting for a full CORA analysis.
Context
A full CORA run takes 20-25 minutes and includes extensive metrics. Sometimes users only need to supply a few values that fit in a handful of spreadsheet cells. Eventually this data will be entered via a web form, but for now a simpler spreadsheet format is needed.
Acceptance Criteria
1. CLI Command to Ingest Simple Spreadsheets
Status: PENDING
A CLI command exists to accept simple .xlsx file paths:
- Command: ingest-simple
- Options: --file, --name (optional, overrides spreadsheet), --money-site-url, --username, --password
- Requires user authentication (any authenticated user can create projects)
- Returns success message with project details
2. Spreadsheet Format
Status: PENDING
The parser accepts a simple single-sheet spreadsheet format:
- First row: Headers (column names)
- Second row: Data values
Required columns:
- main_keyword: Single phrase keyword (e.g., "shaft machining")
- project_name: Name for the project
- related_searches: Comma-delimited list (e.g., "term1, term2, term3")
- entities: Comma-delimited list (e.g., "entity1, entity2, entity3")
Optional columns:
- word_count: Integer (default: 1500)
- term_frequency: Integer (default: 3)
3. Data Parsing
Status: PENDING
The parser correctly extracts and processes data:
- Parses comma-delimited related_searches into array
- Parses comma-delimited entities into array
- Applies defaults for optional fields (word_count=1500, term_frequency=3)
- Sets all structure metrics (title_exact_match, h1_exact, h2_total, etc.) to None
- Validates required fields are present
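The parsing rules above can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: build_project_data is a hypothetical helper name, the input is assumed to be a header-to-value dict for the single data row, and only the three structure metrics named in this story are listed.

```python
from typing import Any, Dict, List

# Defaults taken from the "Optional columns" list in this story.
DEFAULTS = {"word_count": 1500, "term_frequency": 3}

# Only the metrics named in this story; the real Project model has more.
STRUCTURE_METRICS = ("title_exact_match", "h1_exact", "h2_total")


def parse_comma_delimited(value: Any) -> List[str]:
    """Split a cell value on commas, trimming whitespace and dropping empty items."""
    if value is None:
        return []
    return [item.strip() for item in str(value).split(",") if item.strip()]


def build_project_data(row: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one header->value row into project data (hypothetical helper)."""
    data: Dict[str, Any] = {
        "main_keyword": row.get("main_keyword"),
        "project_name": row.get("project_name"),
        "related_searches": parse_comma_delimited(row.get("related_searches")),
        "entities": parse_comma_delimited(row.get("entities")),
    }
    for field, default in DEFAULTS.items():
        raw = row.get(field)
        # Missing or blank cells fall back to the documented default.
        data[field] = default if raw in (None, "") else int(raw)
    for metric in STRUCTURE_METRICS:
        data[metric] = None  # simple ingestion carries no structure metrics
    return data
```

Dropping empty items means a cell such as "term1, , term2" yields two entries rather than three, which matches the "treated as empty arrays" rule for blank lists.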
4. Database Storage
Status: PENDING
Project records are created with all data:
- User association (user_id foreign key)
- Main keyword and project name
- Word count and term frequency (with defaults)
- Entities and related searches as JSON arrays
- Structure metrics as NULL (not required for simple ingestion)
- Money site URL (prompted if not provided)
- Timestamps (created_at, updated_at)
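To make the stored shape concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names are assumptions for illustration only; the real project uses the existing Project model and ProjectRepository, whose schema is not shown in this story.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical, cut-down schema; the real Project model has more columns.
SCHEMA = """
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    main_keyword TEXT NOT NULL,
    word_count INTEGER NOT NULL,
    term_frequency INTEGER NOT NULL,
    entities TEXT NOT NULL,          -- JSON array
    related_searches TEXT NOT NULL,  -- JSON array
    h2_total INTEGER,                -- structure metric, NULL for simple ingestion
    money_site_url TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
)
"""


def insert_project(conn, user_id, data, money_site_url):
    """Insert one simple-ingestion project and return its new row ID."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO projects (user_id, name, main_keyword, word_count,"
        " term_frequency, entities, related_searches, h2_total,"
        " money_site_url, created_at, updated_at)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            user_id,
            data["project_name"],
            data["main_keyword"],
            data["word_count"],
            data["term_frequency"],
            json.dumps(data["entities"]),          # list -> JSON array text
            json.dumps(data["related_searches"]),  # list -> JSON array text
            None,                                  # structure metric stays NULL
            money_site_url,
            now,
            now,
        ),
    )
    conn.commit()
    return cur.lastrowid
```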
5. Error Handling
Status: PENDING
Graceful error handling for:
- File not found errors
- Invalid Excel file format
- Missing required columns (main_keyword, project_name)
- Empty or invalid comma-delimited lists (treated as empty arrays)
- Authentication failures
- Database errors
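One way the file and required-field checks might look is sketched below. SimpleSpreadsheetError, check_file, and validate_required are hypothetical names, and the "File not found" and "Not an .xlsx file" wording is an assumption; only the "Required field '...' not found" message comes from this story.

```python
from pathlib import Path
from typing import Any, Dict


class SimpleSpreadsheetError(Exception):
    """Hypothetical exception type for simple-spreadsheet parsing failures."""


# Fields the error-handling list names as required.
REQUIRED_FIELDS = ("main_keyword", "project_name")


def check_file(file_path: str) -> Path:
    """Fail early, with a readable message, before trying to open the workbook."""
    path = Path(file_path)
    if not path.is_file():
        raise SimpleSpreadsheetError(f"File not found: {file_path}")
    if path.suffix.lower() != ".xlsx":
        raise SimpleSpreadsheetError(f"Not an .xlsx file: {file_path}")
    return path


def validate_required(row: Dict[str, Any]) -> None:
    """Raise if a required field is absent or blank in the data row."""
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if value is None or str(value).strip() == "":
            raise SimpleSpreadsheetError(f"Required field '{field}' not found")
```

Raising a single exception type lets the CLI catch it in one place and print "Error parsing spreadsheet: ..." without a traceback.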
Implementation Details
Files to Create/Modify
1. src/ingestion/parser.py - UPDATED
Add SimpleSpreadsheetParser class:
from typing import Any, Dict, List

class SimpleSpreadsheetParser:
    """Parser for simple single-sheet spreadsheets with basic project data"""

    def __init__(self, file_path: str): ...
    def _parse_comma_delimited(self, value: Any) -> List[str]: ...
    def parse(self) -> Dict[str, Any]: ...
Key Features:
- Reads first sheet of workbook
- First row as headers (case-insensitive)
- Second row as data values
- Parses comma-delimited strings into arrays
- Applies defaults for optional fields
- Returns data structure compatible with ProjectRepository.create()
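The case-insensitive header handling could be sketched as below. rows_to_record is a hypothetical helper that takes the first two rows as plain sequences; the story does not name the Excel library, but if openpyxl is used, the rows could come from ws.iter_rows(min_row=1, max_row=2, values_only=True).

```python
from typing import Any, Dict, Sequence


def rows_to_record(header_row: Sequence[Any], data_row: Sequence[Any]) -> Dict[str, Any]:
    """Pair row 1 (headers) with row 2 (values), matching headers case-insensitively."""
    record: Dict[str, Any] = {}
    for index, header in enumerate(header_row):
        if header is None or not str(header).strip():
            continue  # skip blank header cells
        key = str(header).strip().lower()
        # A short data row leaves trailing columns as None.
        record[key] = data_row[index] if index < len(data_row) else None
    if not record:
        raise ValueError("No headers found in spreadsheet")
    return record
```

Normalizing headers once, up front, means the rest of the parser can look up main_keyword or word_count without caring how the user capitalized the column names.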
2. src/cli/commands.py - UPDATED
Add ingest-simple command:
@app.command()
@click.option('--file', '-f', required=True)
@click.option('--name', '-n', help='Override project_name from spreadsheet')
@click.option('--money-site-url', '-m')
@click.option('--username', '-u')
@click.option('--password', '-p')
def ingest_simple(file, name, money_site_url, username, password): ...
Features:
- Authenticate user
- Parse simple spreadsheet
- Display parsed data summary
- Prompt for money_site_url if not provided
- Create project via ProjectRepository
- Show success summary
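The override and prompt steps in that flow might be factored out like this. apply_cli_overrides is a hypothetical helper, not part of the existing CLI; the prompt callable is injected so the logic can be exercised without a terminal, and the prompt text is taken from the expected output in this story.

```python
from typing import Any, Callable, Dict, Optional


def apply_cli_overrides(
    parsed: Dict[str, Any],
    name: Optional[str] = None,
    money_site_url: Optional[str] = None,
    prompt: Callable[[str], str] = input,
) -> Dict[str, Any]:
    """Merge CLI options into parsed spreadsheet data (hypothetical helper)."""
    project = dict(parsed)  # copy so the parser output is left untouched
    if name:
        # --name wins over the project_name column in the spreadsheet.
        project["project_name"] = name
    if not money_site_url:
        # Ask only when --money-site-url was not supplied.
        money_site_url = prompt(
            "Enter money site URL (required for tiered linking): "
        ).strip()
    project["money_site_url"] = money_site_url
    return project
```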
Data Model
Uses existing Project model - no database changes required. Structure metrics will be NULL for simple ingestion projects.
Spreadsheet Example
Simple Format:
| main_keyword | project_name | related_searches | entities | word_count | term_frequency |
|---|---|---|---|---|---|
| best coffee makers | Coffee Project | best espresso machines, coffee maker reviews, top coffee makers | coffee, espresso, brewing | 1500 | 3 |
Minimal Format (uses defaults):
| main_keyword | project_name | related_searches | entities |
|---|---|---|---|
| shaft machining | Machining Project | CNC machining, precision machining | machining, lathe, milling |
CLI Usage
Basic:
python main.py ingest-simple \
  --file simple_project.xlsx \
  --username admin \
  --password pass
With Overrides:
python main.py ingest-simple \
  --file simple_project.xlsx \
  --name "Custom Project Name" \
  --money-site-url https://example.com \
  --username admin \
  --password pass
Expected Output:
Authenticated as: admin (Admin)
Parsing simple spreadsheet: simple_project.xlsx
Main Keyword: best coffee makers
Project Name: Coffee Project
Word Count: 1500
Term Frequency: 3
Entities: 3
Related Searches: 3
Entities: coffee, espresso, brewing
Related Searches: best espresso machines, coffee maker reviews, top coffee makers
Enter money site URL (required for tiered linking): https://moneysite.com
Creating project: Coffee Project
Money Site URL: https://moneysite.com
Success: Project 'Coffee Project' created (ID: 1)
Main Keyword: best coffee makers
Money Site URL: https://moneysite.com
Word Count: 1500
Term Frequency: 3
Entities: 3
Related Searches: 3
Error Handling Examples
Missing Required Column:
Error parsing spreadsheet: Required field 'main_keyword' not found
Invalid File:
Error parsing spreadsheet: Failed to open Excel file: [details]
Empty Spreadsheet:
Error parsing spreadsheet: No headers found in spreadsheet
Dependencies
- Story 2.1 (CORA ingestion) - Reuses ProjectRepository and Project model
- Existing authentication system
- Existing database models
Future Enhancements
- Support for multiple projects per spreadsheet (multiple data rows)
- CSV format support (in addition to Excel)
- Web form interface (deferred to future story)
- Validation of comma-delimited format with better error messages