# Story 2.2: Configurable Content Rule Engine

## Overview

Implementation of a CORA-compliant content validation engine that ensures AI-generated HTML meets both universal quality standards and project-specific CORA targets.

## Status

**COMPLETED**

## Implementation Details

### 1. Database Changes

- Added `tier` field to `Project` model (default=1, indexed)
- Created migration script: `scripts/add_tier_to_projects.py`
- Tier 1 = strictest validation (default)
- Tier 2+ = warnings only for CORA target misses

### 2. Configuration Updates

**File:** `master.config.json`

Restructured `content_rules` with two validation levels:

**Universal Rules** (apply to all tiers, hard failures):

- `min_content_length`: 1000 words minimum
- `max_content_length`: 5000 words maximum
- `title_exact_match_required`: Title must contain main keyword
- `h1_exact_match_required`: H1 must contain main keyword
- `h2_exact_match_min`: At least 1 H2 with main keyword
- `h3_exact_match_min`: At least 1 H3 with main keyword
- `faq_section_required`: Must include FAQ section
- `faq_question_restatement_required`: FAQ answers must restate the question
- `image_alt_text_keyword_required`: Alt text must contain keyword
- `image_alt_text_entity_required`: Alt text must contain entities

**CORA Validation Config**:

- `enabled`: Toggle CORA validation on/off
- `tier_1_strict`: Tier 1 fails on CORA target misses
- `tier_2_plus_warn_only`: Tier 2+ only warns
- `round_averages_down`: Round CORA averages down (e.g., 5.6 → 5)

### 3. Core Rule Engine

**File:** `src/generation/rule_engine.py`

**Classes:**

- `ValidationIssue`: Single validation error or warning
- `ValidationResult`: Complete validation result with errors/warnings
- `ContentHTMLParser`: Extracts structure from HTML (H1/H2/H3/images/links/text)
- `ContentRuleEngine`: Main validation engine

**Key Features:**

- HTML parsing and element extraction
- Keyword/entity counting with word boundary matching
- Universal rule validation (hard failures)
- CORA target validation (tier-aware)
- FAQ section detection
- Image alt text validation
- Detailed error/warning reporting

### 4. Config System Updates

**File:** `src/core/config.py`

Added:

- `UniversalRulesConfig` model
- `CORAValidationConfig` model
- Updated `ContentRulesConfig` to use nested structure
- Added `Config.get()` method for dot notation access (e.g., `config.get("content_rules.universal")`)

### 5. Tests

**File:** `tests/unit/test_rule_engine.py`

**21 tests covering:**

- HTML parser functionality (6 tests)
- ValidationResult class (4 tests)
- Universal rules validation (6 tests)
- CORA target validation (4 tests)
- Fully compliant content (1 test)

**All tests passing ✓**

## Usage Example

```python
from src.generation.rule_engine import ContentRuleEngine
from src.core.config import get_config
from src.database.models import Project

# Initialize engine
config = get_config()
engine = ContentRuleEngine(config)

# Validate content
html_content = "..."
project = ...  # load a Project instance from the database
result = engine.validate(html_content, project)

if result.passed:
    print("Content is valid!")
else:
    print(f"Errors: {len(result.errors)}")
    for error in result.errors:
        print(f"  - {error.message}")

# Warnings can be present even when validation passes (e.g., Tier 2+ CORA misses)
print(f"Warnings: {len(result.warnings)}")
for warning in result.warnings:
    print(f"  - {warning.message}")

# Get detailed report
report = result.to_dict()
```

## Validation Logic

### Universal Rules (All Tiers)

1. **Word Count**: Content length between min/max bounds
2. **Title**: Must contain main keyword
3. **H1**: At least one H1 with main keyword
4. **H2/H3 Minimums**: Minimum keyword counts
5. **FAQ**: Must have FAQ section
6. **Images**: Alt text contains keyword + entities

### CORA Targets (Tier-Aware)

For each CORA metric (h1_exact, h2_total, h2_entities, etc.):

- **Tier 1**: FAIL if actual < target (rounded down)
- **Tier 2+**: WARN if actual < target (but pass)

### Keyword Matching

- Case-insensitive
- Word boundary detection (avoids partial matches)
- Supports related searches and entities

## Acceptance Criteria

- ✅ System loads "content_rules" from master JSON configuration
- ✅ Validates H1 tag contains main keyword
- ✅ Validates at least one H2 starts with main keyword
- ✅ Validates other H2s incorporate entities and related searches
- ✅ Validates H3 tags similarly to H2s
- ✅ Validates FAQ section format
- ✅ Validates image alt text contains keyword and entities
- ✅ Tier-based validation (strict for Tier 1, warnings for Tier 2+)
- ✅ Rounds CORA averages down as configured
- ✅ All tests passing (21/21)

## Files Modified

1. `src/database/models.py` - Added tier field to Project
2. `master.config.json` - Restructured content_rules
3. `src/core/config.py` - Added config models and get() method
4. `src/generation/rule_engine.py` - Implemented validation engine
5. `scripts/add_tier_to_projects.py` - Database migration
6. `tests/unit/test_rule_engine.py` - Test suite

## Next Steps (Story 2.3)

The rule engine is ready to be integrated into Story 2.3 (AI-Powered Content Generation):

- Story 2.3 will use this engine to validate AI-generated content
- Retry logic can be implemented for when validation fails
- The engine provides detailed feedback for AI prompt refinement
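
The retry idea above could look roughly like the following. This is a minimal sketch, not the Story 2.3 design: `generate_with_retries`, `StubIssue`, `StubResult`, `fake_generate`, and `fake_validate` are all hypothetical names introduced here for illustration. The only things assumed about the real engine are what the usage example shows, namely that a result exposes `passed`, `errors`, and a `message` on each issue.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class StubIssue:
    """Hypothetical stand-in for ValidationIssue; only `message` is used."""
    message: str


@dataclass
class StubResult:
    """Hypothetical stand-in for ValidationResult."""
    errors: List[StubIssue] = field(default_factory=list)
    warnings: List[StubIssue] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Only errors fail validation; warnings do not.
        return not self.errors


def generate_with_retries(
    generate: Callable[[List[str]], str],
    validate: Callable[[str], StubResult],
    max_attempts: int = 3,
) -> Tuple[str, StubResult, int]:
    """Call `generate` until `validate` passes or attempts run out.

    Error messages from the failed attempt are passed to the next
    `generate` call so the prompt can be refined.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        html = generate(feedback)
        result = validate(html)
        if result.passed:
            break
        feedback = [issue.message for issue in result.errors]
    return html, result, attempt


def fake_generate(feedback: List[str]) -> str:
    # Hypothetical generator: adds an FAQ section once told it is missing.
    if any("FAQ" in message for message in feedback):
        return "<h1>Blue widgets</h1><h2>FAQ</h2>"
    return "<h1>Blue widgets</h1>"


def fake_validate(html: str) -> StubResult:
    # Hypothetical validator enforcing a single rule for the demo.
    result = StubResult()
    if "FAQ" not in html:
        result.errors.append(StubIssue("Missing FAQ section"))
    return result


html, result, attempts = generate_with_retries(fake_generate, fake_validate)
print(attempts, result.passed)  # prints: 2 True
```

With the real engine, `validate` could be a small wrapper such as `lambda html: engine.validate(html, project)`, so the loop itself stays independent of how projects are loaded.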