Story 2.2: Configurable Content Rule Engine

Overview

Implements a CORA-compliant content validation engine that checks whether AI-generated HTML meets both universal quality standards and project-specific CORA targets.

Status

COMPLETED

Implementation Details

1. Database Changes

Added tier field to Project model (default=1, indexed)
Created migration script: scripts/add_tier_to_projects.py
Tier 1 = strictest validation (default)
Tier 2+ = warnings only for CORA target misses
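
As a rough illustration of what such a migration might do, the sketch below adds an indexed tier column with a default of 1. It assumes a SQLite database and a table named projects; neither is confirmed by this document, and the index name ix_projects_tier is made up for the example. The actual scripts/add_tier_to_projects.py may work differently.

```python
import sqlite3


def add_tier_column(conn: sqlite3.Connection) -> None:
    """Add an indexed `tier` column (default 1) if it is not already present."""
    # Table name "projects" is an assumption for this sketch.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(projects)")}
    if "tier" not in columns:
        conn.execute(
            "ALTER TABLE projects ADD COLUMN tier INTEGER NOT NULL DEFAULT 1"
        )
    # Hypothetical index name; IF NOT EXISTS keeps the script re-runnable.
    conn.execute("CREATE INDEX IF NOT EXISTS ix_projects_tier ON projects (tier)")
    conn.commit()


# Demo against an in-memory database with one pre-existing project row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO projects (name) VALUES ('demo')")
add_tier_column(conn)
add_tier_column(conn)  # second run is a no-op
existing_tier = conn.execute("SELECT tier FROM projects").fetchone()[0]
print(existing_tier)
```

Existing rows pick up the default, so projects created before the migration are validated as Tier 1.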

2. Configuration Updates

File: master.config.json

Restructured content_rules with two validation levels:

Universal Rules (apply to all tiers, hard failures):

min_content_length: 1000 words minimum
max_content_length: 5000 words maximum
title_exact_match_required: Title must contain main keyword
h1_exact_match_required: H1 must contain main keyword
h2_exact_match_min: At least 1 H2 with main keyword
h3_exact_match_min: At least 1 H3 with main keyword
faq_section_required: Must include FAQ section
faq_question_restatement_required: FAQ answers restate questions
image_alt_text_keyword_required: Alt text must contain keyword
image_alt_text_entity_required: Alt text must contain entities

CORA Validation Config:

enabled: Toggle CORA validation on/off
tier_1_strict: Tier 1 fails on CORA target misses
tier_2_plus_warn_only: Tier 2+ only warns
round_averages_down: Round CORA averages down (e.g., 5.6 → 5)
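
The two validation levels could be laid out roughly as follows, shown here as JSON parsed in Python. The rule names and the 1000/5000/1 values come from the lists above; the nesting, the cora_validation key name, and the boolean values are assumptions made for illustration.

```python
import json

# Hypothetical excerpt of master.config.json. The "universal" key matches the
# config.get("content_rules.universal") path; "cora_validation" is assumed.
EXAMPLE_CONFIG_JSON = """
{
  "content_rules": {
    "universal": {
      "min_content_length": 1000,
      "max_content_length": 5000,
      "title_exact_match_required": true,
      "h1_exact_match_required": true,
      "h2_exact_match_min": 1,
      "h3_exact_match_min": 1,
      "faq_section_required": true,
      "faq_question_restatement_required": true,
      "image_alt_text_keyword_required": true,
      "image_alt_text_entity_required": true
    },
    "cora_validation": {
      "enabled": true,
      "tier_1_strict": true,
      "tier_2_plus_warn_only": true,
      "round_averages_down": true
    }
  }
}
"""

example_config = json.loads(EXAMPLE_CONFIG_JSON)
universal = example_config["content_rules"]["universal"]
print(universal["min_content_length"], universal["max_content_length"])
```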

3. Core Rule Engine

File: src/generation/rule_engine.py

Classes:

ValidationIssue: Single validation error or warning
ValidationResult: Complete validation result with errors/warnings
ContentHTMLParser: Extracts structure from HTML (H1/H2/H3/images/links/text)
ContentRuleEngine: Main validation engine
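
A minimal sketch of how the two result classes might be shaped, based only on the attributes used in the usage example (passed, errors, warnings, message, to_dict()). The rule_name and severity fields and the add_error/add_warning helpers are hypothetical; the real classes in rule_engine.py may differ.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ValidationIssue:
    """One rule violation. Field names other than `message` are assumptions."""
    rule_name: str
    message: str
    severity: str = "error"  # "error" or "warning"


@dataclass
class ValidationResult:
    errors: List[ValidationIssue] = field(default_factory=list)
    warnings: List[ValidationIssue] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Warnings never fail a result; only errors do.
        return not self.errors

    def add_error(self, rule_name: str, message: str) -> None:
        self.errors.append(ValidationIssue(rule_name, message, "error"))

    def add_warning(self, rule_name: str, message: str) -> None:
        self.warnings.append(ValidationIssue(rule_name, message, "warning"))

    def to_dict(self) -> Dict[str, Any]:
        return {
            "passed": self.passed,
            "errors": [asdict(issue) for issue in self.errors],
            "warnings": [asdict(issue) for issue in self.warnings],
        }
```

Deriving passed from the error list, rather than storing it, keeps the flag from drifting out of sync as issues are added.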

Key Features:

HTML parsing and element extraction
Keyword/entity counting with word boundary matching
Universal rule validation (hard failures)
CORA target validation (tier-aware)
FAQ section detection
Image alt text validation
Detailed error/warning reporting
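
To make the element-extraction step concrete, here is a small sketch built on the standard library's html.parser. It collects title, H1-H3 text, and image alt text only; the class name HeadingAltParser and its attributes are invented for this example and are not the actual ContentHTMLParser interface.

```python
from html.parser import HTMLParser


class HeadingAltParser(HTMLParser):
    """Collects <title>, <h1>-<h3> text and <img> alt attributes."""

    TRACKED = ("title", "h1", "h2", "h3")

    def __init__(self) -> None:
        super().__init__()
        self.found = {tag: [] for tag in self.TRACKED}
        self.image_alts = []
        self._open_tag = None  # tracked tag currently being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._open_tag = tag
            self._buffer = []
        elif tag == "img":
            # A missing alt attribute is recorded as an empty string.
            self.image_alts.append(dict(attrs).get("alt") or "")

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self.found[tag].append("".join(self._buffer).strip())
            self._open_tag = None


parser = HeadingAltParser()
parser.feed(
    "<h1>Best Widgets</h1>"
    "<h2>Why <em>widgets</em> matter</h2>"
    '<img src="a.png" alt="blue widget"><img src="b.png">'
)
print(parser.found["h1"], parser.found["h2"], parser.image_alts)
```

Inline markup inside a heading (the em tag above) is flattened into the heading text, which is what keyword matching needs.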

4. Config System Updates

File: src/core/config.py

Added:

UniversalRulesConfig model
CORAValidationConfig model
Updated ContentRulesConfig to use nested structure
Added Config.get() method for dot notation access (e.g., config.get("content_rules.universal"))
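
Dot-notation access can be sketched as a walk over nested keys. The helper below operates on a plain dict and is named get_by_path for illustration; the real Config.get() works on the config models and may handle defaults and missing keys differently.

```python
from typing import Any


def get_by_path(data: dict, path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as "content_rules.universal" in nested dicts."""
    current: Any = data
    for key in path.split("."):
        # Stop as soon as the path leaves the nested-dict structure.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


sample = {"content_rules": {"universal": {"min_content_length": 1000}}}
print(get_by_path(sample, "content_rules.universal.min_content_length"))
print(get_by_path(sample, "content_rules.missing", default="n/a"))
```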

5. Tests

File: tests/unit/test_rule_engine.py

21 comprehensive tests covering:

HTML parser functionality (6 tests)
ValidationResult class (4 tests)
Universal rules validation (6 tests)
CORA target validation (4 tests)
Fully compliant content (1 test)

All tests passing ✓

Usage Example

from src.generation.rule_engine import ContentRuleEngine
from src.core.config import get_config
from src.database.models import Project

# Initialize engine
config = get_config()
engine = ContentRuleEngine(config)

# Validate content
html_content = "<html>...</html>"
project = ...  # placeholder: load the Project from the database
result = engine.validate(html_content, project)

if result.passed:
    print("Content is valid!")
else:
    print(f"Errors: {len(result.errors)}")
    for error in result.errors:
        print(f"  - {error.message}")
    
    print(f"Warnings: {len(result.warnings)}")
    for warning in result.warnings:
        print(f"  - {warning.message}")

# Get detailed report
report = result.to_dict()

Validation Logic

Universal Rules (All Tiers)

Word Count: Content length between min/max bounds
Title: Must contain main keyword
H1: At least one H1 with main keyword
H2/H3 Minimums: At least h2_exact_match_min H2s and h3_exact_match_min H3s must contain the main keyword
FAQ: Must have FAQ section
Images: Alt text contains keyword + entities
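
As one example of a universal rule, the word-count bound might be checked along these lines. Counting words by splitting on whitespace is an assumption, as is the function name; the engine may count differently. The defaults mirror the 1000/5000 limits from the configuration.

```python
from typing import List


def check_word_count(
    text: str, min_words: int = 1000, max_words: int = 5000
) -> List[str]:
    """Return error messages for text outside the allowed word-count range."""
    count = len(text.split())  # whitespace-separated tokens (assumed definition)
    errors: List[str] = []
    if count < min_words:
        errors.append(f"Content has {count} words; minimum is {min_words}")
    elif count > max_words:
        errors.append(f"Content has {count} words; maximum is {max_words}")
    return errors


print(check_word_count("word " * 999))
```

Because universal rules are hard failures, any message returned here would be recorded as an error regardless of the project's tier.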

CORA Targets (Tier-Aware)

For each CORA metric (h1_exact, h2_total, h2_entities, etc.):

Tier 1: FAIL if actual < target (the target average is rounded down when round_averages_down is enabled)
Tier 2+: WARN if actual < target (but pass)
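
The tier-aware comparison can be sketched as a small function. The name check_cora_target and the string return values are illustrative assumptions, not the engine's actual interface.

```python
import math


def check_cora_target(
    actual: int, target_average: float, tier: int, round_down: bool = True
) -> str:
    """Return "ok", "error" (Tier 1 miss) or "warning" (Tier 2+ miss)."""
    # With rounding enabled, a CORA average of 5.6 becomes a target of 5.
    target = math.floor(target_average) if round_down else target_average
    if actual >= target:
        return "ok"
    return "error" if tier == 1 else "warning"


print(check_cora_target(actual=5, target_average=5.6, tier=1))
print(check_cora_target(actual=4, target_average=5.6, tier=1))
print(check_cora_target(actual=4, target_average=5.6, tier=2))
```

Rounding down matters at the boundary: with a 5.6 average, five occurrences pass when rounding is on and miss the target when it is off.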

Keyword Matching

Case-insensitive
Word boundary detection (avoids partial matches)
Supports related searches and entities
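
Case-insensitive matching with word boundaries might look like the sketch below. It uses lookarounds instead of \b so that phrases starting or ending with punctuation still match; whether the engine uses \b or something else is not stated here, and count_keyword is a made-up name.

```python
import re


def count_keyword(text: str, keyword: str) -> int:
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    pattern = re.compile(
        r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", re.IGNORECASE
    )
    return len(pattern.findall(text))


sample_text = "Cat food for cats. CAT toys, concatenate."
print(count_keyword(sample_text, "cat"))
```

In the sample, "Cat" and "CAT" count, while "cats" and "concatenate" are skipped as partial matches.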

Acceptance Criteria

✅ System loads "content_rules" from master JSON configuration
✅ Validates H1 tag contains main keyword
✅ Validates at least one H2 starts with main keyword
✅ Validates other H2s incorporate entities and related searches
✅ Validates H3 tags similarly to H2s
✅ Validates FAQ section format
✅ Validates image alt text contains keyword and entities
✅ Tier-based validation (strict for Tier 1, warnings for Tier 2+)
✅ Rounds CORA averages down as configured
✅ All tests passing (21/21)

Files Modified

src/database/models.py - Added tier field to Project
master.config.json - Restructured content_rules
src/core/config.py - Added config models and get() method
src/generation/rule_engine.py - Implemented validation engine
scripts/add_tier_to_projects.py - Database migration
tests/unit/test_rule_engine.py - Comprehensive test suite

Next Steps (Story 2.3)

The rule engine is ready to be integrated into Story 2.3 (AI-Powered Content Generation):

Story 2.3 will use this engine to validate AI-generated content
Story 2.3 can implement retry logic when validation fails
Engine provides detailed feedback for AI prompt refinement
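
One possible shape for that retry loop is sketched below. Everything here is hypothetical: generate stands in for the Story 2.3 content generator, validate for a wrapper around the rule engine that returns error messages, and the limit of three attempts is arbitrary.

```python
from typing import Callable, List, Optional, Tuple


def generate_with_retries(
    generate: Callable[[List[str]], str],
    validate: Callable[[str], List[str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Regenerate until validation passes or attempts run out."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        html = generate(feedback)  # previous errors are fed back to the generator
        feedback = validate(html)
        if not feedback:
            return html, []
    return None, feedback  # give up and report the last set of errors


# Stubs for demonstration only: the generator "fixes" the H1 once told about it.
def fake_generate(feedback: List[str]) -> str:
    return "<h1>Widgets</h1>" if feedback else "<p>Widgets</p>"


def fake_validate(html: str) -> List[str]:
    return [] if "<h1>" in html else ["Missing H1 with main keyword"]


html, errors = generate_with_retries(fake_generate, fake_validate)
print(html, errors)
```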
