Story 2.2: Configurable Content Rule Engine
Overview
Implementation of a CORA-compliant content validation engine that ensures AI-generated HTML meets both universal quality standards and project-specific CORA targets.
Status
COMPLETED
Implementation Details
1. Database Changes
- Added tier field to Project model (default=1, indexed)
- Created migration script: scripts/add_tier_to_projects.py
- Tier 1 = strictest validation (default)
- Tier 2+ = warnings only for CORA target misses
2. Configuration Updates
File: master.config.json
Restructured content_rules with two validation levels:
Universal Rules (apply to all tiers, hard failures):
- min_content_length: 1000 words minimum
- max_content_length: 5000 words maximum
- title_exact_match_required: Title must contain main keyword
- h1_exact_match_required: H1 must contain main keyword
- h2_exact_match_min: At least 1 H2 with main keyword
- h3_exact_match_min: At least 1 H3 with main keyword
- faq_section_required: Must include FAQ section
- faq_question_restatement_required: FAQ answers restate questions
- image_alt_text_keyword_required: Alt text must contain keyword
- image_alt_text_entity_required: Alt text must contain entities
CORA Validation Config:
- enabled: Toggle CORA validation on/off
- tier_1_strict: Tier 1 fails on CORA target misses
- tier_2_plus_warn_only: Tier 2+ only warns
- round_averages_down: Round CORA averages down (e.g., 5.6 → 5)
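As a rough illustration, the restructured content_rules section could have the shape below, written as a Python dict that mirrors the JSON. The rule names and word limits come from the lists above, and the "universal" key matches the config.get("content_rules.universal") example in this document. The "cora_validation" key name and the boolean values are assumptions, not the actual contents of master.config.json.

```python
import json

# Hypothetical shape of content_rules in master.config.json.
# Rule names and word limits are taken from this document; the
# "cora_validation" key name and the boolean values are assumed.
CONTENT_RULES = {
    "universal": {
        "min_content_length": 1000,
        "max_content_length": 5000,
        "title_exact_match_required": True,
        "h1_exact_match_required": True,
        "h2_exact_match_min": 1,
        "h3_exact_match_min": 1,
        "faq_section_required": True,
        "faq_question_restatement_required": True,
        "image_alt_text_keyword_required": True,
        "image_alt_text_entity_required": True,
    },
    "cora_validation": {
        "enabled": True,
        "tier_1_strict": True,
        "tier_2_plus_warn_only": True,
        "round_averages_down": True,
    },
}

if __name__ == "__main__":
    # Print the structure as it might appear in the JSON file.
    print(json.dumps({"content_rules": CONTENT_RULES}, indent=2))
```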
3. Core Rule Engine
File: src/generation/rule_engine.py
Classes:
- ValidationIssue: Single validation error or warning
- ValidationResult: Complete validation result with errors/warnings
- ContentHTMLParser: Extracts structure from HTML (H1/H2/H3/images/links/text)
- ContentRuleEngine: Main validation engine
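To make the relationship between the two result classes concrete, here is a minimal sketch using dataclasses. The attributes passed, errors, warnings, message, and to_dict() appear in the usage example in this document. The rule and severity fields and the add() helper are assumptions made for illustration, so the real classes in rule_engine.py may differ.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ValidationIssue:
    """One rule violation; severity is 'error' or 'warning'."""
    rule: str                 # assumed field: name of the rule that fired
    message: str
    severity: str = "error"   # assumed field


@dataclass
class ValidationResult:
    """Collects issues; content passes when there are no errors."""
    errors: List[ValidationIssue] = field(default_factory=list)
    warnings: List[ValidationIssue] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.errors

    def add(self, issue: ValidationIssue) -> None:
        # Route by severity so that warnings never block a pass.
        target = self.errors if issue.severity == "error" else self.warnings
        target.append(issue)

    def to_dict(self) -> dict:
        return {
            "passed": self.passed,
            "errors": [asdict(e) for e in self.errors],
            "warnings": [asdict(w) for w in self.warnings],
        }
```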
Key Features:
- HTML parsing and element extraction
- Keyword/entity counting with word boundary matching
- Universal rule validation (hard failures)
- CORA target validation (tier-aware)
- FAQ section detection
- Image alt text validation
- Detailed error/warning reporting
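The HTML parsing and element extraction step can be sketched with the standard library's html.parser. This is not the actual ContentHTMLParser: the class name HeadingAltParser and its attributes are hypothetical, and the sketch counts all text outside the title tag as content, including any script or style text.

```python
from html.parser import HTMLParser


class HeadingAltParser(HTMLParser):
    """Collects title/H1-H3 text, image alt text, and body text."""

    TRACKED = {"title", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.found = {tag: [] for tag in self.TRACKED}
        self.image_alts = []
        self.text_parts = []
        self._current = None  # tracked tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self._current = tag
            self.found[tag].append("")
        elif tag == "img":
            # A missing alt attribute is recorded as an empty string.
            self.image_alts.append(dict(attrs).get("alt") or "")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            # Inline tags such as <b> inside a heading keep accumulating.
            self.found[self._current][-1] += data
        if self._current != "title":
            self.text_parts.append(data)

    def word_count(self):
        return len(" ".join(self.text_parts).split())
```

Feeding a document with parser.feed(html) fills parser.found["h1"], parser.image_alts, and so on, which the rule checks can then inspect.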
4. Config System Updates
File: src/core/config.py
Added:
- UniversalRulesConfig model
- CORAValidationConfig model
- Updated ContentRulesConfig to use nested structure
- Added Config.get() method for dot notation access (e.g., config.get("content_rules.universal"))
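Dot notation access amounts to walking a nested structure one key at a time. The sketch below shows the idea on plain dicts. The function name get_by_path and the default parameter are hypothetical, and the real Config.get() operates on the config models, so its behavior for missing keys may differ.

```python
from typing import Any


def get_by_path(data: dict, path: str, default: Any = None) -> Any:
    """Walk nested dicts using a dot-separated path."""
    current: Any = data
    for key in path.split("."):
        # Stop early if the path leaves the nested-dict structure.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


if __name__ == "__main__":
    cfg = {"content_rules": {"universal": {"min_content_length": 1000}}}
    print(get_by_path(cfg, "content_rules.universal"))
```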
5. Tests
File: tests/unit/test_rule_engine.py
21 comprehensive tests covering:
- HTML parser functionality (6 tests)
- ValidationResult class (4 tests)
- Universal rules validation (6 tests)
- CORA target validation (4 tests)
- Fully compliant content (1 test)
All tests passing ✓
Usage Example
from src.generation.rule_engine import ContentRuleEngine
from src.core.config import get_config
from src.database.models import Project

# Initialize engine
config = get_config()
engine = ContentRuleEngine(config)

# Validate content
html_content = "<html>...</html>"
project: Project = ...  # load the Project from the database
result = engine.validate(html_content, project)

if result.passed:
    print("Content is valid!")
else:
    print(f"Errors: {len(result.errors)}")
    for error in result.errors:
        print(f"  - {error.message}")

# Warnings are reported even when validation passes (e.g., Tier 2+)
print(f"Warnings: {len(result.warnings)}")
for warning in result.warnings:
    print(f"  - {warning.message}")

# Get detailed report
report = result.to_dict()
Validation Logic
Universal Rules (All Tiers)
- Word Count: Content length between min/max bounds
- Title: Must contain main keyword
- H1: At least one H1 with main keyword
- H2/H3 Minimums: At least the configured number of H2 and H3 headings contain the main keyword (h2_exact_match_min, h3_exact_match_min)
- FAQ: Must have FAQ section
- Images: Alt text contains keyword + entities
CORA Targets (Tier-Aware)
For each CORA metric (h1_exact, h2_total, h2_entities, etc.):
- Tier 1: FAIL if actual < target (rounded down)
- Tier 2+: WARN if actual < target (but pass)
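The tier-aware rule above can be sketched as a single function. The name check_cora_target and its parameters are hypothetical; the sketch assumes the target is a CORA average that is floored when round_averages_down is enabled, and that any tier other than 1 produces a warning.

```python
import math


def check_cora_target(metric, actual, target_avg, tier, round_down=True):
    """Return (severity, message) for a missed target, or None if met."""
    # With rounding enabled, an average of 5.6 becomes a target of 5.
    target = math.floor(target_avg) if round_down else target_avg
    if actual >= target:
        return None
    # Tier 1 misses are hard failures; Tier 2+ misses only warn.
    severity = "error" if tier == 1 else "warning"
    return severity, f"{metric}: found {actual}, target {target}"


if __name__ == "__main__":
    print(check_cora_target("h2_total", 4, 5.6, tier=1))
    print(check_cora_target("h2_total", 4, 5.6, tier=2))
```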
Keyword Matching
- Case-insensitive
- Word boundary detection (avoids partial matches)
- Supports related searches and entities
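Case-insensitive matching on word boundaries can be done with a regular expression. The sketch below is one possible approach, not necessarily the one used in rule_engine.py; count_keyword is a hypothetical name. It uses lookarounds instead of \b so that keywords beginning or ending with punctuation still match as whole units.

```python
import re


def count_keyword(text: str, keyword: str) -> int:
    """Count case-insensitive whole-word occurrences of keyword."""
    # The keyword must not be preceded or followed by a word character.
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


if __name__ == "__main__":
    # "category" and "concat" are partial matches and are not counted.
    print(count_keyword("category cat concat", "cat"))
```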
Acceptance Criteria
✅ System loads "content_rules" from master JSON configuration
✅ Validates H1 tag contains main keyword
✅ Validates at least one H2 starts with main keyword
✅ Validates other H2s incorporate entities and related searches
✅ Validates H3 tags similarly to H2s
✅ Validates FAQ section format
✅ Validates image alt text contains keyword and entities
✅ Tier-based validation (strict for Tier 1, warnings for Tier 2+)
✅ Rounds CORA averages down as configured
✅ All tests passing (21/21)
Files Modified
- src/database/models.py - Added tier field to Project
- master.config.json - Restructured content_rules
- src/core/config.py - Added config models and get() method
- src/generation/rule_engine.py - Implemented validation engine
- scripts/add_tier_to_projects.py - Database migration
- tests/unit/test_rule_engine.py - Comprehensive test suite
Next Steps (Story 2.3)
The rule engine is ready to be integrated into Story 2.3 (AI-Powered Content Generation):
- Story 2.3 will use this engine to validate AI-generated content
- Story 2.3 can implement retry logic for content that fails validation
- Engine provides detailed feedback for AI prompt refinement
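One way the retry logic could look is sketched below. This is a guess at a possible Story 2.3 design, not a committed one. The names generate_until_valid, generate, and validate are hypothetical; validate stands in for a wrapper around engine.validate(html, project) that returns the error messages, and generate stands in for the AI call that receives the previous feedback.

```python
from typing import Callable, List, Optional, Tuple


def generate_until_valid(
    generate: Callable[[List[str]], str],
    validate: Callable[[str], List[str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Regenerate until validate() reports no errors or attempts run out."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        # Pass the previous errors back so the prompt can be refined.
        html = generate(feedback)
        feedback = validate(html)
        if not feedback:
            return html, attempt
    # No attempt passed validation within the allowed budget.
    return None, max_attempts
```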