diff --git a/docs/prd/epic-2-content-generation.md b/docs/prd/epic-2-content-generation.md
index fc1f710..c36ec8a 100644
--- a/docs/prd/epic-2-content-generation.md
+++ b/docs/prd/epic-2-content-generation.md
@@ -54,3 +54,19 @@ Implement the core workflow for ingesting CORA data and using AI to generate and
 - The function correctly selects and applies the appropriate template based on the configuration mapping.
 - The content is structured into a valid HTML document with the selected CSS.
 - The final HTML content is stored and associated with the project in the database.
+
+**Dependencies**
+- Story 2.5 (optional): If no `site_deployment_id` is assigned, template selection defaults to random.
+
+### Story 2.5: Deployment Target Assignment
+**As a developer**, I want to assign deployment targets to generated content during the content generation process, so that each article knows which site/bucket it will be deployed to and can use the appropriate template.
+
+**Acceptance Criteria**
+- The job configuration file supports an optional `deployment_targets` array containing site custom_hostnames or site_deployment_ids.
+- The job configuration file supports an optional `deployment_overflow` strategy ("round_robin", "random_available", or "none").
+- During content generation, each article is assigned a `site_deployment_id` based on its index in the batch:
+  - If `deployment_targets` is specified, assign targets from the list in order.
+  - If the batch size exceeds the target list, apply the `deployment_overflow` strategy (round-robin by default, which cycles back through the list).
+  - If no `deployment_targets` are specified, `site_deployment_id` remains null (Story 2.4 then selects a random template).
+- The `site_deployment_id` is stored in the `GeneratedContent` record at creation time.
+- Invalid site references in `deployment_targets` fail gracefully with clear error messages.
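The index-based assignment in Story 2.5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `resolve_targets`, `assign_deployment_ids`, and the `hostname_to_id` mapping (custom_hostname to site_deployment_id) are hypothetical names, and `random_available` is assumed to draw from every known site.

```python
import random
from typing import Dict, List, Optional, Sequence, Union

# Overflow strategy names taken from Story 2.5; everything else is hypothetical.
VALID_OVERFLOW = {"round_robin", "random_available", "none"}


def resolve_targets(
    targets: Sequence[Union[str, int]], hostname_to_id: Dict[str, int]
) -> List[int]:
    """Map each target (custom_hostname or site_deployment_id) to an ID.

    Collects every unknown reference and raises one ValueError naming them,
    so a bad job config fails before any content is generated.
    """
    known_ids = set(hostname_to_id.values())
    resolved: List[int] = []
    unknown: List[str] = []
    for target in targets:
        if isinstance(target, int) and target in known_ids:
            resolved.append(target)
        elif isinstance(target, str) and target in hostname_to_id:
            resolved.append(hostname_to_id[target])
        else:
            unknown.append(repr(target))
    if unknown:
        raise ValueError(f"Unknown deployment target(s): {', '.join(unknown)}")
    return resolved


def assign_deployment_ids(
    batch_size: int,
    targets: Optional[Sequence[Union[str, int]]],
    hostname_to_id: Dict[str, int],
    overflow: str = "round_robin",
    rng: Optional[random.Random] = None,
) -> List[Optional[int]]:
    """Return one site_deployment_id (or None) per article index."""
    if overflow not in VALID_OVERFLOW:
        raise ValueError(f"Unknown deployment_overflow strategy: {overflow!r}")
    if not targets:
        # No targets: IDs stay null, so template selection falls back to random.
        return [None] * batch_size

    ids = resolve_targets(targets, hostname_to_id)
    rng = rng or random.Random()
    all_ids = sorted(set(hostname_to_id.values()))
    assigned: List[Optional[int]] = []
    for index in range(batch_size):
        if index < len(ids):
            assigned.append(ids[index])  # within the list: use targets in order
        elif overflow == "round_robin":
            assigned.append(ids[index % len(ids)])  # wrap back to the start
        elif overflow == "random_available":
            assigned.append(rng.choice(all_ids))  # any known site (assumption)
        else:  # "none": overflow articles get no deployment target
            assigned.append(None)
    return assigned
```

For example, with `a.example.com` mapped to ID 1 and ID 2 also known, five articles with targets `["a.example.com", 2]` and the default strategy would receive `[1, 2, 1, 2, 1]`. Resolving all targets before assigning any is one way to satisfy the "graceful errors with clear messages" criterion: a typo rejects the batch up front instead of partway through generation.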
\ No newline at end of file
diff --git a/docs/stories/story-2.4-html-formatting-templates.md b/docs/stories/story-2.4-html-formatting-templates.md
new file mode 100644
index 0000000..c59b3da
--- /dev/null
+++ b/docs/stories/story-2.4-html-formatting-templates.md
@@ -0,0 +1,141 @@
+# Story 2.4: HTML Formatting with Multiple Templates
+
+## Status
+Completed
+
+## Story
+**As a developer**, I want a module that takes the generated text content and formats it into a standard HTML file using one of a few predefined CSS templates, assigning one template per bucket/subdomain, so that all deployed content has a consistent look and feel per site.
+
+## Acceptance Criteria
+- A directory of multiple, predefined HTML/CSS templates exists.
+- The master JSON configuration file maps a specific template to each deployment target (e.g., S3 bucket, subdomain).
+- A function accepts the generated content and a target identifier (e.g., bucket name).
+- The function correctly selects and applies the appropriate template based on the configuration mapping.
+- The content is structured into a valid HTML document with the selected CSS.
+- The final HTML content is stored and associated with the project in the database.
+
+## Dependencies
+- **Story 2.5**: Deployment Target Assignment must run before this story to set `site_deployment_id` on GeneratedContent
+- If `site_deployment_id` is null, a random template will be selected
+
+## Tasks / Subtasks
+
+### 1. Create Template Infrastructure
+**Effort:** 3 story points
+
+- [x] Create template file structure under `src/templating/templates/`
+  - Basic template (default)
+  - Modern template
+  - Classic template
+  - Minimal template
+- [x] Each template should include:
+  - HTML structure with placeholders for title, meta, content
+  - Embedded or inline CSS for styling
+  - Responsive design (mobile-friendly)
+  - SEO-friendly structure (proper heading hierarchy, meta tags)
+
+### 2. Implement Template Loading Service
+**Effort:** 3 story points
+
+- [x] Implement `TemplateService` class in `src/templating/service.py`
+- [x] Add `load_template(template_name: str)` method that reads the template file
+- [x] Add `get_available_templates()` method that lists all templates
+- [x] Handle template-file-not-found errors gracefully with fallback to default
+- [x] Cache loaded templates in memory for performance
+
+### 3. Implement Template Selection Logic
+**Effort:** 2 story points
+
+- [x] Add `select_template_for_content(site_deployment_id: Optional[int])` method
+- [x] If `site_deployment_id` exists:
+  - Query SiteDeployment table for custom_hostname
+  - Check `master.config.json` templates.mappings for hostname
+  - If mapping exists, use it
+  - If no mapping, randomly select template and save to config
+- [x] If `site_deployment_id` is null: randomly select template
+- [x] Return template name
+
+### 4. Implement Content Formatting
+**Effort:** 5 story points
+
+- [x] Create `format_content(content: str, title: str, meta_description: str, template_name: str)` method
+- [x] Parse HTML content and extract components
+- [x] Replace template placeholders with actual content
+- [x] Ensure proper escaping of HTML entities where needed
+- [x] Validate output is well-formed HTML
+- [x] Return formatted HTML string
+
+### 5. Database Integration
+**Effort:** 2 story points
+
+- [x] Add `formatted_html` field to `GeneratedContent` model (Text type, nullable)
+- [x] Add `template_used` field to `GeneratedContent` model (String(50), nullable)
+- [x] Add `site_deployment_id` field to `GeneratedContent` model (FK to site_deployments, nullable, indexed)
+- [x] Create database migration script
+- [x] Update repository to save formatted HTML and template_used alongside raw content
+
+### 6. Integration with Content Generation Flow
+**Effort:** 2 story points
+
+- [x] Update `src/generation/service.py` to call template service after content generation
+- [x] Template service reads `site_deployment_id` from GeneratedContent
+- [x] Store formatted HTML and template_used in database
+- [x] Handle template formatting errors without breaking content generation
+
+### 7. Unit Tests
+**Effort:** 3 story points
+
+- [x] Test template loading with valid and invalid names
+- [x] Test template selection with site_deployment_id present
+- [x] Test template selection with site_deployment_id null (random)
+- [x] Test content formatting with different templates
+- [x] Test fallback behavior when template not found
+- [x] Test error handling for malformed templates
+- [x] Achieve >80% code coverage for templating module
+
+### 8. Integration Tests
+**Effort:** 2 story points
+
+- [x] Test end-to-end flow: content generation → template application → database storage
+- [x] Test with site_deployment_id assigned (consistent template per site)
+- [x] Test with site_deployment_id null (random template)
+- [x] Verify formatted HTML is valid and renders correctly
+- [x] Test that a new site gets a random template assigned and persisted to config
+
+## Dev Notes
+
+### Current State
+- `master.config.json` already has a templates section with mappings (lines 52-59)
+- `src/templating/service.py` exists but is empty (only 2 lines)
+- `src/templating/templates/` directory exists but only contains `__init__.py`
+- `GeneratedContent` model stores raw content in a Text field but has no formatted HTML field yet
+
+### Dependencies
+- Story 2.2/2.3: Content must be generated before it can be formatted
+- Story 2.5: Deployment target assignment (optional; defaults to random if not assigned)
+- Configuration system: Uses existing master.config.json structure
+
+### Technical Decisions
+1. **Template format:** Jinja2 or simple string replacement (to be decided during implementation)
+2. **CSS approach:** Embedded ` + + +
+

{{ title }}

+ {{ content }} +
+ + +
diff --git a/src/templating/templates/classic.html b/src/templating/templates/classic.html
new file mode 100644
index 0000000..f6853b2
--- /dev/null
+++ b/src/templating/templates/classic.html
@@ -0,0 +1,105 @@
+ + + + + + + {{ title }} + + + +
+

{{ title }}

+ {{ content }} +
+ + +
diff --git a/src/templating/templates/minimal.html b/src/templating/templates/minimal.html
new file mode 100644
index 0000000..ff84145
--- /dev/null
+++ b/src/templating/templates/minimal.html
@@ -0,0 +1,86 @@
+ + + + + + + {{ title }} + + + +
+

{{ title }}

+ {{ content }} +
+ + +
diff --git a/src/templating/templates/modern.html b/src/templating/templates/modern.html
new file mode 100644
index 0000000..fd230e7
--- /dev/null
+++ b/src/templating/templates/modern.html
@@ -0,0 +1,109 @@
+ + + + + + + {{ title }} + + + +
+

{{ title }}

+ {{ content }} +
+ + +
diff --git a/tests/integration/test_template_integration.py b/tests/integration/test_template_integration.py
new file mode 100644
index 0000000..7d60f0a
--- /dev/null
+++ b/tests/integration/test_template_integration.py
@@ -0,0 +1,278 @@
+"""
+Integration tests for template service with content generation
+"""
+
+import pytest
+from unittest.mock import patch
+from src.database.models import Project, User, GeneratedContent, SiteDeployment
+from src.database.repositories import (
+    ProjectRepository,
+    GeneratedContentRepository,
+    SiteDeploymentRepository
+)
+from src.templating.service import TemplateService
+from src.generation.service import ContentGenerator
+from src.generation.ai_client import AIClient, PromptManager
+
+
+@pytest.fixture
+def test_user(db_session):
+    """Create a test user"""
+    user = User(
+        username="testuser_template",
+        hashed_password="hashed",
+        role="User"
+    )
+    db_session.add(user)
+    db_session.commit()
+    return user
+
+
+@pytest.fixture
+def test_project(db_session, test_user):
+    """Create a test project"""
+    project_data = {
+        "main_keyword": "template testing",
+        "word_count": 1000,
+        "term_frequency": 2,
+        "h2_total": 5,
+        "h2_exact": 1,
+        "h3_total": 8,
+        "h3_exact": 1,
+        "entities": ["entity1", "entity2"],
+        "related_searches": ["search1", "search2"]
+    }
+
+    project_repo = ProjectRepository(db_session)
+    project = project_repo.create(test_user.id, "Template Test Project", project_data)
+
+    return project
+
+
+@pytest.fixture
+def test_site_deployment(db_session):
+    """Create a test site deployment"""
+    site_repo = SiteDeploymentRepository(db_session)
+    site = site_repo.create(
+        site_name="Test Site",
+        custom_hostname="test.example.com",
+        storage_zone_id=12345,
+        storage_zone_name="test-storage",
+        storage_zone_password="test-password",
+        storage_zone_region="DE",
+        pull_zone_id=67890,
+        pull_zone_bcdn_hostname="test.b-cdn.net"
+    )
+    return site
+
+
+@pytest.fixture
+def test_generated_content(db_session, test_project):
+    """Create test generated content"""
+    content_repo = GeneratedContentRepository(db_session)
+    content = content_repo.create(
+        project_id=test_project.id,
+        tier="tier1",
+        keyword="template testing",
+        title="Test Article About Template Testing",
+        outline={"outline": [{"h2": "Introduction", "h3": ["Overview", "Benefits"]}]},
+        content="<h2>Introduction</h2><p>This is test content.</p><h3>Overview</h3><p>More content here.</p>",
+        word_count=500,
+        status="generated"
+    )
+    return content
+
+
+@pytest.mark.integration
+def test_template_service_with_database(db_session):
+    """Test TemplateService instantiation with database session"""
+    content_repo = GeneratedContentRepository(db_session)
+    template_service = TemplateService(content_repo=content_repo)
+
+    assert template_service is not None
+    assert template_service.content_repo == content_repo
+    assert len(template_service.get_available_templates()) >= 4
+
+
+@pytest.mark.integration
+def test_format_content_end_to_end(db_session, test_generated_content):
+    """Test formatting content and storing in database"""
+    content_repo = GeneratedContentRepository(db_session)
+    template_service = TemplateService(content_repo=content_repo)
+
+    formatted = template_service.format_content(
+        content=test_generated_content.content,
+        title=test_generated_content.title,
+        meta_description="Test meta description",
+        template_name="basic"
+    )
+
+    assert "" in formatted
+    assert test_generated_content.title in formatted
+    assert "Test meta description" in formatted
+    assert test_generated_content.content in formatted
+
+    test_generated_content.formatted_html = formatted
+    test_generated_content.template_used = "basic"
+    content_repo.update(test_generated_content)
+
+    retrieved = content_repo.get_by_id(test_generated_content.id)
+    assert retrieved.formatted_html is not None
+    assert retrieved.template_used == "basic"
+    assert "" in retrieved.formatted_html
+
+
+@pytest.mark.integration
+def test_template_selection_with_site_deployment(
+    db_session,
+    test_generated_content,
+    test_site_deployment
+):
+    """Test template selection based on site deployment"""
+    content_repo = GeneratedContentRepository(db_session)
+    site_repo = SiteDeploymentRepository(db_session)
+    template_service = TemplateService(content_repo=content_repo)
+
+    test_generated_content.site_deployment_id = test_site_deployment.id
+    content_repo.update(test_generated_content)
+    template_name = template_service.select_template_for_content(
+        site_deployment_id=test_site_deployment.id,
+        site_deployment_repo=site_repo
+    )
+
+    available = template_service.get_available_templates()
+    assert template_name in available
+
+
+@pytest.mark.integration
+def test_template_selection_without_site_deployment(db_session, test_generated_content):
+    """Test random template selection when no site deployment"""
+    content_repo = GeneratedContentRepository(db_session)
+    template_service = TemplateService(content_repo=content_repo)
+
+    template_name = template_service.select_template_for_content(
+        site_deployment_id=None,
+        site_deployment_repo=None
+    )
+
+    available = template_service.get_available_templates()
+    assert template_name in available
+
+
+@pytest.mark.integration
+def test_content_generator_apply_template(
+    db_session,
+    test_project,
+    test_generated_content
+):
+    """Test ContentGenerator.apply_template method"""
+    content_repo = GeneratedContentRepository(db_session)
+    project_repo = ProjectRepository(db_session)
+
+    mock_ai_client = None
+    mock_prompt_manager = None
+
+    generator = ContentGenerator(
+        ai_client=mock_ai_client,
+        prompt_manager=mock_prompt_manager,
+        project_repo=project_repo,
+        content_repo=content_repo
+    )
+
+    success = generator.apply_template(test_generated_content.id)
+
+    assert success is True
+
+    retrieved = content_repo.get_by_id(test_generated_content.id)
+    assert retrieved.formatted_html is not None
+    assert retrieved.template_used is not None
+    assert "" in retrieved.formatted_html
+    assert retrieved.title in retrieved.formatted_html
+
+
+@pytest.mark.integration
+def test_multiple_content_different_templates(db_session, test_project):
+    """Test that multiple content items can use different templates"""
+    content_repo = GeneratedContentRepository(db_session)
+    template_service = TemplateService(content_repo=content_repo)
+
+    content_items = []
+    for i in range(4):
+        content = content_repo.create(
+            project_id=test_project.id,
+            tier="tier1",
+            keyword=f"keyword {i}",
+            title=f"Test Article {i}",
+            outline={"outline": [{"h2": "Section", "h3": ["Sub"]}]},
+            content=f"<h2>Section {i}</h2><p>Content {i}</p>",
+            word_count=300,
+            status="generated"
+        )
+        content_items.append(content)
+
+    templates = ["basic", "modern", "classic", "minimal"]
+
+    for content, template_name in zip(content_items, templates):
+        formatted = template_service.format_content(
+            content=content.content,
+            title=content.title,
+            meta_description=f"Description {content.id}",
+            template_name=template_name
+        )
+
+        content.formatted_html = formatted
+        content.template_used = template_name
+        content_repo.update(content)
+
+    for content, expected_template in zip(content_items, templates):
+        retrieved = content_repo.get_by_id(content.id)
+        assert retrieved.template_used == expected_template
+        assert retrieved.formatted_html is not None
+
+
+@pytest.mark.integration
+def test_apply_template_with_missing_content(db_session, test_project):
+    """Test apply_template with non-existent content ID"""
+    content_repo = GeneratedContentRepository(db_session)
+    project_repo = ProjectRepository(db_session)
+
+    generator = ContentGenerator(
+        ai_client=None,
+        prompt_manager=None,
+        project_repo=project_repo,
+        content_repo=content_repo
+    )
+
+    success = generator.apply_template(content_id=99999)
+
+    assert success is False
+
+
+@pytest.mark.integration
+def test_formatted_html_storage(
+    db_session,
+    test_generated_content
+):
+    """Test that formatted HTML is correctly stored in database"""
+    content_repo = GeneratedContentRepository(db_session)
+    template_service = TemplateService(content_repo=content_repo)
+
+    formatted = template_service.format_content(
+        content=test_generated_content.content,
+        title=test_generated_content.title,
+        meta_description="Meta description",
+        template_name="modern"
+    )
+
+    test_generated_content.formatted_html = formatted
+    test_generated_content.template_used = "modern"
+    test_generated_content.site_deployment_id = None
+
+    updated = content_repo.update(test_generated_content)
+
+    retrieved = content_repo.get_by_id(test_generated_content.id)
+    assert retrieved.formatted_html == formatted
+    assert retrieved.template_used == "modern"
+    assert retrieved.site_deployment_id is None
+
diff --git a/tests/unit/test_template_service.py b/tests/unit/test_template_service.py
new file mode 100644
index 0000000..dd42184
--- /dev/null
+++ b/tests/unit/test_template_service.py
@@ -0,0 +1,315 @@
+"""
+Unit tests for template service
+"""
+
+import pytest
+import json
+from pathlib import Path
+from unittest.mock import Mock, MagicMock, patch, mock_open
+from src.templating.service import TemplateService
+from src.database.models import GeneratedContent, SiteDeployment
+
+
+@pytest.fixture
+def mock_content_repo():
+    return Mock()
+
+
+@pytest.fixture
+def mock_site_deployment_repo():
+    return Mock()
+
+
+@pytest.fixture
+def template_service(mock_content_repo):
+    return TemplateService(content_repo=mock_content_repo)
+
+
+@pytest.fixture
+def mock_site_deployment():
+    deployment = Mock(spec=SiteDeployment)
+    deployment.id = 1
+    deployment.custom_hostname = "test.example.com"
+    deployment.site_name = "Test Site"
+    return deployment
+
+
+class TestGetAvailableTemplates:
+    def test_returns_list_of_template_names(self, template_service):
+        """Test that available templates are returned"""
+        templates = template_service.get_available_templates()
+
+        assert isinstance(templates, list)
+        assert len(templates) >= 4
+        assert "basic" in templates
+        assert "modern" in templates
+        assert "classic" in templates
+        assert "minimal" in templates
+
+    def test_templates_are_sorted(self, template_service):
+        """Test that templates are returned in sorted order"""
+        templates = template_service.get_available_templates()
+
+        assert templates == sorted(templates)
+
+
+class TestLoadTemplate:
+    def test_load_valid_template(self, template_service):
+        """Test loading a valid template"""
+        template_content = template_service.load_template("basic")
+
+        assert isinstance(template_content, str)
+        assert len(template_content) > 0
+        assert "" in template_content
+        assert "{{ title }}" in template_content
+        assert "{{ content }}" in template_content
+
+    def test_load_all_templates(self, template_service):
+        """Test that all templates can be loaded"""
+        for template_name in ["basic", "modern", "classic", "minimal"]:
+            template_content = template_service.load_template(template_name)
+            assert len(template_content) > 0
+
+    def test_template_caching(self, template_service):
+        """Test that templates are cached after first load"""
+        template_service.load_template("basic")
+
+        assert "basic" in template_service._template_cache
+
+        cached_content = template_service._template_cache["basic"]
+        loaded_content = template_service.load_template("basic")
+
+        assert cached_content == loaded_content
+
+    def test_load_nonexistent_template(self, template_service):
+        """Test that loading nonexistent template raises error"""
+        with pytest.raises(FileNotFoundError) as exc_info:
+            template_service.load_template("nonexistent")
+
+        assert "nonexistent" in str(exc_info.value)
+        assert "not found" in str(exc_info.value).lower()
+
+
+class TestSelectTemplateForContent:
+    def test_select_random_when_no_site_deployment(self, template_service):
+        """Test random selection when site_deployment_id is None"""
+        template_name = template_service.select_template_for_content(
+            site_deployment_id=None,
+            site_deployment_repo=None
+        )
+
+        available_templates = template_service.get_available_templates()
+        assert template_name in available_templates
+
+    @patch('src.templating.service.get_config')
+    def test_use_existing_mapping(
+        self,
+        mock_get_config,
+        template_service,
+        mock_site_deployment,
+        mock_site_deployment_repo
+    ):
+        """Test using existing template mapping from config"""
+        mock_config = Mock()
+        mock_config.templates.mappings = {
+            "test.example.com": "modern"
+        }
+        mock_get_config.return_value = mock_config
+
+        mock_site_deployment_repo.get_by_id.return_value = mock_site_deployment
+
+        template_name = template_service.select_template_for_content(
+            site_deployment_id=1,
+            site_deployment_repo=mock_site_deployment_repo
+        )
+
+        assert template_name == "modern"
+        mock_site_deployment_repo.get_by_id.assert_called_once_with(1)
+
+    @patch('src.templating.service.get_config')
+    @patch('builtins.open', new_callable=mock_open)
+    @patch('pathlib.Path.exists', return_value=True)
+    def test_create_new_mapping(
+        self,
+        mock_exists,
+        mock_file,
+        mock_get_config,
+        template_service,
+        mock_site_deployment,
+        mock_site_deployment_repo
+    ):
+        """Test creating new mapping when hostname not in config"""
+        mock_config = Mock()
+        mock_config.templates.mappings = {}
+        mock_get_config.return_value = mock_config
+
+        mock_file.return_value.read.return_value = json.dumps({
+            "templates": {"default": "basic", "mappings": {}}
+        })
+
+        mock_site_deployment_repo.get_by_id.return_value = mock_site_deployment
+
+        template_name = template_service.select_template_for_content(
+            site_deployment_id=1,
+            site_deployment_repo=mock_site_deployment_repo
+        )
+
+        available_templates = template_service.get_available_templates()
+        assert template_name in available_templates
+
+
+class TestFormatContent:
+    def test_format_with_basic_template(self, template_service):
+        """Test formatting content with basic template"""
+        content = "<h2>Test Section</h2><p>Test paragraph.</p>"
+        title = "Test Article"
+        meta = "Test meta description"
+
+        formatted = template_service.format_content(
+            content=content,
+            title=title,
+            meta_description=meta,
+            template_name="basic"
+        )
+
+        assert "" in formatted
+        assert "Test Article" in formatted
+        assert "Test meta description" in formatted
+        assert "<h2>Test Section</h2>" in formatted
+        assert "<p>Test paragraph.</p>" in formatted
+
+    def test_format_with_all_templates(self, template_service):
+        """Test formatting with all available templates"""
+        content = "<h2>Section</h2><p>Content.</p>"
+        title = "Title"
+        meta = "Description"
+
+        for template_name in ["basic", "modern", "classic", "minimal"]:
+            formatted = template_service.format_content(
+                content=content,
+                title=title,
+                meta_description=meta,
+                template_name=template_name
+            )
+
+            assert len(formatted) > 0
+            assert "Title" in formatted
+            assert "Description" in formatted
+            assert content in formatted
+
+    def test_html_escaping_in_title(self, template_service):
+        """Test that HTML is escaped in title"""
+        content = "<p>Content</p>"
+        title = "Title with "
+        meta = "Description"
+
+        formatted = template_service.format_content(
+            content=content,
+            title=title,
+            meta_description=meta,
+            template_name="basic"
+        )
+
+        assert "&lt;script&gt;" in formatted
+        assert "