diff --git a/CLI_INTEGRATION_EXPLANATION.md b/CLI_INTEGRATION_EXPLANATION.md deleted file mode 100644 index 1802129..0000000 --- a/CLI_INTEGRATION_EXPLANATION.md +++ /dev/null @@ -1,257 +0,0 @@ -# CLI Integration Explanation - Story 3.3 - -## The Problem - -Story 3.3's `inject_interlinks()` function (and Stories 3.1-3.2) are **implemented and tested perfectly**, but they're **never called** in the actual batch generation workflow. - -## Current Workflow - -When you run: -```bash -uv run python main.py generate-batch --job-file jobs/example.json -``` - -Here's what actually happens: - -### Step-by-Step Current Flow - -``` -1. CLI Command (src/cli/commands.py) - └─> generate_batch() function called - └─> Creates BatchProcessor - └─> BatchProcessor.process_job() - -2. BatchProcessor.process_job() (src/generation/batch_processor.py) - └─> Reads job file - └─> For each job: - └─> _process_single_job() - └─> Validates deployment targets - └─> For each tier (tier1, tier2, tier3): - └─> _process_tier() - -3. _process_tier() - └─> For each article (1 to count): - └─> _generate_single_article() - ├─> Generate title - ├─> Generate outline - ├─> Generate content - ├─> Augment if needed - └─> SAVE to database - -4. END! ⚠️ - - Nothing happens after articles are generated! - No URLs, no tiered links, no interlinking! -``` - -## What's Missing - -After all articles are generated for a tier, we need to add Story 3.1-3.3: - -```python -# THIS CODE DOES NOT EXIST YET! -# Needs to be added at the end of _process_tier() or _process_single_job() - -# 1. Get all generated content for this batch -content_records = self.content_repo.get_by_project_and_tier(project_id, tier_name) - -# 2. Assign sites (Story 3.1) -from src.generation.site_assignment import assign_sites_to_batch -assign_sites_to_batch(content_records, job, site_repo, bunny_client, project.main_keyword) - -# 3. 
Generate URLs (Story 3.1) -from src.generation.url_generator import generate_urls_for_batch -article_urls = generate_urls_for_batch(content_records, site_repo) - -# 4. Find tiered links (Story 3.2) -from src.interlinking.tiered_links import find_tiered_links -tiered_links = find_tiered_links( - content_records, job_config, project_repo, content_repo, site_repo -) - -# 5. Inject interlinks (Story 3.3) -from src.interlinking.content_injection import inject_interlinks -from src.database.repositories import ArticleLinkRepository -link_repo = ArticleLinkRepository(session) -inject_interlinks( - content_records, article_urls, tiered_links, - project, job_config, content_repo, link_repo -) - -# 6. Apply templates (existing functionality) -for content in content_records: - content_generator.apply_template(content.id) -``` - -## Why This Matters - -### Current State -✓ Articles are generated -✗ Articles have NO internal links -✗ Articles have NO tiered links -✗ Articles have NO "See Also" section -✗ Articles have NO final URLs assigned -✗ Templates are NOT applied - -**Result**: Articles sit in database with raw HTML, no links, unusable for deployment - -### With Integration -✓ Articles are generated -✓ Sites are assigned to articles -✓ Final URLs are generated -✓ Tiered links are found -✓ All links are injected -✓ Templates are applied -✓ Articles are ready for deployment - -**Result**: Complete, interlinked articles ready for Story 4.x deployment - -## Where to Add Integration - -### Option 1: End of `_process_tier()` (RECOMMENDED) -Add the integration code at line 162 (after the article generation loop): - -```python -def _process_tier(self, project_id, tier_name, tier_config, ...): - # ... existing article generation loop ... 
- - # NEW: Post-generation interlinking - click.echo(f" {tier_name}: Injecting interlinks for {tier_config.count} articles...") - self._inject_tier_interlinks(project_id, tier_name, job, debug) -``` - -Then create new method: -```python -def _inject_tier_interlinks(self, project_id, tier_name, job, debug): - """Inject interlinks for all articles in a tier""" - # Get all articles for this tier - content_records = self.content_repo.get_by_project_and_tier( - project_id, tier_name - ) - - if not content_records: - click.echo(f" Warning: No articles found for {tier_name}") - return - - # Steps 1-6 from above... -``` - -### Option 2: End of `_process_single_job()` -Add integration after ALL tiers are generated (processes entire job at once): - -```python -def _process_single_job(self, job, job_idx, debug, continue_on_error): - # ... existing tier processing ... - - # NEW: Process all tiers together - click.echo(f"\nPost-processing: Injecting interlinks...") - for tier_name in job.tiers.keys(): - self._inject_tier_interlinks(job.project_id, tier_name, job, debug) -``` - -## Why It Wasn't Integrated Yet - -Looking at the story implementations, it appears: - -1. **Story 3.1** (URL Generation) - Functions exist but not integrated -2. **Story 3.2** (Tiered Links) - Functions exist but not integrated -3. **Story 3.3** (Content Injection) - Functions exist but not integrated - -This suggests the stories focused on **building the functionality** with the expectation that **Story 4.x (Deployment)** would integrate everything together. 
- -## Impact of Missing Integration - -### Tests Still Pass ✓ -- Unit tests test functions in isolation -- Integration tests use the functions directly -- All 42 tests pass because the **functions work perfectly** - -### But Real Usage Fails ✗ -When you actually run `generate-batch`: -- Articles are generated -- They're saved to database -- But they have no links, no URLs, nothing -- Story 4.x deployment would fail because articles aren't ready - -## Effort to Fix - -**Time Estimate**: 30-60 minutes - -**Tasks**: -1. Add imports to `batch_processor.py` (2 minutes) -2. Create `_inject_tier_interlinks()` method (15 minutes) -3. Add call at end of `_process_tier()` (2 minutes) -4. Test with real job file (10 minutes) -5. Debug any issues (10-20 minutes) - -**Complexity**: Low - just wiring existing functions together - -## Testing the Integration - -After adding integration: - -```bash -# 1. Run batch generation -uv run python main.py generate-batch \ - --job-file jobs/test_small.json \ - --username admin \ - --password yourpass - -# 2. Check database for links -uv run python -c " -from src.database.session import db_manager -from src.database.repositories import ArticleLinkRepository - -session = db_manager.get_session() -link_repo = ArticleLinkRepository(session) -links = link_repo.get_all() -print(f'Total links: {len(links)}') -for link in links[:5]: - print(f' {link.link_type}: {link.anchor_text} -> {link.to_url or link.to_content_id}') -session.close() -" - -# 3. Verify articles have links in content -uv run python -c " -from src.database.session import db_manager -from src.database.repositories import GeneratedContentRepository - -session = db_manager.get_session() -content_repo = GeneratedContentRepository(session) -articles = content_repo.get_all(limit=1) -if articles: - print('Sample article content:') - print(articles[0].content[:500]) - print(f'Contains links: {\"` menu with Home link -- Template line 113: `
  • Home
  • ` -- This is part of the template wrapper, not injected into article content - -**Old behavior (now removed):** -- Previously, system searched article content for "Home" and tried to link it -- This was redundant since Home is already in the nav menu -- Code has been updated to remove this injection - -## Step-by-Step: What Happens During deploy-batch - -### Step 1: Load Articles from Database -``` -- Project 1 has generated content already -- Tier 1: 5 articles -- Tier 2: 20 articles -- Each article has: title, content (HTML), site_deployment_id -``` - -### Step 2: URL Generation (already done during generate-batch) -``` -Tier 1 URLs (round-robin between getcnc.info and textbullseye.com): -- Article 0: https://getcnc.info/{slug}.html -- Article 1: https://www.textbullseye.com/{slug}.html -- Article 2: https://getcnc.info/{slug}.html -- Article 3: https://www.textbullseye.com/{slug}.html -- Article 4: https://getcnc.info/{slug}.html - -Tier 2 URLs (round-robin): -- Articles 0-19 distributed across both domains -``` - -### Step 3: Tiered Links (already injected during generate-batch) - -**For Tier 1:** -- Target: Money site URL from project database -- Anchor text: main_keyword variations -- Links already in `generated_content.content` HTML - -**For Tier 2:** -- Target: Random selection of tier1 URLs (2-4 per article) -- Anchor text: related_searches from project -- Links already in HTML - -### Step 4: Homepage Links -- Home link is in the navigation menu (template) -- No longer injected into article content - -### Step 5: See Also Section (already injected) -- HTML section with links to other articles in same tier - -### Step 6: Template Application (already done) -- HTML wrapped in template from `src/templating/templates/basic.html` -- Navigation menu added -- Stored in `generated_content.formatted_html` - -### Step 7: Upload to Bunny.net -``` -For each article: - 1. Get site deployment credentials - 2. Upload formatted_html to storage zone - 3. 
File path: /{slug}.html - 4. Log URL to deployment_logs/ - 5. Update database: deployed_url, status='deployed' - -For each site's boilerplate pages: - 1. Upload index.html (if exists) - 2. Upload about.html - 3. Upload contact.html - 4. Upload privacy.html -``` - -## Database Link Tracking - -All links are tracked in `article_links` table: - -**Tier 1 Article Example (ID: 43):** -``` -| from_content_id | to_content_id | to_url | anchor_text | link_type | -|-----------------|---------------|--------|-------------|-----------| -| 43 | NULL | https://fzemanufacturing.com/... | "shaft machining" | tiered | -| 43 | 44 | NULL | "Understanding CNC..." | wheel_see_also | -| 43 | 45 | NULL | "Advanced Shaft..." | wheel_see_also | -| 43 | 46 | NULL | "Precision Machining..." | wheel_see_also | -| 43 | 47 | NULL | "Modern Shaft..." | wheel_see_also | -``` - -**Tier 2 Article Example (ID: 48):** -``` -| from_content_id | to_content_id | to_url | anchor_text | link_type | -|-----------------|---------------|--------|-------------|-----------| -| 48 | NULL | https://getcnc.info/{slug1}.html | "cnc machining services" | tiered | -| 48 | NULL | https://www.textbullseye.com/{slug2}.html | "precision shaft work" | tiered | -| 48 | NULL | https://getcnc.info/{slug3}.html | "shaft turning operations" | tiered | -| 48 | 49 | NULL | "Tier 2 Article 2 Title" | wheel_see_also | -| ... | ... | ... | ... | ... | -| 48 | 67 | NULL | "Tier 2 Article 20 Title" | wheel_see_also | -``` - -**Note:** Home link is no longer tracked in the database since it's in the template, not injected into content. 
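As a rough illustration of how the rows above could be inspected, the sketch below loads the Tier 1 example rows (article ID 43) into an in-memory SQLite table and counts outgoing links per `link_type`. The column names come from the tables above. The SQLite schema, the column types, and the `link_counts` helper are assumptions for illustration only and may not match the project's actual database layer or `ArticleLinkRepository`.

```python
import sqlite3

# Hypothetical schema: column names are taken from the tables above,
# but the column types and the use of SQLite are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE article_links (
        from_content_id INTEGER NOT NULL,
        to_content_id   INTEGER,
        to_url          TEXT,
        anchor_text     TEXT,
        link_type       TEXT NOT NULL
    )
    """
)

# Rows mirroring the Tier 1 example; truncated values ("...") are kept
# exactly as they appear in the table above.
rows = [
    (43, None, "https://fzemanufacturing.com/...", "shaft machining", "tiered"),
    (43, 44, None, "Understanding CNC...", "wheel_see_also"),
    (43, 45, None, "Advanced Shaft...", "wheel_see_also"),
    (43, 46, None, "Precision Machining...", "wheel_see_also"),
    (43, 47, None, "Modern Shaft...", "wheel_see_also"),
]
conn.executemany("INSERT INTO article_links VALUES (?, ?, ?, ?, ?)", rows)


def link_counts(connection, content_id):
    """Return {link_type: count} for links originating from one article."""
    cursor = connection.execute(
        "SELECT link_type, COUNT(*) FROM article_links "
        "WHERE from_content_id = ? GROUP BY link_type",
        (content_id,),
    )
    return dict(cursor.fetchall())


counts = link_counts(conn, 43)
print(counts)
```

For the Tier 1 example this groups the single money-site link under `tiered` and the four See Also links under `wheel_see_also`. The same query run against a Tier 2 article would make it easy to check that the number of `tiered` rows falls inside the configured `links_per_article_min`/`links_per_article_max` range.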
- -## Your Specific JSON File Analysis - -```json -{ - "jobs": [ - { - "project_id": 1, - "deployment_targets": [ - "getcnc.info", - "www.textbullseye.com" - ], - "tiers": { - "tier1": { - "count": 5, - "min_word_count": 1500, - "max_word_count": 2000, - "models": { - "title": "openai/gpt-4o-mini", - "outline": "openai/gpt-4o-mini", - "content": "anthropic/claude-3.5-sonnet" - } - }, - "tier2": { - "count": 20, - "models": { - "title": "openai/gpt-4o-mini", - "outline": "openai/gpt-4o-mini", - "content": "openai/gpt-4o-mini" - }, - "interlinking": { - "links_per_article_min": 2, - "links_per_article_max": 4 - } - } - } - } - ] -} -``` - -**What This Configuration Does:** - -1. **Tier 1 (5 articles):** - - Uses Claude Sonnet for content, GPT-4o-mini for titles/outlines - - 1500-2000 words per article - - Distributed across getcnc.info and textbullseye.com - - Each links to: money site (1) + See Also (4) = 5 total links (plus Home in nav menu) - -2. **Tier 2 (20 articles):** - - Uses GPT-4o-mini for everything (cheaper) - - Default word count (1100-1500) - - Each links to: 2-4 tier1 articles + See Also (19) = 21-23 total links (plus Home in nav menu) - - Distributed across both domains - -3. **Missing Configurations (using defaults):** - - `tier1.interlinking`: Not specified → uses defaults (but tier1 always gets 1 money site link anyway) - - `anchor_text_config`: Not specified → uses master.config.json rules - -## All JSON Fields That Affect Behavior - -See `MASTER_JSON.json` for the complete reference. 
Key fields: - -**Top-level job fields:** -- `project_id` - Which project's data to use -- `deployment_targets` - Which domains to deploy to -- `models` - Which AI models to use -- `tiered_link_count_range` - How many tiered links (job-level default) -- `anchor_text_config` - Override anchor text generation -- `interlinking` - Job-level interlinking defaults - -**Tier-level fields:** -- `count` - Number of articles -- `min_word_count`, `max_word_count` - Content length -- `min_h2_tags`, `max_h2_tags`, `min_h3_tags`, `max_h3_tags` - Outline structure -- `models` - Tier-specific model overrides -- `interlinking` - Tier-specific interlinking overrides - -**Fields in master.config.json:** -- `interlinking.tier_anchor_text_rules` - Defines anchor text sources per tier -- `interlinking.include_home_link` - Global default for Home links -- `interlinking.wheel_links` - Enable/disable See Also sections - -**Fields in project database:** -- `main_keyword` - Used for tier1 anchor text -- `related_searches` - Used for tier2 anchor text -- `entities` - Used for tier3+ anchor text -- `money_site_url` - Destination for tier1 links - diff --git a/IMAGE_TEMPLATE_ISSUES_ANALYSIS.md b/IMAGE_TEMPLATE_ISSUES_ANALYSIS.md deleted file mode 100644 index 237b364..0000000 --- a/IMAGE_TEMPLATE_ISSUES_ANALYSIS.md +++ /dev/null @@ -1,89 +0,0 @@ -# Image and Template Issues Analysis - -## Problems Identified - -### 1. Missing Image CSS in Templates -**Issue**: None of the templates (basic, modern, classic) has CSS for `<img>` tags. - -**Impact**: Images display at full size, breaking the layout, especially in the modern template with its constrained article width (850px). - -**Solution**: Add responsive image CSS to all templates: -```css -img { - max-width: 100%; - height: auto; - display: block; - margin: 1.5rem auto; - border-radius: 8px; -} -``` - -### 2. Template Storage Inconsistency -**Issue**: `template_used` field is only set when `apply_template()` is called. 
If: -- Templates are applied at different times -- Some articles skip template application -- Articles are moved between sites with different templates -- Template application fails silently - -Then the database may show incorrect or missing template values. - -**Evidence**: User reports articles showing "basic" when they're actually "modern". - -**Solution**: -- Always apply templates before deployment -- Re-apply templates if `template_used` doesn't match site's `template_name` -- Add validation to ensure `template_used` matches site template - -### 3. Images Lost During Interlink Injection -**Issue**: Processing order: -1. Images inserted into `content` → saved -2. Interlinks injected → BeautifulSoup parses/rewrites HTML → saved -3. Template applied → reads `content` → creates `formatted_html` - -BeautifulSoup parsing may break image tags or lose them during HTML rewriting. - -**Evidence**: User reports images were generated and uploaded (URLs in database) but don't appear in deployed articles. - -**Solution Options**: -- **Option A**: Re-insert images after interlink injection (read from `hero_image_url` and `content_images` fields) -- **Option B**: Use more robust HTML parsing that preserves all tags -- **Option C**: Apply template immediately after image insertion, then inject interlinks into `formatted_html` instead of `content` - -### 4. Image Size Not Constrained -**Issue**: Even if images are present, they're not constrained by template CSS, causing layout issues. - -**Solution**: Add image CSS (see #1) and ensure images are inserted with proper attributes: -```html -... 
-``` - -## Recommended Fixes - -### Priority 1: Add Image CSS to All Templates -Add responsive image styling to: -- `src/templating/templates/basic.html` -- `src/templating/templates/modern.html` -- `src/templating/templates/classic.html` - -### Priority 2: Fix Image Preservation -Modify `src/interlinking/content_injection.py` to preserve images: -- Use `html.parser` with `preserve_whitespace` or `html5lib` parser -- Or re-insert images after interlink injection using database fields - -### Priority 3: Fix Template Tracking -- Add validation in deployment to ensure `template_used` matches site template -- Re-apply templates if mismatch detected -- Add script to backfill/correct `template_used` values - -### Priority 4: Improve Image Insertion -- Add `max-width` style attribute when inserting images -- Ensure images are inserted with proper responsive attributes - -## Code Locations - -- Image insertion: `src/generation/image_injection.py` -- Interlink injection: `src/interlinking/content_injection.py` (line 53-76) -- Template application: `src/generation/service.py` (line 409-460) -- Template files: `src/templating/templates/*.html` -- Deployment: `src/deployment/deployment_service.py` (uses `formatted_html`) - diff --git a/INTEGRATION_COMPLETE.md b/INTEGRATION_COMPLETE.md deleted file mode 100644 index 5e3a800..0000000 --- a/INTEGRATION_COMPLETE.md +++ /dev/null @@ -1,337 +0,0 @@ -# CLI Integration Complete - Story 3.3 - -## Status: DONE ✅ - -The CLI integration for Story 3.1-3.3 has been successfully implemented and is ready for testing. - ---- - -## What Was Changed - -### 1. 
Modified `src/database/repositories.py` -**Change**: Added `require_site` parameter to `get_by_project_and_tier()` - -```python -def get_by_project_and_tier(self, project_id: int, tier: str, require_site: bool = True) -``` - -**Purpose**: Allows fetching articles with or without site assignments - -**Impact**: Backward compatible (default `require_site=True` maintains existing behavior) - -### 2. Modified `src/generation/batch_processor.py` -**Changes**: -1. Added imports for Story 3.1-3.3 functions -2. Added `job` parameter to `_process_tier()` -3. Added post-processing call at end of `_process_tier()` -4. Created new `_post_process_tier()` method - -**New Workflow**: -```python -_process_tier(): - 1. Generate all articles (existing) - 2. Handle failures (existing) - 3. ✨ NEW: Call _post_process_tier() - -_post_process_tier(): - 1. Get articles with site assignments - 2. Generate URLs (Story 3.1) - 3. Find tiered links (Story 3.2) - 4. Inject interlinks (Story 3.3) - 5. Apply templates -``` - ---- - -## What Now Happens When You Run `generate-batch` - -### Before Integration ❌ -```bash -uv run python main.py generate-batch --job-file jobs/example.json -``` - -Result: -- ✅ Articles generated -- ❌ No URLs -- ❌ No tiered links -- ❌ No "See Also" section -- ❌ No templates applied - -### After Integration ✅ -```bash -uv run python main.py generate-batch --job-file jobs/example.json -``` - -Result: -- ✅ Articles generated -- ✅ URLs generated for articles with site assignments -- ✅ Tiered links found (T1→money site, T2→T1) -- ✅ All interlinks injected (tiered + homepage + See Also) -- ✅ Templates applied to final HTML - ---- - -## CLI Output Example - -When you run a batch job, you'll now see: - -``` -Processing Job 1/1: Project ID 1 - Validating deployment targets: www.example.com - All deployment targets validated successfully - - tier1: Generating 5 articles - [1/5] Generating title... - [1/5] Generating outline... - [1/5] Generating content... 
- [1/5] Generated content: 2,143 words - [1/5] Saved (ID: 43, Status: generated) - [2/5] Generating title... - ... (repeat for all articles) - - tier1: Post-processing 5 articles... ← NEW! - Generating URLs... ← NEW! - Generated 5 URLs ← NEW! - Finding tiered links... ← NEW! - Found tiered links for tier 1 ← NEW! - Injecting interlinks... ← NEW! - Interlinks injected successfully ← NEW! - Applying templates... ← NEW! - Applied templates to 5/5 articles ← NEW! - tier1: Post-processing complete ← NEW! - -SUMMARY -Jobs processed: 1/1 -Articles generated: 5/5 -Augmented: 0 -Failed: 0 -``` - ---- - -## Testing the Integration - -### Quick Test - -1. **Create a small test job**: -```json -{ - "jobs": [ - { - "project_id": 1, - "deployment_targets": ["www.testsite.com"], - "tiers": { - "tier1": { - "count": 2 - } - } - } - ] -} -``` - -2. **Run the batch**: -```bash -uv run python main.py generate-batch \ - --job-file jobs/test_integration.json \ - --username admin \ - --password yourpass -``` - -3. **Verify the results**: - -Check for URLs: -```bash -uv run python -c " -from src.database.session import db_manager -from src.database.repositories import GeneratedContentRepository - -session = db_manager.get_session() -repo = GeneratedContentRepository(session) -articles = repo.get_by_project_and_tier(1, 'tier1') -for a in articles: - print(f'Article {a.id}: {a.title[:50]}') - print(f' Has links: {\"See Also` + `