Pre-fix for checking for incomplete from openrouter - version 1.1.2

2025-10-31 11:11:28 -05:00 · 2025-10-31 11:11:28 -05:00 · de6b97dbc1
parent 5eef4fe507
commit de6b97dbc1
9 changed files with 613 additions and 13 deletions
--- a/docs/stories/story-4.2-flexible-content-types-and-external-linking.md
+++ b/docs/stories/story-4.2-flexible-content-types-and-external-linking.md
@ -0,0 +1,462 @@
+# Story 4.2: Flexible Content Types and External Linking
+
+## Status
+Draft - Ready for Review
+
+## Story
+**As a developer**, I want to generate different types of content (articles, directories, best-of lists, etc.) and link them to existing external URLs (like deployed T1 articles), so I can build comprehensive content ecosystems with maximum flexibility.
+
+## Context
+- Story 3.2 handles tiered linking between generated articles in the same project
+- Story 3.3 injects links into generated content
+- The current system only supports linking to the money site or to lower-tier articles in the same project
+- We need the ability to link to external URLs (deployed articles, external resources)
+- We need support for content types beyond standard articles
+- The user has existing T1 URLs deployed and wants to generate T2 content that links to them
+
+## Acceptance Criteria
+
+### Core Functionality
+- **Content Type Support**: Generate different content structures based on `content_type` field
+- **External Link Targets**: Support linking to external URLs from multiple sources
+- **Flexible Link Sources**: Support file, database, and direct URL specification
+- **Template Integration**: Auto-select appropriate templates based on content type
+- **Backward Compatibility**: Existing jobs continue to work unchanged
+
+### Content Types
+- **`article`** (default): Standard article format (current behavior)
+- **`directory`**: Curated list with categories, descriptions, and links
+- **`best_of`**: Ranked list with pros/cons and comparisons
+- **`comparison`**: Side-by-side comparison format
+- **`guide`**: Step-by-step how-to format
+- **`review`**: Product/service review format
+- **`redirect`**: a cloud meta refresh
+
+These content types are a to-do list for future work; they are listed here to show the intended direction.
+
+### Link Target Sources
+- **File Source**: Read URLs from a deployment log file or another plain-text file
+  ```json
+  "link_targets": {
+    "mode": "external",
+    "source": "file://deployment_logs/2025-10-27_tier1_urls.txt"
+  }
+  ```
+- **Database Source**: Query existing articles from database
+  ```json
+  "link_targets": {
+    "mode": "external", 
+    "source": "database",
+    "project_id": 1,
+    "tier": "tier1"
+  }
+  ```
+- **Direct URLs**: Specify URLs directly in job config
+  ```json
+  "link_targets": {
+    "mode": "external",
+    "source": "urls",
+    "urls": ["https://example.com/article1.html", "https://example.com/article2.html"]
+  }
+  ```
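The three source shapes above can be told apart with a small validation step before any resolution happens. The following is a minimal sketch, assuming the `file://` prefix convention used in the examples; `validate_link_targets`, `FILE_SCHEME`, and the error messages are hypothetical names, not part of the existing codebase.

```python
from typing import Any, Dict

FILE_SCHEME = "file://"  # assumed prefix, taken from the examples above


def validate_link_targets(cfg: Dict[str, Any]) -> str:
    """Return the source kind ("file", "database" or "urls") or raise ValueError.

    Hypothetical helper: the name and the rules are assumptions for illustration.
    """
    if cfg.get("mode") != "external":
        raise ValueError("only mode='external' is validated in this sketch")

    source = cfg.get("source")
    if not isinstance(source, str):
        raise ValueError("link_targets.source must be a string")

    if source.startswith(FILE_SCHEME):
        if not source[len(FILE_SCHEME):]:
            raise ValueError("file source needs a path after 'file://'")
        return "file"
    if source == "database":
        if "tier" not in cfg:
            raise ValueError("database source needs a 'tier'")
        return "database"
    if source == "urls":
        urls = cfg.get("urls")
        if not isinstance(urls, list) or not urls:
            raise ValueError("urls source needs a non-empty 'urls' list")
        return "urls"
    raise ValueError(f"unknown link_targets.source: {source!r}")
```

Running a check like this at job-parse time would surface a mistyped source before any content is generated.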
+
+### Job Configuration Extension
+```json
+{
+  "jobs": [{
+    "project_id": 1,
+    "tiers": {
+      "tier2": {
+        "count": 10,
+        "content_type": "directory",
+        "link_targets": {
+          "mode": "external",
+          "source": "file://deployment_logs/2025-10-27_tier1_urls.txt"
+        },
+        "tiered_link_count_range": {"min": 3, "max": 5}
+      }
+    }
+  }]
+}
+```
+
+## Implementation Details
+
+### 1. Job Configuration Extension
+Extend `TierConfig` dataclass:
+```python
+@dataclass
+class TierConfig:
+    # ... existing fields ...
+    content_type: str = "article"  # Default to article
+    link_targets: Optional[Dict[str, Any]] = None
+```
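To illustrate how the two new fields could be parsed and validated while older job files keep working, here is a minimal sketch. The `count` field stands in for the existing fields that are not shown in this story, and `parse_tier_config` and `VALID_CONTENT_TYPES` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Content types listed in this story; the name of this set is an assumption.
VALID_CONTENT_TYPES = {
    "article", "directory", "best_of", "comparison", "guide", "review", "redirect",
}


@dataclass
class TierConfig:
    count: int  # stand-in for the existing fields, which are not shown here
    content_type: str = "article"
    link_targets: Optional[Dict[str, Any]] = None


def parse_tier_config(raw: Dict[str, Any]) -> TierConfig:
    """Build a TierConfig from one tier entry of a job file (hypothetical helper)."""
    content_type = raw.get("content_type", "article")
    if content_type not in VALID_CONTENT_TYPES:
        raise ValueError(f"invalid content_type: {content_type!r}")
    return TierConfig(
        count=raw["count"],
        content_type=content_type,
        link_targets=raw.get("link_targets"),
    )
```

A tier entry that omits both new keys parses to `content_type="article"` and `link_targets=None`, which is the backward-compatible path.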
+
+### 2. Link Target Resolver
+Create `src/interlinking/link_target_resolver.py`:
+```python
+from typing import Any, Dict, List
+
+FILE_SCHEME = "file://"  # prefix used for file sources in the job config examples
+
+
+def resolve_link_targets(link_targets: Dict[str, Any], project_id: int) -> List[str]:
+    """
+    Resolve link targets from various sources
+
+    Args:
+        link_targets: Link target configuration
+        project_id: Project ID for database queries
+
+    Returns:
+        List of target URLs
+    """
+    mode = link_targets.get("mode", "internal")
+    source = link_targets.get("source", "database")
+
+    if mode == "external":
+        # File sources are written as "file://<path>", so match on the prefix
+        if source.startswith(FILE_SCHEME):
+            return _resolve_from_file(source[len(FILE_SCHEME):])
+        elif source == "database":
+            return _resolve_from_database(link_targets, project_id)
+        elif source == "urls":
+            return link_targets["urls"]
+        # Fail loudly instead of silently falling back to internal linking
+        raise ValueError(f"Unknown external link source: {source!r}")
+
+    # Fallback to existing internal logic
+    return _resolve_internal_targets(link_targets, project_id)
+```
+
+### 3. Content Type Templates - IGNORE FOR NOW
+Create specialized templates in `src/templating/templates/`:
+- `directory.html` - For directory-style listings
+- `best_of.html` - For "best of" lists
+- `comparison.html` - For comparison articles
+- `guide.html` - For how-to guides
+- `review.html` - For review content
+
+### 4. Content Type Prompts - IGNORE FOR NOW
+Extend prompt system in `src/generation/prompts/`:
+```python
+CONTENT_TYPE_PROMPTS = {
+    "directory": """
+    Generate a comprehensive directory-style article about {keyword}.
+    Structure it as a curated list with:
+    - Brief descriptions for each item
+    - Categories/sections
+    - SEO-optimized content
+    - Natural link placement opportunities
+    """,
+    
+    "best_of": """
+    Create a "Best of" article about {keyword}.
+    Include:
+    - Top 10-15 items with rankings
+    - Detailed explanations for each
+    - Pros/cons where relevant
+    - Comparison tables
+    - Expert recommendations
+    """,
+    
+    "comparison": """
+    Write a detailed comparison article about {keyword}.
+    Structure as:
+    - Side-by-side comparisons
+    - Feature breakdowns
+    - Pros and cons
+    - Recommendations
+    - Decision factors
+    """
+}
+```
+
+### 5. Enhanced Tiered Links Logic
+Modify `find_tiered_links()` to support external targets:
+```python
+def find_tiered_links(
+    content_records: List[GeneratedContent],
+    job_config,
+    project_repo: ProjectRepository,
+    content_repo: GeneratedContentRepository,
+    site_repo: SiteDeploymentRepository
+) -> Dict:
+    """Enhanced to support external link targets"""
+    
+    tier = _validate_batch_tier(content_records)
+    tier_int = _extract_tier_number(tier)
+    
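+    # Check for external link targets in tier config
+    tier_config = _get_tier_config(job_config, tier)
+    if tier_config and tier_config.get("link_targets"):
+        # project_id is not a parameter, so take it from the batch
+        # (assumes every record in the batch belongs to the same project)
+        project_id = content_records[0].project_id
+        external_urls = resolve_link_targets(tier_config["link_targets"], project_id)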
+        if external_urls:
+            return {
+                "tier": tier_int,
+                "external_urls": external_urls,
+                "link_type": "external"
+            }
+    
+    # Fall back to existing internal logic
+    return _find_internal_tiered_links(content_records, job_config, project_repo, content_repo, site_repo)
+```
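Once `external_urls` is returned, each article still needs a subset sized according to `tiered_link_count_range`. The sketch below shows one way to do that. `pick_external_links` is a hypothetical helper, and capping the count at the number of available URLs is an assumption about the desired behavior.

```python
import random
from typing import Dict, List, Optional


def pick_external_links(
    external_urls: List[str],
    count_range: Dict[str, int],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Choose a random subset of URLs whose size falls within count_range.

    Hypothetical helper; caps the count at the number of available URLs.
    """
    rng = rng or random.Random()
    wanted = rng.randint(count_range["min"], count_range["max"])
    wanted = min(wanted, len(external_urls))
    return rng.sample(external_urls, wanted)
```

Passing a seeded `random.Random` makes the selection reproducible, which can help when testing link injection.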
+
+### 6. Template Selection Logic - IGNORE FOR NOW
+Enhance `TemplateService` to auto-select based on content type:
+```python
+def select_template_for_content(
+    self, 
+    site_deployment_id: Optional[int] = None,
+    site_deployment_repo=None,
+    content_type: str = "article"
+) -> str:
+    """Enhanced to support content type-based template selection"""
+    
+    # Content type templates take priority
+    content_type_templates = {
+        "directory": "directory",
+        "best_of": "best_of", 
+        "comparison": "comparison",
+        "guide": "guide",
+        "review": "review"
+    }
+    
+    if content_type in content_type_templates:
+        template_name = content_type_templates[content_type]
+        if template_name in self.get_available_templates():
+            return template_name
+    
+    # Fall back to existing site-based or random selection
+    return self._select_template_by_site_or_random(site_deployment_id, site_deployment_repo)
+```
+
+## Tasks / Subtasks
+
+### 1. Job Configuration Extension
+**Effort:** 2 story points
+
+- [ ] Add `content_type: str = "article"` to `TierConfig` dataclass
+- [ ] Add `link_targets: Optional[Dict[str, Any]] = None` to `TierConfig`
+- [ ] Update job config parser to handle new fields
+- [ ] Add validation for content_type (must be valid type)
+- [ ] Add validation for link_targets structure
+- [ ] Update job config documentation
+
+### 2. Link Target Resolver
+**Effort:** 3 story points
+
+- [ ] Create `src/interlinking/link_target_resolver.py`
+- [ ] Implement `resolve_link_targets()` function
+- [ ] Implement file source resolver (read deployment logs)
+- [ ] Implement database source resolver (query by project/tier)
+- [ ] Implement direct URLs source resolver
+- [ ] Add error handling for invalid sources
+- [ ] Add logging for link target resolution
+
+### 3. Content Type Templates - IGNORE FOR NOW
+**Effort:** 4 story points
+
+- [ ] Create `directory.html` template with list-focused layout
+- [ ] Create `best_of.html` template with ranking layout
+- [ ] Create `comparison.html` template with side-by-side layout
+- [ ] Create `guide.html` template with step-by-step layout
+- [ ] Create `review.html` template with review-focused layout
+- [ ] Ensure all templates are responsive and SEO-friendly
+- [ ] Test template rendering with sample content
+
+### 4. Content Type Prompts - IGNORE FOR NOW
+**Effort:** 3 story points
+
+- [ ] Create `src/generation/content_type_prompts.py`
+- [ ] Implement prompt templates for each content type
+- [ ] Integrate with existing prompt system
+- [ ] Add content type-specific generation logic
+- [ ] Test prompt effectiveness with different content types
+
+### 5. Enhanced Tiered Links Logic
+**Effort:** 3 story points
+
+- [ ] Modify `find_tiered_links()` to check for external targets
+- [ ] Integrate with link target resolver
+- [ ] Maintain backward compatibility with existing internal logic
+- [ ] Update return format to include external URLs
+- [ ] Add logging for external vs internal link resolution
+
+### 6. Template Selection Enhancement - IGNORE FOR NOW
+**Effort:** 2 story points
+
+- [ ] Enhance `TemplateService.select_template_for_content()`
+- [ ] Add content type-based template selection
+- [ ] Maintain existing site-based selection as fallback
+- [ ] Update template service tests
+
+### 7. Content Injection Updates
+**Effort:** 2 story points
+
+- [ ] Update `inject_interlinks()` to handle external URLs
+- [ ] Ensure external links are properly recorded in `article_links` table
+- [ ] Update link injection logic for different content types
+- [ ] Test link injection with external targets
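As a rough illustration of what handling an external URL during injection might involve, the sketch below links the first occurrence of an anchor phrase in an HTML fragment and otherwise appends a linked paragraph. `inject_external_link` is a hypothetical helper and is not the existing `inject_interlinks()`; it uses plain string matching and makes no attempt to record the link in `article_links`.

```python
import html


def inject_external_link(fragment: str, url: str, anchor_text: str) -> str:
    """Link the first occurrence of anchor_text, or append a linked paragraph.

    Hypothetical helper; it does plain string matching and does not check
    whether the match already sits inside a tag or an existing link.
    """
    link = f'<a href="{html.escape(url, quote=True)}">{html.escape(anchor_text)}</a>'
    if anchor_text in fragment:
        return fragment.replace(anchor_text, link, 1)
    return f"{fragment}\n<p>{link}</p>"
```

A production version would likely parse the HTML rather than match strings, so that text inside attributes or existing links is never rewritten.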
+
+### 8. Unit Tests
+**Effort:** 4 story points
+
+- [ ] Test job config parsing with new fields
+- [ ] Test link target resolver with all source types
+- [ ] Test content type template selection
+- [ ] Test enhanced tiered links logic
+- [ ] Test external link injection
+- [ ] Test backward compatibility with existing jobs
+- [ ] Test error handling for invalid configurations
+
+### 9. Integration Tests
+**Effort:** 3 story points
+
+- [ ] Test full flow: external targets → content generation → link injection
+- [ ] Test with different content types and templates
+- [ ] Test with various link target sources
+- [ ] Verify external links are properly recorded
+- [ ] Test with real deployment log files
+- [ ] Test mixed internal/external link scenarios
+
+## Dependencies
+- Story 3.2: Tiered link finding must be complete
+- Story 3.3: Content injection must be complete
+- Story 2.4: Template system must be complete
+- Story 2.3: Content generation must be complete
+
+## Future Considerations
+- Additional content types (FAQ, glossary, timeline, etc.)
+- Advanced link target filtering (by domain, keyword, etc.)
+- Content type-specific SEO optimizations
+- Analytics for external link performance
+- Content type templates with dynamic sections
+
+## Total Effort
+26 story points
+
+## Technical Notes
+
+### Content Type Examples
+
+**Directory Content:**
+```html
+<h1>Complete Directory of {keyword}</h1>
+<h2>Category 1</h2>
+<ul>
+  <li><strong>Item 1:</strong> Description with natural link opportunity</li>
+  <li><strong>Item 2:</strong> Another description</li>
+</ul>
+<h2>Category 2</h2>
+<!-- More categories -->
+```
+
+**Best Of Content:**
+```html
+<h1>Best {keyword} of 2024</h1>
+<h2>#1 Top Choice</h2>
+<p>Detailed explanation with pros and cons...</p>
+<h2>#2 Runner Up</h2>
+<p>Another detailed explanation...</p>
+<!-- Rankings continue -->
+```
+
+### Link Target Resolution Examples
+
+**File Source:**
+```text
+# deployment_logs/2025-10-27_tier1_urls.txt
+https://liquid-level-gauge.b-cdn.net/common-vfd-failures.html
+https://workbenchwizard.com/essential-guide-to-troubleshooting.html
+https://robotmowers.top/mastering-vfd-drive-repair.html
+```
+
+**Database Source:**
+```sql
+-- Query existing T1 articles from project
+SELECT url FROM generated_content 
+WHERE project_id = 1 AND tier = 'tier1' AND status = 'deployed'
+```
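The query above can be exercised against a throwaway table to confirm that it filters as intended. This sketch uses the standard-library `sqlite3` module and an assumed, simplified four-column schema. The real resolver would presumably go through the project's repository layer, and `fetch_deployed_urls` is a hypothetical name.

```python
import sqlite3
from typing import List


def fetch_deployed_urls(conn: sqlite3.Connection, project_id: int, tier: str) -> List[str]:
    """Return URLs of deployed articles for one project and tier."""
    rows = conn.execute(
        "SELECT url FROM generated_content "
        "WHERE project_id = ? AND tier = ? AND status = 'deployed'",
        (project_id, tier),
    ).fetchall()
    return [row[0] for row in rows]


# Throwaway in-memory table with an assumed, simplified schema
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE generated_content (url TEXT, project_id INTEGER, tier TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO generated_content VALUES (?, ?, ?, ?)",
    [
        ("https://example.com/article1.html", 1, "tier1", "deployed"),
        ("https://example.com/draft.html", 1, "tier1", "generated"),
        ("https://example.com/article2.html", 1, "tier2", "deployed"),
    ],
)
```

Binding `project_id` and `tier` as parameters, rather than formatting them into the SQL string, keeps values from the job config out of the statement text.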
+
+**Direct URLs:**
+```json
+{
+  "link_targets": {
+    "mode": "external",
+    "source": "urls", 
+    "urls": [
+      "https://example.com/article1.html",
+      "https://example.com/article2.html"
+    ]
+  }
+}
+```
+
+### Job Configuration Examples
+
+**Directory Generation:**
+```json
+{
+  "jobs": [{
+    "project_id": 1,
+    "tiers": {
+      "tier2": {
+        "count": 5,
+        "content_type": "directory",
+        "link_targets": {
+          "mode": "external",
+          "source": "file://deployment_logs/2025-10-27_tier1_urls.txt"
+        },
+        "tiered_link_count_range": {"min": 3, "max": 5}
+      }
+    }
+  }]
+}
+```
+
+**Best Of Lists:**
+```json
+{
+  "jobs": [{
+    "project_id": 1,
+    "tiers": {
+      "tier2": {
+        "count": 3,
+        "content_type": "best_of",
+        "link_targets": {
+          "mode": "external",
+          "source": "database",
+          "project_id": 1,
+          "tier": "tier1"
+        }
+      }
+    }
+  }]
+}
+```
+
+**Mixed Content Types:**
+```json
+{
+  "jobs": [{
+    "project_id": 1,
+    "tiers": {
+      "tier2": {
+        "count": 10,
+        "content_type": "article"
+      },
+      "tier3": {
+        "count": 5,
+        "content_type": "directory",
+        "link_targets": {
+          "mode": "external",
+          "source": "file://deployment_logs/2025-10-27_tier1_urls.txt"
+        }
+      }
+    }
+  }]
+}
+```
+
+### Backward Compatibility
+All existing job configurations will continue to work unchanged:
+- `content_type` defaults to "article" (current behavior)
+- `link_targets` defaults to `None` (uses existing internal logic)
+- Existing templates and prompts remain unchanged
+- No breaking changes to existing APIs
+
+## Notes
+This story adds content-type selection and external link targets without changing the behavior of existing jobs. Users can generate several content types and link them to any external URL, including already-deployed T1 articles, which supports content strategies that span multiple tiers and sites.
--- a/scripts/list_projects.py
+++ b/scripts/list_projects.py
@ -0,0 +1,71 @@
+"""
+List all projects in reverse numerical order (by ID)
+
+Usage:
+    uv run python scripts/list_projects.py
+"""
+
+import sys
+from pathlib import Path
+
+project_root = Path(__file__).parent.parent
+sys.path.insert(0, str(project_root))
+
+from src.database.session import db_manager
+from src.database.repositories import ProjectRepository
+
+try:
+    import msvcrt
+except ImportError:
+    msvcrt = None
+
+
+def wait_for_key():
+    """Wait for user to press any key"""
+    if msvcrt:
+        print("\nPress any key to continue... (Press 'q' to quit)")
+        key = msvcrt.getch()
+        if key in (b'q', b'Q'):
+            return False
+        return True
+    else:
+        response = input("\nPress Enter to continue (or 'q' to quit): ")
+        return response.lower() != 'q'
+
+
+def list_projects():
+    """List all projects in reverse numerical order (by ID)"""
+    session = db_manager.get_session()
+    try:
+        project_repo = ProjectRepository(session)
+        projects = project_repo.get_all()
+        
+        if not projects:
+            print("No projects found in database")
+            return
+        
+        projects_sorted = sorted(projects, key=lambda p: p.id, reverse=True)
+        
+        print(f"\nTotal projects: {len(projects_sorted)}")
+        print("=" * 100)
+        print(f"{'ID':<6} {'Name':<35} {'Main Keyword':<30} {'Tier':<6} {'User ID':<8} {'Created'}")
+        print("=" * 100)
+        
+        batch_size = 10
+        for i, project in enumerate(projects_sorted, 1):
+            created = project.created_at.strftime("%Y-%m-%d %H:%M:%S")
+            print(f"{project.id:<6} {project.name[:34]:<35} {project.main_keyword[:29]:<30} {project.tier:<6} {project.user_id:<8} {created}")
+            
+            if i % batch_size == 0 and i < len(projects_sorted):
+                if not wait_for_key():
+                    break
+        
+        print("=" * 100)
+        
+    finally:
+        session.close()
+
+
+if __name__ == "__main__":
+    list_projects()
+
--- a/scripts/update_project_entities.py
+++ b/scripts/update_project_entities.py
@ -0,0 +1,67 @@
+"""
+Update entities for a project
+"""
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+import click
+from src.database.session import db_manager
+from src.database.repositories import ProjectRepository
+
+@click.command()
+@click.argument('project_id', type=int)
+@click.option('--entities', '-e', multiple=True, help='Entity to add (can be used multiple times)')
+@click.option('--file', '-f', type=click.Path(exists=True), help='File with entities (one per line)')
+@click.option('--replace', is_flag=True, help='Replace existing entities instead of appending')
+def main(project_id: int, entities: tuple, file: str, replace: bool):
+    """Update entities for PROJECT_ID"""
+    db_manager.initialize()
+    session = db_manager.get_session()
+    
+    try:
+        repo = ProjectRepository(session)
+        project = repo.get_by_id(project_id)
+        
+        if not project:
+            click.echo(f"Error: Project {project_id} not found", err=True)
+            return
+        
+        # Collect entities from arguments or file
+        new_entities = list(entities)
+        
+        if file:
+            with open(file, 'r', encoding='utf-8') as f:
+                file_entities = [line.strip() for line in f if line.strip()]
+                new_entities.extend(file_entities)
+        
+        if not new_entities:
+            click.echo("No entities provided. Use --entities or --file", err=True)
+            return
+        
+        # Update project entities
+        if replace:
+            project.entities = new_entities
+            click.echo(f"Replaced entities for project {project_id}")
+        else:
+            existing = project.entities or []
+            project.entities = existing + new_entities
+            click.echo(f"Added {len(new_entities)} entities to project {project_id}")
+        
+        session.commit()
+        
+        click.echo(f"\nTotal entities: {len(project.entities)}")
+        click.echo("Entities:")
+        for entity in project.entities:
+            click.echo(f"  - {entity}")
+        
+    except Exception as e:
+        session.rollback()
+        click.echo(f"Error: {e}", err=True)
+        raise
+    finally:
+        session.close()
+
+if __name__ == "__main__":
+    main()
+
--- a/src/cli/commands.py
+++ b/src/cli/commands.py
@ -921,7 +921,7 @@ def list_projects(username: Optional[str], password: Optional[str]):
@click.option('--continue-on-error', is_flag=True, 
              help='Continue processing if article generation fails')
@click.option('--model', '-m', default='gpt-4o-mini',
-              help='AI model to use (gpt-4o-mini, claude-sonnet-4.5)')
+              help='AI model to use (gpt-4o-mini, x-ai/grok-4-fast)')
 def generate_batch(
    job_file: str, 
    username: Optional[str], 
--- a/src/core/config.py
+++ b/src/core/config.py
@ -20,8 +20,8 @@ class DatabaseConfig(BaseModel):
 class AIServiceConfig(BaseModel):
    provider: str = "openrouter"
    base_url: str = "https://openrouter.ai/api/v1"
-    model: str = "anthropic/claude-3.5-sonnet"
-    max_tokens: int = 4000
+    model: str = "gpt-4o-mini"
+    max_tokens: int = 6000
    temperature: float = 0.7
    timeout: int = 30
    available_models: Dict[str, str] = Field(default_factory=dict)
--- a/src/generation/ai_client.py
+++ b/src/generation/ai_client.py
@ -11,7 +11,8 @@ from src.core.config import get_config

 AVAILABLE_MODELS = {
    "gpt-4o-mini": "openai/gpt-4o-mini",
-    "claude-sonnet-4.5": "anthropic/claude-3.5-sonnet"
+    "claude-sonnet-3.5": "anthropic/claude-3.5-sonnet",
+    "grok-4-fast": "x-ai/grok-4-fast"
 }


--- a/src/generation/batch_processor.py
+++ b/src/generation/batch_processor.py
@ -259,7 +259,8 @@ class BatchProcessor:
                'title': titles[article_index],
                'keyword': keyword,
                'resolved_targets': targets_for_tier,
-                'debug': debug
+                'debug': debug,
+                'models': models
            })
        
        if self.max_workers > 1:
@ -285,13 +286,12 @@ class BatchProcessor:
        title: str,
        keyword: str,
        resolved_targets: Dict[str, int],
-        debug: bool
+        debug: bool,
+        models = None
    ):
        """Generate a single article with pre-generated title"""
        prefix = f"    [{article_num}/{tier_config.count}]"
        
-        models = self.current_job.models if hasattr(self, 'current_job') and self.current_job.models else None
-        
        site_deployment_id = assign_site_for_article(article_index, resolved_targets)
        
        if site_deployment_id:
@ -453,7 +453,8 @@ class BatchProcessor:
        title: str,
        keyword: str,
        resolved_targets: Dict[str, int],
-        debug: bool
+        debug: bool,
+        models = None
    ):
        """
        Thread-safe wrapper for article generation
@ -482,8 +483,6 @@ class BatchProcessor:
            
            prefix = f"    [{article_num}/{tier_config.count}]"
            
-            models = self.current_job.models if hasattr(self, 'current_job') and self.current_job.models else None
-            
            site_deployment_id = assign_site_for_article(article_index, resolved_targets)
            
            if site_deployment_id:
--- a/src/generation/prompts/content_generation.json
+++ b/src/generation/prompts/content_generation.json
@ -1,5 +1,5 @@
 {
  "system_message": "You are an expert content writer who creates engaging, informative, and SEO-optimized articles that provide real value to readers while incorporating relevant keywords naturally.",
-  "user_prompt": "Write a complete article based on:\nTitle: {title}\nOutline: {outline}\nKeyword: {keyword}\n\nEntities to include naturally: {entities}\nRelated searches to address: {related_searches}\n\nTarget word count range: {min_word_count} to {max_word_count} words.\n\nIMPORTANT: Write approximately {words_per_section} words per H3 section to meet the target word count. Be thorough and substantive in each section.\n\nReturn as an HTML fragment with <h2>, <h3>, and <p> tags. Do NOT include <!DOCTYPE>, <html>, <head>, or <body> tags. Start directly with the first <h2> heading.\n\nWrite naturally and informatively. Incorporate the keyword, entities, and related searches organically throughout the content."
+  "user_prompt": "Write a complete article based on:\nTitle: {title}\nOutline: {outline}\nKeyword: {keyword}\n\nEntities to include naturally: {entities}\nRelated searches to address: {related_searches}\n\nTarget word count range: {min_word_count} to {max_word_count} words.\n\nIMPORTANT: Write approximately {words_per_section} words per H3 section to meet the target word count. Be thorough and substantive in each section.\n\nReturn as an HTML fragment with <h2>, <h3>, and <p> tags. Do NOT include <!DOCTYPE>, <html>, <head>, or <body> tags. Start directly with the first <h2> heading.\n\nWrite naturally and informatively. Incorporate the keyword, entities, and related searches organically throughout the content. You need more words than the minimum word count."
 }

--- a/src/generation/service.py
+++ b/src/generation/service.py
@ -314,7 +314,7 @@ class ContentGenerator:
        content = self.ai_client.generate_completion(
            prompt=user_prompt,
            system_message=system_msg,
-            max_tokens=8000,
+            max_tokens=12000,
            temperature=0.7,
            override_model=model
        )