Pre-fix for checking for incomplete from openrouter - version 1.1.2

main
PeninsulaInd 2025-10-31 11:11:28 -05:00
parent 5eef4fe507
commit de6b97dbc1
9 changed files with 613 additions and 13 deletions

@ -0,0 +1,462 @@
# Story 4.2: Flexible Content Types and External Linking
## Status
Draft - Ready for Review
## Story
**As a developer**, I want to generate different types of content (articles, directories, best-of lists, etc.) and link them to existing external URLs (like deployed T1 articles), so I can build comprehensive content ecosystems with maximum flexibility.
## Context
- Story 3.2 handles tiered linking between generated articles in the same project
- Story 3.3 injects links into generated content
- Current system only supports linking to money site or lower-tier articles in the same project
- Need ability to link to external URLs (deployed articles, external resources)
- Need support for different content types beyond standard articles
- User has existing T1 URLs deployed and wants to generate T2 content that links to them
## Acceptance Criteria
### Core Functionality
- **Content Type Support**: Generate different content structures based on `content_type` field
- **External Link Targets**: Support linking to external URLs from multiple sources
- **Flexible Link Sources**: Support file, database, and direct URL specification
- **Template Integration**: Auto-select appropriate templates based on content type
- **Backward Compatibility**: Existing jobs continue to work unchanged
### Content Types
- **`article`** (default): Standard article format (current behavior)
- **`directory`**: Curated list with categories, descriptions, and links
- **`best_of`**: Ranked list with pros/cons and comparisons
- **`comparison`**: Side-by-side comparison format
- **`guide`**: Step-by-step how-to format
- **`review`**: Product/service review format
- **`redirect`**: a cloud meta refresh
These content types are a to-do list for future work; they are listed here to show the intended direction.
### Link Target Sources
- **File Source**: Read URLs from deployment log files or any other text file
```json
"link_targets": {
"mode": "external",
"source": "file://deployment_logs/2025-10-27_tier1_urls.txt"
}
```
- **Database Source**: Query existing articles from database
```json
"link_targets": {
"mode": "external",
"source": "database",
"project_id": 1,
"tier": "tier1"
}
```
- **Direct URLs**: Specify URLs directly in job config
```json
"link_targets": {
"mode": "external",
"source": "urls",
"urls": ["https://example.com/article1.html", "https://example.com/article2.html"]
}
```
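The three source shapes above could be checked before a job runs. The following is a minimal sketch, not the project's validator: the function name `validate_link_targets` and the error messages are hypothetical, and the `"internal"` mode value is taken from the resolver default shown later in this story.

```python
from typing import Any, Dict, List


def validate_link_targets(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a link_targets block (empty if valid)."""
    errors: List[str] = []
    mode = cfg.get("mode", "internal")
    if mode not in ("internal", "external"):
        errors.append(f"unknown mode: {mode!r}")
    if mode != "external":
        # Only external targets carry a source that needs checking here.
        return errors

    source = cfg.get("source")
    if not isinstance(source, str):
        errors.append("source must be a string")
    elif source.startswith("file://"):
        if not source[len("file://"):]:
            errors.append("file source needs a path after 'file://'")
    elif source == "database":
        if "tier" not in cfg:
            errors.append("database source needs a 'tier'")
    elif source == "urls":
        urls = cfg.get("urls")
        if not isinstance(urls, list) or not urls:
            errors.append("urls source needs a non-empty 'urls' list")
    else:
        errors.append(f"unknown source: {source!r}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a job file report every issue in one pass.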
### Job Configuration Extension
```json
{
"jobs": [{
"project_id": 1,
"tiers": {
"tier2": {
"count": 10,
"content_type": "directory",
"link_targets": {
"mode": "external",
"source": "file://deployment_logs/2025-10-27_tier1_urls.txt"
},
"tiered_link_count_range": {"min": 3, "max": 5}
}
}
}]
}
```
## Implementation Details
### 1. Job Configuration Extension
Extend `TierConfig` dataclass:
```python
@dataclass
class TierConfig:
    # ... existing fields ...
    content_type: str = "article"  # Default to article
    link_targets: Optional[Dict[str, Any]] = None
```
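One possible way to enforce the "must be valid type" rule from the tasks below is a `__post_init__` check. This is a sketch under assumptions: `TierConfigSketch` is a stand-in with a single illustrative `count` field, not the real `TierConfig`, and the set of valid names is copied from the Content Types list above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Content types listed in the acceptance criteria of this story.
VALID_CONTENT_TYPES = {
    "article", "directory", "best_of", "comparison", "guide", "review", "redirect",
}


@dataclass
class TierConfigSketch:
    """Stand-in for TierConfig; only the fields relevant here are shown."""
    count: int
    content_type: str = "article"
    link_targets: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Reject unknown content types at construction time.
        if self.content_type not in VALID_CONTENT_TYPES:
            raise ValueError(f"Unknown content_type: {self.content_type!r}")
```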
### 2. Link Target Resolver
Create `src/interlinking/link_target_resolver.py`:
```python
def resolve_link_targets(link_targets: Dict, project_id: int) -> List[str]:
    """
    Resolve link targets from various sources

    Args:
        link_targets: Link target configuration
        project_id: Project ID for database queries

    Returns:
        List of target URLs
    """
    mode = link_targets.get("mode", "internal")
    source = link_targets.get("source", "database")
    if mode == "external":
        # File sources are written as "file://<path>" in the job config
        if source.startswith("file://"):
            return _resolve_from_file(source[len("file://"):])
        elif source == "database":
            return _resolve_from_database(link_targets, project_id)
        elif source == "urls":
            return link_targets["urls"]
    # Fallback to existing internal logic
    return _resolve_internal_targets(link_targets, project_id)
```
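The file resolver itself is not specified in this story. A minimal sketch, assuming the deployment log holds one URL per line, might look like this; the name `read_urls_from_file` and the rules for skipping comments and duplicates are illustrative assumptions.

```python
from pathlib import Path
from typing import List


def read_urls_from_file(file_path: str) -> List[str]:
    """Read one URL per line, skipping blanks, comments, and duplicates."""
    urls: List[str] = []
    seen = set()
    for raw in Path(file_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not line.startswith(("http://", "https://")):
            continue  # ignore anything that is not an absolute URL
        if line not in seen:
            seen.add(line)
            urls.append(line)
    return urls
```

File order is preserved so that the resolved list matches the deployment log line for line.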
### 3. Content Type Templates - IGNORE FOR NOW
Create specialized templates in `src/templating/templates/`:
- `directory.html` - For directory-style listings
- `best_of.html` - For "best of" lists
- `comparison.html` - For comparison articles
- `guide.html` - For how-to guides
- `review.html` - For review content
### 4. Content Type Prompts - IGNORE FOR NOW
Extend prompt system in `src/generation/prompts/`:
```python
CONTENT_TYPE_PROMPTS = {
"directory": """
Generate a comprehensive directory-style article about {keyword}.
Structure it as a curated list with:
- Brief descriptions for each item
- Categories/sections
- SEO-optimized content
- Natural link placement opportunities
""",
"best_of": """
Create a "Best of" article about {keyword}.
Include:
- Top 10-15 items with rankings
- Detailed explanations for each
- Pros/cons where relevant
- Comparison tables
- Expert recommendations
""",
"comparison": """
Write a detailed comparison article about {keyword}.
Structure as:
- Side-by-side comparisons
- Feature breakdowns
- Pros and cons
- Recommendations
- Decision factors
"""
}
```
### 5. Enhanced Tiered Links Logic
Modify `find_tiered_links()` to support external targets:
```python
def find_tiered_links(
    content_records: List[GeneratedContent],
    job_config,
    project_repo: ProjectRepository,
    content_repo: GeneratedContentRepository,
    site_repo: SiteDeploymentRepository
) -> Dict:
    """Enhanced to support external link targets"""
    tier = _validate_batch_tier(content_records)
    tier_int = _extract_tier_number(tier)
    # Assumes every record in the batch belongs to the same project
    project_id = content_records[0].project_id
    # Check for external link targets in tier config
    tier_config = _get_tier_config(job_config, tier)
    if tier_config and tier_config.get("link_targets"):
        external_urls = resolve_link_targets(tier_config["link_targets"], project_id)
        if external_urls:
            return {
                "tier": tier_int,
                "external_urls": external_urls,
                "link_type": "external"
            }
    # Fall back to existing internal logic
    return _find_internal_tiered_links(content_records, job_config, project_repo, content_repo, site_repo)
```
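Once `external_urls` is returned, each article still needs its own subset sized by `tiered_link_count_range`. This story does not say how that selection works, so the following is only a sketch of one option: the helper name `pick_external_links` and the random-sample strategy are assumptions.

```python
import random
from typing import List, Optional


def pick_external_links(
    external_urls: List[str],
    min_links: int,
    max_links: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Choose between min_links and max_links distinct URLs for one article."""
    rng = rng or random.Random()
    if not external_urls:
        return []
    # Never ask for more links than there are URLs, and never fewer than zero.
    upper = max(0, min(max_links, len(external_urls)))
    lower = max(0, min(min_links, upper))
    count = rng.randint(lower, upper)
    return rng.sample(external_urls, count)
```

Passing a seeded `random.Random` makes the selection reproducible in tests.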
### 6. Template Selection Logic - IGNORE FOR NOW
Enhance `TemplateService` to auto-select based on content type:
```python
def select_template_for_content(
    self,
    site_deployment_id: Optional[int] = None,
    site_deployment_repo=None,
    content_type: str = "article"
) -> str:
    """Enhanced to support content type-based template selection"""
    # Content type templates take priority
    content_type_templates = {
        "directory": "directory",
        "best_of": "best_of",
        "comparison": "comparison",
        "guide": "guide",
        "review": "review"
    }
    if content_type in content_type_templates:
        template_name = content_type_templates[content_type]
        if template_name in self.get_available_templates():
            return template_name
    # Fall back to existing site-based or random selection
    return self._select_template_by_site_or_random(site_deployment_id, site_deployment_repo)
```
## Tasks / Subtasks
### 1. Job Configuration Extension
**Effort:** 2 story points
- [ ] Add `content_type: str = "article"` to `TierConfig` dataclass
- [ ] Add `link_targets: Optional[Dict[str, Any]] = None` to `TierConfig`
- [ ] Update job config parser to handle new fields
- [ ] Add validation for content_type (must be valid type)
- [ ] Add validation for link_targets structure
- [ ] Update job config documentation
### 2. Link Target Resolver
**Effort:** 3 story points
- [ ] Create `src/interlinking/link_target_resolver.py`
- [ ] Implement `resolve_link_targets()` function
- [ ] Implement file source resolver (read deployment logs)
- [ ] Implement database source resolver (query by project/tier)
- [ ] Implement direct URLs source resolver
- [ ] Add error handling for invalid sources
- [ ] Add logging for link target resolution
### 3. Content Type Templates - IGNORE FOR NOW
**Effort:** 4 story points
- [ ] Create `directory.html` template with list-focused layout
- [ ] Create `best_of.html` template with ranking layout
- [ ] Create `comparison.html` template with side-by-side layout
- [ ] Create `guide.html` template with step-by-step layout
- [ ] Create `review.html` template with review-focused layout
- [ ] Ensure all templates are responsive and SEO-friendly
- [ ] Test template rendering with sample content
### 4. Content Type Prompts - IGNORE FOR NOW
**Effort:** 3 story points
- [ ] Create `src/generation/content_type_prompts.py`
- [ ] Implement prompt templates for each content type
- [ ] Integrate with existing prompt system
- [ ] Add content type-specific generation logic
- [ ] Test prompt effectiveness with different content types
### 5. Enhanced Tiered Links Logic
**Effort:** 3 story points
- [ ] Modify `find_tiered_links()` to check for external targets
- [ ] Integrate with link target resolver
- [ ] Maintain backward compatibility with existing internal logic
- [ ] Update return format to include external URLs
- [ ] Add logging for external vs internal link resolution
### 6. Template Selection Enhancement - IGNORE FOR NOW
**Effort:** 2 story points
- [ ] Enhance `TemplateService.select_template_for_content()`
- [ ] Add content type-based template selection
- [ ] Maintain existing site-based selection as fallback
- [ ] Update template service tests
### 7. Content Injection Updates
**Effort:** 2 story points
- [ ] Update `inject_interlinks()` to handle external URLs
- [ ] Ensure external links are properly recorded in `article_links` table
- [ ] Update link injection logic for different content types
- [ ] Test link injection with external targets
### 8. Unit Tests
**Effort:** 4 story points
- [ ] Test job config parsing with new fields
- [ ] Test link target resolver with all source types
- [ ] Test content type template selection
- [ ] Test enhanced tiered links logic
- [ ] Test external link injection
- [ ] Test backward compatibility with existing jobs
- [ ] Test error handling for invalid configurations
### 9. Integration Tests
**Effort:** 3 story points
- [ ] Test full flow: external targets → content generation → link injection
- [ ] Test with different content types and templates
- [ ] Test with various link target sources
- [ ] Verify external links are properly recorded
- [ ] Test with real deployment log files
- [ ] Test mixed internal/external link scenarios
## Dependencies
- Story 3.2: Tiered link finding must be complete
- Story 3.3: Content injection must be complete
- Story 2.4: Template system must be complete
- Story 2.3: Content generation must be complete
## Future Considerations
- Additional content types (FAQ, glossary, timeline, etc.)
- Advanced link target filtering (by domain, keyword, etc.)
- Content type-specific SEO optimizations
- Analytics for external link performance
- Content type templates with dynamic sections
## Total Effort
26 story points
## Technical Notes
### Content Type Examples
**Directory Content:**
```html
<h1>Complete Directory of {keyword}</h1>
<h2>Category 1</h2>
<ul>
<li><strong>Item 1:</strong> Description with natural link opportunity</li>
<li><strong>Item 2:</strong> Another description</li>
</ul>
<h2>Category 2</h2>
<!-- More categories -->
```
**Best Of Content:**
```html
<h1>Best {keyword} of 2024</h1>
<h2>#1 Top Choice</h2>
<p>Detailed explanation with pros and cons...</p>
<h2>#2 Runner Up</h2>
<p>Another detailed explanation...</p>
<!-- Rankings continue -->
```
### Link Target Resolution Examples
**File Source:**
```text
# deployment_logs/2025-10-27_tier1_urls.txt
https://liquid-level-gauge.b-cdn.net/common-vfd-failures.html
https://workbenchwizard.com/essential-guide-to-troubleshooting.html
https://robotmowers.top/mastering-vfd-drive-repair.html
```
**Database Source:**
```sql
-- Query existing T1 articles from project
SELECT url FROM generated_content
WHERE project_id = 1 AND tier = 'tier1' AND status = 'deployed'
```
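The query above can be tried against a throwaway in-memory SQLite table. This is a sketch only: the project presumably reads through its repository classes, and the `id` column, the `'generated'` status value, and the helper name `deployed_urls` are assumptions made for the demonstration.

```python
import sqlite3
from typing import List


def deployed_urls(conn: sqlite3.Connection, project_id: int, tier: str) -> List[str]:
    """Return URLs of deployed articles for one project and tier."""
    rows = conn.execute(
        "SELECT url FROM generated_content "
        "WHERE project_id = ? AND tier = ? AND status = 'deployed' "
        "ORDER BY id",
        (project_id, tier),
    ).fetchall()
    return [row[0] for row in rows]


# Throwaway in-memory table for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE generated_content ("
    "id INTEGER PRIMARY KEY, project_id INTEGER, tier TEXT, status TEXT, url TEXT)"
)
conn.executemany(
    "INSERT INTO generated_content (project_id, tier, status, url) VALUES (?, ?, ?, ?)",
    [
        (1, "tier1", "deployed", "https://example.com/article1.html"),
        (1, "tier1", "generated", "https://example.com/draft.html"),
        (1, "tier2", "deployed", "https://example.com/t2.html"),
        (2, "tier1", "deployed", "https://example.com/other-project.html"),
    ],
)
print(deployed_urls(conn, 1, "tier1"))
```

Binding `project_id` and `tier` as parameters keeps the values from the job config out of the SQL string.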
**Direct URLs:**
```json
{
"link_targets": {
"mode": "external",
"source": "urls",
"urls": [
"https://example.com/article1.html",
"https://example.com/article2.html"
]
}
}
```
### Job Configuration Examples
**Directory Generation:**
```json
{
"jobs": [{
"project_id": 1,
"tiers": {
"tier2": {
"count": 5,
"content_type": "directory",
"link_targets": {
"mode": "external",
"source": "file://deployment_logs/2025-10-27_tier1_urls.txt"
},
"tiered_link_count_range": {"min": 3, "max": 5}
}
}
}]
}
```
**Best Of Lists:**
```json
{
"jobs": [{
"project_id": 1,
"tiers": {
"tier2": {
"count": 3,
"content_type": "best_of",
"link_targets": {
"mode": "external",
"source": "database",
"project_id": 1,
"tier": "tier1"
}
}
}
}]
}
```
**Mixed Content Types:**
```json
{
"jobs": [{
"project_id": 1,
"tiers": {
"tier2": {
"count": 10,
"content_type": "article"
},
"tier3": {
"count": 5,
"content_type": "directory",
"link_targets": {
"mode": "external",
"source": "file://deployment_logs/2025-10-27_tier1_urls.txt"
}
}
}
}]
}
```
### Backward Compatibility
All existing job configurations will continue to work unchanged:
- `content_type` defaults to "article" (current behavior)
- `link_targets` defaults to `None` (uses existing internal logic)
- Existing templates and prompts remain unchanged
- No breaking changes to existing APIs
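
The defaults listed above can be illustrated with plain dictionaries. This is a sketch under assumptions: the helper `read_new_tier_fields` is hypothetical, and the real job config parser may read these fields differently.

```python
from typing import Any, Dict, Optional, Tuple


def read_new_tier_fields(tier: Dict[str, Any]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Return (content_type, link_targets), applying the documented defaults."""
    return tier.get("content_type", "article"), tier.get("link_targets")


legacy_tier = {"count": 10}  # an existing tier block with neither new field
new_tier = {
    "count": 5,
    "content_type": "directory",
    "link_targets": {"mode": "external", "source": "database", "tier": "tier1"},
}

print(read_new_tier_fields(legacy_tier))
print(read_new_tier_fields(new_tier))
```

A tier block written before this story yields `"article"` and `None`, so it takes the existing internal linking path.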
## Notes
This story adds per-tier content types and external link targets while maintaining backward compatibility. Users can generate different content types and link to external URLs read from a file, the database, or the job config, which makes the system suitable for complex content ecosystem strategies.

View File

@ -0,0 +1,71 @@
"""
List all projects in reverse numerical order (by ID)
Usage:
uv run python scripts/list_projects.py
"""
import sys
from pathlib import Path

project_root = Path(__file__).parent.parent
sys.path.insert(0, str(project_root))

from src.database.session import db_manager
from src.database.repositories import ProjectRepository

try:
    import msvcrt
except ImportError:
    msvcrt = None


def wait_for_key():
    """Wait for user to press any key"""
    if msvcrt:
        print("\nPress any key to continue... (Press 'q' to quit)")
        key = msvcrt.getch()
        if key in (b'q', b'Q'):
            return False
        return True
    else:
        response = input("\nPress Enter to continue (or 'q' to quit): ")
        return response.lower() != 'q'


def list_projects():
    """List all projects in reverse numerical order (by ID)"""
    session = db_manager.get_session()
    try:
        project_repo = ProjectRepository(session)
        projects = project_repo.get_all()
        if not projects:
            print("No projects found in database")
            return
        projects_sorted = sorted(projects, key=lambda p: p.id, reverse=True)
        print(f"\nTotal projects: {len(projects_sorted)}")
        print("=" * 100)
        print(f"{'ID':<6} {'Name':<35} {'Main Keyword':<30} {'Tier':<6} {'User ID':<8} {'Created'}")
        print("=" * 100)
        batch_size = 10
        for i, project in enumerate(projects_sorted, 1):
            created = project.created_at.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{project.id:<6} {project.name[:34]:<35} {project.main_keyword[:29]:<30} {project.tier:<6} {project.user_id:<8} {created}")
            if i % batch_size == 0 and i < len(projects_sorted):
                if not wait_for_key():
                    break
        print("=" * 100)
    finally:
        session.close()


if __name__ == "__main__":
    list_projects()

View File

@ -0,0 +1,67 @@
"""
Update entities for a project
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent))

import click
from src.database.session import db_manager
from src.database.repositories import ProjectRepository


@click.command()
@click.argument('project_id', type=int)
@click.option('--entities', '-e', multiple=True, help='Entity to add (can be used multiple times)')
@click.option('--file', '-f', type=click.Path(exists=True), help='File with entities (one per line)')
@click.option('--replace', is_flag=True, help='Replace existing entities instead of appending')
def main(project_id: int, entities: tuple, file: str, replace: bool):
    """Update entities for PROJECT_ID"""
    db_manager.initialize()
    session = db_manager.get_session()
    try:
        repo = ProjectRepository(session)
        project = repo.get_by_id(project_id)
        if not project:
            click.echo(f"Error: Project {project_id} not found", err=True)
            return
        # Collect entities from arguments or file
        new_entities = list(entities)
        if file:
            with open(file, 'r', encoding='utf-8') as f:
                file_entities = [line.strip() for line in f if line.strip()]
            new_entities.extend(file_entities)
        if not new_entities:
            click.echo("No entities provided. Use --entities or --file", err=True)
            return
        # Update project entities
        if replace:
            project.entities = new_entities
            click.echo(f"Replaced entities for project {project_id}")
        else:
            existing = project.entities or []
            project.entities = existing + new_entities
            click.echo(f"Added {len(new_entities)} entities to project {project_id}")
        session.commit()
        click.echo(f"\nTotal entities: {len(project.entities)}")
        click.echo("Entities:")
        for entity in project.entities:
            click.echo(f" - {entity}")
    except Exception as e:
        session.rollback()
        click.echo(f"Error: {e}", err=True)
        raise
    finally:
        session.close()


if __name__ == "__main__":
    main()

View File

@ -921,7 +921,7 @@ def list_projects(username: Optional[str], password: Optional[str]):
@click.option('--continue-on-error', is_flag=True,
help='Continue processing if article generation fails')
@click.option('--model', '-m', default='gpt-4o-mini',
help='AI model to use (gpt-4o-mini, claude-sonnet-4.5)')
help='AI model to use (gpt-4o-mini, x-ai/grok-4-fast)')
def generate_batch(
job_file: str,
username: Optional[str],

View File

@ -20,8 +20,8 @@ class DatabaseConfig(BaseModel):
class AIServiceConfig(BaseModel):
provider: str = "openrouter"
base_url: str = "https://openrouter.ai/api/v1"
model: str = "anthropic/claude-3.5-sonnet"
max_tokens: int = 4000
model: str = "gpt-4o-mini"
max_tokens: int = 6000
temperature: float = 0.7
timeout: int = 30
available_models: Dict[str, str] = Field(default_factory=dict)

View File

@ -11,7 +11,8 @@ from src.core.config import get_config
AVAILABLE_MODELS = {
"gpt-4o-mini": "openai/gpt-4o-mini",
"claude-sonnet-4.5": "anthropic/claude-3.5-sonnet"
"claude-sonnet-3.5": "anthropic/claude-3.5-sonnet",
"grok-4-fast": "x-ai/grok-4-fast"
}

View File

@ -259,7 +259,8 @@ class BatchProcessor:
'title': titles[article_index],
'keyword': keyword,
'resolved_targets': targets_for_tier,
'debug': debug
'debug': debug,
'models': models
})
if self.max_workers > 1:
@ -285,13 +286,12 @@ class BatchProcessor:
title: str,
keyword: str,
resolved_targets: Dict[str, int],
debug: bool
debug: bool,
models = None
):
"""Generate a single article with pre-generated title"""
prefix = f" [{article_num}/{tier_config.count}]"
models = self.current_job.models if hasattr(self, 'current_job') and self.current_job.models else None
site_deployment_id = assign_site_for_article(article_index, resolved_targets)
if site_deployment_id:
@ -453,7 +453,8 @@ class BatchProcessor:
title: str,
keyword: str,
resolved_targets: Dict[str, int],
debug: bool
debug: bool,
models = None
):
"""
Thread-safe wrapper for article generation
@ -482,8 +483,6 @@ class BatchProcessor:
prefix = f" [{article_num}/{tier_config.count}]"
models = self.current_job.models if hasattr(self, 'current_job') and self.current_job.models else None
site_deployment_id = assign_site_for_article(article_index, resolved_targets)
if site_deployment_id:

View File

@ -1,5 +1,5 @@
{
"system_message": "You are an expert content writer who creates engaging, informative, and SEO-optimized articles that provide real value to readers while incorporating relevant keywords naturally.",
"user_prompt": "Write a complete article based on:\nTitle: {title}\nOutline: {outline}\nKeyword: {keyword}\n\nEntities to include naturally: {entities}\nRelated searches to address: {related_searches}\n\nTarget word count range: {min_word_count} to {max_word_count} words.\n\nIMPORTANT: Write approximately {words_per_section} words per H3 section to meet the target word count. Be thorough and substantive in each section.\n\nReturn as an HTML fragment with <h2>, <h3>, and <p> tags. Do NOT include <!DOCTYPE>, <html>, <head>, or <body> tags. Start directly with the first <h2> heading.\n\nWrite naturally and informatively. Incorporate the keyword, entities, and related searches organically throughout the content."
"user_prompt": "Write a complete article based on:\nTitle: {title}\nOutline: {outline}\nKeyword: {keyword}\n\nEntities to include naturally: {entities}\nRelated searches to address: {related_searches}\n\nTarget word count range: {min_word_count} to {max_word_count} words.\n\nIMPORTANT: Write approximately {words_per_section} words per H3 section to meet the target word count. Be thorough and substantive in each section.\n\nReturn as an HTML fragment with <h2>, <h3>, and <p> tags. Do NOT include <!DOCTYPE>, <html>, <head>, or <body> tags. Start directly with the first <h2> heading.\n\nWrite naturally and informatively. Incorporate the keyword, entities, and related searches organically throughout the content. You need more words than the minimum word count."
}

View File

@ -314,7 +314,7 @@ class ContentGenerator:
content = self.ai_client.generate_completion(
prompt=user_prompt,
system_message=system_msg,
max_tokens=8000,
max_tokens=12000,
temperature=0.7,
override_model=model
)