No commits in common. "master" and "cora-start" have entirely different histories.

80 changed files with 1101 additions and 16320 deletions

@ -1,182 +0,0 @@
# CNC Swiss Screw Machining: Precision, Process, and When to Use It
CNC Swiss screw machining is a precision turning process for producing small, complex parts at tight tolerances and high volumes. This guide covers how Swiss screw machines work, what makes them different from conventional CNC turning, and how to evaluate a machining partner.
---
## What Is CNC Swiss Screw Machining and How Does It Work?
Swiss screw machining is a CNC turning process that uses a sliding headstock and guide bushing to support bar stock close to the cutting point. The result is reduced deflection, minimal vibration, and tolerances that conventional lathes struggle to achieve.
### Origins and Definition
The Swiss screw machine was developed in Switzerland in the 1800s to produce the tiny screws and pins required for watchmaking. This early form of precision metalworking used cam-driven automatic lathes — mechanical automation that could repeat the same cuts with consistent accuracy. The design became the foundation for precision small-part manufacturing and fabrication worldwide.
Today's CNC Swiss lathes add programmable multi-axis motion, live tooling, and sub-spindle capability. These Swiss lathes handle complex geometries, tight tolerances, and high production volumes that cam-driven machines could not. Modern CNC machining controls allow manufacturers to program intricate tool paths across multiple axes, producing parts that would have been impossible on earlier automatic lathe designs.
The key distinction from a conventional CNC lathe: on a Swiss lathe, the workpiece moves through a guide bushing while the tools remain in a fixed cutting zone. On a conventional lathe, the workpiece is clamped in the headstock chuck and does not move along its axis, so the tools traverse its length. This difference determines how much deflection occurs during cutting.
### The Sliding Headstock and Guide Bushing
Bar stock feeds through a collet in the sliding headstock, which moves along the Z-axis to advance material into the cutting zone. A guide bushing supports the bar just 1-3mm from where the tool contacts the workpiece.
With the material held rigidly near the cutting point, there is almost no leverage for cutting forces to deflect the workpiece. Vibration is dampened and chatter is reduced, delivering tighter tolerances and better surface finish than conventional turning on the same part geometry.
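To put a rough number on that leverage effect, the unsupported length of bar can be modeled as a cantilever with a point load at the tip, where deflection grows with the cube of the overhang. This is a simplified sketch, not a machine model: the cantilever assumption, the 50 N cutting force, the 3 mm bar, the 30 mm conventional overhang, and the 200 GPa steel modulus are all illustrative values chosen for the example.

```python
import math


def tip_deflection_mm(force_n, overhang_mm, diameter_mm, modulus_gpa=200.0):
    """Cantilever tip deflection: delta = F * L^3 / (3 * E * I)."""
    # Second moment of area for a solid round bar, in mm^4.
    inertia = math.pi * diameter_mm ** 4 / 64
    # 1 GPa equals 1000 N/mm^2, so the result comes out in mm.
    modulus = modulus_gpa * 1000.0
    return force_n * overhang_mm ** 3 / (3 * modulus * inertia)


# Assumed values: same bar and force, only the unsupported length changes.
swiss = tip_deflection_mm(force_n=50, overhang_mm=3, diameter_mm=3)
conventional = tip_deflection_mm(force_n=50, overhang_mm=30, diameter_mm=3)

print(f"Swiss (3 mm overhang):         {swiss:.6f} mm")
print(f"Conventional (30 mm overhang): {conventional:.6f} mm")
print(f"Ratio: {conventional / swiss:.0f}x")
```

Because deflection scales with the cube of the overhang, a tenfold shorter unsupported length cuts deflection by a factor of about 1,000 in this model, whatever force and material are assumed.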
Guide bushings come in two types. Rotary guide bushings rotate with the workpiece and suit tolerances wider than ±0.0005". Fixed guide bushings do not rotate and are used when tighter tolerances are required.
### Multi-Tool Simultaneous Operation
CNC Swiss screw machines can mount up to 20 tools and operate several simultaneously. A main spindle handles turning while a sub-spindle machines the back end — all in one setup. This level of automation eliminates manual handling and keeps cycle times short.
Live tooling adds milling, cross-drilling, threading, and tapping directly on the Swiss lathe. Parts that would require three or four setups across different CNC machines come off a Swiss screw machine complete, with no secondary operations needed.
---
## Benefits of CNC Swiss Screw Machining
### Precision and Production Advantages
- **Tolerances of ±0.0002"** are standard, with tighter tolerances achievable on specific features
- **Spindle speeds up to 10,000 RPM** enable efficient cutting of both metals and engineering plastics
- **Continuous bar-fed operation** — bar stock feeds automatically, parts drop off complete, minimal operator intervention
- **Reduced secondary operations** eliminate the cost of moving parts between machines
- **Automation** — bar feeders and CNC machining controls enable lights-out production, reducing labor costs on long runs
Setup is the largest cost driver. After that, per-part costs drop significantly, making Swiss screw machining cost-effective at medium to high production volumes.
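The amortization logic can be sketched with a simple two-term cost model: setup cost spread over the run, plus a fixed run cost per part. The dollar figures below are hypothetical placeholders, not quotes or industry averages, and the function names are made up for this example.

```python
def per_part_cost(setup_cost, run_cost_per_part, quantity):
    """Setup cost amortized over the run, plus the per-part run cost."""
    return setup_cost / quantity + run_cost_per_part


def break_even_quantity(setup_a, run_a, setup_b, run_b):
    """Quantity where process A (higher setup, lower run cost) matches B."""
    if run_b <= run_a:
        raise ValueError("process A must have the lower per-part run cost")
    return (setup_a - setup_b) / (run_b - run_a)


# Hypothetical numbers: Swiss = $600 setup, $2.00/part;
# conventional = $200 setup, $10.00/part.
quantity = break_even_quantity(600, 2.00, 200, 10.00)
print(f"Break-even at {quantity:.0f} parts")

for n in (10, 50, 500, 5000):
    swiss = per_part_cost(600, 2.00, n)
    conventional = per_part_cost(200, 10.00, n)
    print(f"{n:>5} parts: Swiss ${swiss:.2f}, conventional ${conventional:.2f}")
```

Swapping in real setup and run costs from a quote shows where a given part crosses over from conventional turning to Swiss.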
### Materials for Swiss Screw Machining
Swiss screw machines work with a broad range of materials:
- **Stainless steel** — 303, 304, and 316 grades
- **Aluminum** — lightweight aerospace and electronics parts
- **Brass and copper** — electrical contacts, fittings, and connectors
- **Titanium** — medical implants and aerospace fasteners
- **Nickel alloys** — corrosion-resistant components for harsh environments
- **Bronze** — bushings, bearings, and wear components
- **Engineering plastics** — PEEK, Delrin, and nylon
Bar stock must be centerless-ground to ±0.0002" diametric tolerance to feed smoothly through the guide bushing. Exotic alloys like Inconel are workable but require specialized carbide tooling and experienced programming.
### Industries and Common Applications
- **Medical devices** — bone screws, dental implants, surgical instrument shafts, cannulas, and orthopedic pins. Medical applications often require biocompatible materials like titanium or surgical-grade stainless steel, plus full lot traceability.
- **Aerospace** — fasteners, sensor housings, hydraulic fittings, and electrical connectors. Aerospace machining demands tight tolerances, exotic materials, and documented quality processes.
- **Automotive** — fuel injector components, transmission pins, and valve parts produced in high volumes with consistent quality.
- **Electronics** — connector pins, contact sockets, terminal posts, and micro-components where dimensional precision directly affects electrical performance.
- **Defense** — ITAR-compliant precision components for weapons systems, communication equipment, and guidance systems.
Common machined parts include screws, pins, shafts, bushings, contacts, fittings, and cylindrical components with high length-to-diameter ratios.
---
## CNC Swiss Machining vs. Conventional CNC Turning
### Key Differences
| Factor | CNC Swiss Screw Machining | Conventional CNC Turning |
| ------ | ------------------------- | ------------------------ |
| Part diameter | Up to ~32mm (1.25") | Larger parts, no practical limit |
| Tolerances | ±0.0002" standard | ±0.001" typical |
| Complexity | Multi-axis, live tooling, sub-spindle | Typically 2-axis |
| Best volume | Medium to high | Flexible |
| L/D ratio | Excels at 10:1 or more | Limited by deflection |
| Setup cost | Higher | Lower |
| Per-part cost | Lower for small, complex parts | Lower for larger, simpler parts |
The guide bushing is the fundamental differentiator. It lets Swiss lathes cut long, thin parts without the deflection that makes tolerances on the same part difficult to hold on a conventional CNC lathe.
### When NOT to Use Swiss Screw Machining
Consider conventional CNC turning or milling when:
- **Parts exceed 32mm diameter** — larger parts need a conventional CNC lathe or mill
- **Production runs are very short** — for 10-50 pieces, a conventional CNC lathe is more economical
- **Tolerances are relaxed** — if the spec calls for ±0.005" or wider, Swiss machining is overkill
- **The geometry is not cylindrical** — prismatic parts are better suited to 3-axis or 5-axis CNC milling
- **No features benefit from simultaneous operations** — simple turned profiles cost less on a conventional lathe
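The checklist above can be expressed as a small screening function. This is a sketch only: `PartSpec` and `reasons_to_skip_swiss` are hypothetical names, the thresholds are taken from this guide (32mm diameter, roughly 50 pieces, ±0.005"), and a real quoting decision would weigh more factors than these five.

```python
from dataclasses import dataclass


@dataclass
class PartSpec:
    diameter_mm: float
    quantity: int
    tolerance_in: float           # symmetric +/- tolerance, in inches
    cylindrical: bool
    needs_simultaneous_ops: bool  # e.g. back-working or live tooling


def reasons_to_skip_swiss(part):
    """Return the checklist items that point away from Swiss machining."""
    reasons = []
    if part.diameter_mm > 32:
        reasons.append("diameter exceeds 32mm")
    if part.quantity < 50:
        reasons.append("run is under 50 pieces")
    if part.tolerance_in >= 0.005:
        reasons.append("tolerance is 0.005 in. or wider")
    if not part.cylindrical:
        reasons.append("geometry is not cylindrical")
    if not part.needs_simultaneous_ops:
        reasons.append("no benefit from simultaneous operations")
    return reasons


# Hypothetical parts for illustration.
bone_screw = PartSpec(4.0, 5000, 0.0005, True, True)
bracket = PartSpec(40.0, 20, 0.010, False, False)

print(reasons_to_skip_swiss(bone_screw))
print(reasons_to_skip_swiss(bracket))
```

An empty list means none of the listed warning signs apply; it does not by itself prove Swiss machining is the best choice.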
---
## Quality, Certification, and Choosing a Partner
### Industry Certifications and Inspection
Quality in Swiss screw machining depends on the quality management systems behind the machines.
Key certifications:
- **ISO 9001:2015** — baseline quality management system standard
- **ISO 13485** — required for medical device component manufacturing
- **ITAR registration** — mandatory for defense-related machining
- **IATF 16949** — automotive quality standard with defect prevention requirements
Inspection methods to ask about:
- **Statistical process control (SPC)** — monitors dimensional trends during production
- **Coordinate measuring machines (CMM)** — 3D dimensional verification of finished parts
- **First article inspection (FAI)** — full dimensional report verifying the setup matches the print
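One number suppliers often report alongside SPC data is the process capability index, Cpk, which compares the spread of measured parts to the tolerance band. The sketch below is a minimal illustration using Python's standard `statistics` module; the measurements are invented, and the feature size (0.1250" ±0.0002") is an assumed example rather than a spec from this guide.

```python
import statistics


def cpk(measurements, nominal, tolerance):
    """Process capability index for a symmetric +/- tolerance band."""
    mean = statistics.mean(measurements)
    sigma = statistics.stdev(measurements)  # sample standard deviation
    if sigma == 0:
        raise ValueError("no variation in sample; Cpk is undefined")
    upper, lower = nominal + tolerance, nominal - tolerance
    # Distance from the mean to the nearer limit, in units of 3 sigma.
    return min(upper - mean, mean - lower) / (3 * sigma)


# Hypothetical diameters in inches for a 0.1250" +/- 0.0002" feature.
sample = [0.12500, 0.12502, 0.12498, 0.12501, 0.12499]
print(f"Cpk = {cpk(sample, 0.1250, 0.0002):.2f}")
```

A common rule of thumb treats a Cpk of 1.33 or higher as a capable process, though the required value depends on the customer and industry. Five parts is far too few for a real study; the short list only keeps the example readable.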
Material traceability is standard in medical and aerospace work and increasingly expected across all industries.
### What to Look for in a Swiss Screw Machining Supplier
- **Machine fleet** — modern CNC Swiss lathes with multi-axis capability, live tooling, and sub-spindles
- **Relevant certifications** — ISO 9001 baseline, plus ISO 13485, ITAR, or IATF 16949 as your industry requires
- **Demonstrated tolerance capability** — sample parts or dimensional reports in your materials
- **In-house secondary operations** — deburring, heat treating, plating, and passivation under one roof
- **Engineering support** — a good partner reviews prints and suggests design optimizations for manufacturability
---
## Get Started with CNC Swiss Screw Machining
CNC Swiss screw machining delivers precision, speed, and repeatability for small-diameter parts that demand tight tolerances. Whether you are producing medical implants, aerospace fasteners, or high-volume electronic connectors, Swiss machining is a proven process for turning complex designs into finished components. Contact us to discuss your project and request a quote.
---
<!-- FOQ SECTION START -->
## Frequently Asked Questions About CNC Swiss Screw Machining
### What Is the Difference Between Swiss Screw Machining and CNC Turning?
Swiss screw machining differs from conventional CNC turning in how the workpiece is supported during cutting. A Swiss screw machine uses a guide bushing to hold the bar stock within 1-3mm of the cutting tool, virtually eliminating deflection and enabling tolerances of ±0.0002". Conventional CNC turning clamps the workpiece without a guide bushing, which limits precision on long, slender parts and typically holds tolerances of ±0.001".
### How Tight Are Swiss Screw Machining Tolerances?
Swiss screw machining tolerances are typically ±0.0002" as a standard capability. This precision is possible because the guide bushing supports the workpiece close to the cutting tool, reducing deflection and vibration that would otherwise compromise dimensional accuracy.
### What Materials Can Be Swiss Screw Machined?
Swiss screw machines can process stainless steel, aluminum, brass, copper, titanium, nickel alloys, bronze, and engineering plastics like PEEK, Delrin, and nylon. Bar stock must be centerless-ground to ±0.0002" diametric tolerance to feed properly through the guide bushing.
### Is Swiss Screw Machining Cost-Effective for Small Production Runs?
Swiss screw machining is generally not cost-effective for very small runs due to significant setup time and tooling costs. The process becomes economical at medium to high volumes where setup cost is amortized across many parts. For runs under 50 pieces, conventional CNC turning is often more economical.
### What Industries Use CNC Swiss Screw Machining?
CNC Swiss screw machining is used extensively in medical device, aerospace, automotive, electronics, and defense manufacturing. These industries require small, complex, precision components produced at tight tolerances and in high volumes — exactly the part profile Swiss screw machines are designed to handle.
### How Does a Guide Bushing Work on a Swiss Screw Machine?
A guide bushing on a Swiss screw machine acts as a stationary support that holds the bar stock just 1-3mm from the cutting tool. As the sliding headstock feeds the workpiece through the bushing along the Z-axis, the bushing prevents the material from deflecting, enabling tighter tolerances and smoother surface finishes.
### What Part Sizes Can a Swiss Screw Machine Handle?
Swiss screw machines handle bar stock up to 32mm (1.25") in diameter. They excel at parts with high length-to-diameter ratios — 10:1 or greater — where conventional lathes would struggle with deflection. Larger parts are better suited to conventional CNC turning or milling.
### Does Swiss Screw Machining Require Secondary Operations?
Swiss screw machining often eliminates secondary operations entirely. With live tooling, sub-spindles, and multi-axis capability, a CNC Swiss machine can perform turning, milling, cross-drilling, threading, tapping, and knurling in a single setup. Parts frequently come off the machine complete.
### What Certifications Should a Swiss Screw Machining Supplier Have?
A Swiss screw machining supplier should hold ISO 9001:2015 as a baseline. Medical work requires ISO 13485, defense applications require ITAR registration, and automotive work calls for IATF 16949. Look for documented inspection processes including SPC, CMM measurement, and first article inspection.
### When Should You Choose Conventional CNC Over Swiss Machining?
Choose conventional CNC turning or milling over Swiss machining when parts exceed 32mm in diameter, production volumes are very low, tolerances are wider than ±0.005", or the geometry is primarily non-cylindrical. Conventional CNC is also better for simple turned profiles that don't benefit from simultaneous multi-tool operations.
<!-- FOQ SECTION END -->

@ -1,167 +0,0 @@
# Outline: CNC Swiss Screw Machining
**Format:** Comprehensive Guide
**Target word count:** ~1,400 words (cluster target from Cora: 1,342)
**Primary keyword:** cnc swiss screw machining
**Target audience:** Engineers, procurement professionals, and manufacturing decision-makers evaluating Swiss screw machining for their parts
**Heading targets (from Cora Structure):** 1 H1, 4+ H2s, ~10 H3s
---
## H1: CNC Swiss Screw Machining: Precision, Process, and When to Use It
Brief intro: What Swiss screw machining is in one sentence, why it matters for precision small parts, and what the reader will learn.
---
## H2: What Is CNC Swiss Screw Machining and How Does It Work?
Definition + the mechanical process combined into one major section.
### H3: Origins and Definition
- Precision turning process using a sliding headstock and guide bushing
- Developed in Switzerland in the 1800s for watchmaking
- Key distinction from conventional CNC lathes: the workpiece moves, not just the tool
- Modern CNC Swiss machines: programmable, multi-axis, live tooling capable
### H3: The Sliding Headstock and Guide Bushing
- Bar stock feeds through collet in the sliding headstock
- Guide bushing supports material 1-3mm from the cutting tool
- Headstock moves along Z-axis, feeding stock into the tooling zone
- Result: minimal deflection, vibration dampened, tighter tolerances possible
- Guide bushing types: rotary (>±0.0005") vs. fixed (tighter tolerances)
### H3: Multi-Tool Simultaneous Operation
- Up to 20 tools can operate simultaneously
- Main spindle + sub-spindle: machine both ends of a part in one setup
- Live tooling: milling, cross-drilling, threading, tapping without removing the part
- Parts come off the machine complete — minimal secondary operations
~350 words
---
## H2: Benefits of CNC Swiss Screw Machining
### H3: Precision and Production Advantages
- **Precision:** ±0.0002" tolerances, up to 10,000 RPM, micron-level accuracy
- **Reduced secondary operations:** complete parts in one chucking
- **Production speed:** continuous bar-fed operation, minimal downtime
- **Material efficiency:** less waste than conventional machining
- **Cost-effective at volume:** low per-part cost once setup is complete
### H3: Materials for Swiss Screw Machining
- Metals: stainless steel, aluminum, brass, copper, bronze, titanium, nickel alloys
- Plastics: PEEK, Delrin, nylon
- Bar stock requirements: must be centerless-ground to ±0.0002" for optimal results
- Exotic alloys are workable but require specific tooling and speeds
### H3: Industries and Common Applications
- **Medical:** surgical instruments, implants, bone screws, dental components
- **Aerospace:** fasteners, connectors, sensor housings
- **Automotive:** high-volume small precision parts, fuel system components
- **Electronics:** pins, connectors, contacts, micro-components
- **Defense:** ITAR-compliant precision components
- Common part types: screws, pins, shafts, bushings, contacts, fittings
~350 words
---
## H2: CNC Swiss Machining vs. Conventional CNC Turning
### H3: Key Differences
| Factor | Swiss CNC | Conventional CNC |
| ------ | --------- | ---------------- |
| Part diameter | Up to ~32mm (1.25") | Larger parts |
| Tolerances | ±0.0002" standard | ±0.001" typical |
| Complexity | High (multi-axis, live tooling) | Moderate |
| Volume | Best at high volume | Better for short runs |
| Length-to-diameter ratio | Excels at high L/D ratios | Limited by deflection |
### H3: When NOT to Use Swiss Screw Machining
Parts larger than 32mm diameter, very short production runs where setup cost doesn't amortize, parts that don't require tight tolerances, non-cylindrical geometries better suited to 3- or 5-axis milling.
~250 words
---
## H2: Quality, Certification, and Choosing a Partner
### H3: Industry Certifications and Inspection
- ISO 9001:2015 (general quality management)
- ISO 13485 (medical device manufacturing)
- ITAR registration (defense applications)
- IATF 16949 (automotive)
- Inspection methods: SPC, CMM, optical measurement, laser micrometers
- First article inspection, in-process monitoring, material traceability
### H3: What to Look for in a Swiss Screw Machining Supplier
- Machine fleet: modern CNC Swiss machines with multi-axis capability
- Certifications relevant to your industry
- Tolerance capabilities demonstrated with similar materials
- Secondary operations available in-house
- Production volume capacity and lead times
~200 words
---
## Conclusion
Recap + CTA. ~50 words
---
## Structure Summary
| Level | Count | Cora Target (min) |
| ----- | ----- | ----------------- |
| H1 | 1 | 1 |
| H2 | 5 | 4 |
| H3 | 10 | 10 |
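A quick way to check a draft against these targets is to count its heading levels. The sketch below assumes ATX-style `#` headings and triple-backtick code fences; `count_headings` is a hypothetical helper, and it counts every heading it finds, so fan-out query headings need to be stripped from the text first if they should not count.

```python
import re
from collections import Counter

FENCE = "`" * 3  # code-fence delimiter, built this way to keep the example fence-safe
HEADING = re.compile(r"^(#{1,6})\s+\S")


def count_headings(markdown_text):
    """Count ATX headings by level, ignoring lines inside fenced code."""
    counts = Counter()
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            counts[f"H{len(match.group(1))}"] += 1
    return counts


draft = "# Title\n## Section A\n### Detail 1\n### Detail 2\n## Section B\n"
print(dict(count_headings(draft)))
```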
## Unique Angles
1. **"When NOT to use Swiss"** — honest guidance that builds trust and captures comparison traffic
2. **Quality/inspection detail** — goes beyond just listing ISO numbers
3. **Supplier selection guidance** — practical buyer help that competitors skip
---
## Fan-Out Query Headings
Separate from main content. Do NOT count against word count or heading targets.
Style as accordions, FAQs, or hidden divs.
Answer format: restate the question in the answer ("How does X work? X works by...").
Each answer: 2-3 sentences max, self-contained.
### H3: What Is the Difference Between Swiss Screw Machining and CNC Turning?
### H3: How Tight Are Swiss Screw Machining Tolerances?
### H3: What Materials Can Be Swiss Screw Machined?
### H3: Is Swiss Screw Machining Cost-Effective for Small Production Runs?
### H3: What Industries Use CNC Swiss Screw Machining?
### H3: How Does a Guide Bushing Work on a Swiss Screw Machine?
### H3: What Part Sizes Can a Swiss Screw Machine Handle?
### H3: Does Swiss Screw Machining Require Secondary Operations?
### H3: What Certifications Should a Swiss Screw Machining Supplier Have?
### H3: When Should You Choose Conventional CNC Over Swiss Machining?

@ -1,122 +0,0 @@
# Research Summary: CNC Swiss Screw Machining
## Search Term
cnc swiss screw machining
## Sources Analyzed
| Source | URL | Word Count | Angle |
|--------|-----|------------|-------|
| Kerr Screw | kerrscrew.com/swiss-screw-machining-explained/ | ~1,300 | Historical context, automation evolution, applications |
| Avanti Engineering | avantiengineering.com/swiss-screw-machining-benefits-applications/ | ~900 | Benefits, applications, how it works |
| IQS Directory | iqsdirectory.com/.../swiss-screw-machining.html | ~6,500 | Deep technical guide: process, types, tools, materials, prep |
| Hogge Precision | hoggeprecision.com/benefits-of-cnc-swiss-screw-machining/ | ~800 | CNC vs automatic types, benefits, capabilities |
| Cox Manufacturing | coxmanufacturing.com/blog/what-is-swiss-screw-machining/ | ~250 | Brief intro, guide bushing emphasis |
| Nolte Precise | nolteprecise.com/cnc-swiss-screw-machining/ | ~1,100 | High-volume production focus |
| Hartford Technologies | resources.hartfordtechnologies.com/... | — | Swiss vs traditional machining comparison |
| Impro Precision | improprecision.com/introduction-swiss-screw-machining/ | — | Industry applications deep dive |
---
## Common Themes (what everyone covers)
### 1. Definition & History
Every competitor explains that Swiss screw machining originated in Switzerland in the late 1800s for watchmaking. They define it as a precision turning process using a sliding headstock and guide bushing. This is table stakes — must be covered.
### 2. How It Works (Guide Bushing + Sliding Headstock)
Core technical differentiator from conventional CNC lathes:
- Bar stock feeds through a chucking collet in the sliding headstock
- Guide bushing supports the workpiece 1-3mm from the cutting tool
- Headstock moves along Z-axis (vs. conventional lathes where the tool moves)
- Reduces deflection and vibration, enabling tighter tolerances
- Guide bushing types: synchronous rotary (for >±0.0005") and fixed (for tighter tolerances)
### 3. Precision & Tolerances
Consistently cited numbers:
- ±0.0002" to ±0.0005" tolerances standard
- Up to 10,000 RPM spindle speeds
- Bar stock must be centerless-ground to ±0.0002" diametric tolerance
- Surface finish quality superior to conventional turning
### 4. Benefits Over Conventional CNC
Every competitor lists some version of:
- Tighter tolerances (guide bushing reduces deflection)
- Reduced secondary operations (multi-spindle, live tooling)
- Higher production speed for small parts
- Lower per-part cost at volume
- Less material waste
- Simultaneous multi-tool operation (up to 20 tools at once)
### 5. Materials
Standard list: stainless steel, aluminum, brass, copper, bronze, titanium, nickel alloys, and engineering plastics (PEEK, Delrin, nylon). Exotic alloys also mentioned.
### 6. Industries & Applications
Medical (implants, surgical instruments), aerospace (fasteners, connectors), automotive (high-volume small parts), electronics (connectors, pins), defense, hydraulics, telecommunications.
### 7. CNC vs. Automatic (Cam-Driven)
Most competitors distinguish between:
- Automatic/cam-driven machines: simpler geometry, extremely high volume, lower setup flexibility
- CNC Swiss machines: complex geometry, tighter tolerances, programmable, more flexible
---
## Content Structure Patterns
**Short-form competitors** (~250-800 words): Hogge, Cox
- Definition → Benefits list → Industries → CTA
- Minimal technical depth, service-page style
**Mid-form competitors** (~900-1,400 words): Kerr Screw, Avanti, Nolte, Hartford
- Definition → How it works → Benefits → Applications → Swiss vs. conventional comparison
- Moderate technical depth, educational blog style
**Long-form competitors** (~6,500 words): IQS Directory
- Comprehensive guide with chapters: definition → process → types → tools → materials → components → benefits → preparation
- Deep technical reference, encyclopedia style
**Observation:** Most competitors are in the 800-1,400 word range. IQS is an outlier at 6,500+. There's a gap in the 2,000-3,000 word range — content that's thorough enough to be a real resource but not a textbook chapter.
---
## Gaps (what competitors miss or cover poorly)
### 1. Design for Swiss Machining
Only IQS Directory touches on preparation/design considerations. Nobody provides practical guidance for engineers on how to design parts specifically for Swiss screw machining (feature sizes, wall thickness, corner radii, tolerance callouts that are realistic).
### 2. When NOT to Use Swiss Machining
Competitors focus on benefits but rarely discuss limitations or when conventional CNC is actually better (larger parts, short runs, parts without rotational symmetry).
### 3. Cost Breakdown / Economics
Everyone says "cost-effective" but nobody provides actual cost drivers: setup costs, material costs (centerless-ground bar stock premium), tooling costs, volume thresholds where Swiss becomes economical vs. conventional CNC.
### 4. Quality & Inspection Process
Certifications get mentioned (ISO 9001, ISO 13485, ITAR) but the actual inspection process — SPC, CMM measurement, optical inspection, first article inspection — is barely explained.
### 5. Machine Selection (Brand/Model Landscape)
Brief mentions of Tsugami, Citizen, Star, Tornos — but no meaningful comparison of what machines are used or why. Buyers researching this topic often need to understand what machine capabilities their supplier should have.
### 6. Modern Capabilities Beyond Turning
Swiss machines today can do milling, drilling, cross-drilling, threading, knurling, and even gear cutting — but most competitors undersell these capabilities, making Swiss machining sound like it's only for round turned parts.
---
## Potential Unique Angles
1. **"Design for Swiss" section** — Practical engineering guidance on how to design parts that are optimized for Swiss screw machining. This is genuinely useful and nobody covers it well.
2. **Economics / When to Choose Swiss** — Honest cost analysis: volume thresholds, setup costs, when conventional CNC or multi-spindle screw machines are actually better choices. This builds trust and captures comparison-search traffic.
3. **Modern Swiss capabilities** — Position Swiss machining as more than just turning. Cover live tooling, secondary operations, and complex multi-axis work that today's CNC Swiss machines can handle.
---
## Entity Landscape (from competitor content)
Frequently mentioned entities across sources:
- **Machine components:** guide bushing, sliding headstock, spindle, collet, bar feeder, turret, live tooling
- **Materials:** stainless steel, aluminum, brass, titanium, PEEK, Delrin, copper, bronze, nickel
- **Industries:** medical devices, aerospace, automotive, electronics, defense, telecommunications
- **Processes:** turning, milling, drilling, threading, tapping, knurling, parting
- **Quality:** ISO 9001, ISO 13485, ITAR, SPC, CMM, first article inspection
- **Machine brands:** Tsugami, Citizen, Star, Tornos
- **Specifications:** tolerance (±0.0002"), RPM (10,000), bar stock diameter (up to 32mm or 1.25")

@ -1,180 +0,0 @@
# Brand Voice & Tone Guidelines
Reference for maintaining consistent voice across all written content. These are defaults — override with client-specific guidelines when available.
---
## Voice Archetypes
Choose one primary archetype per brand. A secondary archetype can add nuance but should never dominate.
### Expert
- **Sounds like:** A senior practitioner sharing hard-won knowledge.
- **Characteristics:** Precise, evidence-backed, confident without arrogance. Cites data, references real-world experience, and isn't afraid to say "it depends."
- **Typical vocabulary:** "In practice," "the tradeoff is," "based on our benchmarks," "here's why this matters."
- **Risk to avoid:** Coming across as condescending or overly academic.
- **Best for:** Technical audiences, B2B SaaS, engineering blogs, whitepapers.
### Guide
- **Sounds like:** A patient teacher walking you through something step by step.
- **Characteristics:** Clear, encouraging, anticipates confusion. Breaks complex ideas into digestible pieces. Uses analogies.
- **Typical vocabulary:** "Let's start with," "think of it like," "the key thing to remember," "don't worry if this seems complex."
- **Risk to avoid:** Being patronizing or oversimplifying for an advanced audience.
- **Best for:** Tutorials, onboarding content, documentation, beginner-to-intermediate audiences.
### Innovator
- **Sounds like:** Someone who sees around corners and wants to bring you along.
- **Characteristics:** Forward-looking, curious, willing to challenge assumptions. Connects dots across domains. Thinks in systems.
- **Typical vocabulary:** "What if," "the shift we're seeing," "this changes the calculus," "the next wave."
- **Risk to avoid:** Sounding like hype or vaporware. Must ground vision in evidence.
- **Best for:** Thought leadership, industry analysis, product vision content, founder blogs.
### Friend
- **Sounds like:** A sharp colleague sharing advice over coffee.
- **Characteristics:** Warm, direct, conversational. Uses "you" and "we." Comfortable with humor when it's natural. Doesn't hide behind jargon.
- **Typical vocabulary:** "Here's the thing," "honestly," "we've all been there," "the trick is."
- **Risk to avoid:** Being too casual for high-stakes topics or enterprise audiences.
- **Best for:** Community content, newsletters, brand blogs aimed at practitioners.
### Motivator
- **Sounds like:** A coach who believes in your potential and pushes you to act.
- **Characteristics:** Energetic, action-oriented, focused on outcomes. Uses imperatives. Celebrates progress.
- **Typical vocabulary:** "Start today," "you can do this," "here's your edge," "stop waiting for perfect."
- **Risk to avoid:** Empty cheerleading. Must pair motivation with substance.
- **Best for:** Career content, productivity content, entrepreneurship, course marketing.
---
## Core Writing Principles
These apply regardless of archetype.
### 1. Clarity First
- If a sentence can be misread, rewrite it.
- Use the simplest word that conveys the precise meaning. "Use" over "utilize." "Start" over "commence."
- One idea per paragraph. One purpose per section.
- Define jargon on first use, or skip it entirely.
### 2. Customer-Centric
- Frame everything from the reader's perspective, not the company's.
- **Instead of:** "We built a new feature that enables real-time collaboration."
- **Write:** "You can now edit documents with your team in real time."
- Lead with the reader's problem or goal, not the product or solution.
### 3. Active Voice
- Active voice is the default. Passive voice is acceptable only when the actor is unknown or irrelevant.
- **Active:** "The script generates a report every morning."
- **Passive (acceptable):** "The logs are rotated every 24 hours." (The actor doesn't matter.)
- **Passive (avoid):** "A decision was made to deprecate the endpoint." (Who decided?)
### 4. Show, Don't Claim
- Replace vague claims with specific evidence.
- **Claim:** "Our platform is incredibly fast."
- **Show:** "Queries return in under 50ms at the 99th percentile."
- If you can't provide evidence, soften the language or cut the sentence.
---
## Tone Attributes
Tone shifts based on content type and audience. Use these spectrums to calibrate.
### Formality Spectrum
```
Casual -------|-------|-------|-------|------- Formal
          1       2       3       4       5
```
| Level | Description | Use When |
|-------|-------------|----------|
| 1 | Slang OK, sentence fragments, first person | Internal team comms, very informal blogs |
| 2 | Conversational, contractions, direct address | Newsletters, community posts, most blog content |
| 3 | Professional but approachable, minimal contractions | Product announcements, mid-funnel content |
| 4 | Polished, structured, no contractions | Whitepapers, enterprise case studies, executive briefs |
| 5 | Formal, third person, precise terminology | Legal, compliance, academic partnerships |
**Default for most blog/article content: Level 2-3.**
### Technical Depth Spectrum
```
General -------|-------|-------|-------|------- Deep Technical
           1       2       3       4       5
```
| Level | Description | Use When |
|-------|-------------|----------|
| 1 | No jargon, analogy-heavy, conceptual | Non-technical stakeholders, general audience |
| 2 | Light jargon (defined inline), practical focus | Business audience with some domain familiarity |
| 3 | Industry-standard terminology, code snippets OK | Practitioners who do the work daily |
| 4 | Assumes working knowledge, implementation details | Developers, engineers, technical decision-makers |
| 5 | Deep internals, performance analysis, tradeoff math | Senior engineers, architects, researchers |
**Default: Match the audience. When unsure, aim one level below what you think the audience can handle. Accessibility wins.**
---
## Language Preferences
### Use Action Verbs
Lead sentences — especially headings and CTAs — with strong verbs.
| Weak | Strong |
|------|--------|
| There is a way to improve | Improve |
| This section is a discussion of | This section covers |
| You should consider using | Use |
| It is important to note that | Note: |
| We are going to walk through | Let's walk through |
### Be Concrete and Specific
Vague language erodes trust. Replace generalities with specifics.
| Vague | Concrete |
|-------|----------|
| "significantly faster" | "3x faster" or "reduced from 12s to 2s" |
| "a large number of users" | "over 40,000 monthly active users" |
| "best-in-class" | describe the specific advantage |
| "seamless integration" | "connects via a single API call" |
| "in the near future" | "by Q2" or "in the next release" |
### Avoid These Patterns
- **Weasel words:** "very," "really," "extremely," "quite," "somewhat" — cut them or replace with data.
- **Nominalizations:** "implementation" when you mean "implement," "utilization" when you mean "use."
- **Hedge stacking:** "It might potentially be possible to perhaps consider..." — commit to a position or state the uncertainty once, clearly.
- **Buzzword chains:** "AI-powered next-gen synergistic platform" — describe what it actually does.
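
The weasel-word rule above is easy to check mechanically. The following is a minimal sketch, not part of the original tooling: the function name `find_weasel_words` and the sample draft are hypothetical, and the word list is copied from the bullet above.

```python
import re

# Word list taken from the "Weasel words" bullet; extend it as needed.
WEASEL_WORDS = ("very", "really", "extremely", "quite", "somewhat")
WEASEL_RE = re.compile(r"\b(" + "|".join(WEASEL_WORDS) + r")\b", re.IGNORECASE)


def find_weasel_words(text: str) -> list[tuple[int, str]]:
    """Return (line_number, word) pairs for each weasel word found."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in WEASEL_RE.finditer(line):
            hits.append((line_number, match.group(1).lower()))
    return hits


# Hypothetical draft text used only for illustration.
draft = "Our platform is really fast.\nSetup is quite simple and very quick."
for line_number, word in find_weasel_words(draft):
    print(f"line {line_number}: {word}")
```

A hit is a prompt to cut the word or replace it with data, not an automatic failure. Quoted material can legitimately contain these words.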
---
## Pre-Publication Checklist
Run through this before publishing any piece of content.
### Voice Consistency
- [ ] Does the piece sound like one person wrote it, beginning to end?
- [ ] Does it match the target voice archetype?
- [ ] Are there jarring shifts in tone between sections?
- [ ] If multiple authors contributed, has it been edited for a unified voice?
### Clarity
- [ ] Can a reader in the target audience understand every sentence on the first read?
- [ ] Is jargon defined or avoided?
- [ ] Are all acronyms expanded on first use?
- [ ] Do headings accurately describe the content beneath them?
- [ ] Is the article scannable? (subheadings every 2-4 paragraphs, short paragraphs, lists where appropriate)
### Value
- [ ] Does the introduction make clear what the reader will gain?
- [ ] Does every section earn its place? (Cut anything that doesn't serve the reader's goal.)
- [ ] Are claims supported by evidence, examples, or data?
- [ ] Is the advice actionable — can the reader do something with it today?
- [ ] Does the conclusion provide a clear next step?
### Formatting
- [ ] Title is under 70 characters and includes the core keyword or topic.
- [ ] Meta description is 140-160 characters and summarizes the value proposition.
- [ ] Headings use parallel structure (all questions, all noun phrases, or all verb phrases — not mixed).
- [ ] Code blocks, tables, and images have context (a sentence before them explaining what the reader is looking at).
- [ ] Links use descriptive anchor text, not "click here."
- [ ] No walls of text — maximum 4 sentences per paragraph for web content.
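
The two length limits in the Formatting checklist can be verified before publishing. This is a rough sketch under stated assumptions: `check_metadata` and the constant names are hypothetical, lengths are counted in Python characters, and the keyword requirement is not checked.

```python
TITLE_MAX_CHARS = 70   # checklist: title is under 70 characters
META_MIN_CHARS = 140   # checklist: meta description is 140-160 characters
META_MAX_CHARS = 160


def check_metadata(title: str, meta_description: str) -> list[str]:
    """Return a list of problems; an empty list means both limits are met."""
    problems = []
    if len(title) >= TITLE_MAX_CHARS:
        problems.append(
            f"Title is {len(title)} characters; keep it under {TITLE_MAX_CHARS}."
        )
    if not META_MIN_CHARS <= len(meta_description) <= META_MAX_CHARS:
        problems.append(
            f"Meta description is {len(meta_description)} characters; "
            f"aim for {META_MIN_CHARS}-{META_MAX_CHARS}."
        )
    return problems


# Hypothetical inputs used only for illustration.
for problem in check_metadata("How to Cut CI Build Time in Half", "Too short."):
    print(problem)
```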

@ -1,267 +0,0 @@
# Content Frameworks Reference
Quick-reference guide for structuring blog posts and articles. Use these templates as starting points, then adapt to the topic and audience.
---
## Article Templates
### How-To Guide
```
Title: How to [Achieve Specific Outcome] (in [Timeframe/Steps])
Introduction
- State the outcome the reader will achieve
- Briefly explain why this matters or who this is for
- Set expectations: what they need, how long it takes
Prerequisites / What You'll Need (optional)
- Tools, knowledge, or setup required before starting
Step 1: [Action Verb] + [Object]
- What to do and why
- Concrete details, examples, or code snippets
- Common mistake to avoid at this step
Step 2: [Action Verb] + [Object]
- (same pattern)
... (repeat for each step)
Troubleshooting / Common Issues (optional)
- Problem → Cause → Fix, in a quick table or list
Conclusion
- Recap what the reader accomplished
- Suggest a logical next step or related guide
```
**Key principle:** Each step starts with an action verb. One action per step. If a step has sub-steps, break it out.
---
### Listicle
```
Title: [Number] [Adjective] [Things] for [Audience/Goal]
Examples: "9 Underrated Tools for Frontend Performance"
"5 Strategies That Reduced Our Build Time by 60%"
Introduction (2-3 sentences)
- Who this list is for
- What criteria you used to select items
Item 1: [Name or Short Description]
- What it is (1 sentence)
- Why it matters or when to use it (1-2 sentences)
- Concrete example, stat, or tip
Item 2: ...
(repeat)
Wrap-Up
- Quick summary of top picks or situational recommendations
- CTA: ask readers to share their own picks, or link to a deeper dive
```
**Key principle:** Each item must stand alone. Readers skim listicles — front-load the value in each entry. Order by impact (strongest first or last) or by logical progression.
---
### Comparison / Vs Article
```
Title: [Option A] vs [Option B]: [Decision Context]
Example: "Postgres vs MySQL: Which Database Fits Your SaaS in 2026?"
Introduction
- The decision the reader faces
- Who this comparison is for (skill level, use case)
- Summary verdict (give the answer up front, then prove it)
Quick Comparison Table
| Criteria | Option A | Option B |
|-----------------|----------------|----------------|
| [Criterion 1] | ... | ... |
| [Criterion 2] | ... | ... |
| Pricing | ... | ... |
| Best for | ... | ... |
Section: [Criterion 1] Deep Dive
- How A handles it
- How B handles it
- Verdict for this criterion
(repeat for each major criterion)
When to Choose A
- Bullet list of scenarios, use cases, or team profiles
When to Choose B
- Same structure
Final Recommendation
- Restate the summary verdict with nuance
- Suggest next steps (trial links, related guides)
```
**Key principle:** Be opinionated. Readers come to comparison articles for a recommendation, not a feature dump. State your pick early, then support it.
---
### Case Study
```
Title: How [Company/Person] [Achieved Result] with [Method/Tool]
Snapshot (sidebar or callout box)
- Company/person profile
- Challenge in one line
- Result in one line (with numbers)
- Timeline
The Challenge
- Situation before: pain points, constraints, failed attempts
- Why existing solutions weren't working
- Stakes: what would happen if unsolved
The Approach
- What they decided to do and why
- Implementation details (tools, process, decisions)
- Obstacles encountered during execution
The Results
- Quantified outcomes (before/after metrics)
- Qualitative outcomes (team sentiment, workflow changes)
- Timeline to results
Key Takeaways
- 2-4 lessons the reader can apply to their own situation
- What the subject would do differently next time (if anything)
```
**Key principle:** Specifics beat generalities. Use real numbers, timelines, and named tools. A case study without measurable results is just a testimonial.
---
### Thought Leadership
```
Title: [Contrarian Claim] or [Reframed Problem]
Examples: "Your Microservices Migration Will Fail — Here's Why"
"We've Been Thinking About Developer Productivity Wrong"
The Hook
- A bold claim, surprising stat, or industry assumption to challenge
- One paragraph max
The Conventional View
- What most people believe or do today
- Why it seems reasonable on the surface
The Shift
- What's changed (new data, your experience, a trend)
- Why the conventional view no longer holds
- Evidence: data, examples, analogies
The New Mental Model
- Your proposed way of thinking about this
- How it changes decisions or priorities
- 1-2 concrete examples of the new model applied
Implications
- What readers should do differently starting now
- What this means for the industry over the next 1-3 years
Close
- Restate the core insight in one sentence
- Invite discussion or point to your deeper work on this topic
```
**Key principle:** Thought leadership requires a genuine point of view. The article should change how the reader thinks, not just inform them.
---
## Persuasion Frameworks
### AIDA (Attention, Interest, Desire, Action)
Use AIDA to structure the emotional arc of an article, especially product-adjacent or tutorial content.
| Stage | Purpose | Tactics |
|-------|---------|---------|
| **Attention** | Stop the scroll. Earn the click. | Surprising stat, bold claim, relatable pain point in the title and opening line. |
| **Interest** | Convince them to keep reading. | Show you understand their situation. Introduce the core concept or framework. Use subheadings that promise value. |
| **Desire** | Make them want the outcome. | Show results: examples, screenshots, before/after. Paint a picture of life after applying the advice. |
| **Action** | Tell them what to do next. | Specific, low-friction CTA. One action, not five. "Clone the repo," "Try this query," "Read part 2." |
---
### PAS (Problem, Agitate, Solution)
Use PAS for introductions, email content, and articles addressing a known pain point.
| Stage | Purpose | Tactics |
|-------|---------|---------|
| **Problem** | Name the pain clearly. | Describe the situation in the reader's own words. Be specific — "your CI pipeline takes 40 minutes" beats "slow builds." |
| **Agitate** | Make the pain feel urgent. | Show the consequences: wasted time, lost revenue, compounding tech debt. Use "what happens if you don't fix this" framing. |
| **Solution** | Present the path forward. | Introduce your approach, tool, or framework. Transition into the body of the article. |
PAS works best in the first 3-5 paragraphs, then hand off to a structural template (How-To, Listicle, etc.) for the body.
---
## Introduction Patterns
Use one of these patterns for the opening 2-4 sentences. Match the pattern to the article type and audience.
**The Stat Drop**
Open with a surprising number, then connect it to the reader's world.
> "73% of API integrations fail in the first year — not because of bad code, but because of bad documentation."
**The Contrarian Hook**
Challenge a common belief head-on.
> "You don't need a content calendar. What you need is a content system."
**The Pain Mirror**
Describe the reader's frustration in their own words.
> "You've rewritten the onboarding flow three times this quarter. Each time, engagement drops again within a month."
**The Outcome Lead**
Start with the result, then explain how to get there.
> "Our deploy frequency went from weekly to 12x per day. Here's the infrastructure change that made it possible."
**The Story Open**
Begin with a brief, relevant anecdote (3 sentences max).
> "Last March, our team pushed a migration that broke checkout for 6 hours. The post-mortem revealed something we didn't expect."
**The Question**
Ask a question the reader is already asking themselves.
> "Why does every database migration guide assume you have zero traffic?"
---
## Conclusion Patterns
Every conclusion should do two things: (1) reinforce the core takeaway, and (2) give the reader a next step.
**The Recap + CTA**
Summarize the 2-3 key points, then give one clear action.
> "To recap: validate early, test with real data, and deploy incrementally. Ready to try it? Start with [specific first step]."
**The Implication Close**
Zoom out. Connect the article's advice to a bigger trend or outcome.
> "This isn't just about faster deploys — it's about building a team that ships with confidence."
**The Next Step Bridge**
Point to a logical follow-up resource or action.
> "Now that your monitoring is in place, the next step is setting up alerting thresholds. We cover that in [linked article]."
**The Challenge Close**
Issue a direct, friendly challenge to the reader.
> "Pick one of these patterns and apply it to your next pull request. See what changes."
**The Open Loop**
Tease upcoming content or unresolved questions to drive return visits.
> "We've covered the read path. In part 2, we'll tackle the write path — where the real complexity lives."

@ -1,160 +0,0 @@
# Brand Voice & Tone Guidelines
Reference for maintaining consistent voice across all written content. These are defaults — override with client-specific guidelines when available.
---
## Voice Archetypes
Start with Expert, and work in Guide when applicable.
### Expert
- **Sounds like:** A senior practitioner sharing hard-won knowledge.
- **Characteristics:** Precise, evidence-backed, confident without arrogance. Cites data, references real-world experience, and isn't afraid to say "it depends."
- **Typical vocabulary:** "In practice," "the tradeoff is," "based on our benchmarks," "here's why this matters."
- **Risk to avoid:** Coming across as condescending or overly academic.
- **Best for:** Technical audiences, B2B SaaS, engineering blogs, whitepapers.
### Guide
- **Sounds like:** A patient teacher walking you through something step by step.
- **Characteristics:** Clear, encouraging, anticipates confusion. Breaks complex ideas into digestible pieces. Uses analogies.
- **Typical vocabulary:** "Let's start with," "think of it like," "the key thing to remember," "don't worry if this seems complex."
- **Risk to avoid:** Being patronizing or oversimplifying for an advanced audience.
- **Best for:** Tutorials, onboarding content, documentation, beginner-to-intermediate audiences.
---
## Core Writing Principles
These apply regardless of archetype.
### 1. Clarity First
- If a sentence can be misread, rewrite it.
- Use the simplest word that conveys the precise meaning. "Use" over "utilize." "Start" over "commence."
- One idea per paragraph. One purpose per section.
- Define jargon on first use, or skip it entirely.
### 2. Customer-Centric
- Frame everything from the reader's perspective, not the company's.
- **Instead of:** "We built a new feature that enables real-time collaboration."
- **Write:** "You can now edit documents with your team in real time."
- Lead with the reader's problem or goal, not the product or solution.
### 3. Active Voice
- Active voice is the default. Passive voice is acceptable only when the actor is unknown or irrelevant.
- **Active:** "The script generates a report every morning."
- **Passive (acceptable):** "The logs are rotated every 24 hours." (The actor doesn't matter.)
- **Passive (avoid):** "A decision was made to deprecate the endpoint." (Who decided?)
### 4. Show, Don't Claim
- Replace vague claims with specific evidence.
- **Claim:** "Our platform is incredibly fast."
- **Show:** "Queries return in under 50ms at the 99th percentile."
- If you can't provide evidence, soften the language or cut the sentence.
---
## Tone Attributes
Tone shifts based on content type and audience. Use these spectrums to calibrate.
### Formality Spectrum
```
Casual -------|-------|-------|-------|------- Formal
1 2 3 4 5
```
| Level | Description | Use When |
|-------|-------------|----------|
| 1 | Slang OK, sentence fragments, first person | Internal team comms, very informal blogs |
| 2 | Conversational, contractions, direct address | Newsletters, community posts, most blog content |
| 3 | Professional but approachable, minimal contractions | Product announcements, mid-funnel content |
| 4 | Polished, structured, no contractions | Whitepapers, enterprise case studies, executive briefs |
| 5 | Formal, third person, precise terminology | Legal, compliance, academic partnerships |
**Default for most blog/article content: Level 2-3.**
### Technical Depth Spectrum
```
General -------|-------|-------|-------|------- Deep Technical
1 2 3 4 5
```
| Level | Description | Use When |
|-------|-------------|----------|
| 1 | No jargon, analogy-heavy, conceptual | Non-technical stakeholders, general audience |
| 2 | Light jargon (defined inline), practical focus | Business audience with some domain familiarity |
| 3 | Industry-standard terminology, code snippets OK | Practitioners who do the work daily |
| 4 | Assumes working knowledge, implementation details | Developers, engineers, technical decision-makers |
| 5 | Deep internals, performance analysis, tradeoff math | Senior engineers, architects, researchers |
**Default: Match the audience. When unsure, aim for the level you think the audience can handle. Our readers are mostly B2B.**
---
## Language Preferences
### Use Action Verbs
Lead sentences — especially headings and CTAs — with strong verbs.
| Weak | Strong |
|------|--------|
| There is a way to improve | Improve |
| This section is a discussion of | This section covers |
| You should consider using | Use |
| It is important to note that | Note: |
| We are going to walk through | Let's walk through |
### Be Concrete and Specific
Vague language erodes trust. Replace generalities with specifics.
| Vague | Concrete |
|-------|----------|
| "significantly faster" | "3x faster" or "reduced from 12s to 2s" |
| "a large number of users" | "over 40,000 monthly active users" |
| "best-in-class" | describe the specific advantage |
| "seamless integration" | "connects via a single API call" |
| "in the near future" | "by Q2" or "in the next release" |
### Avoid These Patterns
- **Weasel words:** "very," "really," "extremely," "quite," "somewhat" — cut them or replace with data.
- **Nominalizations:** "implementation" when you mean "implement," "utilization" when you mean "use."
- **Hedge stacking:** "It might potentially be possible to perhaps consider..." — commit to a position or state the uncertainty once, clearly.
- **Buzzword chains:** "AI-powered next-gen synergistic platform" — describe what it actually does.
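
Hedge stacking can be spotted by counting hedging words per sentence. The sketch below is illustrative only: the `HEDGES` list is an assumption built from the example in the bullet above, `stacked_hedges` is a hypothetical helper, and the sentence splitter is deliberately naive.

```python
import re

# Assumed hedge vocabulary, drawn from the example sentence; adjust per style guide.
HEDGES = ("might", "potentially", "possible", "possibly", "perhaps", "maybe")
HEDGE_RE = re.compile(r"\b(" + "|".join(HEDGES) + r")\b", re.IGNORECASE)


def stacked_hedges(text: str, limit: int = 1) -> list[tuple[str, list[str]]]:
    """Return (sentence, hedges) for sentences with more than `limit` hedges."""
    flagged = []
    # Naive split: sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        hedges = [m.group(1).lower() for m in HEDGE_RE.finditer(sentence)]
        if len(hedges) > limit:
            flagged.append((sentence, hedges))
    return flagged


# Hypothetical draft text used only for illustration.
draft = (
    "It might potentially be possible to perhaps consider caching. "
    "Caching might help here."
)
for sentence, hedges in stacked_hedges(draft):
    print(f"{len(hedges)} hedges: {sentence}")
```

With the default `limit=1`, a single stated uncertainty passes and only stacked hedges are flagged, which matches the guidance to state the uncertainty once.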
---
## Pre-Publication Checklist
Run through this before publishing any piece of content.
### Voice Consistency
- [ ] Does the piece sound like one person wrote it, beginning to end?
- [ ] Does it match the target voice archetype?
- [ ] Are there jarring shifts in tone between sections?
### Clarity
- [ ] Can a reader in the target audience understand every sentence on the first read?
- [ ] Is jargon defined or avoided?
- [ ] Are all acronyms expanded on first use?
- [ ] Do headings accurately describe the content beneath them?
- [ ] Is the article scannable? (subheadings every 2-4 paragraphs, short paragraphs, lists where appropriate)
### Value
- [ ] Does the introduction make clear what the reader will gain?
- [ ] Does every section earn its place? (Cut anything that doesn't serve the reader's goal.)
- [ ] Are claims supported by evidence, examples, or data?
- [ ] Is the advice actionable — can the reader do something with it today?
- [ ] Does the conclusion provide a clear next step?
### Formatting
- [ ] Title includes the core keyword or topic and at least 2 closely related keywords/topics.
- [ ] Meta description summarizes the value proposition.
- [ ] Code blocks, tables, and images have context (a sentence before them explaining what the reader is looking at).
- [ ] Links use descriptive anchor text, not "click here."
- [ ] No walls of text — maximum 5 sentences per paragraph for web content. Use a minimum of 2 sentences.
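
The paragraph-length rule in the last item (2 to 5 sentences) lends itself to a quick automated pass. This is a minimal sketch, assuming paragraphs are separated by blank lines and sentences end in `.`, `!`, or `?`. The names `count_sentences` and `flag_paragraphs` are hypothetical, and headings, lists, and abbreviations are not handled.

```python
import re

MIN_SENTENCES = 2  # checklist: use a minimum of 2 sentences
MAX_SENTENCES = 5  # checklist: maximum 5 sentences per paragraph


def count_sentences(paragraph: str) -> int:
    """Count sentences with a naive split on end punctuation plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return len([part for part in parts if part])


def flag_paragraphs(text: str) -> list[tuple[int, int]]:
    """Return (paragraph_number, sentence_count) for out-of-range paragraphs."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    flagged = []
    for number, paragraph in enumerate(paragraphs, start=1):
        sentences = count_sentences(paragraph)
        if sentences < MIN_SENTENCES or sentences > MAX_SENTENCES:
            flagged.append((number, sentences))
    return flagged


# Hypothetical draft text used only for illustration.
draft = "One sentence only.\n\nFirst point. Second point. Third point."
for number, sentences in flag_paragraphs(draft):
    print(f"paragraph {number}: {sentences} sentence(s)")
```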

@ -1,292 +0,0 @@
"""
Competitor Content Scraper

Fetches web pages and extracts clean text content for analysis.
Used as a utility when the user provides a list of URLs to examine.

Usage:
    uv run --with requests,beautifulsoup4 python competitor_scraper.py URL1 URL2 ...
        [--output-dir ./working/competitor_content/]
        [--format json|text]
"""

import argparse
import json
import re
import sys
import time
from pathlib import Path
from urllib.parse import urlparse

try:
    import requests
    from bs4 import BeautifulSoup
except ImportError:
    print(
        "Error: requests and beautifulsoup4 are required.\n"
        "Install with: uv add requests beautifulsoup4",
        file=sys.stderr,
    )
    sys.exit(1)

UNWANTED_TAGS = [
    "nav", "footer", "header", "aside", "script", "style", "noscript",
    "iframe", "form", "button", "svg", "img", "video", "audio",
]

UNWANTED_CLASSES = [
    "nav", "navbar", "navigation", "menu", "sidebar", "footer", "header",
    "breadcrumb", "cookie", "popup", "modal", "advertisement", "ad-",
    "social", "share", "comment", "related-posts",
]

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


class CompetitorScraper:
    """Fetches and cleans web page content for competitor analysis."""

    def __init__(self, timeout: int = 15, delay: float = 1.0):
        """
        Args:
            timeout: Request timeout in seconds.
            delay: Delay between requests in seconds (rate limiting).
        """
        self.timeout = timeout
        self.delay = delay
        self.session = requests.Session()
        self.session.headers.update(DEFAULT_HEADERS)

    def scrape_url(self, url: str) -> dict:
        """Scrape a single URL and extract clean content.

        Returns:
            Dict with: url, host, title, meta_description, headings, text, word_count, error
        """
        result = {
            "url": url,
            "host": urlparse(url).netloc,
            "title": "",
            "meta_description": "",
            "headings": [],
            "text": "",
            "word_count": 0,
            "error": None,
        }

        try:
            response = self.session.get(url, timeout=self.timeout)
            response.raise_for_status()
            response.encoding = response.apparent_encoding or "utf-8"
            html = response.text
        except requests.RequestException as e:
            result["error"] = str(e)
            return result

        soup = BeautifulSoup(html, "html.parser")

        # Extract title
        title_tag = soup.find("title")
        if title_tag:
            result["title"] = title_tag.get_text(strip=True)

        # Extract meta description
        meta_desc = soup.find("meta", attrs={"name": "description"})
        if meta_desc and meta_desc.get("content"):
            result["meta_description"] = meta_desc["content"].strip()

        # Extract headings before cleaning
        result["headings"] = self._extract_headings(soup)

        # Clean the HTML and extract main text
        result["text"] = self._extract_text(soup)
        result["word_count"] = len(result["text"].split())

        return result

    def scrape_urls(self, urls: list[str]) -> list[dict]:
        """Scrape multiple URLs with rate limiting.

        Args:
            urls: List of URLs to scrape.

        Returns:
            List of result dicts from scrape_url.
        """
        results = []
        for i, url in enumerate(urls):
            if i > 0:
                time.sleep(self.delay)

            print(f" Scraping [{i + 1}/{len(urls)}]: {url}", file=sys.stderr)
            result = self.scrape_url(url)

            if result["error"]:
                print(f" Error: {result['error']}", file=sys.stderr)
            else:
                print(f" OK: {result['word_count']} words", file=sys.stderr)

            results.append(result)

        return results

    def save_results(self, results: list[dict], output_dir: str) -> list[str]:
        """Save scraped results as individual text files.

        Args:
            results: List of result dicts from scrape_urls.
            output_dir: Directory to write files to.

        Returns:
            List of file paths written.
        """
        out_path = Path(output_dir)
        out_path.mkdir(parents=True, exist_ok=True)
        saved = []

        for result in results:
            if result["error"] or not result["text"]:
                continue

            # Create filename from host.
            # Note: URLs that share a host map to the same filename, so a later
            # result overwrites an earlier one from the same site.
            host = result["host"].replace("www.", "")
            safe_name = re.sub(r'[^\w\-.]', '_', host)
            filepath = out_path / f"{safe_name}.txt"

            content = self._format_output(result)
            filepath.write_text(content, encoding="utf-8")
            saved.append(str(filepath))

        return saved

    def _extract_headings(self, soup: BeautifulSoup) -> list[dict]:
        """Extract all headings (h1-h6) with their level and text."""
        headings = []
        for tag in soup.find_all(re.compile(r'^h[1-6]$')):
            level = int(tag.name[1])
            text = tag.get_text(strip=True)
            if text:
                headings.append({"level": level, "text": text})
        return headings

    def _extract_text(self, soup: BeautifulSoup) -> str:
        """Extract clean body text from HTML, stripping navigation and boilerplate."""
        # Remove unwanted tags
        for tag_name in UNWANTED_TAGS:
            for tag in soup.find_all(tag_name):
                tag.decompose()

        # Remove elements with unwanted class names
        for element in list(soup.find_all(True)):
            if element.attrs is None:
                continue
            classes = element.get("class", [])
            if isinstance(classes, list):
                class_str = " ".join(classes).lower()
            else:
                class_str = str(classes).lower()
            el_id = str(element.get("id", "")).lower()

            for pattern in UNWANTED_CLASSES:
                if pattern in class_str or pattern in el_id:
                    element.decompose()
                    break

        # Try to find main content area
        main_content = (
            soup.find("main")
            or soup.find("article")
            or soup.find("div", {"role": "main"})
            or soup.find("div", class_=re.compile(r'content|article|post|entry', re.I))
            or soup.body
            or soup
        )

        # Extract text with some structure preserved
        text = main_content.get_text(separator="\n", strip=True)

        # Clean up excessive whitespace
        lines = []
        for line in text.splitlines():
            line = line.strip()
            if line:
                lines.append(line)

        return "\n".join(lines)

    def _format_output(self, result: dict) -> str:
        """Format a single result as a readable text file."""
        lines = [
            f"URL: {result['url']}",
            f"Title: {result['title']}",
            f"Meta Description: {result['meta_description']}",
            f"Word Count: {result['word_count']}",
            "",
            "--- HEADINGS ---",
        ]
        for h in result["headings"]:
            indent = " " * (h["level"] - 1)
            lines.append(f"{indent}H{h['level']}: {h['text']}")

        lines.extend(["", "--- CONTENT ---", "", result["text"]])
        return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser(description="Scrape competitor web pages for content analysis")
    parser.add_argument("urls", nargs="+", help="URLs to scrape")
    parser.add_argument(
        "--output-dir",
        default="./working/competitor_content",
        help="Directory to save scraped content (default: ./working/competitor_content/)",
    )
    parser.add_argument(
        "--format",
        choices=["json", "text"],
        default="text",
        help="Output format for stdout (default: text)",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=15,
        help="Request timeout in seconds (default: 15)",
    )
    parser.add_argument(
        "--delay",
        type=float,
        default=1.0,
        help="Delay between requests in seconds (default: 1.0)",
    )
    args = parser.parse_args()

    scraper = CompetitorScraper(timeout=args.timeout, delay=args.delay)
    results = scraper.scrape_urls(args.urls)

    # Save files
    saved = scraper.save_results(results, args.output_dir)
    print(f"\nSaved {len(saved)} files to {args.output_dir}", file=sys.stderr)

    # Output to stdout
    successful = [r for r in results if not r["error"]]
    if args.format == "json":
        print(json.dumps(successful, indent=2))
    else:
        for r in successful:
            print(scraper._format_output(r))
            print("\n" + "=" * 80 + "\n")


if __name__ == "__main__":
    main()

@ -1,984 +0,0 @@
"""
Cora SEO Report Parser
Reads a Cora XLSX file and extracts structured data from relevant sheets.
Used as a foundation module by entity_optimizer, lsi_optimizer, and seo_optimizer.
Usage:
uv run --with openpyxl python cora_parser.py <xlsx_path> [--sheet SHEET] [--format FORMAT]
Options:
--sheet Which data to extract: entities, lsi, variations, results, tunings,
structure, densities, targets, summary, all (default: summary)
--format Output format: json, text (default: text)
"""

import argparse
import json
import math
import re
import sys
from pathlib import Path

try:
    import openpyxl
except ImportError:
    print("Error: openpyxl is required. Install with: uv add openpyxl", file=sys.stderr)
    sys.exit(1)

# =============================================================================
# Optimization Rules
#
# Hard-wired overrides that apply regardless of what Cora data says.
# These encode expert SEO knowledge and practical constraints.
# =============================================================================
OPTIMIZATION_RULES = {
    # Heading rules
    "h1_max": 1,  # Never more than 1 H1
    "h1_min": 1,  # Always have exactly 1 H1
    "optimize_headings": ["h1", "h2", "h3"],  # Primary optimization targets
    "low_priority_headings": ["h4"],  # Only add if most competitors have them
    "ignore_headings": ["h5", "h6"],  # Skip entirely

    # Keyword density
    "exact_match_density_min": 0.02,  # 2% minimum for exact match keyword
    "no_keyword_stuffing_limit": True,  # Do NOT flag for keyword stuffing
    # Variations capture exact match, so hitting variation density covers it

    # Word count strategy
    "word_count_strategy": "cluster",  # "cluster" = nearest competitive cluster, not raw average
    "word_count_acceptable_max": 1500,  # Up to 1500 is always acceptable even if target is lower

    # Density awareness
    "density_interdependent": True,  # Adding content changes all density calculations

    # Entity / LSI filtering
    "exclude_competitor_entities": True,  # Never use competitor company names as entities or LSI
    "exclude_measurement_entities": True,  # Ignore measurements (dimensions, tolerances) as entities
    "allow_organization_entities": True,  # Organizations like ISO, ANSI, etc. are OK
    "never_mention_competitors": True,  # Never mention competitors by name in content

    # Entity correlation threshold
    # Best of Both = lower of Spearman's or Pearson's correlation.
    # Measures correlation to ranking position (1=top, 100=bottom), so negative = better ranking.
    # Only include entities with Best of Both <= this value.
    # Set to None to disable filtering.
    "entity_correlation_threshold": -0.19,
}


class CoraReport:
    """Parses a Cora SEO XLSX report and provides structured access to its data."""

    def __init__(self, xlsx_path: str):
        self.path = Path(xlsx_path)
        if not self.path.exists():
            raise FileNotFoundError(f"XLSX file not found: {xlsx_path}")
        self.wb = openpyxl.load_workbook(str(self.path), data_only=True)
        self._site_domain = None  # Cached after first detection

    # -------------------------------------------------------------------------
    # Core metadata
    # -------------------------------------------------------------------------

    def get_sheet_names(self) -> list[str]:
        return self.wb.sheetnames

    def get_search_term(self) -> str:
        """Extract the target keyword from the report."""
        for sheet_name in ["Basic Tunings", "Strategic Overview", "Structure"]:
            if sheet_name not in self.wb.sheetnames:
                continue
            ws = self.wb[sheet_name]
            for row in ws.iter_rows(min_row=1, max_row=10, values_only=True):
                if row and row[0] == "Search Terms" and len(row) > 1 and row[1]:
                    return str(row[1])
        return ""

    def get_variations_list(self) -> list[str]:
        """Extract the keyword variations list from Strategic Overview B10.

        These are pipe-delimited inside curly braces:
            {cnc screw|cnc screw machining|cnc swiss|...}
        """
        if "Strategic Overview" not in self.wb.sheetnames:
            return []
        ws = self.wb["Strategic Overview"]
        rows = list(ws.iter_rows(min_row=1, max_row=12, values_only=True))
        for row in rows:
            if row and row[0] == "Keywords" and len(row) > 1 and row[1]:
                raw = str(row[1]).strip()
                # Remove curly braces and split on pipe
                raw = raw.strip("{}")
                return [v.strip() for v in raw.split("|") if v.strip()]
        return []

    def get_site_domain(self) -> str:
        """Detect the user's site domain from the report.

        Looks for the domain in the Entities sheet header (column with a .com/.net etc.
        that isn't a standard Cora column) or the site column in other sheets.
        """
        if self._site_domain:
            return self._site_domain

        # Try Entities sheet first
        if "Entities" in self.wb.sheetnames:
            ws = self.wb["Entities"]
            rows = list(ws.iter_rows(min_row=1, max_row=5, values_only=True))
            for row in rows:
                if row and row[0] == "Entity":
                    for h in row:
                        if h and isinstance(h, str):
                            h = h.strip()
                            if re.match(r'^[a-zA-Z0-9-]+\.[a-zA-Z]{2,}$', h):
                                self._site_domain = h
                                return h

        # Try LSI Keywords sheet, whose header looks like "#40.7 hoggeprecision.com"
        if "LSI Keywords" in self.wb.sheetnames:
            ws = self.wb["LSI Keywords"]
            rows = list(ws.iter_rows(min_row=1, max_row=10, values_only=True))
            for row in rows:
                if row and row[0] == "LSI Keyword":
                    for h in row:
                        if h and isinstance(h, str):
                            match = re.search(r'([a-zA-Z0-9-]+\.[a-zA-Z]{2,})', h.strip())
                            if match:
                                self._site_domain = match.group(1)
                                return self._site_domain

        return ""

    # -------------------------------------------------------------------------
    # Entities
    # -------------------------------------------------------------------------

    def get_entities(self) -> list[dict]:
        """Extract entities from the Entities sheet.

        Returns list of dicts with: name, freebase_id, wikidata_id, wiki_link,
        relevance, confidence, type, correlation, current_count, max_count, deficit
        """
        if "Entities" not in self.wb.sheetnames:
            return []

        ws = self.wb["Entities"]
        rows = list(ws.iter_rows(values_only=True))

        # Find header row containing "Entity", "Freebase ID", etc.
        header_idx = None
        for i, row in enumerate(rows):
            if row and row[0] == "Entity" and len(row) > 1 and row[1] == "Freebase ID":
                header_idx = i
                break

        if header_idx is None:
            return []

        headers = rows[header_idx]
        col_map = {str(h).strip(): j for j, h in enumerate(headers) if h}

        # Find the site-specific column (domain name like "hoggeprecision.com")
        site_col_idx = None
        site_domain = self.get_site_domain()
        if site_domain:
            site_col_idx = col_map.get(site_domain)

        entities = []
        for row in rows[header_idx + 1:]:
            if not row or not row[0]:
                continue
            name = str(row[0]).strip()
            if not name:
                continue
            # Skip rows that look like metadata (e.g., "critical values: ...")
            if name.startswith("critical") or name.startswith("http"):
                continue

            correlation = _safe_float(row, col_map.get("Best of Both"))

            # Filter by Best of Both correlation threshold.
            # Lower (more negative) = stronger ranking signal (correlates with
            # position 1 vs 100). Only keep entities at or below the threshold.
            threshold = OPTIMIZATION_RULES.get("entity_correlation_threshold")
            if threshold is not None and (correlation is None or correlation > threshold):
                continue

            entity = {
                "name": name,
                "freebase_id": _safe_str(row, col_map.get("Freebase ID")),
                "wikidata_id": _safe_str(row, col_map.get("Wikidata ID")),
                "wiki_link": _safe_str(row, col_map.get("Wiki Link")),
                "relevance": _safe_float(row, col_map.get("Relevance")),
                "confidence": _safe_float(row, col_map.get("Confidence")),
                "type": _safe_str(row, col_map.get("Type")),
                "correlation": correlation,
                "current_count": _safe_int(row, site_col_idx),
                "max_count": _safe_int(row, col_map.get("Max")),
                "deficit": _safe_int(row, col_map.get("Deficit")),
            }
            entities.append(entity)

        return entities
# -------------------------------------------------------------------------
# LSI Keywords
# -------------------------------------------------------------------------
def get_lsi_keywords(self) -> list[dict]:
"""Extract LSI keywords from the LSI Keywords sheet.
Returns list of dicts with: keyword, spearmans, pearsons, best_of_both,
pages, max, avg, current_count, deficit
"""
if "LSI Keywords" not in self.wb.sheetnames:
return []
ws = self.wb["LSI Keywords"]
rows = list(ws.iter_rows(values_only=True))
# Find header row containing "LSI Keyword", "Spearmans", etc.
header_idx = None
for i, row in enumerate(rows):
if row and row[0] == "LSI Keyword":
header_idx = i
break
if header_idx is None:
return []
headers = rows[header_idx]
col_map = {str(h).strip(): j for j, h in enumerate(headers) if h}
# Find site column — pattern like "#40.7 hoggeprecision.com"
site_col_idx = None
site_domain = self.get_site_domain()
if site_domain:
for j, h in enumerate(headers):
if h and isinstance(h, str) and site_domain in h:
site_col_idx = j
break
if site_col_idx is None:
site_col_idx = _find_site_col_idx(headers)
lsi_keywords = []
for row in rows[header_idx + 1:]:
if not row or not row[0]:
continue
keyword = str(row[0]).strip()
if not keyword:
continue
lsi = {
"keyword": keyword,
"spearmans": _safe_float(row, col_map.get("Spearmans")),
"pearsons": _safe_float(row, col_map.get("Pearsons")),
"best_of_both": _safe_float(row, col_map.get("Best of Both")),
"pages": _safe_int(row, col_map.get("Pages")),
"max": _safe_int(row, col_map.get("Max")),
"avg": _safe_float(row, col_map.get("Avg")),
"current_count": _safe_int(row, site_col_idx),
"deficit": _safe_float(row, col_map.get("Deficit")),
}
lsi_keywords.append(lsi)
return lsi_keywords
# -------------------------------------------------------------------------
# Keyword Variations
# -------------------------------------------------------------------------
def get_keyword_variations(self) -> list[dict]:
"""Extract keyword variation counts from the Variations sheet.
Returns list of dicts with: variation, page1_max, page1_avg
"""
if "Variations" not in self.wb.sheetnames:
return []
ws = self.wb["Variations"]
rows = list(ws.iter_rows(values_only=True))
if not rows or len(rows) < 3:
return []
header_row = rows[0]
# Find where variation columns start (after "# used" column)
var_start = 3 # default
for j, h in enumerate(header_row):
if h and str(h).strip() == "# used":
var_start = j + 1
break
max_row = rows[1] if len(rows) > 1 else None
avg_row = rows[2] if len(rows) > 2 else None
variations = []
for j in range(var_start, len(header_row)):
name = header_row[j]
if not name:
continue
variation = {
"variation": str(name).strip(),
"page1_max": _safe_int(max_row, j) if max_row else 0,
"page1_avg": _safe_int(avg_row, j) if avg_row else 0,
}
variations.append(variation)
return variations
# -------------------------------------------------------------------------
# Structure Targets (per-element targets from Structure sheet)
# -------------------------------------------------------------------------
def get_structure_targets(self) -> dict:
"""Extract per-element optimization targets from the Structure sheet.
Returns a dict keyed by element type with sub-targets:
{
"title_tag": {"exact_match": 0.2, "variations": 1.3, "entities": 5.8, "lsi_words": 10.7},
"meta_description": {...},
"all_h_tags": {"count": 20.7, "exact_match": 0.4, "variations": 5.7, "entities": 45.8, "lsi_words": 77.4},
"h1": {"count": 1.1, "exact_match": 0.1, "variations": 1, "entities": 3.8, "lsi_words": 7.3},
"h2": {...},
"h3": {...},
"h4": {...},
}
Page 1 Average values are in column D (index 3).
"""
if "Structure" not in self.wb.sheetnames:
return {}
ws = self.wb["Structure"]
rows = list(ws.iter_rows(values_only=True))
# Find the header row with "Factor Name", "Page 1 Avg" etc.
header_idx = None
for i, row in enumerate(rows):
if row and len(row) > 3:
if row[2] == "Factor Name":
header_idx = i
break
# Also check for the combined "Best of Both Correlation" header
if row[0] and "Best of Both" in str(row[0]):
header_idx = i
break
if header_idx is None:
return {}
# Parse factor rows into sections
# Section headers: "TITLE TAG", "META DESCRIPTION", "TOTAL FOR ALL H TAGS",
# "H1 Data", "H2 Data", "H3 Data", "H4 Data", "H5 Data", "H6 Data"
section_map = {
"TITLE TAG": "title_tag",
"META DESCRIPTION": "meta_description",
"TOTAL FOR ALL H TAGS": "all_h_tags",
"H1 Data": "h1",
"H2 Data": "h2",
"H3 Data": "h3",
"H4 Data": "h4",
}
# Factor name patterns to field names
factor_patterns = {
"Number of": "count",
"Exact Match": "exact_match",
"Variation": "variations",
"Entities": "entities",
"LSI": "lsi_words",
"Search Term": "search_terms",
"Keywords": "keywords",
}
targets = {}
current_section = None
for row in rows[header_idx + 1:]:
if not row or len(row) < 4:
continue
factor_name = _safe_str(row, 2)
# Check if this is a section header
if factor_name in section_map:
current_section = section_map[factor_name]
targets[current_section] = {}
continue
# Skip sections we don't care about (H5, H6)
if factor_name in ("H5 Data", "H6 Data"):
current_section = None
continue
if current_section is None:
continue
# Get the Page 1 Average (column D, index 3)
avg_val = _safe_float(row, 3)
if avg_val is None:
continue
# Map factor name to field
field_name = None
for pattern, field in factor_patterns.items():
if pattern.lower() in factor_name.lower():
field_name = field
break
if field_name and current_section:
# Also grab correlation from column A
correlation = _safe_float(row, 0)
# Outlier detection: check if one of the top 10 results
# contributes >50% of the sum. If so, exclude it and
# recompute the average — that outlier is skewing the target.
top10 = [_safe_float(row, j) or 0 for j in range(4, 14)]
top10_sum = sum(top10)
adjusted_avg = avg_val
outlier_detected = False
if top10_sum > 0:
max_val = max(top10)
if max_val > top10_sum * 0.5 and avg_val > 1:
# One result is >50% of the total — outlier.
# Skip adjustment when avg <= 1: a single "1" among
# zeros triggers the rule but the target is already
# small enough that adjustment would zero it out.
# A value above 50% of a non-negative sum can occur only once,
# so removing one occurrence drops exactly the outlier.
remaining = top10[:]
remaining.remove(max_val)
if remaining:
adjusted_avg = sum(remaining) / len(remaining)
outlier_detected = True
target_val = math.ceil(adjusted_avg)
entry = {
"avg": avg_val,
"target": target_val,
"correlation": correlation,
}
if outlier_detected:
entry["outlier_adjusted"] = True
entry["original_target"] = math.ceil(avg_val)
targets[current_section][field_name] = entry
return targets
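A worked example may make the outlier rule in `get_structure_targets` easier to follow. This is a standalone sketch under the same thresholds (one result above 50% of the sum, Page 1 Average above 1); the function name `outlier_adjusted_target` and the sample counts are invented for illustration.

```python
import math


def outlier_adjusted_target(values: list[float], page1_avg: float) -> tuple[int, bool]:
    """Return (target, outlier_adjusted) for one factor row.

    `values` are the per-result counts for the top results and
    `page1_avg` is the report's Page 1 Average for the same row.
    """
    total = sum(values)
    if total > 0 and page1_avg > 1:
        biggest = max(values)
        if biggest > total * 0.5:
            # Drop the single dominant value and average the rest.
            rest = list(values)
            rest.remove(biggest)
            if rest:
                return math.ceil(sum(rest) / len(rest)), True
    return math.ceil(page1_avg), False


# Made-up row: one page uses the term 40 times, the other nine 0 to 3 times.
skewed = [40, 2, 1, 0, 3, 1, 0, 2, 1, 0]
print(outlier_adjusted_target(skewed, page1_avg=5.0))  # (2, True)
```

Without the adjustment the target would be `ceil(5.0) = 5`; removing the dominant page leaves nine values summing to 10, so the target falls to 2.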
# -------------------------------------------------------------------------
# Density Targets (from Strategic Overview rows 46-48)
# -------------------------------------------------------------------------
def get_density_targets(self) -> dict:
"""Extract density targets from Strategic Overview rows 46-48.
Row 46: Variation density
Row 47: Entity density
Row 48: LSI density
Column D (index 3) = Page 1 Average.
Returns per-result values so we can show distribution.
"""
if "Strategic Overview" not in self.wb.sheetnames:
return {}
ws = self.wb["Strategic Overview"]
rows = list(ws.iter_rows(values_only=True))
# Locate the "Relevant Density" row. The three density target rows
# (around rows 46-48) follow it after a gap.
density_area_start = None
for i, row in enumerate(rows):
if row and len(row) > 2 and row[2] == "Relevant Density":
# Density target rows are a few rows below this
density_area_start = i
break
if density_area_start is None:
return {}
# The 3 density rows come after a gap. They have NO values in cols A, B, C —
# only numeric values from col D onward. Row 44 (which has a correlation in
# col A) is a count row, not a density row, so we skip it.
density_rows = []
for i in range(density_area_start + 1, min(density_area_start + 10, len(rows))):
row = rows[i]
if not row:
continue
col_a = row[0] if len(row) > 0 else None
col_b = row[1] if len(row) > 1 else None
col_c = row[2] if len(row) > 2 else None
col_d = row[3] if len(row) > 3 else None
# Density rows have None in A, B, C and a float in D
if col_a is None and col_b is None and col_c is None and col_d is not None:
try:
float(col_d)
density_rows.append(row)
except (ValueError, TypeError):
pass
# Per-result values start at column E (index 4)
result_start_col = 4
result = {}
labels = ["variation_density", "entity_density", "lsi_density"]
for idx, label in enumerate(labels):
if idx >= len(density_rows):
break
row = density_rows[idx]
avg = _safe_float(row, 3)
# Collect per-competitor values
competitor_vals = []
for j in range(result_start_col, min(result_start_col + 10, len(row))):
v = _safe_float(row, j)
if v is not None:
competitor_vals.append(v)
result[label] = {
"avg": avg,
"avg_pct": f"{avg * 100:.2f}%" if avg is not None else "N/A",
"competitor_values": competitor_vals,
}
return result
# -------------------------------------------------------------------------
# Content Targets (word count, distinct entities, etc.)
# -------------------------------------------------------------------------
def get_content_targets(self) -> dict:
"""Extract key content-level targets from Strategic Overview.
Includes: word count distribution, distinct entities target, variations in HTML, etc.
"""
if "Strategic Overview" not in self.wb.sheetnames:
return {}
ws = self.wb["Strategic Overview"]
rows = list(ws.iter_rows(values_only=True))
targets = {}
result_start_col = 4
for i, row in enumerate(rows):
if not row or len(row) < 4:
continue
factor_name = _safe_str(row, 2)
factor_id = _safe_str(row, 1)
correlation = _safe_float(row, 0)
avg = _safe_float(row, 3)
if not factor_name or avg is None:
continue
# Key factors we care about
if factor_name == "Number of Distinct Entities Used":
competitor_vals = []
for j in range(result_start_col, min(result_start_col + 10, len(row))):
v = _safe_float(row, j)
if v is not None:
competitor_vals.append(int(v))
targets["distinct_entities"] = {
"factor_id": factor_id,
"avg": avg,
"target": math.ceil(avg),
"correlation": correlation,
"competitor_values": competitor_vals,
}
elif factor_name == "Variations in HTML Tags":
targets["variations_in_html"] = {
"factor_id": factor_id,
"avg": avg,
"target": math.ceil(avg),
"correlation": correlation,
}
elif factor_name == "Entities in the HTML Tag":
targets["entities_in_html"] = {
"factor_id": factor_id,
"avg": avg,
"target": math.ceil(avg),
"correlation": correlation,
}
return targets
def get_word_count_distribution(self) -> dict:
"""Get word count data for competitive cluster analysis.
Returns the clean word count for each competitor from the Keywords sheet,
sorted ascending, plus the Page 1 Average and suggested cluster target.
"""
if "Keywords" not in self.wb.sheetnames:
return {}
ws = self.wb["Keywords"]
rows = list(ws.iter_rows(values_only=True))
if not rows:
return {}
headers = rows[0]
col_map = {str(h).strip(): j for j, h in enumerate(headers) if h}
host_idx = col_map.get("Host")
clean_wc_idx = col_map.get("Clean Word Count")
if host_idx is None or clean_wc_idx is None:
return {}
# Collect word counts for page 1 results (top 10)
competitors = []
for row in rows[1:11]:
if not row or not row[host_idx]:
continue
wc = _safe_int(row, clean_wc_idx)
if wc and wc > 0:
competitors.append({
"host": str(row[host_idx]),
"clean_word_count": wc,
})
if not competitors:
return {}
# Sort by word count
competitors.sort(key=lambda x: x["clean_word_count"])
counts = [c["clean_word_count"] for c in competitors]
# Calculate cluster target
avg = sum(counts) / len(counts)
median = counts[len(counts) // 2]
cluster_target = _find_cluster_target(counts)
return {
"competitors": competitors,
"counts_sorted": counts,
"average": round(avg),
"median": median,
"cluster_target": cluster_target,
"min": counts[0],
"max": counts[-1],
}
# -------------------------------------------------------------------------
# Basic Tunings
# -------------------------------------------------------------------------
def get_basic_tunings(self) -> list[dict]:
"""Extract on-page tuning factors from the Basic Tunings sheet."""
if "Basic Tunings" not in self.wb.sheetnames:
return []
ws = self.wb["Basic Tunings"]
rows = list(ws.iter_rows(values_only=True))
# Find sub-header row with "Factor ID", "Factor"
header_idx = None
for i, row in enumerate(rows):
if row and len(row) > 2 and row[1] == "Factor ID" and row[2] == "Factor":
header_idx = i
break
if header_idx is None:
return []
tunings = []
for row in rows[header_idx + 1:]:
if not row:
continue
factor_id = row[1] if len(row) > 1 else None
if not factor_id or not str(factor_id).strip():
continue
factor_id_str = str(factor_id).strip()
if not re.match(r'^[A-Z]{2,}\d+', factor_id_str):
continue
tuning = {
"factor_id": factor_id_str,
"factor": _safe_str(row, 2),
"current": _safe_str(row, 3),
"goal": _safe_str(row, 4),
"percent": _safe_float(row, 5),
"recommendation": _safe_str(row, 6),
}
tunings.append(tuning)
return tunings
# -------------------------------------------------------------------------
# Competitor URLs (Results sheet)
# -------------------------------------------------------------------------
def get_competitor_urls(self) -> list[dict]:
"""Extract competitor URLs from the Results sheet."""
if "Results" not in self.wb.sheetnames:
return []
ws = self.wb["Results"]
rows = list(ws.iter_rows(values_only=True))
if not rows:
return []
headers = rows[0]
col_map = {str(h).strip(): j for j, h in enumerate(headers) if h}
results = []
for row in rows[1:]:
if not row or not row[0]:
continue
result = {
"rank": _safe_int(row, col_map.get("Rank")),
"host": _safe_str(row, col_map.get("Host")),
"url": _safe_str(row, col_map.get("URL")),
"title": _safe_str(row, col_map.get("Link Text")),
"summary": _safe_str(row, col_map.get("Summary")),
}
results.append(result)
return results
# -------------------------------------------------------------------------
# Summary
# -------------------------------------------------------------------------
def get_summary(self) -> dict:
"""Get a high-level summary of the Cora report with all key targets."""
entities = self.get_entities()
lsi = self.get_lsi_keywords()
variations = self.get_variations_list()
tunings = self.get_basic_tunings()
results = self.get_competitor_urls()
density = self.get_density_targets()
content = self.get_content_targets()
wc_dist = self.get_word_count_distribution()
# Find word count goal from tunings
word_count_goal = None
for t in tunings:
if t["factor"] == "Word Count":
word_count_goal = t["goal"]
break
entities_with_deficit = [e for e in entities if e["deficit"] and e["deficit"] > 0]
lsi_with_deficit = [l for l in lsi if l["deficit"] and l["deficit"] > 0]
return {
"search_term": self.get_search_term(),
"site_domain": self.get_site_domain(),
"keyword_variations": variations,
"total_entities": len(entities),
"entities_with_deficit": len(entities_with_deficit),
"total_lsi_keywords": len(lsi),
"lsi_with_deficit": len(lsi_with_deficit),
"word_count_goal": word_count_goal,
"word_count_cluster_target": wc_dist.get("cluster_target"),
"word_count_distribution": wc_dist.get("counts_sorted", []),
"variation_density_avg": density.get("variation_density", {}).get("avg_pct"),
"entity_density_avg": density.get("entity_density", {}).get("avg_pct"),
"lsi_density_avg": density.get("lsi_density", {}).get("avg_pct"),
"distinct_entities_target": content.get("distinct_entities", {}).get("target"),
"competitors_analyzed": len(results),
"tuning_factors": len(tunings),
"optimization_rules": OPTIMIZATION_RULES,
}
# =============================================================================
# Helper functions
# =============================================================================
def _safe_str(row, idx) -> str:
if idx is None or idx >= len(row) or row[idx] is None:
return ""
return str(row[idx]).strip()
def _safe_float(row, idx) -> float | None:
if idx is None or idx >= len(row) or row[idx] is None:
return None
try:
return float(row[idx])
except (ValueError, TypeError):
return None
def _safe_int(row, idx) -> int | None:
if idx is None or idx >= len(row) or row[idx] is None:
return None
try:
return int(float(row[idx]))
except (ValueError, TypeError, OverflowError):
return None
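The `_safe_*` helpers exist so that short rows, empty cells, and non-numeric text all come back as `None` rather than raising. A minimal sketch of that behavior follows, with the helpers restated under the names `safe_float` and `safe_int` and a made-up sample row; both names and data are assumptions for illustration.

```python
from typing import Optional


def safe_float(row, idx) -> Optional[float]:
    """Return row[idx] as a float, or None if missing or not numeric."""
    if idx is None or idx >= len(row) or row[idx] is None:
        return None
    try:
        return float(row[idx])
    except (ValueError, TypeError):
        return None


def safe_int(row, idx) -> Optional[int]:
    """Return row[idx] truncated to an int, or None if missing or not numeric."""
    value = safe_float(row, idx)
    if value is None:
        return None
    try:
        return int(value)
    except (ValueError, OverflowError):  # NaN or infinity
        return None


# Made-up spreadsheet row: text, numeric string, empty cell, junk text.
row = ("guide bushing", "12.7", None, "n/a")
print(safe_int(row, 1), safe_float(row, 1))        # 12 12.7
print(safe_int(row, 2), safe_int(row, 3))          # None None
print(safe_int(row, 9), safe_int(row, None))       # None None
```

Passing `None` as the index matters because `col_map.get(...)` returns `None` when a column is absent from the sheet.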
def _find_site_col_idx(headers) -> int | None:
"""Find site column by looking for domain pattern in header values."""
for j, h in enumerate(headers):
if h and isinstance(h, str):
h_str = h.strip()
if re.search(r'[a-zA-Z0-9-]+\.[a-zA-Z]{2,}', h_str):
# Skip known non-site headers
if h_str in ("Best of Both", "LSI Keyword"):
continue
return j
return None
def _find_cluster_target(counts: list[int]) -> int:
"""Find a competitive cluster target for word count.
Strategy: don't use the raw average (skewed by outliers).
Instead, scan the ascending counts for the largest run of competitors
within 40% of the run's smallest value, and target 5% above that run's
average. Ties go to the run with the higher counts. With three or
fewer counts, target 5% above the maximum.
"""
if not counts:
return 0
if len(counts) <= 3:
return math.ceil(max(counts) * 1.05)
# Simple clustering: find the densest grouping
best_cluster = []
for i in range(len(counts)):
cluster = [counts[i]]
for j in range(i + 1, len(counts)):
# Within 40% range of the cluster start
if counts[j] <= counts[i] * 1.4:
cluster.append(counts[j])
else:
break
if len(cluster) >= len(best_cluster):
best_cluster = cluster
if best_cluster:
cluster_avg = sum(best_cluster) / len(best_cluster)
# Target slightly above the cluster average
return math.ceil(cluster_avg * 1.05)
# Fallback: median + 5%
median = counts[len(counts) // 2]
return math.ceil(median * 1.05)
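To see how the clustering loop behaves on real-looking numbers, the sketch below restates it as a standalone function and runs it on invented word counts. The name `cluster_target`, the `spread` and `margin` parameters, and the sample data are assumptions; the 1.4 and 1.05 defaults are the multipliers used in the loop above.

```python
import math


def cluster_target(counts: list[int], spread: float = 1.4, margin: float = 1.05) -> int:
    """Target `margin` times the average of the largest run of nearby counts.

    `counts` must be sorted ascending. A run starts at counts[i] and extends
    while later values stay within `spread` times counts[i].
    """
    if not counts:
        return 0
    if len(counts) <= 3:
        return math.ceil(max(counts) * margin)
    best: list[int] = []
    for i, start in enumerate(counts):
        run = [start]
        for value in counts[i + 1:]:
            if value > start * spread:
                break
            run.append(value)
        if len(run) >= len(best):  # ties favor the later (higher) run
            best = run
    return math.ceil(sum(best) / len(best) * margin)


# Made-up page-one word counts with two long outliers.
counts = [800, 950, 1000, 1100, 2400, 5200]
print(round(sum(counts) / len(counts)))  # raw average: 1908
print(cluster_target(counts))            # cluster target: 1011
```

The run `[800, 950, 1000, 1100]` wins with four members, so the target lands near where most competitors sit instead of being pulled toward the 5,200-word page.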
# =============================================================================
# Output formatting
# =============================================================================
def format_text(data, label: str = "") -> str:
"""Format data as human-readable text."""
lines = []
if label:
lines.append(f"=== {label} ===")
lines.append("")
if isinstance(data, dict):
for key, value in data.items():
if isinstance(value, list) and len(value) > 5:
lines.append(f" {key}: [{len(value)} items]")
elif isinstance(value, dict):
lines.append(f" {key}:")
for k2, v2 in value.items():
lines.append(f" {k2}: {v2}")
else:
lines.append(f" {key}: {value}")
elif isinstance(data, list):
for i, item in enumerate(data):
if isinstance(item, dict):
lines.append(f" [{i + 1}]")
for key, value in item.items():
lines.append(f" {key}: {value}")
else:
lines.append(f" [{i + 1}] {item}")
lines.append("")
return "\n".join(lines)
# =============================================================================
# CLI
# =============================================================================
def main():
parser = argparse.ArgumentParser(description="Parse a Cora SEO XLSX report")
parser.add_argument("xlsx_path", help="Path to the Cora XLSX file")
parser.add_argument(
"--sheet",
choices=[
"entities", "lsi", "variations", "results", "tunings",
"structure", "densities", "targets", "wordcount", "summary", "all",
],
default="summary",
help="Which data to extract (default: summary)",
)
parser.add_argument(
"--format",
choices=["json", "text"],
default="text",
help="Output format (default: text)",
)
parser.add_argument(
"--top-n",
type=int,
default=0,
help="Limit output to top N results (0 = all)",
)
args = parser.parse_args()
report = CoraReport(args.xlsx_path)
extractors = {
"entities": ("Entities", report.get_entities),
"lsi": ("LSI Keywords", report.get_lsi_keywords),
"variations": ("Keyword Variations", report.get_keyword_variations),
"results": ("Competitor URLs", report.get_competitor_urls),
"tunings": ("Basic Tunings", report.get_basic_tunings),
"structure": ("Structure Targets", report.get_structure_targets),
"densities": ("Density Targets", report.get_density_targets),
"targets": ("Content Targets", report.get_content_targets),
"wordcount": ("Word Count Distribution", report.get_word_count_distribution),
"summary": ("Summary", report.get_summary),
}
if args.sheet == "all":
sheets_to_show = ["summary", "structure", "densities", "targets", "wordcount"]
else:
sheets_to_show = [args.sheet]
for sheet_key in sheets_to_show:
label, extractor = extractors[sheet_key]
data = extractor()
if args.top_n > 0 and isinstance(data, list):
data = data[:args.top_n]
if args.format == "json":
print(json.dumps(data, indent=2, default=str))
else:
print(format_text(data, label))
if __name__ == "__main__":
main()


@ -1,455 +0,0 @@
#!/usr/bin/env python3
"""
Entity Optimizer: Cora Entity Analysis for Content Drafts
Counts Cora-defined entities in a markdown content draft and recommends
additions based on relevance and deficit data from a Cora XLSX report.
Usage:
uv run --with openpyxl python entity_optimizer.py <draft_path> <cora_xlsx_path> [--format json|text] [--top-n 30]
Options:
--format Output format: json or text (default: text)
--top-n Number of top recommendations to show (default: 30)
"""
import argparse
import json
import re
import sys
from pathlib import Path
from cora_parser import CoraReport
class EntityOptimizer:
"""Analyzes a content draft against Cora entity targets and recommends additions."""
def __init__(self, cora_xlsx_path: str):
"""Load entity targets from a Cora XLSX report.
Args:
cora_xlsx_path: Path to the Cora SEO XLSX file.
"""
self.report = CoraReport(cora_xlsx_path)
self.entities = self.report.get_entities()
self.search_term = self.report.get_search_term()
# Populated after analyze_draft() is called
self.draft_text = ""
self.sections = [] # list of {"heading": str, "level": int, "text": str}
self.entity_counts = {} # entity name -> {"total": int, "per_section": {heading: count}}
def analyze_draft(self, draft_path: str) -> dict:
"""Run a full analysis of a content draft against Cora entity targets.
Args:
draft_path: Path to a markdown content draft file.
Returns:
dict with keys: summary, entity_counts, deficits, recommendations, section_density
"""
path = Path(draft_path)
if not path.exists():
raise FileNotFoundError(f"Draft file not found: {draft_path}")
self.draft_text = path.read_text(encoding="utf-8")
self.sections = self._parse_sections(self.draft_text)
self.entity_counts = self.count_entities(self.draft_text)
deficits = self.calculate_deficits()
recommendations = self.recommend_additions()
section_density = self._section_density()
# Build summary stats
entities_found = sum(
1 for name, counts in self.entity_counts.items() if counts["total"] > 0
)
entities_with_deficit = sum(1 for d in deficits if d["remaining_deficit"] > 0)
summary = {
"search_term": self.search_term,
"total_entities_tracked": len(self.entities),
"entities_found_in_draft": entities_found,
"entities_with_deficit": entities_with_deficit,
"total_sections": len(self.sections),
}
return {
"summary": summary,
"entity_counts": self.entity_counts,
"deficits": deficits,
"recommendations": recommendations,
"section_density": section_density,
}
def count_entities(self, text: str) -> dict:
"""Count occurrences of each Cora entity in the text, total and per section.
Uses case-insensitive matching with word boundaries so partial matches
inside larger words are excluded.
Args:
text: The full draft text.
Returns:
dict mapping entity name to {"total": int, "per_section": {heading: int}}
"""
counts = {}
sections = self.sections if self.sections else self._parse_sections(text)
for entity in self.entities:
name = entity["name"]
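# Lookarounds act as word boundaries but also work for names that
# start or end with punctuation (e.g. "C++"), where \b would not match.
pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)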
total = len(pattern.findall(text))
per_section = {}
for section in sections:
section_count = len(pattern.findall(section["text"]))
if section_count > 0:
per_section[section["heading"]] = section_count
counts[name] = {
"total": total,
"per_section": per_section,
}
return counts
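The matching rule in `count_entities` is easiest to check on a short string. The sketch below is a variation that uses lookarounds in place of `\b`, so that names ending in punctuation still match; `count_mentions` and the sample sentences are invented for illustration.

```python
import re


def count_mentions(name: str, text: str) -> int:
    """Count case-insensitive whole-phrase occurrences of `name` in `text`."""
    pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
    return len(pattern.findall(text))


text = (
    "Swiss lathes use a guide bushing. Worn guide bushings chatter; "
    "a new Guide Bushing fixes that."
)
print(count_mentions("guide bushing", text))  # 2: the plural is not counted

# A name ending in punctuation: \b finds nothing, the lookarounds find both.
sample = "C++ and c++."
print(len(re.findall(r"\b" + re.escape("C++") + r"\b", sample, re.IGNORECASE)))  # 0
print(count_mentions("C++", sample))                                              # 2
```

Note that plurals and other inflections count as different words under either rule, so a draft that only says "guide bushings" registers zero mentions of "guide bushing".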
def calculate_deficits(self) -> list[dict]:
"""Calculate which entities are still below their Cora deficit target.
Compares the count found in the draft against the deficit value from
the Cora report. An entity with a Cora deficit of 20 and a draft count
of 5 has a remaining deficit of 15.
Returns:
List of dicts with: name, relevance, correlation, cora_deficit,
draft_count, remaining_deficit. Sorted by remaining_deficit descending.
"""
deficits = []
for entity in self.entities:
name = entity["name"]
cora_deficit = entity.get("deficit") or 0
draft_count = self.entity_counts.get(name, {}).get("total", 0)
remaining = max(0, cora_deficit - draft_count)
deficits.append({
"name": name,
"relevance": entity.get("relevance") or 0,
"correlation": entity.get("correlation") or 0,
"cora_deficit": cora_deficit,
"draft_count": draft_count,
"remaining_deficit": remaining,
})
deficits.sort(key=lambda d: d["remaining_deficit"], reverse=True)
return deficits
def recommend_additions(self) -> list[dict]:
"""Generate prioritized recommendations for entity additions.
Priority is calculated as relevance * remaining_deficit, so entities
that are both highly relevant and far below target rank highest.
Each recommendation includes suggested sections where the entity
could naturally be added, based on where related entities already appear.
Returns:
List of recommendation dicts sorted by priority descending. Each dict
has: name, relevance, correlation, cora_deficit, draft_count,
remaining_deficit, priority, suggested_sections.
"""
deficits = self.calculate_deficits()
recommendations = []
for deficit_entry in deficits:
if deficit_entry["remaining_deficit"] <= 0:
continue
relevance = deficit_entry["relevance"]
remaining = deficit_entry["remaining_deficit"]
priority = relevance * remaining
suggested = self._suggest_sections(deficit_entry["name"])
recommendations.append({
"name": deficit_entry["name"],
"relevance": relevance,
"correlation": deficit_entry["correlation"],
"cora_deficit": deficit_entry["cora_deficit"],
"draft_count": deficit_entry["draft_count"],
"remaining_deficit": remaining,
"priority": round(priority, 4),
"suggested_sections": suggested,
})
recommendations.sort(key=lambda r: r["priority"], reverse=True)
return recommendations
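The priority formula (relevance times remaining deficit) can reorder entities in ways that are not obvious from either number alone. A minimal sketch with made-up deficit entries follows; `rank_by_priority` is a hypothetical helper that keeps only the scoring and sorting steps and omits section suggestions.

```python
def rank_by_priority(deficits: list[dict]) -> list[dict]:
    """Score entries with a remaining deficit and sort by priority, highest first."""
    ranked = []
    for entry in deficits:
        if entry["remaining_deficit"] <= 0:
            continue  # already at or above target
        priority = round(entry["relevance"] * entry["remaining_deficit"], 4)
        ranked.append({**entry, "priority": priority})
    ranked.sort(key=lambda r: r["priority"], reverse=True)
    return ranked


# Made-up entries for illustration only.
sample = [
    {"name": "guide bushing", "relevance": 0.9, "remaining_deficit": 4},
    {"name": "lathe", "relevance": 0.5, "remaining_deficit": 10},
    {"name": "watchmaking", "relevance": 0.8, "remaining_deficit": 0},
]
for rec in rank_by_priority(sample):
    print(rec["name"], rec["priority"])
```

Here "lathe" (0.5 x 10 = 5.0) outranks the more relevant "guide bushing" (0.9 x 4 = 3.6) because its gap is larger, and "watchmaking" is dropped because it has no remaining deficit.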
# ------------------------------------------------------------------
# Internal helpers
# ------------------------------------------------------------------
def _parse_sections(self, text: str) -> list[dict]:
"""Split markdown text into sections by headings.
Each section captures the heading text, heading level, and the body
text under that heading, up to the next heading of any level (a
subsection's text is not included in its parent section).
A virtual "Introduction" section is created for content before the first heading.
Returns:
list of {"heading": str, "level": int, "text": str}
"""
heading_pattern = re.compile(r"^(#{1,6})\s+(.+)$", re.MULTILINE)
matches = list(heading_pattern.finditer(text))
sections = []
# Content before the first heading becomes the Introduction section
if matches:
intro_text = text[:matches[0].start()].strip()
if intro_text:
sections.append({
"heading": "Introduction",
"level": 0,
"text": intro_text,
})
else:
# No headings at all — treat the entire text as one section
return [{
"heading": "Full Document",
"level": 0,
"text": text,
}]
for i, match in enumerate(matches):
level = len(match.group(1))
heading = match.group(2).strip()
start = match.end()
end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
body = text[start:end].strip()
sections.append({
"heading": heading,
"level": level,
"text": body,
})
return sections
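A small input shows what the section splitter produces. The sketch below restates the same heading pattern and slicing as a standalone function; `split_sections` and the sample draft are assumptions for illustration.

```python
import re

# ATX-style markdown headings: 1-6 '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+)$", re.MULTILINE)


def split_sections(text: str) -> list[dict]:
    """Split markdown into sections, each ending at the next heading of any level."""
    matches = list(HEADING_RE.finditer(text))
    if not matches:
        return [{"heading": "Full Document", "level": 0, "text": text}]
    sections = []
    intro = text[:matches[0].start()].strip()
    if intro:
        sections.append({"heading": "Introduction", "level": 0, "text": intro})
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append({
            "heading": match.group(2).strip(),
            "level": len(match.group(1)),
            "text": text[match.end():end].strip(),
        })
    return sections


draft = "Lead paragraph.\n\n# Title\nBody one.\n\n## Sub\nBody two.\n"
for section in split_sections(draft):
    print(section["level"], section["heading"], "->", section["text"])
```

The "Title" section contains only "Body one."; the text under "Sub" belongs to its own section, which is why per-section counts never double-count a subsection inside its parent.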
def _suggest_sections(self, entity_name: str) -> list[str]:
"""Suggest sections where an entity could naturally be added.
Strategy: find sections that already contain other entities from the
same Cora report. Sections with higher concentrations of related
entities are better candidates because the topic is contextually aligned.
If no sections have related entities, return all non-empty sections
as general candidates.
Args:
entity_name: The entity to find placement for.
Returns:
List of section heading strings, ordered by relevance.
"""
if not self.sections:
return []
# Build a score for each section: count how many other entities appear there
section_scores = []
for section in self.sections:
heading = section["heading"]
other_entity_count = 0
for name, counts in self.entity_counts.items():
if name.lower() == entity_name.lower():
continue
if heading in counts.get("per_section", {}):
other_entity_count += counts["per_section"][heading]
if other_entity_count > 0:
section_scores.append((heading, other_entity_count))
# Sort by entity richness descending
section_scores.sort(key=lambda x: x[1], reverse=True)
if section_scores:
return [heading for heading, _score in section_scores]
# Fallback: return all sections with non-trivial content
return [
s["heading"]
for s in self.sections
if len(s["text"].split()) > 20
]
def _section_density(self) -> list[dict]:
"""Calculate per-section entity density.
Returns:
List of dicts with: heading, level, word_count, entities_found,
entity_mentions, density_per_100_words (mentions per 100 words).
"""
densities = []
for section in self.sections:
heading = section["heading"]
word_count = len(section["text"].split())
entities_found = 0
total_mentions = 0
for name, counts in self.entity_counts.items():
section_count = counts.get("per_section", {}).get(heading, 0)
if section_count > 0:
entities_found += 1
total_mentions += section_count
density = round((total_mentions / word_count) * 100, 2) if word_count > 0 else 0.0
densities.append({
"heading": heading,
"level": section["level"],
"word_count": word_count,
"entities_found": entities_found,
"entity_mentions": total_mentions,
"density_per_100_words": density,
})
return densities
# ------------------------------------------------------------------
# Output formatting
# ------------------------------------------------------------------
def format_text_report(analysis: dict, top_n: int = 30) -> str:
"""Format the analysis result as a human-readable text report."""
lines = []
summary = analysis["summary"]
# --- Header ---
lines.append("=" * 70)
lines.append(" ENTITY OPTIMIZATION REPORT")
if summary.get("search_term"):
lines.append(f" Target keyword: {summary['search_term']}")
lines.append("=" * 70)
lines.append("")
# --- Summary ---
lines.append("SUMMARY")
lines.append("-" * 40)
lines.append(f" Total entities tracked: {summary['total_entities_tracked']}")
lines.append(f" Entities found in draft: {summary['entities_found_in_draft']}")
lines.append(f" Entities with deficit: {summary['entities_with_deficit']}")
lines.append(f" Total sections in draft: {summary['total_sections']}")
lines.append("")
# --- Top Recommendations ---
recommendations = analysis["recommendations"]
shown = recommendations[:top_n]
lines.append(f"TOP {min(top_n, len(recommendations))} RECOMMENDATIONS (sorted by priority)")
lines.append("-" * 70)
if not shown:
lines.append(" No entity deficits found — the draft covers all targets.")
else:
for i, rec in enumerate(shown, 1):
sections_str = ", ".join(rec["suggested_sections"][:3]) if rec["suggested_sections"] else "any section"
lines.append(
f" {i:>3}. Entity '{rec['name']}' found {rec['draft_count']} times, "
f"target deficit is {rec['cora_deficit']}. "
f"Remaining: {rec['remaining_deficit']}. "
f"Priority: {rec['priority']}"
)
lines.append(
f" Relevance: {rec['relevance']} | Correlation: {rec['correlation']}"
)
lines.append(
f" Suggested sections: [{sections_str}]"
)
lines.append("")
# --- Per-Section Entity Density ---
lines.append("PER-SECTION ENTITY DENSITY")
lines.append("-" * 70)
lines.append(f" {'Section':<40} {'Words':>6} {'Entities':>9} {'Mentions':>9} {'Density':>8}")
lines.append(f" {'-' * 40} {'-' * 6} {'-' * 9} {'-' * 9} {'-' * 8}")
for sd in analysis["section_density"]:
indent = " " * sd["level"] if sd["level"] > 0 else ""
heading_display = indent + sd["heading"]
if len(heading_display) > 38:
heading_display = heading_display[:35] + "..."
lines.append(
f" {heading_display:<40} {sd['word_count']:>6} {sd['entities_found']:>9} "
f"{sd['entity_mentions']:>9} {sd['density_per_100_words']:>7.2f}%"
)
lines.append("")
lines.append("=" * 70)
return "\n".join(lines)
def format_json_report(analysis: dict, top_n: int = 30) -> str:
"""Format the analysis result as machine-readable JSON."""
output = {
"summary": analysis["summary"],
"recommendations": analysis["recommendations"][:top_n],
"section_density": analysis["section_density"],
"entity_counts": analysis["entity_counts"],
"deficits": analysis["deficits"],
}
return json.dumps(output, indent=2, default=str)
# ------------------------------------------------------------------
# CLI entry point
# ------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Analyze a content draft against Cora entity targets and recommend additions.",
usage="uv run --with openpyxl python entity_optimizer.py <draft_path> <cora_xlsx_path> [options]",
)
parser.add_argument(
"draft_path",
help="Path to the markdown content draft",
)
parser.add_argument(
"cora_xlsx_path",
help="Path to the Cora SEO XLSX report",
)
parser.add_argument(
"--format",
choices=["json", "text"],
default="text",
help="Output format (default: text)",
)
parser.add_argument(
"--top-n",
type=int,
default=30,
help="Number of top recommendations to display (default: 30)",
)
args = parser.parse_args()
try:
optimizer = EntityOptimizer(args.cora_xlsx_path)
analysis = optimizer.analyze_draft(args.draft_path)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error analyzing draft: {e}", file=sys.stderr)
sys.exit(1)
if args.format == "json":
print(format_json_report(analysis, top_n=args.top_n))
else:
print(format_text_report(analysis, top_n=args.top_n))
if __name__ == "__main__":
main()
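
The per-section density arithmetic in `_section_density` can be sketched in isolation. This is a minimal sketch, not the module's own API: the helper name `density_per_100_words` and the sample sections are hypothetical, and the mention counts are supplied by hand rather than read from a Cora report.

```python
def density_per_100_words(mentions: int, word_count: int) -> float:
    """Return mentions per 100 words, or 0.0 for an empty section."""
    if word_count <= 0:
        return 0.0
    return round((mentions / word_count) * 100, 2)


# Hypothetical sections with hand-assigned mention counts (illustration only).
sample_sections = [
    {"heading": "Origins", "text": "Swiss lathes use a guide bushing near the cut", "mentions": 2},
    {"heading": "Empty", "text": "", "mentions": 0},
]

section_rows = []
for section in sample_sections:
    # Word count uses plain whitespace splitting, as the original method does.
    word_count = len(section["text"].split())
    section_rows.append({
        "heading": section["heading"],
        "word_count": word_count,
        "density_per_100_words": density_per_100_words(section["mentions"], word_count),
    })

for row in section_rows:
    print(row)
```

Guarding on `word_count <= 0` keeps empty sections from raising `ZeroDivisionError`, which mirrors the `if word_count > 0 else 0.0` branch in the original.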


@ -1,414 +0,0 @@
"""
LSI Keyword Optimizer
Counts Cora-defined LSI keywords in a content draft and recommends additions.
Reads LSI targets from a Cora XLSX report via cora_parser.CoraReport, then
scans a markdown draft to measure per-keyword usage and calculate deficits.
Recommendations are prioritized by |correlation| x deficit so the most
ranking-impactful gaps surface first.
Usage:
uv run --with openpyxl python lsi_optimizer.py <draft_path> <cora_xlsx_path> \
[--format json|text] [--min-correlation 0.2] [--top-n 50]
"""
import argparse
import json
import re
import sys
from pathlib import Path
from cora_parser import CoraReport
class LSIOptimizer:
"""Analyzes a content draft against Cora LSI keyword targets."""
def __init__(self, cora_xlsx_path: str):
"""Load LSI keyword targets from a Cora XLSX report.
Args:
cora_xlsx_path: Path to the Cora SEO report XLSX file.
"""
self.report = CoraReport(cora_xlsx_path)
self.lsi_keywords = self.report.get_lsi_keywords()
self.draft_text = ""
self.sections: list[dict] = []
self._keyword_counts: dict[str, int] = {}
# ------------------------------------------------------------------
# Public API
# ------------------------------------------------------------------
def analyze_draft(self, draft_path: str) -> dict:
"""Run full LSI analysis on a markdown draft.
Args:
draft_path: Path to a markdown content draft.
Returns:
Analysis dict with keys: summary, keyword_counts, deficits,
recommendations, section_coverage.
"""
path = Path(draft_path)
if not path.exists():
raise FileNotFoundError(f"Draft file not found: {draft_path}")
self.draft_text = path.read_text(encoding="utf-8")
self.sections = self._parse_sections(self.draft_text)
self._keyword_counts = self.count_lsi_keywords(self.draft_text)
deficits = self.calculate_deficits()
recommendations = self.recommend_additions()
section_coverage = self._section_coverage()
total_tracked = len(self.lsi_keywords)
found_in_draft = sum(1 for c in self._keyword_counts.values() if c > 0)
with_deficit = len(deficits)
return {
"summary": {
"total_lsi_tracked": total_tracked,
"found_in_draft": found_in_draft,
"with_deficit": with_deficit,
"fully_satisfied": total_tracked - with_deficit,
},
"keyword_counts": self._keyword_counts,
"deficits": deficits,
"recommendations": recommendations,
"section_coverage": section_coverage,
}
def count_lsi_keywords(self, text: str) -> dict[str, int]:
"""Count occurrences of each LSI keyword in the given text.
Uses word-boundary-aware regex matching so multi-word phrases like
"part that" are matched correctly and case-insensitively.
Args:
text: The content string to scan.
Returns:
Dict mapping keyword string to its occurrence count.
"""
counts: dict[str, int] = {}
for kw_data in self.lsi_keywords:
keyword = kw_data["keyword"]
pattern = self._keyword_pattern(keyword)
matches = pattern.findall(text)
counts[keyword] = len(matches)
return counts
def calculate_deficits(self) -> list[dict]:
"""Identify LSI keywords whose draft count is below the Cora target.
A keyword has a deficit when the Cora report indicates a positive
deficit value (target minus current usage in the report) AND the
draft count has not yet closed that gap.
Returns:
List of dicts with: keyword, draft_count, target, deficit,
spearmans, pearsons, best_of_both. Only keywords with
remaining deficit > 0 are included.
"""
deficits = []
for kw_data in self.lsi_keywords:
keyword = kw_data["keyword"]
cora_deficit = kw_data.get("deficit") or 0
if cora_deficit <= 0:
continue
# The Cora deficit is based on the original page. The draft may
# have added some occurrences, so we re-compute: how many more
# are still needed?
cora_current = kw_data.get("current_count") or 0
target = cora_current + cora_deficit
draft_count = self._keyword_counts.get(keyword, 0)
remaining_deficit = target - draft_count
if remaining_deficit <= 0:
continue
deficits.append({
"keyword": keyword,
"draft_count": draft_count,
"target": target,
"deficit": remaining_deficit,
"spearmans": kw_data.get("spearmans"),
"pearsons": kw_data.get("pearsons"),
"best_of_both": kw_data.get("best_of_both"),
})
return deficits
def recommend_additions(
self,
min_correlation: float = 0.0,
top_n: int = 0,
) -> list[dict]:
"""Produce a prioritized list of LSI keyword additions.
Priority score = abs(best_of_both) x deficit. Keywords with higher
correlation to ranking AND larger deficits sort to the top.
Args:
min_correlation: Only include keywords whose
abs(best_of_both) >= this threshold.
top_n: Limit to top N results (0 = no limit).
Returns:
Sorted list of dicts with: keyword, priority, deficit,
draft_count, target, best_of_both, spearmans, pearsons.
"""
deficits = self.calculate_deficits()
recommendations = []
for d in deficits:
correlation = abs(d["best_of_both"]) if d["best_of_both"] else 0.0
if correlation < min_correlation:
continue
priority = correlation * d["deficit"]
recommendations.append({
"keyword": d["keyword"],
"priority": round(priority, 4),
"deficit": d["deficit"],
"draft_count": d["draft_count"],
"target": d["target"],
"best_of_both": d["best_of_both"],
"spearmans": d["spearmans"],
"pearsons": d["pearsons"],
})
recommendations.sort(key=lambda r: r["priority"], reverse=True)
if top_n > 0:
recommendations = recommendations[:top_n]
return recommendations
# ------------------------------------------------------------------
# Internal helpers
# ------------------------------------------------------------------
@staticmethod
def _keyword_pattern(keyword: str) -> re.Pattern:
"""Build a word-boundary-aware regex for an LSI keyword.
Handles multi-word phrases by joining escaped tokens with flexible
whitespace. Case-insensitive.
"""
tokens = keyword.strip().split()
escaped = [re.escape(t) for t in tokens]
# Allow flexible whitespace between tokens in multi-word phrases
pattern_str = r"\b" + r"\s+".join(escaped) + r"\b"
return re.compile(pattern_str, re.IGNORECASE)
@staticmethod
def _parse_sections(text: str) -> list[dict]:
"""Split markdown text into sections by headings.
Returns list of dicts with: heading, level, content.
The content before the first heading gets heading="(intro)".
"""
heading_re = re.compile(r"^(#{1,6})\s+(.+)$", re.MULTILINE)
matches = list(heading_re.finditer(text))
sections: list[dict] = []
if not matches:
# No headings — treat entire text as one section
sections.append({
"heading": "(intro)",
"level": 0,
"content": text,
})
return sections
# Content before first heading
if matches[0].start() > 0:
intro = text[: matches[0].start()]
if intro.strip():
sections.append({
"heading": "(intro)",
"level": 0,
"content": intro,
})
for i, match in enumerate(matches):
level = len(match.group(1))
heading = match.group(2).strip()
start = match.end()
end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
content = text[start:end]
sections.append({
"heading": heading,
"level": level,
"content": content,
})
return sections
def _section_coverage(self) -> list[dict]:
"""Calculate LSI keyword coverage per section.
Returns list of dicts with: heading, level, total_keywords_found,
keyword_details (list of keyword/count pairs present in that section).
"""
coverage = []
for section in self.sections:
section_counts = self.count_lsi_keywords(section["content"])
found = {kw: cnt for kw, cnt in section_counts.items() if cnt > 0}
coverage.append({
"heading": section["heading"],
"level": section["level"],
"total_keywords_found": len(found),
"keyword_details": [
{"keyword": kw, "count": cnt}
for kw, cnt in sorted(found.items(), key=lambda x: x[1], reverse=True)
],
})
return coverage
# ----------------------------------------------------------------------
# Output formatting
# ----------------------------------------------------------------------
def format_text_report(analysis: dict) -> str:
"""Format the analysis dict as a human-readable text report."""
lines: list[str] = []
summary = analysis["summary"]
# --- Summary ---
lines.append("=" * 60)
lines.append(" LSI KEYWORD OPTIMIZATION REPORT")
lines.append("=" * 60)
lines.append("")
lines.append(f" Total LSI keywords tracked : {summary['total_lsi_tracked']}")
lines.append(f" Found in draft : {summary['found_in_draft']}")
lines.append(f" With deficit (need more) : {summary['with_deficit']}")
lines.append(f" Fully satisfied : {summary['fully_satisfied']}")
lines.append("")
# --- Top Recommendations ---
recs = analysis["recommendations"]
if recs:
lines.append("-" * 60)
lines.append(" TOP RECOMMENDATIONS (sorted by priority)")
lines.append("-" * 60)
lines.append("")
lines.append(
f" {'#':<4} {'Keyword':<30} {'Priority':>9} "
f"{'Deficit':>8} {'Draft':>6} {'Target':>7} {'Corr':>7}"
)
lines.append(f" {'-'*4} {'-'*30} {'-'*9} {'-'*8} {'-'*6} {'-'*7} {'-'*7}")
for i, rec in enumerate(recs, 1):
corr = rec["best_of_both"]
corr_str = f"{corr:.3f}" if corr is not None else "N/A"
keyword_display = rec["keyword"]
if len(keyword_display) > 28:
keyword_display = keyword_display[:25] + "..."
lines.append(
f" {i:<4} {keyword_display:<30} {rec['priority']:>9.4f} "
f"{rec['deficit']:>8} {rec['draft_count']:>6} "
f"{rec['target']:>7} {corr_str:>7}"
)
lines.append("")
else:
lines.append(" No recommendations — all LSI targets met or no deficits found.")
lines.append("")
# --- Section Coverage ---
sections = analysis["section_coverage"]
if sections:
lines.append("-" * 60)
lines.append(" PER-SECTION LSI COVERAGE")
lines.append("-" * 60)
lines.append("")
for sec in sections:
indent = " " * (sec["level"] + 1)
heading = sec["heading"]
kw_count = sec["total_keywords_found"]
lines.append(f"{indent}{heading} ({kw_count} LSI keyword{'s' if kw_count != 1 else ''})")
if sec["keyword_details"]:
for detail in sec["keyword_details"][:10]:
lines.append(f"{indent} - \"{detail['keyword']}\" x{detail['count']}")
remaining = len(sec["keyword_details"]) - 10
if remaining > 0:
lines.append(f"{indent} ... and {remaining} more")
lines.append("")
lines.append("=" * 60)
return "\n".join(lines)
# ----------------------------------------------------------------------
# CLI entry point
# ----------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Analyze a content draft against Cora LSI keyword targets.",
)
parser.add_argument(
"draft_path",
help="Path to the markdown content draft",
)
parser.add_argument(
"cora_xlsx_path",
help="Path to the Cora SEO XLSX report",
)
parser.add_argument(
"--format",
choices=["json", "text"],
default="text",
help="Output format (default: text)",
)
parser.add_argument(
"--min-correlation",
type=float,
default=0.2,
help="Minimum |correlation| to include in recommendations (default: 0.2)",
)
parser.add_argument(
"--top-n",
type=int,
default=50,
help="Limit recommendations to top N (default: 50, 0 = unlimited)",
)
args = parser.parse_args()
try:
optimizer = LSIOptimizer(args.cora_xlsx_path)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
try:
analysis = optimizer.analyze_draft(args.draft_path)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
# Apply CLI filters to recommendations
analysis["recommendations"] = optimizer.recommend_additions(
min_correlation=args.min_correlation,
top_n=args.top_n,
)
if args.format == "json":
print(json.dumps(analysis, indent=2, default=str))
else:
print(format_text_report(analysis))
if __name__ == "__main__":
main()
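
The matching and ranking logic described in `_keyword_pattern` and `recommend_additions` can be tried without a Cora report. The following is a minimal sketch under stated assumptions: `keyword_pattern` and `prioritize` are hypothetical standalone versions of the methods above, and the sample text and deficit rows are invented for illustration.

```python
import re


def keyword_pattern(keyword: str) -> re.Pattern:
    """Word-boundary regex that tolerates any whitespace between tokens."""
    tokens = keyword.strip().split()
    body = r"\s+".join(re.escape(t) for t in tokens)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def prioritize(deficits: list[dict], min_correlation: float = 0.0) -> list[dict]:
    """Rank deficits by abs(best_of_both) * deficit, highest first."""
    recs = []
    for d in deficits:
        # A missing (None) or zero correlation counts as 0.0.
        corr = abs(d["best_of_both"]) if d["best_of_both"] else 0.0
        if corr < min_correlation:
            continue
        recs.append({**d, "priority": round(corr * d["deficit"], 4)})
    recs.sort(key=lambda r: r["priority"], reverse=True)
    return recs


# Invented sample text: one capitalized match, one split across a line
# break, and one plural that the trailing \b should reject.
sample_text = "Guide bushing support matters. A guide\nbushing reduces deflection; guide bushings wear."
match_count = len(keyword_pattern("guide bushing").findall(sample_text))

# Invented deficit rows (not from a real report).
sample_deficits = [
    {"keyword": "sub-spindle", "deficit": 4, "best_of_both": -0.5},
    {"keyword": "live tooling", "deficit": 10, "best_of_both": 0.1},
    {"keyword": "bar stock", "deficit": 3, "best_of_both": None},
]
ranked_all = prioritize(sample_deficits)
ranked_filtered = prioritize(sample_deficits, min_correlation=0.2)

print(match_count)
print([r["keyword"] for r in ranked_all])
print([r["keyword"] for r in ranked_filtered])
```

Because the priority uses the absolute correlation, a strongly negative correlation ranks as high as a strongly positive one. Whether that is the desired behavior depends on how the Cora correlations are meant to be read.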


@ -1,402 +0,0 @@
"""
SEO Content Optimizer
Checks keyword density and content structure of a draft against Cora targets.
Usage:
uv run --with openpyxl python seo_optimizer.py <draft_path>
[--keyword <kw>] [--cora-xlsx <path>] [--format json|text]
Works standalone for basic checks, or with a Cora XLSX report for
keyword-specific targets via cora_parser.CoraReport.
"""
import argparse
import json
import re
import sys
from pathlib import Path
# Optional Cora integration — script works without it
try:
from cora_parser import CoraReport
except ImportError:
CoraReport = None
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _split_words(text: str) -> list[str]:
"""Extract words from text (runs of ASCII letters and apostrophes)."""
return re.findall(r"[a-zA-Z']+", text)
def _strip_markdown_headings(text: str) -> str:
"""Remove markdown heading markers from text for word counting."""
return re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
def _extract_headings(text: str) -> list[dict]:
"""Extract markdown-style headings with their levels."""
headings = []
for match in re.finditer(r"^(#{1,6})\s+(.+)$", text, re.MULTILINE):
level = len(match.group(1))
headings.append({"level": level, "text": match.group(2).strip()})
return headings
# ---------------------------------------------------------------------------
# SEOOptimizer
# ---------------------------------------------------------------------------
class SEOOptimizer:
"""Analyze a content draft for keyword density and structure."""
def __init__(self):
self._results = {}
# -- public entry point -------------------------------------------------
def analyze(
self,
draft_path: str,
primary_keyword: str | None = None,
cora_xlsx_path: str | None = None,
) -> dict:
"""Run checks on *draft_path* and return an analysis dict."""
path = Path(draft_path)
if not path.exists():
raise FileNotFoundError(f"Draft not found: {draft_path}")
text = path.read_text(encoding="utf-8")
# Optionally load Cora data
cora = None
if cora_xlsx_path:
if CoraReport is None:
print(
"Warning: cora_parser not available. "
"Install openpyxl and ensure cora_parser.py is importable.",
file=sys.stderr,
)
else:
cora = CoraReport(cora_xlsx_path)
# Determine keyword list
keywords = []
if primary_keyword:
keywords.append(primary_keyword)
if cora:
search_term = cora.get_search_term()
if search_term and search_term.lower() not in [k.lower() for k in keywords]:
keywords.insert(0, search_term)
for var in cora.get_keyword_variations():
v = var["variation"]
if v.lower() not in [k.lower() for k in keywords]:
keywords.append(v)
# If still no keywords but Cora gave a search term, use it
if not keywords and cora:
st = cora.get_search_term()
if st:
keywords.append(st)
# Word-count target from Cora
word_count_target = None
if cora:
for t in cora.get_basic_tunings():
if t["factor"] == "Word Count":
try:
word_count_target = int(float(t["goal"]))
except (ValueError, TypeError):
pass
break
# Build Cora keyword targets (page1_avg) for comparison
cora_keyword_targets = {}
if cora:
for var in cora.get_keyword_variations():
cora_keyword_targets[var["variation"].lower()] = {
"page1_avg": var.get("page1_avg", 0),
"page1_max": var.get("page1_max", 0),
}
# Run checks
self._results["content_length"] = self.check_content_length(text, target=word_count_target)
self._results["structure"] = self.check_structure(text)
self._results["keyword_density"] = self.check_keyword_density(
text, keywords=keywords or None, cora_targets=cora_keyword_targets,
)
return self._results
# -- individual checks --------------------------------------------------
def check_keyword_density(
self,
text: str,
keywords: list[str] | None = None,
cora_targets: dict | None = None,
) -> dict:
"""Return per-keyword density information.
Only reports variations that have page1_avg > 0 (competitors actually
use them) when Cora targets are available.
"""
clean_text = _strip_markdown_headings(text).lower()
words = _split_words(clean_text)
total_words = len(words)
if total_words == 0:
return {"total_words": 0, "keywords": []}
results: list[dict] = []
if keywords:
for kw in keywords:
kw_lower = kw.lower()
# Skip zero-avg variations — competitors don't use them
if cora_targets and kw_lower in cora_targets:
if cora_targets[kw_lower].get("page1_avg", 0) == 0:
continue
kw_words = kw_lower.split()
if len(kw_words) > 1:
pattern = re.compile(r"\b" + re.escape(kw_lower) + r"\b")
count = len(pattern.findall(clean_text))
else:
count = sum(1 for w in words if w == kw_lower)
density = (count / total_words) * 100 if total_words else 0
entry = {
"keyword": kw,
"count": count,
"density_pct": round(density, 2),
}
# Add Cora target if available
if cora_targets and kw_lower in cora_targets:
entry["target_avg"] = cora_targets[kw_lower]["page1_avg"]
entry["target_max"] = cora_targets[kw_lower]["page1_max"]
results.append(entry)
else:
# Fallback: top frequent words (>= 4 chars)
freq: dict[str, int] = {}
for w in words:
if len(w) >= 4:
freq[w] = freq.get(w, 0) + 1
top = sorted(freq.items(), key=lambda x: x[1], reverse=True)[:10]
for w, count in top:
density = (count / total_words) * 100
results.append({
"keyword": w,
"count": count,
"density_pct": round(density, 2),
})
return {"total_words": total_words, "keywords": results}
def check_structure(self, text: str) -> dict:
"""Analyze heading hierarchy, paragraph count, and list usage."""
headings = _extract_headings(text)
# Count headings per level
heading_counts = {f"h{i}": 0 for i in range(1, 7)}
for h in headings:
heading_counts[f"h{h['level']}"] += 1
# Check nesting issues
nesting_issues: list[str] = []
if heading_counts["h1"] > 1:
nesting_issues.append(f"Multiple H1 tags found ({heading_counts['h1']}); use exactly one.")
prev_level = 0
for h in headings:
if prev_level > 0 and h["level"] > prev_level + 1:
nesting_issues.append(
f"Heading skip: H{prev_level} -> H{h['level']} "
f"(at \"{h['text'][:40]}...\")"
if len(h["text"]) > 40 else
f"Heading skip: H{prev_level} -> H{h['level']} "
f"(at \"{h['text']}\")"
)
prev_level = h["level"]
# Paragraphs
paragraphs = []
for block in re.split(r"\n\s*\n", text):
block = block.strip()
if not block:
continue
if re.match(r"^#{1,6}\s+", block) and "\n" not in block:
continue
if all(re.match(r"^\s*[-*+]\s|^\s*\d+\.\s", line) for line in block.splitlines() if line.strip()):
continue
paragraphs.append(block)
paragraph_count = len(paragraphs)
# List usage
unordered_items = len(re.findall(r"^\s*[-*+]\s", text, re.MULTILINE))
ordered_items = len(re.findall(r"^\s*\d+\.\s", text, re.MULTILINE))
return {
"heading_counts": heading_counts,
"headings": [{"level": h["level"], "text": h["text"]} for h in headings],
"nesting_issues": nesting_issues,
"paragraph_count": paragraph_count,
"unordered_list_items": unordered_items,
"ordered_list_items": ordered_items,
}
def check_content_length(self, text: str, target: int | None = None) -> dict:
"""Compare word count against an optional target."""
clean = _strip_markdown_headings(text)
words = _split_words(clean)
word_count = len(words)
result: dict = {"word_count": word_count}
if target is not None:
result["target"] = target
result["difference"] = word_count - target
if word_count >= target:
result["status"] = "meets_target"
elif word_count >= target * 0.8:
result["status"] = "close"
else:
result["status"] = "below_target"
return result
# ---------------------------------------------------------------------------
# Text-mode formatting
# ---------------------------------------------------------------------------
def _format_text_report(results: dict) -> str:
"""Format analysis results as a human-readable text report."""
lines: list[str] = []
sep = "-" * 60
# 1. Content Stats
cl = results.get("content_length", {})
lines.append(sep)
lines.append(" CONTENT STATS")
lines.append(sep)
lines.append(f" Word count: {cl.get('word_count', 0)}")
if cl.get("target"):
lines.append(f" Target: {cl['target']} ({cl.get('status', '')})")
diff = cl.get("difference", 0)
sign = "+" if diff >= 0 else ""
lines.append(f" Difference: {sign}{diff}")
lines.append("")
# 2. Structure
st = results.get("structure", {})
lines.append(sep)
lines.append(" STRUCTURE")
lines.append(sep)
hc = st.get("heading_counts", {})
for lvl in range(1, 7):
count = hc.get(f"h{lvl}", 0)
if count > 0:
lines.append(f" H{lvl}: {count}")
issues = st.get("nesting_issues", [])
if issues:
lines.append(" Nesting issues:")
for issue in issues:
lines.append(f" - {issue}")
else:
lines.append(" Nesting: OK")
lines.append("")
# 3. Keyword Density (only variations with targets)
kd = results.get("keyword_density", {})
kw_list = kd.get("keywords", [])
lines.append(sep)
lines.append(" KEYWORD DENSITY")
lines.append(sep)
if kw_list:
lines.append(f" {'Variation':<30s} {'Count':>5s} {'Density':>7s} {'Avg':>5s} {'Max':>5s}")
lines.append(f" {'-'*30} {'-'*5} {'-'*7} {'-'*5} {'-'*5}")
for kw in kw_list:
avg_str = str(kw.get("target_avg", "")) if "target_avg" in kw else ""
max_str = str(kw.get("target_max", "")) if "target_max" in kw else ""
lines.append(
f" {kw['keyword']:<30s} "
f"{kw['count']:>5d} "
f"{kw['density_pct']:>6.2f}% "
f"{avg_str:>5s} "
f"{max_str:>5s}"
)
else:
lines.append(" No keywords specified.")
lines.append("")
lines.append(sep)
return "\n".join(lines)
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Check keyword density and structure of a content draft.",
epilog="Example: uv run --with openpyxl python seo_optimizer.py draft.md --cora-xlsx report.xlsx",
)
parser.add_argument(
"draft_path",
help="Path to the content draft (plain text or markdown)",
)
parser.add_argument(
"--keyword",
dest="keyword",
default=None,
help="Primary keyword to evaluate",
)
parser.add_argument(
"--cora-xlsx",
dest="cora_xlsx",
default=None,
help="Path to a Cora XLSX report for keyword-specific targets",
)
parser.add_argument(
"--format",
choices=["json", "text"],
default="text",
help="Output format (default: text)",
)
args = parser.parse_args()
optimizer = SEOOptimizer()
try:
results = optimizer.analyze(
draft_path=args.draft_path,
primary_keyword=args.keyword,
cora_xlsx_path=args.cora_xlsx,
)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error during analysis: {e}", file=sys.stderr)
sys.exit(1)
if args.format == "json":
print(json.dumps(results, indent=2, default=str))
else:
print(_format_text_report(results))
if __name__ == "__main__":
main()
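
Two of the checks above, heading-skip detection in `check_structure` and the status thresholds in `check_content_length`, are simple enough to exercise on their own. This is a minimal sketch, assuming the same heading regex and the same 80% "close" threshold as the original; the function names `heading_skips` and `length_status` and the sample draft are hypothetical.

```python
import re


def heading_skips(markdown: str) -> list[str]:
    """Report places where a heading jumps down more than one level."""
    levels = [
        len(m.group(1))
        for m in re.finditer(r"^(#{1,6})\s+(.+)$", markdown, re.MULTILINE)
    ]
    issues = []
    prev_level = 0
    for level in levels:
        # Only deeper jumps are flagged; returning to a shallower level is fine.
        if prev_level > 0 and level > prev_level + 1:
            issues.append(f"H{prev_level} -> H{level}")
        prev_level = level
    return issues


def length_status(word_count: int, target: int) -> str:
    """Classify a word count against a target using an 80% 'close' band."""
    if word_count >= target:
        return "meets_target"
    if word_count >= target * 0.8:
        return "close"
    return "below_target"


# Invented draft: the H2 -> H4 jump should be the only reported issue.
sample_draft = "# Title\n\n## Section\n\n#### Deep detail\n\n## Next\n"
skips = heading_skips(sample_draft)

print(skips)
print(length_status(850, 1000))
```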


@ -1,469 +0,0 @@
#!/usr/bin/env python3
"""
Test Block Generator: Programmatically Assemble Test Blocks from Templates
Takes LLM-generated sentence templates (with {N} slots for body text) and
pre-written headings, plus an LLM-curated entity list, and assembles a test
block. Tracks aggregate densities in real-time and stops when targets are met.
The LLM handles all intelligence: filtering entities for topical relevance,
writing headings, creating body templates. This script handles all math:
slot filling, density tracking, stop conditions.
Usage:
uv run --with openpyxl python test_block_generator.py <templates_path> <prep_json_path> <cora_xlsx_path>
--entities-file <path> [--output-dir ./working/] [--min-sentences 5]
"""
import argparse
import json
import re
import sys
from pathlib import Path
from cora_parser import CoraReport
# ---------------------------------------------------------------------------
# Term selection
# ---------------------------------------------------------------------------
def load_entity_names(entities_file: str) -> list[str]:
"""Load LLM-curated entity names from file (one per line)."""
path = Path(entities_file)
if not path.exists():
print(f"Error: entities file not found: {path}", file=sys.stderr)
sys.exit(1)
names = []
for line in path.read_text(encoding="utf-8").splitlines():
name = line.strip()
if name:
names.append(name)
return names
def build_term_queue(
filtered_entity_names: list[str],
variations: list[str],
) -> list[str]:
"""Build a flat priority-ordered term list.
Order: filtered entities (LLM-curated, in provided order) -> keyword variations.
"""
terms = []
seen = set()
# 1. Filtered entities from LLM (already curated for topical relevance)
for name in filtered_entity_names:
if name.lower() not in seen:
terms.append(name)
seen.add(name.lower())
# 2. Keyword variations
for v in variations:
if v.lower() not in seen:
terms.append(v)
seen.add(v.lower())
return terms
# ---------------------------------------------------------------------------
# Generator
# ---------------------------------------------------------------------------
class TestBlockGenerator:
"""Fills body templates with entity/variation terms, inserts pre-written
headings, and tracks aggregate densities."""
def __init__(self, cora_xlsx_path: str, prep_data: dict, filtered_entity_names: list[str]):
self.report = CoraReport(cora_xlsx_path)
self.prep = prep_data
self.entities = self.report.get_entities()
self.variations = self.report.get_variations_list()
# Compile regex patterns for counting (built once, used per sentence)
self.entity_patterns = {}
for e in self.entities:
name = e["name"]
self.entity_patterns[name] = re.compile(
r"\b" + re.escape(name) + r"\b", re.IGNORECASE
)
self.variation_patterns = {}
for v in self.variations:
self.variation_patterns[v] = re.compile(
r"\b" + re.escape(v) + r"\b", re.IGNORECASE
)
# Build term queue from LLM-curated entity list
self.term_queue = build_term_queue(filtered_entity_names, self.variations)
self.term_idx = 0
# Track which 0->1 entities (absent from the current page, first mentioned
# by the new content) have been introduced. Use the full missing list from
# prep so introductions are counted accurately.
missing = prep_data.get("distinct_entities", {}).get("missing_entities", [])
self.missing_names = {e["name"] for e in missing}
self.introduced = set()
# Running totals for new content
self.new_words = 0
self.new_entity_mentions = 0
self.new_variation_mentions = 0
self.new_h2_count = 0
self.new_h3_count = 0
# Baseline from prep
self.base_words = prep_data["word_count"]["current"]
self.base_entity_mentions = prep_data["entity_density"]["current_mentions"]
self.base_variation_mentions = prep_data["variation_density"]["current_mentions"]
self.target_entity_d = prep_data["entity_density"]["target_decimal"]
self.target_variation_d = prep_data["variation_density"]["target_decimal"]
def pick_term(self, used_in_sentence: set) -> str:
"""Pick next term from the queue, skipping duplicates within a sentence."""
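if not self.term_queue:
# No curated entities or variations available: fall back to a generic
# placeholder term so slots are still filled.
return "equipment"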
used_lower = {u.lower() for u in used_in_sentence}
for _ in range(len(self.term_queue)):
term = self.term_queue[self.term_idx % len(self.term_queue)]
self.term_idx = (self.term_idx + 1) % len(self.term_queue)
if term.lower() not in used_lower:
return term
# All exhausted for this sentence, return next anyway
term = self.term_queue[self.term_idx % len(self.term_queue)]
self.term_idx = (self.term_idx + 1) % len(self.term_queue)
return term
def fill_template(self, template: str) -> str:
"""Fill a template's {N} slots with terms."""
slots = re.findall(r"\{(\d+)\}", template)
used = set()
filled = template
for slot_num in slots:
term = self.pick_term(used)
used.add(term)
filled = filled.replace(f"{{{slot_num}}}", term, 1)
return filled
def count_sentence(self, text: str) -> tuple[int, int, int]:
"""Count words, entity mentions, and variation mentions in text.
Also tracks which 0->1 entities have been introduced.
Returns: (word_count, entity_mentions, variation_mentions)
"""
entity_mentions = 0
for name, pattern in self.entity_patterns.items():
count = len(pattern.findall(text))
entity_mentions += count
if count > 0 and name in self.missing_names:
self.introduced.add(name)
variation_mentions = 0
for v, pattern in self.variation_patterns.items():
variation_mentions += len(pattern.findall(text))
words = len(re.findall(r"[a-zA-Z']+", text))
return words, entity_mentions, variation_mentions
def projected_density(self, metric: str) -> float:
"""Calculate projected density after current additions."""
total_words = self.base_words + self.new_words
if total_words == 0:
return 0.0
if metric == "entity":
return (self.base_entity_mentions + self.new_entity_mentions) / total_words
elif metric == "variation":
return (self.base_variation_mentions + self.new_variation_mentions) / total_words
return 0.0
def targets_met(self, min_reached: bool) -> bool:
"""Check if all density targets are met and minimums reached."""
if not min_reached:
return False
entity_ok = self.projected_density("entity") >= self.target_entity_d
variation_ok = self.projected_density("variation") >= self.target_variation_d
distinct_deficit = self.prep["distinct_entities"]["deficit"]
distinct_ok = len(self.introduced) >= distinct_deficit
wc_deficit = self.prep["word_count"]["deficit"]
wc_ok = self.new_words >= wc_deficit
return entity_ok and variation_ok and distinct_ok and wc_ok
def generate(
self,
templates: list[str],
min_sentences: int = 5,
) -> dict:
"""Generate the test block by filling body templates and inserting
pre-written headings.
Args:
templates: List of template strings. Lines starting with "H2:" or
"H3:" are pre-written headings (inserted as-is, no slot filling).
Everything else is a body template with {N} slots.
min_sentences: Minimum sentences before checking stop condition.
Returns:
Dict with "sentences" list and "stats" summary.
"""
h2_headings = []
h3_headings = []
body_templates = []
for t in templates:
t = t.strip()
if not t:
continue
if t.upper().startswith("H2:"):
h2_headings.append(t[3:].strip())
elif t.upper().startswith("H3:"):
h3_headings.append(t[3:].strip())
else:
body_templates.append(t)
if not body_templates:
return {"error": "No body templates found", "sentences": [], "stats": {}}
h2_needed = self.prep["headings"]["h2"]["deficit"]
h3_needed = self.prep["headings"]["h3"]["deficit"]
sentences = []
count = 0
body_idx = 0
h2_idx = 0
h3_idx = 0
max_iter = max(len(body_templates) * 3, 60)
for _ in range(max_iter):
# Insert pre-written heading if deficit exists and we're at a paragraph break
if h2_needed > 0 and h2_headings and count % 5 == 0:
text = h2_headings[h2_idx % len(h2_headings)]
w, e, v = self.count_sentence(text)
self.new_words += w
self.new_entity_mentions += e
self.new_variation_mentions += v
self.new_h2_count += 1
h2_needed -= 1
h2_idx += 1
sentences.append({"text": text, "type": "h2"})
count += 1
continue
if h3_needed > 0 and h3_headings and count > 0 and count % 3 == 0:
text = h3_headings[h3_idx % len(h3_headings)]
w, e, v = self.count_sentence(text)
self.new_words += w
self.new_entity_mentions += e
self.new_variation_mentions += v
self.new_h3_count += 1
h3_needed -= 1
h3_idx += 1
sentences.append({"text": text, "type": "h3"})
count += 1
continue
# Body sentence — fill template slots
tmpl = body_templates[body_idx % len(body_templates)]
filled = self.fill_template(tmpl)
w, e, v = self.count_sentence(filled)
self.new_words += w
self.new_entity_mentions += e
self.new_variation_mentions += v
body_idx += 1
sentences.append({"text": filled, "type": "body"})
count += 1
if self.targets_met(count >= min_sentences):
break
return {
"sentences": sentences,
"stats": {
"total_sentences": count,
"new_words": self.new_words,
"new_entity_mentions": self.new_entity_mentions,
"new_variation_mentions": self.new_variation_mentions,
"new_distinct_entities_introduced": len(self.introduced),
"introduced_entities": sorted(self.introduced),
"new_h2_count": self.new_h2_count,
"new_h3_count": self.new_h3_count,
"projected_entity_density_pct": round(
self.projected_density("entity") * 100, 2
),
"projected_variation_density_pct": round(
self.projected_density("variation") * 100, 2
),
"target_entity_density_pct": round(self.target_entity_d * 100, 2),
"target_variation_density_pct": round(self.target_variation_d * 100, 2),
},
}
# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------
def format_markdown(sentences: list[dict]) -> str:
"""Convert sentence list to markdown with test block markers."""
lines = ["<!-- HIDDEN TEST BLOCK START -->", ""]
paragraph = []
for s in sentences:
if s["type"] in ("h2", "h3"):
# Flush paragraph before heading
if paragraph:
lines.append(" ".join(paragraph))
lines.append("")
paragraph = []
prefix = "##" if s["type"] == "h2" else "###"
lines.append(f"{prefix} {s['text']}")
lines.append("")
else:
paragraph.append(s["text"])
if len(paragraph) >= 4:
lines.append(" ".join(paragraph))
lines.append("")
paragraph = []
if paragraph:
lines.append(" ".join(paragraph))
lines.append("")
lines.append("<!-- HIDDEN TEST BLOCK END -->")
return "\n".join(lines)
def format_html(sentences: list[dict]) -> str:
"""Convert sentence list to HTML with test block markers."""
lines = ["<!-- HIDDEN TEST BLOCK START -->", ""]
paragraph = []
for s in sentences:
if s["type"] in ("h2", "h3"):
if paragraph:
lines.append("<p>" + " ".join(paragraph) + "</p>")
lines.append("")
paragraph = []
tag = "h2" if s["type"] == "h2" else "h3"
lines.append(f"<{tag}>{s['text']}</{tag}>")
lines.append("")
else:
paragraph.append(s["text"])
if len(paragraph) >= 4:
lines.append("<p>" + " ".join(paragraph) + "</p>")
lines.append("")
paragraph = []
if paragraph:
lines.append("<p>" + " ".join(paragraph) + "</p>")
lines.append("")
lines.append("<!-- HIDDEN TEST BLOCK END -->")
return "\n".join(lines)
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Generate a test block from templates and deficit data.",
)
parser.add_argument("templates_path", help="Path to templates file (one per line)")
parser.add_argument("prep_json_path", help="Path to prep JSON from test_block_prep.py")
parser.add_argument("cora_xlsx_path", help="Path to Cora XLSX report")
parser.add_argument(
"--entities-file", required=True,
help="Path to LLM-curated entity list (one name per line)",
)
parser.add_argument(
"--output-dir", default="./working",
help="Directory for output files (default: ./working)",
)
parser.add_argument(
"--min-sentences", type=int, default=5,
help="Minimum sentences before checking stop condition (default: 5)",
)
args = parser.parse_args()
# Load inputs
templates_path = Path(args.templates_path)
if not templates_path.exists():
print(f"Error: templates file not found: {templates_path}", file=sys.stderr)
sys.exit(1)
templates = [
line.strip()
for line in templates_path.read_text(encoding="utf-8").splitlines()
if line.strip()
]
prep_path = Path(args.prep_json_path)
if not prep_path.exists():
print(f"Error: prep JSON not found: {prep_path}", file=sys.stderr)
sys.exit(1)
prep_data = json.loads(prep_path.read_text(encoding="utf-8"))
# Load LLM-curated entity list
filtered_entity_names = load_entity_names(args.entities_file)
# Generate
gen = TestBlockGenerator(args.cora_xlsx_path, prep_data, filtered_entity_names)
result = gen.generate(templates, min_sentences=args.min_sentences)
if result.get("error"):
print(f"Error: {result['error']}", file=sys.stderr)
sys.exit(1)
# Write outputs
out_dir = Path(args.output_dir)
out_dir.mkdir(parents=True, exist_ok=True)
md_path = out_dir / "test_block.md"
html_path = out_dir / "test_block.html"
txt_path = out_dir / "test_block.txt"
stats_path = out_dir / "test_block_stats.json"
md_content = format_markdown(result["sentences"])
html_content = format_html(result["sentences"])
md_path.write_text(md_content, encoding="utf-8")
html_path.write_text(html_content, encoding="utf-8")
txt_path.write_text(html_content, encoding="utf-8")
stats_path.write_text(
json.dumps(result["stats"], indent=2, default=str), encoding="utf-8"
)
# Print summary
stats = result["stats"]
print("Test block generated:")
print(f" Sentences: {stats['total_sentences']}")
print(f" Words: {stats['new_words']}")
print(f" Entity mentions: {stats['new_entity_mentions']}")
print(f" Variation mentions: {stats['new_variation_mentions']}")
print(f" New 0->1 entities: {stats['new_distinct_entities_introduced']}")
print(f" Projected entity density: {stats['projected_entity_density_pct']}%"
f" (target: {stats['target_entity_density_pct']}%)")
print(f" Projected variation density: {stats['projected_variation_density_pct']}%"
f" (target: {stats['target_variation_density_pct']}%)")
print("\nFiles written:")
print(f" {md_path}")
print(f" {html_path}")
print(f" {txt_path}")
print(f" {stats_path}")
if __name__ == "__main__":
main()
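
The stop condition in `targets_met` depends on a density projected over the existing words plus the words added so far. A minimal sketch of that arithmetic follows, assuming fixed per-sentence word and mention counts; `sentences_until_target` and its parameters are hypothetical names for illustration and are not part of the script above.

```python
def projected_density(base_mentions: int, new_mentions: int,
                      base_words: int, new_words: int) -> float:
    """Mentions per word over the existing content plus the new block."""
    total_words = base_words + new_words
    if total_words == 0:
        return 0.0
    return (base_mentions + new_mentions) / total_words


def sentences_until_target(base_words: int, base_mentions: int, target: float,
                           words_per_sentence: int = 15,
                           mentions_per_sentence: int = 3,
                           max_sentences: int = 60):
    """Hypothetical helper: add identical sentences until the target is met.

    Returns the number of sentences added, or None if the cap is reached first.
    """
    new_words = 0
    new_mentions = 0
    for n in range(1, max_sentences + 1):
        new_words += words_per_sentence
        new_mentions += mentions_per_sentence
        density = projected_density(base_mentions, new_mentions,
                                    base_words, new_words)
        if density >= target:
            return n
    return None
```

With an assumed page of 1000 words and 20 mentions and a 3% target, the loop stops after four sentences, since 32 / 1060 is roughly 3.02%. A target above the per-sentence density (3 / 15 here) can never be reached, which is why the real loop is capped by `max_iter`.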

@ -1,578 +0,0 @@
#!/usr/bin/env python3
"""
Test Block Prep: Extract Deficit Data for Test Block Generation
Reads existing content (from competitor_scraper.py output or plain text) and a
Cora XLSX report, then calculates all deficit metrics needed to programmatically
generate a test block.
Outputs structured JSON with:
- Word count vs target + deficit
- Distinct entity count vs target + deficit + list of missing entities
- Variation density vs target + deficit (Cora row 46)
- Entity density vs target + deficit (Cora row 47)
- LSI density vs target + deficit (Cora row 48)
- Heading structure deficits
- Template generation instructions (slots per sentence, sentence count, etc.)
Usage:
uv run --with openpyxl python test_block_prep.py <content_path> <cora_xlsx_path>
[--format json|text]
"""
import argparse
import json
import math
import re
import sys
from pathlib import Path
from cora_parser import CoraReport
# ---------------------------------------------------------------------------
# Content parsing
# ---------------------------------------------------------------------------
def parse_scraper_content(file_path: str) -> dict:
"""Parse a competitor_scraper.py output file or plain text/markdown.
Returns dict with: headings, content, word_count, title, meta_description.
"""
text = Path(file_path).read_text(encoding="utf-8")
result = {
"headings": [],
"content": "",
"word_count": 0,
"title": "",
"meta_description": "",
}
if "--- HEADINGS ---" in text and "--- CONTENT ---" in text:
headings_start = text.index("--- HEADINGS ---")
content_start = text.index("--- CONTENT ---")
# Parse metadata
metadata = text[:headings_start]
for line in metadata.splitlines():
if line.startswith("Title: "):
result["title"] = line[7:].strip()
elif line.startswith("Meta Description: "):
result["meta_description"] = line[18:].strip()
# Parse headings
headings_text = text[headings_start + len("--- HEADINGS ---"):content_start].strip()
for line in headings_text.splitlines():
line = line.strip()
match = re.match(r"H(\d):\s+(.+)", line)
if match:
result["headings"].append({
"level": int(match.group(1)),
"text": match.group(2).strip(),
})
# Parse content
result["content"] = text[content_start + len("--- CONTENT ---"):].strip()
else:
# Plain text/markdown
result["content"] = text.strip()
for match in re.finditer(r"^(#{1,6})\s+(.+)$", text, re.MULTILINE):
result["headings"].append({
"level": len(match.group(1)),
"text": match.group(2).strip(),
})
words = re.findall(r"[a-zA-Z']+", result["content"])
result["word_count"] = len(words)
return result
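
The plain text fallback above finds headings with a multiline regular expression. A small sketch of that step in isolation, assuming a hypothetical helper name `markdown_headings`:

```python
import re


def markdown_headings(text: str) -> list[dict]:
    """Collect ATX-style markdown headings as {"level", "text"} dicts."""
    return [
        {"level": len(m.group(1)), "text": m.group(2).strip()}
        for m in re.finditer(r"^(#{1,6})\s+(.+)$", text, re.MULTILINE)
    ]


sample = "# Title\n\nBody text\n## Section A\n### Sub\n#NoSpace\n"
headings = markdown_headings(sample)
```

Two properties of this pattern are worth keeping in mind. A `#` line without a following space is skipped, and because `\s+` can also match a newline, a bare `##` line picks up the next line as its heading text. Lines starting with `# ` inside code samples would also be counted as level 1 headings.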
# ---------------------------------------------------------------------------
# Counting functions
# ---------------------------------------------------------------------------
def count_entity_mentions(text: str, entities: list[dict]) -> dict:
"""Count mentions of each Cora entity in text.
Returns: per_entity dict, total_mentions, distinct_count.
"""
per_entity = {}
total_mentions = 0
distinct_count = 0
for entity in entities:
name = entity["name"]
pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
count = len(pattern.findall(text))
per_entity[name] = count
total_mentions += count
if count > 0:
distinct_count += 1
return {
"per_entity": per_entity,
"total_mentions": total_mentions,
"distinct_count": distinct_count,
}
def count_variation_mentions(text: str, variations: list[str]) -> dict:
"""Count mentions of each keyword variation in text.
Returns: per_variation dict, total_mentions.
"""
per_variation = {}
total_mentions = 0
for var in variations:
pattern = re.compile(r"\b" + re.escape(var) + r"\b", re.IGNORECASE)
count = len(pattern.findall(text))
per_variation[var] = count
total_mentions += count
return {
"per_variation": per_variation,
"total_mentions": total_mentions,
}
def count_lsi_mentions(text: str, lsi_keywords: list[dict]) -> dict:
"""Count mentions of each LSI keyword in text.
Returns: per_keyword dict, total_mentions, distinct_count.
"""
per_keyword = {}
total_mentions = 0
distinct_count = 0
for kw_data in lsi_keywords:
keyword = kw_data["keyword"]
tokens = keyword.strip().split()
escaped = [re.escape(t) for t in tokens]
pattern_str = r"\b" + r"\s+".join(escaped) + r"\b"
pattern = re.compile(pattern_str, re.IGNORECASE)
count = len(pattern.findall(text))
per_keyword[keyword] = count
total_mentions += count
if count > 0:
distinct_count += 1
return {
"per_keyword": per_keyword,
"total_mentions": total_mentions,
"distinct_count": distinct_count,
}
def count_terms_in_headings(
headings: list[dict],
entities: list[dict],
variations: list[str],
) -> dict:
"""Count entity and variation mentions in heading text.
Returns total counts and per-level breakdown.
"""
all_heading_text = " ".join(h["text"] for h in headings)
entity_mentions = 0
for entity in entities:
pattern = re.compile(r"\b" + re.escape(entity["name"]) + r"\b", re.IGNORECASE)
entity_mentions += len(pattern.findall(all_heading_text))
variation_mentions = 0
for var in variations:
pattern = re.compile(r"\b" + re.escape(var) + r"\b", re.IGNORECASE)
variation_mentions += len(pattern.findall(all_heading_text))
per_level = {}
for level in [2, 3]:
level_headings = [h for h in headings if h["level"] == level]
level_text = " ".join(h["text"] for h in level_headings)
lev_entity = 0
for entity in entities:
pattern = re.compile(r"\b" + re.escape(entity["name"]) + r"\b", re.IGNORECASE)
lev_entity += len(pattern.findall(level_text))
lev_var = 0
for var in variations:
pattern = re.compile(r"\b" + re.escape(var) + r"\b", re.IGNORECASE)
lev_var += len(pattern.findall(level_text))
per_level[f"h{level}"] = {
"count": len(level_headings),
"entity_mentions": lev_entity,
"variation_mentions": lev_var,
}
return {
"entity_mentions_total": entity_mentions,
"variation_mentions_total": variation_mentions,
"per_level": per_level,
}
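
The counting functions above all rely on the same idea: escape the term, wrap it in word boundaries, and match case-insensitively. A minimal sketch of that matching, assuming a hypothetical helper `count_phrase` (not part of the original scripts) that also tolerates variable whitespace between tokens, as `count_lsi_mentions` does:

```python
import re


def count_phrase(text: str, phrase: str) -> int:
    """Count whole-word, case-insensitive occurrences of a phrase."""
    tokens = phrase.strip().split()
    if not tokens:
        # An empty phrase would otherwise match at every word boundary.
        return 0
    pattern = re.compile(
        r"\b" + r"\s+".join(re.escape(t) for t in tokens) + r"\b",
        re.IGNORECASE,
    )
    return len(pattern.findall(text))


sample = "Swiss lathe and swiss  lathe, not Swisslathe or lathes."
```

Note that `\b` only anchors next to a word character. A term that ends in punctuation, such as `C++`, is not matched when it is followed by a space, so such terms may be undercounted by this approach.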
# ---------------------------------------------------------------------------
# Template instruction calculation
# ---------------------------------------------------------------------------
def calculate_template_instructions(
current_words: int,
current_entity_mentions: int,
current_variation_mentions: int,
target_entity_density: float,
target_variation_density: float,
distinct_entity_deficit: int,
word_count_deficit: int,
) -> dict:
"""Calculate template parameters for the generator script.
Figures out how many words the test block needs, how many slots per
sentence, and how many sentences so the LLM knows what to generate.
"""
AVG_WORDS_PER_SENTENCE = 15
MAX_SLOTS = 5
MIN_SLOTS = 2
current_entity_density = current_entity_mentions / current_words if current_words > 0 else 0
current_variation_density = current_variation_mentions / current_words if current_words > 0 else 0
# Minimum test block size from word count deficit
min_words = max(word_count_deficit, 150)
# Calculate minimum words needed to close entity density gap
entity_deficit_pct = target_entity_density - current_entity_density
if entity_deficit_pct > 0:
# At max internal density (MAX_SLOTS / AVG_WORDS), how many words?
max_internal = MAX_SLOTS / AVG_WORDS_PER_SENTENCE
if max_internal > target_entity_density:
needed = (target_entity_density * current_words - current_entity_mentions)
words_for_entity = math.ceil(needed / (max_internal - target_entity_density))
min_words = max(min_words, words_for_entity)
# Same for variation density gap
var_deficit_pct = target_variation_density - current_variation_density
if var_deficit_pct > 0:
max_internal = MAX_SLOTS / AVG_WORDS_PER_SENTENCE
if max_internal > target_variation_density:
needed = (target_variation_density * current_words - current_variation_mentions)
words_for_var = math.ceil(needed / (max_internal - target_variation_density))
min_words = max(min_words, words_for_var)
# If only distinct entities are deficit (densities met), size the block for them
# without dropping below the word count deficit already held in min_words
if entity_deficit_pct <= 0 and var_deficit_pct <= 0 and distinct_entity_deficit > 0:
min_words = max(min_words, distinct_entity_deficit * AVG_WORDS_PER_SENTENCE)
# Round up to nearest 50
target_words = math.ceil(max(min_words, 150) / 50) * 50
# Required entity mentions in test block
if target_entity_density > 0:
total_needed = math.ceil(target_entity_density * (current_words + target_words))
entity_mentions_needed = max(0, total_needed - current_entity_mentions)
else:
entity_mentions_needed = max(distinct_entity_deficit, 0)
# Required variation mentions in test block
if target_variation_density > 0:
total_needed = math.ceil(target_variation_density * (current_words + target_words))
variation_mentions_needed = max(0, total_needed - current_variation_mentions)
else:
variation_mentions_needed = 0
# Derive slots per sentence
target_sentences = max(1, math.ceil(target_words / AVG_WORDS_PER_SENTENCE))
total_slots = entity_mentions_needed + variation_mentions_needed
# Overlapping terms can count toward both totals, so the sum is an upper bound.
# This max() is only a floor: the sum is already >= entity_mentions_needed.
total_slots = max(total_slots, entity_mentions_needed)
slots_per_sentence = math.ceil(total_slots / target_sentences) if target_sentences > 0 else MIN_SLOTS
slots_per_sentence = max(MIN_SLOTS, min(MAX_SLOTS, slots_per_sentence))
# Number of templates: derived from two factors
# 1. Word deficit: how many sentences to fill the word gap
word_driven = math.ceil(target_words / AVG_WORDS_PER_SENTENCE)
# 2. Entity deficit: how many sentences to introduce all missing entities
entity_driven = math.ceil(distinct_entity_deficit / slots_per_sentence) if slots_per_sentence > 0 else 0
num_templates = max(word_driven, entity_driven, 5)
return {
"target_word_count": target_words,
"num_templates": num_templates,
"num_templates_reason": "word_deficit" if word_driven >= entity_driven else "entity_deficit",
"slots_per_sentence": slots_per_sentence,
"avg_words_per_template": AVG_WORDS_PER_SENTENCE,
"entity_mentions_needed": entity_mentions_needed,
"variation_mentions_needed": variation_mentions_needed,
"rationale": (
f"Need ~{entity_mentions_needed} entity mentions and "
f"~{variation_mentions_needed} variation mentions "
f"across ~{target_words} words. "
f"Templates: {num_templates} (driven by {'word deficit' if word_driven >= entity_driven else 'entity deficit'}), "
f"{slots_per_sentence} slots each."
),
}
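
The word estimate in `calculate_template_instructions` follows from one inequality. If the page has W words and m mentions, and the block adds w words at internal density d, the target t is met when (m + d*w) / (W + w) >= t, which rearranges to w >= (t*W - m) / (d - t) whenever d > t. A sketch of that calculation, with hypothetical helper names `words_to_close_gap` and `round_up_block`:

```python
import math


def words_to_close_gap(current_words: int, current_mentions: int,
                       target_density: float, block_density: float):
    """Words needed at block_density to lift the page to target_density.

    Returns None when the block is no denser than the target, because
    adding such text can never close the gap.
    """
    if block_density <= target_density:
        return None
    needed = target_density * current_words - current_mentions
    if needed <= 0:
        return 0  # the target is already met
    return math.ceil(needed / (block_density - target_density))


def round_up_block(words: int, floor: int = 150, step: int = 50) -> int:
    """Apply a minimum size, then round up to the next multiple of step."""
    return math.ceil(max(words, floor) / step) * step
```

For an assumed page of 1000 words with 20 mentions, a 3% target, and a block density of 5 slots per 15 words, the gap closes at 33 words; the 150-word floor then dominates, so the rounded block size is 150.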
# ---------------------------------------------------------------------------
# Main prep function
# ---------------------------------------------------------------------------
def run_prep(content_path: str, cora_xlsx_path: str) -> dict:
"""Run the full test block prep analysis."""
report = CoraReport(cora_xlsx_path)
entities = report.get_entities()
lsi_keywords = report.get_lsi_keywords()
variations_list = report.get_variations_list()
density_targets = report.get_density_targets()
content_targets = report.get_content_targets()
structure_targets = report.get_structure_targets()
word_count_dist = report.get_word_count_distribution()
# Parse existing content
parsed = parse_scraper_content(content_path)
content_text = parsed["content"]
current_words = parsed["word_count"]
headings = parsed["headings"]
# --- Word count ---
cluster_target = word_count_dist.get("cluster_target", 0)
wc_target = cluster_target if cluster_target else word_count_dist.get("average", 0)
wc_deficit = max(0, wc_target - current_words)
# --- Entity counts ---
entity_data = count_entity_mentions(content_text, entities)
distinct_target = content_targets.get("distinct_entities", {}).get("target", 0)
distinct_deficit = max(0, distinct_target - entity_data["distinct_count"])
# Missing entities (0 count, sorted by relevance)
missing_entities = []
for entity in entities:
if entity_data["per_entity"].get(entity["name"], 0) == 0:
missing_entities.append({
"name": entity["name"],
"relevance": entity.get("relevance") or 0,
"type": entity.get("type", ""),
})
missing_entities.sort(key=lambda e: e["relevance"], reverse=True)
# --- Variation counts ---
variation_data = count_variation_mentions(content_text, variations_list)
# --- LSI counts ---
lsi_data = count_lsi_mentions(content_text, lsi_keywords)
# --- Density calculations ---
cur_entity_d = entity_data["total_mentions"] / current_words if current_words else 0
cur_var_d = variation_data["total_mentions"] / current_words if current_words else 0
cur_lsi_d = lsi_data["total_mentions"] / current_words if current_words else 0
tgt_entity_d = density_targets.get("entity_density", {}).get("avg") or 0
tgt_var_d = density_targets.get("variation_density", {}).get("avg") or 0
tgt_lsi_d = density_targets.get("lsi_density", {}).get("avg") or 0
# --- Heading analysis ---
heading_data = count_terms_in_headings(headings, entities, variations_list)
h2_target = structure_targets.get("h2", {}).get("count", {}).get("target", 0)
h3_target = structure_targets.get("h3", {}).get("count", {}).get("target", 0)
h2_current = heading_data["per_level"].get("h2", {}).get("count", 0)
h3_current = heading_data["per_level"].get("h3", {}).get("count", 0)
all_h_var_target = structure_targets.get("all_h_tags", {}).get("variations", {}).get("target", 0)
all_h_ent_target = structure_targets.get("all_h_tags", {}).get("entities", {}).get("target", 0)
# --- Template instructions ---
template_inst = calculate_template_instructions(
current_words=current_words,
current_entity_mentions=entity_data["total_mentions"],
current_variation_mentions=variation_data["total_mentions"],
target_entity_density=tgt_entity_d,
target_variation_density=tgt_var_d,
distinct_entity_deficit=distinct_deficit,
word_count_deficit=wc_deficit,
)
return {
"search_term": report.get_search_term(),
"content_file": content_path,
"word_count": {
"current": current_words,
"target": wc_target,
"deficit": wc_deficit,
"status": "meets_target" if wc_deficit == 0 else "below_target",
},
"distinct_entities": {
"current": entity_data["distinct_count"],
"target": distinct_target,
"deficit": distinct_deficit,
"total_tracked": len(entities),
"missing_entities": missing_entities,
},
"entity_density": {
"current_pct": round(cur_entity_d * 100, 2),
"target_pct": round(tgt_entity_d * 100, 2),
"deficit_pct": round(max(0, tgt_entity_d - cur_entity_d) * 100, 2),
"current_mentions": entity_data["total_mentions"],
"target_decimal": tgt_entity_d,
"current_decimal": cur_entity_d,
"status": "meets_target" if cur_entity_d >= tgt_entity_d else "below_target",
},
"variation_density": {
"current_pct": round(cur_var_d * 100, 2),
"target_pct": round(tgt_var_d * 100, 2),
"deficit_pct": round(max(0, tgt_var_d - cur_var_d) * 100, 2),
"current_mentions": variation_data["total_mentions"],
"target_decimal": tgt_var_d,
"current_decimal": cur_var_d,
"status": "meets_target" if cur_var_d >= tgt_var_d else "below_target",
},
"lsi_density": {
"current_pct": round(cur_lsi_d * 100, 2),
"target_pct": round(tgt_lsi_d * 100, 2),
"deficit_pct": round(max(0, tgt_lsi_d - cur_lsi_d) * 100, 2),
"current_mentions": lsi_data["total_mentions"],
"target_decimal": tgt_lsi_d,
"current_decimal": cur_lsi_d,
"status": "meets_target" if cur_lsi_d >= tgt_lsi_d else "below_target",
},
"headings": {
"h2": {
"current": h2_current,
"target": h2_target,
"deficit": max(0, h2_target - h2_current),
},
"h3": {
"current": h3_current,
"target": h3_target,
"deficit": max(0, h3_target - h3_current),
},
"variations_in_headings": {
"current": heading_data["variation_mentions_total"],
"target": all_h_var_target,
"deficit": max(0, all_h_var_target - heading_data["variation_mentions_total"]),
},
"entities_in_headings": {
"current": heading_data["entity_mentions_total"],
"target": all_h_ent_target,
"deficit": max(0, all_h_ent_target - heading_data["entity_mentions_total"]),
},
},
"template_instructions": template_inst,
}
# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------
def format_text_report(data: dict) -> str:
"""Format prep data as a human-readable text report."""
lines = []
sep = "=" * 65
lines.append(sep)
lines.append(f" TEST BLOCK PREP — {data['search_term']}")
lines.append(sep)
lines.append("")
# Word count
wc = data["word_count"]
lines.append("WORD COUNT")
lines.append(f" Current: {wc['current']} | Target: {wc['target']} | Deficit: {wc['deficit']} [{wc['status']}]")
lines.append("")
# Distinct entities
de = data["distinct_entities"]
lines.append("DISTINCT ENTITIES")
lines.append(f" Current: {de['current']} | Target: {de['target']} | Deficit: {de['deficit']} (of {de['total_tracked']} tracked)")
if de["missing_entities"]:
lines.append(f" Top missing (0->1):")
for ent in de["missing_entities"][:15]:
lines.append(f" - {ent['name']} (relevance: {ent['relevance']}, type: {ent['type']})")
remaining = len(de["missing_entities"]) - 15
if remaining > 0:
lines.append(f" ... and {remaining} more")
lines.append("")
# Entity density
ed = data["entity_density"]
lines.append("ENTITY DENSITY (Cora row 47)")
lines.append(f" Current: {ed['current_pct']}% | Target: {ed['target_pct']}% | Deficit: {ed['deficit_pct']}% [{ed['status']}]")
lines.append(f" Current mentions: {ed['current_mentions']}")
lines.append("")
# Variation density
vd = data["variation_density"]
lines.append("VARIATION DENSITY (Cora row 46)")
lines.append(f" Current: {vd['current_pct']}% | Target: {vd['target_pct']}% | Deficit: {vd['deficit_pct']}% [{vd['status']}]")
lines.append(f" Current mentions: {vd['current_mentions']}")
lines.append("")
# LSI density
ld = data["lsi_density"]
lines.append("LSI DENSITY (Cora row 48)")
lines.append(f" Current: {ld['current_pct']}% | Target: {ld['target_pct']}% | Deficit: {ld['deficit_pct']}% [{ld['status']}]")
lines.append(f" Current mentions: {ld['current_mentions']}")
lines.append("")
# Headings
hd = data["headings"]
lines.append("HEADING DEFICITS")
lines.append(f" H2: {hd['h2']['current']} current / {hd['h2']['target']} target -- deficit {hd['h2']['deficit']}")
lines.append(f" H3: {hd['h3']['current']} current / {hd['h3']['target']} target -- deficit {hd['h3']['deficit']}")
lines.append(f" Variations in headings: {hd['variations_in_headings']['current']} / {hd['variations_in_headings']['target']} -- deficit {hd['variations_in_headings']['deficit']}")
lines.append(f" Entities in headings: {hd['entities_in_headings']['current']} / {hd['entities_in_headings']['target']} -- deficit {hd['entities_in_headings']['deficit']}")
lines.append("")
# Template instructions
ti = data["template_instructions"]
lines.append("TEMPLATE INSTRUCTIONS")
lines.append(f" {ti['rationale']}")
lines.append(f" >> Generate {ti['num_templates']} templates, ~{ti['avg_words_per_template']} words each, {ti['slots_per_sentence']} slots per template")
lines.append("")
lines.append(sep)
return "\n".join(lines)
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Extract deficit data for test block generation.",
)
parser.add_argument("content_path", help="Path to scraper output or content file")
parser.add_argument("cora_xlsx_path", help="Path to Cora XLSX report")
parser.add_argument(
"--format", choices=["json", "text"], default="text",
help="Output format (default: text)",
)
parser.add_argument(
"--output", "-o", default=None,
help="Write output to file instead of stdout",
)
args = parser.parse_args()
try:
data = run_prep(args.content_path, args.cora_xlsx_path)
except FileNotFoundError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
if args.format == "json":
output = json.dumps(data, indent=2, default=str)
else:
output = format_text_report(data)
if args.output:
Path(args.output).write_text(output, encoding="utf-8")
print(f"Written to {args.output}", file=sys.stderr)
else:
print(output)
if __name__ == "__main__":
main()

@ -1,378 +0,0 @@
#!/usr/bin/env python3
"""
Test Block Validator: Before/After Comparison
Runs the same deficit analysis from test_block_prep.py on:
1. Existing content alone (before)
2. Existing content + test block (after)
Produces a deterministic comparison showing exactly how each metric changed.
Usage:
uv run --with openpyxl python test_block_validate.py <content_path> <test_block_path> <cora_xlsx_path>
[--format json|text] [--output PATH]
"""
import argparse
import json
import re
import sys
from pathlib import Path
from cora_parser import CoraReport
from test_block_prep import (
parse_scraper_content,
count_entity_mentions,
count_variation_mentions,
count_lsi_mentions,
count_terms_in_headings,
)
def extract_test_block_text(file_path: str) -> str:
"""Read test block file and return the text content.
Strips HTML tags and test block markers. Returns plain text for counting.
"""
text = Path(file_path).read_text(encoding="utf-8")
# Remove test block markers
text = text.replace("<!-- HIDDEN TEST BLOCK START -->", "")
text = text.replace("<!-- HIDDEN TEST BLOCK END -->", "")
# Remove HTML tags
text = re.sub(r"<[^>]+>", " ", text)
# Remove markdown heading markers
text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
return text.strip()
def extract_test_block_headings(file_path: str) -> list[dict]:
"""Extract heading structure from test block (HTML or markdown)."""
text = Path(file_path).read_text(encoding="utf-8")
headings = []
# Try HTML headings first
for match in re.finditer(r"<h(\d)>(.+?)</h\d>", text, re.IGNORECASE):
headings.append({
"level": int(match.group(1)),
"text": match.group(2).strip(),
})
# If no HTML headings, try markdown
if not headings:
for match in re.finditer(r"^(#{1,6})\s+(.+)$", text, re.MULTILINE):
headings.append({
"level": len(match.group(1)),
"text": match.group(2).strip(),
})
return headings
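
The two extraction helpers above reduce either output format to countable plain text. A short sketch of the text path, assuming a hypothetical function name `strip_markup`, showing that the HTML and markdown forms of the same block yield the same words:

```python
import re

START = "<!-- HIDDEN TEST BLOCK START -->"
END = "<!-- HIDDEN TEST BLOCK END -->"


def strip_markup(text: str) -> str:
    """Remove block markers, HTML tags, and markdown heading markers."""
    text = text.replace(START, "").replace(END, "")
    text = re.sub(r"<[^>]+>", " ", text)  # each tag becomes a space
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
    return text.strip()


html_block = (
    START + "\n\n<h2>Guide Bushing</h2>\n\n"
    "<p>Swiss lathes cut close.</p>\n\n" + END
)
md_block = (
    START + "\n\n## Guide Bushing\n\n"
    "Swiss lathes cut close.\n\n" + END
)
```

Replacing tags with a space rather than an empty string keeps words in adjacent elements from fusing together, which matters for the word-boundary counts that follow.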
def run_validation(
content_path: str,
test_block_path: str,
cora_xlsx_path: str,
) -> dict:
"""Run before/after validation.
Returns dict with: before, after, delta, targets, status.
"""
report = CoraReport(cora_xlsx_path)
entities = report.get_entities()
lsi_keywords = report.get_lsi_keywords()
variations_list = report.get_variations_list()
density_targets = report.get_density_targets()
content_targets = report.get_content_targets()
structure_targets = report.get_structure_targets()
word_count_dist = report.get_word_count_distribution()
# --- Parse existing content ---
parsed = parse_scraper_content(content_path)
existing_text = parsed["content"]
existing_headings = parsed["headings"]
# --- Parse test block ---
block_text = extract_test_block_text(test_block_path)
block_headings = extract_test_block_headings(test_block_path)
# --- Combined ---
combined_text = existing_text + "\n\n" + block_text
combined_headings = existing_headings + block_headings
# --- Count words ---
count_words = lambda t: len(re.findall(r"[a-zA-Z']+", t))
before_words = count_words(existing_text)
block_words = count_words(block_text)
after_words = count_words(combined_text)
# --- Count entities ---
before_ent = count_entity_mentions(existing_text, entities)
after_ent = count_entity_mentions(combined_text, entities)
# --- Count variations ---
before_var = count_variation_mentions(existing_text, variations_list)
after_var = count_variation_mentions(combined_text, variations_list)
# --- Count LSI ---
before_lsi = count_lsi_mentions(existing_text, lsi_keywords)
after_lsi = count_lsi_mentions(combined_text, lsi_keywords)
# --- Heading analysis ---
before_hdg = count_terms_in_headings(existing_headings, entities, variations_list)
after_hdg = count_terms_in_headings(combined_headings, entities, variations_list)
# --- Targets ---
tgt_entity_d = density_targets.get("entity_density", {}).get("avg") or 0
tgt_var_d = density_targets.get("variation_density", {}).get("avg") or 0
tgt_lsi_d = density_targets.get("lsi_density", {}).get("avg") or 0
distinct_target = content_targets.get("distinct_entities", {}).get("target", 0)
cluster_target = word_count_dist.get("cluster_target", 0)
wc_target = cluster_target if cluster_target else word_count_dist.get("average", 0)
h2_target = structure_targets.get("h2", {}).get("count", {}).get("target", 0)
h3_target = structure_targets.get("h3", {}).get("count", {}).get("target", 0)
# --- Build comparison ---
def density(mentions, words):
return mentions / words if words > 0 else 0
def pct(d):
return round(d * 100, 2)
# Find new 0->1 entities
new_entities = []
for name, after_count in after_ent["per_entity"].items():
before_count = before_ent["per_entity"].get(name, 0)
if before_count == 0 and after_count > 0:
new_entities.append(name)
before_h2 = len([h for h in existing_headings if h["level"] == 2])
after_h2 = len([h for h in combined_headings if h["level"] == 2])
before_h3 = len([h for h in existing_headings if h["level"] == 3])
after_h3 = len([h for h in combined_headings if h["level"] == 3])
    return {
        "search_term": report.get_search_term(),
        "test_block_words": block_words,
        "word_count": {
            "before": before_words,
            "after": after_words,
            "target": wc_target,
            "before_status": "meets" if before_words >= wc_target else "below",
            "after_status": "meets" if after_words >= wc_target else "below",
        },
        "distinct_entities": {
            "before": before_ent["distinct_count"],
            "after": after_ent["distinct_count"],
            "target": distinct_target,
            "new_0_to_1": len(new_entities),
            "new_entity_names": sorted(new_entities),
            "before_status": "meets" if before_ent["distinct_count"] >= distinct_target else "below",
            "after_status": "meets" if after_ent["distinct_count"] >= distinct_target else "below",
        },
        "entity_density": {
            "before_pct": pct(density(before_ent["total_mentions"], before_words)),
            "after_pct": pct(density(after_ent["total_mentions"], after_words)),
            "target_pct": pct(tgt_entity_d),
            "before_mentions": before_ent["total_mentions"],
            "after_mentions": after_ent["total_mentions"],
            "delta_mentions": after_ent["total_mentions"] - before_ent["total_mentions"],
            "before_status": "meets" if density(before_ent["total_mentions"], before_words) >= tgt_entity_d else "below",
            "after_status": "meets" if density(after_ent["total_mentions"], after_words) >= tgt_entity_d else "below",
        },
        "variation_density": {
            "before_pct": pct(density(before_var["total_mentions"], before_words)),
            "after_pct": pct(density(after_var["total_mentions"], after_words)),
            "target_pct": pct(tgt_var_d),
            "before_mentions": before_var["total_mentions"],
            "after_mentions": after_var["total_mentions"],
            "delta_mentions": after_var["total_mentions"] - before_var["total_mentions"],
            "before_status": "meets" if density(before_var["total_mentions"], before_words) >= tgt_var_d else "below",
            "after_status": "meets" if density(after_var["total_mentions"], after_words) >= tgt_var_d else "below",
        },
        "lsi_density": {
            "before_pct": pct(density(before_lsi["total_mentions"], before_words)),
            "after_pct": pct(density(after_lsi["total_mentions"], after_words)),
            "target_pct": pct(tgt_lsi_d),
            "before_mentions": before_lsi["total_mentions"],
            "after_mentions": after_lsi["total_mentions"],
            "delta_mentions": after_lsi["total_mentions"] - before_lsi["total_mentions"],
            "before_status": "meets" if density(before_lsi["total_mentions"], before_words) >= tgt_lsi_d else "below",
            "after_status": "meets" if density(after_lsi["total_mentions"], after_words) >= tgt_lsi_d else "below",
        },
        "headings": {
            "h2": {
                "before": before_h2,
                "after": after_h2,
                "target": h2_target,
            },
            "h3": {
                "before": before_h3,
                "after": after_h3,
                "target": h3_target,
            },
            "entities_in_headings": {
                "before": before_hdg["entity_mentions_total"],
                "after": after_hdg["entity_mentions_total"],
            },
            "variations_in_headings": {
                "before": before_hdg["variation_mentions_total"],
                "after": after_hdg["variation_mentions_total"],
            },
        },
    }


# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------
def format_text_report(data: dict) -> str:
    """Format validation as a human-readable before/after comparison."""
    lines = []
    sep = "=" * 70
    lines.append(sep)
    lines.append(f" TEST BLOCK VALIDATION -- {data['search_term']}")
    lines.append(f" Test block added {data['test_block_words']} words")
    lines.append(sep)
    lines.append("")

    # Helper for status indicator
    def status(s):
        return "[OK]" if s == "meets" else "[!!]"

    # Word count
    wc = data["word_count"]
    lines.append(f" {'METRIC':<30} {'BEFORE':>10} {'AFTER':>10} {'TARGET':>10} {'STATUS':>8}")
    lines.append(f" {'-'*30} {'-'*10} {'-'*10} {'-'*10} {'-'*8}")
    lines.append(
        f" {'Word count':<30} {wc['before']:>10} {wc['after']:>10} "
        f"{wc['target']:>10} {status(wc['after_status']):>8}"
    )

    # Distinct entities
    de = data["distinct_entities"]
    lines.append(
        f" {'Distinct entities':<30} {de['before']:>10} {de['after']:>10} "
        f"{de['target']:>10} {status(de['after_status']):>8}"
    )

    # Entity density
    ed = data["entity_density"]
    lines.append(
        f" {'Entity density %':<30} {ed['before_pct']:>9}% {ed['after_pct']:>9}% "
        f"{ed['target_pct']:>9}% {status(ed['after_status']):>8}"
    )

    # Variation density
    vd = data["variation_density"]
    lines.append(
        f" {'Variation density %':<30} {vd['before_pct']:>9}% {vd['after_pct']:>9}% "
        f"{vd['target_pct']:>9}% {status(vd['after_status']):>8}"
    )

    # LSI density
    ld = data["lsi_density"]
    lines.append(
        f" {'LSI density %':<30} {ld['before_pct']:>9}% {ld['after_pct']:>9}% "
        f"{ld['target_pct']:>9}% {status(ld['after_status']):>8}"
    )
    lines.append("")

    # Mention counts. The signed format spec prints a negative delta as
    # "-3" rather than "+-3".
    lines.append(f" {'MENTION COUNTS':<30} {'BEFORE':>10} {'AFTER':>10} {'DELTA':>10}")
    lines.append(f" {'-'*30} {'-'*10} {'-'*10} {'-'*10}")
    lines.append(
        f" {'Entity mentions':<30} {ed['before_mentions']:>10} "
        f"{ed['after_mentions']:>10} {ed['delta_mentions']:>+10}"
    )
    lines.append(
        f" {'Variation mentions':<30} {vd['before_mentions']:>10} "
        f"{vd['after_mentions']:>10} {vd['delta_mentions']:>+10}"
    )
    lines.append(
        f" {'LSI mentions':<30} {ld['before_mentions']:>10} "
        f"{ld['after_mentions']:>10} {ld['delta_mentions']:>+10}"
    )
    lines.append("")

    # Headings
    hd = data["headings"]
    lines.append(f" {'HEADINGS':<30} {'BEFORE':>10} {'AFTER':>10} {'TARGET':>10}")
    lines.append(f" {'-'*30} {'-'*10} {'-'*10} {'-'*10}")
    lines.append(f" {'H2 count':<30} {hd['h2']['before']:>10} {hd['h2']['after']:>10} {hd['h2']['target']:>10}")
    lines.append(f" {'H3 count':<30} {hd['h3']['before']:>10} {hd['h3']['after']:>10} {hd['h3']['target']:>10}")
    lines.append(
        f" {'Entities in headings':<30} {hd['entities_in_headings']['before']:>10} "
        f"{hd['entities_in_headings']['after']:>10}"
    )
    lines.append(
        f" {'Variations in headings':<30} {hd['variations_in_headings']['before']:>10} "
        f"{hd['variations_in_headings']['after']:>10}"
    )
    lines.append("")

    # New entities
    de = data["distinct_entities"]
    if de["new_entity_names"]:
        lines.append(f" NEW ENTITIES INTRODUCED (0->1): {de['new_0_to_1']}")
        for name in de["new_entity_names"]:
            lines.append(f" + {name}")
        lines.append("")
    lines.append(sep)
    return "\n".join(lines)


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
    parser = argparse.ArgumentParser(
        description="Validate a test block with before/after comparison.",
    )
    parser.add_argument("content_path", help="Path to existing content (scraper output)")
    parser.add_argument("test_block_path", help="Path to test block (.md or .html)")
    parser.add_argument("cora_xlsx_path", help="Path to Cora XLSX report")
    parser.add_argument(
        "--format", choices=["json", "text"], default="text",
        help="Output format (default: text)",
    )
    parser.add_argument(
        "--output", "-o", default=None,
        help="Write output to file instead of stdout",
    )
    args = parser.parse_args()

    try:
        data = run_validation(args.content_path, args.test_block_path, args.cora_xlsx_path)
    except FileNotFoundError as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)

    if args.format == "json":
        output = json.dumps(data, indent=2, default=str)
    else:
        output = format_text_report(data)

    if args.output:
        Path(args.output).write_text(output, encoding="utf-8")
        print(f"Written to {args.output}", file=sys.stderr)
    else:
        # Handle Windows encoding
        try:
            print(output)
        except UnicodeEncodeError:
            sys.stdout.buffer.write(output.encode("utf-8"))


if __name__ == "__main__":
    main()

View File

@ -1,583 +0,0 @@
---
name: content-researcher
description: Research, outline, draft, and optimize SEO web content (service pages, blog posts, product pages) against Cora SEO reports. Create new content. Entity, LSI, and keyword density optimization. Generate entity test blocks (hidden divs).
---
# Content Research & Creation Skill
Write and optimize SEO web content — service pages, blog posts, product pages, landing pages. Covers the full pipeline: competitor research, outline, drafting, and quantitative optimization against a Cora SEO report (XLSX).
---
## Invocation
Use this skill when the user asks to write, research, outline, draft, or optimize web content. Common triggers:
- "Write a service page about [topic]"
- "Let's work on the [topic] page"
- "Create content about [topic] for [company]"
- "I have a Cora report for [keyword]"
- "Optimize this page against the Cora report"
- "Help me build an outline for [topic]"
- "Research [topic] and write an article"
- Any mention of writing web pages, blog posts, or SEO content for a website
**Routing logic — ask two questions up front:**
1. "Do you have a Cora report (XLSX) for this keyword?"
2. "Do you have existing content to optimize?" (could be a URL to a live page, pasted text, or a file path)
| Cora report? | Existing content? | Start at |
|--------------|-------------------|----------|
| No | No | Phase 1, Step 1 (full research → draft workflow) |
| Yes | No | Phase 1, Step 1 (research → outline using Cora targets → draft → optimize) |
| Yes | Yes | Phase 2, Step 6 (load Cora, optimize existing content) |
| No | Yes | Ask user to generate the Cora report first — optimization without Cora targets is guesswork |
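The routing table can be read as a small decision function. The sketch below is only an illustration of that table; the function name and return strings are hypothetical and are not part of the skill's scripts.

```python
def starting_point(has_cora: bool, has_content: bool) -> str:
    """Map the two routing answers to the place where the workflow begins."""
    if has_content and has_cora:
        return "Phase 2, Step 6"
    if has_content and not has_cora:
        # Optimizing existing content without Cora targets is guesswork.
        return "Ask for a Cora report first"
    # With no existing content, both table rows start with research.
    return "Phase 1, Step 1"
```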
**Existing content from a URL:** If the user provides a URL to a live page (e.g. their WordPress site), **always use the BS4 competitor scraper** to pull the content — never `web_fetch`. The `web_fetch` tool runs content through an AI summarization layer that loses heading structure, drops sections, and can hallucinate product details. The scraper returns the actual HTML heading hierarchy and verbatim text.
```bash
cd {skill_dir}/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py "URL" --output-dir ./working/competitor_content/
```
Read the output file, then use the scraped heading structure and body text to build `./working/draft.md`. Preserve the original text verbatim — do not paraphrase or summarize product descriptions, specifications, or technical details. Only restructure headings and add entity/LSI terms where needed for optimization. The user does NOT need to paste or save the content manually.
---
## Phase 1: Research & First Draft
### Step 1 — Topic Input
Collect from the user:
- **Required:** Topic or keyword
- **Optional:** Competitor URLs to examine, industry context, pasted research they've already done, target audience
- **For service pages:** Company name, what services/capabilities they actually offer, what they do NOT offer. This prevents writing claims about capabilities the company doesn't have. Ask explicitly: "Is this a service page? If so, what does the company offer and what should I avoid mentioning?"
For informational/educational articles, company details are less critical — the content is about the topic, not the company. For service pages, company context is mandatory before drafting.
If the user provides their own research (pasted text, notes, URLs), use that as the primary input. Do not redo research the user has already done.
### Step 2 — Competitor Research
Research what competitors are publishing on this topic. Three modes depending on user input:
**Mode A — Claude researches (default):**
Use `web_search` to find the top competitor content for the topic. Use the BS4 competitor scraper (not `web_fetch`) to read the most relevant 5-10 results — this preserves accurate heading structure and verbatim text. Focus on:
- What subtopics they cover
- How they structure their content (H2/H3 breakdown)
- What angles or claims they make
- What they leave out (gaps)
**Mode B — User provides URLs:**
If the user gives specific URLs, use the competitor scraper to bulk-fetch them:
```bash
cd {skill_dir}/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py URL1 URL2 URL3 --output-dir ./working/competitor_content/
```
Then read the output files and analyze them.
**Mode C — User provides research:**
If the user pastes in research, notes, or analysis, skip scraping and work from what they gave you.
**Output:** Write a research summary covering:
1. Common themes across competitors (what everyone covers)
2. Content structure patterns (how they organize it)
3. Key entities, terms, and concepts mentioned repeatedly
4. Gaps — what competitors miss or cover poorly
5. Potential unique angles
Save the research summary to `./working/research_summary.md`.
### Step 3 — Build Outline
Using the research summary, build a structured outline:
1. **Generate fan-out queries** — Before structuring the outline, generate 10-15 search queries you would use to thoroughly research this topic. These are the natural "next searches" someone would run after the primary keyword — questions, comparisons, material/process specifics, use-case queries. Examples for "cnc swiss screw machining":
- "what is swiss screw machining"
- "swiss screw machining vs cnc turning"
- "swiss machining tolerances"
- "what materials can be swiss machined"
- "swiss screw machining for medical devices"
- "when to use swiss machining vs conventional lathe"
These queries represent the search cluster around the topic. The more of them the content answers, the more authoritative it becomes across related searches.
2. **Cover the common ground** — Include the themes that all/most competitors address. Missing these makes content look incomplete.
3. **Identify 1-2 unique angles** — Find something competitors are NOT covering well. This is the content's differentiator.
4. **Shape H3 headings from fan-out queries** — Map the strongest fan-out queries to H3 headings. Headings that match real search patterns give the content more surface area across the query cluster. A heading like "What Materials Can Be Swiss Machined?" is better than "Materials" because it mirrors how people actually search.
5. **Structure for scanning** — Use clear H2 sections with H3 subsections. Each H2 should address one major subtopic.
6. **Include notes on each section** — Brief description of what goes in each section and why.
Consult `references/content_frameworks.md` for structural templates (how-to, listicle, comparison, etc.) and select the best fit for the topic.
**IMPORTANT: A Cora report is required before building the outline.** The Cora report provides:
- Heading count targets (H2, H3 counts) that shape the outline structure
- Entity lists that inform heading names (pack entity terms into H2/H3 headings)
- Word count targets that determine section depth
- Structure targets (entities per heading level, variations per heading level) that guide how keyword-rich headings should be
If the user has not yet provided the Cora XLSX, **ask for it before proceeding with the outline.** Research can happen without Cora, but the outline should not be built without it.
Save the outline to `./working/outline.md`.
### Step 4 — HUMAN REVIEW (STOP AND WAIT)
**Present the outline to the user and ask:**
> "Here's the outline based on the research. Review it and let me know:
> 1. Any sections to add, remove, or reorder?
> 2. Are the unique angles worth pursuing?
> 3. Any specific points or data you want included?
> 4. Anything else before I draft?"
**Do NOT proceed until the user responds.** This is a critical gate. Incorporate all feedback before moving on.
### Step 5 — Write First Draft
Write the full content based on the approved outline:
- Follow the structure exactly as approved
- Consult `references/brand_guidelines.md` for voice and tone guidance
- Write in clear, scannable paragraphs (max 4 sentences per paragraph)
- Use subheadings every 2-4 paragraphs
- Include lists, examples, and concrete details where appropriate
- Aim for the word count the user specified.
**Fan-Out Query (FOQ) Section:**
After the main content, write a separate FOQ section using the fan-out queries from the outline. This section is **excluded** from word count and heading count targets — it lives outside the core article.
- Each FOQ is an H3 heading phrased as a question
- Answer in 2-3 sentences max, self-contained
- **Restate the question in the answer** — this is the format LLMs and featured snippets prefer for citation: "How does X work? X works by..."
- The user may style these as accordions, FAQ schema, or hidden divs
- Mark the section clearly (e.g. `<!-- FOQ SECTION START -->`) so it's easy to separate from the main content
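Because the FOQ section is excluded from word count targets, any count has to skip the text between the markers. A minimal sketch of that exclusion follows, assuming the start marker shown above is paired with the `<!-- FOQ SECTION END -->` marker used in Step 14. The helper names are hypothetical, and the count is a naive whitespace split that also counts markdown symbols such as `#`.

```python
import re

FOQ_START = "<!-- FOQ SECTION START -->"
FOQ_END = "<!-- FOQ SECTION END -->"


def strip_foq_section(markdown: str) -> str:
    """Return the draft with everything between the FOQ markers removed."""
    pattern = re.escape(FOQ_START) + r".*?" + re.escape(FOQ_END)
    return re.sub(pattern, "", markdown, flags=re.DOTALL)


def core_word_count(markdown: str) -> int:
    """Count whitespace-separated tokens outside the FOQ section."""
    return len(strip_foq_section(markdown).split())
```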
Save the draft to `./working/draft.md`.
Tell the user: "First draft is ready. If you have a Cora report for this keyword, provide the XLSX path and I'll optimize against it. Otherwise, let me know what changes you'd like."
---
## Phase 2: Cora Optimization
This phase begins when the user provides a Cora XLSX report. The draft may come from Phase 1, or the user may provide an existing draft to optimize.
### Step 6 — Load Cora Report
Parse the Cora XLSX and display a summary of targets:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet summary
```
Show the user:
- Search term and keyword variations
- Entity count and deficit count
- LSI keyword count and deficit count
- Word count target (cluster target, not raw average)
- Density targets (variation, entity, LSI)
- Key optimization rules that will be applied
### Step 7 — Entity Optimization
Run the entity optimizer against the draft:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python entity_optimizer.py "{draft_path}" "{cora_xlsx_path}" --top-n 30
```
Review the output and apply the top recommendations:
- Focus on entities with high relevance AND high remaining deficit
- Add entities naturally — they must fit the context of the section
- Prioritize adding entities to H2 and H3 headings first (these are primary optimization targets)
- Do NOT force entities where they don't make sense — readability always wins
- H1: exactly 1, always. Do not add a second H1.
- H5, H6: ignore completely
- H4: only add if most competitors have them
After applying entity changes, save the updated draft.
### Step 8 — LSI Keyword Optimization
Run the LSI optimizer:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python lsi_optimizer.py "{draft_path}" "{cora_xlsx_path}" --min-correlation 0.2 --top-n 50
```
Apply LSI keyword recommendations:
- Focus on keywords with strongest correlation (highest absolute value = most ranking impact)
- Many LSI keywords are common phrases that may already appear naturally
- Add missing keywords in body text, not just headings
- Some LSI keywords overlap with entities — count these once, benefit twice
After applying LSI changes, save the updated draft.
### Step 9 — Structure & Density Check
Check the overall structure against Cora targets:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet structure --format json
cd {skill_dir}/scripts && uv run --with openpyxl python cora_parser.py "{cora_xlsx_path}" --sheet densities --format json
```
Verify and adjust:
- **Heading counts:** Compare H1, H2, H3, H4 counts against Page 1 Average targets. Add or consolidate headings as needed.
- **Entities per heading level:** Check that each heading level has enough entity mentions vs. the Structure sheet targets.
- **Variations in headings:** Ensure keyword variations appear in H2/H3 headings at target levels.
- **Density targets:** Check variation density, entity density, and LSI density against the Strategic Overview percentages.
- **Word count:** Compare against the cluster target (NOT the raw average). If below target, identify which sections could be expanded.
**Important density note:** Adding content to meet one target changes the denominator for ALL density calculations. After significant word count changes, re-check densities. Usually 1-2 optimization passes are sufficient.
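The interdependence is easy to see with numbers. The sketch below reuses the same mentions-over-words definition as the `density` helper in the validation script; the word and mention counts are made up for illustration.

```python
def density(mentions: int, words: int) -> float:
    """Mentions as a fraction of total words (0 when there are no words)."""
    return mentions / words if words > 0 else 0.0


# Hypothetical page: 1,000 words, 30 entity mentions, 20 LSI mentions.
words, entity_mentions, lsi_mentions = 1000, 30, 20
before_entity = density(entity_mentions, words)  # 30 / 1000 = 0.03
before_lsi = density(lsi_mentions, words)        # 20 / 1000 = 0.02

# Add 200 words containing 15 new entity mentions and no LSI terms.
words += 200
entity_mentions += 15
after_entity = density(entity_mentions, words)   # 45 / 1200 = 0.0375
after_lsi = density(lsi_mentions, words)         # 20 / 1200, lower than before
```

Entity density rises, but LSI density falls even though no LSI term was removed, which is why densities need a re-check after large additions.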
### Step 10 — Keyword Density Check (Optional)
If a quick keyword density check is useful:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python seo_optimizer.py "{draft_path}" --cora-xlsx "{cora_xlsx_path}"
```
Key rules:
- Exact match keyword density: 2% minimum, no upper limit
- Variations capture exact match — hitting variation density targets covers exact match
- Do NOT flag keyword stuffing. There is no practical upper limit that hurts rankings.
### Step 11 — Meta Title, Meta Description, and URL Slug
Generate meta tags and add them as an HTML comment block at the top of the draft file.
**Meta title format:** Pack keyword variations into a pipe-separated title tag. Google reads far more than the ~60 characters it displays — a long title tag with variations gives the page more surface area across related searches. You can go up to 500 characters but do not have to.
Format: `Exact Search Term | Variation 1 | Variation 2 | ... | Company Name`
Use the keyword variations from the Cora report. Only include variations that have a page1_avg > 0 (competitors actually use them). Put the highest-value variations first.
**Meta description:** Write a keyword-rich summary (~350-500 characters) that hits the primary keyword, key variations, materials, sizes, and company name. This is not just a copy of the intro paragraph — it should be independently optimized.
**URL slug:** Short, keyword-focused. Example: `/custom-spun-hemispheres`
Add to the top of the draft file:
```html
<!--
META TITLE: Exact Search Term | Variation 1 | Variation 2 | Company Name
META DESCRIPTION: Keyword-rich summary here.
URL SLUG: /url-slug-here
-->
```
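The title format can be assembled mechanically once the variations are known. The sketch below is an illustration under stated assumptions: variations arrive as `(text, page1_avg)` pairs, "highest-value" is taken to mean highest `page1_avg`, and `build_meta_title` is a hypothetical helper rather than one of the skill's scripts.

```python
def build_meta_title(search_term, variations, company, max_chars=500):
    """Join the search term, used variations, and company name with pipes.

    `variations` is a list of (text, page1_avg) pairs. That shape is an
    assumption for this sketch, not the Cora parser's actual output.
    """
    # Keep only variations competitors actually use, highest average first.
    used = sorted((v for v in variations if v[1] > 0), key=lambda v: v[1], reverse=True)
    parts = [search_term]
    for text, _avg in used:
        if text.lower() == search_term.lower():
            continue  # the exact term already leads the title
        candidate = " | ".join(parts + [text, company])
        if len(candidate) > max_chars:
            break  # stop before exceeding the character budget
        parts.append(text)
    return " | ".join(parts + [company])
```

For example, `build_meta_title("swiss machining", [("cnc swiss", 4.0), ("swiss turning", 0)], "Acme Co")` drops the unused variation and places the company name last.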
### Step 12 — Image & Diagram Placement
Read through the draft markdown file and identify where visuals would enhance the content.
For each recommendation, specify:
- **Location:** After which heading or paragraph
- **Type:** Photo, diagram, chart, infographic, screenshot, illustration
- **Description:** What the visual should show
- **Rationale:** Why it adds value at that point (breaks up text, illustrates a process, makes data tangible, etc.)
Common placement triggers:
- Sections describing a process or workflow (diagram)
- Sections with comparative data (chart or table)
- Long text-only stretches (break up with a relevant image)
- Technical concepts that benefit from visual explanation (diagram)
- Before/after scenarios (side-by-side images)
### Step 13 — HUMAN REVIEW (STOP AND WAIT)
**Present the final draft, optimization summary, and image suggestions to the user:**
> "Here's the optimized draft. Summary of changes:
> - [X] entities added across [Y] sections
> - [X] LSI keywords incorporated
> - Word count: [current] (target: [target])
> - Variation density: [current]% (target: [target]%)
> - Entity density: [current]% (target: [target]%)
> - [X] image/diagram placements suggested
>
> Review the draft. What needs adjusting?"
**Do NOT finalize until the user approves.**
### Step 14 — HTML Export
After the user approves the draft, convert the markdown to plain HTML for WordPress. Save as `./working/draft.html` (or `draft_normal.html`, `draft_storybrand.html` if multiple versions exist).
Rules:
- **Plain HTML only** — no classes, no divs, no wrappers. Just `<h2>`, `<h3>`, `<p>`, `<ul>/<li>`, and `<strong>` tags.
- **Omit the H1** — WordPress sets the page title separately. Do not include an `<h1>` tag in the HTML.
- **Keep the meta comment block** at the top (META TITLE, META DESCRIPTION, URL SLUG).
- **Keep the FOQ comment markers** (`<!-- FOQ SECTION START -->` / `<!-- FOQ SECTION END -->`) so the user can identify that section for special styling.
- The user pastes this HTML into WordPress Gutenberg's Code Editor view, where it maps directly to blocks.
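The export rules above can be spot-checked before handing the file over. The following is a rough sketch using the standard library's `html.parser`; `TagAudit` and `audit_html` are hypothetical names, and treating any attribute as a violation is an assumption drawn from the "no classes" rule.

```python
from html.parser import HTMLParser

ALLOWED_TAGS = {"h2", "h3", "p", "ul", "li", "strong"}


class TagAudit(HTMLParser):
    """Collect tags that break the plain-HTML export rules."""

    def __init__(self):
        super().__init__()
        self.disallowed = []

    def handle_starttag(self, tag, attrs):
        # Flag tags outside the allowed set, and any tag carrying attributes.
        if tag not in ALLOWED_TAGS or attrs:
            self.disallowed.append(tag)


def audit_html(html: str) -> list:
    """Return the offending tag names in document order."""
    parser = TagAudit()
    parser.feed(html)
    parser.close()
    return parser.disallowed
```

HTML comments such as the meta block and the FOQ markers are ignored by this audit, so they can stay in place.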
---
## Phase 3: Quick Test Block
A standalone workflow for testing whether adding entities, keywords, and headings moves rankings before investing in full content optimization. The output is a minimal text block placed in a hidden div on the page for A/B testing.
**Key principle:** The LLM handles all intelligence — filtering entities for topical relevance, writing headings, creating body templates. Python scripts handle all math — slot filling, density tracking, stop conditions, validation. There are NO per-entity mention targets — only aggregate density percentages and distinct entity counts.
### When to Use
User says "test block," "hidden div," "quick test," "test the entities," or similar. This is NOT part of Phase 2 — it is an independent workflow. Requirements: a Cora report and existing content (URL or file).
### Step T1 — Load Inputs
- Pull existing content via BS4 scraper if a URL is provided, or read from file if a path is given.
- Save existing content to `{cwd}/working/existing_content.md` if fetched from URL.
```bash
cd {skill_dir}/scripts && uv run --with requests,beautifulsoup4 python competitor_scraper.py "{url}" --output-dir {cwd}/working/
```
Then rename the output file to `{cwd}/working/existing_content.md`.
### Step T2 — Run Prep Script (Programmatic)
Run `test_block_prep.py` to extract all deficit data:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python test_block_prep.py "{content_path}" "{cora_xlsx_path}" --format json -o {cwd}/working/prep_data.json
```
This outputs structured JSON with:
- Word count vs target + deficit
- Distinct entity count vs target + deficit + list of missing 0-count entities
- Variation density % vs target (Cora row 46)
- Entity density % vs target (Cora row 47)
- LSI density % vs target (Cora row 48)
- Heading structure deficits (H2, H3 counts; entities/variations in headings)
- **Template instructions**: how many templates to generate, how many slots per template, target word count
Review the prep output. All numbers come from deterministic script analysis — no estimation.
### Step T3 — Filter Entities for Topical Relevance (LLM Step)
Read the `missing_entities` list from `{cwd}/working/prep_data.json`. This list contains ALL entities with 0 mentions on the existing page, sorted by Cora relevance score. **Many of these will be noise** — navigation terms, competitor names, unrelated concepts that happen to appear on ranking pages.
Review every entity and keep ONLY those that are topically relevant to the page's subject matter. Ask: "Would a subject matter expert writing about [page topic] naturally mention this term?"
**Remove:**
- Competitor company names and brands
- People (athletes, historical figures, etc.)
- Web furniture (blog, menu, privacy, FAQ, social media platforms)
- Geographic entities unrelated to the topic
- Software, media, organisms, and other off-topic typed entities
- Generic terms that only appear due to page chrome (calculator, glossary, children, etc.)
**Keep:**
- Terms directly related to the product/service/topic
- Materials, processes, components, and industry terms
- Related applications and industries where the product is used
- Technical specifications and engineering concepts
Save the filtered entity names to `{cwd}/working/filtered_entities.txt`, one entity per line, ordered from most to least relevant.
### Step T4 — Generate Headings and Body Templates (LLM Creative Step)
This step has two parts. Read the prep JSON for the numbers you need:
- `headings.h2.deficit`: how many H2 headings to generate
- `headings.h3.deficit`: how many H3 headings to generate
- `headings.entities_in_headings.deficit`: how many entity mentions needed across all headings
- `template_instructions.num_templates`: how many body templates to create
- `template_instructions.slots_per_sentence`: how many `{N}` slots per body template
- `template_instructions.avg_words_per_template`: target words per template (~15)
**Part 1 — Write headings:**
Using the filtered entity list from T3 and your understanding of the page topic, write topically relevant H2 and H3 headings. These are final text — NOT templates, no `{N}` slots. The headings should:
- Read like real section headings a subject matter expert would write
- Naturally incorporate entities from the filtered list (aim to hit the entities_in_headings deficit)
- Be relevant to the page's topic and the types of content that would appear under them
**Part 2 — Write body templates:**
Generate body sentence templates with numbered placeholder slots. Follow the numbers from `template_instructions`:
- Create `num_templates` templates
- Each template gets `slots_per_sentence` numbered slots: `{1}`, `{2}`, `{3}`, etc. Slots MUST be numbered — the generator regex matches `{1}`, `{2}`, NOT `{N}`.
- Templates must be topically relevant to the page's subject matter
- Templates should be grammatically coherent but brevity wins over polish
- Do NOT try to specify which entities go in which slot — the generator script handles that
Save everything to `{cwd}/working/templates.txt`, one per line. Headings are prefixed with `H2:` or `H3:`; body templates are plain text with numbered slots (`{1}`, `{2}`, ...).
Example for an expansion joints page:
```
H2: Bellows Expansion Joints for Industrial Piping Systems
H2: Metal and Rubber Expansion Joint Applications in Water Treatment
H3: Gasket and Flange Connections for Expansion Joints
{1} and {2} are critical components used to absorb thermal movement and reduce stress in piping systems.
{1} provide reliable performance in demanding {2} environments where thermal cycling is constant.
```
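To make the file format concrete, here is a minimal sketch of how such a file could be split into headings and body templates. It is not the generator's actual parser: the regex and the `parse_templates` helper are assumptions based only on the format described above.

```python
import re

SLOT_RE = re.compile(r"\{(\d+)\}")  # numbered slots such as {1}, {2}


def parse_templates(text: str):
    """Split templates.txt content into headings and body templates."""
    headings, bodies = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.match(r"(H[23]):\s*(.+)", line)
        if match:
            # Heading lines are final text: (level, heading text).
            headings.append((match.group(1), match.group(2)))
        else:
            # Body lines: (template, sorted distinct slot numbers).
            slots = sorted({int(n) for n in SLOT_RE.findall(line)})
            bodies.append((line, slots))
    return headings, bodies
```

A body line with no numbered slot comes back with an empty slot list, which is a cheap way to catch a template written with a literal `{N}`.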
### Step T5 — Run Generator Script (Programmatic)
Run `test_block_generator.py` to fill body template slots and assemble the test block. The script requires the LLM-curated entity list from T3:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python test_block_generator.py {cwd}/working/templates.txt {cwd}/working/prep_data.json "{cora_xlsx_path}" --entities-file {cwd}/working/filtered_entities.txt --output-dir {cwd}/working/ --min-sentences 5
```
The script:
1. Loads the LLM-curated entity list — uses ONLY these entities for slot filling (no script-level filtering)
2. Builds a term queue: filtered entities first, then keyword variations
3. Inserts pre-written headings as-is (no slot filling on heading lines)
4. Fills body template slots, rotating through the term queue (no duplicates within a sentence)
5. Tracks projected densities: (baseline_mentions + new_mentions) / (baseline_words + new_words)
6. Stops when: all density targets met, distinct entity deficit closed, word count deficit closed, AND minimum sentence count reached
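Steps 4 and 5 above can be sketched as follows. This is an illustration, not the code in `test_block_generator.py`: the function names are hypothetical, and each slot occurrence is filled independently as a simplification.

```python
import re
from collections import deque

SLOT_RE = re.compile(r"\{\d+\}")


def fill_template(template: str, queue: deque):
    """Fill numbered slots from a rotating queue, with no repeated term
    inside one sentence. Returns (sentence, terms_used)."""
    used = []

    def pick(_match):
        # Rotate through the queue until a term not yet used here turns up.
        for _ in range(len(queue)):
            term = queue[0]
            queue.rotate(-1)
            if term not in used:
                used.append(term)
                return term
        raise ValueError("not enough distinct terms to fill the template")

    return SLOT_RE.sub(pick, template), used


def projected_density(baseline_mentions, new_mentions, baseline_words, new_words):
    """(baseline_mentions + new_mentions) / (baseline_words + new_words)."""
    total_words = baseline_words + new_words
    return (baseline_mentions + new_mentions) / total_words if total_words else 0.0
```

Because the queue keeps rotating between sentences, the next template picks up where the previous one stopped, which spreads mentions across the whole term list.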
Output files:
- `{cwd}/working/test_block.md` — Markdown version
- `{cwd}/working/test_block.html` — Plain HTML version
- `{cwd}/working/test_block_stats.json` — Generation stats (mentions added, entities introduced, projected densities)
### Step T6 — Rewrite Body Sentences for Readability (LLM Step — use Haiku)
The generator produces grammatically rough sentences because entities get slotted into positions where they don't naturally fit. This step rewrites each body sentence to read naturally while preserving entity strings exactly.
**Use Haiku for this step** — it's fast and cheap enough to handle sentence-by-sentence rewrites.
Read `{cwd}/working/test_block.md`. For each body sentence (NOT headings — leave all H2/H3 lines exactly as they are):
1. Identify which entity terms from `{cwd}/working/filtered_entities.txt` appear in the sentence
2. Rewrite the sentence so it is grammatically correct and reads naturally
3. **Preserve every entity string exactly** — same spelling, same case. Do not paraphrase, hyphenate, abbreviate, or pluralize entity terms. "stainless steel" must remain "stainless steel", not "stainless-steel" or "SS".
4. Keep the sentence under 20 words
5. The rewrite should be topically relevant to the page subject
Reassemble the test block with:
- Same `<!-- HIDDEN TEST BLOCK START -->` / `<!-- HIDDEN TEST BLOCK END -->` markers
- Same headings in the same positions
- Rewritten body sentences grouped into paragraphs (4 sentences per paragraph)
Overwrite both files:
- `{cwd}/working/test_block.md` (markdown format)
- `{cwd}/working/test_block.html` (HTML format with `<h2>`, `<h3>`, and `<p>` tags)
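The preservation rules in this step are mechanical enough to check after each rewrite. A rough sketch, assuming a plain case-sensitive substring match is good enough (`check_rewrite` is a hypothetical helper, and a short entity could still match inside a longer word):

```python
def check_rewrite(original: str, rewritten: str, entities, max_words: int = 20):
    """Return a list of problems found in a rewritten sentence."""
    problems = []
    for entity in entities:
        # Exact, case-sensitive match: "stainless-steel" or "SS" does not
        # count as "stainless steel".
        if entity in original and entity not in rewritten:
            problems.append(f"missing entity: {entity}")
    # The rule is "under 20 words", so exactly 20 is already too long.
    if len(rewritten.split()) >= max_words:
        problems.append(f"sentence has {len(rewritten.split())} words")
    return problems
```

An empty list means the rewrite kept every entity string and stayed under the word limit.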
### Step T7 — Run Validation Script (Programmatic)
Run `test_block_validate.py` for a deterministic before/after comparison:
```bash
cd {skill_dir}/scripts && uv run --with openpyxl python test_block_validate.py "{content_path}" {cwd}/working/test_block.md "{cora_xlsx_path}" --format json -o {cwd}/working/validation_report.json
```
This produces a report showing every metric before and after, with targets and status:
- Word count, distinct entities, entity density %, variation density %, LSI density %
- Heading counts (H2, H3), entities/variations in headings
- List of all new 0->1 entities introduced
- All numbers are from the same counting code — no mixing of data sources
Present the validation report to the user. Flag any metric that dropped below target after the test block was added.
---
## Optimization Rules
These override any data from the Cora report:
| Rule | Detail |
|------|--------|
| H1 count | Exactly 1, always |
| H2, H3 | Primary optimization targets — focus entity/variation additions here |
| H4 | Low priority — only add if most competitors have them |
| H5, H6 | Ignore completely |
| Word count | Target the nearest competitive cluster, not the raw average. Up to ~1,500 words is always acceptable even if the target is lower. |
| Exact match density | 2% minimum, no upper limit |
| Keyword stuffing | Do NOT flag or warn about keyword stuffing |
| Variations include exact match | Optimizing variation density inherently covers exact match |
| Density is interdependent | Adding content changes ALL density calculations — re-check after big changes |
| Optimization passes | 1-2 passes are typically sufficient |
| Competitor names | NEVER use competitor company names as entities or LSI keywords. Do not mention competitors by name in content. |
| Measurement entities | Ignore measurements (dimensions, tolerances, etc.) as entities — skip these in entity optimization |
| Organization entities | Organizations like ISO, ANSI, ASTM are fine — keep these as entities |
| Entity correlation filter | Only entities with Best of Both <= -0.19 are included. Best of Both is the lower of Spearman's or Pearson's correlation to ranking position (1=top, 100=bottom), so more negative = stronger ranking signal. This filter is applied in `cora_parser.py` and affects all downstream consumers. To disable, set `entity_correlation_threshold` to `None` in `OPTIMIZATION_RULES`. Added 2026-03-20 — revert if entity coverage feels too thin. |
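The entity correlation filter in the last row can be illustrated with a short sketch. The function names are hypothetical and this is not the code in `cora_parser.py`; it only restates the rule as described: take the lower of the two correlations and keep the entity when that value is at or below the threshold, with `None` disabling the filter.

```python
def best_of_both(spearman, pearson):
    """Lower (more negative) of the two correlations to ranking position."""
    return min(spearman, pearson)


def keep_entity(spearman, pearson, threshold=-0.19):
    """Return True if the entity passes the correlation filter.

    A threshold of None disables the filter, mirroring the
    `entity_correlation_threshold = None` setting described above.
    """
    if threshold is None:
        return True
    return best_of_both(spearman, pearson) <= threshold
```

Under these assumptions, an entity with Spearman -0.25 and Pearson -0.10 is kept (Best of Both is -0.25), while one with -0.10 and -0.05 is dropped.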
---
## Scripts Reference
All scripts are in `{skill_dir}/scripts/`. Run them with `uv run --with openpyxl python` (or `--with requests,beautifulsoup4` for the scraper).
### cora_parser.py
Foundation module. Reads a Cora XLSX and extracts structured data.
```
uv run --with openpyxl python cora_parser.py <xlsx_path> [--sheet SHEET] [--format json|text]
```
Sheets: `summary`, `entities`, `lsi`, `variations`, `structure`, `densities`, `targets`, `wordcount`, `results`, `tunings`, `all`
### entity_optimizer.py
Counts entities in a draft against Cora targets, recommends additions sorted by (relevance x deficit).
```
uv run --with openpyxl python entity_optimizer.py <draft_path> <cora_xlsx_path> [--format json|text] [--top-n 30]
```
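The (relevance x deficit) ordering can be sketched as below. This is an illustration under assumptions, not the script's actual implementation: the dict keys `name`, `relevance`, `count`, and `target` are hypothetical, and entities already at or above target are assumed to be excluded.

```python
def rank_additions(entities, top_n=30):
    """Rank entities by relevance * deficit, highest score first.

    `entities` is a list of dicts with hypothetical keys:
    name (str), relevance (float), count (int), target (int).
    """
    scored = []
    for e in entities:
        # Deficit is how many more mentions are needed to reach the target.
        deficit = max(0, e["target"] - e["count"])
        if deficit == 0:
            continue  # already at or above target, nothing to recommend
        scored.append((e["relevance"] * deficit, e["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]
```

With this scoring, a moderately relevant entity with a large deficit can outrank a highly relevant one that is only one mention short.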
### lsi_optimizer.py
Counts LSI keywords in a draft against Cora targets, recommends additions sorted by (|correlation| x deficit).
```
uv run --with openpyxl python lsi_optimizer.py <draft_path> <cora_xlsx_path> [--format json|text] [--min-correlation 0.2] [--top-n 50]
```
### seo_optimizer.py
Keyword density, structure, and readability checks. Optional Cora integration.
```
uv run --with openpyxl python seo_optimizer.py <draft_path> [--keyword <kw>] [--cora-xlsx <path>] [--format json|text]
```
### competitor_scraper.py
Utility for bulk-fetching URLs when the user provides a list.
```
uv run --with requests,beautifulsoup4 python competitor_scraper.py <url1> <url2> ... [--output-dir ./working/competitor_content/]
```
### test_block_prep.py
Extracts all deficit data from existing content + Cora XLSX. Outputs structured JSON with word count, entity/variation/LSI density deficits, heading deficits, missing entities list, and calculated template instructions (num_templates, slots_per_sentence).
```
uv run --with openpyxl python test_block_prep.py <content_path> <cora_xlsx_path> [--format json|text] [-o PATH]
```
### test_block_generator.py
Fills body template slots with entities from an LLM-curated entity list. Inserts pre-written headings as-is (no slot filling). Tracks aggregate densities in real-time, stops when all targets are met. Outputs test_block.md, test_block.html, and test_block_stats.json.
```
uv run --with openpyxl python test_block_generator.py <templates_path> <prep_json_path> <cora_xlsx_path> --entities-file <path> [--output-dir DIR] [--min-sentences N]
```
### test_block_validate.py
Deterministic before/after comparison. Runs the same counting logic on existing content alone vs existing content + test block. Shows every metric with before, after, target, and status.
```
uv run --with openpyxl python test_block_validate.py <content_path> <test_block_path> <cora_xlsx_path> [--format json|text] [-o PATH]
```
---
## Reference Files
- `references/content_frameworks.md` — Article templates (how-to, listicle, comparison, case study, thought leadership), persuasion frameworks (AIDA, PAS), introduction and conclusion patterns.
- `references/brand_guidelines.md` — Voice archetypes, writing principles, tone spectrums, language preferences, pre-publication checklist.
---
## Working Directory
**CRITICAL: All output files MUST be written to `{cwd}/working/` — the `working/` subfolder inside the user's current project directory (where Claude Code was launched). NEVER write files to the skill directory, scripts directory, or any location outside the project folder. When running scripts, always use absolute paths for output flags (`-o`, `--output-dir`) pointing to `{cwd}/working/`.**
All intermediate files go in `{cwd}/working/` (the user's project directory):
- `working/research_summary.md` — Research output from Step 2
- `working/outline.md` — Outline from Step 3
- `working/draft.md` — Content draft (updated in place during optimization)
- `working/competitor_content/` — Scraped competitor text files (if URLs were fetched)
- `working/existing_content.md` — BS4-scraped existing page content (Phase 3)
- `working/prep_data.json` — Deficit analysis output from test_block_prep.py (Phase 3)
- `working/filtered_entities.txt` — LLM-curated entity list, one per line (Phase 3, Step T3)
- `working/templates.txt` — Pre-written headings + body templates with numbered slots (Phase 3, Step T4)
- `working/test_block.md` — Quick test block in markdown (Phase 3)
- `working/test_block.html` — Quick test block in plain HTML (Phase 3)
- `working/test_block_stats.json` — Generation stats: mentions added, entities introduced, projected densities (Phase 3)
- `working/validation_report.json` — Before/after comparison from test_block_validate.py (Phase 3)
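One way to honor the absolute-path rule when building `-o` or `--output-dir` arguments is sketched below. `working_path` is a hypothetical helper, not part of the skill; it assumes `cwd` is the project directory and rejects any name that would resolve outside `working/`.

```python
from pathlib import Path


def working_path(cwd, name):
    """Return an absolute path for `name` inside `<cwd>/working/`.

    Hypothetical helper: raises ValueError if the resolved path
    would land outside the working directory.
    """
    working = (Path(cwd) / "working").resolve()
    target = (working / name).resolve()
    if target != working and working not in target.parents:
        raise ValueError(f"{name!r} resolves outside {working}")
    return target
```

For example, `working_path(cwd, "validation_report.json")` yields an absolute path suitable for the `-o` flag, while a name such as `"../draft.md"` is refused.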

View File

@ -93,8 +93,6 @@ uv add --group test <package>
| `identity/SOUL.md` | Agent personality | | `identity/SOUL.md` | Agent personality |
| `identity/USER.md` | User profile | | `identity/USER.md` | User profile |
| `skills/` | Markdown skill files with YAML frontmatter | | `skills/` | Markdown skill files with YAML frontmatter |
| `scripts/create_clickup_task.py` | CLI script to create ClickUp tasks |
| `docs/clickup-task-creation.md` | Task creation conventions, per-type fields, and defaults |
## Conventions ## Conventions
@ -176,11 +174,9 @@ skill_map:
"Press Release": "Press Release":
tool: "write_press_releases" tool: "write_press_releases"
auto_execute: true auto_execute: true
required_fields: [topic, company_name, target_url]
field_mapping: field_mapping:
topic: "PR Topic" # ClickUp custom field for PR topic/keyword topic: "task_name" # uses ClickUp task name
company_name: "Customer" # looks up "Customer" custom field company_name: "Customer" # looks up "Customer" custom field
target_url: "IMSURL" # target money-site URL (required)
``` ```
Task lifecycle: `to do` → discovered → approved/awaiting_approval → executing → completed/failed (+ attachments uploaded) Task lifecycle: `to do` → discovered → approved/awaiting_approval → executing → completed/failed (+ attachments uploaded)

View File

@ -1,7 +1,6 @@
"""Entry point: python -m cheddahbot""" """Entry point: python -m cheddahbot"""
import logging import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path from pathlib import Path
from .agent import Agent from .agent import Agent
@ -16,22 +15,6 @@ logging.basicConfig(
format="%(asctime)s [%(name)s] %(levelname)s: %(message)s", format="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
datefmt="%H:%M:%S", datefmt="%H:%M:%S",
) )
# All levels to rotating log file (DEBUG+)
_log_dir = Path(__file__).resolve().parent.parent / "logs"
_log_dir.mkdir(exist_ok=True)
_file_handler = RotatingFileHandler(
_log_dir / "cheddahbot.log", maxBytes=5 * 1024 * 1024, backupCount=5
)
_file_handler.setLevel(logging.DEBUG)
_file_handler.setFormatter(
logging.Formatter("%(asctime)s [%(name)s] %(levelname)s: %(message)s")
)
logging.getLogger().addHandler(_file_handler)
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("httpcore").setLevel(logging.WARNING)
log = logging.getLogger("cheddahbot") log = logging.getLogger("cheddahbot")
@ -138,41 +121,6 @@ def main():
except Exception as e: except Exception as e:
log.warning("Notification bus not available: %s", e) log.warning("Notification bus not available: %s", e)
# ntfy.sh push notifications
if notification_bus and config.ntfy.enabled:
try:
import os
from .ntfy import NtfyChannel, NtfyNotifier
ntfy_channels = []
for ch_cfg in config.ntfy.channels:
topic = os.getenv(ch_cfg.topic_env_var, "")
if topic:
ntfy_channels.append(
NtfyChannel(
name=ch_cfg.name,
server=ch_cfg.server,
topic=topic,
categories=ch_cfg.categories,
include_patterns=ch_cfg.include_patterns,
exclude_patterns=ch_cfg.exclude_patterns,
priority=ch_cfg.priority,
tags=ch_cfg.tags,
)
)
else:
log.warning(
"ntfy channel '%s' skipped — env var %s not set",
ch_cfg.name, ch_cfg.topic_env_var,
)
notifier = NtfyNotifier(ntfy_channels)
if notifier.enabled:
notification_bus.subscribe("ntfy", notifier.notify)
log.info("ntfy notifier subscribed to notification bus")
except Exception as e:
log.warning("ntfy notifier not available: %s", e)
# Scheduler (uses default agent) # Scheduler (uses default agent)
scheduler = None scheduler = None
try: try:
@ -181,14 +129,22 @@ def main():
log.info("Starting scheduler...") log.info("Starting scheduler...")
scheduler = Scheduler(config, db, default_agent, notification_bus=notification_bus) scheduler = Scheduler(config, db, default_agent, notification_bus=notification_bus)
scheduler.start() scheduler.start()
# Inject scheduler into tool context so get_active_tasks can read it
if tools:
tools.scheduler = scheduler
except Exception as e: except Exception as e:
log.warning("Scheduler not available: %s", e) log.warning("Scheduler not available: %s", e)
log.info("Launching Gradio UI on %s:%s...", config.host, config.port)
blocks = create_ui(
registry, config, default_llm, notification_bus=notification_bus, scheduler=scheduler
)
# Build a parent FastAPI app so we can mount the dashboard alongside Gradio.
# Inserting routes into blocks.app before launch() doesn't work because
# launch()/mount_gradio_app() replaces the internal App instance.
import gradio as gr
import uvicorn import uvicorn
from fastapi import FastAPI from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from starlette.staticfiles import StaticFiles
fastapi_app = FastAPI() fastapi_app = FastAPI()
@ -199,33 +155,24 @@ def main():
fastapi_app.include_router(api_router) fastapi_app.include_router(api_router)
log.info("API router mounted at /api/") log.info("API router mounted at /api/")
# Mount new HTMX web UI (chat at /, dashboard at /dashboard) # Mount the dashboard as static files (must come before Gradio's catch-all)
from .web import mount_web_app dashboard_dir = Path(__file__).resolve().parent.parent / "dashboard"
if dashboard_dir.is_dir():
# Redirect /dashboard (no trailing slash) → /dashboard/
@fastapi_app.get("/dashboard")
async def _dashboard_redirect():
return RedirectResponse(url="/dashboard/")
mount_web_app( fastapi_app.mount(
fastapi_app, "/dashboard",
registry, StaticFiles(directory=str(dashboard_dir), html=True),
config, name="dashboard",
default_llm,
notification_bus=notification_bus,
scheduler=scheduler,
db=db,
)
# Mount Gradio at /old for transition period
try:
import gradio as gr
log.info("Mounting Gradio UI at /old...")
blocks = create_ui(
registry, config, default_llm, notification_bus=notification_bus, scheduler=scheduler
) )
gr.mount_gradio_app(fastapi_app, blocks, path="/old", pwa=False, show_error=True) log.info("Dashboard mounted at /dashboard/ (serving %s)", dashboard_dir)
log.info("Gradio UI available at /old")
except Exception as e: # Mount Gradio at the root
log.warning("Gradio UI not available: %s", e) gr.mount_gradio_app(fastapi_app, blocks, path="/", pwa=True, show_error=True)
log.info("Launching web UI on %s:%s...", config.host, config.port)
uvicorn.run(fastapi_app, host=config.host, port=config.port) uvicorn.run(fastapi_app, host=config.host, port=config.port)

View File

@ -369,7 +369,6 @@ class Agent:
system_context: str = "", system_context: str = "",
tools: str = "", tools: str = "",
model: str = "", model: str = "",
skip_permissions: bool = False,
) -> str: ) -> str:
"""Execute a task using the execution brain (Claude Code CLI). """Execute a task using the execution brain (Claude Code CLI).
@ -379,25 +378,19 @@ class Agent:
Args: Args:
tools: Override Claude Code tool list (e.g. "Bash,Read,WebSearch"). tools: Override Claude Code tool list (e.g. "Bash,Read,WebSearch").
model: Override the CLI model (e.g. "claude-sonnet-4.5"). model: Override the CLI model (e.g. "claude-sonnet-4.5").
skip_permissions: If True, run CLI with --dangerously-skip-permissions.
""" """
log.info("Execution brain task: %s", prompt[:100]) log.info("Execution brain task: %s", prompt[:100])
kwargs: dict = { kwargs: dict = {"system_prompt": system_context}
"system_prompt": system_context,
"timeout": self.config.timeouts.execution_brain,
}
if tools: if tools:
kwargs["tools"] = tools kwargs["tools"] = tools
if model: if model:
kwargs["model"] = model kwargs["model"] = model
if skip_permissions:
kwargs["skip_permissions"] = True
result = self.llm.execute(prompt, **kwargs) result = self.llm.execute(prompt, **kwargs)
# Log to daily memory # Log to daily memory
if self._memory: if self._memory:
try: try:
self._memory.log_daily(f"[Execution] {prompt[:200]}\n-> {result[:500]}") self._memory.log_daily(f"[Execution] {prompt[:200]}\n {result[:500]}")
except Exception as e: except Exception as e:
log.warning("Failed to log execution to memory: %s", e) log.warning("Failed to log execution to memory: %s", e)

View File

@ -125,7 +125,7 @@ async def get_tasks_by_company():
data = await get_tasks() data = await get_tasks()
by_company: dict[str, list] = {} by_company: dict[str, list] = {}
for task in data.get("tasks", []): for task in data.get("tasks", []):
company = task["custom_fields"].get("Client") or "Unassigned" company = task["custom_fields"].get("Customer") or "Unassigned"
by_company.setdefault(company, []).append(task) by_company.setdefault(company, []).append(task)
# Sort companies by task count descending # Sort companies by task count descending
@ -238,7 +238,7 @@ async def get_link_building_tasks():
in_progress_not_started.append(t) in_progress_not_started.append(t)
by_company: dict[str, list] = {} by_company: dict[str, list] = {}
for task in active_lb: for task in active_lb:
company = task["custom_fields"].get("Client") or "Unassigned" company = task["custom_fields"].get("Customer") or "Unassigned"
by_company.setdefault(company, []).append(task) by_company.setdefault(company, []).append(task)
result = { result = {
@ -320,7 +320,7 @@ async def get_need_cora_tasks():
if kw_lower not in by_keyword: if kw_lower not in by_keyword:
by_keyword[kw_lower] = { by_keyword[kw_lower] = {
"keyword": kw, "keyword": kw,
"company": t["custom_fields"].get("Client") or "Unassigned", "company": t["custom_fields"].get("Customer") or "Unassigned",
"due_date": t.get("due_date"), "due_date": t.get("due_date"),
"tasks": [], "tasks": [],
} }
@ -367,7 +367,7 @@ async def get_press_release_tasks():
by_company: dict[str, list] = {} by_company: dict[str, list] = {}
for task in pr_tasks: for task in pr_tasks:
company = task["custom_fields"].get("Client") or "Unassigned" company = task["custom_fields"].get("Customer") or "Unassigned"
by_company.setdefault(company, []).append(task) by_company.setdefault(company, []).append(task)
return { return {
@ -562,15 +562,6 @@ async def force_loop_run():
return {"status": "ok", "message": "Force pulse sent to heartbeat and poll loops"} return {"status": "ok", "message": "Force pulse sent to heartbeat and poll loops"}
@router.post("/system/briefing/force")
async def force_briefing():
"""Force the morning briefing to send now (won't block tomorrow's)."""
if not _scheduler:
return {"status": "error", "message": "Scheduler not available"}
_scheduler.force_briefing()
return {"status": "ok", "message": "Briefing force-triggered"}
@router.post("/cache/clear") @router.post("/cache/clear")
async def clear_cache(): async def clear_cache():
"""Clear the ClickUp data cache.""" """Clear the ClickUp data cache."""

View File

@ -31,7 +31,6 @@ class ClickUpTask:
list_name: str = "" list_name: str = ""
tags: list[str] = field(default_factory=list) tags: list[str] = field(default_factory=list)
date_done: str = "" date_done: str = ""
date_updated: str = ""
@classmethod @classmethod
def from_api(cls, data: dict, task_type_field_name: str = "Task Type") -> ClickUpTask: def from_api(cls, data: dict, task_type_field_name: str = "Task Type") -> ClickUpTask:
@ -68,9 +67,6 @@ class ClickUpTask:
raw_done = data.get("date_done") or data.get("date_closed") raw_done = data.get("date_done") or data.get("date_closed")
date_done = str(raw_done) if raw_done else "" date_done = str(raw_done) if raw_done else ""
raw_updated = data.get("date_updated")
date_updated = str(raw_updated) if raw_updated else ""
return cls( return cls(
id=data["id"], id=data["id"],
name=data.get("name", ""), name=data.get("name", ""),
@ -84,7 +80,6 @@ class ClickUpTask:
list_name=data.get("list", {}).get("name", ""), list_name=data.get("list", {}).get("name", ""),
tags=tags, tags=tags,
date_done=date_done, date_done=date_done,
date_updated=date_updated,
) )
@ -419,176 +414,6 @@ class ClickUpClient:
log.info("Created custom field '%s' (%s) on list %s", name, field_type, list_id) log.info("Created custom field '%s' (%s) on list %s", name, field_type, list_id)
return result return result
def get_task(self, task_id: str) -> ClickUpTask:
"""Fetch a single task by ID."""
resp = self._client.get(f"/task/{task_id}")
resp.raise_for_status()
return ClickUpTask.from_api(resp.json(), self._task_type_field_name)
def set_custom_field_by_name(
self, task_id: str, field_name: str, value: Any
) -> bool:
"""Set a custom field by its human-readable name.
Looks up the field ID from the task's list, then sets the value.
Falls back gracefully if the field doesn't exist.
"""
try:
task_data = self._client.get(f"/task/{task_id}").json()
list_id = task_data.get("list", {}).get("id", "")
if not list_id:
log.warning("Could not determine list_id for task %s", task_id)
return False
fields = self.get_custom_fields(list_id)
field_id = None
for f in fields:
if f.get("name") == field_name:
field_id = f["id"]
break
if not field_id:
log.warning("Field '%s' not found in list %s", field_name, list_id)
return False
return self.set_custom_field_value(task_id, field_id, value)
except Exception as e:
log.error("Failed to set field '%s' on task %s: %s", field_name, task_id, e)
return False
def set_custom_field_smart(
self, task_id: str, list_id: str, field_name: str, value: str
) -> bool:
"""Set a custom field by name, auto-resolving dropdown option UUIDs.
For dropdown fields, *value* is matched against option names
(case-insensitive). For all other field types, *value* is passed through.
"""
try:
fields = self.get_custom_fields(list_id)
target = None
for f in fields:
if f.get("name") == field_name:
target = f
break
if not target:
log.warning("Field '%s' not found in list %s", field_name, list_id)
return False
field_id = target["id"]
resolved = value
if target.get("type") == "drop_down":
options = target.get("type_config", {}).get("options", [])
for opt in options:
if opt.get("name", "").lower() == value.lower():
resolved = opt["id"]
break
else:
log.warning(
"Dropdown option '%s' not found for field '%s'",
value,
field_name,
)
return False
return self.set_custom_field_value(task_id, field_id, resolved)
except Exception as e:
log.error(
"Failed to set field '%s' on task %s: %s", field_name, task_id, e
)
return False
def get_custom_field_by_name(self, task_id: str, field_name: str) -> Any:
"""Read a custom field value from a task by field name.
Fetches the task and looks up the field value from custom_fields.
Returns None if not found.
"""
try:
task = self.get_task(task_id)
return task.custom_fields.get(field_name)
except Exception as e:
log.warning("Failed to read field '%s' from task %s: %s", field_name, task_id, e)
return None
def create_task(
self,
list_id: str,
name: str,
description: str = "",
status: str = "to do",
due_date: int | None = None,
tags: list[str] | None = None,
custom_fields: list[dict] | None = None,
priority: int | None = None,
assignees: list[int] | None = None,
time_estimate: int | None = None,
) -> dict:
"""Create a new task in a ClickUp list.
Args:
list_id: The list to create the task in.
name: Task name.
description: Task description (markdown supported).
status: Initial status (default "to do").
due_date: Due date as Unix timestamp in milliseconds.
tags: List of tag names to apply.
custom_fields: List of custom field dicts ({"id": ..., "value": ...}).
priority: 1=Urgent, 2=High, 3=Normal, 4=Low.
assignees: List of ClickUp user IDs.
time_estimate: Time estimate in milliseconds.
Returns:
API response dict containing task id, url, etc.
"""
payload: dict[str, Any] = {"name": name, "status": status}
if description:
payload["description"] = description
if due_date is not None:
payload["due_date"] = due_date
if tags:
payload["tags"] = tags
if custom_fields:
payload["custom_fields"] = custom_fields
if priority is not None:
payload["priority"] = priority
if assignees:
payload["assignees"] = assignees
if time_estimate is not None:
payload["time_estimate"] = time_estimate
def _call():
resp = self._client.post(f"/list/{list_id}/task", json=payload)
resp.raise_for_status()
return resp.json()
result = self._retry(_call)
log.info("Created task '%s' in list %s (id: %s)", name, list_id, result.get("id"))
return result
def find_list_in_folder(
self, space_id: str, folder_name: str, list_name: str = "Overall"
) -> str | None:
"""Find a list within a named folder in a space.
Args:
space_id: ClickUp space ID.
folder_name: Folder name to match (case-insensitive).
list_name: List name within the folder (default "Overall").
Returns:
The list_id if found, or None.
"""
folders = self.get_folders(space_id)
for folder in folders:
if folder["name"].lower() == folder_name.lower():
for lst in folder["lists"]:
if lst["name"].lower() == list_name.lower():
return lst["id"]
return None
def discover_field_filter(self, list_id: str, field_name: str) -> dict[str, Any] | None: def discover_field_filter(self, list_id: str, field_name: str) -> dict[str, Any] | None:
"""Discover a custom field's UUID and dropdown option map. """Discover a custom field's UUID and dropdown option map.

View File

@ -43,13 +43,11 @@ class ClickUpConfig:
poll_interval_minutes: int = 20 poll_interval_minutes: int = 20
poll_statuses: list[str] = field(default_factory=lambda: ["to do"]) poll_statuses: list[str] = field(default_factory=lambda: ["to do"])
review_status: str = "internal review" review_status: str = "internal review"
pr_review_status: str = "pr needs review"
in_progress_status: str = "in progress" in_progress_status: str = "in progress"
automation_status: str = "automation underway" automation_status: str = "automation underway"
error_status: str = "error" error_status: str = "error"
task_type_field_name: str = "Work Category" task_type_field_name: str = "Work Category"
default_auto_execute: bool = False default_auto_execute: bool = False
poll_task_types: list[str] = field(default_factory=list)
skill_map: dict = field(default_factory=dict) skill_map: dict = field(default_factory=dict)
enabled: bool = False enabled: bool = False
@ -89,7 +87,6 @@ class AutoCoraConfig:
cora_categories: list[str] = field( cora_categories: list[str] = field(
default_factory=lambda: ["Content Creation", "On Page Optimization", "Link Building"] default_factory=lambda: ["Content Creation", "On Page Optimization", "Link Building"]
) )
cora_human_inbox: str = "" # e.g. "Z:/Cora-For-Human"
@dataclass @dataclass
@ -98,39 +95,6 @@ class ApiBudgetConfig:
alert_threshold: float = 0.8 # alert at 80% of limit alert_threshold: float = 0.8 # alert at 80% of limit
@dataclass
class TimeoutConfig:
execution_brain: int = 2700 # 45 minutes
blm: int = 1800 # 30 minutes
@dataclass
class ContentConfig:
cora_inbox: str = "" # e.g. "Z:/content-cora-inbox"
outline_dir: str = "" # e.g. "Z:/content-outlines"
company_capabilities_default: str = (
"All certifications and licenses need to be verified on the company's website."
)
@dataclass
class NtfyChannelConfig:
name: str = ""
topic_env_var: str = "" # env var name holding the topic string
server: str = "https://ntfy.sh"
categories: list[str] = field(default_factory=list)
include_patterns: list[str] = field(default_factory=list)
exclude_patterns: list[str] = field(default_factory=list)
priority: str = "high" # min / low / default / high / urgent
tags: str = "" # comma-separated emoji shortcodes
@dataclass
class NtfyConfig:
enabled: bool = False
channels: list[NtfyChannelConfig] = field(default_factory=list)
@dataclass @dataclass
class AgentConfig: class AgentConfig:
"""Per-agent configuration for multi-agent support.""" """Per-agent configuration for multi-agent support."""
@ -162,9 +126,6 @@ class Config:
link_building: LinkBuildingConfig = field(default_factory=LinkBuildingConfig) link_building: LinkBuildingConfig = field(default_factory=LinkBuildingConfig)
autocora: AutoCoraConfig = field(default_factory=AutoCoraConfig) autocora: AutoCoraConfig = field(default_factory=AutoCoraConfig)
api_budget: ApiBudgetConfig = field(default_factory=ApiBudgetConfig) api_budget: ApiBudgetConfig = field(default_factory=ApiBudgetConfig)
content: ContentConfig = field(default_factory=ContentConfig)
timeouts: TimeoutConfig = field(default_factory=TimeoutConfig)
ntfy: NtfyConfig = field(default_factory=NtfyConfig)
agents: list[AgentConfig] = field(default_factory=lambda: [AgentConfig()]) agents: list[AgentConfig] = field(default_factory=lambda: [AgentConfig()])
# Derived paths # Derived paths
@ -224,28 +185,6 @@ def load_config() -> Config:
for k, v in data["api_budget"].items(): for k, v in data["api_budget"].items():
if hasattr(cfg.api_budget, k): if hasattr(cfg.api_budget, k):
setattr(cfg.api_budget, k, v) setattr(cfg.api_budget, k, v)
if "content" in data and isinstance(data["content"], dict):
for k, v in data["content"].items():
if hasattr(cfg.content, k):
setattr(cfg.content, k, v)
if "timeouts" in data and isinstance(data["timeouts"], dict):
for k, v in data["timeouts"].items():
if hasattr(cfg.timeouts, k):
setattr(cfg.timeouts, k, int(v))
# ntfy push notifications
if "ntfy" in data and isinstance(data["ntfy"], dict):
ntfy_data = data["ntfy"]
cfg.ntfy.enabled = ntfy_data.get("enabled", False)
if "channels" in ntfy_data and isinstance(ntfy_data["channels"], list):
cfg.ntfy.channels = []
for ch_data in ntfy_data["channels"]:
if isinstance(ch_data, dict):
ch = NtfyChannelConfig()
for k, v in ch_data.items():
if hasattr(ch, k):
setattr(ch, k, v)
cfg.ntfy.channels.append(ch)
# Multi-agent configs # Multi-agent configs
if "agents" in data and isinstance(data["agents"], list): if "agents" in data and isinstance(data["agents"], list):
@ -299,12 +238,6 @@ def load_config() -> Config:
if blm_dir := os.getenv("BLM_DIR"): if blm_dir := os.getenv("BLM_DIR"):
cfg.link_building.blm_dir = blm_dir cfg.link_building.blm_dir = blm_dir
# Timeout env var overrides (seconds)
if t := os.getenv("CHEDDAH_TIMEOUT_EXECUTION_BRAIN"):
cfg.timeouts.execution_brain = int(t)
if t := os.getenv("CHEDDAH_TIMEOUT_BLM"):
cfg.timeouts.blm = int(t)
# Ensure data directories exist # Ensure data directories exist
cfg.data_dir.mkdir(parents=True, exist_ok=True) cfg.data_dir.mkdir(parents=True, exist_ok=True)
(cfg.data_dir / "uploads").mkdir(exist_ok=True) (cfg.data_dir / "uploads").mkdir(exist_ok=True)

View File

@ -156,8 +156,6 @@ class LLMAdapter:
working_dir: str | None = None, working_dir: str | None = None,
tools: str = "Bash,Read,Edit,Write,Glob,Grep", tools: str = "Bash,Read,Edit,Write,Glob,Grep",
model: str | None = None, model: str | None = None,
skip_permissions: bool = False,
timeout: int = 2700,
) -> str: ) -> str:
"""Execution brain: calls Claude Code CLI with full tool access. """Execution brain: calls Claude Code CLI with full tool access.
@ -167,9 +165,6 @@ class LLMAdapter:
Args: Args:
tools: Comma-separated Claude Code tool names (default: standard set). tools: Comma-separated Claude Code tool names (default: standard set).
model: Override the CLI model (e.g. "claude-sonnet-4.5"). model: Override the CLI model (e.g. "claude-sonnet-4.5").
skip_permissions: If True, append --dangerously-skip-permissions to
the CLI invocation (used for automated pipelines).
timeout: Max seconds to wait for CLI completion (default: 2700 / 45 min).
""" """
claude_bin = shutil.which("claude") claude_bin = shutil.which("claude")
if not claude_bin: if not claude_bin:
@ -193,8 +188,6 @@ class LLMAdapter:
cmd.extend(["--model", model]) cmd.extend(["--model", model])
if system_prompt: if system_prompt:
cmd.extend(["--system-prompt", system_prompt]) cmd.extend(["--system-prompt", system_prompt])
if skip_permissions:
cmd.append("--dangerously-skip-permissions")
log.debug("Execution brain cmd: %s", " ".join(cmd[:6]) + "...") log.debug("Execution brain cmd: %s", " ".join(cmd[:6]) + "...")
@ -220,11 +213,10 @@ class LLMAdapter:
) )
try: try:
stdout, stderr = proc.communicate(input=prompt, timeout=timeout) stdout, stderr = proc.communicate(input=prompt, timeout=300)
except subprocess.TimeoutExpired: except subprocess.TimeoutExpired:
proc.kill() proc.kill()
minutes = timeout // 60 return "Error: Claude Code execution timed out after 5 minutes."
return f"Error: Claude Code execution timed out after {minutes} minutes."
if proc.returncode != 0: if proc.returncode != 0:
return f"Execution error: {stderr or 'unknown error'}" return f"Execution error: {stderr or 'unknown error'}"
@ -363,14 +355,9 @@ class LLMAdapter:
except Exception as e: except Exception as e:
if not has_yielded and attempt < max_retries and _is_retryable_error(e): if not has_yielded and attempt < max_retries and _is_retryable_error(e):
wait = 2**attempt wait = 2 ** attempt
log.warning( log.warning("Retryable LLM error (attempt %d/%d), retrying in %ds: %s",
"Retryable LLM error (attempt %d/%d), retrying in %ds: %s", attempt + 1, max_retries + 1, wait, e)
attempt + 1,
max_retries + 1,
wait,
e,
)
time.sleep(wait) time.sleep(wait)
continue continue
yield {"type": "text", "content": _friendly_error(e, self.provider)} yield {"type": "text", "content": _friendly_error(e, self.provider)}

View File

@ -1,175 +0,0 @@
"""ntfy.sh push notification sender.
Subscribes to the NotificationBus and routes notifications to ntfy.sh
topics based on category and message-pattern matching.
"""
from __future__ import annotations
import hashlib
import logging
import re
import threading
from dataclasses import dataclass, field
from datetime import date
import httpx
log = logging.getLogger(__name__)


@dataclass
class NtfyChannel:
    """One ntfy topic with routing rules."""

    name: str
    server: str
    topic: str
    categories: list[str]
    include_patterns: list[str] = field(default_factory=list)
    exclude_patterns: list[str] = field(default_factory=list)
    priority: str = "high"
    tags: str = ""

    def accepts(self, message: str, category: str) -> bool:
        """Return True if this channel should receive the notification."""
        if category not in self.categories:
            return False
        if self.exclude_patterns:
            for pat in self.exclude_patterns:
                if re.search(pat, message, re.IGNORECASE):
                    return False
        if self.include_patterns:
            return any(
                re.search(pat, message, re.IGNORECASE)
                for pat in self.include_patterns
            )
        return True  # no include_patterns = accept all matching categories
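
The routing rules in `accepts` apply in a fixed order: the category gate first, then exclude patterns as a veto, then include patterns as an allow-list. A minimal sketch of that decision as a standalone function follows. The name `should_route` and the sample categories and patterns are assumptions made for this illustration; they are not part of the repository.

```python
import re


def should_route(message, category, categories, include=(), exclude=()):
    """Hypothetical standalone version of the accepts() decision."""
    # 1. The category must be one the channel listens to.
    if category not in categories:
        return False
    # 2. Any matching exclude pattern vetoes the message.
    if any(re.search(p, message, re.IGNORECASE) for p in exclude):
        return False
    # 3. If include patterns exist, at least one must match.
    if include:
        return any(re.search(p, message, re.IGNORECASE) for p in include)
    # 4. No include patterns: accept everything in the category.
    return True


# Sample values below are invented for illustration only.
rules = dict(categories=["clickup"], include=[r"error|failed"], exclude=[r"test"])
print(should_route("Task FAILED: timeout", "clickup", **rules))     # True
print(should_route("Task failed in test run", "clickup", **rules))  # False
print(should_route("Task failed", "info", **rules))                 # False
```

Because exclude patterns are checked before include patterns, a message that matches both is dropped.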


class NtfyNotifier:
    """Posts notifications to ntfy.sh topics."""

    def __init__(
        self,
        channels: list[NtfyChannel],
        *,
        daily_cap: int = 200,
    ):
        self._channels = [ch for ch in channels if ch.topic]
        self._daily_cap = daily_cap
        self._lock = threading.Lock()
        # dedup: set of hash(channel.name + message), persists for process lifetime
        self._sent: set[str] = set()
        # daily cap tracking
        self._daily_count = 0
        self._daily_date = ""
        # 429 backoff: date string when rate-limited
        self._rate_limited_until = ""
        if self._channels:
            log.info(
                "ntfy notifier initialized with %d channel(s): %s",
                len(self._channels),
                ", ".join(ch.name for ch in self._channels),
            )

    @property
    def enabled(self) -> bool:
        return bool(self._channels)

    def _today(self) -> str:
        return date.today().isoformat()

    def _check_and_track(self, channel_name: str, message: str) -> bool:
        """Return True if this message should be sent. Updates internal state."""
        today = self._today()
        with self._lock:
            # 429 backoff: skip all sends for rest of day
            if self._rate_limited_until == today:
                return False
            # Reset daily counter on date rollover (but keep dedup memory)
            if self._daily_date != today:
                self._daily_date = today
                self._daily_count = 0
                self._rate_limited_until = ""
            # Daily cap check
            if self._daily_count >= self._daily_cap:
                return False
            # Dedup check: once sent, never send the same message again
            # (until process restart)
            key = hashlib.md5(
                (channel_name + "\0" + message).encode()
            ).hexdigest()
            if key in self._sent:
                log.info(
                    "ntfy dedup: suppressed duplicate to '%s'", channel_name,
                )
                return False
            # All checks passed: record send
            self._sent.add(key)
            self._daily_count += 1
            if self._daily_count == self._daily_cap:
                log.warning(
                    "ntfy daily cap reached (%d). No more sends today.",
                    self._daily_cap,
                )
            return True

    def _mark_rate_limited(self) -> None:
        """Flag that we got a 429: suppress all sends for rest of day."""
        with self._lock:
            self._rate_limited_until = self._today()
        log.warning("ntfy 429 received. Suppressing all sends for rest of day.")

    def notify(self, message: str, category: str) -> None:
        """Route a notification to matching ntfy channels.

        This is the callback signature expected by NotificationBus.subscribe().
        Each matching channel posts in a daemon thread so the notification
        pipeline is never blocked.
        """
        for channel in self._channels:
            if channel.accepts(message, category):
                if not self._check_and_track(channel.name, message):
                    continue
                t = threading.Thread(
                    target=self._post,
                    args=(channel, message, category),
                    daemon=True,
                )
                t.start()

    def _post(self, channel: NtfyChannel, message: str, category: str) -> None:
        """Send a notification to an ntfy topic. Fire-and-forget."""
        url = f"{channel.server.rstrip('/')}/{channel.topic}"
        headers: dict[str, str] = {
            "Title": f"CheddahBot [{category}]",
            "Priority": channel.priority,
        }
        if channel.tags:
            headers["Tags"] = channel.tags
        try:
            resp = httpx.post(
                url,
                content=message.encode("utf-8"),
                headers=headers,
                timeout=10.0,
            )
            if resp.status_code == 429:
                self._mark_rate_limited()
            elif resp.status_code >= 400:
                log.warning(
                    "ntfy '%s' returned %d: %s",
                    channel.name, resp.status_code, resp.text[:200],
                )
            else:
                log.debug("ntfy notification sent to '%s'", channel.name)
        except httpx.HTTPError as e:
            log.warning("ntfy '%s' failed: %s", channel.name, e)
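
The send gate in `_check_and_track` combines a daily cap that resets on date rollover with a dedup set that does not reset. The sketch below isolates those two checks so the interaction can be exercised directly. It is a simplified stand-in, not the module's code: the class name `SendGate` and the explicit `today` argument are hypothetical (the date is passed in so rollover can be simulated), and the 429 backoff is left out.

```python
import hashlib
import threading


class SendGate:
    """Hypothetical, simplified stand-in for the dedup and daily-cap checks."""

    def __init__(self, daily_cap=200):
        self._daily_cap = daily_cap
        self._lock = threading.Lock()
        self._sent = set()       # hashes of (channel, message) already sent
        self._daily_count = 0
        self._daily_date = ""

    def allow(self, channel_name, message, today):
        """Return True if the message may be sent; `today` is an ISO date string."""
        with self._lock:
            # Reset the counter when the date changes, but keep dedup memory.
            if self._daily_date != today:
                self._daily_date = today
                self._daily_count = 0
            if self._daily_count >= self._daily_cap:
                return False
            # The NUL separator keeps ("ab", "c") and ("a", "bc") distinct.
            key = hashlib.md5((channel_name + "\0" + message).encode()).hexdigest()
            if key in self._sent:
                return False
            self._sent.add(key)
            self._daily_count += 1
            return True


gate = SendGate(daily_cap=2)
print(gate.allow("ops", "disk full", "2024-01-01"))  # True: first send
print(gate.allow("ops", "disk full", "2024-01-01"))  # False: duplicate
print(gate.allow("ops", "cpu high", "2024-01-01"))   # True: second send, cap reached
print(gate.allow("ops", "ram high", "2024-01-01"))   # False: over the daily cap
print(gate.allow("ops", "ram high", "2024-01-02"))   # True: counter reset on new date
print(gate.allow("ops", "disk full", "2024-01-02"))  # False: dedup survives rollover
```

A message rejected by the cap is not added to the dedup set, so it can still go out after the counter resets. A message that was already sent stays suppressed until the process restarts.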

File diff suppressed because it is too large

@ -1,619 +0,0 @@
/* CheddahBot Dark Theme */
:root {
--bg-primary: #0d1117;
--bg-surface: #161b22;
--bg-surface-hover: #1c2129;
--bg-input: #0d1117;
--text-primary: #e6edf3;
--text-secondary: #8b949e;
--text-muted: #484f58;
--accent: #2dd4bf;
--accent-dim: #134e4a;
--border: #30363d;
--success: #3fb950;
--error: #f85149;
--warning: #d29922;
--font-sans: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif;
--font-mono: 'JetBrains Mono', 'Fira Code', 'Cascadia Code', monospace;
--radius: 8px;
--sidebar-width: 280px;
}
* { margin: 0; padding: 0; box-sizing: border-box; }
html, body {
height: 100%;
font-family: var(--font-sans);
font-size: 15px;
line-height: 1.5;
color: var(--text-primary);
background: var(--bg-primary);
overflow: hidden;
}
/* Top Navigation */
.top-nav {
display: flex;
align-items: center;
gap: 24px;
padding: 0 20px;
height: 48px;
background: var(--bg-surface);
border-bottom: 1px solid var(--border);
flex-shrink: 0;
}
.nav-brand {
font-weight: 700;
font-size: 1.1em;
color: var(--accent);
}
.nav-links { display: flex; gap: 4px; }
.nav-link {
color: var(--text-secondary);
text-decoration: none;
padding: 6px 14px;
border-radius: var(--radius);
font-size: 0.9em;
transition: background 0.15s, color 0.15s;
}
.nav-link:hover { background: var(--bg-surface-hover); color: var(--text-primary); }
.nav-link.active { color: var(--accent); background: var(--accent-dim); }
/* Main content area */
.main-content {
height: calc(100vh - 48px);
overflow: hidden;
}
/* ─── Chat Layout ─── */
.chat-layout {
display: flex;
height: 100%;
}
/* Sidebar */
.chat-sidebar {
width: var(--sidebar-width);
min-width: var(--sidebar-width);
background: var(--bg-surface);
border-right: 1px solid var(--border);
display: flex;
flex-direction: column;
padding: 12px;
gap: 8px;
overflow-y: auto;
flex-shrink: 0;
}
.sidebar-header {
display: flex;
justify-content: space-between;
align-items: center;
}
.sidebar-header h3 { font-size: 0.85em; color: var(--text-secondary); text-transform: uppercase; letter-spacing: 0.05em; }
.sidebar-toggle {
display: none;
background: none;
border: none;
color: var(--text-secondary);
font-size: 1.2em;
cursor: pointer;
}
.sidebar-open-btn {
display: none;
position: fixed;
top: 56px;
left: 8px;
z-index: 20;
background: var(--bg-surface);
border: 1px solid var(--border);
color: var(--text-primary);
padding: 6px 10px;
border-radius: var(--radius);
cursor: pointer;
font-size: 1.2em;
}
.sidebar-divider {
height: 1px;
background: var(--border);
margin: 4px 0;
}
.agent-selector { display: flex; flex-direction: column; gap: 4px; }
.agent-btn {
padding: 8px 12px;
background: transparent;
border: 1px solid var(--border);
border-radius: var(--radius);
color: var(--text-primary);
cursor: pointer;
text-align: left;
font-size: 0.9em;
transition: border-color 0.15s, background 0.15s;
}
.agent-btn:hover { background: var(--bg-surface-hover); }
.agent-btn.active { border-color: var(--accent); background: var(--accent-dim); }
.btn-new-chat {
width: 100%;
padding: 8px;
background: var(--accent-dim);
border: 1px solid var(--accent);
border-radius: var(--radius);
color: var(--accent);
cursor: pointer;
font-size: 0.9em;
transition: background 0.15s;
}
.btn-new-chat:hover { background: var(--accent); color: var(--bg-primary); }
.chat-sidebar h3 {
font-size: 0.8em;
color: var(--text-secondary);
text-transform: uppercase;
letter-spacing: 0.05em;
margin-top: 8px;
}
.conv-btn {
display: block;
width: 100%;
padding: 8px 10px;
background: transparent;
border: 1px solid transparent;
border-radius: var(--radius);
color: var(--text-primary);
cursor: pointer;
text-align: left;
font-size: 0.85em;
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
transition: background 0.15s;
}
.conv-btn:hover { background: var(--bg-surface-hover); }
.conv-btn.active { border-color: var(--accent); background: var(--accent-dim); }
/* Chat main area */
.chat-main {
flex: 1;
display: flex;
flex-direction: column;
min-width: 0;
}
/* Status bar */
.status-bar {
display: flex;
gap: 16px;
padding: 8px 20px;
font-size: 0.8em;
color: var(--text-secondary);
border-bottom: 1px solid var(--border);
background: var(--bg-surface);
flex-shrink: 0;
}
.status-item strong { color: var(--text-primary); }
.text-ok { color: var(--success) !important; }
.text-err { color: var(--error) !important; }
/* Notification banner */
.notification-banner {
margin: 8px 20px 0;
padding: 10px 16px;
background: var(--bg-surface);
border: 1px solid var(--accent-dim);
border-radius: var(--radius);
font-size: 0.9em;
color: var(--accent);
}
/* Messages area */
.chat-messages {
flex: 1;
overflow-y: auto;
padding: 16px 20px;
display: flex;
flex-direction: column;
gap: 12px;
}
.message {
display: flex;
gap: 10px;
max-width: 85%;
animation: fadeIn 0.2s ease-out;
}
@keyframes fadeIn {
from { opacity: 0; transform: translateY(4px); }
to { opacity: 1; transform: translateY(0); }
}
.message.user { align-self: flex-end; flex-direction: row-reverse; }
.message.assistant { align-self: flex-start; }
.message-avatar {
width: 32px;
height: 32px;
border-radius: 50%;
display: flex;
align-items: center;
justify-content: center;
font-size: 0.7em;
font-weight: 700;
flex-shrink: 0;
}
.message.user .message-avatar { background: var(--accent-dim); color: var(--accent); }
.message.assistant .message-avatar { background: var(--bg-surface-hover); color: var(--text-secondary); }
.message-body {
background: var(--bg-surface);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 10px 14px;
min-width: 0;
}
.message.user .message-body { background: var(--accent-dim); border-color: var(--accent); }
.message-content {
word-wrap: break-word;
overflow-wrap: break-word;
}
/* Markdown rendering in messages */
.message-content p { margin: 0.4em 0; }
.message-content p:first-child { margin-top: 0; }
.message-content p:last-child { margin-bottom: 0; }
.message-content pre {
background: var(--bg-primary);
border: 1px solid var(--border);
border-radius: 4px;
padding: 10px;
overflow-x: auto;
font-family: var(--font-mono);
font-size: 0.9em;
margin: 0.5em 0;
}
.message-content code {
font-family: var(--font-mono);
font-size: 0.9em;
background: var(--bg-primary);
padding: 2px 5px;
border-radius: 3px;
}
.message-content pre code { background: none; padding: 0; }
.message-content ul, .message-content ol { margin: 0.4em 0; padding-left: 1.5em; }
.message-content a { color: var(--accent); }
.message-content blockquote {
border-left: 3px solid var(--accent);
padding-left: 12px;
color: var(--text-secondary);
margin: 0.5em 0;
}
.message-content table { border-collapse: collapse; margin: 0.5em 0; }
.message-content th, .message-content td {
border: 1px solid var(--border);
padding: 6px 10px;
text-align: left;
}
.message-content th { background: var(--bg-surface-hover); }
/* Chat input area */
.chat-input-area {
padding: 12px 20px;
border-top: 1px solid var(--border);
background: var(--bg-surface);
flex-shrink: 0;
}
.input-row {
display: flex;
align-items: flex-end;
gap: 8px;
}
#chat-input {
flex: 1;
background: var(--bg-input);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 10px 14px;
color: var(--text-primary);
font-family: var(--font-sans);
font-size: 15px;
resize: none;
max-height: 200px;
line-height: 1.4;
}
#chat-input:focus { outline: none; border-color: var(--accent); }
#chat-input::placeholder { color: var(--text-muted); }
.file-upload-btn {
padding: 8px 10px;
cursor: pointer;
font-size: 1.2em;
color: var(--text-secondary);
transition: color 0.15s;
flex-shrink: 0;
}
.file-upload-btn:hover { color: var(--accent); }
.send-btn {
padding: 8px 14px;
background: var(--accent);
border: none;
border-radius: var(--radius);
color: var(--bg-primary);
font-size: 1.1em;
cursor: pointer;
flex-shrink: 0;
transition: opacity 0.15s;
}
.send-btn:hover { opacity: 0.85; }
.file-preview {
margin-top: 6px;
font-size: 0.85em;
color: var(--text-secondary);
}
.file-preview .file-tag {
display: inline-block;
background: var(--bg-primary);
border: 1px solid var(--border);
border-radius: 4px;
padding: 2px 8px;
margin-right: 6px;
}
/* ─── Dashboard Layout ─── */
.dashboard-layout {
display: flex;
flex-direction: column;
gap: 16px;
padding: 16px 20px;
height: 100%;
overflow-y: auto;
}
.panel {
background: var(--bg-surface);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 16px;
}
.panel-title {
font-size: 1.1em;
font-weight: 600;
margin-bottom: 12px;
color: var(--accent);
}
.panel-section {
margin-bottom: 16px;
}
.panel-section h3 {
font-size: 0.85em;
color: var(--text-secondary);
text-transform: uppercase;
letter-spacing: 0.05em;
margin-bottom: 8px;
}
/* Loop health grid */
.loop-grid {
display: flex;
flex-wrap: wrap;
gap: 8px;
}
.loop-badge {
display: flex;
flex-direction: column;
align-items: center;
padding: 8px 12px;
border-radius: var(--radius);
font-size: 0.8em;
min-width: 90px;
border: 1px solid var(--border);
}
.loop-name { font-weight: 600; }
.loop-ago { color: var(--text-secondary); font-size: 0.85em; }
.badge-ok { border-color: var(--success); background: rgba(63, 185, 80, 0.1); }
.badge-ok .loop-name { color: var(--success); }
.badge-warn { border-color: var(--warning); background: rgba(210, 153, 34, 0.1); }
.badge-warn .loop-name { color: var(--warning); }
.badge-err { border-color: var(--error); background: rgba(248, 81, 73, 0.1); }
.badge-err .loop-name { color: var(--error); }
.badge-muted { border-color: var(--text-muted); }
.badge-muted .loop-name { color: var(--text-muted); }
/* Active executions */
.exec-list { display: flex; flex-direction: column; gap: 6px; }
.exec-item {
display: flex;
gap: 12px;
padding: 6px 10px;
background: var(--bg-primary);
border-radius: 4px;
font-size: 0.85em;
}
.exec-name { flex: 1; font-weight: 500; }
.exec-tool { color: var(--text-secondary); }
.exec-dur { color: var(--accent); font-family: var(--font-mono); }
/* Action buttons */
.action-buttons { display: flex; gap: 8px; flex-wrap: wrap; }
.btn {
padding: 8px 16px;
background: var(--bg-surface-hover);
border: 1px solid var(--border);
border-radius: var(--radius);
color: var(--text-primary);
cursor: pointer;
font-size: 0.9em;
transition: border-color 0.15s, background 0.15s;
}
.btn:hover { border-color: var(--accent); }
.btn-sm { padding: 6px 12px; font-size: 0.8em; }
/* Notification feed */
.notif-feed { display: flex; flex-direction: column; gap: 4px; max-height: 300px; overflow-y: auto; }
.notif-item {
padding: 6px 10px;
font-size: 0.85em;
border-left: 3px solid var(--border);
background: var(--bg-primary);
border-radius: 0 4px 4px 0;
}
.notif-clickup { border-left-color: var(--accent); }
.notif-info { border-left-color: var(--text-secondary); }
.notif-error { border-left-color: var(--error); }
.notif-cat {
font-weight: 600;
font-size: 0.8em;
text-transform: uppercase;
color: var(--text-secondary);
}
/* Task table */
.task-table { width: 100%; border-collapse: collapse; font-size: 0.85em; }
.task-table th, .task-table td { padding: 8px 12px; border-bottom: 1px solid var(--border); text-align: left; }
.task-table th { color: var(--text-secondary); font-weight: 600; text-transform: uppercase; font-size: 0.85em; }
.task-table a { color: var(--accent); text-decoration: none; }
.task-table a:hover { text-decoration: underline; }
.status-badge {
display: inline-block;
padding: 2px 8px;
border-radius: 4px;
font-size: 0.85em;
font-weight: 500;
}
.status-to-do { background: rgba(139, 148, 158, 0.2); color: var(--text-secondary); }
.status-in-progress, .status-automation-underway { background: rgba(45, 212, 191, 0.15); color: var(--accent); }
.status-error { background: rgba(248, 81, 73, 0.15); color: var(--error); }
.status-complete, .status-closed { background: rgba(63, 185, 80, 0.15); color: var(--success); }
.status-internal-review, .status-outline-review { background: rgba(210, 153, 34, 0.15); color: var(--warning); }
/* Pipeline groups */
.pipeline-group { margin-bottom: 16px; }
.pipeline-group h4 {
font-size: 0.9em;
margin-bottom: 8px;
padding-bottom: 4px;
border-bottom: 1px solid var(--border);
}
.pipeline-stats {
display: flex;
gap: 12px;
margin-bottom: 12px;
flex-wrap: wrap;
}
.pipeline-stat {
padding: 8px 14px;
background: var(--bg-primary);
border: 1px solid var(--border);
border-radius: var(--radius);
text-align: center;
}
.pipeline-stat .stat-count { font-size: 1.5em; font-weight: 700; color: var(--accent); }
.pipeline-stat .stat-label { font-size: 0.75em; color: var(--text-secondary); }
/* Flash messages */
.flash-msg {
position: fixed;
bottom: 20px;
right: 20px;
background: var(--accent);
color: var(--bg-primary);
padding: 10px 20px;
border-radius: var(--radius);
font-weight: 600;
font-size: 0.9em;
z-index: 100;
animation: fadeIn 0.2s ease-out, fadeOut 0.5s 2.5s ease-out forwards;
}
@keyframes fadeOut { to { opacity: 0; transform: translateY(10px); } }
/* Utility */
.text-muted { color: var(--text-muted); }
/* Typing indicator */
.typing-indicator span {
display: inline-block;
width: 6px;
height: 6px;
background: var(--text-secondary);
border-radius: 50%;
margin: 0 2px;
animation: bounce 1.2s infinite;
}
.typing-indicator span:nth-child(2) { animation-delay: 0.2s; }
.typing-indicator span:nth-child(3) { animation-delay: 0.4s; }
@keyframes bounce {
0%, 60%, 100% { transform: translateY(0); }
30% { transform: translateY(-6px); }
}
/* ─── Mobile ─── */
@media (max-width: 768px) {
.chat-sidebar {
position: fixed;
top: 48px;
left: 0;
bottom: 0;
z-index: 30;
transform: translateX(-100%);
transition: transform 0.2s ease;
width: var(--sidebar-width);
}
.chat-sidebar.open { transform: translateX(0); }
.sidebar-toggle { display: block; }
.sidebar-open-btn { display: block; }
.status-bar { flex-wrap: wrap; gap: 8px; padding: 6px 12px; font-size: 0.75em; }
.chat-messages { padding: 12px; }
.message { max-width: 95%; }
.chat-input-area { padding: 8px 12px; }
#chat-input { font-size: 16px; } /* Prevent iOS zoom */
.dashboard-layout { padding: 12px; }
.loop-grid { gap: 6px; }
.loop-badge { min-width: 70px; padding: 6px 8px; font-size: 0.75em; }
}
/* Overlay for mobile sidebar */
.sidebar-overlay {
display: none;
position: fixed;
top: 48px;
left: 0;
right: 0;
bottom: 0;
background: rgba(0, 0, 0, 0.5);
z-index: 25;
}
.sidebar-overlay.visible { display: block; }
/* Scrollbar styling */
::-webkit-scrollbar { width: 6px; }
::-webkit-scrollbar-track { background: transparent; }
::-webkit-scrollbar-thumb { background: var(--border); border-radius: 3px; }
::-webkit-scrollbar-thumb:hover { background: var(--text-muted); }

@ -1,284 +0,0 @@
/* CheddahBot Frontend JS */
// ── Session Management ──
const SESSION_KEY = 'cheddahbot_session';
function getSession() {
try { return JSON.parse(localStorage.getItem(SESSION_KEY) || '{}'); }
catch { return {}; }
}
function saveSession(data) {
const s = getSession();
Object.assign(s, data);
localStorage.setItem(SESSION_KEY, JSON.stringify(s));
}
function getActiveAgent() {
return getSession().agent_name || document.getElementById('input-agent-name')?.value || 'default';
}
// ── Agent Switching ──
function switchAgent(name) {
// Update UI
document.querySelectorAll('.agent-btn').forEach(b => {
b.classList.toggle('active', b.dataset.agent === name);
});
document.getElementById('input-agent-name').value = name;
document.getElementById('input-conv-id').value = '';
saveSession({ agent_name: name, conv_id: null });
// Clear chat and load new sidebar
document.getElementById('chat-messages').innerHTML = '';
refreshSidebar();
}
function setActiveAgent(name) {
document.querySelectorAll('.agent-btn').forEach(b => {
b.classList.toggle('active', b.dataset.agent === name);
});
const agentInput = document.getElementById('input-agent-name');
if (agentInput) agentInput.value = name;
}
// ── Sidebar ──
function refreshSidebar() {
const agent = getActiveAgent();
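// Encode the agent name so characters such as '&' or '#' cannot corrupt the query string
htmx.ajax('GET', '/chat/conversations?agent_name=' + encodeURIComponent(agent), {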
target: '#sidebar-conversations',
swap: 'innerHTML'
});
}
// ── Conversation Loading ──
function loadConversation(convId) {
const agent = getActiveAgent();
document.getElementById('input-conv-id').value = convId;
saveSession({ conv_id: convId });
// Encode both values before placing them in the URL
htmx.ajax('GET', '/chat/load/' + encodeURIComponent(convId) + '?agent_name=' + encodeURIComponent(agent), {
target: '#chat-messages',
swap: 'innerHTML'
}).then(() => {
scrollChat();
renderAllMarkdown();
});
}
// ── Chat Input ──
function handleKeydown(e) {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
document.getElementById('chat-form').requestSubmit();
}
}
function autoResize(el) {
el.style.height = 'auto';
el.style.height = Math.min(el.scrollHeight, 200) + 'px';
}
function afterSend(event) {
const input = document.getElementById('chat-input');
input.value = '';
input.style.height = 'auto';
// Clear file input and preview
const fileInput = document.querySelector('input[type="file"]');
if (fileInput) fileInput.value = '';
const preview = document.getElementById('file-preview');
if (preview) { preview.style.display = 'none'; preview.innerHTML = ''; }
scrollChat();
}
function scrollChat() {
const el = document.getElementById('chat-messages');
if (el) {
requestAnimationFrame(() => {
el.scrollTop = el.scrollHeight;
});
}
}
// ── File Upload Preview ──
function showFileNames(input) {
const preview = document.getElementById('file-preview');
if (!input.files.length) {
preview.style.display = 'none';
return;
}
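// Build the tags with textContent so file names are never parsed as HTML
preview.innerHTML = '';
for (const f of input.files) {
const tag = document.createElement('span');
tag.className = 'file-tag';
tag.textContent = f.name;
preview.appendChild(tag);
}
preview.style.display = 'block';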
}
// Drag and drop
document.addEventListener('DOMContentLoaded', () => {
const chatMain = document.querySelector('.chat-main');
if (!chatMain) return;
chatMain.addEventListener('dragover', e => {
e.preventDefault();
chatMain.style.outline = '2px dashed var(--accent)';
});
chatMain.addEventListener('dragleave', () => {
chatMain.style.outline = '';
});
chatMain.addEventListener('drop', e => {
e.preventDefault();
chatMain.style.outline = '';
const fileInput = document.querySelector('input[type="file"]');
if (fileInput && e.dataTransfer.files.length) {
fileInput.files = e.dataTransfer.files;
showFileNames(fileInput);
}
});
});
// ── SSE Streaming ──
// Handle SSE chunks for chat streaming
let streamBuffer = '';
let activeSSE = null;
document.addEventListener('htmx:sseBeforeMessage', function(e) {
// This fires for each SSE event received by htmx
});
// Watch for SSE trigger divs being added to the DOM
const observer = new MutationObserver(mutations => {
for (const m of mutations) {
for (const node of m.addedNodes) {
if (node.id === 'sse-trigger') {
setupStream(node);
}
}
}
});
document.addEventListener('DOMContentLoaded', () => {
const chatMessages = document.getElementById('chat-messages');
if (chatMessages) {
observer.observe(chatMessages, { childList: true, subtree: true });
}
});
function setupStream(triggerDiv) {
const sseUrl = triggerDiv.getAttribute('sse-connect');
if (!sseUrl) return;
// Remove the htmx SSE to manage manually
triggerDiv.remove();
const responseDiv = document.getElementById('assistant-response');
if (!responseDiv) return;
streamBuffer = '';
// Show typing indicator
responseDiv.innerHTML = '<div class="typing-indicator"><span></span><span></span><span></span></div>';
const source = new EventSource(sseUrl);
activeSSE = source;
source.addEventListener('chunk', function(e) {
if (streamBuffer === '') {
// Remove typing indicator on first chunk
responseDiv.innerHTML = '';
}
streamBuffer += e.data;
// Render markdown
try {
responseDiv.innerHTML = marked.parse(streamBuffer);
} catch {
responseDiv.textContent = streamBuffer;
}
scrollChat();
});
source.addEventListener('done', function(e) {
source.close();
activeSSE = null;
// Final markdown render
if (streamBuffer) {
try {
responseDiv.innerHTML = marked.parse(streamBuffer);
} catch {
responseDiv.textContent = streamBuffer;
}
}
streamBuffer = '';
// Update conv_id from done event data
const convId = e.data;
if (convId) {
document.getElementById('input-conv-id').value = convId;
saveSession({ conv_id: convId });
}
// Refresh sidebar
refreshSidebar();
scrollChat();
});
source.onerror = function() {
source.close();
activeSSE = null;
if (!streamBuffer) {
responseDiv.innerHTML = '<span class="text-err">Connection lost</span>';
}
};
}
// ── Markdown Rendering ──
function renderAllMarkdown() {
document.querySelectorAll('.message-content').forEach(el => {
const raw = el.textContent;
if (raw && typeof marked !== 'undefined') {
try {
el.innerHTML = marked.parse(raw);
} catch { /* keep raw text */ }
}
});
}
// ── Mobile Sidebar ──
function toggleSidebar() {
const sidebar = document.getElementById('chat-sidebar');
const overlay = document.getElementById('sidebar-overlay');
if (sidebar) {
sidebar.classList.toggle('open');
}
if (overlay) {
overlay.classList.toggle('visible');
}
}
// ── Notification Banner (chat page) ──
function setupChatNotifications() {
const banner = document.getElementById('notification-banner');
if (!banner) return;
const source = new EventSource('/sse/notifications');
let hideTimer = null;
source.addEventListener('notification', function(e) {
const notif = JSON.parse(e.data);
banner.textContent = notif.message;
banner.style.display = 'block';
// Auto-hide after 15s; restart the timer so an earlier notification's timeout cannot hide a newer one early
clearTimeout(hideTimer);
hideTimer = setTimeout(() => { banner.style.display = 'none'; }, 15000);
});
}
document.addEventListener('DOMContentLoaded', setupChatNotifications);
// ── HTMX Events ──
document.addEventListener('scrollChat', scrollChat);
document.addEventListener('htmx:afterSwap', function(e) {
if (e.target.id === 'chat-messages') {
renderAllMarkdown();
scrollChat();
}
});

@ -1,27 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{% block title %}CheddahBot{% endblock %}</title>
<link rel="stylesheet" href="/static/app.css">
<script src="https://unpkg.com/htmx.org@2.0.4"></script>
<script src="https://unpkg.com/htmx-ext-sse@2.3.0/sse.js"></script>
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
{% block head %}{% endblock %}
</head>
<body>
<nav class="top-nav">
<div class="nav-brand">CheddahBot</div>
<div class="nav-links">
<a href="/" class="nav-link {% block nav_chat_active %}{% endblock %}">Chat</a>
<a href="/dashboard" class="nav-link {% block nav_dash_active %}{% endblock %}">Dashboard</a>
</div>
</nav>
<main class="main-content">
{% block content %}{% endblock %}
</main>
<script src="/static/app.js"></script>
{% block scripts %}{% endblock %}
</body>
</html>

@ -1,111 +0,0 @@
{% extends "base.html" %}
{% block title %}Chat - CheddahBot{% endblock %}
{% block nav_chat_active %}active{% endblock %}
{% block content %}
<div class="chat-layout">
<!-- Sidebar -->
<aside class="chat-sidebar" id="chat-sidebar">
<div class="sidebar-header">
<h3>Agents</h3>
<button class="sidebar-toggle" onclick="toggleSidebar()" aria-label="Close sidebar">&#x2715;</button>
</div>
<div class="agent-selector" id="agent-selector">
{% for agent in agents %}
<button
class="agent-btn {% if agent.name == default_agent %}active{% endif %}"
data-agent="{{ agent.name }}"
onclick="switchAgent(this.dataset.agent)"
>{{ agent.display_name }}</button>
{% endfor %}
</div>
<div class="sidebar-divider"></div>
<button
class="btn btn-new-chat"
hx-post="/chat/new"
hx-vals='{"agent_name": "{{ default_agent }}"}'
hx-target="#chat-messages"
hx-swap="innerHTML"
onclick="this.setAttribute('hx-vals', JSON.stringify({agent_name: getActiveAgent()}))"
>+ New Chat</button>
<h3>History</h3>
<div id="sidebar-conversations"
hx-get="/chat/conversations?agent_name={{ default_agent }}"
hx-trigger="load"
hx-swap="innerHTML">
</div>
</aside>
<!-- Mobile sidebar toggle + overlay -->
<button class="sidebar-open-btn" onclick="toggleSidebar()" aria-label="Open sidebar">&#9776;</button>
<div id="sidebar-overlay" class="sidebar-overlay" onclick="toggleSidebar()"></div>
<!-- Chat area -->
<div class="chat-main">
<!-- Status bar -->
<div class="status-bar">
<span class="status-item">Model: <strong>{{ chat_model }}</strong></span>
<span class="status-item">Exec: <strong class="{% if exec_available %}text-ok{% else %}text-err{% endif %}">{{ "OK" if exec_available else "N/A" }}</strong></span>
<span class="status-item">ClickUp: <strong class="{% if clickup_enabled %}text-ok{% else %}text-err{% endif %}">{{ "ON" if clickup_enabled else "OFF" }}</strong></span>
</div>
<!-- Notification banner (populated by SSE) -->
<div id="notification-banner" class="notification-banner" style="display:none;"></div>
<!-- Messages -->
<div class="chat-messages" id="chat-messages">
<!-- Messages loaded here -->
</div>
<!-- Input area -->
<form id="chat-form" class="chat-input-area"
hx-post="/chat/send"
hx-target="#chat-messages"
hx-swap="beforeend"
hx-encoding="multipart/form-data"
hx-on::after-request="afterSend(event)">
<input type="hidden" name="agent_name" id="input-agent-name" value="{{ default_agent }}">
<input type="hidden" name="conv_id" id="input-conv-id" value="">
<div class="input-row">
<label class="file-upload-btn" title="Attach files">
&#x1f4ce;
<input type="file" name="files" multiple style="display:none;" onchange="showFileNames(this)">
</label>
<textarea name="text" id="chat-input" rows="1" placeholder="Type a message..."
onkeydown="handleKeydown(event)" oninput="autoResize(this)"></textarea>
<button type="submit" class="send-btn" title="Send">&#x27A4;</button>
</div>
<div id="file-preview" class="file-preview" style="display:none;"></div>
</form>
</div>
</div>
{% endblock %}
{% block scripts %}
<script>
// Initialize session state. SESSION_KEY, getSession() and saveSession(data) are defined in
// /static/app.js, which base.html loads before this block. Redeclaring the const here raises
// "Identifier 'SESSION_KEY' has already been declared" and prevents this whole script from running.
const session = getSession();
if (!session.agent_name) session.agent_name = '{{ default_agent }}';
// Restore session on load
document.addEventListener('DOMContentLoaded', function() {
if (session.agent_name) {
setActiveAgent(session.agent_name);
}
if (session.conv_id) {
loadConversation(session.conv_id);
}
// Load conversations for sidebar
refreshSidebar();
});
</script>
{% endblock %}

@ -1,174 +0,0 @@
{% extends "base.html" %}
{% block title %}Dashboard - CheddahBot{% endblock %}
{% block nav_dash_active %}active{% endblock %}
{% block content %}
<div class="dashboard-layout">
<!-- Ops Panel -->
<section class="panel" id="ops-panel">
<h2 class="panel-title">Operations</h2>
<!-- Active Executions -->
<div class="panel-section">
<h3>Active Executions</h3>
<div id="active-executions" class="exec-list">
<span class="text-muted">Loading...</span>
</div>
</div>
<!-- Loop Health -->
<div class="panel-section">
<h3>Loop Health</h3>
<div id="loop-health" class="loop-grid">
<span class="text-muted">Loading...</span>
</div>
</div>
<!-- Actions -->
<div class="panel-section">
<h3>Actions</h3>
<div class="action-buttons">
<button class="btn btn-sm"
hx-post="/api/system/loops/force"
hx-swap="none"
hx-on::after-request="showFlash('Force pulse sent')">
Force Pulse
</button>
<button class="btn btn-sm"
hx-post="/api/system/briefing/force"
hx-swap="none"
hx-on::after-request="showFlash('Briefing triggered')">
Force Briefing
</button>
<button class="btn btn-sm"
hx-post="/api/cache/clear"
hx-swap="none"
hx-on::after-request="showFlash('Cache cleared')">
Clear Cache
</button>
</div>
</div>
<!-- Notification Feed -->
<div class="panel-section">
<h3>Notifications</h3>
<div id="notification-feed" class="notif-feed">
<span class="text-muted">Waiting for notifications...</span>
</div>
</div>
</section>
<!-- Pipeline Panel -->
<section class="panel" id="pipeline-panel">
<h2 class="panel-title">Pipeline</h2>
<div id="pipeline-content"
hx-get="/dashboard/pipeline"
hx-trigger="load, every 120s"
hx-swap="innerHTML">
<span class="text-muted">Loading pipeline data...</span>
</div>
</section>
</div>
{% endblock %}
{% block scripts %}
<script>
// Connect to SSE for live loop updates
const loopSource = new EventSource('/sse/loops');
loopSource.addEventListener('loops', function(e) {
const data = JSON.parse(e.data);
renderLoopHealth(data.loops);
renderActiveExecutions(data.executions);
});
// Connect to SSE for notifications
const notifSource = new EventSource('/sse/notifications');
notifSource.addEventListener('notification', function(e) {
const notif = JSON.parse(e.data);
addNotification(notif.message, notif.category);
});
function renderLoopHealth(loops) {
const container = document.getElementById('loop-health');
if (!loops || Object.keys(loops).length === 0) {
container.innerHTML = '<span class="text-muted">No loop data</span>';
return;
}
let html = '';
const now = new Date();
for (const [name, ts] of Object.entries(loops)) {
let statusClass = 'badge-muted';
let agoText = 'never';
if (ts) {
const dt = new Date(ts);
const secs = Math.floor((now - dt) / 1000);
if (secs < 120) {
statusClass = 'badge-ok';
agoText = secs + 's ago';
} else if (secs < 600) {
statusClass = 'badge-warn';
agoText = Math.floor(secs / 60) + 'm ago';
} else {
statusClass = 'badge-err';
agoText = Math.floor(secs / 60) + 'm ago';
}
}
html += '<div class="loop-badge ' + statusClass + '">' +
'<span class="loop-name">' + name + '</span>' +
'<span class="loop-ago">' + agoText + '</span>' +
'</div>';
}
container.innerHTML = html;
}
function renderActiveExecutions(execs) {
const container = document.getElementById('active-executions');
if (!execs || Object.keys(execs).length === 0) {
container.innerHTML = '<span class="text-muted">No active executions</span>';
return;
}
let html = '';
const now = new Date();
for (const [id, info] of Object.entries(execs)) {
const started = new Date(info.started_at);
const durSecs = Math.floor((now - started) / 1000);
let dur = durSecs + 's';
if (durSecs >= 60) dur = Math.floor(durSecs / 60) + 'm ' + (durSecs % 60) + 's';
html += '<div class="exec-item">' +
'<span class="exec-name">' + info.name + '</span>' +
'<span class="exec-tool">' + info.tool + '</span>' +
'<span class="exec-dur">' + dur + '</span>' +
'</div>';
}
container.innerHTML = html;
}
let notifCount = 0;
function addNotification(message, category) {
const container = document.getElementById('notification-feed');
if (notifCount === 0) container.innerHTML = '';
notifCount++;
const div = document.createElement('div');
div.className = 'notif-item notif-' + (category || 'info');
div.innerHTML = '<span class="notif-cat">' + (category || 'info') + '</span> ' + message;
container.insertBefore(div, container.firstChild);
// Keep max 30
while (container.children.length > 30) {
container.removeChild(container.lastChild);
}
}
function showFlash(msg) {
const el = document.createElement('div');
el.className = 'flash-msg';
el.textContent = msg;
document.body.appendChild(el);
setTimeout(() => el.remove(), 3000);
}
</script>
{% endblock %}

View File

@ -1,6 +0,0 @@
<div class="message {{ role }}">
<div class="message-avatar">{% if role == 'user' %}You{% else %}CB{% endif %}</div>
<div class="message-body">
<div class="message-content">{{ content }}</div>
</div>
</div>

View File

@ -1,11 +0,0 @@
{% if conversations %}
{% for conv in conversations %}
<button class="conv-btn"
onclick="loadConversation('{{ conv.id }}')"
title="{{ conv.title or 'New Chat' }}">
{{ conv.title or 'New Chat' }}
</button>
{% endfor %}
{% else %}
<p class="text-muted">No conversations yet</p>
{% endif %}

View File

@ -1,6 +0,0 @@
{% for name, info in loops.items() %}
<div class="loop-badge {{ info.class }}">
<span class="loop-name">{{ name }}</span>
<span class="loop-ago">{{ info.ago }}</span>
</div>
{% endfor %}
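
Review note: the partial above expects each loop entry to carry a `class` and an `ago` value. A minimal sketch of how those could be computed server-side, assuming the same 120 s / 600 s thresholds that `renderLoopHealth` uses in the dashboard script, and assuming timestamps arrive as timezone-aware ISO strings. `loop_badge` is a hypothetical helper, not a function from this repository.

```python
from datetime import datetime, timezone


def loop_badge(last_run_iso, now=None):
    """Return the CSS class and 'ago' text for one loop timestamp (hypothetical helper)."""
    if not last_run_iso:
        # Loop has never reported a timestamp.
        return {"class": "badge-muted", "ago": "never"}
    now = now or datetime.now(timezone.utc)
    secs = int((now - datetime.fromisoformat(last_run_iso)).total_seconds())
    if secs < 120:
        return {"class": "badge-ok", "ago": f"{secs}s ago"}
    # Between 2 and 10 minutes is a warning; anything older is an error.
    css = "badge-warn" if secs < 600 else "badge-err"
    return {"class": css, "ago": f"{secs // 60}m ago"}
```

Passing `{name: loop_badge(ts) for name, ts in timestamps.items()}` as `loops` would give the template the two fields it reads.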

View File

@ -1,6 +0,0 @@
{% for notif in notifications %}
<div class="notif-item notif-{{ notif.category or 'info' }}">
<span class="notif-cat">{{ notif.category }}</span>
{{ notif.message }}
</div>
{% endfor %}

View File

@ -1,27 +0,0 @@
{% if tasks %}
<table class="task-table">
<thead>
<tr>
<th>Task</th>
<th>Customer</th>
<th>Status</th>
<th>Due</th>
</tr>
</thead>
<tbody>
{% for task in tasks %}
<tr>
<td>
{% if task.url %}<a href="{{ task.url }}" target="_blank" rel="noopener">{{ task.name }}</a>
{% else %}{{ task.name }}{% endif %}
</td>
<td>{{ task.custom_fields.get('Client', 'N/A') if task.custom_fields else 'N/A' }}</td>
<td><span class="status-badge status-{{ task.status|replace(' ', '-') }}">{{ task.status }}</span></td>
<td>{{ task.due_display or '-' }}</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<p class="text-muted">No tasks</p>
{% endif %}

View File

@ -105,7 +105,6 @@ class ToolRegistry:
self.db = db self.db = db
self.agent = agent self.agent = agent
self.agent_registry = None # set after multi-agent setup self.agent_registry = None # set after multi-agent setup
self.scheduler = None # set after scheduler creation
self._discover_tools() self._discover_tools()
def _discover_tools(self): def _discover_tools(self):
@ -159,13 +158,10 @@ class ToolRegistry:
"agent": self.agent, "agent": self.agent,
"memory": self.agent._memory, "memory": self.agent._memory,
"agent_registry": self.agent_registry, "agent_registry": self.agent_registry,
"scheduler": self.scheduler,
} }
# Pass scheduler-injected metadata through ctx (not LLM-visible) # Pass scheduler-injected metadata through ctx (not LLM-visible)
if "clickup_task_id" in args: if "clickup_task_id" in args:
ctx["clickup_task_id"] = args.pop("clickup_task_id") ctx["clickup_task_id"] = args.pop("clickup_task_id")
if "clickup_task_status" in args:
ctx["clickup_task_status"] = args.pop("clickup_task_status")
args["ctx"] = ctx args["ctx"] = ctx
# Filter args to only params the function accepts (plus **kwargs) # Filter args to only params the function accepts (plus **kwargs)
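
Review note: the two steps visible in this hunk (moving scheduler-injected keys out of `args` into `ctx`, then filtering `args` to what the tool function accepts) can be sketched in isolation as below. `filter_call_args` and `injected_keys` are hypothetical names for illustration; the real `ctx` also carries `config`, `db`, `agent` and so on, which this sketch leaves out.

```python
import inspect


def filter_call_args(func, args, injected_keys=("clickup_task_id",)):
    """Split injected metadata into ctx, then keep only args that func accepts."""
    args = dict(args)  # work on a copy so the caller's dict is untouched
    ctx = {}
    for key in injected_keys:
        if key in args:
            # Metadata travels through ctx rather than as a tool parameter.
            ctx[key] = args.pop(key)
    args["ctx"] = ctx

    params = inspect.signature(func).parameters
    accepts_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_kwargs:
        # A **kwargs function can take everything.
        return args
    return {k: v for k, v in args.items() if k in params}
```

With this shape, removing `clickup_task_status` from the injected keys (as this diff does) means that key would instead be dropped by the signature filter unless the tool declares it.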

View File

@ -52,15 +52,15 @@ def _get_clickup_client(ctx: dict):
def _find_qualifying_tasks(client, config, target_date: str, categories: list[str]): def _find_qualifying_tasks(client, config, target_date: str, categories: list[str]):
"""Find 'to do' tasks in cora_categories due on target_date (single day). """Find 'to do' tasks in cora_categories due on target_date.
Used when target_date is explicitly provided.
Returns list of ClickUpTask objects. Returns list of ClickUpTask objects.
""" """
space_id = config.clickup.space_id space_id = config.clickup.space_id
if not space_id: if not space_id:
return [] return []
# Parse target date to filter by due_date range (full day)
try: try:
dt = datetime.strptime(target_date, "%Y-%m-%d").replace(tzinfo=UTC) dt = datetime.strptime(target_date, "%Y-%m-%d").replace(tzinfo=UTC)
except ValueError: except ValueError:
@ -78,8 +78,10 @@ def _find_qualifying_tasks(client, config, target_date: str, categories: list[st
qualifying = [] qualifying = []
for task in tasks: for task in tasks:
# Must be in one of the cora categories
if task.task_type not in categories: if task.task_type not in categories:
continue continue
# Must have a due_date within the target day
if not task.due_date: if not task.due_date:
continue continue
try: try:
@ -93,129 +95,17 @@ def _find_qualifying_tasks(client, config, target_date: str, categories: list[st
return qualifying return qualifying
def _find_qualifying_tasks_sweep(client, config, categories: list[str]): def _find_all_todo_tasks(client, config, categories: list[str]):
"""Multi-pass sweep for qualifying tasks when no explicit date is given. """Find ALL 'to do' tasks in cora_categories (no date filter).
Pass 1: Tasks due today Used to find sibling tasks sharing the same keyword.
Pass 2: Overdue tasks tagged with current month (e.g. "feb26")
Pass 3: Tasks tagged with last month (e.g. "jan26"), still "to do"
Pass 4: Tasks due in next 2 days (look-ahead)
Deduplicates across passes by task ID.
Returns list of ClickUpTask objects.
""" """
space_id = config.clickup.space_id space_id = config.clickup.space_id
if not space_id: if not space_id:
return [] return []
now = datetime.now(UTC) tasks = client.get_tasks_from_space(space_id, statuses=["to do"])
today_start_ms = int( return [t for t in tasks if t.task_type in categories]
now.replace(hour=0, minute=0, second=0, microsecond=0).timestamp() * 1000
)
today_end_ms = today_start_ms + 24 * 60 * 60 * 1000
lookahead_end_ms = today_start_ms + 3 * 24 * 60 * 60 * 1000 # +2 days
# Current and last month tags (e.g. "feb26", "jan26")
current_month_tag = now.strftime("%b%y").lower()
# Go back one month
if now.month == 1:
last_month = now.replace(year=now.year - 1, month=12)
else:
last_month = now.replace(month=now.month - 1)
last_month_tag = last_month.strftime("%b%y").lower()
# Fetch all "to do" tasks with due dates up to lookahead
all_tasks = client.get_tasks_from_space(
space_id,
statuses=["to do"],
due_date_lt=lookahead_end_ms,
)
# Filter to cora categories
cora_tasks = [t for t in all_tasks if t.task_type in categories]
seen_ids: set[str] = set()
qualifying: list = []
def _add(task):
if task.id not in seen_ids:
seen_ids.add(task.id)
qualifying.append(task)
# Pass 1: Due today
for task in cora_tasks:
if not task.due_date:
continue
try:
due_ms = int(task.due_date)
except (ValueError, TypeError):
continue
if today_start_ms <= due_ms < today_end_ms:
_add(task)
# Pass 2: Overdue + tagged with current month
for task in cora_tasks:
if not task.due_date:
continue
try:
due_ms = int(task.due_date)
except (ValueError, TypeError):
continue
if due_ms < today_start_ms and current_month_tag in task.tags:
_add(task)
# Pass 3: Tagged with last month, still "to do"
for task in cora_tasks:
if last_month_tag in task.tags:
_add(task)
# Pass 4: Look-ahead (due in next 2 days, excluding today which was pass 1)
for task in cora_tasks:
if not task.due_date:
continue
try:
due_ms = int(task.due_date)
except (ValueError, TypeError):
continue
if today_end_ms <= due_ms < lookahead_end_ms:
_add(task)
log.info(
"AutoCora sweep: %d qualifying tasks "
"(today=%d, overdue+month=%d, last_month=%d, lookahead=%d)",
len(qualifying),
sum(1 for t in qualifying if _is_due_today(t, today_start_ms, today_end_ms)),
sum(1 for t in qualifying if _is_overdue_with_tag(t, today_start_ms, current_month_tag)),
sum(1 for t in qualifying if last_month_tag in t.tags),
sum(1 for t in qualifying if _is_lookahead(t, today_end_ms, lookahead_end_ms)),
)
return qualifying
def _is_due_today(task, start_ms, end_ms) -> bool:
try:
due = int(task.due_date)
return start_ms <= due < end_ms
except (ValueError, TypeError):
return False
def _is_overdue_with_tag(task, today_start_ms, tag) -> bool:
try:
due = int(task.due_date)
return due < today_start_ms and tag in task.tags
except (ValueError, TypeError):
return False
def _is_lookahead(task, today_end_ms, lookahead_end_ms) -> bool:
try:
due = int(task.due_date)
return today_end_ms <= due < lookahead_end_ms
except (ValueError, TypeError):
return False
def _group_by_keyword(tasks, all_tasks): def _group_by_keyword(tasks, all_tasks):
@ -245,7 +135,8 @@ def _group_by_keyword(tasks, all_tasks):
url = task.custom_fields.get("IMSURL", "") or "" url = task.custom_fields.get("IMSURL", "") or ""
url = str(url).strip() url = str(url).strip()
if not url: if not url:
url = "https://seotoollab.com/blank.html" alerts.append(f"Task '{task.name}' (id={task.id}) missing IMSURL field")
continue
kw_lower = keyword.lower() kw_lower = keyword.lower()
if kw_lower not in groups: if kw_lower not in groups:
@ -275,8 +166,7 @@ def _group_by_keyword(tasks, all_tasks):
@tool( @tool(
"submit_autocora_jobs", "submit_autocora_jobs",
"Submit Cora SEO report jobs for ClickUp tasks. Uses a multi-pass sweep " "Submit Cora SEO report jobs for ClickUp tasks due on a given date. "
"(today, overdue, last month, look-ahead) unless a specific date is given. "
"Writes job JSON files to the AutoCora shared folder queue.", "Writes job JSON files to the AutoCora shared folder queue.",
category="autocora", category="autocora",
) )
@ -284,36 +174,38 @@ def submit_autocora_jobs(target_date: str = "", ctx: dict | None = None) -> str:
"""Submit AutoCora jobs for qualifying ClickUp tasks. """Submit AutoCora jobs for qualifying ClickUp tasks.
Args: Args:
target_date: Date to check (YYYY-MM-DD). Empty = multi-pass sweep. target_date: Date to check (YYYY-MM-DD). Defaults to today.
ctx: Injected context with config, db, etc. ctx: Injected context with config, db, etc.
""" """
if not ctx: if not ctx:
return "Error: context not available" return "Error: context not available"
config = ctx["config"] config = ctx["config"]
db = ctx["db"]
autocora = config.autocora autocora = config.autocora
if not autocora.enabled: if not autocora.enabled:
return "AutoCora is disabled in config." return "AutoCora is disabled in config."
if not target_date:
target_date = datetime.now(UTC).strftime("%Y-%m-%d")
if not config.clickup.api_token: if not config.clickup.api_token:
return "Error: ClickUp API token not configured" return "Error: ClickUp API token not configured"
client = _get_clickup_client(ctx) client = _get_clickup_client(ctx)
# Find qualifying tasks — sweep or single-day # Find qualifying tasks (due on target_date, in cora_categories, status "to do")
if target_date: qualifying = _find_qualifying_tasks(client, config, target_date, autocora.cora_categories)
qualifying = _find_qualifying_tasks(client, config, target_date, autocora.cora_categories)
label = target_date
else:
qualifying = _find_qualifying_tasks_sweep(client, config, autocora.cora_categories)
label = "sweep"
if not qualifying: if not qualifying:
return f"No qualifying tasks found ({label})." return f"No qualifying tasks found for {target_date}."
# Group by keyword — only siblings that also passed the sweep qualify # Find ALL to-do tasks in cora categories for sibling keyword matching
groups, alerts = _group_by_keyword(qualifying, qualifying) all_todo = _find_all_todo_tasks(client, config, autocora.cora_categories)
# Group by keyword
groups, alerts = _group_by_keyword(qualifying, all_todo)
if not groups and alerts: if not groups and alerts:
return "No jobs submitted.\n\n" + "\n".join(f"- {a}" for a in alerts) return "No jobs submitted.\n\n" + "\n".join(f"- {a}" for a in alerts)
@ -326,13 +218,19 @@ def submit_autocora_jobs(target_date: str = "", ctx: dict | None = None) -> str:
skipped = [] skipped = []
for kw_lower, group in groups.items(): for kw_lower, group in groups.items():
# Check if a job file already exists for this keyword (dedup by file) # Check KV for existing submission
existing_jobs = list(jobs_dir.glob(f"job-*-{_slugify(group['keyword'])}*.json")) kv_key = f"autocora:job:{kw_lower}"
if existing_jobs: existing = db.kv_get(kv_key)
skipped.append(group["keyword"]) if existing:
continue try:
state = json.loads(existing)
if state.get("status") == "submitted":
skipped.append(group["keyword"])
continue
except json.JSONDecodeError:
pass
# Write job file (contains task_ids for the result poller) # Write job file
job_id = _make_job_id(group["keyword"]) job_id = _make_job_id(group["keyword"])
job_data = { job_data = {
"keyword": group["keyword"], "keyword": group["keyword"],
@ -342,21 +240,28 @@ def submit_autocora_jobs(target_date: str = "", ctx: dict | None = None) -> str:
job_path = jobs_dir / f"{job_id}.json" job_path = jobs_dir / f"{job_id}.json"
job_path.write_text(json.dumps(job_data, indent=2), encoding="utf-8") job_path.write_text(json.dumps(job_data, indent=2), encoding="utf-8")
# Move ClickUp tasks to "automation underway" # Track in KV
for tid in group["task_ids"]: kv_state = {
client.update_task_status(tid, "automation underway") "status": "submitted",
"job_id": job_id,
"keyword": group["keyword"],
"url": group["url"],
"task_ids": group["task_ids"],
"submitted_at": datetime.now(UTC).isoformat(),
}
db.kv_set(kv_key, json.dumps(kv_state))
submitted.append(group["keyword"]) submitted.append(group["keyword"])
log.info("Submitted AutoCora job: %s -> %s", group["keyword"], job_id) log.info("Submitted AutoCora job: %s %s", group["keyword"], job_id)
# Build response # Build response
lines = [f"AutoCora submission ({label}):"] lines = [f"AutoCora submission for {target_date}:"]
if submitted: if submitted:
lines.append(f"\nSubmitted {len(submitted)} job(s):") lines.append(f"\nSubmitted {len(submitted)} job(s):")
for kw in submitted: for kw in submitted:
lines.append(f" - {kw}") lines.append(f" - {kw}")
if skipped: if skipped:
lines.append(f"\nSkipped {len(skipped)} (job file already exists):") lines.append(f"\nSkipped {len(skipped)} (already submitted):")
for kw in skipped: for kw in skipped:
lines.append(f" - {kw}") lines.append(f" - {kw}")
if alerts: if alerts:
@ -370,61 +275,93 @@ def submit_autocora_jobs(target_date: str = "", ctx: dict | None = None) -> str:
@tool( @tool(
"poll_autocora_results", "poll_autocora_results",
"Poll the AutoCora results folder for completed Cora SEO report jobs. " "Poll the AutoCora results folder for completed Cora SEO report jobs. "
"Scans for .result files, reads task_ids from the JSON, updates ClickUp, " "Updates ClickUp task statuses based on results.",
"then moves the result file to a processed/ subfolder.",
category="autocora", category="autocora",
) )
def poll_autocora_results(ctx: dict | None = None) -> str: def poll_autocora_results(ctx: dict | None = None) -> str:
"""Poll for AutoCora results and update ClickUp tasks. """Poll for AutoCora results and update ClickUp tasks.
Scans the results folder for .result files. Each result file is JSON Args:
containing {status, task_ids, keyword, ...}. After processing, the ctx: Injected context with config, db, etc.
result file is moved to results/processed/ to avoid re-processing.
""" """
if not ctx: if not ctx:
return "Error: context not available" return "Error: context not available"
config = ctx["config"] config = ctx["config"]
db = ctx["db"]
autocora = config.autocora autocora = config.autocora
if not autocora.enabled: if not autocora.enabled:
return "AutoCora is disabled in config." return "AutoCora is disabled in config."
# Find all submitted jobs in KV
kv_entries = db.kv_scan("autocora:job:")
submitted = []
for key, value in kv_entries:
try:
state = json.loads(value)
if state.get("status") == "submitted":
submitted.append((key, state))
except json.JSONDecodeError:
continue
if not submitted:
return "No pending AutoCora jobs to check."
results_dir = Path(autocora.results_dir) results_dir = Path(autocora.results_dir)
if not results_dir.exists(): if not results_dir.exists():
return f"Results directory does not exist: {results_dir}" return f"Results directory does not exist: {results_dir}"
# Scan for .result files
result_files = list(results_dir.glob("*.result"))
if not result_files:
return "No result files found in results folder."
client = None client = None
if config.clickup.api_token: if config.clickup.api_token:
client = _get_clickup_client(ctx) client = _get_clickup_client(ctx)
processed_dir = results_dir / "processed"
processed = [] processed = []
still_pending = []
for result_path in result_files: for kv_key, state in submitted:
job_id = state.get("job_id", "")
if not job_id:
continue
result_path = results_dir / f"{job_id}.result"
if not result_path.exists():
still_pending.append(state.get("keyword", job_id))
continue
# Read and parse result
raw = result_path.read_text(encoding="utf-8").strip() raw = result_path.read_text(encoding="utf-8").strip()
result_data = _parse_result(raw) result_data = _parse_result(raw)
task_ids = result_data.get("task_ids", []) # Get task_ids: prefer result file, fall back to KV
task_ids = result_data.get("task_ids") or state.get("task_ids", [])
status = result_data.get("status", "UNKNOWN") status = result_data.get("status", "UNKNOWN")
keyword = result_data.get("keyword", result_path.stem) keyword = state.get("keyword", "")
if status == "SUCCESS": if status == "SUCCESS":
# Update KV
state["status"] = "completed"
state["completed_at"] = datetime.now(UTC).isoformat()
db.kv_set(kv_key, json.dumps(state))
# Update ClickUp tasks
if client and task_ids: if client and task_ids:
for tid in task_ids: for tid in task_ids:
client.update_task_status(tid, autocora.success_status) client.update_task_status(tid, autocora.success_status)
client.add_comment(tid, f"Cora report generated for \"{keyword}\" — ready for you to look at it.") client.add_comment(tid, f"Cora report completed for keyword: {keyword}")
processed.append(f"SUCCESS: {keyword}") processed.append(f"SUCCESS: {keyword}")
log.info("AutoCora SUCCESS: %s", keyword) log.info("AutoCora SUCCESS: %s", keyword)
elif status == "FAILURE": elif status == "FAILURE":
reason = result_data.get("reason", "unknown error") reason = result_data.get("reason", "unknown error")
state["status"] = "failed"
state["error"] = reason
state["completed_at"] = datetime.now(UTC).isoformat()
db.kv_set(kv_key, json.dumps(state))
# Update ClickUp tasks
if client and task_ids: if client and task_ids:
for tid in task_ids: for tid in task_ids:
client.update_task_status(tid, autocora.error_status) client.update_task_status(tid, autocora.error_status)
@ -438,19 +375,16 @@ def poll_autocora_results(ctx: dict | None = None) -> str:
else: else:
processed.append(f"UNKNOWN: {keyword} (status={status})") processed.append(f"UNKNOWN: {keyword} (status={status})")
# Move result file to processed/ so it's not re-processed
processed_dir.mkdir(exist_ok=True)
try:
result_path.rename(processed_dir / result_path.name)
except OSError as e:
log.warning("Could not move result file %s: %s", result_path.name, e)
# Build response # Build response
lines = ["AutoCora poll results:"] lines = ["AutoCora poll results:"]
if processed: if processed:
lines.append(f"\nProcessed {len(processed)} result(s):") lines.append(f"\nProcessed {len(processed)} result(s):")
for p in processed: for p in processed:
lines.append(f" - {p}") lines.append(f" - {p}")
if still_pending:
lines.append(f"\nStill pending ({len(still_pending)}):")
for kw in still_pending:
lines.append(f" - {kw}")
return "\n".join(lines) return "\n".join(lines)
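
Review note: the removed `_find_qualifying_tasks_sweep` derives the previous month's tag with `now.replace(month=now.month - 1)`. That call raises `ValueError` when today's day number does not exist in the previous month (for example on 31 March). If the sweep is ever reinstated, one possible fix is to build the previous month from its first day, as sketched here. `month_tags` is a hypothetical helper, and the `%b` output assumes the default C/English locale.

```python
from datetime import datetime


def month_tags(now):
    """Return (current_tag, previous_tag) such as ('feb26', 'jan26')."""
    current = now.strftime("%b%y").lower()
    if now.month == 1:
        year, month = now.year - 1, 12
    else:
        year, month = now.year, now.month - 1
    # Day 1 exists in every month, so this cannot raise for a short month.
    previous = datetime(year, month, 1).strftime("%b%y").lower()
    return current, previous
```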

View File

@ -1,9 +1,9 @@
"""ClickUp chat-facing tools for listing, querying, and resetting tasks.""" """ClickUp chat-facing tools for listing, approving, and declining tasks."""
from __future__ import annotations from __future__ import annotations
import json
import logging import logging
from datetime import UTC, datetime
from . import tool from . import tool
@ -24,6 +24,22 @@ def _get_clickup_client(ctx: dict):
) )
def _get_clickup_states(db) -> dict[str, dict]:
"""Load all tracked ClickUp task states from kv_store."""
pairs = db.kv_scan("clickup:task:")
states = {}
for key, value in pairs:
# keys look like clickup:task:{id}:state
parts = key.split(":")
if len(parts) == 4 and parts[3] == "state":
task_id = parts[2]
try: # noqa: SIM105
states[task_id] = json.loads(value)
except json.JSONDecodeError:
pass
return states
@tool( @tool(
"clickup_query_tasks", "clickup_query_tasks",
"Query ClickUp live for tasks. Optionally filter by status (e.g. 'to do', 'in progress') " "Query ClickUp live for tasks. Optionally filter by status (e.g. 'to do', 'in progress') "
@ -78,286 +94,112 @@ def clickup_query_tasks(status: str = "", task_type: str = "", ctx: dict | None
@tool( @tool(
"clickup_list_tasks", "clickup_list_tasks",
"List ClickUp tasks in automation-related statuses (automation underway, " "List ClickUp tasks that Cheddah is tracking. Optionally filter by internal state "
"outline review, internal review, error). Shows tasks currently being processed.", "(executing, completed, failed).",
category="clickup", category="clickup",
) )
def clickup_list_tasks(status: str = "", ctx: dict | None = None) -> str: def clickup_list_tasks(status: str = "", ctx: dict | None = None) -> str:
"""List ClickUp tasks in automation-related statuses.""" """List tracked ClickUp tasks, optionally filtered by state."""
client = _get_clickup_client(ctx) db = ctx["db"]
if not client: states = _get_clickup_states(db)
return "Error: ClickUp API token not configured."
cfg = ctx["config"].clickup if not states:
if not cfg.space_id: return "No ClickUp tasks are currently being tracked."
return "Error: ClickUp space_id not configured."
# Query tasks in automation-related statuses
automation_statuses = [
cfg.automation_status,
"outline review",
cfg.review_status,
cfg.error_status,
]
if status: if status:
automation_statuses = [status] states = {tid: s for tid, s in states.items() if s.get("state") == status}
if not states:
try: return f"No ClickUp tasks with state '{status}'."
tasks = client.get_tasks_from_space(cfg.space_id, statuses=automation_statuses)
except Exception as e:
return f"Error querying ClickUp: {e}"
finally:
client.close()
if not tasks:
filter_note = f" with status '{status}'" if status else " in automation statuses"
return f"No tasks found{filter_note}."
lines = [] lines = []
for t in tasks: for task_id, state in sorted(states.items(), key=lambda x: x[1].get("discovered_at", "")):
parts = [f"**{t.name}** (ID: {t.id})"] name = state.get("clickup_task_name", "Unknown")
parts.append(f" Status: {t.status} | Type: {t.task_type or ''}") task_type = state.get("task_type", "")
fields = {k: v for k, v in t.custom_fields.items() if v} task_state = state.get("state", "unknown")
if fields: skill = state.get("skill_name", "")
field_strs = [f"{k}: {v}" for k, v in fields.items()] lines.append(
parts.append(f" Fields: {', '.join(field_strs)}") f"• **{name}** (ID: {task_id})\n"
lines.append("\n".join(parts)) f" Type: {task_type} | State: {task_state} | Skill: {skill}"
)
return f"**Automation Tasks ({len(lines)}):**\n\n" + "\n\n".join(lines) return f"**Tracked ClickUp Tasks ({len(lines)}):**\n\n" + "\n\n".join(lines)
@tool( @tool(
"clickup_task_status", "clickup_task_status",
"Check the current status and details of a ClickUp task by its ID.", "Check the detailed internal processing state of a ClickUp task by its ID.",
category="clickup", category="clickup",
) )
def clickup_task_status(task_id: str, ctx: dict | None = None) -> str: def clickup_task_status(task_id: str, ctx: dict | None = None) -> str:
"""Get current status for a specific ClickUp task from the API.""" """Get detailed state for a specific tracked task."""
client = _get_clickup_client(ctx) db = ctx["db"]
if not client: raw = db.kv_get(f"clickup:task:{task_id}:state")
return "Error: ClickUp API token not configured." if not raw:
return f"No tracked state found for task ID '{task_id}'."
try: try:
task = client.get_task(task_id) state = json.loads(raw)
except Exception as e: except json.JSONDecodeError:
return f"Error fetching task '{task_id}': {e}" return f"Corrupted state data for task '{task_id}'."
finally:
client.close()
lines = [f"**Task: {task.name}** (ID: {task.id})"] lines = [f"**Task: {state.get('clickup_task_name', 'Unknown')}** (ID: {task_id})"]
lines.append(f"Status: {task.status}") lines.append(f"State: {state.get('state', 'unknown')}")
lines.append(f"Type: {task.task_type or ''}") lines.append(f"Task Type: {state.get('task_type', '')}")
if task.url: lines.append(f"Mapped Skill: {state.get('skill_name', '')}")
lines.append(f"URL: {task.url}") lines.append(f"Discovered: {state.get('discovered_at', '')}")
if task.due_date: if state.get("started_at"):
lines.append(f"Due: {task.due_date}") lines.append(f"Started: {state['started_at']}")
if task.date_updated: if state.get("completed_at"):
lines.append(f"Updated: {task.date_updated}") lines.append(f"Completed: {state['completed_at']}")
fields = {k: v for k, v in task.custom_fields.items() if v} if state.get("error"):
if fields: lines.append(f"Error: {state['error']}")
field_strs = [f"{k}: {v}" for k, v in fields.items()] if state.get("deliverable_paths"):
lines.append(f"Fields: {', '.join(field_strs)}") lines.append(f"Deliverables: {', '.join(state['deliverable_paths'])}")
if state.get("custom_fields"):
fields_str = ", ".join(f"{k}: {v}" for k, v in state["custom_fields"].items() if v)
if fields_str:
lines.append(f"Custom Fields: {fields_str}")
return "\n".join(lines) return "\n".join(lines)
@tool(
"clickup_create_task",
"Create a new ClickUp task for a client. Requires task name and client name. "
"Optionally set work category, description, status, due_date (Unix ms), "
"tags (comma-separated), and arbitrary custom fields via custom_fields_json "
'(JSON object like {"Keyword":"value","CLIFlags":"--tier1-count 5"}). '
"The task is created in the 'Overall' list within the client's folder.",
category="clickup",
)
def clickup_create_task(
name: str,
client: str,
work_category: str = "",
description: str = "",
status: str = "to do",
due_date: str = "",
tags: str = "",
custom_fields_json: str = "",
priority: int = 2,
assignee: int = 10765627,
time_estimate_ms: int = 0,
ctx: dict | None = None,
) -> str:
"""Create a new ClickUp task in the client's Overall list."""
import json as _json
client_obj = _get_clickup_client(ctx)
if not client_obj:
return "Error: ClickUp API token not configured."
cfg = ctx["config"].clickup
if not cfg.space_id:
return "Error: ClickUp space_id not configured."
try:
# Find the client's Overall list
list_id = client_obj.find_list_in_folder(cfg.space_id, client)
if not list_id:
return (
f"Error: Could not find folder '{client}' "
f"with an 'Overall' list in space."
)
# Build create kwargs
create_kwargs: dict = {
"list_id": list_id,
"name": name,
"description": description,
"status": status,
"priority": priority,
"assignees": [assignee],
}
if due_date:
create_kwargs["due_date"] = int(due_date)
if tags:
create_kwargs["tags"] = [t.strip() for t in tags.split(",")]
if time_estimate_ms:
create_kwargs["time_estimate"] = time_estimate_ms
# Create the task
result = client_obj.create_task(**create_kwargs)
task_id = result.get("id", "")
task_url = result.get("url", "")
# Set Client dropdown field
client_obj.set_custom_field_smart(task_id, list_id, "Client", client)
# Set Work Category if provided
if work_category:
client_obj.set_custom_field_smart(
task_id, list_id, "Work Category", work_category
)
# Set any additional custom fields
if custom_fields_json:
extra_fields = _json.loads(custom_fields_json)
for field_name, field_value in extra_fields.items():
client_obj.set_custom_field_smart(
task_id, list_id, field_name, str(field_value)
)
return (
f"Task created successfully!\n"
f" Name: {name}\n"
f" Client: {client}\n"
f" ID: {task_id}\n"
f" URL: {task_url}"
)
except Exception as e:
return f"Error creating task: {e}"
finally:
client_obj.close()
@tool( @tool(
"clickup_reset_task", "clickup_reset_task",
"Reset a ClickUp task to 'to do' status so it can be retried on the next poll. " "Reset a ClickUp task's internal tracking state so it can be retried on the next poll. "
"Use this when a task is stuck in an error or automation state.", "Use this when a task has failed or completed and you want to re-run it.",
category="clickup", category="clickup",
) )
def clickup_reset_task(task_id: str, ctx: dict | None = None) -> str: def clickup_reset_task(task_id: str, ctx: dict | None = None) -> str:
"""Reset a ClickUp task status to 'to do' for retry.""" """Delete the kv_store state for a single task so it can be retried."""
client = _get_clickup_client(ctx) db = ctx["db"]
if not client: key = f"clickup:task:{task_id}:state"
return "Error: ClickUp API token not configured." raw = db.kv_get(key)
if not raw:
return f"No tracked state found for task ID '{task_id}'. Nothing to reset."
cfg = ctx["config"].clickup db.kv_delete(key)
reset_status = cfg.poll_statuses[0] if cfg.poll_statuses else "to do" return f"Task '{task_id}' state cleared. It will be picked up on the next scheduler poll."
try:
client.update_task_status(task_id, reset_status)
client.add_comment(
task_id,
f"Task reset to '{reset_status}' via chat command.",
)
except Exception as e:
return f"Error resetting task '{task_id}': {e}"
finally:
client.close()
return (
f"Task '{task_id}' reset to '{reset_status}'. "
f"It will be picked up on the next scheduler poll."
)
def _format_duration(delta) -> str:
"""Format a timedelta as a human-readable duration string."""
total_seconds = int(delta.total_seconds())
hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)
if hours:
return f"{hours}h {minutes}m {seconds}s"
if minutes:
return f"{minutes}m {seconds}s"
return f"{seconds}s"
def _format_ago(iso_str: str | None) -> str:
"""Format an ISO timestamp as 'Xm ago' relative to now."""
if not iso_str:
return "never"
try:
ts = datetime.fromisoformat(iso_str)
delta = datetime.now(UTC) - ts
total_seconds = int(delta.total_seconds())
if total_seconds < 60:
return f"{total_seconds}s ago"
minutes = total_seconds // 60
if minutes < 60:
return f"{minutes}m ago"
hours = minutes // 60
return f"{hours}h {minutes % 60}m ago"
except (ValueError, TypeError):
return "unknown"
@tool( @tool(
"get_active_tasks", "clickup_reset_all",
"Show what CheddahBot is actively executing right now. " "Clear ALL internal ClickUp task tracking state. Use this to wipe the slate clean "
"Reports running tasks, loop health, and whether it's safe to restart.", "so all eligible tasks can be retried on the next poll cycle.",
category="clickup", category="clickup",
) )
def get_active_tasks(ctx: dict | None = None) -> str: def clickup_reset_all(ctx: dict | None = None) -> str:
"""Show actively running scheduler tasks and loop health.""" """Delete all clickup task states and legacy active_ids from kv_store."""
scheduler = ctx.get("scheduler") if ctx else None db = ctx["db"]
if not scheduler: states = _get_clickup_states(db)
return "Scheduler not available — cannot check active executions." count = 0
for task_id in states:
db.kv_delete(f"clickup:task:{task_id}:state")
count += 1
now = datetime.now(UTC) # Also clean up legacy active_ids key
lines = [] if db.kv_get("clickup:active_task_ids"):
db.kv_delete("clickup:active_task_ids")
# Active executions return (
active = scheduler.get_active_executions() f"Cleared {count} task state(s) from tracking. Next poll will re-discover eligible tasks."
if active: )
lines.append(f"**Active Executions ({len(active)}):**")
for task_id, info in active.items():
duration = _format_duration(now - info["started_at"])
lines.append(
f"- **{info['name']}** — `{info['tool']}` — "
f"running {duration} ({info['thread']} thread)"
)
else:
lines.append("**No tasks actively executing.**")
# Loop health
timestamps = scheduler.get_loop_timestamps()
lines.append("")
lines.append("**Loop Health:**")
for loop_name, ts in timestamps.items():
lines.append(f"- {loop_name}: last ran {_format_ago(ts)}")
# Safe to restart?
lines.append("")
if active:
lines.append(f"**Safe to restart: No** ({len(active)} task(s) actively running)")
else:
lines.append("**Safe to restart: Yes**")
return "\n".join(lines)

File diff suppressed because it is too large

@ -6,11 +6,12 @@ Primary workflow: ingest CORA .xlsx → generate content batch.
from __future__ import annotations from __future__ import annotations
import json
import logging import logging
import os import os
import re import re
import subprocess import subprocess
from collections.abc import Callable from datetime import UTC, datetime
from pathlib import Path from pathlib import Path
from . import tool from . import tool
@ -30,13 +31,6 @@ def _get_blm_dir(ctx: dict | None) -> str:
return os.getenv("BLM_DIR", "E:/dev/Big-Link-Man") return os.getenv("BLM_DIR", "E:/dev/Big-Link-Man")
def _get_blm_timeout(ctx: dict | None) -> int:
"""Get BLM subprocess timeout from config or default (1800s / 30 min)."""
if ctx and "config" in ctx:
return ctx["config"].timeouts.blm
return 1800
def _run_blm_command( def _run_blm_command(
args: list[str], blm_dir: str, timeout: int = 1800 args: list[str], blm_dir: str, timeout: int = 1800
) -> subprocess.CompletedProcess: ) -> subprocess.CompletedProcess:
@ -169,9 +163,9 @@ def _parse_generate_output(stdout: str) -> dict:
def _set_status(ctx: dict | None, message: str) -> None: def _set_status(ctx: dict | None, message: str) -> None:
"""Log pipeline progress. Previously wrote to KV; now just logs.""" """Write pipeline progress to KV store for UI polling."""
if message: if ctx and "db" in ctx:
log.info("[LB Pipeline] %s", message) ctx["db"].kv_set("linkbuilding:status", message)
def _get_clickup_client(ctx: dict | None): def _get_clickup_client(ctx: dict | None):
@ -193,10 +187,25 @@ def _get_clickup_client(ctx: dict | None):
def _sync_clickup(ctx: dict | None, task_id: str, step: str, message: str) -> None: def _sync_clickup(ctx: dict | None, task_id: str, step: str, message: str) -> None:
"""Post a progress comment to ClickUp.""" """Post a comment to ClickUp and update KV state."""
if not task_id or not ctx: if not task_id or not ctx:
return return
# Update KV store
db = ctx.get("db")
if db:
kv_key = f"clickup:task:{task_id}:state"
raw = db.kv_get(kv_key)
if raw:
try:
state = json.loads(raw)
state["last_step"] = step
state["last_message"] = message
db.kv_set(kv_key, json.dumps(state))
except json.JSONDecodeError:
pass
# Post comment to ClickUp
cu_client = _get_clickup_client(ctx) cu_client = _get_clickup_client(ctx)
if cu_client: if cu_client:
try: try:
@ -245,8 +254,26 @@ def _find_clickup_task(ctx: dict, keyword: str) -> str:
continue continue
if _fuzzy_keyword_match(keyword_norm, _normalize_for_match(str(task_keyword))): if _fuzzy_keyword_match(keyword_norm, _normalize_for_match(str(task_keyword))):
# Found a match — move to "automation underway" # Found a match — create executing state
task_id = task.id task_id = task.id
now = datetime.now(UTC).isoformat()
state = {
"state": "executing",
"clickup_task_id": task_id,
"clickup_task_name": task.name,
"task_type": task.task_type,
"skill_name": "run_link_building",
"discovered_at": now,
"started_at": now,
"completed_at": None,
"error": None,
"deliverable_paths": [],
"custom_fields": task.custom_fields,
}
db = ctx.get("db")
if db:
db.kv_set(f"clickup:task:{task_id}:state", json.dumps(state))
# Move to "automation underway" # Move to "automation underway"
cu_client2 = _get_clickup_client(ctx) cu_client2 = _get_clickup_client(ctx)
@ -272,24 +299,30 @@ def _normalize_for_match(text: str) -> str:
return text return text
def _fuzzy_keyword_match(a: str, b: str, llm_check: Callable[[str, str], bool] | None = None) -> bool: def _fuzzy_keyword_match(a: str, b: str) -> bool:
"""Check if two normalized strings match, allowing singular/plural differences. """Check if two normalized strings are a fuzzy match.
Fast path: exact match after normalization. Matches if: exact, substring in either direction, or >80% word overlap.
Slow path: ask an LLM if the two keywords are the same aside from plural form.
Falls back to False if no llm_check is provided and strings differ.
""" """
if not a or not b: if not a or not b:
return False return False
if a == b: if a == b:
return True return True
if llm_check is None: if a in b or b in a:
return True
# Word overlap check
words_a = set(a.split())
words_b = set(b.split())
if not words_a or not words_b:
return False return False
return llm_check(a, b) overlap = len(words_a & words_b)
min_len = min(len(words_a), len(words_b))
return overlap / min_len >= 0.8 if min_len > 0 else False
def _complete_clickup_task(ctx: dict | None, task_id: str, message: str, status: str = "") -> None: def _complete_clickup_task(ctx: dict | None, task_id: str, message: str, status: str = "") -> None:
"""Mark a ClickUp task as completed.""" """Mark a ClickUp task as completed and update KV state."""
if not task_id or not ctx: if not task_id or not ctx:
return return
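
Reviewer note: the right-hand `_fuzzy_keyword_match` in the hunk above accepts an exact match, a substring match in either direction, or a word overlap of at least 0.8 relative to the shorter string. The docstring says ">80%" while the code uses `>=`. The sketch below is an illustrative copy (the name `fuzzy_keyword_match` is ours) that makes the three paths easy to try; it assumes both inputs were already normalized by the caller.

```python
def fuzzy_keyword_match(a: str, b: str) -> bool:
    """Illustrative copy of the right-hand matching rules."""
    if not a or not b:
        return False
    if a == b:
        return True
    # Character-level substring check, in either direction.
    if a in b or b in a:
        return True
    # Word overlap relative to the shorter of the two word sets.
    words_a = set(a.split())
    words_b = set(b.split())
    if not words_a or not words_b:
        return False
    overlap = len(words_a & words_b)
    min_len = min(len(words_a), len(words_b))
    return overlap / min_len >= 0.8


print(fuzzy_keyword_match("peek machining", "peek machining services"))  # substring path
print(fuzzy_keyword_match("cnc swiss machining", "swiss cnc machining"))  # word-overlap path
print(fuzzy_keyword_match("cnc machining", "cnc turning"))  # 1 of 2 words shared
print(fuzzy_keyword_match("pin", "spinning wheel"))  # substring inside a word
```

Because the substring check works on characters rather than whole words, a short keyword such as `pin` matches `spinning wheel`. Whether that is acceptable depends on how distinct the ClickUp keywords are in practice.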
@ -298,6 +331,19 @@ def _complete_clickup_task(ctx: dict | None, task_id: str, message: str, status:
lb_map = skill_map.get("Link Building", {}) lb_map = skill_map.get("Link Building", {})
complete_status = status or lb_map.get("complete_status", "complete") complete_status = status or lb_map.get("complete_status", "complete")
db = ctx.get("db")
if db:
kv_key = f"clickup:task:{task_id}:state"
raw = db.kv_get(kv_key)
if raw:
try:
state = json.loads(raw)
state["state"] = "completed"
state["completed_at"] = datetime.now(UTC).isoformat()
db.kv_set(kv_key, json.dumps(state))
except json.JSONDecodeError:
pass
cu_client = _get_clickup_client(ctx) cu_client = _get_clickup_client(ctx)
if cu_client: if cu_client:
try: try:
@ -310,19 +356,33 @@ def _complete_clickup_task(ctx: dict | None, task_id: str, message: str, status:
def _fail_clickup_task(ctx: dict | None, task_id: str, error_msg: str) -> None: def _fail_clickup_task(ctx: dict | None, task_id: str, error_msg: str) -> None:
"""Mark a ClickUp task as failed.""" """Mark a ClickUp task as failed and update KV state."""
if not task_id or not ctx: if not task_id or not ctx:
return return
config = ctx.get("config") config = ctx.get("config")
error_status = config.clickup.error_status if config else "error" error_status = config.clickup.error_status if config else "error"
db = ctx.get("db")
if db:
kv_key = f"clickup:task:{task_id}:state"
raw = db.kv_get(kv_key)
if raw:
try:
state = json.loads(raw)
state["state"] = "failed"
state["error"] = error_msg
state["completed_at"] = datetime.now(UTC).isoformat()
db.kv_set(kv_key, json.dumps(state))
except json.JSONDecodeError:
pass
cu_client = _get_clickup_client(ctx) cu_client = _get_clickup_client(ctx)
if cu_client: if cu_client:
try: try:
cu_client.add_comment( cu_client.add_comment(
task_id, task_id,
f"[FAILED]Link building pipeline failed.\n\nError: {error_msg[:2000]}", f"Link building pipeline failed.\n\nError: {error_msg[:2000]}",
) )
cu_client.update_task_status(task_id, error_status) cu_client.update_task_status(task_id, error_status)
except Exception as e: except Exception as e:
@ -436,7 +496,7 @@ def run_cora_backlinks(
# ── Step 1: ingest-cora ── # ── Step 1: ingest-cora ──
_set_status(ctx, f"Step 1/2: Ingesting CORA report for {project_name}...") _set_status(ctx, f"Step 1/2: Ingesting CORA report for {project_name}...")
if clickup_task_id: if clickup_task_id:
_sync_clickup(ctx, clickup_task_id, "ingest", "[STARTED]Starting Cora Backlinks pipeline...") _sync_clickup(ctx, clickup_task_id, "ingest", "🔄 Starting Cora Backlinks pipeline...")
# Convert branded_plus_ratio from string if needed # Convert branded_plus_ratio from string if needed
try: try:
@ -453,11 +513,10 @@ def run_cora_backlinks(
cli_flags=cli_flags, cli_flags=cli_flags,
) )
blm_timeout = _get_blm_timeout(ctx)
try: try:
ingest_result = _run_blm_command(ingest_args, blm_dir, timeout=blm_timeout) ingest_result = _run_blm_command(ingest_args, blm_dir)
except subprocess.TimeoutExpired: except subprocess.TimeoutExpired:
error = f"ingest-cora timed out after {blm_timeout // 60} minutes" error = "ingest-cora timed out after 30 minutes"
_set_status(ctx, "") _set_status(ctx, "")
if clickup_task_id: if clickup_task_id:
_fail_clickup_task(ctx, clickup_task_id, error) _fail_clickup_task(ctx, clickup_task_id, error)
@ -490,7 +549,7 @@ def run_cora_backlinks(
ctx, ctx,
clickup_task_id, clickup_task_id,
"ingest_done", "ingest_done",
f"[DONE]CORA report ingested. Project ID: {project_id}. Job file: {job_file}", f"CORA report ingested. Project ID: {project_id}. Job file: {job_file}",
) )
# ── Step 2: generate-batch ── # ── Step 2: generate-batch ──
@ -502,9 +561,9 @@ def run_cora_backlinks(
gen_args = ["generate-batch", "-j", str(job_path), "--continue-on-error"] gen_args = ["generate-batch", "-j", str(job_path), "--continue-on-error"]
try: try:
gen_result = _run_blm_command(gen_args, blm_dir, timeout=blm_timeout) gen_result = _run_blm_command(gen_args, blm_dir)
except subprocess.TimeoutExpired: except subprocess.TimeoutExpired:
error = f"generate-batch timed out after {blm_timeout // 60} minutes" error = "generate-batch timed out after 30 minutes"
_set_status(ctx, "") _set_status(ctx, "")
if clickup_task_id: if clickup_task_id:
_fail_clickup_task(ctx, clickup_task_id, error) _fail_clickup_task(ctx, clickup_task_id, error)
@ -534,7 +593,7 @@ def run_cora_backlinks(
if clickup_task_id: if clickup_task_id:
summary = ( summary = (
f"[DONE]Cora Backlinks pipeline completed for {project_name}.\n\n" f"Cora Backlinks pipeline completed for {project_name}.\n\n"
f"Project ID: {project_id}\n" f"Project ID: {project_id}\n"
f"Keyword: {ingest_parsed['main_keyword']}\n" f"Keyword: {ingest_parsed['main_keyword']}\n"
f"Job file: {gen_parsed['job_moved_to'] or job_file}" f"Job file: {gen_parsed['job_moved_to'] or job_file}"
@ -592,11 +651,10 @@ def blm_ingest_cora(
cli_flags=cli_flags, cli_flags=cli_flags,
) )
blm_timeout = _get_blm_timeout(ctx)
try: try:
result = _run_blm_command(ingest_args, blm_dir, timeout=blm_timeout) result = _run_blm_command(ingest_args, blm_dir)
except subprocess.TimeoutExpired: except subprocess.TimeoutExpired:
return f"Error: ingest-cora timed out after {blm_timeout // 60} minutes." return "Error: ingest-cora timed out after 30 minutes."
parsed = _parse_ingest_output(result.stdout) parsed = _parse_ingest_output(result.stdout)
@ -647,11 +705,10 @@ def blm_generate_batch(
if debug: if debug:
args.append("--debug") args.append("--debug")
blm_timeout = _get_blm_timeout(ctx)
try: try:
result = _run_blm_command(args, blm_dir, timeout=blm_timeout) result = _run_blm_command(args, blm_dir)
except subprocess.TimeoutExpired: except subprocess.TimeoutExpired:
return f"Error: generate-batch timed out after {blm_timeout // 60} minutes." return "Error: generate-batch timed out after 30 minutes."
parsed = _parse_generate_output(result.stdout) parsed = _parse_generate_output(result.stdout)
@ -692,6 +749,7 @@ def scan_cora_folder(ctx: dict | None = None) -> str:
if not watch_path.exists(): if not watch_path.exists():
return f"Watch folder does not exist: {watch_folder}" return f"Watch folder does not exist: {watch_folder}"
db = ctx.get("db")
xlsx_files = sorted(watch_path.glob("*.xlsx")) xlsx_files = sorted(watch_path.glob("*.xlsx"))
if not xlsx_files: if not xlsx_files:
@ -699,16 +757,18 @@ def scan_cora_folder(ctx: dict | None = None) -> str:
lines = [f"## Cora Inbox: {watch_folder}\n"] lines = [f"## Cora Inbox: {watch_folder}\n"]
processed_dir = watch_path / "processed"
processed_names = set()
if processed_dir.exists():
processed_names = {f.name for f in processed_dir.glob("*.xlsx")}
for f in xlsx_files: for f in xlsx_files:
filename = f.name filename = f.name
if filename.startswith("~$"): status = "new"
continue if db:
status = "processed" if filename in processed_names else "new" kv_val = db.kv_get(f"linkbuilding:watched:{filename}")
if kv_val:
try:
watched = json.loads(kv_val)
status = watched.get("status", "unknown")
except json.JSONDecodeError:
status = "tracked"
lines.append(f"- **{filename}** — status: {status}") lines.append(f"- **{filename}** — status: {status}")
# Check processed subfolder # Check processed subfolder

@ -14,7 +14,7 @@ import json
import logging import logging
import re import re
import time import time
from datetime import datetime from datetime import UTC, datetime
from pathlib import Path from pathlib import Path
from ..docx_export import text_to_docx from ..docx_export import text_to_docx
@ -38,9 +38,9 @@ SONNET_CLI_MODEL = "sonnet"
def _set_status(ctx: dict | None, message: str) -> None: def _set_status(ctx: dict | None, message: str) -> None:
"""Log pipeline progress. Previously wrote to KV; now just logs.""" """Write pipeline progress to the DB so the UI can poll it."""
if message: if ctx and "db" in ctx:
log.info("[PR Pipeline] %s", message) ctx["db"].kv_set("pipeline:status", message)
def _fuzzy_company_match(name: str, candidate: str) -> bool: def _fuzzy_company_match(name: str, candidate: str) -> bool:
@ -88,15 +88,33 @@ def _find_clickup_task(ctx: dict, company_name: str) -> str:
if task.task_type != "Press Release": if task.task_type != "Press Release":
continue continue
client_field = task.custom_fields.get("Client", "") client_field = task.custom_fields.get("Customer", "")
if not ( if not (
_fuzzy_company_match(company_name, task.name) _fuzzy_company_match(company_name, task.name)
or _fuzzy_company_match(company_name, client_field) or _fuzzy_company_match(company_name, client_field)
): ):
continue continue
# Found a match — move to "automation underway" on ClickUp # Found a match — create kv_store entry and move to "in progress"
task_id = task.id task_id = task.id
now = datetime.now(UTC).isoformat()
state = {
"state": "executing",
"clickup_task_id": task_id,
"clickup_task_name": task.name,
"task_type": task.task_type,
"skill_name": "write_press_releases",
"discovered_at": now,
"started_at": now,
"completed_at": None,
"error": None,
"deliverable_paths": [],
"custom_fields": task.custom_fields,
}
db = ctx.get("db")
if db:
db.kv_set(f"clickup:task:{task_id}:state", json.dumps(state))
# Move to "automation underway" on ClickUp # Move to "automation underway" on ClickUp
cu_client2 = _get_clickup_client(ctx) cu_client2 = _get_clickup_client(ctx)
@ -236,28 +254,10 @@ def _clean_pr_output(raw: str, headline: str) -> str:
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
def _is_actual_news(topic: str) -> bool:
"""Detect whether the topic signals genuinely new news.
Returns True if the topic contains explicit markers like 'actual news',
'new product', 'launch', 'acquisition', 'partnership', 'certification',
or 'award'. The user is expected to signal this in the PR Topic field.
"""
signals = [
"actual news", "new product", "launch", "launches",
"acquisition", "partnership", "certification", "award",
"unveil", "unveils", "introduce", "introduces",
]
topic_lower = topic.lower()
return any(s in topic_lower for s in signals)
def _build_headline_prompt( def _build_headline_prompt(
topic: str, company_name: str, url: str, lsi_terms: str, headlines_ref: str topic: str, company_name: str, url: str, lsi_terms: str, headlines_ref: str
) -> str: ) -> str:
"""Build the prompt for Step 1: generate 7 headlines.""" """Build the prompt for Step 1: generate 7 headlines."""
is_news = _is_actual_news(topic)
prompt = ( prompt = (
f"Generate exactly 7 unique press release headline options for the following.\n\n" f"Generate exactly 7 unique press release headline options for the following.\n\n"
f"Topic: {topic}\n" f"Topic: {topic}\n"
@ -272,34 +272,14 @@ def _build_headline_prompt(
"\nRules for EVERY headline:\n" "\nRules for EVERY headline:\n"
"- Maximum 70 characters (including spaces)\n" "- Maximum 70 characters (including spaces)\n"
"- Title case\n" "- Title case\n"
"- News-focused, not promotional\n"
"- NO location/geographic keywords\n" "- NO location/geographic keywords\n"
"- NO superlatives (best, top, leading, #1)\n" "- NO superlatives (best, top, leading, #1)\n"
"- NO questions\n" "- NO questions\n"
"- NO colons — colons are considered lower quality\n" "- NO colons — colons are considered lower quality\n"
"- Must contain an actual news announcement\n"
) )
if is_news:
prompt += (
"\nThis topic is ACTUAL NEWS — a real new event, product, partnership, "
"or achievement. You may use announcement verbs like 'Announces', "
"'Launches', 'Introduces', 'Unveils'.\n"
)
else:
prompt += (
"\nIMPORTANT — AWARENESS FRAMING:\n"
"The company ALREADY offers this product/service/capability. Nothing is "
"new, nothing was just launched, expanded, or achieved. You are writing "
"an awareness piece about existing capabilities framed in news-wire style.\n\n"
"REQUIRED verbs — use these: 'Highlights', 'Reinforces', 'Delivers', "
"'Strengthens', 'Showcases', 'Details', 'Offers', 'Provides'\n\n"
"BANNED — do NOT use any of these:\n"
"- 'Announces', 'Launches', 'Introduces', 'Unveils', 'Expands', "
"'Reveals', 'Announces New'\n"
"- 'Significant expansion', 'major milestone', 'growing demand', "
"'new capabilities', 'celebrates X years'\n"
"- Any language that implies something CHANGED or is NEW when it is not\n"
)
if headlines_ref: if headlines_ref:
prompt += ( prompt += (
"\nHere are examples of high-quality headlines to use as reference " "\nHere are examples of high-quality headlines to use as reference "
@ -314,10 +294,8 @@ def _build_headline_prompt(
return prompt return prompt
def _build_judge_prompt(headlines: str, headlines_ref: str, topic: str = "") -> str: def _build_judge_prompt(headlines: str, headlines_ref: str) -> str:
"""Build the prompt for Step 2: pick the 2 best headlines.""" """Build the prompt for Step 2: pick the 2 best headlines."""
is_news = _is_actual_news(topic)
prompt = ( prompt = (
"You are judging press release headlines for Press Advantage distribution. " "You are judging press release headlines for Press Advantage distribution. "
"Pick the 2 best headlines from the candidates below.\n\n" "Pick the 2 best headlines from the candidates below.\n\n"
@ -327,25 +305,12 @@ def _build_judge_prompt(headlines: str, headlines_ref: str, topic: str = "") ->
"- Contains superlatives (best, top, leading, #1)\n" "- Contains superlatives (best, top, leading, #1)\n"
"- Is a question\n" "- Is a question\n"
"- Exceeds 70 characters\n" "- Exceeds 70 characters\n"
) "- Implies a NEW product launch when none exists (avoid 'launches', "
"'introduces', 'unveils', 'announces new' unless the topic is genuinely new)\n\n"
if is_news:
prompt += (
"- (This topic IS actual news — announcement verbs are acceptable)\n\n"
)
else:
prompt += (
"- Uses 'Announces', 'Launches', 'Introduces', 'Unveils', 'Expands', "
"'Reveals', or 'Announces New' (this is NOT actual news)\n"
"- Implies something CHANGED, is NEW, or was just achieved when it was not "
"(e.g. 'significant expansion', 'major milestone', 'growing demand')\n\n"
)
prompt += (
"PREFER headlines that:\n" "PREFER headlines that:\n"
"- Match the tone and structure of the reference examples below\n" "- Match the tone and structure of the reference examples below\n"
"- Use awareness verbs like 'Highlights', 'Strengthens', " "- Use action verbs like 'Highlights', 'Expands', 'Strengthens', "
"'Reinforces', 'Delivers', 'Showcases', 'Details'\n" "'Reinforces', 'Delivers', 'Adds'\n"
"- Describe what the company DOES or OFFERS, not what it just invented\n" "- Describe what the company DOES or OFFERS, not what it just invented\n"
"- Read like a real news wire headline, not a product announcement\n\n" "- Read like a real news wire headline, not a product announcement\n\n"
f"Candidates:\n{headlines}\n\n" f"Candidates:\n{headlines}\n\n"
@ -364,14 +329,16 @@ def _build_judge_prompt(headlines: str, headlines_ref: str, topic: str = "") ->
return prompt return prompt
def _derive_anchor_phrase(company_name: str, keyword: str) -> str: def _derive_anchor_phrase(company_name: str, topic: str) -> str:
"""Derive a 'brand + keyword' anchor phrase from company name and keyword. """Derive a 'brand + keyword' anchor phrase from company name and topic.
Examples: Examples:
("Advanced Industrial", "PEEK machining") -> "Advanced Industrial PEEK machining" ("Advanced Industrial", "PEEK machining") -> "Advanced Industrial PEEK machining"
("Metal Craft", "custom metal fabrication") -> "Metal Craft custom metal fabrication" ("Metal Craft", "custom metal fabrication") -> "Metal Craft custom metal fabrication"
""" """
return f"{company_name} {keyword.strip()}" # Clean up topic: strip leading articles, lowercase
keyword = topic.strip()
return f"{company_name} {keyword}"
def _find_anchor_in_text(text: str, anchor: str) -> bool: def _find_anchor_in_text(text: str, anchor: str) -> bool:
@ -439,8 +406,6 @@ def _build_pr_prompt(
anchor_phrase: str = "", anchor_phrase: str = "",
) -> str: ) -> str:
"""Build the prompt for Step 3: write one full press release.""" """Build the prompt for Step 3: write one full press release."""
is_news = _is_actual_news(topic)
prompt = ( prompt = (
f"{skill_text}\n\n" f"{skill_text}\n\n"
"---\n\n" "---\n\n"
@ -450,25 +415,6 @@ def _build_pr_prompt(
f"Topic: {topic}\n" f"Topic: {topic}\n"
f"Company: {company_name}\n" f"Company: {company_name}\n"
) )
if is_news:
prompt += (
"\nThis is ACTUAL NEWS — a real new event, product, or achievement. "
"You may use announcement language (announced, launched, introduced).\n"
)
else:
prompt += (
"\nAWARENESS FRAMING — CRITICAL:\n"
"The company ALREADY offers this product/service/capability. Nothing new "
"happened. Do NOT write that the company 'announced', 'expanded', 'launched', "
"'achieved a milestone', or 'saw growing demand'. These are LIES if nothing "
"actually changed.\n"
"Instead write about what the company DOES, what it OFFERS, what it PROVIDES. "
"Frame it as drawing attention to existing capabilities — highlighting, "
"reinforcing, detailing, showcasing.\n"
"The first paragraph should describe what the company offers, NOT announce "
"a fictional event.\n"
)
if url: if url:
prompt += f"Reference URL (fetch for context): {url}\n" prompt += f"Reference URL (fetch for context): {url}\n"
if lsi_terms: if lsi_terms:
@ -544,7 +490,6 @@ def write_press_releases(
topic: str, topic: str,
company_name: str, company_name: str,
url: str = "", url: str = "",
keyword: str = "",
lsi_terms: str = "", lsi_terms: str = "",
required_phrase: str = "", required_phrase: str = "",
ctx: dict | None = None, ctx: dict | None = None,
@ -574,7 +519,7 @@ def write_press_releases(
cu_client.update_task_status(clickup_task_id, config.clickup.automation_status) cu_client.update_task_status(clickup_task_id, config.clickup.automation_status)
cu_client.add_comment( cu_client.add_comment(
clickup_task_id, clickup_task_id,
f"[STARTED]CheddahBot starting press release creation.\n\n" f"🔄 CheddahBot starting press release creation.\n\n"
f"Topic: {topic}\nCompany: {company_name}", f"Topic: {topic}\nCompany: {company_name}",
) )
log.info("ClickUp task %s set to automation-underway", clickup_task_id) log.info("ClickUp task %s set to automation-underway", clickup_task_id)
@ -630,7 +575,7 @@ def write_press_releases(
log.info("[PR Pipeline] Step 2/4: AI judge selecting best 2 headlines...") log.info("[PR Pipeline] Step 2/4: AI judge selecting best 2 headlines...")
_set_status(ctx, "Step 2/4: AI judge selecting best 2 headlines...") _set_status(ctx, "Step 2/4: AI judge selecting best 2 headlines...")
step_start = time.time() step_start = time.time()
judge_prompt = _build_judge_prompt(headlines_raw, headlines_ref, topic) judge_prompt = _build_judge_prompt(headlines_raw, headlines_ref)
messages = [ messages = [
{"role": "system", "content": "You are a senior PR editor."}, {"role": "system", "content": "You are a senior PR editor."},
{"role": "user", "content": judge_prompt}, {"role": "user", "content": judge_prompt},
@ -667,7 +612,7 @@ def write_press_releases(
# ── Step 3: Write 2 press releases (execution brain x 2) ───────────── # ── Step 3: Write 2 press releases (execution brain x 2) ─────────────
log.info("[PR Pipeline] Step 3/4: Writing 2 press releases...") log.info("[PR Pipeline] Step 3/4: Writing 2 press releases...")
anchor_phrase = _derive_anchor_phrase(company_name, keyword) if keyword else "" anchor_phrase = _derive_anchor_phrase(company_name, topic)
pr_texts: list[str] = [] pr_texts: list[str] = []
pr_files: list[str] = [] pr_files: list[str] = []
docx_files: list[str] = [] docx_files: list[str] = []
@ -707,11 +652,11 @@ def write_press_releases(
if wc < 575 or wc > 800: if wc < 575 or wc > 800:
log.warning("PR %d word count %d outside 575-800 range", i + 1, wc) log.warning("PR %d word count %d outside 575-800 range", i + 1, wc)
# Validate anchor phrase (only when keyword provided) # Validate anchor phrase
if anchor_phrase and _find_anchor_in_text(clean_result, anchor_phrase): if _find_anchor_in_text(clean_result, anchor_phrase):
log.info("PR %d contains anchor phrase '%s'", i + 1, anchor_phrase) log.info("PR %d contains anchor phrase '%s'", i + 1, anchor_phrase)
elif anchor_phrase: else:
fuzzy = _fuzzy_find_anchor(clean_result, company_name, keyword) fuzzy = _fuzzy_find_anchor(clean_result, company_name, topic)
if fuzzy: if fuzzy:
log.info("PR %d: exact anchor not found, fuzzy match: '%s'", i + 1, fuzzy) log.info("PR %d: exact anchor not found, fuzzy match: '%s'", i + 1, fuzzy)
anchor_warnings.append( anchor_warnings.append(
@ -739,27 +684,18 @@ def write_press_releases(
# ── ClickUp: upload docx attachments + comment ───────────────────── # ── ClickUp: upload docx attachments + comment ─────────────────────
uploaded_count = 0 uploaded_count = 0
failed_uploads: list[str] = []
if clickup_task_id and cu_client: if clickup_task_id and cu_client:
try: try:
for path in docx_files: for path in docx_files:
if cu_client.upload_attachment(clickup_task_id, path): if cu_client.upload_attachment(clickup_task_id, path):
uploaded_count += 1 uploaded_count += 1
else: else:
failed_uploads.append(path)
log.warning("ClickUp: failed to upload %s for task %s", path, clickup_task_id) log.warning("ClickUp: failed to upload %s for task %s", path, clickup_task_id)
upload_warning = ""
if failed_uploads:
paths_list = "\n".join(f" - {p}" for p in failed_uploads)
upload_warning = (
f"\n[WARNING]Warning: {len(failed_uploads)} attachment(s) failed to upload. "
f"Files saved locally at:\n{paths_list}"
)
cu_client.add_comment( cu_client.add_comment(
clickup_task_id, clickup_task_id,
f"📎 Saved {len(docx_files)} press release(s). " f"📎 Saved {len(docx_files)} press release(s). "
f"{uploaded_count} file(s) attached.\n" f"{uploaded_count} file(s) attached.\n"
f"Generating JSON-LD schemas next...{upload_warning}", f"Generating JSON-LD schemas next...",
) )
log.info( log.info(
"ClickUp: uploaded %d attachments for task %s", uploaded_count, clickup_task_id "ClickUp: uploaded %d attachments for task %s", uploaded_count, clickup_task_id
@ -855,19 +791,31 @@ def write_press_releases(
attach_note = f"\n📎 {uploaded_count} file(s) attached." if uploaded_count else "" attach_note = f"\n📎 {uploaded_count} file(s) attached." if uploaded_count else ""
result_text = "\n".join(output_parts)[:3000] result_text = "\n".join(output_parts)[:3000]
comment = ( comment = (
f"[DONE]CheddahBot completed this task.\n\n" f"CheddahBot completed this task.\n\n"
f"Skill: write_press_releases\n" f"Skill: write_press_releases\n"
f"Result:\n{result_text}{attach_note}" f"Result:\n{result_text}{attach_note}"
) )
cu_client.add_comment(clickup_task_id, comment) cu_client.add_comment(clickup_task_id, comment)
# Set status to pr needs review # Set status to internal review
cu_client.update_task_status(clickup_task_id, config.clickup.pr_review_status) cu_client.update_task_status(clickup_task_id, config.clickup.review_status)
# Update kv_store state if one exists
db = ctx.get("db")
if db:
kv_key = f"clickup:task:{clickup_task_id}:state"
existing = db.kv_get(kv_key)
if existing:
state = json.loads(existing)
state["state"] = "completed"
state["completed_at"] = datetime.now(UTC).isoformat()
state["deliverable_paths"] = docx_files
db.kv_set(kv_key, json.dumps(state))
output_parts.append("\n## ClickUp Sync\n") output_parts.append("\n## ClickUp Sync\n")
output_parts.append(f"- Task `{clickup_task_id}` updated") output_parts.append(f"- Task `{clickup_task_id}` updated")
output_parts.append(f"- {uploaded_count} file(s) uploaded") output_parts.append(f"- {uploaded_count} file(s) uploaded")
output_parts.append(f"- Status set to '{config.clickup.pr_review_status}'") output_parts.append(f"- Status set to '{config.clickup.review_status}'")
log.info("ClickUp sync complete for task %s", clickup_task_id) log.info("ClickUp sync complete for task %s", clickup_task_id)
except Exception as e: except Exception as e:
@ -1077,7 +1025,7 @@ def _resolve_branded_url(branded_url: str, company_data: dict | None) -> str:
def _build_links( def _build_links(
pr_text: str, pr_text: str,
company_name: str, company_name: str,
keyword: str, topic: str,
target_url: str, target_url: str,
branded_url_resolved: str, branded_url_resolved: str,
) -> tuple[list[dict], list[str]]: ) -> tuple[list[dict], list[str]]:
@ -1090,13 +1038,13 @@ def _build_links(
warnings: list[str] = [] warnings: list[str] = []
# Link 1: brand+keyword → target_url # Link 1: brand+keyword → target_url
if target_url and keyword: if target_url:
anchor_phrase = _derive_anchor_phrase(company_name, keyword) anchor_phrase = _derive_anchor_phrase(company_name, topic)
if _find_anchor_in_text(pr_text, anchor_phrase): if _find_anchor_in_text(pr_text, anchor_phrase):
links.append({"url": target_url, "anchor": anchor_phrase}) links.append({"url": target_url, "anchor": anchor_phrase})
else: else:
# Try fuzzy match # Try fuzzy match
fuzzy = _fuzzy_find_anchor(pr_text, company_name, keyword) fuzzy = _fuzzy_find_anchor(pr_text, company_name, topic)
if fuzzy: if fuzzy:
links.append({"url": target_url, "anchor": fuzzy}) links.append({"url": target_url, "anchor": fuzzy})
warnings.append( warnings.append(
@ -1140,7 +1088,6 @@ def submit_press_release(
company_name: str, company_name: str,
target_url: str = "", target_url: str = "",
branded_url: str = "", branded_url: str = "",
keyword: str = "",
topic: str = "", topic: str = "",
pr_text: str = "", pr_text: str = "",
file_path: str = "", file_path: str = "",
@ -1178,6 +1125,13 @@ def submit_press_release(
f"Press Advantage requires at least 550 words. Please expand the content." f"Press Advantage requires at least 550 words. Please expand the content."
) )
# --- Derive topic from headline if not provided ---
if not topic:
topic = headline
for part in [company_name, "Inc.", "LLC", "Corp.", "Ltd.", "Limited", "Inc"]:
topic = topic.replace(part, "").strip()
topic = re.sub(r"\s+", " ", topic).strip(" -\u2013\u2014,")
# --- Load company data --- # --- Load company data ---
companies_text = _load_file_if_exists(_COMPANIES_FILE) companies_text = _load_file_if_exists(_COMPANIES_FILE)
company_all = _parse_company_data(companies_text) company_all = _parse_company_data(companies_text)
@ -1220,7 +1174,7 @@ def submit_press_release(
link_list, link_warnings = _build_links( link_list, link_warnings = _build_links(
pr_text, pr_text,
company_name, company_name,
keyword, topic,
target_url, target_url,
branded_url_resolved, branded_url_resolved,
) )
@ -1270,7 +1224,7 @@ def submit_press_release(
if link_list: if link_list:
output_parts.append("\n**Links:**") output_parts.append("\n**Links:**")
for link in link_list: for link in link_list:
output_parts.append(f' - "{link["anchor"]}" -> {link["url"]}') output_parts.append(f' - "{link["anchor"]}" {link["url"]}')
if link_warnings: if link_warnings:
output_parts.append("\n**Link warnings:**") output_parts.append("\n**Link warnings:**")


@ -454,7 +454,13 @@ def create_ui(
return agent_name, agent_name, chatbot_msgs, convs, new_browser return agent_name, agent_name, chatbot_msgs, convs, new_browser
def poll_pipeline_status(agent_name): def poll_pipeline_status(agent_name):
"""Pipeline status indicator (no longer used — kept for UI timer).""" """Poll the DB for pipeline progress updates."""
agent = _get_agent(agent_name)
if not agent:
return gr.update(value="", visible=False)
status = agent.db.kv_get("pipeline:status")
if status:
return gr.update(value=f"{status}", visible=True)
return gr.update(value="", visible=False) return gr.update(value="", visible=False)
def poll_notifications(): def poll_notifications():


@ -1,57 +0,0 @@
"""HTMX + FastAPI web frontend for CheddahBot."""
from __future__ import annotations
import logging
from pathlib import Path
from typing import TYPE_CHECKING
from fastapi import FastAPI
from fastapi.templating import Jinja2Templates
from starlette.staticfiles import StaticFiles
if TYPE_CHECKING:
from ..agent_registry import AgentRegistry
from ..config import Config
from ..db import Database
from ..llm import LLMAdapter
from ..notifications import NotificationBus
from ..scheduler import Scheduler
log = logging.getLogger(__name__)
_TEMPLATE_DIR = Path(__file__).resolve().parent.parent / "templates"
_STATIC_DIR = Path(__file__).resolve().parent.parent / "static"
templates = Jinja2Templates(directory=str(_TEMPLATE_DIR))
def mount_web_app(
app: FastAPI,
registry: AgentRegistry,
config: Config,
llm: LLMAdapter,
notification_bus: NotificationBus | None = None,
scheduler: Scheduler | None = None,
db: Database | None = None,
):
"""Mount all web routes and static files onto the FastAPI app."""
# Wire dependencies into route modules
from . import routes_chat, routes_pages, routes_sse
from .routes_chat import router as chat_router
from .routes_pages import router as pages_router
from .routes_sse import router as sse_router
routes_pages.setup(registry, config, llm, templates, db=db, scheduler=scheduler)
routes_chat.setup(registry, config, llm, db, templates)
routes_sse.setup(notification_bus, scheduler, db)
app.include_router(chat_router)
app.include_router(sse_router)
# Pages router last (it has catch-all GET /)
app.include_router(pages_router)
# Static files
app.mount("/static", StaticFiles(directory=str(_STATIC_DIR)), name="static")
log.info("Web UI mounted (templates: %s, static: %s)", _TEMPLATE_DIR, _STATIC_DIR)


@ -1,270 +0,0 @@
"""Chat routes: send messages, stream responses, manage conversations."""
from __future__ import annotations
import asyncio
import logging
import tempfile
import time
from pathlib import Path
from typing import TYPE_CHECKING
from fastapi import APIRouter, Form, Request, UploadFile
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from sse_starlette.sse import EventSourceResponse
if TYPE_CHECKING:
from ..agent_registry import AgentRegistry
from ..config import Config
from ..db import Database
from ..llm import LLMAdapter
log = logging.getLogger(__name__)
router = APIRouter(prefix="/chat")
_registry: AgentRegistry | None = None
_config: Config | None = None
_llm: LLMAdapter | None = None
_db: Database | None = None
_templates: Jinja2Templates | None = None
# Pending responses: conv_id -> {text, files, timestamp}
_pending: dict[str, dict] = {}
def setup(registry, config, llm, db, templates):
global _registry, _config, _llm, _db, _templates
_registry = registry
_config = config
_llm = llm
_db = db
_templates = templates
def _get_agent(name: str):
if _registry:
return _registry.get(name) or _registry.default
return None
def _cleanup_pending():
"""Remove pending entries older than 60s."""
now = time.time()
expired = [k for k, v in _pending.items() if now - v["timestamp"] > 60]
for k in expired:
del _pending[k]
@router.post("/send")
async def send_message(
request: Request,
text: str = Form(""),
agent_name: str = Form("default"),
conv_id: str = Form(""),
files: list[UploadFile] | None = None,
):
"""Accept user message, return user bubble HTML + trigger SSE stream."""
_cleanup_pending()
agent = _get_agent(agent_name)
if not agent:
return HTMLResponse("<div class='error'>Agent not found</div>", status_code=400)
# Handle file uploads
saved_files = []
for f in (files or []):
if f.filename and f.size and f.size > 0:
tmp = Path(tempfile.mkdtemp()) / f.filename
content = await f.read()
tmp.write_bytes(content)
saved_files.append(str(tmp))
if not text.strip() and not saved_files:
return HTMLResponse("")
# Ensure conversation exists
if not conv_id:
agent.new_conversation()
conv_id = agent.ensure_conversation()
else:
agent.conv_id = conv_id
# Build display text
display_text = text
if saved_files:
file_names = [Path(f).name for f in saved_files]
display_text += f"\n[Attached: {', '.join(file_names)}]"
# Stash for SSE stream
_pending[conv_id] = {
"text": text,
"files": saved_files,
"timestamp": time.time(),
"agent_name": agent_name,
}
# Render user bubble + SSE trigger div
user_html = _templates.get_template("partials/chat_message.html").render(
role="user", content=display_text
)
# The SSE trigger div connects to the stream endpoint
sse_div = (
f'<div id="sse-trigger" '
f'hx-ext="sse" '
f'sse-connect="/chat/stream/{conv_id}" '
f'sse-swap="chunk" '
f'hx-target="#assistant-response" '
f'hx-swap="beforeend">'
f'</div>'
f'<div id="assistant-bubble" class="message assistant">'
f'<div class="message-avatar">CB</div>'
f'<div class="message-body">'
f'<div id="assistant-response" class="message-content"></div>'
f'</div></div>'
)
headers = {
"HX-Trigger-After-Swap": "scrollChat",
"HX-Push-Url": f"/?conv={conv_id}",
}
return HTMLResponse(user_html + sse_div, headers=headers)
@router.get("/stream/{conv_id}")
async def stream_response(conv_id: str):
"""SSE endpoint: stream assistant response chunks."""
pending = _pending.pop(conv_id, None)
if not pending:
async def empty():
yield {"event": "done", "data": ""}
return EventSourceResponse(empty())
agent = _get_agent(pending["agent_name"])
if not agent:
async def error():
yield {"event": "chunk", "data": "Agent not found"}
yield {"event": "done", "data": ""}
return EventSourceResponse(error())
agent.conv_id = conv_id
async def generate():
loop = asyncio.get_event_loop()
queue: asyncio.Queue = asyncio.Queue()
def run_agent():
try:
for chunk in agent.respond(pending["text"], files=pending.get("files")):
loop.call_soon_threadsafe(queue.put_nowait, ("chunk", chunk))
except Exception as e:
log.error("Stream error: %s", e, exc_info=True)
loop.call_soon_threadsafe(
queue.put_nowait, ("chunk", f"\n\nError: {e}")
)
finally:
loop.call_soon_threadsafe(queue.put_nowait, ("done", ""))
# Run agent.respond() in a thread
import threading
t = threading.Thread(target=run_agent, daemon=True)
t.start()
while True:
event, data = await queue.get()
if event == "done":
yield {"event": "done", "data": conv_id}
break
yield {"event": "chunk", "data": data}
return EventSourceResponse(generate())
@router.get("/conversations")
async def list_conversations(agent_name: str = "default"):
"""Return sidebar conversation list as HTML partial."""
agent = _get_agent(agent_name)
if not agent:
return HTMLResponse("")
convs = agent.db.list_conversations(limit=50, agent_name=agent_name)
html = _templates.get_template("partials/chat_sidebar.html").render(
conversations=convs
)
return HTMLResponse(html)
@router.post("/new")
async def new_conversation(agent_name: str = Form("default")):
"""Create a new conversation, return empty chat + updated sidebar."""
agent = _get_agent(agent_name)
if not agent:
return HTMLResponse("")
agent.new_conversation()
conv_id = agent.ensure_conversation()
convs = agent.db.list_conversations(limit=50, agent_name=agent_name)
sidebar_html = _templates.get_template("partials/chat_sidebar.html").render(
conversations=convs
)
# Return empty chat area + sidebar update via OOB swap
html = (
f'<div id="chat-messages"></div>'
f'<div id="sidebar-conversations" hx-swap-oob="innerHTML">'
f'{sidebar_html}</div>'
)
headers = {"HX-Push-Url": f"/?conv={conv_id}"}
return HTMLResponse(html, headers=headers)
@router.get("/load/{conv_id}")
async def load_conversation(conv_id: str, agent_name: str = "default"):
"""Load conversation history as HTML."""
agent = _get_agent(agent_name)
if not agent:
return HTMLResponse("")
messages = agent.load_conversation(conv_id)
parts = []
for msg in messages:
role = msg.get("role", "")
content = msg.get("content", "")
if role in ("user", "assistant") and content:
parts.append(
_templates.get_template("partials/chat_message.html").render(
role=role, content=content
)
)
headers = {"HX-Push-Url": f"/?conv={conv_id}"}
return HTMLResponse("\n".join(parts), headers=headers)
@router.post("/agent/{name}")
async def switch_agent(name: str):
"""Switch active agent. Returns updated sidebar via OOB."""
agent = _get_agent(name)
if not agent:
return HTMLResponse("<div class='error'>Agent not found</div>", status_code=400)
agent.new_conversation()
conv_id = agent.ensure_conversation()
convs = agent.db.list_conversations(limit=50, agent_name=name)
sidebar_html = _templates.get_template("partials/chat_sidebar.html").render(
conversations=convs
)
html = (
f'<div id="chat-messages"></div>'
f'<div id="sidebar-conversations" hx-swap-oob="innerHTML">'
f'{sidebar_html}</div>'
)
headers = {"HX-Push-Url": f"/?conv={conv_id}"}
return HTMLResponse(html, headers=headers)


@ -1,172 +0,0 @@
"""Page routes: GET / (chat), GET /dashboard, dashboard partials."""
from __future__ import annotations
import logging
from datetime import UTC, datetime
from typing import TYPE_CHECKING
from fastapi import APIRouter, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
if TYPE_CHECKING:
from ..agent_registry import AgentRegistry
from ..config import Config
from ..db import Database
from ..llm import LLMAdapter
from ..scheduler import Scheduler
log = logging.getLogger(__name__)
router = APIRouter()
_registry: AgentRegistry | None = None
_config: Config | None = None
_llm: LLMAdapter | None = None
_db: Database | None = None
_scheduler: Scheduler | None = None
_templates: Jinja2Templates | None = None
def setup(registry, config, llm, templates, db=None, scheduler=None):
global _registry, _config, _llm, _templates, _db, _scheduler
_registry = registry
_config = config
_llm = llm
_templates = templates
_db = db
_scheduler = scheduler
@router.get("/")
async def chat_page(request: Request):
agent_names = _registry.list_agents() if _registry else []
agents = []
for name in agent_names:
agent = _registry.get(name)
display = agent.agent_config.display_name if agent else name
agents.append({"name": name, "display_name": display})
default_agent = _registry.default_name if _registry else "default"
chat_model = _config.chat_model if _config else "unknown"
exec_available = _llm.is_execution_brain_available() if _llm else False
clickup_enabled = _config.clickup.enabled if _config else False
return _templates.TemplateResponse("chat.html", {
"request": request,
"agents": agents,
"default_agent": default_agent,
"chat_model": chat_model,
"exec_available": exec_available,
"clickup_enabled": clickup_enabled,
})
@router.get("/dashboard")
async def dashboard_page(request: Request):
return _templates.TemplateResponse("dashboard.html", {
"request": request,
})
@router.get("/dashboard/pipeline")
async def dashboard_pipeline():
"""Return pipeline panel HTML partial with task data."""
if not _config or not _config.clickup.enabled:
return HTMLResponse('<p class="text-muted">ClickUp not configured</p>')
try:
from ..api import get_tasks
data = await get_tasks()
all_tasks = data.get("tasks", [])
except Exception as e:
log.error("Pipeline data fetch failed: %s", e)
return HTMLResponse(f'<p class="text-err">Error: {e}</p>')
# Group by work category, then by status
pipeline_statuses = [
"to do", "automation underway", "outline review", "internal review", "error",
]
categories = {} # category -> {status -> [tasks]}
for t in all_tasks:
cat = t.get("task_type") or "Other"
status = t.get("status", "unknown")
# Only show tasks in pipeline-relevant statuses
if status not in pipeline_statuses:
continue
if cat not in categories:
categories[cat] = {}
categories[cat].setdefault(status, []).append(t)
# Build HTML
html_parts = []
# Status summary counts
total_counts = {}
for cat_data in categories.values():
for status, tasks in cat_data.items():
total_counts[status] = total_counts.get(status, 0) + len(tasks)
if total_counts:
html_parts.append('<div class="pipeline-stats">')
for status in pipeline_statuses:
count = total_counts.get(status, 0)
html_parts.append(
f'<div class="pipeline-stat">'
f'<div class="stat-count">{count}</div>'
f'<div class="stat-label">{status}</div>'
f'</div>'
)
html_parts.append('</div>')
# Per-category tables
for cat_name in sorted(categories.keys()):
cat_data = categories[cat_name]
all_cat_tasks = []
for status in pipeline_statuses:
all_cat_tasks.extend(cat_data.get(status, []))
if not all_cat_tasks:
continue
html_parts.append(f'<div class="pipeline-group"><h4>{cat_name} ({len(all_cat_tasks)})</h4>')
html_parts.append('<table class="task-table"><thead><tr>'
'<th>Task</th><th>Customer</th><th>Status</th><th>Due</th>'
'</tr></thead><tbody>')
for task in all_cat_tasks:
name = task.get("name", "")
url = task.get("url", "")
customer = (task.get("custom_fields") or {}).get("Client", "N/A")
status = task.get("status", "")
status_class = "status-" + status.replace(" ", "-")
# Format due date
due_display = "-"
due_raw = task.get("due_date")
if due_raw:
try:
due_dt = datetime.fromtimestamp(int(due_raw) / 1000, tz=UTC)
due_display = due_dt.strftime("%b %d")
except (ValueError, TypeError, OSError):
pass
name_cell = (
f'<a href="{url}" target="_blank">{name}</a>' if url else name
)
html_parts.append(
f'<tr><td>{name_cell}</td><td>{customer}</td>'
f'<td><span class="status-badge {status_class}">{status}</span></td>'
f'<td>{due_display}</td></tr>'
)
html_parts.append('</tbody></table></div>')
if not html_parts:
return HTMLResponse('<p class="text-muted">No active pipeline tasks</p>')
return HTMLResponse('\n'.join(html_parts))


@ -1,94 +0,0 @@
"""SSE routes for live dashboard updates."""
from __future__ import annotations
import asyncio
import json
import logging
from datetime import datetime
from typing import TYPE_CHECKING
from fastapi import APIRouter
from sse_starlette.sse import EventSourceResponse
if TYPE_CHECKING:
from ..db import Database
from ..notifications import NotificationBus
from ..scheduler import Scheduler
log = logging.getLogger(__name__)
router = APIRouter(prefix="/sse")
_notification_bus: NotificationBus | None = None
_scheduler: Scheduler | None = None
_db: Database | None = None
def setup(notification_bus, scheduler, db):
global _notification_bus, _scheduler, _db
_notification_bus = notification_bus
_scheduler = scheduler
_db = db
@router.get("/notifications")
async def sse_notifications():
"""Stream new notifications as they arrive."""
listener_id = f"sse-notif-{id(asyncio.current_task())}"
# Subscribe to notification bus
queue: asyncio.Queue = asyncio.Queue()
loop = asyncio.get_event_loop()
if _notification_bus:
def on_notify(msg, cat):
loop.call_soon_threadsafe(
queue.put_nowait, {"message": msg, "category": cat}
)
_notification_bus.subscribe(listener_id, on_notify)
async def generate():
try:
while True:
try:
notif = await asyncio.wait_for(queue.get(), timeout=30)
yield {
"event": "notification",
"data": json.dumps(notif),
}
except TimeoutError:
yield {"event": "heartbeat", "data": ""}
finally:
if _notification_bus:
_notification_bus.unsubscribe(listener_id)
return EventSourceResponse(generate())
@router.get("/loops")
async def sse_loops():
"""Push loop timestamps + active executions every 15s."""
async def generate():
while True:
data = {"loops": {}, "executions": {}}
if _scheduler:
ts = _scheduler.get_loop_timestamps()
data["loops"] = ts
# Serialize active executions (datetime -> str)
raw_exec = _scheduler.get_active_executions()
execs = {}
for tid, info in raw_exec.items():
execs[tid] = {
"name": info.get("name", ""),
"tool": info.get("tool", ""),
"started_at": info["started_at"].isoformat()
if isinstance(info.get("started_at"), datetime)
else str(info.get("started_at", "")),
"thread": info.get("thread", ""),
}
data["executions"] = execs
yield {"event": "loops", "data": json.dumps(data)}
await asyncio.sleep(15)
return EventSourceResponse(generate())


@ -42,10 +42,8 @@ email:
# ClickUp integration # ClickUp integration
clickup: clickup:
poll_interval_minutes: 20 # 3x per hour poll_interval_minutes: 20 # 3x per hour
poll_statuses: ["to do", "outline approved"] poll_statuses: ["to do"]
poll_task_types: ["Press Release", "On Page Optimization", "Content Creation", "Link Building"]
review_status: "internal review" review_status: "internal review"
pr_review_status: "pr needs review"
in_progress_status: "in progress" in_progress_status: "in progress"
automation_status: "automation underway" automation_status: "automation underway"
error_status: "error" error_status: "error"
@ -55,35 +53,14 @@ clickup:
"Press Release": "Press Release":
tool: "write_press_releases" tool: "write_press_releases"
auto_execute: true auto_execute: true
required_fields: [topic, company_name, target_url]
field_mapping: field_mapping:
topic: "PR Topic" topic: "task_name"
keyword: "Keyword" company_name: "Customer"
company_name: "Client"
target_url: "IMSURL" target_url: "IMSURL"
branded_url: "SocialURL" branded_url: "SocialURL"
"On Page Optimization":
tool: "create_content"
auto_execute: false
trigger_hint: "content-cora-inbox file watcher"
required_fields: [keyword, url]
field_mapping:
url: "IMSURL"
keyword: "Keyword"
cli_flags: "CLIFlags"
"Content Creation":
tool: "create_content"
auto_execute: false
auto_execute_on_status: ["outline approved"]
trigger_hint: "content-cora-inbox file watcher (Phase 1), outline approved status (Phase 2)"
field_mapping:
url: "IMSURL"
keyword: "Keyword"
cli_flags: "CLIFlags"
"Link Building": "Link Building":
tool: "run_link_building" tool: "run_link_building"
auto_execute: false auto_execute: false
trigger_hint: "cora-inbox file watcher"
complete_status: "complete" complete_status: "complete"
error_status: "error" error_status: "error"
field_mapping: field_mapping:
@ -98,46 +75,18 @@ clickup:
# Link Building settings # Link Building settings
link_building: link_building:
blm_dir: "E:/dev/Big-Link-Man" blm_dir: "E:/dev/Big-Link-Man"
watch_folder: "//PennQnap1/SHARE1/cora-inbox" watch_folder: "Z:/cora-inbox"
watch_interval_minutes: 10 watch_interval_minutes: 60
default_branded_plus_ratio: 0.7 default_branded_plus_ratio: 0.7
# AutoCora job submission # AutoCora job submission
autocora: autocora:
jobs_dir: "//PennQnap1/SHARE1/AutoCora/jobs" jobs_dir: "//PennQnap1/SHARE1/AutoCora/jobs"
results_dir: "//PennQnap1/SHARE1/AutoCora/results" results_dir: "//PennQnap1/SHARE1/AutoCora/results"
poll_interval_minutes: 20 poll_interval_minutes: 5
success_status: "running cora" success_status: "running cora"
error_status: "error" error_status: "error"
enabled: true enabled: true
cora_human_inbox: "//PennQnap1/SHARE1/Cora-For-Human"
# Content creation settings
content:
cora_inbox: "//PennQnap1/SHARE1/content-cora-inbox"
outline_dir: "//PennQnap1/SHARE1/content-outlines"
# ntfy.sh push notifications
ntfy:
enabled: true
channels:
- name: human_action
topic_env_var: NTFY_TOPIC_HUMAN_ACTION
categories: [clickup, autocora, linkbuilding, content]
include_patterns: ["completed", "SUCCESS", "copied to"]
priority: high
tags: white_check_mark
- name: errors
topic_env_var: NTFY_TOPIC_ERRORS
categories: [clickup, autocora, linkbuilding, content]
include_patterns: ["failed", "FAILURE", "skipped", "no ClickUp match", "copy failed", "IMSURL is empty"]
priority: urgent
tags: rotating_light
- name: daily_briefing
topic_env_var: NTFY_TOPIC_DAILY_BRIEFING
categories: [briefing]
priority: high
tags: clipboard
# Multi-agent configuration # Multi-agent configuration
# Each agent gets its own personality, tool whitelist, and memory scope. # Each agent gets its own personality, tool whitelist, and memory scope.
@ -173,11 +122,6 @@ agents:
tools: [run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder, submit_autocora_jobs, poll_autocora_results, delegate_task, remember, search_memory] tools: [run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder, submit_autocora_jobs, poll_autocora_results, delegate_task, remember, search_memory]
memory_scope: "" memory_scope: ""
- name: content_creator
display_name: Content Creator
tools: [create_content, continue_content, delegate_task, remember, search_memory, web_search, web_fetch]
memory_scope: ""
- name: planner - name: planner
display_name: Planner display_name: Planner
model: "x-ai/grok-4.1-fast" model: "x-ai/grok-4.1-fast"


@ -1,287 +0,0 @@
# Link Building Agent Plan
## Context
CheddahBot needs a link building agent that orchestrates the external Big-Link-Man CLI tool (`E:/dev/Big-Link-Man/`). The current workflow is manual: run Cora on another machine → get .xlsx → manually run `main.py ingest-cora` → manually run `main.py generate-batch`. This agent automates the two manual CLI steps (`ingest-cora` and `generate-batch`), triggered by folder watching, ClickUp tasks, or chat commands. It must be extensible to future link building methods (MCP server path, ingest-simple, etc.).
## Decisions Made
- **Watch folder**: `Z:/cora-inbox` (network drive, accessible from the Cora machine)
- **File→task matching**: Fuzzy match .xlsx filename stem against ClickUp task's `Keyword` custom field
- **New ClickUp field "LB Method"**: Dropdown with initial option "Cora Backlinks" (more added later)
- **Dashboard**: API endpoint + NotificationBus events only (no frontend work — separate project)
- **Sidecar files**: Not needed — all metadata comes from the matching ClickUp task
- **Tool naming**: Orchestrator pattern — `run_link_building` is a thin dispatcher that reads `LB Method` and routes to the specific pipeline tool (e.g., `run_cora_backlinks`). Future link building methods get their own tools and slot into the orchestrator.
## Files to Create
### 1. `cheddahbot/tools/linkbuilding.py` — Main tool module
Four `@tool`-decorated functions + private helpers:
**`run_link_building(lb_method="", xlsx_path="", project_name="", money_site_url="", branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- **Orchestrator/dispatcher** — reads `lb_method` (from ClickUp "LB Method" field or chat) and routes to the correct pipeline tool
- If `lb_method` is "Cora Backlinks" or empty (default): calls `run_cora_backlinks()`
- Future: if `lb_method` is "MCP Link Building": calls `run_mcp_link_building()` (not yet implemented)
- Passes all other args through to the sub-tool
- This is what the ClickUp skill_map always routes to
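The dispatcher behaviour described above can be sketched as a small lookup table. This is a minimal sketch, not the planned implementation: the `_PIPELINES` registry, the lowercase normalization, and the stub `run_cora_backlinks` body are all assumptions made for illustration.

```python
from typing import Callable


def run_cora_backlinks(**kwargs) -> str:
    """Hypothetical stand-in for the real Cora pipeline tool."""
    return f"cora pipeline called with {sorted(kwargs)}"


# Assumed registry: normalized "LB Method" value -> pipeline callable.
_PIPELINES: dict[str, Callable[..., str]] = {
    "cora backlinks": run_cora_backlinks,
}
_DEFAULT_METHOD = "cora backlinks"


def run_link_building(lb_method: str = "", **kwargs) -> str:
    """Route to the pipeline named by lb_method; empty falls back to the default."""
    key = lb_method.strip().lower() or _DEFAULT_METHOD
    pipeline = _PIPELINES.get(key)
    if pipeline is None:
        return f"Error: unknown LB Method '{lb_method}'"
    # All remaining arguments pass straight through to the sub-tool.
    return pipeline(**kwargs)
```

With this shape, a future method such as "MCP Link Building" would only need a new registry entry, which is the extensibility the orchestrator pattern is meant to provide.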
**`run_cora_backlinks(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- The actual Cora pipeline — runs ingest-cora → generate-batch
- Step 1: Build CLI args, call `_run_blm_command(["ingest-cora", ...])`, parse stdout for job file path
- Step 2: Call `_run_blm_command(["generate-batch", "-j", job_file, "--continue-on-error"])`
- Updates KV store state and posts ClickUp comments at each step (following press_release.py pattern)
- Returns `## ClickUp Sync` in output to signal scheduler that sync was handled internally
- Can also be called directly from chat for explicit Cora runs
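The two-step flow above might look roughly like the following. This is a sketch under stated assumptions: `run_command` and `parse_job_file` are hypothetical injected callables standing in for `_run_blm_command` and `_parse_ingest_output`, and the KV store updates and ClickUp comments are omitted.

```python
from typing import Callable, Optional


def run_cora_pipeline(
    ingest_args: list[str],
    run_command: Callable[[list[str]], tuple[int, str]],
    parse_job_file: Callable[[str], Optional[str]],
) -> str:
    """Run ingest-cora, then generate-batch on the job file it reports."""
    # Step 1: ingest the Cora report and find the job file in stdout.
    code, stdout = run_command(["ingest-cora", *ingest_args])
    if code != 0:
        return f"Error: ingest-cora failed (exit {code})"
    job_file = parse_job_file(stdout)
    if not job_file:
        return "Error: could not find job file path in ingest-cora output"

    # Step 2: generate content for the job file.
    code, _ = run_command(
        ["generate-batch", "-j", job_file, "--continue-on-error"]
    )
    if code != 0:
        return f"Error: generate-batch failed (exit {code})"

    # The "## ClickUp Sync" marker tells the scheduler that sync was handled here.
    return f"Pipeline complete for {job_file}\n\n## ClickUp Sync\nHandled internally."
```

Injecting the runner keeps the control flow testable without ever launching Big-Link-Man.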
**`blm_ingest_cora(xlsx_path, project_name, money_site_url, branded_plus_ratio=0.7, custom_anchors="", cli_flags="", ctx=None)`**
- Standalone ingest — runs ingest-cora only, returns project ID and job file path
- For cases where user wants to ingest but not generate yet
**`blm_generate_batch(job_file, continue_on_error=True, debug=False, ctx=None)`**
- Standalone generate — runs generate-batch only on an existing job file
- For re-running generation or running a manually-created job
**Private helpers:**
- `_run_blm_command(args, timeout=1800)` — subprocess wrapper, runs `uv run python main.py <args>` from BLM_DIR, injects `-u`/`-p` from `BLM_USERNAME`/`BLM_PASSWORD` env vars
- `_parse_ingest_output(stdout)` — regex extract project_id + job_file path
- `_parse_generate_output(stdout)` — extract completion stats
- `_build_ingest_args(...)` — construct CLI argument list from tool params
- `_set_status(ctx, message)` — write pipeline status to KV store (for UI polling)
- `_sync_clickup(ctx, task_id, step, message)` — post comment + update state
**Critical: always pass `-m` flag** to ingest-cora to prevent interactive stdin prompt from blocking the subprocess.
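As an illustration of the subprocess wrapper, the command line could be assembled as below. This is a sketch only: the helper name `build_blm_command` is hypothetical, and appending the credentials after the subcommand arguments is an assumption about argument order that the plan does not specify.

```python
import os
from typing import Mapping, Optional


def build_blm_command(
    args: list[str], env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Build the argv for `uv run python main.py <args>` with credentials added."""
    env = os.environ if env is None else env
    cmd = ["uv", "run", "python", "main.py", *args]
    # Inject -u / -p from the environment only when they are set.
    username = env.get("BLM_USERNAME", "")
    password = env.get("BLM_PASSWORD", "")
    if username:
        cmd += ["-u", username]
    if password:
        cmd += ["-p", password]
    return cmd
```

The resulting list would then be handed to `subprocess.run(cmd, cwd=blm_dir, capture_output=True, text=True, timeout=1800)`, so that stdout can be parsed and a hung process is cut off at the timeout.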
### 2. `skills/linkbuilding.md` — Skill file
YAML frontmatter linking to `[run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder]` tools and `[link_builder, default]` agents. Markdown body describes when to use, default flags, workflow steps.
### 3. `tests/test_linkbuilding.py` — Test suite (~40 tests)
All tests mock `subprocess.run` — never call Big-Link-Man. Categories:
- Output parser unit tests (`_parse_ingest_output`, `_parse_generate_output`)
- CLI arg builder tests (all flag combinations, missing required params)
- Full pipeline integration (happy path, ingest failure, generate failure)
- ClickUp state machine (executing → completed, executing → failed)
- Folder watcher scan logic (new files, skip processed, missing ClickUp match)
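A test in this style might look like the sketch below. The `run_blm` wrapper is a hypothetical, simplified stand-in for `_run_blm_command`, included only so the example is self-contained; the point is that `subprocess.run` is patched, so the real CLI is never invoked.

```python
import subprocess
from unittest.mock import patch


def run_blm(args: list[str]) -> str:
    """Hypothetical minimal wrapper under test; not the real helper."""
    result = subprocess.run(
        ["uv", "run", "python", "main.py", *args],
        capture_output=True,
        text=True,
        timeout=1800,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "command failed")
    return result.stdout


def test_run_blm_happy_path():
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="ok\n", stderr="")
    with patch("subprocess.run", return_value=fake) as mock_run:
        assert run_blm(["generate-batch"]) == "ok\n"
    # Only the mock saw the call; the last argv element is the subcommand.
    assert mock_run.call_args.args[0][-1] == "generate-batch"


def test_run_blm_failure():
    fake = subprocess.CompletedProcess(args=[], returncode=1, stdout="", stderr="boom")
    with patch("subprocess.run", return_value=fake):
        try:
            run_blm(["ingest-cora"])
        except RuntimeError as exc:
            assert str(exc) == "boom"
        else:
            raise AssertionError("expected RuntimeError")
```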
## Files to Modify
### 4. `cheddahbot/config.py` — Add LinkBuildingConfig
```python
@dataclass
class LinkBuildingConfig:
blm_dir: str = "E:/dev/Big-Link-Man"
watch_folder: str = "" # empty = disabled
watch_interval_minutes: int = 60
default_branded_plus_ratio: float = 0.7
```
Add `link_building: LinkBuildingConfig` field to `Config` dataclass. Add YAML loading block in `load_config()` (same pattern as memory/scheduler/shell). Add env var override for `BLM_DIR`.
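The loading block could be sketched as follows, assuming the YAML file has already been parsed into a plain dict. The function name `load_link_building` and the rule of ignoring unknown keys are assumptions for illustration; the real `load_config()` pattern may differ.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass
class LinkBuildingConfig:
    blm_dir: str = "E:/dev/Big-Link-Man"
    watch_folder: str = ""  # empty = disabled
    watch_interval_minutes: int = 60
    default_branded_plus_ratio: float = 0.7


def load_link_building(
    raw: Mapping[str, Any], env: Mapping[str, str]
) -> LinkBuildingConfig:
    """Build the config from a parsed YAML dict, then apply the BLM_DIR override."""
    section = raw.get("link_building") or {}
    known = {f.name for f in fields(LinkBuildingConfig)}
    # Unknown keys are dropped so a typo in config.yaml does not raise here.
    cfg = LinkBuildingConfig(**{k: v for k, v in section.items() if k in known})
    if env.get("BLM_DIR"):
        cfg.blm_dir = env["BLM_DIR"]
    return cfg
```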
### 5. `config.yaml` — Three additions
**New top-level section:**
```yaml
link_building:
blm_dir: "E:/dev/Big-Link-Man"
watch_folder: "Z:/cora-inbox"
watch_interval_minutes: 60
default_branded_plus_ratio: 0.7
```
**New skill_map entry under clickup:**
```yaml
"Link Building":
tool: "run_link_building"
auto_execute: false # Cora Backlinks triggered by folder watcher, not scheduler
complete_status: "complete" # Override: use "complete" instead of "internal review"
error_status: "internal review" # On failure, move to internal review
field_mapping:
lb_method: "LB Method"
project_name: "task_name"
money_site_url: "IMSURL"
custom_anchors: "CustomAnchors"
branded_plus_ratio: "BrandedPlusRatio"
cli_flags: "CLIFlags"
xlsx_path: "CoraFile"
```
**New agent:**
```yaml
- name: link_builder
display_name: Link Builder
tools: [run_link_building, run_cora_backlinks, blm_ingest_cora, blm_generate_batch, scan_cora_folder, delegate_task, remember, search_memory]
memory_scope: ""
```
### 6. `cheddahbot/scheduler.py` — Add folder watcher (4th daemon thread)
**New thread `_folder_watch_loop`** alongside existing poll, heartbeat, and ClickUp threads:
- Starts if `config.link_building.watch_folder` is non-empty
- Runs every `watch_interval_minutes` (default 60)
- `_scan_watch_folder()` globs `*.xlsx` in watch folder
- For each file, checks KV store `linkbuilding:watched:{filename}` — skip if already processed
- **Fuzzy-matches filename stem against ClickUp tasks** with `LB Method = "Cora Backlinks"` and status "to do":
- Queries ClickUp for Link Building tasks
- Compares normalized filename stem against each task's `Keyword` custom field
- If match found: extracts money_site_url from IMSURL field, cli_flags from CLIFlags field, etc.
- If no match: logs warning, marks as "unmatched" in KV store, sends notification asking user to create/link a ClickUp task
- On match: executes `run_link_building` tool with args from the ClickUp task fields
- On completion: moves .xlsx to `Z:/cora-inbox/processed/` subfolder, updates KV state
- On failure: updates KV state with error, notifies via NotificationBus
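The fuzzy filename-to-task match could be sketched with the standard library's `difflib`. The plan does not name a matching algorithm, so `SequenceMatcher`, the 0.8 threshold, and the flat `keyword` key on each task dict are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from pathlib import Path
from typing import Optional


def _normalize(text: str) -> str:
    """Lowercase and collapse punctuation so 'a-b_c' and 'A B C' compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_task(
    filename: str, tasks: list[dict], threshold: float = 0.8
) -> Optional[dict]:
    """Return the task whose keyword best matches the filename stem, if any."""
    stem = _normalize(Path(filename).stem)
    best, best_score = None, 0.0
    for task in tasks:
        keyword = _normalize(task.get("keyword", ""))
        if not keyword:
            continue
        score = SequenceMatcher(None, stem, keyword).ratio()
        if score > best_score:
            best, best_score = task, score
    # Below the threshold the file is treated as unmatched.
    return best if best_score >= threshold else None
```

A file named `precision-cnc-machining.xlsx` normalizes to the same string as the keyword "precision cnc machining", so it would match exactly; a `None` result corresponds to the "unmatched" branch described above.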
**File handling after pipeline:**
- On success: .xlsx moved from `Z:/cora-inbox/` → `Z:/cora-inbox/processed/`
- On failure: .xlsx stays in `Z:/cora-inbox/` (KV store marks it as failed so watcher doesn't retry automatically; user can reset KV entry to retry)
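The post-pipeline file handling can be sketched as below. A plain dict stands in for the KV store, and the stored values (`"completed"`, `"failed: ..."`) are assumed formats, not ones the plan defines.

```python
import shutil
from pathlib import Path


def finalize_file(
    xlsx: Path, success: bool, kv: dict[str, str], error: str = ""
) -> Path:
    """Move a processed file into processed/ or record the failure in the KV store."""
    key = f"linkbuilding:watched:{xlsx.name}"
    if not success:
        # Leave the file in place; the KV entry stops the watcher from retrying.
        kv[key] = f"failed: {error}"
        return xlsx
    processed_dir = xlsx.parent / "processed"
    processed_dir.mkdir(exist_ok=True)
    target = processed_dir / xlsx.name
    shutil.move(str(xlsx), str(target))
    kv[key] = "completed"
    return target
```

Deleting the KV entry for a failed file would make the watcher pick it up again on the next scan, which matches the manual-retry behaviour described above.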
**Also adds `scan_cora_folder` tool** (can live in linkbuilding.py):
- Chat-invocable utility for the agent to check what's in the watch folder
- Returns list of unprocessed .xlsx files with ClickUp match status
- Internal agent tool, not a dashboard concern
### 7. `cheddahbot/clickup.py` — Add field creation method
Add `create_custom_field(list_id, name, field_type, type_config=None)` method that calls `POST /list/{list_id}/field`. Used by the setup tool to auto-create custom fields across lists.
### 8. `cheddahbot/__main__.py` — Add API endpoint
Add before Gradio mount:
```python
@fastapi_app.get("/api/linkbuilding/status")
async def linkbuilding_status():
"""Return link building status for dashboard consumption."""
# Returns:
# {
# "pending_cora_runs": [
# {"keyword": "precision cnc machining", "url": "https://...", "client": "Chapter 2", "task_id": "abc123"},
# ...
# ],
# "in_progress": [...], # Currently executing pipelines
# "completed": [...], # Recently completed (last 7 days)
# "failed": [...] # Failed tasks needing attention
# }
```
The `pending_cora_runs` section is the key dashboard data: queries ClickUp for "to do" tasks with Work Category="Link Building" and LB Method="Cora Backlinks", returns each task's `Keyword` field and `IMSURL` (copiable URL) so the user can see exactly which Cora reports need to be run.
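One way to assemble that payload is to bucket tasks by status, as in this sketch. The status-to-bucket mapping and the task dict shape (`id`, `status`, `custom_fields`) are assumptions, and the "last 7 days" filter on completed tasks is left out for brevity.

```python
# Assumed mapping from ClickUp status to response bucket.
_STATUS_TO_BUCKET = {
    "to do": "pending_cora_runs",
    "in progress": "in_progress",
    "complete": "completed",
    "error": "failed",
}


def build_status(tasks: list[dict]) -> dict[str, list[dict]]:
    """Group Link Building tasks into the four buckets the endpoint returns."""
    buckets: dict[str, list[dict]] = {
        "pending_cora_runs": [],
        "in_progress": [],
        "completed": [],
        "failed": [],
    }
    for task in tasks:
        bucket = _STATUS_TO_BUCKET.get(task.get("status", ""))
        if bucket is None:
            continue  # statuses outside the pipeline are ignored
        custom = task.get("custom_fields") or {}
        buckets[bucket].append(
            {
                "keyword": custom.get("Keyword", ""),
                "url": custom.get("IMSURL", ""),
                "client": custom.get("Client", ""),
                "task_id": task.get("id", ""),
            }
        )
    return buckets
```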
Also push link building events to NotificationBus (category="linkbuilding") at each pipeline step for future real-time dashboard support.
No other `__main__.py` changes needed — agent wiring is automatic from config.yaml.
## ClickUp Custom Fields (Auto-Created)
New custom fields to be created programmatically:
| Field | Type | Purpose |
|-------|------|---------|
| `LB Method` | Dropdown | Link building subtype. Initial option: "Cora Backlinks" |
| `Keyword` | Short Text | Target keyword (used for file matching) |
| `CoraFile` | Short Text | Path to .xlsx file (optional, set by agent after file match) |
| `CustomAnchors` | Short Text | Comma-separated anchor text overrides |
| `BrandedPlusRatio` | Short Text | Override for `-bp` flag (e.g., "0.7") |
| `CLIFlags` | Short Text | Raw additional CLI flags (e.g., "-r 5 -t 0.3") |
Fields that already exist and will be reused: `Client`, `IMSURL`, `Work Category` (add "Link Building" option).
### Auto-creation approach
- Add `create_custom_field(list_id, name, field_type, type_config=None)` method to `cheddahbot/clickup.py`, which calls `POST /list/{list_id}/field`
- Add a `setup_linkbuilding_fields` tool (category="linkbuilding") that:
  1. Gets all list IDs in the space
  2. For each list, checks if fields already exist (via `get_custom_fields`)
  3. Creates missing fields via the new API method
  4. For `LB Method` dropdown, creates with `type_config` containing "Cora Backlinks" option
  5. For `Work Category`, adds "Link Building" option if missing
- This tool runs once during initial setup, or can be re-run if new lists are added
- Also add "Link Building" as an option to the existing `Work Category` dropdown if not present
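The "check, then create what is missing" logic in steps 2 and 3 can be sketched as a pure function. This is a minimal sketch, not the project's implementation: `REQUIRED_FIELDS`, `missing_fields`, the type strings, and the case-insensitive name comparison are all assumptions made for illustration. The real tool would get `existing_names` from `get_custom_fields`.

```python
# Hypothetical sketch. Field names come from the table above; the type
# strings ("drop_down", "short_text") are illustrative assumptions.
REQUIRED_FIELDS = {
    "LB Method": "drop_down",
    "Keyword": "short_text",
    "CoraFile": "short_text",
    "CustomAnchors": "short_text",
    "BrandedPlusRatio": "short_text",
    "CLIFlags": "short_text",
}


def missing_fields(existing_names):
    """Return (name, type) pairs for required fields a list does not have yet."""
    # Compare case-insensitively so "lb method" counts as "LB Method".
    present = {name.strip().lower() for name in existing_names}
    return [
        (name, field_type)
        for name, field_type in REQUIRED_FIELDS.items()
        if name.lower() not in present
    ]


if __name__ == "__main__":
    print(missing_fields(["Client", "IMSURL", "Keyword", "lb method"]))
```

Because the function only reports what is missing, re-running the setup tool on a list that is already configured would create nothing.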
## Data Flow & Status Lifecycle
### Primary Trigger: Folder Watcher (Cora Backlinks)
The folder watcher is the main trigger for Cora Backlinks. The ClickUp scheduler does NOT auto-execute these — it can't, because the .xlsx doesn't exist until the user runs Cora.
```
1. ClickUp task created:
   Work Category="Link Building", LB Method="Cora Backlinks", status="to do"
   Fields filled: Client, IMSURL, Keyword, CLIFlags, BrandedPlusRatio, etc.
   → Appears on dashboard as "needs Cora run"
2. User runs Cora manually, drops .xlsx in Z:/cora-inbox
3. Folder watcher (_scan_watch_folder, runs every 60 min):
   → Finds precision-cnc-machining.xlsx
   → Fuzzy matches "precision cnc machining" against Keyword field on ClickUp "to do" Link Building tasks
   → Match found → extracts metadata from ClickUp task (IMSURL, CLIFlags, etc.)
   → Sets CoraFile field on the ClickUp task to the file path
   → Moves task to "in progress"
   → Posts comment: "Starting Cora Backlinks pipeline..."
4. Pipeline runs:
   → Step 1: ingest-cora → comment: "CORA report ingested. Job file: jobs/xxx.json"
   → Step 2: generate-batch → comment: "Content generation complete. X articles across Y tiers."
5. On success:
   → Move task to "complete"
   → Post summary comment with stats
   → Move .xlsx to Z:/cora-inbox/processed/
6. On failure:
   → Move task to "internal review"
   → Post error comment with details
   → .xlsx stays in Z:/cora-inbox (can retry)
```
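The fuzzy match in step 3 could look roughly like the following. The plan does not say which matching algorithm is used, so this sketch assumes the standard library's `difflib.SequenceMatcher`; the `normalize` and `match_keyword` helpers and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from pathlib import PurePath


def normalize(text):
    """Lowercase and treat hyphens and underscores as spaces."""
    return " ".join(text.lower().replace("-", " ").replace("_", " ").split())


def match_keyword(filename, keywords, threshold=0.8):
    """Return the keyword that best matches the report filename, or None.

    The threshold is an illustrative assumption, not a project setting.
    """
    stem = normalize(PurePath(filename).stem)
    best, best_score = None, 0.0
    for keyword in keywords:
        score = SequenceMatcher(None, stem, normalize(keyword)).ratio()
        if score > best_score:
            best, best_score = keyword, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    tasks = ["Precision CNC Machining", "swiss screw machining"]
    print(match_keyword("precision-cnc-machining.xlsx", tasks))
```

Returning `None` below the threshold keeps an unrelated file in the inbox from being attached to the wrong task.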
### Secondary Trigger: Chat
```
User: "Run link building for Z:/cora-inbox/precision-cnc-machining.xlsx"
  → Chat brain calls run_cora_backlinks (or run_link_building with explicit lb_method)
  → Tool auto-looks up matching ClickUp task via Keyword field (if exists)
  → Same pipeline + ClickUp sync as above
  → If no ClickUp match: runs pipeline without ClickUp tracking, returns results to chat only
```
### Future Trigger: ClickUp Scheduler (other LB Methods)
Future link building methods (MCP, etc.) that don't need a .xlsx CAN be auto-executed by the ClickUp scheduler. The `run_link_building` orchestrator checks `lb_method`:
- "Cora Backlinks" → requires xlsx_path, skips if empty (folder watcher handles these)
- Future methods → can execute directly from ClickUp task data
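The branching described above can be sketched as a small dispatcher. This is an illustration only: the function name matches the plan, but the parameters and the returned status strings are assumptions, and the real orchestrator would run the pipeline rather than return a label.

```python
def run_link_building(lb_method, xlsx_path="", task_data=None):
    """Hypothetical dispatcher that returns a short status string."""
    if lb_method == "Cora Backlinks":
        if not xlsx_path:
            # No report yet: leave the task for the folder watcher.
            return "skipped: waiting for folder watcher"
        return f"cora pipeline: {xlsx_path}"
    # Future methods need no .xlsx and can run straight from task data.
    return f"direct run: {lb_method}"


if __name__ == "__main__":
    print(run_link_building("Cora Backlinks"))
    print(run_link_building("Cora Backlinks", "Z:/cora-inbox/precision-cnc-machining.xlsx"))
```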
### ClickUp Skill Map Note
The skill_map entry for "Link Building" exists primarily for **field mapping reference** (so the folder watcher and chat know which ClickUp fields map to which tool params). The ClickUp scheduler will discover these tasks but `run_link_building` will skip Cora Backlinks that have no xlsx_path — they're waiting for the folder watcher.
## Implementation Order
1. **Config** — Add `LinkBuildingConfig` to config.py, add `link_building:` section to config.yaml, add `link_builder` agent to config.yaml
2. **Core tools** — Create `cheddahbot/tools/linkbuilding.py` with `_run_blm_command`, parsers, `run_link_building` orchestrator, and `run_cora_backlinks` pipeline
3. **Standalone tools** — Add `blm_ingest_cora` and `blm_generate_batch`
4. **Tests** — Create `tests/test_linkbuilding.py`, verify with `uv run pytest tests/test_linkbuilding.py -v`
5. **ClickUp field creation** — Add `create_custom_field` to clickup.py, add `setup_linkbuilding_fields` tool
6. **ClickUp integration** — Add skill_map entry, add ClickUp state tracking to tools
7. **Folder watcher** — Add `_folder_watch_loop` to scheduler.py, add `scan_cora_folder` tool
8. **API endpoint** — Add `/api/linkbuilding/status` to `__main__.py`
9. **Skill file** — Create `skills/linkbuilding.md`
10. **ClickUp setup** — Run `setup_linkbuilding_fields` to auto-create custom fields across all lists
11. **Full test run**: `uv run pytest -v --no-cov`
## Verification
1. **Unit tests**: `uv run pytest tests/test_linkbuilding.py -v` — all pass with mocked subprocess
2. **Full suite**: `uv run pytest -v --no-cov` — no regressions
3. **Lint**: `uv run ruff check .` + `uv run ruff format .`
4. **Manual e2e**: Drop a real .xlsx in Z:/cora-inbox, verify ingest-cora runs, job JSON created, generate-batch runs
5. **ClickUp e2e**: Create a Link Building task in ClickUp with proper fields, wait for scheduler poll, verify execution
6. **Chat e2e**: Ask CheddahBot to "run link building for [keyword]" via chat UI
7. **API check**: Hit `http://localhost:7860/api/linkbuilding/status` and verify data returned
## Key Reference Files
- `cheddahbot/tools/press_release.py` — Reference pattern for multi-step pipeline tool
- `cheddahbot/scheduler.py:55-76` — Where to add 4th daemon thread
- `cheddahbot/config.py:108-200` — load_config() pattern for new config sections
- `E:/dev/Big-Link-Man/docs/CLI_COMMAND_REFERENCE.md` — Full CLI reference
- `E:/dev/Big-Link-Man/src/cli/commands.py` — Exact output formats to parse

View File

@ -1,721 +0,0 @@
# CheddahBot Architecture
## System Overview
CheddahBot is a personal AI assistant built in Python. It exposes a Gradio-based
web UI, routes user messages through an agent loop backed by a model-agnostic LLM
adapter, persists conversations in SQLite, maintains a 4-layer memory system with
optional semantic search, and provides an extensible tool registry that the LLM
can invoke mid-conversation. A background scheduler handles cron-based tasks and
periodic heartbeat checks.
### Data Flow Diagram
```
User (browser)
      |
      v
+-----------+      +------------+      +--------------+
| Gradio UI | ---> |   Agent    | ---> | LLM Adapter  |
|  (ui.py)  |      | (agent.py) |      |   (llm.py)   |
+-----------+      +-----+------+      +------+-------+
                         |                    |
            +------------+-------+    +-------+--------+
            |            |       |    | Claude CLI     |
            v            v       v    | OpenRouter     |
       +---------+  +---------+ +---+ | Ollama         |
       | Router  |  | Tools   | | DB| | LM Studio      |
       |(router) |  |(tools/) | |(db| +----------------+
       +----+----+  +----+----+ +---+
            |            |
    +-------+--+    +----+----+
    | Identity |    | Memory  |
    | SOUL.md  |    | System  |
    | USER.md  |    |(memory) |
    +----------+    +---------+
```
1. The user submits text (or voice / files) through the Gradio interface.
2. `ui.py` hands the message to `Agent.respond()`.
3. The agent stores the user message in SQLite, builds a system prompt via
`router.py` (loading identity files and memory context), and formats the
conversation history.
4. The agent sends messages to `LLMAdapter.chat()` which dispatches to the
correct provider backend.
5. The LLM response streams back. If it contains tool-call requests, the agent
executes them through `ToolRegistry.execute()`, appends the results, and loops
back to step 4 (up to 10 iterations).
6. The final assistant response is stored in the database and streamed to the UI.
7. After responding, the agent checks whether the conversation has exceeded the
flush threshold; if so, the memory system summarizes older messages into the
daily log.
---
## Module-by-Module Breakdown
### `__main__.py` -- Entry Point
**File:** `cheddahbot/__main__.py`
Orchestrates startup in this order:
1. `load_config()` -- loads configuration from env vars / YAML / defaults.
2. `Database(config.db_path)` -- opens (or creates) the SQLite database.
3. `LLMAdapter(...)` -- initializes the model-agnostic LLM client.
4. `Agent(config, db, llm)` -- creates the core agent.
5. `MemorySystem(config, db)` -- initializes the memory system and injects it
into the agent via `agent.set_memory()`.
6. `ToolRegistry(config, db, agent)` -- auto-discovers and loads all tool
modules, then injects via `agent.set_tools()`.
7. `Scheduler(config, db, agent)` -- starts two daemon threads (task poller and
heartbeat).
8. `create_ui(agent, config, llm)` -- builds the Gradio Blocks app and launches
it on the configured host/port.
Each subsystem (memory, tools, scheduler) is wrapped in a try/except so the
application degrades gracefully if optional dependencies are missing.
---
### `config.py` -- Configuration
**File:** `cheddahbot/config.py`
Defines four dataclasses:
| Dataclass | Key Fields |
|------------------|---------------------------------------------------------------|
| `Config` | `default_model`, `host`, `port`, `ollama_url`, `lmstudio_url`, `openrouter_api_key`, plus derived paths (`root_dir`, `data_dir`, `identity_dir`, `memory_dir`, `skills_dir`, `db_path`) |
| `MemoryConfig` | `max_context_messages` (50), `flush_threshold` (40), `embedding_model` ("all-MiniLM-L6-v2"), `search_top_k` (5) |
| `SchedulerConfig` | `heartbeat_interval_minutes` (30), `poll_interval_seconds` (60) |
| `ShellConfig` | `blocked_commands`, `require_approval` (False) |
`load_config()` applies three layers of configuration, each overriding the one before it:
1. Dataclass defaults (lowest priority).
2. `config.yaml` at the project root (middle priority).
3. Environment variables with the `CHEDDAH_` prefix, plus `OPENROUTER_API_KEY`
(highest priority).
The function also ensures required data directories exist on disk.
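The layering can be illustrated with a reduced `Config`. This is a sketch under stated assumptions, not the actual `load_config()`: the default values are placeholders, the environment variable names (`CHEDDAH_HOST`, `CHEDDAH_PORT`, ...) are guessed from the documented prefix, and the YAML file is represented by an already-parsed dict.

```python
import os
from dataclasses import dataclass, fields, replace


@dataclass
class Config:
    # Placeholder defaults for illustration, not the project's real values.
    host: str = "127.0.0.1"
    port: int = 7860
    default_model: str = "example-model"


def load_config(yaml_values, env=None):
    """Layer defaults, then YAML values, then CHEDDAH_* environment variables."""
    env = os.environ if env is None else env
    names = {f.name for f in fields(Config)}
    # Layers 1 and 2: start from defaults, overlay known YAML keys.
    config = replace(Config(), **{k: v for k, v in yaml_values.items() if k in names})
    # Layer 3: environment variables win, coerced to the field's current type.
    for name in sorted(names):
        raw = env.get(f"CHEDDAH_{name.upper()}")
        if raw is not None:
            setattr(config, name, type(getattr(config, name))(raw))
    return config


if __name__ == "__main__":
    print(load_config({"port": 8000}, env={"CHEDDAH_PORT": "9000"}))
```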
---
### `db.py` -- Database Layer
**File:** `cheddahbot/db.py`
A thin wrapper around SQLite using thread-local connections (one connection per
thread), WAL journal mode, and foreign keys.
**Key methods:**
- `create_conversation(conv_id, title)` -- insert a new conversation row.
- `list_conversations(limit)` -- return recent conversations ordered by
`updated_at`.
- `add_message(conv_id, role, content, ...)` -- insert a message and touch the
conversation's `updated_at`.
- `get_messages(conv_id, limit)` -- return messages in chronological order.
- `count_messages(conv_id)` -- count messages for flush-threshold checks.
- `add_scheduled_task(name, prompt, schedule)` -- persist a scheduled task.
- `get_due_tasks()` -- return tasks whose `next_run` is in the past or NULL.
- `update_task_next_run(task_id, next_run)` -- update the next execution time.
- `log_task_run(task_id, result, error)` -- record the outcome of a task run.
- `kv_set(key, value)` / `kv_get(key)` -- generic key-value store.
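The thread-local connection pattern described at the top of this section might be implemented along these lines. This is a minimal sketch using only `sqlite3` and `threading`; the `conn` property name is an assumption, and the real class also creates the schema and exposes the methods listed above.

```python
import sqlite3
import threading


class Database:
    """Sketch: one lazily created SQLite connection per thread."""

    def __init__(self, path):
        self._path = path
        self._local = threading.local()

    @property
    def conn(self):
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self._path)
            # WAL lets readers proceed while another connection writes.
            conn.execute("PRAGMA journal_mode=WAL")
            # SQLite only enforces foreign keys when asked, per connection.
            conn.execute("PRAGMA foreign_keys=ON")
            self._local.conn = conn
        return conn
```

Each thread that touches `db.conn` gets its own connection, which avoids sharing a single `sqlite3.Connection` object across threads.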
---
### `agent.py` -- Core Agent Loop
**File:** `cheddahbot/agent.py`
Contains the `Agent` class, the central coordinator.
**Key members:**
- `conv_id` -- current conversation ID (a 12-character hex string).
- `_memory` -- optional `MemorySystem` reference.
- `_tools` -- optional `ToolRegistry` reference.
**Primary method: `respond(user_input, files)`**
This is a Python generator that yields text chunks for streaming. The detailed
flow is described in the next section.
**Helper: `respond_to_prompt(prompt)`**
Non-streaming wrapper that collects all chunks and returns a single string. Used
by the scheduler and heartbeat for internal prompts.
---
### `router.py` -- System Prompt Builder
**File:** `cheddahbot/router.py`
Two functions:
1. `build_system_prompt(identity_dir, memory_context, tools_description)` --
   assembles the full system prompt by concatenating these sections separated by
   horizontal rules:
   - Contents of `identity/SOUL.md`
   - Contents of `identity/USER.md`
   - Memory context string (from the memory system)
   - Tools description listing (from the tool registry)
   - A fixed "Instructions" section with core behavioral directives.
2. `format_messages_for_llm(system_prompt, history, max_messages)` --
   converts raw database rows into the `[{role, content}]` format expected by
   the LLM. The system prompt becomes the first message. Tool results are
   converted to user messages prefixed with `[Tool Result]`. History is trimmed
   to the most recent `max_messages` entries.
---
### `llm.py` -- LLM Adapter
**File:** `cheddahbot/llm.py`
Described in detail in a dedicated section below.
---
### `memory.py` -- Memory System
**File:** `cheddahbot/memory.py`
Described in detail in a dedicated section below.
---
### `media.py` -- Audio/Video Processing
**File:** `cheddahbot/media.py`
Three utility functions:
- `transcribe_audio(path)` -- Speech-to-text. Tries local Whisper first, then
falls back to the OpenAI Whisper API.
- `text_to_speech(text, output_path, voice)` -- Text-to-speech via `edge-tts`
(free, no API key). Defaults to the `en-US-AriaNeural` voice.
- `extract_video_frames(video_path, max_frames)` -- Extracts key frames from
video using `ffprobe` (to get duration) and `ffmpeg` (to extract JPEG frames).
---
### `scheduler.py` -- Scheduler and Heartbeat
**File:** `cheddahbot/scheduler.py`
Described in detail in a dedicated section below.
---
### `ui.py` -- Gradio Web Interface
**File:** `cheddahbot/ui.py`
Builds a Gradio Blocks application with:
- A model dropdown (populated from `llm.list_available_models()`) with a refresh
button and a "New Chat" button.
- A `gr.Chatbot` widget for the conversation (500px height, copy buttons).
- A `gr.MultimodalTextbox` supporting text, file upload, and microphone input.
- A "Voice Chat" accordion for record-and-respond audio interaction.
- A "Conversation History" accordion showing past conversations from the
database.
- A "Settings" accordion with guidance on editing identity and config files.
**Event wiring:**
- Model dropdown change calls `llm.switch_model()`.
- Refresh button re-discovers local models.
- Message submit calls `agent.respond()` in streaming mode, updating the chatbot
widget with each chunk.
- Audio files attached to messages are transcribed via `media.transcribe_audio()`
before being sent to the agent.
- Voice Chat records audio, transcribes it, gets a text response from the agent,
converts it to speech via `media.text_to_speech()`, and plays it back.
---
### `tools/__init__.py` -- Tool Registry
**File:** `cheddahbot/tools/__init__.py`
Described in detail in a dedicated section below.
---
### `skills/__init__.py` -- Skill Registry
**File:** `cheddahbot/skills/__init__.py`
Defines a parallel registry for "skills" (multi-step operations). Key pieces:
- `SkillDef` -- dataclass holding `name`, `description`, `func`.
- `@skill(name, description)` -- decorator that registers a skill in the global
`_SKILLS` dict.
- `load_skill(path)` -- dynamically loads a `.py` file as a module (triggering
any `@skill` decorators inside it).
- `discover_skills(skills_dir)` -- loads all `.py` files from the skills
directory.
- `list_skills()` / `run_skill(name, **kwargs)` -- query and execute skills.
---
### `providers/__init__.py` -- Provider Extensions
**File:** `cheddahbot/providers/__init__.py`
Reserved for future custom provider implementations. Currently empty.
---
## The Agent Loop in Detail
When `Agent.respond(user_input)` is called, the following sequence occurs:
```
1. ensure_conversation()
   |-- Creates a new conversation in the DB if one doesn't exist
   |
2. db.add_message(conv_id, "user", user_input)
   |-- Persists the user's message
   |
3. Build system prompt
   |-- memory.get_context(user_input)  --> memory context string
   |-- tools.get_tools_schema()        --> OpenAI-format JSON schemas
   |-- tools.get_tools_description()   --> human-readable tool list
   |-- router.build_system_prompt(identity_dir, memory_context, tools_description)
   |
4. Load conversation history from DB
   |-- db.get_messages(conv_id, limit=max_context_messages)
   |-- router.format_messages_for_llm(system_prompt, history, max_messages)
   |
5. AGENT LOOP (up to MAX_TOOL_ITERATIONS = 10):
   |
   |-- llm.chat(messages, tools=tools_schema, stream=True)
   |   |-- Yields {"type":"text","content":"..."} chunks --> streamed to user
   |   |-- Yields {"type":"tool_use","name":"...","input":{...}} chunks
   |
   |-- If no tool_calls: store assistant message, BREAK
   |
   |-- If tool_calls present:
   |   |-- Store assistant message with tool_calls metadata
   |   |-- For each tool call:
   |   |   |-- yield "Using tool: <name>" indicator
   |   |   |-- tools.execute(name, input) --> result string
   |   |   |-- yield tool result (truncated to 2000 chars)
   |   |   |-- db.add_message(conv_id, "tool", result)
   |   |   |-- Append result to messages as user message
   |   |-- Continue loop (LLM sees tool results and can respond or call more tools)
   |
6. After loop: check if memory flush is needed
   |-- If message count > flush_threshold:
   |   |-- memory.auto_flush(conv_id)
```
The loop allows the LLM up to 10 rounds of tool use before being cut off, and a
single round may contain several tool calls. Each tool result is injected back
into the conversation as a user message so the LLM can reason about it in the
next iteration.
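The core of step 5 can be condensed into a short generator. This is a simplified sketch, not the code in `agent.py`: it leaves out database persistence, the "Using tool" indicators, and result truncation, and the `llm_chat` and `execute_tool` callables are stand-ins for `LLMAdapter.chat()` and `ToolRegistry.execute()`.

```python
MAX_TOOL_ITERATIONS = 10


def run_agent_loop(llm_chat, execute_tool, messages):
    """Yield text chunks, feeding tool results back until no tools are requested."""
    for _ in range(MAX_TOOL_ITERATIONS):
        tool_calls = []
        for chunk in llm_chat(messages):
            if chunk["type"] == "text":
                yield chunk["content"]
            elif chunk["type"] == "tool_use":
                tool_calls.append(chunk)
        if not tool_calls:
            break  # plain answer: the turn is finished
        for call in tool_calls:
            result = execute_tool(call["name"], call["input"])
            # Tool output goes back to the model as a user message.
            messages.append({"role": "user", "content": f"[Tool Result] {result}"})
```

The iteration cap is what stops a model that keeps requesting tools from looping forever.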
---
## LLM Adapter Design
**File:** `cheddahbot/llm.py`
### Provider Routing
The `LLMAdapter` supports four provider paths. The active provider is determined
by examining the current model ID:
| Model ID Pattern | Provider | Backend |
|-----------------------------|---------------|----------------------------------|
| `claude-*` | `claude` | Claude Code CLI (subprocess) |
| `local/ollama/<model>` | `ollama` | Ollama HTTP API (OpenAI-compat) |
| `local/lmstudio/<model>` | `lmstudio` | LM Studio HTTP API (OpenAI-compat) |
| Anything else | `openrouter` | OpenRouter API (OpenAI-compat) |
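Read top to bottom, the table amounts to a prefix check. The following is a sketch of that rule only; `resolve_provider` is a hypothetical name, and the real adapter exposes the result through its `provider` property.

```python
def resolve_provider(model_id):
    """Map a model ID to a provider name using the prefixes in the table."""
    if model_id.startswith("claude-"):
        return "claude"
    if model_id.startswith("local/ollama/"):
        return "ollama"
    if model_id.startswith("local/lmstudio/"):
        return "lmstudio"
    # Everything else is assumed to be an OpenRouter model ID.
    return "openrouter"


if __name__ == "__main__":
    for model in ("claude-example", "local/ollama/example", "vendor/example"):
        print(model, "->", resolve_provider(model))
```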
### The `chat()` Method
This is the single entry point. It accepts a list of messages, an optional tools
schema, and a stream flag. It returns a generator yielding dictionaries:
- `{"type": "text", "content": "..."}` -- a text chunk to display.
- `{"type": "tool_use", "id": "...", "name": "...", "input": {...}}` -- a tool
invocation request.
### Claude Code CLI Path (`_chat_claude_sdk`)
For Claude models, CheddahBot shells out to the `claude` CLI binary (the Claude
Code SDK):
1. Separates system prompt, conversation history, and the latest user message
from the messages list.
2. Builds a full system prompt by appending conversation history under a
"Conversation So Far" heading.
3. Invokes `claude -p <prompt> --model <model> --output-format json --system-prompt <system>`.
4. The `CLAUDECODE` environment variable is stripped from the subprocess
environment to avoid nested-session errors.
5. Parses the JSON output and yields the `result` field as a text chunk.
6. On Windows, `shell=True` is used for compatibility with npm-installed
binaries.
### OpenAI-Compatible Path (`_chat_openai_sdk`)
For OpenRouter, Ollama, and LM Studio, the adapter uses the `openai` Python SDK:
1. `_resolve_endpoint(provider)` returns the base URL and API key:
   - OpenRouter: `https://openrouter.ai/api/v1` with the configured API key.
   - Ollama: `http://localhost:11434/v1` with dummy key `"ollama"`.
   - LM Studio: `http://localhost:1234/v1` with dummy key `"lm-studio"`.
2. `_resolve_model_id(provider)` strips the `local/ollama/` or
`local/lmstudio/` prefix from the model ID.
3. Creates an `openai.OpenAI` client with the resolved base URL and API key.
4. In streaming mode: iterates over `client.chat.completions.create(stream=True)`,
accumulates tool call arguments across chunks (indexed by `tc.index`), yields
text deltas immediately, and yields completed tool calls at the end of the
stream.
5. In non-streaming mode: makes a single call and yields text and tool calls from
the response.
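The accumulation in step 4 is the least obvious part: a streamed tool call arrives as fragments, and the JSON arguments are only parseable once the stream ends. The sketch below shows the idea with plain dicts standing in for the SDK's delta objects; the dict keys and the `accumulate_tool_calls` name are assumptions for illustration.

```python
import json


def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments, keyed by their index."""
    calls = {}
    for delta in deltas:
        slot = calls.setdefault(delta["index"], {"id": "", "name": "", "arguments": ""})
        # id and name usually arrive once; argument text arrives in pieces.
        slot["id"] = delta.get("id") or slot["id"]
        slot["name"] = delta.get("name") or slot["name"]
        slot["arguments"] += delta.get("arguments") or ""
    return [
        {
            "type": "tool_use",
            "id": call["id"],
            "name": call["name"],
            # Parse only after the stream ends, when the JSON is complete.
            "input": json.loads(call["arguments"] or "{}"),
        }
        for _, call in sorted(calls.items())
    ]
```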
### Model Discovery
- `discover_local_models()` -- probes the Ollama tags endpoint and LM Studio
models endpoint (3-second timeout each) and returns `ModelInfo` objects.
- `list_available_models()` -- returns a combined list of hardcoded Claude
models, hardcoded OpenRouter models (if an API key is configured), and
dynamically discovered local models.
### Model Switching
`switch_model(model_id)` updates `current_model`. The `provider` property
re-evaluates on every access, so switching models also implicitly switches
providers.
---
## Memory System
**File:** `cheddahbot/memory.py`
### The 4 Layers
```
Layer 1: Identity     -- identity/SOUL.md, identity/USER.md
                         (loaded by router.py into the system prompt)
Layer 2: Long-term    -- memory/MEMORY.md
                         (persisted facts and instructions, appended over time)
Layer 3: Daily logs   -- memory/YYYY-MM-DD.md
                         (timestamped entries per day, including auto-flush summaries)
Layer 4: Semantic     -- memory/embeddings.db
                         (SQLite with vector embeddings for similarity search)
```
### How Memory Context is Built
`MemorySystem.get_context(query)` is called once per agent turn. It assembles a
string from:
1. **Long-term memory** -- the last 2000 characters of `MEMORY.md`.
2. **Today's log** -- the last 1500 characters of today's date file.
3. **Semantic search results** -- the top-k most similar entries to the user's
query, formatted as a bulleted list.
This string is injected into the system prompt by `router.py` under the heading
"Relevant Memory".
### Embedding and Search
- The embedding model is `all-MiniLM-L6-v2` from `sentence-transformers` (lazy
loaded, thread-safe via a lock).
- `_index_text(text, doc_id)` -- encodes the text into a vector and stores it in
`memory/embeddings.db` (table: `embeddings` with columns `id TEXT`, `text TEXT`,
`vector BLOB`).
- `search(query, top_k)` -- encodes the query, loads all vectors from the
database, computes cosine similarity against each one, sorts by score, and
returns the top-k results.
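The brute-force search described above can be sketched without any third-party dependency. This is an illustration, not the code in `memory.py`: it assumes vectors are stored as packed float32 values (as the schema notes suggest), uses `struct` where the real module may use NumPy, and takes rows as `(text, blob)` pairs instead of reading SQLite.

```python
import math
import struct


def to_blob(vector):
    """Pack a sequence of floats as raw float32 bytes."""
    return struct.pack(f"{len(vector)}f", *vector)


def from_blob(blob):
    """Unpack raw float32 bytes back into a tuple of floats."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, rows, top_k=5):
    """Score every stored vector against the query and keep the best top_k."""
    scored = [(cosine(query_vec, from_blob(blob)), text) for text, blob in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Scanning every row is linear in the number of stored entries, which is reasonable for a personal memory store but would need an index at much larger scale.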
- If `sentence-transformers` is not installed, `_fallback_search()` performs
simple case-insensitive substring matching across all `.md` files in the memory
directory.
### Writing to Memory
- `remember(text)` -- appends a timestamped entry to `memory/MEMORY.md` and
indexes it for semantic search. Exposed to the LLM via the `remember_this`
tool.
- `log_daily(text)` -- appends a timestamped entry to today's daily log file and
indexes it. Exposed via the `log_note` tool.
### Auto-Flush
When `Agent.respond()` finishes, it checks `db.count_messages(conv_id)`. If the
count exceeds `config.memory.flush_threshold` (default 40):
1. `auto_flush(conv_id)` loads up to 200 messages.
2. All but the last 10 are selected for summarization.
3. A summary string is built from the selected messages (truncated to 1000
chars).
4. The summary is appended to the daily log via `log_daily()`.
This prevents conversations from growing unbounded while preserving context in
the daily log for future semantic search.
### Reindexing
`reindex_all()` clears all embeddings and re-indexes every line (longer than 10
characters) from every `.md` file in the memory directory. This can be called
to rebuild the search index from scratch.
---
## Tool System
**File:** `cheddahbot/tools/__init__.py` (registry) and `cheddahbot/tools/*.py`
(tool modules)
### The `@tool` Decorator
```python
from cheddahbot.tools import tool


@tool("my_tool_name", "Description of what this tool does", category="general")
def my_tool_name(param1: str, param2: int = 10) -> str:
    return f"Result: {param1}, {param2}"
```
The decorator:
1. Creates a `ToolDef` object containing the function, name, description,
category, and auto-extracted parameter schema.
2. Registers it in the global `_TOOLS` dictionary keyed by name.
3. Attaches the `ToolDef` as `func._tool_def` on the original function.
### Parameter Schema Generation
`_extract_params(func)` inspects the function signature using `inspect`:
- Skips parameters named `self` or `ctx`.
- Maps type annotations to JSON Schema types: `str` -> `"string"`, `int` ->
`"integer"`, `float` -> `"number"`, `bool` -> `"boolean"`, `list` ->
`"array"`. Unannotated parameters default to `"string"`.
- Parameters without defaults are marked as required.
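These rules translate fairly directly into `inspect.signature`. The following is a sketch that follows the rules as listed, not a copy of `_extract_params`; the exact shape of the returned dict is an assumption based on the schema output shown in the next section.

```python
import inspect

_TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean", list: "array"}


def extract_params(func):
    """Build a JSON-Schema-style parameter description from a signature."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        if name in ("self", "ctx"):
            continue  # injected by the registry, never supplied by the LLM
        # Unannotated or unknown annotations fall back to "string".
        properties[name] = {"type": _TYPE_MAP.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}
```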
### Schema Output
`ToolDef.to_openai_schema()` returns the tool definition in OpenAI
function-calling format:
```json
{
  "type": "function",
  "function": {
    "name": "tool_name",
    "description": "...",
    "parameters": {
      "type": "object",
      "properties": { ... },
      "required": [ ... ]
    }
  }
}
```
### Auto-Discovery
When `ToolRegistry.__init__()` is called, `_discover_tools()` uses
`pkgutil.iter_modules` to find every `.py` file in `cheddahbot/tools/` (skipping
files starting with `_`). Each module is imported via `importlib.import_module`,
which triggers the `@tool` decorators and populates the global registry.
### Tool Execution
`ToolRegistry.execute(name, args)`:
1. Looks up the `ToolDef` in the global `_TOOLS` dict.
2. Inspects the function signature for a `ctx` parameter. If present, injects a
context dictionary containing `config`, `db`, `agent`, and `memory`.
3. Calls the function with the provided arguments.
4. Returns the result as a string (or `"Done."` if the function returns `None`).
5. Catches all exceptions and returns `"Tool error: ..."`.
### Meta-Tools
Two special tools enable runtime extensibility:
**`build_tool`** (in `cheddahbot/tools/build_tool.py`):
- Accepts `name`, `description`, and `code` (Python source using the `@tool`
decorator).
- Writes a new `.py` file into `cheddahbot/tools/`.
- Hot-imports the module via `importlib.import_module`, which triggers the
`@tool` decorator and registers the new tool immediately.
- If the import fails, the file is deleted.
**`build_skill`** (in `cheddahbot/tools/build_skill.py`):
- Accepts `name`, `description`, and `steps` (Python source using the `@skill`
decorator).
- Writes a new `.py` file into the configured `skills/` directory.
- Calls `skills.load_skill()` to dynamically import it.
---
## Scheduler and Heartbeat Design
**File:** `cheddahbot/scheduler.py`
The `Scheduler` class starts two daemon threads at application boot.
### Task Poller Thread
- Runs in `_poll_loop()`, sleeping for `poll_interval_seconds` (default 60)
between iterations.
- Each iteration calls `_run_due_tasks()`:
  1. Queries `db.get_due_tasks()` for tasks where `next_run` is NULL or in the
     past.
  2. For each due task, calls `agent.respond_to_prompt(task["prompt"])` to
     generate a response.
  3. Logs the result via `db.log_task_run()`.
  4. If the schedule is `"once:<datetime>"`, the task is disabled.
  5. Otherwise, the schedule is treated as a cron expression: `croniter` is used
     to calculate the next run time, which is saved via
     `db.update_task_next_run()`.
### Heartbeat Thread
- Runs in `_heartbeat_loop()`, sleeping for `heartbeat_interval_minutes`
(default 30) between iterations.
- Waits 60 seconds before the first heartbeat to let the system initialize.
- Each iteration calls `_run_heartbeat()`:
  1. Reads `identity/HEARTBEAT.md`.
  2. Sends the checklist to the agent as a prompt: "HEARTBEAT CHECK. Review this
     checklist and take action if needed."
  3. If the response contains `"HEARTBEAT_OK"`, no action is logged.
  4. Otherwise, the response is logged to the daily log via
     `memory.log_daily()`.
### Thread Safety
Both threads are daemon threads (they die when the main process exits). The
`_stop_event` threading event can be set to gracefully shut down both loops. The
database layer uses thread-local connections, so concurrent access from the
scheduler threads and the Gradio request threads is safe.
---
## Database Schema
The SQLite database (`data/cheddahbot.db`) contains five tables:
### `conversations`
| Column | Type | Notes |
|--------------|------|--------------------|
| `id` | TEXT | Primary key (hex) |
| `title` | TEXT | Display title |
| `created_at` | TEXT | ISO 8601 UTC |
| `updated_at` | TEXT | ISO 8601 UTC |
### `messages`
| Column | Type | Notes |
|---------------|---------|--------------------------------------------|
| `id` | INTEGER | Autoincrement primary key |
| `conv_id` | TEXT | Foreign key to `conversations.id` |
| `role` | TEXT | `"user"`, `"assistant"`, or `"tool"` |
| `content` | TEXT | Message body |
| `tool_calls` | TEXT | JSON array of `{name, input}` (nullable) |
| `tool_result` | TEXT | Name of the tool that produced this result (nullable) |
| `model` | TEXT | Model ID used for this response (nullable) |
| `created_at` | TEXT | ISO 8601 UTC |
Index: `idx_messages_conv` on `(conv_id, created_at)`.
### `scheduled_tasks`
| Column | Type | Notes |
|--------------|---------|---------------------------------------|
| `id` | INTEGER | Autoincrement primary key |
| `name` | TEXT | Human-readable task name |
| `prompt` | TEXT | The prompt to send to the agent |
| `schedule` | TEXT | Cron expression or `"once:<datetime>"`|
| `enabled` | INTEGER | 1 = active, 0 = disabled |
| `next_run` | TEXT | ISO 8601 UTC (nullable) |
| `created_at` | TEXT | ISO 8601 UTC |
### `task_run_logs`
| Column | Type | Notes |
|---------------|---------|------------------------------------|
| `id` | INTEGER | Autoincrement primary key |
| `task_id` | INTEGER | Foreign key to `scheduled_tasks.id`|
| `started_at` | TEXT | ISO 8601 UTC |
| `finished_at` | TEXT | ISO 8601 UTC (nullable) |
| `result` | TEXT | Agent response (nullable) |
| `error` | TEXT | Error message if failed (nullable) |
### `kv_store`
| Column | Type | Notes |
|---------|------|-----------------|
| `key` | TEXT | Primary key |
| `value` | TEXT | Arbitrary value |
### Embeddings Database
A separate SQLite file at `memory/embeddings.db` holds one table:
### `embeddings`
| Column | Type | Notes |
|----------|------|--------------------------------------|
| `id` | TEXT | Primary key (e.g. `"daily:2026-02-14:08:30"`) |
| `text` | TEXT | The original text that was embedded |
| `vector` | BLOB | Raw float32 bytes of the embedding vector |
---
## Identity Files
Three Markdown files in the `identity/` directory define the agent's personality,
user context, and background behavior.
### `identity/SOUL.md`
Defines the agent's personality, communication style, boundaries, and quirks.
This is loaded first into the system prompt, making it the most prominent
identity influence on every response.
Contents are read by `router.build_system_prompt()` at the beginning of each
agent turn.
### `identity/USER.md`
Contains a user profile template: name, technical level, primary language,
current projects, and communication preferences. The user edits this file to
customize how the agent addresses them and what context it assumes.
Loaded by `router.build_system_prompt()` immediately after SOUL.md.
### `identity/HEARTBEAT.md`
A checklist of items to review on each heartbeat cycle. The scheduler reads this
file and sends it to the agent as a prompt every `heartbeat_interval_minutes`
(default 30 minutes). The agent processes the checklist and either confirms
"HEARTBEAT_OK" or takes action and logs it.
### Loading Order in the System Prompt
The system prompt assembled by `router.build_system_prompt()` concatenates these
sections, separated by `\n\n---\n\n`:
1. SOUL.md contents
2. USER.md contents
3. Memory context (long-term + daily log + semantic search results)
4. Tools description (categorized list of available tools)
5. Core instructions (hardcoded behavioral directives)
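The assembly order can be sketched as follows. This is an approximation of `router.build_system_prompt()`, not its source: the real function hardcodes the instructions, so the extra `instructions` parameter here is hypothetical, and skipping empty or missing sections is an assumption.

```python
from pathlib import Path

SEPARATOR = "\n\n---\n\n"


def build_system_prompt(identity_dir, memory_context, tools_description, instructions):
    """Join identity files, memory, tools, and instructions in the documented order."""
    sections = []
    for filename in ("SOUL.md", "USER.md"):
        path = Path(identity_dir) / filename
        if path.exists():
            sections.append(path.read_text(encoding="utf-8").strip())
    sections += [memory_context, tools_description, instructions]
    # Drop empty sections so the prompt never contains back-to-back separators.
    return SEPARATOR.join(section for section in sections if section)
```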

View File

@ -1,61 +0,0 @@
# ClickUp Task Creation
## CLI Script
```bash
uv run python scripts/create_clickup_task.py --name "LINKS - keyword" --client "Client Name" \
--category "Link Building" --due-date 2026-03-18 --tag mar26 --time-estimate 2h \
--field "Keyword=keyword" --field "IMSURL=https://example.com" --field "LB Method=Cora Backlinks"
```
## Defaults
- Priority: High (2)
- Assignee: Bryan (10765627)
- Status: "to do"
- Due date format: YYYY-MM-DD
- Tag format: mmmYY (e.g. feb26, mar26)
## Custom Fields
Any field can be set via `--field "Name=Value"`. Dropdowns are auto-resolved by name (case-insensitive).
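Parsing `--field` values and resolving a dropdown option by name could look like this. It is a hedged sketch, not the code in `scripts/create_clickup_task.py`: `parse_field` and `resolve_dropdown` are hypothetical helpers, and the error handling is an assumption.

```python
def parse_field(arg):
    """Split "Name=Value" on the first "=" so values may contain "="."""
    name, sep, value = arg.partition("=")
    if not sep or not name.strip():
        raise ValueError(f"expected Name=Value, got {arg!r}")
    return name.strip(), value.strip()


def resolve_dropdown(value, options):
    """Return the canonical option name, matching case-insensitively."""
    by_name = {option.lower(): option for option in options}
    try:
        return by_name[value.lower()]
    except KeyError:
        raise ValueError(f"unknown option {value!r}") from None


if __name__ == "__main__":
    print(parse_field("LB Method=cora backlinks"))
    print(resolve_dropdown("cora backlinks", ["Cora Backlinks"]))
```

Splitting on the first `=` only matters for values such as URLs with query strings, which would otherwise be cut short.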
## Task Types
### Link Building
- **Prefix**: `LINKS - {keyword}`
- **Work Category**: "Link Building"
- **Required fields**: Keyword, IMSURL
- **LB Method**: default "Cora Backlinks"
- **CLIFlags**: only add `--tier1-count N` when count is specified
- **BrandedPlusRatio**: default to 0.7
- **CustomAnchors**: only if given a list of custom anchors
- **time estimate**: 2.5h
### On Page Optimization
- **Prefix**: `OPT - {keyword}`
- **Work Category**: "On Page Optimization"
- **Required fields**: Keyword, IMSURL
- **time estimate**: 3h
### Content Creation
- **Prefix**: `CREATE - {keyword}`
- **Work Category**: "Content Creation"
- **Required fields**: Keyword
- **time estimate**: 4h
### Press Release
- **Prefix**: `PR - {keyword}`
- **Required fields**: Keyword, IMSURL
- **Work Category**: "Press Release"
- **PR Topic**: if not provided, ask whether there is a topic. It can be left blank if they respond with none.
- **time estimate**: 1.5h
## Chat Tool
The `clickup_create_task` tool provides the same capabilities via CheddahBot UI. Arbitrary custom fields are passed as JSON via `custom_fields_json`.
## Client Folder Lookup
Tasks are created in the "Overall" list inside the client's folder. Folder name is matched case-insensitively.


@ -1,110 +0,0 @@
# ntfy.sh Push Notifications Setup
CheddahBot sends push notifications to your phone and desktop via [ntfy.sh](https://ntfy.sh) when tasks complete, reports are ready, or errors occur.
## 1. Install the ntfy App
- **Android:** [Play Store](https://play.google.com/store/apps/details?id=io.heckel.ntfy)
- **iOS:** [App Store](https://apps.apple.com/us/app/ntfy/id1625396347)
- **Desktop:** Open [ntfy.sh](https://ntfy.sh) in your browser and enable browser notifications when prompted
## 2. Pick Topic Names
Topics are like channels. Anyone who knows the topic name can subscribe, so use random strings:
```
cheddahbot-a8f3k9x2m7
cheddahbot-errors-p4w2j6n8
```
Generate your own — any random string works. No account or registration needed.
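
One way to generate such a topic name is sketched below using Python's `secrets` module. The `cheddahbot` prefix and the 10-character suffix length simply mirror the examples above; they are not requirements.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_topic(prefix="cheddahbot", length=10):
    """Return a topic like 'cheddahbot-a8f3k9x2m7' with a random suffix."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}-{suffix}"
```

`secrets` draws from the operating system's secure random source, which suits a name that acts as the only access control.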
## 3. Subscribe to Your Topics
**Phone app:**
1. Open the ntfy app
2. Tap the + button
3. Enter your topic name (e.g. `cheddahbot-a8f3k9x2m7`)
4. Server: `https://ntfy.sh` (default)
5. Repeat for your errors topic
**Browser:**
1. Go to [ntfy.sh](https://ntfy.sh)
2. Click "Subscribe to topic"
3. Enter the same topic names
4. Allow browser notifications when prompted
## 4. Add Topics to .env
Add these lines to your `.env` file in the CheddahBot root:
```
NTFY_TOPIC_HUMAN_ACTION=cheddahbot-a8f3k9x2m7
NTFY_TOPIC_ERRORS=cheddahbot-errors-p4w2j6n8
```
Replace with your actual topic names.
## 5. Restart CheddahBot
Kill the running instance and restart:
```bash
uv run python -m cheddahbot
```
You should see in the startup logs:
```
ntfy notifier initialized with 2 channel(s): human_action, errors
ntfy notifier subscribed to notification bus
```
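
To check a subscription independently of CheddahBot, you can publish a test message yourself. The sketch below uses only the standard library and ntfy's HTTP publishing interface (POST to `<server>/<topic>` with `Title`, `Priority`, and `Tags` headers). The function names are hypothetical and the message text is arbitrary.

```python
import urllib.request


def build_test_notification(topic, server="https://ntfy.sh"):
    """Build (but do not send) a POST request that publishes a test message."""
    return urllib.request.Request(
        f"{server.rstrip('/')}/{topic}",
        data=b"CheddahBot test notification",
        headers={
            "Title": "CheddahBot test",
            "Priority": "high",
            "Tags": "white_check_mark",
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Calling `send(build_test_notification(os.environ["NTFY_TOPIC_HUMAN_ACTION"]))` should make the notification appear on every device subscribed to that topic.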
## What Gets Notified
### human_action channel (high priority)
Notifications where you need to do something:
- Cora report finished and ready
- Press release completed
- Content outline ready for review
- Content optimization completed
- Link building pipeline finished
- Cora report distributed to inbox
### errors channel (urgent priority)
Notifications when something went wrong:
- ClickUp task failed or was skipped
- AutoCora job failed
- Link building pipeline error
- Content pipeline error
- Missing ClickUp field matches
- File copy failures
## Configuration
Channel routing is configured in `config.yaml` under the `ntfy:` section. Each channel has:
- `topic_env_var` — which env var holds the topic name
- `categories` — notification categories to listen to (`clickup`, `autocora`, `linkbuilding`, `content`)
- `include_patterns` — regex patterns the message must match (at least one)
- `exclude_patterns` — regex patterns that reject the message (takes priority over include)
- `priority` — ntfy priority level: `min`, `low`, `default`, `high`, `urgent`
- `tags` — emoji shortcodes shown on the notification (e.g. `white_check_mark`, `rotating_light`)
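
The routing rules above can be sketched as a single predicate. This is an illustration under assumptions, not the notifier's actual code: the channel is modeled as a plain dict, patterns are applied with `re.search`, and an empty `include_patterns` list is treated as "accept everything".

```python
import re


def channel_accepts(channel, category, message):
    """Decide whether a notification should be routed to this channel."""
    if category not in channel.get("categories", []):
        return False
    # Exclude patterns take priority over include patterns.
    if any(re.search(p, message) for p in channel.get("exclude_patterns", [])):
        return False
    includes = channel.get("include_patterns", [])
    # Assumption: no include patterns means every message qualifies.
    return not includes or any(re.search(p, message) for p in includes)
```

Checking the exclude list first keeps the "takes priority over include" rule explicit in the control flow.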
### Adding a New Channel
1. Add a new entry under `ntfy.channels` in `config.yaml`
2. Add the topic env var to `.env`
3. Subscribe to the topic in your ntfy app
4. Restart CheddahBot
### Privacy
The public ntfy.sh server has no authentication by default. Your topic name is the only security — use a long random string to make it unguessable. Alternatively:
- Create a free ntfy.sh account and set read/write ACLs on your topics
- Self-host ntfy (single binary) and set `server: http://localhost:8080` in config.yaml
### Disabling
Set `enabled: false` in the `ntfy:` section of `config.yaml`, or remove the env vars from `.env`.


@ -1,43 +0,0 @@
# Scheduler Refactor Notes
## Issue: AutoCora Single-Day Window (found 2026-02-27)
**Symptom:** Task `86b8grf16` ("LINKS - anti vibration rubber mounts", due Feb 18) has been sitting in "to do" forever with no Cora report generated.
**Root cause:** `_find_qualifying_tasks()` in `tools/autocora.py` filters tasks to **exactly one calendar day** (the `target_date`, which defaults to today). The scheduler calls this daily with `today`:
```python
today = datetime.now(UTC).strftime("%Y-%m-%d")
result = submit_autocora_jobs(target_date=today, ctx=ctx)
```
If CheddahBot isn't running on the task's due date (or the DB is empty/wiped), the task is **permanently orphaned** — no catch-up, no retry, no visibility.
**Affected task types:** All three `cora_categories` — Link Building, On Page Optimization, Content Creation.
**What needs to change:** Auto-submit should also pick up overdue tasks (due date in the past, still "to do", no existing AutoCora job in KV store).
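
A possible shape for the revised filter is sketched below. It is not the existing `_find_qualifying_tasks()` code: the task is modeled as a hypothetical dict with `status`, `due`, and `keyword` keys, and `submitted_keywords` stands in for the lookup of existing AutoCora jobs in the KV store.

```python
from datetime import date


def qualifies(task, target_date, submitted_keywords):
    """Return True for tasks due on or before target_date that still need a job."""
    due = task.get("due")
    if due is None or task.get("status") != "to do":
        return False
    # Skip keywords that already have an AutoCora job recorded.
    if task.get("keyword", "").lower() in submitted_keywords:
        return False
    # `<=` instead of `==` is what picks up overdue tasks.
    return due <= target_date
```

With this comparison, a task due Feb 18 would still qualify on Feb 27 as long as it remains in "to do" and no job exists for its keyword.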
---
## Empty Database State (found 2026-02-27)
`cheddahbot.db` has zero rows in all tables (kv_store, notifications, scheduled_tasks, etc.). Either the DB is fresh or it was wiped. This means:
- No task state tracking is happening
- No AutoCora job submissions are recorded
- Folder watcher has no history
- All loops show no `last_run` timestamps
---
## Context: Claude Scheduled Tasks
Claude released scheduled tasks (2026-02-26). Need to evaluate whether parts of CheddahBot's scheduler (heartbeat, poll loop, ClickUp polling, folder watchers, AutoCora) could be replaced or augmented by Claude's native scheduling.
---
## Additional Issues to Investigate
- [ ] `auto_execute: false` on Link Building — is this intentional given the folder-watcher pipeline?
- [ ] Folder watcher at `Z:/cora-inbox` — does this path stay accessible?
- [ ] No dashboard/UI surfacing "tasks waiting for action" — stuck tasks are invisible
- [ ] AutoCora loop waits 30s before first poll, then runs every 5min — but auto-submit only checks today's tasks each cycle (redundant repeated calls)


@ -1,417 +0,0 @@
# CheddahBot Task Pipeline Flows — Complete Reference
## ClickUp Statuses Used
These are the ClickUp task statuses that CheddahBot reads and writes:
| Status | Set By | Meaning |
|--------|--------|---------|
| `to do` | Human (or default) | Task is waiting to be picked up |
| `automation underway` | CheddahBot | Bot is actively working on this task |
| `running cora` | CheddahBot (AutoCora) | Cora report is being generated by external worker |
| `outline review` | CheddahBot (Content) | Phase 1 outline is ready for human review |
| `outline approved` | Human | Human reviewed the outline, ready for Phase 2 |
| `pr needs review` | CheddahBot (Press Release) | Press release pipeline finished, PRs ready for human review |
| `internal review` | CheddahBot (Content/OPT) | Content/OPT pipeline finished, deliverables ready for human review |
| `complete` | CheddahBot (Link Building) | Pipeline fully done |
| `error` | CheddahBot | Something failed, needs attention |
| `in progress` | (configured but not used in automation) | — |
**What CheddahBot polls for:** `["to do", "outline approved"]` (config.yaml line 45)
---
## ClickUp Custom Fields Used
| Field Name | Type | Used By | What It Holds |
|------------|------|---------|---------------|
| `Work Category` | Dropdown | All pipelines | Determines which pipeline runs: "Press Release", "Link Building", "On Page Optimization", "Content Creation" |
| `PR Topic` | Text | Press Release | Press release topic/keyword (e.g. "Peek Plastic") — required |
| `Customer` | Text | Press Release | Client/company name — required |
| `Keyword` | Text | Link Building, Content, OPT | Target SEO keyword |
| `IMSURL` | Text | All pipelines | Target page URL (money site) — required for Press Release |
| `SocialURL` | Text | Press Release | Branded/social URL for the PR |
| `LB Method` | Dropdown | Link Building | "Cora Backlinks" or other methods |
| `CustomAnchors` | Text | Link Building | Custom anchor text overrides |
| `BrandedPlusRatio` | Number | Link Building | Ratio for branded anchors (default 0.7) |
| `CLIFlags` | Text | Link Building, Content, OPT | Extra flags passed to tools (e.g., "service") |
| `CoraFile` | Text | Link Building | Path to Cora xlsx file |
**Tags:** Tasks are tagged with month in `mmmyy` format (e.g., `feb26`, `mar26`).
---
## Background Threads
CheddahBot runs 7 daemon threads. All start at boot and run until shutdown.
| Thread | Interval | What It Does |
|--------|----------|-------------|
| **poll** | 60 seconds | Runs cron-scheduled tasks from the database |
| **heartbeat** | 30 minutes | Reads HEARTBEAT.md checklist, takes action if needed |
| **clickup** | 20 minutes | Polls ClickUp for tasks to auto-execute (only Press Releases currently) |
| **folder_watch** | 40 minutes | Scans `//PennQnap1/SHARE1/cora-inbox` for .xlsx files → triggers Link Building |
| **autocora** | 5 minutes | Submits Cora jobs for today's tasks + polls for results |
| **content_watch** | 40 minutes | Scans `//PennQnap1/SHARE1/content-cora-inbox` for .xlsx files → triggers Content/OPT Phase 1 |
| **cora_distribute** | 40 minutes | Scans `//PennQnap1/SHARE1/Cora-For-Human` for .xlsx files → distributes to pipeline inboxes |
---
## Pipeline 1: PRESS RELEASE
**Work Category:** "Press Release"
**auto_execute:** TRUE — the only pipeline that runs automatically from ClickUp polling
**Tool:** `write_press_releases`
### Flow
```
CLICKUP POLL (every 20 min)
├─ Finds task with Work Category = "Press Release", status = "to do", due within 3 weeks
CHECK LOCAL DB
│ Key: clickup:task:{id}:state
│ If state = "executing" or "completed" or "failed" → SKIP (already handled)
SET STATUS → "automation underway"
│ ClickUp API: PUT /task/{id} status
│ Local DB: state = "executing"
STEP 1: Generate 7 Headlines (chat brain - GPT-4o-mini)
│ Uses configured chat model
│ Saves to: data/generated/press_releases/{company}/{slug}_headlines.txt
STEP 2: AI Judge Picks Best 2 (chat brain)
│ Filters out rule-violating headlines (colons, superlatives, etc.)
│ Falls back to first 2 if judge returns < 2
STEP 3: Write 2 Full Press Releases (execution brain - Claude Code CLI)
│ For each winning headline:
│ - Claude writes full 575-800 word PR
│ - Validates anchor phrase
│ - Saves .txt and .docx
│ - Uploads .docx to ClickUp as attachment
STEP 4: Generate JSON-LD Schemas (execution brain - Sonnet)
│ For each PR:
│ - Generates NewsArticle schema
│ - Saves .json file
SET STATUS → "pr needs review"
│ ClickUp API: comment with results + PUT status
│ Local DB: state = "completed"
DONE — Human reviews in ClickUp
```
### ClickUp Fields Read
- `PR Topic` → press release topic/keyword (required)
- `Customer` → company name in PR (required)
- `IMSURL` → target URL for anchor link (required)
- `SocialURL` → branded URL (optional)
### What Can Go Wrong
- **BUG: Crash mid-step → stuck forever.** DB says "executing", never retries. Manual reset needed.
- **BUG: DB says "completed" but ClickUp API failed → out of sync.** DB written before API call.
- **BUG: Attachment upload fails silently.** Task marked complete, files missing from ClickUp.
- Headline generation returns empty → tool exits with error, task marked "failed"
- Schema JSON invalid → warning logged but task still completes
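
One possible mitigation for the stuck "executing" state is to treat old entries as stale so the task becomes eligible again. The sketch below is hypothetical: it assumes the stored state carries a `started_at` ISO timestamp, and the two-hour threshold is an arbitrary placeholder, not a value from the codebase.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=2)  # assumed threshold, tune to the longest real run


def should_skip(state, now):
    """Return True if the task is already handled or still plausibly running."""
    if not state:
        return False
    status = state.get("state")
    if status in ("completed", "failed"):
        return True
    if status == "executing":
        started = datetime.fromisoformat(state["started_at"])
        # Entries older than the threshold are treated as crashed runs.
        return now - started < STALE_AFTER
    return False
```

The dedup check in the poll loop would call this instead of skipping every "executing" entry unconditionally.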
---
## Pipeline 2: LINK BUILDING (Cora Backlinks)
**Work Category:** "Link Building"
**auto_execute:** FALSE — triggered by folder watcher, not ClickUp polling
**Tool:** `run_cora_backlinks`
### Full Lifecycle (4 stages)
```
STAGE A: AUTOCORA SUBMITS CORA JOB
══════════════════════════════════
AUTOCORA LOOP (every 5 min)
├─ Calls submit_autocora_jobs(target_date = today)
│ Finds tasks: Work Category in ["Link Building", "On Page Optimization", "Content Creation"]
│ status = "to do"
│ due date = TODAY (exact 24h window) ← ★ BUG: misses overdue tasks
├─ Groups tasks by Keyword (case-insensitive)
│ If same keyword across multiple tasks → one job covers all
├─ For each keyword group:
│ Check local DB: autocora:job:{keyword_lower}
│ If already submitted → SKIP
WRITE JOB FILE
│ Path: //PennQnap1/SHARE1/AutoCora/jobs/{job-id}.json
│ Content: {"keyword": "...", "url": "IMSURL", "task_ids": ["id1", "id2"]}
│ Local DB: autocora:job:{keyword} = {status: "submitted", job_id: "..."}
SET ALL TASK STATUSES → "automation underway"
STAGE B: EXTERNAL WORKER RUNS CORA (not CheddahBot code)
═════════════════════════════════════════════════════════
Worker on another machine:
│ Watches //PennQnap1/SHARE1/AutoCora/jobs/
│ Picks up .json, runs Cora SEO tool
│ Writes .xlsx report to Z:/cora-inbox/ ← auto-deposited
│ Writes //PennQnap1/SHARE1/AutoCora/results/{job-id}.result = "SUCCESS" or "FAILURE: reason"
STAGE C: AUTOCORA POLLS FOR RESULTS
════════════════════════════════════
AUTOCORA LOOP (every 5 min)
├─ Scans local DB for autocora:job:* with status = "submitted"
│ For each: checks if results/{job-id}.result exists
├─ If SUCCESS:
│ Local DB: status = "completed"
│ ClickUp: all task_ids → status = "running cora"
│ ClickUp: comment "Cora report completed for keyword: ..."
├─ If FAILURE:
│ Local DB: status = "failed"
│ ClickUp: all task_ids → status = "error"
│ ClickUp: comment with failure reason
└─ If no result file yet: skip, check again in 5 min
STAGE D: FOLDER WATCHER TRIGGERS LINK BUILDING
═══════════════════════════════════════════════
FOLDER WATCHER (every 60 min)
├─ Scans Z:/cora-inbox/ for .xlsx files
│ Skips: ~$ temp files, already-completed files (via local DB)
├─ For each new .xlsx:
│ Normalize filename: "anti-vibration-rubber-mounts.xlsx" → "anti vibration rubber mounts"
MATCH TO CLICKUP TASK
│ Queries all tasks in space with Work Category = "Link Building"
│ Fuzzy matches Keyword field against normalized filename:
│ - Exact match
│ - Substring match (either direction)
│ - >80% word overlap
├─ NO MATCH → local DB: status = "unmatched", notification sent, retry next scan
├─ MATCH FOUND but IMSURL empty → local DB: status = "blocked", ClickUp → "error"
SET STATUS → "automation underway"
STEP 1: Ingest CORA Report (Big-Link-Man subprocess)
│ Runs: E:/dev/Big-Link-Man/.venv/Scripts/python.exe main.py ingest-cora -f {xlsx} -n {keyword} ...
│ BLM parses xlsx, creates project, writes job file
│ Timeout: 30 minutes
│ ClickUp: comment "CORA report ingested. Project ID: ..."
STEP 2: Generate Content Batch (Big-Link-Man subprocess)
│ Runs: python main.py generate-batch -j {job_file} --continue-on-error
│ BLM generates content for each prospect
│ Moves job file to jobs/done/
SET STATUS → "complete"
│ ClickUp: comment with results
│ Move .xlsx to Z:/cora-inbox/processed/
│ Local DB: linkbuilding:watched:{filename} = {status: "completed"}
DONE
```
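
The filename normalization and the three matching rules from Stage D can be sketched as follows. This is an illustration, not the watcher's actual code: treating underscores like hyphens and measuring overlap against the larger word set are both assumptions.

```python
from pathlib import Path


def normalize_filename(filename):
    """'anti-vibration-rubber-mounts.xlsx' -> 'anti vibration rubber mounts'."""
    stem = Path(filename).stem
    # Replacing underscores as well as hyphens is an assumption.
    return " ".join(stem.replace("-", " ").replace("_", " ").lower().split())


def keyword_matches(keyword, normalized):
    """Apply the exact, substring, and >80% word-overlap rules in order."""
    kw = " ".join(keyword.lower().split())
    if not kw or not normalized:
        return False
    if kw == normalized:
        return True
    if kw in normalized or normalized in kw:
        return True
    kw_words, fn_words = set(kw.split()), set(normalized.split())
    # Assumption: overlap is measured against the larger of the two word sets.
    overlap = len(kw_words & fn_words) / max(len(kw_words), len(fn_words))
    return overlap > 0.8
```

The substring rule is deliberately loose, so short keywords can match more files than intended; the word-overlap rule mainly catches reordered words.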
### ClickUp Fields Read
- `Keyword` → matches against .xlsx filename + used as project name
- `IMSURL` → money site URL (required)
- `LB Method` → must be "Cora Backlinks" or empty
- `CustomAnchors`, `BrandedPlusRatio`, `CLIFlags` → passed to BLM
### What Can Go Wrong
- **BUG: AutoCora only checks today's tasks.** Due date missed = never gets a Cora report.
- **BUG: Crash mid-step → stuck "executing".** Same as PR pipeline.
- No ClickUp task with matching Keyword → file sits unmatched, notification sent
- IMSURL empty → blocked, ClickUp set to "error"
- BLM subprocess timeout (30 min) or crash → task fails
- Network share offline → can't write job file or read results
### Retry Behavior
- "processing", "blocked", "unmatched" .xlsx files → retried on next scan (KV entry deleted)
- "completed", "failed" → never retried
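
The retry rules can be sketched as a filter over the scanned filenames. The KV store is modeled here as a plain dict, which is an assumption; only the key pattern `linkbuilding:watched:{filename}` and the status names come from this document.

```python
RETRYABLE = {"processing", "blocked", "unmatched"}
TERMINAL = {"completed", "failed"}


def files_to_process(xlsx_names, kv):
    """Return the files the watcher should handle on this scan."""
    todo = []
    for name in xlsx_names:
        if name.startswith("~$"):  # Excel temp/lock files
            continue
        key = f"linkbuilding:watched:{name}"
        status = (kv.get(key) or {}).get("status")
        if status in TERMINAL:
            continue
        if status in RETRYABLE:
            kv.pop(key, None)  # delete the entry so the file is retried
        todo.append(name)
    return todo
```

Files with no KV entry at all are treated as new and processed on the current scan.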
---
## Pipeline 3: CONTENT CREATION
**Work Category:** "Content Creation"
**auto_execute:** FALSE — triggered by content folder watcher
**Tool:** `create_content` (two-phase)
### Flow
```
STAGE A: AUTOCORA SUBMITS CORA JOB (same as Link Building Stage A)
══════════════════════════════════════════════════════════════════
Same AutoCora loop, same BUG with today-only filtering.
Worker generates .xlsx → deposits in Z:/content-cora-inbox/
STAGE B: CONTENT WATCHER TRIGGERS PHASE 1
══════════════════════════════════════════
CONTENT WATCHER (every 60 min)
├─ Scans Z:/content-cora-inbox/ for .xlsx files
│ Same skip/retry logic as link building watcher
├─ Normalize filename, fuzzy match to ClickUp task
│ Matches: Work Category in ["Content Creation", "On Page Optimization"]
├─ NO MATCH → "unmatched", notification
PHASE 1: Research + Outline (execution brain - Claude Code CLI)
│ ★ BUG: Does NOT set "automation underway" status (link building watcher does)
│ Build prompt based on content type:
│ - If IMSURL present → "optimize existing page" (scrape it, analyze, outline improvements)
│ - If IMSURL empty → "new content" (competitor research, outline from scratch)
│ - If Cora .xlsx found → "use this Cora report for keyword targets and entities"
│ - If CLIFlags contains "service" → includes service page template
│ Claude Code runs: web searches, scrapes competitors, reads Cora report
│ Generates outline with entity recommendations
SAVE OUTLINE
│ Path: Z:/content-outlines/{keyword-slug}/outline.md
│ Local DB: clickup:task:{id}:state = {state: "outline_review", outline_path: "..."}
SET STATUS → "outline review"
│ ClickUp: comment "Outline ready for review"
│ ★ BUG: .xlsx NOT moved to processed/ (link building watcher moves files)
WAITING FOR HUMAN
│ Human opens outline at Z:/content-outlines/{slug}/outline.md
│ Human edits/approves
│ Human moves ClickUp task to "outline approved"
STAGE C: CLICKUP POLL TRIGGERS PHASE 2
═══════════════════════════════════════
CLICKUP POLL (every 20 min)
├─ Finds task with status = "outline approved" (in poll_statuses list)
├─ Check local DB: clickup:task:{id}:state
│ Sees state = "outline_review" → this means Phase 2 is ready
│ ★ BUG: If DB was wiped, no entry → runs Phase 1 AGAIN, overwrites outline
PHASE 2: Write Full Content (execution brain - Claude Code CLI)
│ Reads outline from path stored in local DB (outline_path)
│ ★ BUG: If outline file was deleted → Phase 2 fails every time, no recovery
│ Claude Code writes full content using the approved outline
│ Includes entity optimization, keyword density targets from Cora
SAVE FINAL CONTENT
│ Path: Z:/content-outlines/{keyword-slug}/final-content.md
│ Local DB: state = "completed"
SET STATUS → "internal review"
│ ClickUp: comment with content path
DONE — Human reviews final content
```
### ClickUp Fields Read
- `Keyword` → target keyword, used for Cora matching and content generation
- `IMSURL` → if present = optimization, if empty = new content
- `CLIFlags` → hints like "service" for service page template
### What Can Go Wrong
- **BUG: AutoCora only checks today → Cora report never generated for overdue tasks**
- **BUG: DB wipe → Phase 2 reruns Phase 1, destroys approved outline**
- **BUG: Outline file deleted → Phase 2 permanently fails**
- **BUG: No "automation underway" set during Phase 1 from watcher**
- **BUG: .xlsx not moved to processed/**
- Network share offline → can't save outline or read it back
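
One way to make Phase 2 survive a wiped DB is to derive the outline location from the keyword and let the ClickUp status decide the phase. The sketch below is a hypothetical mitigation, not existing code: the slug rule is an assumption, and only the `Z:/content-outlines/{keyword-slug}/outline.md` layout comes from the flow above.

```python
import re
from pathlib import PurePosixPath

OUTLINE_ROOT = PurePosixPath("Z:/content-outlines")


def slugify(keyword):
    """Assumed slug rule: lowercase, non-alphanumeric runs become hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", keyword.lower()).strip("-")


def expected_outline_path(keyword):
    """Rebuild the outline path without consulting the local DB."""
    return OUTLINE_ROOT / slugify(keyword) / "outline.md"


def choose_phase(clickup_status, outline_exists):
    """Pick the phase from ClickUp status and whether the outline is on disk."""
    if clickup_status != "outline approved":
        return "phase1"
    if outline_exists:
        return "phase2"
    # Approved but no outline on disk: flag for a human instead of rerunning Phase 1.
    return "needs_attention"
```

The caller would check whether `expected_outline_path(keyword)` exists on the share and pass the result in, which keeps the decision logic free of filesystem access.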
---
## Pipeline 4: ON PAGE OPTIMIZATION
**Work Category:** "On Page Optimization"
**auto_execute:** FALSE
**Tool:** `create_content` (same as Content Creation)
### Flow
Identical to Content Creation except:
- Phase 1 prompt says "optimize existing page at {IMSURL}" instead of "create new content"
- Phase 1 scrapes the existing page first, then builds optimization outline
- IMSURL is always present (it's the page being optimized)
Same bugs apply.
---
## The Local DB (KV Store) — What It Tracks
| Key Pattern | What It Stores | Read By | Actually Needed? |
|---|---|---|---|
| `clickup:task:{id}:state` | Full task execution state (status, timestamps, outline_path, errors) | ClickUp poll dedup check, Phase 2 detection | **PARTIALLY** — outline_path is needed for Phase 2, but dedup could use ClickUp status instead |
| `autocora:job:{keyword}` | Job submission tracking (job_id, status, task_ids) | AutoCora result poller | **YES** — maps keyword to job_id for result file lookup |
| `linkbuilding:watched:{filename}` | File processing state (processing/completed/failed/unmatched/blocked) | Folder watcher scan | **YES** — prevents re-processing files |
| `content:watched:{filename}` | Same as above for content files | Content watcher scan | **YES** — prevents re-processing |
| `pipeline:status` | Current step text for UI ("Step 2/4: Judging...") | Gradio UI polling | **NO** — just a display string, could be in-memory |
| `linkbuilding:status` | Same for link building UI | Gradio UI polling | **NO** — same |
| `system:loop:*:last_run` (x6) | Timestamp of last loop run | Dashboard API | **NO** — informational only, never used in logic |
---
## Summary of All Bugs
| # | Bug | Severity | Pipelines Affected |
|---|-----|----------|-------------------|
| 1 | AutoCora only submits for today's due date | HIGH | Link Building, Content, OPT |
| 2 | DB wipe → Phase 2 reruns Phase 1 | HIGH | Content, OPT |
| 3 | Stuck "executing" after crash, no recovery | HIGH | All 4 |
| 4 | Content watcher missing "automation underway" | MEDIUM | Content, OPT |
| 5 | Content watcher doesn't move .xlsx to processed/ | MEDIUM | Content, OPT |
| 6 | KV written before ClickUp API → out of sync | MEDIUM | All 4 |
| 7 | Silent attachment upload failures | MEDIUM | Press Release |
| 8 | Phase 2 fails permanently if outline file gone | LOW | Content, OPT |


@ -15,10 +15,6 @@ dependencies = [
     "croniter>=2.0",
     "edge-tts>=6.1",
     "python-docx>=1.2.0",
-    "openpyxl>=3.1.5",
-    "jinja2>=3.1.6",
-    "python-multipart>=0.0.22",
-    "sse-starlette>=3.3.3",
 ]

 [build-system]


@ -1,94 +0,0 @@
"""Query ClickUp 'to do' tasks tagged feb26 in OPT/LINKS/Content categories."""

import sys
from datetime import datetime, timezone
from pathlib import Path

sys.stdout.reconfigure(line_buffering=True)
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from cheddahbot.config import load_config
from cheddahbot.clickup import ClickUpClient

CATEGORY_PREFIXES = ("opt", "link", "content", "ai content")
TAG_FILTER = "feb26"


def ms_to_date(ms_str: str) -> str:
    if not ms_str:
        return ""
    try:
        ts = int(ms_str) / 1000
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%m/%d")
    except (ValueError, OSError):
        return ""


def main():
    cfg = load_config()
    if not cfg.clickup.api_token or not cfg.clickup.space_id:
        print("ERROR: CLICKUP_API_TOKEN or CLICKUP_SPACE_ID not set.")
        return

    client = ClickUpClient(
        api_token=cfg.clickup.api_token,
        workspace_id=cfg.clickup.workspace_id,
        task_type_field_name=cfg.clickup.task_type_field_name,
    )

    try:
        # Fetch all 'to do' tasks across the space
        tasks = client.get_tasks_from_space(cfg.clickup.space_id, statuses=["to do"])

        # Filter by feb26 tag
        tagged = [t for t in tasks if TAG_FILTER in [tag.lower() for tag in t.tags]]
        if not tagged:
            all_tags = set()
            for t in tasks:
                all_tags.update(t.tags)
            print(f"No tasks with tag '{TAG_FILTER}'. Tags seen: {sorted(all_tags)}")
            print(f"Total 'to do' tasks found: {len(tasks)}")
            return

        # Filter to OPT/LINKS/Content categories (by task name, Work Category, or list name)
        def is_target_category(t):
            name_lower = t.name.lower().strip()
            wc = (t.custom_fields.get("Work Category") or "").lower()
            ln = (t.list_name or "").lower()
            for prefix in CATEGORY_PREFIXES:
                if name_lower.startswith(prefix) or prefix in wc or prefix in ln:
                    return True
            return False

        filtered = [t for t in tagged if is_target_category(t)]
        skipped = [t for t in tagged if not is_target_category(t)]

        # Sort by due date (oldest first), tasks with no due date go last
        filtered.sort(key=lambda t: int(t.due_date) if t.due_date else float("inf"))
        top = filtered[:10]

        # Build table
        print("feb26-tagged 'to do' tasks — OPT / LINKS / Content (top 10, oldest first)")
        print(f"\n{'#':>2} | {'ID':<11} | {'Keyword/Name':<50} | {'Due':<6} | {'Customer':<25} | Tags")
        print("-" * 120)
        for i, t in enumerate(top, 1):
            customer = t.custom_fields.get("Customer", "") or ""
            due = ms_to_date(t.due_date)
            tags = ", ".join(t.tags)
            name = t.name[:50]
            print(f"{i:>2} | {t.id:<11} | {name:<50} | {due:<6} | {customer:<25} | {tags}")

        print(f"\nShowing {len(top)} of {len(filtered)} OPT/LINKS/Content tasks ({len(tagged)} total feb26-tagged).")
        if skipped:
            print(f"\nSkipped {len(skipped)} non-OPT/LINKS/Content tasks:")
            for t in skipped:
                print(f"  - {t.name} ({t.id})")
    finally:
        client.close()


if __name__ == "__main__":
    main()


@ -1,120 +0,0 @@
"""Query ClickUp 'to do' tasks tagged feb26 in OPT/LINKS/Content categories."""

import sys
from pathlib import Path
from datetime import datetime, timezone

# Add project root to path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from cheddahbot.config import load_config
from cheddahbot.clickup import ClickUpClient


def ms_to_date(ms_str: str) -> str:
    """Convert Unix-ms timestamp string to YYYY-MM-DD."""
    if not ms_str:
        return ""
    try:
        ts = int(ms_str) / 1000
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    except (ValueError, OSError):
        return ""


def main():
    cfg = load_config()
    if not cfg.clickup.api_token or not cfg.clickup.space_id:
        print("ERROR: CLICKUP_API_TOKEN or CLICKUP_SPACE_ID not set.")
        return

    client = ClickUpClient(
        api_token=cfg.clickup.api_token,
        workspace_id=cfg.clickup.workspace_id,
        task_type_field_name=cfg.clickup.task_type_field_name,
    )

    # Step 1: Get folders, find OPT/LINKS/Content
    target_folders = {"opt", "links", "content"}
    try:
        folders = client.get_folders(cfg.clickup.space_id)
    except Exception as e:
        print(f"ERROR fetching folders: {e}")
        client.close()
        return

    print(f"All folders: {[f['name'] for f in folders]}")

    matched_lists = []  # (list_id, list_name, folder_name)
    for folder in folders:
        if folder["name"].lower() in target_folders:
            for lst in folder["lists"]:
                matched_lists.append((lst["id"], lst["name"], folder["name"]))

    if not matched_lists:
        print(f"No folders matching {target_folders}. Falling back to full space scan.")
        try:
            tasks = client.get_tasks_from_space(cfg.clickup.space_id, statuses=["to do"])
        finally:
            client.close()
    else:
        print(f"Querying lists: {[(ln, fn) for _, ln, fn in matched_lists]}")
        tasks = []
        for list_id, list_name, folder_name in matched_lists:
            try:
                batch = client.get_tasks(list_id, statuses=["to do"])
                # Stash folder name on each task for display
                for t in batch:
                    t._folder = folder_name
                tasks.extend(batch)
            except Exception as e:
                print(f"  Error fetching {list_name}: {e}")
        client.close()

    print(f"Total 'to do' tasks from target folders: {len(tasks)}")

    # Filter by "feb26" tag (case-insensitive)
    tagged = [t for t in tasks if any(tag.lower() == "feb26" for tag in t.tags)]
    if not tagged:
        print("No 'to do' tasks with 'feb26' tag found.")
        all_tags = set()
        for t in tasks:
            all_tags.update(t.tags)
        print(f"Tags found across all to-do tasks: {sorted(all_tags)}")
        return

    filtered = tagged

    # Sort by due date (oldest first), tasks without due date go last
    def sort_key(t):
        if t.due_date:
            return (0, int(t.due_date))
        return (1, 0)

    filtered.sort(key=sort_key)

    # Take top 10
    top10 = filtered[:10]

    # Build table
    print(f"\n## ClickUp 'to do' — feb26 tag — OPT/LINKS/Content ({len(filtered)} total, showing top 10)\n")
    print(f"{'#':<3} | {'ID':<12} | {'Keyword/Name':<40} | {'Due':<12} | {'Customer':<20} | Tags")
    # Separator row; the rule character was lost in extraction, so '-' is used here
    print(f"{'-'*3} | {'-'*12} | {'-'*40} | {'-'*12} | {'-'*20} | {'-'*15}")
    for i, t in enumerate(top10, 1):
        customer = t.custom_fields.get("Customer", "") or ""
        due = ms_to_date(t.due_date)
        tags = ", ".join(t.tags) if t.tags else ""
        name = t.name[:38] + ".." if len(t.name) > 40 else t.name
        print(f"{i:<3} | {t.id:<12} | {name:<40} | {due:<12} | {customer:<20} | {tags}")

    print("\nCategory breakdown:")
    from collections import Counter
    cats = Counter(t.task_type for t in filtered)
    for cat, count in cats.most_common():
        print(f"  {cat or '(none)'}: {count}")


if __name__ == "__main__":
    main()


@ -1,184 +0,0 @@
"""CLI script to create a ClickUp task in a client's Overall list.
Usage:
uv run python scripts/create_clickup_task.py --name "Task" --client "Client"
uv run python scripts/create_clickup_task.py --name "LB" --client "Acme" \\
--category "Link Building" --due-date 2026-03-11 --tag feb26 \\
--field "Keyword=some keyword" --field "CLIFlags=--tier1-count 5"
"""
from __future__ import annotations
import argparse
import json
import os
import sys
from datetime import UTC, datetime
from pathlib import Path
# Add project root to path so we can import cheddahbot
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from dotenv import load_dotenv
from cheddahbot.clickup import ClickUpClient
DEFAULT_ASSIGNEE = 10765627 # Bryan Bigari
def _date_to_unix_ms(date_str: str) -> int:
"""Convert YYYY-MM-DD to Unix milliseconds (noon UTC).
Noon UTC ensures the date displays correctly in US timezones.
"""
dt = datetime.strptime(date_str, "%Y-%m-%d").replace(
hour=12, tzinfo=UTC
)
return int(dt.timestamp() * 1000)
def _parse_time_estimate(s: str) -> int:
"""Parse a human time string like '2h', '30m', '1h30m' to ms."""
import re
total_min = 0
match = re.match(r"(?:(\d+)h)?(?:(\d+)m)?$", s.strip())
if not match or not any(match.groups()):
raise ValueError(f"Invalid time estimate: '{s}' (use e.g. '2h', '30m', '1h30m')")
if match.group(1):
total_min += int(match.group(1)) * 60
if match.group(2):
total_min += int(match.group(2))
return total_min * 60 * 1000
def main():
load_dotenv()
parser = argparse.ArgumentParser(description="Create a ClickUp task")
parser.add_argument("--name", required=True, help="Task name")
parser.add_argument(
"--client", required=True, help="Client folder name"
)
parser.add_argument(
"--category", default="", help="Work Category dropdown value"
)
parser.add_argument("--description", default="", help="Task description")
parser.add_argument(
"--status", default="to do", help="Initial status (default: 'to do')"
)
parser.add_argument(
"--due-date", default="", help="Due date as YYYY-MM-DD"
)
parser.add_argument(
"--tag", action="append", default=[], help="Tag (mmmYY, repeatable)"
)
parser.add_argument(
"--field",
action="append",
default=[],
help="Custom field as Name=Value (repeatable)",
)
parser.add_argument(
"--priority",
type=int,
default=2,
help="Priority: 1=Urgent, 2=High, 3=Normal, 4=Low (default: 2)",
)
parser.add_argument(
"--assignee",
type=int,
action="append",
default=[],
help="ClickUp user ID (default: Bryan 10765627)",
)
parser.add_argument(
"--time-estimate",
default="",
help="Time estimate (e.g. '2h', '30m', '1h30m')",
)
args = parser.parse_args()
api_token = os.environ.get("CLICKUP_API_TOKEN", "")
space_id = os.environ.get("CLICKUP_SPACE_ID", "")
if not api_token:
print("Error: CLICKUP_API_TOKEN not set", file=sys.stderr)
sys.exit(1)
if not space_id:
print("Error: CLICKUP_SPACE_ID not set", file=sys.stderr)
sys.exit(1)
# Parse custom fields
custom_fields: dict[str, str] = {}
for f in args.field:
if "=" not in f:
print(f"Error: --field must be Name=Value, got: {f}", file=sys.stderr)
sys.exit(1)
name, value = f.split("=", 1)
custom_fields[name] = value
client = ClickUpClient(api_token=api_token)
try:
# Find the client's Overall list
list_id = client.find_list_in_folder(space_id, args.client)
if not list_id:
msg = f"Error: No folder '{args.client}' with 'Overall' list"
print(msg, file=sys.stderr)
sys.exit(1)
# Build create_task kwargs
create_kwargs: dict = {
"list_id": list_id,
"name": args.name,
"description": args.description,
"status": args.status,
}
if args.due_date:
create_kwargs["due_date"] = _date_to_unix_ms(args.due_date)
if args.tag:
create_kwargs["tags"] = args.tag
create_kwargs["priority"] = args.priority
create_kwargs["assignees"] = args.assignee or [DEFAULT_ASSIGNEE]
if args.time_estimate:
create_kwargs["time_estimate"] = _parse_time_estimate(
args.time_estimate
)
# Create the task
result = client.create_task(**create_kwargs)
task_id = result.get("id", "")
# Set Client dropdown field
client.set_custom_field_smart(task_id, list_id, "Client", args.client)
# Set Work Category if provided
if args.category:
client.set_custom_field_smart(
task_id, list_id, "Work Category", args.category
)
# Set any additional custom fields
for field_name, field_value in custom_fields.items():
ok = client.set_custom_field_smart(
task_id, list_id, field_name, field_value
)
if not ok:
print(
f"Warning: Failed to set '{field_name}'",
file=sys.stderr,
)
print(json.dumps({
"id": task_id,
"name": args.name,
"client": args.client,
"url": result.get("url", ""),
"status": args.status,
}, indent=2))
finally:
client.close()
if __name__ == "__main__":
main()
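The `--time-estimate` help text above accepts values such as '2h', '30m', and '1h30m'. The script's own `_parse_time_estimate` helper is not shown in this part of the diff, so the following is only a minimal sketch of how such a string could be converted, assuming the result should be in milliseconds. The name `parse_time_estimate_ms` is hypothetical and is not part of the script.

```python
import re

# Hypothetical helper, not the script's own _parse_time_estimate.
# Assumption: the caller wants the estimate in milliseconds.
_TOKEN = re.compile(r"(\d+)([hm])")


def parse_time_estimate_ms(text: str) -> int:
    """Convert strings like '2h', '30m', or '1h30m' to milliseconds."""
    cleaned = text.strip().lower()
    # Reject anything that is not a run of <number><h|m> tokens.
    if not re.fullmatch(r"(?:\d+[hm])+", cleaned):
        raise ValueError(f"Unrecognised time estimate: {text!r}")
    minutes = 0
    for amount, unit in _TOKEN.findall(cleaned):
        minutes += int(amount) * (60 if unit == "h" else 1)
    return minutes * 60 * 1000
```

Raising `ValueError` on malformed input lets the caller decide whether to abort or to create the task without an estimate.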

@ -1,97 +0,0 @@
"""Query ClickUp for feb26-tagged to-do tasks in OPT/LINKS/Content categories."""
from datetime import datetime, UTC
from cheddahbot.config import load_config
from cheddahbot.clickup import ClickUpClient
cfg = load_config()
client = ClickUpClient(
api_token=cfg.clickup.api_token,
workspace_id=cfg.clickup.workspace_id,
task_type_field_name=cfg.clickup.task_type_field_name,
)
tasks = client.get_tasks_from_overall_lists(cfg.clickup.space_id, statuses=["to do"])
client.close()
# Filter: tagged feb26
feb26 = [t for t in tasks if "feb26" in t.tags]
# Filter: OPT / LINKS / Content categories (by Work Category or name prefix)
def is_target(t):
cat = (t.task_type or "").lower()
name = t.name.upper()
if cat in ("on page optimization", "link building", "content creation"):
return True
if name.startswith("OPT") or name.startswith("LINKS") or name.startswith("NEW -"):
return True
return False
filtered = [t for t in feb26 if is_target(t)]
# Sort by due date ascending (no due date = sort last)
def sort_key(t):
if t.due_date:
return int(t.due_date)
return float("inf")
filtered.sort(key=sort_key)
top10 = filtered[:10]
def fmt_due(ms_str):
if not ms_str:
return "No due"
ts = int(ms_str) / 1000
return datetime.fromtimestamp(ts, tz=UTC).strftime("%b %d")
def fmt_customer(t):
c = t.custom_fields.get("Customer", "")
if c and str(c) != "None":
return str(c)
return t.list_name
def fmt_cat(t):
cat = t.task_type
name = t.name.upper()
if not cat or cat.strip() == "":
if name.startswith("LINKS"):
return "LINKS"
elif name.startswith("OPT"):
return "OPT"
elif name.startswith("NEW"):
return "Content"
return "?"
mapping = {
"On Page Optimization": "OPT",
"Link Building": "LINKS",
"Content Creation": "Content",
}
return mapping.get(cat, cat)
def fmt_tags(t):
return ", ".join(t.tags) if t.tags else ""
print(f"## feb26 To-Do: OPT / LINKS / Content ({len(filtered)} total, showing top 10 oldest)")
print()
print("| # | ID | Keyword/Name | Due | Customer | Tags |")
print("|---|-----|-------------|-----|----------|------|")
for i, t in enumerate(top10, 1):
name = t.name[:55]
tid = t.id
due = fmt_due(t.due_date)
cust = fmt_customer(t)
tags = fmt_tags(t)
print(f"| {i} | {tid} | {name} | {due} | {cust} | {tags} |")
if len(filtered) > 10:
print()
remaining = filtered[10:]
print(f"### Remaining {len(remaining)} tasks:")
print("| # | ID | Keyword/Name | Due | Customer | Tags |")
print("|---|-----|-------------|-----|----------|------|")
for i, t in enumerate(remaining, 11):
name = t.name[:55]
print(f"| {i} | {t.id} | {name} | {fmt_due(t.due_date)} | {fmt_customer(t)} | {fmt_tags(t)} |")
print()
print(f"*{len(filtered)} matching tasks, {len(feb26)} total feb26 tasks, {len(tasks)} total to-do*")

@ -1,87 +0,0 @@
"""Query ClickUp 'to do' tasks tagged feb26 in OPT/LINKS/Content categories."""
import os
import sys
from datetime import datetime, timezone
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
from dotenv import load_dotenv
load_dotenv(os.path.join(os.path.dirname(__file__), "..", ".env"))
from cheddahbot.clickup import ClickUpClient
TOKEN = os.getenv("CLICKUP_API_TOKEN", "")
SPACE_ID = os.getenv("CLICKUP_SPACE_ID", "")
if not TOKEN or not SPACE_ID:
print("ERROR: CLICKUP_API_TOKEN and CLICKUP_SPACE_ID must be set in .env")
sys.exit(1)
CATEGORIES = {"On Page Optimization", "Content Creation", "Link Building"}
TAG_FILTER = "feb26"
client = ClickUpClient(api_token=TOKEN, workspace_id="", task_type_field_name="Work Category")
print(f"Querying ClickUp space {SPACE_ID} for 'to do' tasks...")
tasks = client.get_tasks_from_space(SPACE_ID, statuses=["to do"])
client.close()
print(f"Total 'to do' tasks found: {len(tasks)}")
# Filter by feb26 tag
tagged = [t for t in tasks if TAG_FILTER in [tag.lower() for tag in t.tags]]
print(f"Tasks with '{TAG_FILTER}' tag: {len(tagged)}")
# Filter by Work Category (OPT / LINKS / Content)
filtered = []
for t in tagged:
cat = (t.custom_fields.get("Work Category") or t.task_type or "").strip()
if cat in CATEGORIES:
filtered.append(t)
if not filtered and tagged:
# Show what categories exist so we can refine
cats_found = set()
for t in tagged:
cats_found.add(t.custom_fields.get("Work Category") or t.task_type or "(none)")
print(f"\nNo tasks matched categories {CATEGORIES}.")
print(f"Work Categories found on feb26-tagged tasks: {cats_found}")
print("\nShowing ALL feb26-tagged tasks instead:\n")
filtered = tagged
# Sort by due date (oldest first), tasks without due date go last
def sort_key(t):
if t.due_date:
return int(t.due_date)
return float("inf")
filtered.sort(key=sort_key)
# Take top 10
top = filtered[:10]
# Format table
def fmt_due(raw_due: str) -> str:
if not raw_due:
return ""
try:
ts = int(raw_due) / 1000
return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%m/%d")
except (ValueError, OSError):
return raw_due
def fmt_customer(t) -> str:
return t.custom_fields.get("Customer", "") or ""
print(f"\n{'#':<3} | {'ID':<12} | {'Keyword/Name':<45} | {'Cat':<15} | {'Due':<6} | {'Customer':<20} | Tags")
print("-" * 120)
for i, t in enumerate(top, 1):
tags_str = ", ".join(t.tags)
name = t.name[:45]
cat = t.custom_fields.get("Work Category") or t.task_type or ""
print(f"{i:<3} | {t.id:<12} | {name:<45} | {cat:<15} | {fmt_due(t.due_date):<6} | {fmt_customer(t):<20} | {tags_str}")
print(f"\nTotal shown: {len(top)} of {len(filtered)} matching tasks")

@ -1,64 +0,0 @@
"""Find all Press Release tasks due in February 2026, any status."""
import logging
from datetime import UTC, datetime
logging.basicConfig(level=logging.WARNING)
from cheddahbot.config import load_config
from cheddahbot.clickup import ClickUpClient
import json
config = load_config()
client = ClickUpClient(
api_token=config.clickup.api_token,
workspace_id=config.clickup.workspace_id,
task_type_field_name=config.clickup.task_type_field_name,
)
space_id = config.clickup.space_id
list_ids = client.get_list_ids_from_space(space_id)
field_filter = client.discover_field_filter(
next(iter(list_ids)), config.clickup.task_type_field_name
)
pr_opt_id = field_filter["options"]["Press Release"]
custom_fields_filter = json.dumps(
[{"field_id": field_filter["field_id"], "operator": "ANY", "value": [pr_opt_id]}]
)
# February 2026 window
feb_start = int(datetime(2026, 2, 1, tzinfo=UTC).timestamp() * 1000)
feb_end = int(datetime(2026, 3, 1, tzinfo=UTC).timestamp() * 1000)
# Query without a status filter; the February due-date window is applied client-side below
tasks = client.get_tasks_from_space(
space_id,
custom_fields=custom_fields_filter,
)
# Filter for due in February 2026
feb_prs = []
for t in tasks:
if t.task_type != "Press Release":
continue
if not t.due_date:
continue
try:
due_ms = int(t.due_date)
if feb_start <= due_ms < feb_end:
feb_prs.append(t)
except (ValueError, TypeError):
continue
print(f"\nPress Release tasks due in February 2026: {len(feb_prs)}\n")
for t in feb_prs:
due_dt = datetime.fromtimestamp(int(t.due_date) / 1000, tz=UTC)
due = due_dt.strftime("%Y-%m-%d")
tags_str = ", ".join(t.tags) if t.tags else "(none)"
customer = t.custom_fields.get("Customer", "?")
imsurl = t.custom_fields.get("IMSURL", "")
print(f" [{t.status:20s}] {t.name}")
print(f" id={t.id} due={due} tags={tags_str}")
print(f" customer={customer} imsurl={imsurl or '(none)'}")
print()

@ -1,61 +0,0 @@
"""Find all feb26-tagged Press Release tasks regardless of due date or status."""
import logging
from datetime import UTC, datetime
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s", datefmt="%H:%M:%S")
from cheddahbot.config import load_config
from cheddahbot.clickup import ClickUpClient
config = load_config()
client = ClickUpClient(
api_token=config.clickup.api_token,
workspace_id=config.clickup.workspace_id,
task_type_field_name=config.clickup.task_type_field_name,
)
space_id = config.clickup.space_id
# Query the open workflow statuses listed below (no due date filter), restricted to Press Release
list_ids = client.get_list_ids_from_space(space_id)
field_filter = client.discover_field_filter(
next(iter(list_ids)), config.clickup.task_type_field_name
)
import json
pr_opt_id = field_filter["options"]["Press Release"]
custom_fields_filter = json.dumps(
[{"field_id": field_filter["field_id"], "operator": "ANY", "value": [pr_opt_id]}]
)
# Get tasks in the listed statuses, with NO due date filter
tasks = client.get_tasks_from_space(
space_id,
statuses=["to do", "outline approved", "in progress", "automation underway"],
custom_fields=custom_fields_filter,
)
# Filter for feb26 tag
feb26_tasks = [t for t in tasks if "feb26" in t.tags]
all_pr = [t for t in tasks if t.task_type == "Press Release"]
print(f"\n{'='*70}")
print(f"Total tasks returned: {len(tasks)}")
print(f"Press Release tasks: {len(all_pr)}")
print(f"feb26-tagged PR tasks: {len(feb26_tasks)}")
print(f"{'='*70}\n")
for t in all_pr:
due = ""
if t.due_date:
try:
due_dt = datetime.fromtimestamp(int(t.due_date) / 1000, tz=UTC)
due = due_dt.strftime("%Y-%m-%d")
except (ValueError, TypeError):
due = t.due_date
tags_str = ", ".join(t.tags) if t.tags else "(no tags)"
customer = t.custom_fields.get("Customer", "?")
print(f" [{t.status:20s}] {t.name}")
print(f" id={t.id} due={due or '(none)'} tags={tags_str} customer={customer}")
print()

@ -1,161 +0,0 @@
"""Migrate ClickUp 'Customer' (list-level) → 'Client' (space-level) field.
This script does NOT create the field; you must create the space-level "Client"
dropdown manually in the ClickUp UI first, using the company names this script prints.
Steps:
1. Fetch all folders, filter to those with an 'Overall' list
2. Print sorted company names for dropdown creation
3. Pause for you to create the field in ClickUp UI
4. Discover the new 'Client' field's UUID + option IDs
5. Set 'Client' on every active task (folder name as value)
6. Report results
Usage:
DRY_RUN=1 uv run python scripts/migrate_client_field.py # preview only
uv run python scripts/migrate_client_field.py # live run
"""
from __future__ import annotations
import os
import sys
import time
from pathlib import Path
# Allow running from repo root
_root = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(_root))
from dotenv import load_dotenv
load_dotenv(_root / ".env")
from cheddahbot.clickup import ClickUpClient
# ── Config ──────────────────────────────────────────────────────────────────
DRY_RUN = os.environ.get("DRY_RUN", "0") not in ("0", "false", "")
NEW_FIELD_NAME = "Client"
API_TOKEN = os.environ.get("CLICKUP_API_TOKEN", "")
SPACE_ID = os.environ.get("CLICKUP_SPACE_ID", "")
if not API_TOKEN:
sys.exit("ERROR: CLICKUP_API_TOKEN env var is required")
if not SPACE_ID:
sys.exit("ERROR: CLICKUP_SPACE_ID env var is required")
def main() -> None:
client = ClickUpClient(api_token=API_TOKEN)
# 1. Get folders, filter to those with an Overall list
print(f"\n{'=' * 60}")
print(f" Migrate to '{NEW_FIELD_NAME}' field -- Space {SPACE_ID}")
print(f" Mode: {'DRY RUN' if DRY_RUN else 'LIVE'}")
print(f"{'=' * 60}\n")
folders = client.get_folders(SPACE_ID)
print(f"Found {len(folders)} folders:\n")
client_folders = []
for f in folders:
overall = next(
(lst for lst in f["lists"] if lst["name"].lower() == "overall"), None
)
if overall:
print(f" {f['name']:35s} Overall list: {overall['id']}")
client_folders.append({"name": f["name"], "overall_id": overall["id"]})
else:
print(f" {f['name']:35s} [SKIP - no Overall list]")
if not client_folders:
sys.exit("\nNo client folders with Overall lists found.")
# 2. Print company names for dropdown creation
option_names = sorted(cf["name"] for cf in client_folders)
print(f"\n--- Dropdown options for '{NEW_FIELD_NAME}' ({len(option_names)}) ---")
for name in option_names:
print(f" {name}")
# 3. Build plan: fetch active tasks per folder
print("\nFetching active tasks from Overall lists ...")
plan: list[dict] = []
for cf in client_folders:
tasks = client.get_tasks(cf["overall_id"], include_closed=False)
plan.append({
"folder_name": cf["name"],
"list_id": cf["overall_id"],
"tasks": tasks,
})
total_tasks = sum(len(p["tasks"]) for p in plan)
print(f"\n--- Update Plan (active tasks only) ---")
for p in plan:
print(f" {p['folder_name']:35s} {len(p['tasks']):3d} tasks")
print(f" {'TOTAL':35s} {total_tasks:3d} tasks")
if DRY_RUN:
print("\n** DRY RUN -- no changes made. Unset DRY_RUN to execute. **\n")
return
# 4. Field should already be created in ClickUp UI
print(f"\nLooking for space-level '{NEW_FIELD_NAME}' field ...")
# 5. Discover the field UUID + option IDs
first_list_id = client_folders[0]["overall_id"]
print(f"\nDiscovering '{NEW_FIELD_NAME}' field UUID and option IDs ...")
field_info = client.discover_field_filter(first_list_id, NEW_FIELD_NAME)
if field_info is None:
sys.exit(
f"\nERROR: Could not find '{NEW_FIELD_NAME}' field on list {first_list_id}.\n"
f"Make sure you created it as a SPACE-level field (visible to all lists)."
)
field_id = field_info["field_id"]
option_map = field_info["options"] # {name: uuid}
print(f" Field ID: {field_id}")
print(f" Options found: {len(option_map)}")
# Verify all folder names have matching options
missing = [cf["name"] for cf in client_folders if cf["name"] not in option_map]
if missing:
print(f"\n WARNING: These folder names have no matching dropdown option:")
for name in missing:
print(f" - {name}")
print(" Tasks in these folders will be SKIPPED.")
# 6. Set Client field on each task
updated = 0
skipped = 0
failed = 0
for p in plan:
folder_name = p["folder_name"]
opt_id = option_map.get(folder_name)
if not opt_id:
skipped += len(p["tasks"])
print(f"\n SKIP: '{folder_name}' -- no matching option")
continue
print(f"\nUpdating {len(p['tasks'])} tasks in '{folder_name}' ...")
for task in p["tasks"]:
ok = client.set_custom_field_value(task.id, field_id, opt_id)
if ok:
updated += 1
else:
failed += 1
print(f" FAILED: task {task.id} ({task.name})")
time.sleep(0.15)
print(f"\n{'=' * 60}")
print(f" Done! Updated: {updated} | Skipped: {skipped} | Failed: {failed}")
print(f"{'=' * 60}")
print(f"\n Next steps:")
print(f" 1. Verify tasks in ClickUp have the '{NEW_FIELD_NAME}' field set correctly")
print(f" 2. Update config.yaml: change 'Customer' -> '{NEW_FIELD_NAME}' in field_mapping")
print(f" 3. Test CheddahBot with the new field")
print(f" 4. Delete the old list-level 'Customer' fields from ClickUp\n")
if __name__ == "__main__":
main()

@ -1,102 +0,0 @@
"""Query ClickUp 'to do' tasks tagged 'feb26' in OPT/LINKS/Content categories."""
from __future__ import annotations
import os
import sys
from datetime import datetime, timezone
from pathlib import Path
_root = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(_root))
from dotenv import load_dotenv
load_dotenv(_root / ".env")
from cheddahbot.clickup import ClickUpClient
API_TOKEN = os.environ.get("CLICKUP_API_TOKEN", "")
SPACE_ID = os.environ.get("CLICKUP_SPACE_ID", "")
if not API_TOKEN:
sys.exit("ERROR: CLICKUP_API_TOKEN env var is required")
if not SPACE_ID:
sys.exit("ERROR: CLICKUP_SPACE_ID env var is required")
# Work Category values to include (case-insensitive partial match)
CATEGORY_FILTERS = ["opt", "link", "content"]
TAG_FILTER = "feb26"
def ms_to_date(ms_str: str) -> str:
"""Convert Unix-ms timestamp string to YYYY-MM-DD."""
if not ms_str:
return ""
try:
ts = int(ms_str) / 1000
return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
except (ValueError, OSError):
return ms_str
def main() -> None:
client = ClickUpClient(api_token=API_TOKEN, task_type_field_name="Work Category")
print(f"Fetching 'to do' tasks from space {SPACE_ID} ...")
tasks = client.get_tasks_from_overall_lists(SPACE_ID, statuses=["to do"])
print(f"Total 'to do' tasks: {len(tasks)}")
# Filter by feb26 tag
tagged = [t for t in tasks if TAG_FILTER in [tag.lower() for tag in t.tags]]
print(f"Tasks with '{TAG_FILTER}' tag: {len(tagged)}")
# Show all Work Category values for debugging
categories = set()
for t in tagged:
wc = t.custom_fields.get("Work Category", "") or ""
categories.add(wc)
print(f"Work Categories found: {categories}")
# Filter by OPT/LINKS/Content categories
filtered = []
for t in tagged:
wc = str(t.custom_fields.get("Work Category", "") or "").lower()
if any(cat in wc for cat in CATEGORY_FILTERS):
filtered.append(t)
print(f"After category filter (OPT/LINKS/Content): {len(filtered)}")
# Sort by due date (oldest first), tasks with no due date go last
def sort_key(t):
if t.due_date:
try:
return (0, int(t.due_date))
except ValueError:
return (1, 0)
return (2, 0)
filtered.sort(key=sort_key)
# Top 10
top10 = filtered[:10]
# Print table
print(f"\n{'#':>3} | {'ID':>11} | {'Keyword/Name':<45} | {'Due':>10} | {'Customer':<20} | Tags")
print("-" * 120)
for i, t in enumerate(top10, 1):
customer = t.custom_fields.get("Customer", "") or ""
due = ms_to_date(t.due_date)
wc = t.custom_fields.get("Work Category", "") or ""
tags_str = ", ".join(t.tags)
name_display = t.name[:45]
print(f"{i:>3} | {t.id:>11} | {name_display:<45} | {due:>10} | {customer:<20} | {tags_str}")
if not top10:
print(" (no matching tasks found)")
print(f"\n--- {len(filtered)} total matching tasks, showing top {len(top10)} (oldest first) ---")
if __name__ == "__main__":
main()

@ -1,149 +0,0 @@
"""One-time script: rebuild the 'Customer' dropdown custom field in ClickUp.
Steps:
1. Fetch all folders from the PII-Agency-SEO space
2. Filter out non-client folders
3. Create a 'Customer' dropdown field with folder names as options
4. For each client folder, find the 'Overall' list and set Customer on all tasks
Usage:
DRY_RUN=1 uv run python scripts/rebuild_customer_field.py # preview only
uv run python scripts/rebuild_customer_field.py # live run
"""
from __future__ import annotations
import os
import sys
import time
from pathlib import Path
# Allow running from repo root
_root = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(_root))
from dotenv import load_dotenv
load_dotenv(_root / ".env")
from cheddahbot.clickup import ClickUpClient
# ── Config ──────────────────────────────────────────────────────────────────
DRY_RUN = os.environ.get("DRY_RUN", "0") not in ("0", "false", "")
EXCLUDED_FOLDERS = {"SEO Audits", "SEO Projects", "Business Related"}
FIELD_NAME = "Customer"
API_TOKEN = os.environ.get("CLICKUP_API_TOKEN", "")
SPACE_ID = os.environ.get("CLICKUP_SPACE_ID", "")
if not API_TOKEN:
sys.exit("ERROR: CLICKUP_API_TOKEN env var is required")
if not SPACE_ID:
sys.exit("ERROR: CLICKUP_SPACE_ID env var is required")
def main() -> None:
client = ClickUpClient(api_token=API_TOKEN)
# 1. Get folders
print(f"\n{'=' * 60}")
print(f" Rebuild '{FIELD_NAME}' field -- Space {SPACE_ID}")
print(f" Mode: {'DRY RUN' if DRY_RUN else 'LIVE'}")
print(f"{'=' * 60}\n")
folders = client.get_folders(SPACE_ID)
print(f"Found {len(folders)} folders:\n")
client_folders = []
for f in folders:
excluded = f["name"] in EXCLUDED_FOLDERS
marker = " [SKIP]" if excluded else ""
list_names = [lst["name"] for lst in f["lists"]]
print(f" {f['name']}{marker} (lists: {', '.join(list_names) or 'none'})")
if not excluded:
client_folders.append(f)
if not client_folders:
sys.exit("\nNo client folders found -- nothing to do.")
option_names = sorted(f["name"] for f in client_folders)
print(f"\nDropdown options ({len(option_names)}): {', '.join(option_names)}")
# 2. Build a plan: folder → Overall list → tasks
plan: list[dict] = [] # {folder_name, list_id, tasks: [ClickUpTask]}
first_list_id = None
for f in client_folders:
overall = next((lst for lst in f["lists"] if lst["name"] == "Overall"), None)
if overall is None:
print(f"\n WARNING: '{f['name']}' has no 'Overall' list -- skipping task update")
continue
if first_list_id is None:
first_list_id = overall["id"]
tasks = client.get_tasks(overall["id"])
plan.append({"folder_name": f["name"], "list_id": overall["id"], "tasks": tasks})
# 3. Print summary
total_tasks = sum(len(p["tasks"]) for p in plan)
print("\n--- Update Plan ---")
for p in plan:
print(f" {p['folder_name']:30s} -> {len(p['tasks']):3d} tasks in list {p['list_id']}")
print(f" {'TOTAL':30s} -> {total_tasks:3d} tasks")
if DRY_RUN:
print("\n** DRY RUN -- no changes made. Unset DRY_RUN to execute. **\n")
return
if first_list_id is None:
sys.exit("\nNo 'Overall' list found in any client folder -- cannot create field.")
# 4. Create the dropdown field
print(f"\nCreating '{FIELD_NAME}' dropdown on list {first_list_id} ...")
type_config = {
"options": [{"name": name, "color": None} for name in option_names],
}
client.create_custom_field(first_list_id, FIELD_NAME, "drop_down", type_config)
print(" Field created.")
# Brief pause for ClickUp to propagate
time.sleep(2)
# 5. Discover the field UUID + option IDs
print("Discovering field UUID and option IDs ...")
field_info = client.discover_field_filter(first_list_id, FIELD_NAME)
if field_info is None:
sys.exit(f"\nERROR: Could not find '{FIELD_NAME}' field after creation!")
field_id = field_info["field_id"]
option_map = field_info["options"] # {name: uuid}
print(f" Field ID: {field_id}")
print(f" Options: {option_map}")
# 6. Set Customer field on each task
updated = 0
failed = 0
for p in plan:
folder_name = p["folder_name"]
opt_id = option_map.get(folder_name)
if not opt_id:
print(f"\n WARNING: No option ID for '{folder_name}' -- skipping")
continue
print(f"\nUpdating {len(p['tasks'])} tasks in '{folder_name}' ...")
for task in p["tasks"]:
ok = client.set_custom_field_value(task.id, field_id, opt_id)
if ok:
updated += 1
else:
failed += 1
print(f" FAILED: task {task.id} ({task.name})")
# Light rate-limit courtesy
time.sleep(0.15)
print(f"\n{'=' * 60}")
print(f" Done! Updated: {updated} | Failed: {failed}")
print(f"{'=' * 60}\n")
if __name__ == "__main__":
main()

@ -1,144 +0,0 @@
"""Re-run press release pipeline for specific tasks that are missing attachments."""
import logging
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
datefmt="%H:%M:%S",
handlers=[logging.StreamHandler(stream=io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8"))],
)
log = logging.getLogger("pr_rerun")
from cheddahbot.config import load_config
from cheddahbot.db import Database
from cheddahbot.llm import LLMAdapter
from cheddahbot.agent import Agent
from cheddahbot.clickup import ClickUpClient
TASKS_TO_RERUN = [
("86b8ebfk9", "Advanced Industrial highlights medical grade plastic expertise", "Advanced Industrial"),
]
def bootstrap():
config = load_config()
db = Database(config.db_path)
llm = LLMAdapter(
default_model=config.chat_model,
openrouter_key=config.openrouter_api_key,
ollama_url=config.ollama_url,
lmstudio_url=config.lmstudio_url,
)
agent_cfg = config.agents[0] if config.agents else None
agent = Agent(config, db, llm, agent_config=agent_cfg)
try:
from cheddahbot.memory import MemorySystem
scope = agent_cfg.memory_scope if agent_cfg else ""
memory = MemorySystem(config, db, scope=scope)
agent.set_memory(memory)
except Exception as e:
log.warning("Memory not available: %s", e)
from cheddahbot.tools import ToolRegistry
tools = ToolRegistry(config, db, agent)
agent.set_tools(tools)
try:
from cheddahbot.skills import SkillRegistry
skills = SkillRegistry(config.skills_dir)
agent.set_skills_registry(skills)
except Exception as e:
log.warning("Skills not available: %s", e)
return config, db, agent, tools
def run_task(agent, tools, config, client, task_id, task_name, customer):
"""Execute write_press_releases for a specific task."""
# Build args matching the field_mapping from config
args = {
"topic": task_name,
"company_name": customer,
"clickup_task_id": task_id,
}
# Also fetch IMSURL from the task
import httpx as _httpx
resp = _httpx.get(
f"https://api.clickup.com/api/v2/task/{task_id}",
headers={"Authorization": config.clickup.api_token},
timeout=30.0,
)
task_data = resp.json()
for cf in task_data.get("custom_fields", []):
if cf["name"] == "IMSURL":
val = cf.get("value")
if val:
args["url"] = val
elif cf["name"] == "SocialURL":
val = cf.get("value")
if val:
args["branded_url"] = val
log.info("=" * 70)
log.info("EXECUTING: %s", task_name)
log.info(" Task ID: %s", task_id)
log.info(" Customer: %s", customer)
log.info(" Args: %s", {k: v for k, v in args.items() if k != "clickup_task_id"})
log.info("=" * 70)
try:
result = tools.execute("write_press_releases", args)
if result.startswith("Skipped:") or result.startswith("Error:"):
log.error("Task skipped/errored: %s", result[:500])
return False
log.info("Task completed!")
# Print first 1000 chars of result
print(f"\n--- Result for {task_name} ---")
print(result[:1000])
print("--- End ---\n")
return True
except Exception as e:
log.error("Task failed: %s", e, exc_info=True)
return False
def main():
log.info("Bootstrapping CheddahBot...")
config, db, agent, tools = bootstrap()
client = ClickUpClient(
api_token=config.clickup.api_token,
workspace_id=config.clickup.workspace_id,
task_type_field_name=config.clickup.task_type_field_name,
)
log.info("Will re-run %d tasks", len(TASKS_TO_RERUN))
results = []
for i, (task_id, name, customer) in enumerate(TASKS_TO_RERUN):
log.info("\n>>> Task %d/%d <<<", i + 1, len(TASKS_TO_RERUN))
success = run_task(agent, tools, config, client, task_id, name, customer)
results.append((name, success))
print(f"\n{'=' * 70}")
print("RESULTS SUMMARY")
print(f"{'=' * 70}")
for name, success in results:
status = "OK" if success else "FAILED"
print(f" [{status}] {name}")
if __name__ == "__main__":
main()

@ -1,241 +0,0 @@
"""Run the press-release pipeline for up to N ClickUp tasks.
Usage:
uv run python scripts/run_pr_pipeline.py # discover + execute up to 3
uv run python scripts/run_pr_pipeline.py --dry-run # discover only, don't execute
uv run python scripts/run_pr_pipeline.py --max 1 # execute only 1 task
"""
import argparse
import logging
import sys
from datetime import UTC, datetime
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
datefmt="%H:%M:%S",
)
log = logging.getLogger("pr_pipeline")
# ── Bootstrap CheddahBot (config, db, agent, tools) ─────────────────────
from cheddahbot.config import load_config
from cheddahbot.db import Database
from cheddahbot.llm import LLMAdapter
from cheddahbot.agent import Agent
from cheddahbot.clickup import ClickUpClient
def bootstrap():
"""Set up config, db, agent, and tool registry — same as __main__.py."""
config = load_config()
db = Database(config.db_path)
llm = LLMAdapter(
default_model=config.chat_model,
openrouter_key=config.openrouter_api_key,
ollama_url=config.ollama_url,
lmstudio_url=config.lmstudio_url,
)
agent_cfg = config.agents[0] if config.agents else None
agent = Agent(config, db, llm, agent_config=agent_cfg)
# Memory
try:
from cheddahbot.memory import MemorySystem
scope = agent_cfg.memory_scope if agent_cfg else ""
memory = MemorySystem(config, db, scope=scope)
agent.set_memory(memory)
except Exception as e:
log.warning("Memory not available: %s", e)
# Tools
from cheddahbot.tools import ToolRegistry
tools = ToolRegistry(config, db, agent)
agent.set_tools(tools)
# Skills
try:
from cheddahbot.skills import SkillRegistry
skills = SkillRegistry(config.skills_dir)
agent.set_skills_registry(skills)
except Exception as e:
log.warning("Skills not available: %s", e)
return config, db, agent, tools
def discover_pr_tasks(config):
"""Poll ClickUp for Press Release tasks — same logic as scheduler._poll_clickup()."""
client = ClickUpClient(
api_token=config.clickup.api_token,
workspace_id=config.clickup.workspace_id,
task_type_field_name=config.clickup.task_type_field_name,
)
space_id = config.clickup.space_id
skill_map = config.clickup.skill_map
if not space_id:
log.error("No space_id configured")
return [], client
# Discover field filter (Work Category UUID + options)
list_ids = client.get_list_ids_from_space(space_id)
if not list_ids:
log.error("No lists found in space %s", space_id)
return [], client
first_list = next(iter(list_ids))
field_filter = client.discover_field_filter(
first_list, config.clickup.task_type_field_name
)
# Build custom fields filter for API query
custom_fields_filter = None
if field_filter and field_filter.get("options"):
import json
field_id = field_filter["field_id"]
options = field_filter["options"]
# Only Press Release
pr_opt_id = options.get("Press Release")
if pr_opt_id:
custom_fields_filter = json.dumps(
[{"field_id": field_id, "operator": "ANY", "value": [pr_opt_id]}]
)
log.info("Filtering for Press Release option ID: %s", pr_opt_id)
else:
log.warning("'Press Release' not found in Work Category options: %s", list(options.keys()))
return [], client
# Due date window (3 weeks)
now_ms = int(datetime.now(UTC).timestamp() * 1000)
due_date_lt = now_ms + (3 * 7 * 24 * 60 * 60 * 1000)
tasks = client.get_tasks_from_space(
space_id,
statuses=config.clickup.poll_statuses,
due_date_lt=due_date_lt,
custom_fields=custom_fields_filter,
)
# Client-side filter: must be Press Release + have due date in window
pr_tasks = []
for task in tasks:
if task.task_type != "Press Release":
continue
if not task.due_date:
continue
try:
if int(task.due_date) > due_date_lt:
continue
except (ValueError, TypeError):
continue
pr_tasks.append(task)
return pr_tasks, client
def execute_task(agent, tools, config, client, task):
"""Execute a single PR task — same logic as scheduler._execute_task()."""
skill_map = config.clickup.skill_map
mapping = skill_map.get("Press Release", {})
tool_name = mapping.get("tool", "write_press_releases")
task_id = task.id
# Build tool args from field mapping
field_mapping = mapping.get("field_mapping", {})
args = {}
for tool_param, source in field_mapping.items():
if source == "task_name":
args[tool_param] = task.name
elif source == "task_description":
args[tool_param] = task.custom_fields.get("description", "")
else:
args[tool_param] = task.custom_fields.get(source, "")
args["clickup_task_id"] = task_id
log.info("=" * 70)
log.info("EXECUTING: %s", task.name)
log.info(" Task ID: %s", task_id)
log.info(" Tool: %s", tool_name)
log.info(" Args: %s", {k: v for k, v in args.items() if k != "clickup_task_id"})
log.info("=" * 70)
# Move to "automation underway"
client.update_task_status(task_id, config.clickup.automation_status)
try:
result = tools.execute(tool_name, args)
if result.startswith("Skipped:") or result.startswith("Error:"):
log.error("Task skipped/errored: %s", result[:500])
client.add_comment(
task_id,
f"⚠️ CheddahBot could not execute this task.\n\n{result[:2000]}",
)
client.update_task_status(task_id, config.clickup.error_status)
return False
log.info("Task completed successfully!")
log.info("Result preview:\n%s", result[:1000])
return True
except Exception as e:
log.error("Task failed with exception: %s", e, exc_info=True)
client.add_comment(
task_id,
f"❌ CheddahBot failed to complete this task.\n\nError: {str(e)[:2000]}",
)
client.update_task_status(task_id, config.clickup.error_status)
return False
def main():
parser = argparse.ArgumentParser(description="Run PR pipeline from ClickUp")
parser.add_argument("--dry-run", action="store_true", help="Discover only, don't execute")
parser.add_argument("--max", type=int, default=3, help="Max tasks to execute (default: 3)")
args = parser.parse_args()
log.info("Bootstrapping CheddahBot...")
config, db, agent, tools = bootstrap()
log.info("Polling ClickUp for Press Release tasks...")
pr_tasks, client = discover_pr_tasks(config)
if not pr_tasks:
log.info("No Press Release tasks found in statuses %s", config.clickup.poll_statuses)
return
log.info("Found %d Press Release task(s):", len(pr_tasks))
for i, task in enumerate(pr_tasks):
status_str = f"status={task.status}" if hasattr(task, "status") else ""
log.info(" %d. %s (id=%s) %s", i + 1, task.name, task.id, status_str)
log.info(" Custom fields: %s", task.custom_fields)
if args.dry_run:
log.info("Dry run — not executing. Use without --dry-run to execute.")
return
# Execute up to --max tasks
to_run = pr_tasks[: args.max]
log.info("Will execute %d task(s) (max=%d)", len(to_run), args.max)
results = []
for i, task in enumerate(to_run):
log.info("\n>>> Task %d/%d <<<", i + 1, len(to_run))
success = execute_task(agent, tools, config, client, task)
results.append((task.name, success))
log.info("\n" + "=" * 70)
log.info("RESULTS SUMMARY")
log.info("=" * 70)
for name, success in results:
status = "OK" if success else "FAILED"
log.info(" [%s] %s", status, name)
if __name__ == "__main__":
main()


@ -1,29 +0,0 @@
---
name: create-task
description: Create new ClickUp tasks for clients. Use when the user asks to create, add, or make a new task.
tools: [clickup_create_task]
agents: [default]
---
# Create ClickUp Task
Creates a new task in the client's "Overall" list in ClickUp.
## Required Information
- **name**: The task name (e.g., "Write blog post about AI trends")
- **client**: The client/folder name in ClickUp (e.g., "Acme Corp")
## Optional Information
- **work_category**: The work category dropdown value (e.g., "Press Release", "Link Building", "Content Creation", "On Page Optimization")
- **description**: Task description (supports markdown)
- **status**: Initial status (default: "to do")
## Examples
"Create a press release task for Acme Corp about their new product launch"
-> name: "Press Release - New Product Launch", client: "Acme Corp", work_category: "Press Release"
"Add a link building task for Widget Co"
-> name: "Link Building Campaign", client: "Widget Co", work_category: "Link Building"
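The required and optional fields above can be pictured as a small request object. This is an illustrative sketch only: `CreateTaskRequest` and `to_tool_args` are hypothetical names, not part of CheddahBot or the ClickUp API. Only the field names and the "to do" default come from the skill description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CreateTaskRequest:
    """Hypothetical container for the create-task fields listed above."""

    name: str                 # required: the task name
    client: str               # required: the client/folder name in ClickUp
    work_category: str = ""   # optional dropdown value, e.g. "Press Release"
    description: str = ""     # optional, markdown allowed
    status: str = "to do"     # default documented in the skill file

    def to_tool_args(self) -> dict[str, str]:
        """Return the arguments that were actually provided."""
        if not self.name.strip() or not self.client.strip():
            raise ValueError("name and client are required")
        args = {"name": self.name, "client": self.client, "status": self.status}
        if self.work_category:
            args["work_category"] = self.work_category
        if self.description:
            args["description"] = self.description
        return args
```

For the second example request, `CreateTaskRequest(name="Link Building Campaign", client="Widget Co", work_category="Link Building").to_tool_args()` would carry the name, client, work category, and the default status.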


@ -18,10 +18,9 @@ When the user provides a press release topic, follow this workflow:
 - Each headline must be:
 - Maximum 70 characters
 - Title case
-- News-wire style (not promotional)
+- News-focused (not promotional)
 - Free of location keywords, superlatives (best/top/leading/#1), and questions
-- MUST NOT fabricate events, expansions, milestones, or demand claims
-- Unless the topic explicitly signals actual news (e.g. "Actual News", "New Product", "Launch"), assume the company ALREADY offers this — use awareness verbs like "Highlights", "Reinforces", "Delivers", "Showcases", NOT announcement verbs like "Announces", "Launches", "Expands"
+- Not make up information that isn't true.
 - Present all 7 titles to an AI agent to judge which is best. This can be decided by looking at titles on Press Advantage for other businesses, and seeing how closely the headline follows the instructions.
 ** EXAMPLE GREAT HEADLINES: **
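The mechanical headline constraints in the hunk above (70-character limit, title case, no superlatives, no questions) could be checked automatically before the titles go to a judge. The following is a rough sketch under stated assumptions: `headline_problems` and the `MINOR_WORDS` list are hypothetical and not part of the skill file, and the title-case test is a loose heuristic rather than a style-guide implementation.

```python
import re

MAX_HEADLINE_CHARS = 70
SUPERLATIVES = ("best", "top", "leading", "#1")
# Short words that title case conventionally leaves lowercase (assumed list).
MINOR_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "of", "on", "or", "the", "to", "with"}


def headline_problems(headline: str) -> list[str]:
    """Return a list of rule violations; an empty list means no problem found."""
    problems = []
    if len(headline) > MAX_HEADLINE_CHARS:
        problems.append(f"longer than {MAX_HEADLINE_CHARS} characters")
    if "?" in headline:
        problems.append("phrased as a question")
    lowered = headline.lower()
    for word in SUPERLATIVES:
        # Whole-word match so that "top" does not flag "laptop".
        if re.search(rf"(?<![\w#]){re.escape(word)}(?!\w)", lowered):
            problems.append(f"contains superlative {word!r}")
    for i, word in enumerate(headline.split()):
        first = word[0]
        # A lowercase initial is allowed only for minor words after the first word.
        if first.isalpha() and first.islower():
            if i == 0 or word.lower() not in MINOR_WORDS:
                problems.append(f"not title case: {word!r}")
    return problems
```

Rules that need judgment, such as avoiding fabricated claims or location keywords, are deliberately left out of this sketch.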
@ -76,10 +75,8 @@ When generating the 7 headline options:
### Content Type ### Content Type
- This is a PRESS RELEASE, not an advertorial, blog post, or promotional content - This is a PRESS RELEASE, not an advertorial, blog post, or promotional content
- Must be written in objective, journalistic style - Must be objective news announcement written in journalistic style
- By default this is an AWARENESS piece — the company already offers this capability. Frame it as highlighting/reinforcing existing offerings, NOT as announcing something new - Must announce actual NEWS (about products/services, milestones, awards, reactions to current events)
- Only use announcement language (announces, launches, introduces) when the topic explicitly signals actual news (e.g. topic contains "Actual News", "New Product", "Launch")
- Do NOT fabricate events, expansions, milestones, or demand claims. If nothing new happened, do not pretend it did.
- Must read like it could appear verbatim in a newspaper - Must read like it could appear verbatim in a newspaper
### Writing Style - MANDATORY ### Writing Style - MANDATORY
@ -206,7 +203,7 @@ Before finalizing, verify:
3. Include 1-2 executive quotes for human perspective 3. Include 1-2 executive quotes for human perspective
4. Provide context about the company/organization 4. Provide context about the company/organization
5. Explain significance and impact 5. Explain significance and impact
6. Do NOT include an "About" section or company boilerplate — Press Advantage adds this automatically 6. End with company boilerplate and contact information
7. Write in inverted pyramid style - can be cut from bottom up 7. Write in inverted pyramid style - can be cut from bottom up
## Tone Guidelines ## Tone Guidelines


@ -1,3 +0,0 @@
#!/usr/bin/env bash
cd "$(dirname "$0")"
exec uv run python -m cheddahbot


@ -12,7 +12,6 @@ import pytest
 from cheddahbot.config import AutoCoraConfig, ClickUpConfig, Config
 from cheddahbot.tools.autocora import (
-    _find_qualifying_tasks_sweep,
     _group_by_keyword,
     _make_job_id,
     _parse_result,
@ -37,7 +36,6 @@ class FakeTask:
     task_type: str = "Content Creation"
     due_date: str = ""
     custom_fields: dict[str, Any] = field(default_factory=dict)
-    tags: list[str] = field(default_factory=list)
 @pytest.fixture()
@ -149,12 +147,11 @@ class TestGroupByKeyword:
         assert len(groups) == 0
         assert any("missing Keyword" in a for a in alerts)
-    def test_missing_imsurl_uses_fallback(self):
-        """Missing IMSURL gets a fallback blank URL."""
+    def test_missing_imsurl(self):
         tasks = [FakeTask(id="t1", name="No URL", custom_fields={"Keyword": "test"})]
         groups, alerts = _group_by_keyword(tasks, tasks)
-        assert len(groups) == 1
-        assert groups["test"]["url"] == "https://seotoollab.com/blank.html"
+        assert len(groups) == 0
+        assert any("missing IMSURL" in a for a in alerts)
     def test_sibling_tasks(self):
         """Tasks sharing a keyword from all_tasks should be included."""
@ -204,7 +201,9 @@ class TestSubmitAutocoraJobs:
monkeypatch.setattr( monkeypatch.setattr(
"cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task] "cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task]
) )
monkeypatch.setattr(
"cheddahbot.tools.autocora._find_all_todo_tasks", lambda *a, **kw: [task]
)
result = submit_autocora_jobs(target_date="2025-01-01", ctx=ctx) result = submit_autocora_jobs(target_date="2025-01-01", ctx=ctx)
assert "Submitted 1 job" in result assert "Submitted 1 job" in result
@ -220,8 +219,8 @@ class TestSubmitAutocoraJobs:
assert job_data["url"] == "http://example.com" assert job_data["url"] == "http://example.com"
assert job_data["task_ids"] == ["t1"] assert job_data["task_ids"] == ["t1"]
def test_submit_writes_job_with_task_ids(self, ctx, monkeypatch): def test_submit_tracks_kv(self, ctx, monkeypatch):
"""Job file contains task_ids for the result poller.""" """KV store tracks submitted jobs."""
task = FakeTask( task = FakeTask(
id="t1", id="t1",
name="Test", name="Test",
@ -231,18 +230,20 @@ class TestSubmitAutocoraJobs:
monkeypatch.setattr( monkeypatch.setattr(
"cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task] "cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task]
) )
monkeypatch.setattr(
"cheddahbot.tools.autocora._find_all_todo_tasks", lambda *a, **kw: [task]
)
submit_autocora_jobs(target_date="2025-01-01", ctx=ctx) submit_autocora_jobs(target_date="2025-01-01", ctx=ctx)
jobs_dir = Path(ctx["config"].autocora.jobs_dir) raw = ctx["db"].kv_get("autocora:job:test keyword")
job_files = list(jobs_dir.glob("job-*.json")) assert raw is not None
assert len(job_files) == 1 state = json.loads(raw)
data = json.loads(job_files[0].read_text()) assert state["status"] == "submitted"
assert "t1" in data["task_ids"] assert "t1" in state["task_ids"]
def test_duplicate_prevention(self, ctx, monkeypatch): def test_duplicate_prevention(self, ctx, monkeypatch):
"""Already-submitted keywords are skipped (job file exists).""" """Already-submitted keywords are skipped."""
task = FakeTask( task = FakeTask(
id="t1", id="t1",
name="Test", name="Test",
@ -252,12 +253,14 @@ class TestSubmitAutocoraJobs:
monkeypatch.setattr( monkeypatch.setattr(
"cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task] "cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task]
) )
monkeypatch.setattr(
"cheddahbot.tools.autocora._find_all_todo_tasks", lambda *a, **kw: [task]
)
# First submit # First submit
submit_autocora_jobs(target_date="2025-01-01", ctx=ctx) submit_autocora_jobs(target_date="2025-01-01", ctx=ctx)
# Second submit — should skip (job file already exists) # Second submit — should skip
result = submit_autocora_jobs(target_date="2025-01-01", ctx=ctx) result = submit_autocora_jobs(target_date="2025-01-01", ctx=ctx)
assert "Skipped 1" in result assert "Skipped 1" in result
@ -272,13 +275,15 @@ class TestSubmitAutocoraJobs:
monkeypatch.setattr( monkeypatch.setattr(
"cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task] "cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task]
) )
monkeypatch.setattr(
"cheddahbot.tools.autocora._find_all_todo_tasks", lambda *a, **kw: [task]
)
result = submit_autocora_jobs(target_date="2025-01-01", ctx=ctx) result = submit_autocora_jobs(target_date="2025-01-01", ctx=ctx)
assert "missing Keyword" in result assert "missing Keyword" in result
def test_missing_imsurl_uses_fallback(self, ctx, monkeypatch): def test_missing_imsurl_alert(self, ctx, monkeypatch):
"""Tasks without IMSURL use fallback URL and still submit.""" """Tasks without IMSURL field produce alerts."""
task = FakeTask( task = FakeTask(
id="t1", id="t1",
name="No URL Task", name="No URL Task",
@ -288,10 +293,12 @@ class TestSubmitAutocoraJobs:
monkeypatch.setattr( monkeypatch.setattr(
"cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task] "cheddahbot.tools.autocora._find_qualifying_tasks", lambda *a, **kw: [task]
) )
monkeypatch.setattr(
"cheddahbot.tools.autocora._find_all_todo_tasks", lambda *a, **kw: [task]
)
result = submit_autocora_jobs(target_date="2025-01-01", ctx=ctx) result = submit_autocora_jobs(target_date="2025-01-01", ctx=ctx)
assert "Submitted 1 job" in result assert "missing IMSURL" in result
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -305,18 +312,33 @@ class TestPollAutocoraResults:
result = poll_autocora_results(ctx=ctx) result = poll_autocora_results(ctx=ctx)
assert "disabled" in result.lower() assert "disabled" in result.lower()
def test_no_result_files(self, ctx): def test_no_pending(self, ctx):
result = poll_autocora_results(ctx=ctx) result = poll_autocora_results(ctx=ctx)
assert "No result files" in result assert "No pending" in result
def test_success_json(self, ctx, monkeypatch): def test_success_json(self, ctx, monkeypatch):
"""JSON SUCCESS result updates ClickUp and moves result file.""" """JSON SUCCESS result updates KV and ClickUp."""
db = ctx["db"]
results_dir = Path(ctx["config"].autocora.results_dir) results_dir = Path(ctx["config"].autocora.results_dir)
# Write result file directly (no KV needed) # Set up submitted job in KV
result_data = {"status": "SUCCESS", "task_ids": ["t1", "t2"], "keyword": "test keyword"} job_id = "job-123-test"
(results_dir / "job-123-test.result").write_text(json.dumps(result_data)) kv_key = "autocora:job:test keyword"
db.kv_set(
kv_key,
json.dumps({
"status": "submitted",
"job_id": job_id,
"keyword": "test keyword",
"task_ids": ["t1", "t2"],
}),
)
# Write result file
result_data = {"status": "SUCCESS", "task_ids": ["t1", "t2"]}
(results_dir / f"{job_id}.result").write_text(json.dumps(result_data))
# Mock ClickUp client
mock_client = MagicMock() mock_client = MagicMock()
monkeypatch.setattr( monkeypatch.setattr(
"cheddahbot.tools.autocora._get_clickup_client", lambda ctx: mock_client "cheddahbot.tools.autocora._get_clickup_client", lambda ctx: mock_client
@ -325,27 +347,39 @@ class TestPollAutocoraResults:
result = poll_autocora_results(ctx=ctx) result = poll_autocora_results(ctx=ctx)
assert "SUCCESS: test keyword" in result assert "SUCCESS: test keyword" in result
# Verify KV updated
state = json.loads(db.kv_get(kv_key))
assert state["status"] == "completed"
# Verify ClickUp calls # Verify ClickUp calls
assert mock_client.update_task_status.call_count == 2 assert mock_client.update_task_status.call_count == 2
mock_client.update_task_status.assert_any_call("t1", "running cora") mock_client.update_task_status.assert_any_call("t1", "running cora")
mock_client.update_task_status.assert_any_call("t2", "running cora") mock_client.update_task_status.assert_any_call("t2", "running cora")
assert mock_client.add_comment.call_count == 2 assert mock_client.add_comment.call_count == 2
# Verify result file moved to processed/
assert not (results_dir / "job-123-test.result").exists()
assert (results_dir / "processed" / "job-123-test.result").exists()
def test_failure_json(self, ctx, monkeypatch): def test_failure_json(self, ctx, monkeypatch):
"""JSON FAILURE result updates ClickUp with error.""" """JSON FAILURE result updates KV and ClickUp with error."""
db = ctx["db"]
results_dir = Path(ctx["config"].autocora.results_dir) results_dir = Path(ctx["config"].autocora.results_dir)
job_id = "job-456-fail"
kv_key = "autocora:job:fail keyword"
db.kv_set(
kv_key,
json.dumps({
"status": "submitted",
"job_id": job_id,
"keyword": "fail keyword",
"task_ids": ["t3"],
}),
)
result_data = { result_data = {
"status": "FAILURE", "status": "FAILURE",
"reason": "Cora not running", "reason": "Cora not running",
"task_ids": ["t3"], "task_ids": ["t3"],
"keyword": "fail keyword",
} }
(results_dir / "job-456-fail.result").write_text(json.dumps(result_data)) (results_dir / f"{job_id}.result").write_text(json.dumps(result_data))
mock_client = MagicMock() mock_client = MagicMock()
monkeypatch.setattr( monkeypatch.setattr(
@ -356,14 +390,31 @@ class TestPollAutocoraResults:
assert "FAILURE: fail keyword" in result assert "FAILURE: fail keyword" in result
assert "Cora not running" in result assert "Cora not running" in result
state = json.loads(db.kv_get(kv_key))
assert state["status"] == "failed"
assert state["error"] == "Cora not running"
mock_client.update_task_status.assert_called_once_with("t3", "error") mock_client.update_task_status.assert_called_once_with("t3", "error")
def test_legacy_plain_text(self, ctx, monkeypatch): def test_legacy_plain_text(self, ctx, monkeypatch):
"""Legacy plain-text SUCCESS result still works (keyword from filename).""" """Legacy plain-text SUCCESS result still works."""
db = ctx["db"]
results_dir = Path(ctx["config"].autocora.results_dir) results_dir = Path(ctx["config"].autocora.results_dir)
job_id = "job-789-legacy"
kv_key = "autocora:job:legacy kw"
db.kv_set(
kv_key,
json.dumps({
"status": "submitted",
"job_id": job_id,
"keyword": "legacy kw",
"task_ids": ["t5"],
}),
)
# Legacy format — plain text, no JSON # Legacy format — plain text, no JSON
(results_dir / "job-789-legacy-kw.result").write_text("SUCCESS") (results_dir / f"{job_id}.result").write_text("SUCCESS")
mock_client = MagicMock() mock_client = MagicMock()
monkeypatch.setattr( monkeypatch.setattr(
@ -371,17 +422,31 @@ class TestPollAutocoraResults:
) )
result = poll_autocora_results(ctx=ctx) result = poll_autocora_results(ctx=ctx)
assert "SUCCESS:" in result assert "SUCCESS: legacy kw" in result
# No task_ids in legacy format, so no ClickUp calls # task_ids come from KV fallback
mock_client.update_task_status.assert_not_called() mock_client.update_task_status.assert_called_once_with("t5", "running cora")
def test_task_ids_from_result_file(self, ctx, monkeypatch): def test_task_ids_from_result_preferred(self, ctx, monkeypatch):
"""task_ids from result file drive ClickUp updates.""" """task_ids from result file take precedence over KV."""
db = ctx["db"]
results_dir = Path(ctx["config"].autocora.results_dir) results_dir = Path(ctx["config"].autocora.results_dir)
result_data = {"status": "SUCCESS", "task_ids": ["new_t1", "new_t2"], "keyword": "pref kw"} job_id = "job-100-pref"
(results_dir / "job-100-pref.result").write_text(json.dumps(result_data)) kv_key = "autocora:job:pref kw"
db.kv_set(
kv_key,
json.dumps({
"status": "submitted",
"job_id": job_id,
"keyword": "pref kw",
"task_ids": ["old_t1"], # KV has old IDs
}),
)
# Result has updated task_ids
result_data = {"status": "SUCCESS", "task_ids": ["new_t1", "new_t2"]}
(results_dir / f"{job_id}.result").write_text(json.dumps(result_data))
mock_client = MagicMock() mock_client = MagicMock()
monkeypatch.setattr( monkeypatch.setattr(
@ -390,107 +455,25 @@ class TestPollAutocoraResults:
poll_autocora_results(ctx=ctx) poll_autocora_results(ctx=ctx)
# Should use result file task_ids, not KV
calls = [c.args for c in mock_client.update_task_status.call_args_list] calls = [c.args for c in mock_client.update_task_status.call_args_list]
assert ("new_t1", "running cora") in calls assert ("new_t1", "running cora") in calls
assert ("new_t2", "running cora") in calls assert ("new_t2", "running cora") in calls
assert ("old_t1", "running cora") not in calls
def test_still_pending(self, ctx):
# --------------------------------------------------------------------------- """Jobs without result files show as still pending."""
# Sweep tests db = ctx["db"]
# --------------------------------------------------------------------------- db.kv_set(
"autocora:job:waiting",
json.dumps({
class TestFindQualifyingTasksSweep: "status": "submitted",
"""Test the multi-pass sweep logic.""" "job_id": "job-999-wait",
"keyword": "waiting",
def _make_client(self, tasks): "task_ids": ["t99"],
client = MagicMock() }),
client.get_tasks_from_space.return_value = tasks
return client
def _make_config(self):
config = MagicMock()
config.clickup.space_id = "sp1"
return config
def test_finds_tasks_due_today(self):
from datetime import UTC, datetime
now = datetime.now(UTC)
today_ms = int(now.replace(hour=12).timestamp() * 1000)
task = FakeTask(id="t1", name="Today", due_date=str(today_ms))
client = self._make_client([task])
config = self._make_config()
result = _find_qualifying_tasks_sweep(client, config, ["Content Creation"])
assert any(t.id == "t1" for t in result)
def test_finds_overdue_with_month_tag(self):
from datetime import UTC, datetime
now = datetime.now(UTC)
month_tag = now.strftime("%b%y").lower()
# Due 3 days ago
overdue_ms = int((now.timestamp() - 3 * 86400) * 1000)
task = FakeTask(
id="t2", name="Overdue", due_date=str(overdue_ms), tags=[month_tag]
) )
client = self._make_client([task])
config = self._make_config()
result = _find_qualifying_tasks_sweep(client, config, ["Content Creation"]) result = poll_autocora_results(ctx=ctx)
assert any(t.id == "t2" for t in result) assert "Still pending" in result
assert "waiting" in result
def test_finds_last_month_tagged(self):
from datetime import UTC, datetime
now = datetime.now(UTC)
if now.month == 1:
last = now.replace(year=now.year - 1, month=12)
else:
last = now.replace(month=now.month - 1)
last_tag = last.strftime("%b%y").lower()
# No due date needed for month-tag pass
task = FakeTask(id="t3", name="Last Month", tags=[last_tag])
client = self._make_client([task])
config = self._make_config()
result = _find_qualifying_tasks_sweep(client, config, ["Content Creation"])
assert any(t.id == "t3" for t in result)
def test_finds_lookahead(self):
from datetime import UTC, datetime
now = datetime.now(UTC)
tomorrow_ms = int((now.timestamp() + 36 * 3600) * 1000)
task = FakeTask(id="t4", name="Tomorrow", due_date=str(tomorrow_ms))
client = self._make_client([task])
config = self._make_config()
result = _find_qualifying_tasks_sweep(client, config, ["Content Creation"])
assert any(t.id == "t4" for t in result)
def test_deduplicates_across_passes(self):
from datetime import UTC, datetime
now = datetime.now(UTC)
month_tag = now.strftime("%b%y").lower()
today_ms = int(now.replace(hour=12).timestamp() * 1000)
# Task is due today AND has month tag — should only appear once
task = FakeTask(
id="t5", name="Multi", due_date=str(today_ms), tags=[month_tag]
)
client = self._make_client([task])
config = self._make_config()
result = _find_qualifying_tasks_sweep(client, config, ["Content Creation"])
ids = [t.id for t in result]
assert ids.count("t5") == 1
def test_empty_space_id(self):
config = self._make_config()
config.clickup.space_id = ""
client = self._make_client([])
result = _find_qualifying_tasks_sweep(client, config, ["Content Creation"])
assert result == []
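One caveat on `test_finds_last_month_tagged` above: `now.replace(month=now.month - 1)` raises `ValueError` when today's day number does not exist in the previous month (for example on March 31), so that test can fail depending on the date it runs. A day-safe way to derive the same tag is sketched below. `month_tag` and `previous_month_tag` are hypothetical helper names, and the tag format is taken from the `strftime("%b%y").lower()` calls in the tests.

```python
from datetime import date, timedelta


def month_tag(d: date) -> str:
    """Format a date as a tag like "mar26" (assumes an English/C locale for %b)."""
    return d.strftime("%b%y").lower()


def previous_month_tag(d: date) -> str:
    """Tag for the month before d, valid for any day of the month."""
    # Stepping back one day from the 1st always lands in the previous month,
    # including the December/January year boundary.
    last_day_of_previous_month = d.replace(day=1) - timedelta(days=1)
    return month_tag(last_day_of_previous_month)
```

This also removes the need for the separate January branch, since subtracting a day from January 1 already yields December of the prior year.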


@ -434,208 +434,3 @@ class TestClickUpClient:
assert result is None assert result is None
client.close() client.close()
@respx.mock
def test_create_task(self):
respx.post(f"{BASE_URL}/list/list_1/task").mock(
return_value=httpx.Response(
200,
json={
"id": "new_task_1",
"name": "Test Task",
"url": "https://app.clickup.com/t/new_task_1",
},
)
)
client = ClickUpClient(api_token="pk_test")
result = client.create_task(
list_id="list_1",
name="Test Task",
description="A test description",
status="to do",
)
assert result["id"] == "new_task_1"
assert result["url"] == "https://app.clickup.com/t/new_task_1"
request = respx.calls.last.request
import json
body = json.loads(request.content)
assert body["name"] == "Test Task"
assert body["description"] == "A test description"
assert body["status"] == "to do"
client.close()
@respx.mock
def test_create_task_with_optional_fields(self):
respx.post(f"{BASE_URL}/list/list_1/task").mock(
return_value=httpx.Response(
200,
json={"id": "new_task_2", "name": "Tagged Task", "url": ""},
)
)
client = ClickUpClient(api_token="pk_test")
result = client.create_task(
list_id="list_1",
name="Tagged Task",
due_date=1740000000000,
tags=["urgent", "mar26"],
custom_fields=[{"id": "cf_1", "value": "opt_1"}],
)
assert result["id"] == "new_task_2"
import json
body = json.loads(respx.calls.last.request.content)
assert body["due_date"] == 1740000000000
assert body["tags"] == ["urgent", "mar26"]
assert body["custom_fields"] == [{"id": "cf_1", "value": "opt_1"}]
client.close()
@respx.mock
def test_find_list_in_folder_found(self):
respx.get(f"{BASE_URL}/space/space_1/folder").mock(
return_value=httpx.Response(
200,
json={
"folders": [
{
"id": "f1",
"name": "Acme Corp",
"lists": [
{"id": "list_overall", "name": "Overall"},
{"id": "list_archive", "name": "Archive"},
],
},
{
"id": "f2",
"name": "Widget Co",
"lists": [
{"id": "list_w_overall", "name": "Overall"},
],
},
]
},
)
)
client = ClickUpClient(api_token="pk_test")
result = client.find_list_in_folder("space_1", "Acme Corp")
assert result == "list_overall"
client.close()
@respx.mock
def test_find_list_in_folder_case_insensitive(self):
respx.get(f"{BASE_URL}/space/space_1/folder").mock(
return_value=httpx.Response(
200,
json={
"folders": [
{
"id": "f1",
"name": "Acme Corp",
"lists": [{"id": "list_overall", "name": "Overall"}],
},
]
},
)
)
client = ClickUpClient(api_token="pk_test")
result = client.find_list_in_folder("space_1", "acme corp")
assert result == "list_overall"
client.close()
@respx.mock
def test_find_list_in_folder_not_found(self):
respx.get(f"{BASE_URL}/space/space_1/folder").mock(
return_value=httpx.Response(
200,
json={
"folders": [
{
"id": "f1",
"name": "Acme Corp",
"lists": [{"id": "list_1", "name": "Overall"}],
},
]
},
)
)
client = ClickUpClient(api_token="pk_test")
result = client.find_list_in_folder("space_1", "NonExistent Client")
assert result is None
client.close()
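The three `find_list_in_folder` tests above describe a lookup over the folder payload: match the folder name case-insensitively, then return the ID of its "Overall" list, or `None` if the folder is missing. A minimal sketch of that matching logic, working on the already-decoded `folders` array, might look like this. `find_list_id` is a hypothetical stand-in, not the client method itself, and matching the list name case-insensitively is an assumption.

```python
from __future__ import annotations


def find_list_id(
    folders: list[dict], folder_name: str, list_name: str = "Overall"
) -> str | None:
    """Return the ID of list_name inside the folder named folder_name, if any."""
    wanted_folder = folder_name.strip().lower()
    wanted_list = list_name.strip().lower()
    for folder in folders:
        if folder.get("name", "").strip().lower() != wanted_folder:
            continue
        for lst in folder.get("lists", []):
            if lst.get("name", "").strip().lower() == wanted_list:
                return lst["id"]
    return None
```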
@respx.mock
def test_set_custom_field_smart_dropdown(self):
"""Resolves dropdown option name to UUID automatically."""
respx.get(f"{BASE_URL}/list/list_1/field").mock(
return_value=httpx.Response(
200,
json={
"fields": [
{
"id": "cf_lb",
"name": "LB Method",
"type": "drop_down",
"type_config": {
"options": [
{"id": "opt_cora", "name": "Cora Backlinks"},
{"id": "opt_manual", "name": "Manual"},
]
},
},
]
},
)
)
respx.post(f"{BASE_URL}/task/t1/field/cf_lb").mock(
return_value=httpx.Response(200, json={})
)
client = ClickUpClient(api_token="pk_test")
result = client.set_custom_field_smart(
"t1", "list_1", "LB Method", "Cora Backlinks"
)
assert result is True
import json
body = json.loads(respx.calls.last.request.content)
assert body["value"] == "opt_cora"
client.close()
@respx.mock
def test_set_custom_field_smart_text(self):
"""Passes text field values through without resolution."""
respx.get(f"{BASE_URL}/list/list_1/field").mock(
return_value=httpx.Response(
200,
json={
"fields": [
{
"id": "cf_kw",
"name": "Keyword",
"type": "short_text",
},
]
},
)
)
respx.post(f"{BASE_URL}/task/t1/field/cf_kw").mock(
return_value=httpx.Response(200, json={})
)
client = ClickUpClient(api_token="pk_test")
result = client.set_custom_field_smart(
"t1", "list_1", "Keyword", "shaft manufacturing"
)
assert result is True
import json
body = json.loads(respx.calls.last.request.content)
assert body["value"] == "shaft manufacturing"
client.close()
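The two `set_custom_field_smart` tests above imply a resolution step: look the field up by name, translate a dropdown option name into its option ID, and pass other field types through unchanged. The sketch below shows that step in isolation on the decoded `fields` array. `resolve_field_value` is a hypothetical helper, and returning `None` for an unknown field or option is an assumption, since the diff does not show how the client handles those cases.

```python
from __future__ import annotations


def resolve_field_value(
    fields: list[dict], field_name: str, value: str
) -> tuple[str, str] | None:
    """Return (field_id, value_to_send), or None if the field or option is unknown."""
    for field in fields:
        if field.get("name") != field_name:
            continue
        if field.get("type") == "drop_down":
            # Dropdowns are set by option ID, so map the human-readable name.
            for option in field.get("type_config", {}).get("options", []):
                if option.get("name") == value:
                    return field["id"], option["id"]
            return None  # field exists, but no option has that name
        # Other field types (e.g. short_text) take the value as-is.
        return field["id"], value
    return None
```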


@ -1,180 +1,147 @@
"""Tests for the ClickUp chat tools (API-backed, no KV store).""" """Tests for the ClickUp chat tools."""
from __future__ import annotations from __future__ import annotations
from dataclasses import dataclass, field import json
from unittest.mock import MagicMock, patch
from cheddahbot.tools.clickup_tool import ( from cheddahbot.tools.clickup_tool import (
clickup_list_tasks, clickup_list_tasks,
clickup_query_tasks, clickup_reset_all,
clickup_reset_task, clickup_reset_task,
clickup_task_status, clickup_task_status,
get_active_tasks,
) )
@dataclass def _make_ctx(db):
class FakeTask: return {"db": db}
id: str = "t1"
name: str = "Test Task"
status: str = "to do"
task_type: str = "Press Release"
url: str = "https://app.clickup.com/t/t1"
due_date: str = ""
date_updated: str = ""
tags: list = field(default_factory=list)
custom_fields: dict = field(default_factory=dict)
def _make_ctx(): def _seed_task(db, task_id, state, **overrides):
config = MagicMock() """Insert a task state into kv_store."""
config.clickup.api_token = "test-token" data = {
config.clickup.workspace_id = "ws1" "state": state,
config.clickup.space_id = "sp1" "clickup_task_id": task_id,
config.clickup.task_type_field_name = "Work Category" "clickup_task_name": f"Task {task_id}",
config.clickup.automation_status = "automation underway" "task_type": "Press Release",
config.clickup.review_status = "internal review" "skill_name": "write_press_releases",
config.clickup.error_status = "error" "discovered_at": "2026-01-01T00:00:00",
config.clickup.poll_statuses = ["to do"] "started_at": None,
return {"config": config, "db": MagicMock()} "completed_at": None,
"error": None,
"deliverable_paths": [],
class TestClickupQueryTasks: "custom_fields": {},
@patch("cheddahbot.tools.clickup_tool._get_clickup_client") }
def test_returns_tasks(self, mock_client_fn): data.update(overrides)
mock_client = MagicMock() db.kv_set(f"clickup:task:{task_id}:state", json.dumps(data))
mock_client.get_tasks_from_space.return_value = [
FakeTask(id="t1", name="PR Task", task_type="Press Release"),
]
mock_client_fn.return_value = mock_client
result = clickup_query_tasks(ctx=_make_ctx())
assert "PR Task" in result
assert "t1" in result
@patch("cheddahbot.tools.clickup_tool._get_clickup_client")
def test_no_tasks_found(self, mock_client_fn):
mock_client = MagicMock()
mock_client.get_tasks_from_space.return_value = []
mock_client_fn.return_value = mock_client
result = clickup_query_tasks(ctx=_make_ctx())
assert "No tasks found" in result
class TestClickupListTasks: class TestClickupListTasks:
@patch("cheddahbot.tools.clickup_tool._get_clickup_client") def test_empty_when_no_tasks(self, tmp_db):
def test_lists_automation_tasks(self, mock_client_fn): result = clickup_list_tasks(ctx=_make_ctx(tmp_db))
mock_client = MagicMock() assert "No ClickUp tasks" in result
mock_client.get_tasks_from_space.return_value = [
FakeTask(id="t1", name="Active Task", status="automation underway"),
]
mock_client_fn.return_value = mock_client
result = clickup_list_tasks(ctx=_make_ctx()) def test_lists_all_tracked_tasks(self, tmp_db):
assert "Active Task" in result _seed_task(tmp_db, "a1", "discovered")
assert "t1" in result _seed_task(tmp_db, "a2", "approved")
@patch("cheddahbot.tools.clickup_tool._get_clickup_client") result = clickup_list_tasks(ctx=_make_ctx(tmp_db))
def test_no_automation_tasks(self, mock_client_fn):
mock_client = MagicMock()
mock_client.get_tasks_from_space.return_value = []
mock_client_fn.return_value = mock_client
result = clickup_list_tasks(ctx=_make_ctx()) assert "a1" in result
assert "No tasks found" in result assert "a2" in result
assert "2" in result # count
@patch("cheddahbot.tools.clickup_tool._get_clickup_client") def test_filter_by_status(self, tmp_db):
def test_filter_by_status(self, mock_client_fn): _seed_task(tmp_db, "a1", "discovered")
mock_client = MagicMock() _seed_task(tmp_db, "a2", "approved")
mock_client.get_tasks_from_space.return_value = [ _seed_task(tmp_db, "a3", "completed")
FakeTask(id="t1", name="Error Task", status="error"),
]
mock_client_fn.return_value = mock_client
result = clickup_list_tasks(status="error", ctx=_make_ctx()) result = clickup_list_tasks(status="approved", ctx=_make_ctx(tmp_db))
assert "Error Task" in result
assert "a2" in result
assert "a1" not in result
assert "a3" not in result
def test_filter_returns_empty_message(self, tmp_db):
_seed_task(tmp_db, "a1", "discovered")
result = clickup_list_tasks(status="completed", ctx=_make_ctx(tmp_db))
assert "No ClickUp tasks with state" in result
 class TestClickupTaskStatus:
-    @patch("cheddahbot.tools.clickup_tool._get_clickup_client")
-    def test_shows_details(self, mock_client_fn):
-        mock_client = MagicMock()
-        mock_client.get_task.return_value = FakeTask(
-            id="t1",
-            name="My Task",
-            status="automation underway",
-            task_type="Press Release",
-        )
-        mock_client_fn.return_value = mock_client
-        result = clickup_task_status(task_id="t1", ctx=_make_ctx())
-        assert "My Task" in result
-        assert "automation underway" in result
-        assert "Press Release" in result
-    @patch("cheddahbot.tools.clickup_tool._get_clickup_client")
-    def test_api_error(self, mock_client_fn):
-        mock_client = MagicMock()
-        mock_client.get_task.side_effect = Exception("Not found")
-        mock_client_fn.return_value = mock_client
-        result = clickup_task_status(task_id="bad", ctx=_make_ctx())
-        assert "Error" in result
+    def test_shows_details(self, tmp_db):
+        _seed_task(tmp_db, "a1", "executing", started_at="2026-01-01T12:00:00")
+        result = clickup_task_status(task_id="a1", ctx=_make_ctx(tmp_db))
+        assert "Task a1" in result
+        assert "executing" in result
+        assert "Press Release" in result
+        assert "2026-01-01T12:00:00" in result
+    def test_unknown_task(self, tmp_db):
+        result = clickup_task_status(task_id="nonexistent", ctx=_make_ctx(tmp_db))
+        assert "No tracked state" in result
+    def test_shows_error_when_failed(self, tmp_db):
+        _seed_task(tmp_db, "f1", "failed", error="API timeout")
+        result = clickup_task_status(task_id="f1", ctx=_make_ctx(tmp_db))
+        assert "API timeout" in result
+    def test_shows_deliverables(self, tmp_db):
+        _seed_task(tmp_db, "c1", "completed", deliverable_paths=["/data/pr1.txt", "/data/pr2.txt"])
+        result = clickup_task_status(task_id="c1", ctx=_make_ctx(tmp_db))
+        assert "/data/pr1.txt" in result
 class TestClickupResetTask:
-    @patch("cheddahbot.tools.clickup_tool._get_clickup_client")
-    def test_resets_task(self, mock_client_fn):
-        mock_client = MagicMock()
-        mock_client_fn.return_value = mock_client
-        result = clickup_reset_task(task_id="t1", ctx=_make_ctx())
-        assert "reset" in result.lower()
-        mock_client.update_task_status.assert_called_once_with("t1", "to do")
-        mock_client.add_comment.assert_called_once()
-    @patch("cheddahbot.tools.clickup_tool._get_clickup_client")
-    def test_api_error(self, mock_client_fn):
-        mock_client = MagicMock()
-        mock_client.update_task_status.side_effect = Exception("API error")
-        mock_client_fn.return_value = mock_client
-        result = clickup_reset_task(task_id="t1", ctx=_make_ctx())
-        assert "Error" in result
+    def test_resets_failed_task(self, tmp_db):
+        _seed_task(tmp_db, "f1", "failed")
+        result = clickup_reset_task(task_id="f1", ctx=_make_ctx(tmp_db))
+        assert "cleared" in result.lower()
+        assert tmp_db.kv_get("clickup:task:f1:state") is None
+    def test_resets_completed_task(self, tmp_db):
+        _seed_task(tmp_db, "c1", "completed")
+        result = clickup_reset_task(task_id="c1", ctx=_make_ctx(tmp_db))
+        assert "cleared" in result.lower()
+        assert tmp_db.kv_get("clickup:task:c1:state") is None
+    def test_unknown_task(self, tmp_db):
+        result = clickup_reset_task(task_id="nope", ctx=_make_ctx(tmp_db))
+        assert "Nothing to reset" in result
-class TestGetActiveTasks:
-    def test_no_scheduler(self):
-        result = get_active_tasks(ctx={"config": MagicMock()})
-        assert "not available" in result.lower()
-    def test_nothing_running(self):
-        scheduler = MagicMock()
-        scheduler.get_active_executions.return_value = {}
-        scheduler.get_loop_timestamps.return_value = {"clickup": None, "folder_watch": None}
-        result = get_active_tasks(ctx={"scheduler": scheduler})
-        assert "No tasks actively executing" in result
-        assert "Safe to restart: Yes" in result
-    def test_tasks_running(self):
-        from datetime import UTC, datetime, timedelta
-        scheduler = MagicMock()
-        scheduler.get_active_executions.return_value = {
-            "t1": {
-                "name": "Press Release for Acme",
-                "tool": "write_press_releases",
-                "started_at": datetime.now(UTC) - timedelta(minutes=5),
-                "thread": "clickup_thread",
-            }
-        }
-        scheduler.get_loop_timestamps.return_value = {"clickup": datetime.now(UTC).isoformat()}
-        result = get_active_tasks(ctx={"scheduler": scheduler})
-        assert "Active Executions (1)" in result
-        assert "Press Release for Acme" in result
-        assert "write_press_releases" in result
-        assert "Safe to restart: No" in result
+class TestClickupResetAll:
+    def test_clears_all_states(self, tmp_db):
+        _seed_task(tmp_db, "a1", "completed")
+        _seed_task(tmp_db, "a2", "failed")
+        _seed_task(tmp_db, "a3", "executing")
+        result = clickup_reset_all(ctx=_make_ctx(tmp_db))
+        assert "3" in result
+        assert tmp_db.kv_get("clickup:task:a1:state") is None
+        assert tmp_db.kv_get("clickup:task:a2:state") is None
+        assert tmp_db.kv_get("clickup:task:a3:state") is None
+    def test_clears_legacy_active_ids(self, tmp_db):
+        tmp_db.kv_set("clickup:active_task_ids", json.dumps(["a1", "a2"]))
+        clickup_reset_all(ctx=_make_ctx(tmp_db))
+        assert tmp_db.kv_get("clickup:active_task_ids") is None
+    def test_empty_returns_zero(self, tmp_db):
+        result = clickup_reset_all(ctx=_make_ctx(tmp_db))
+        assert "0" in result


@ -1,994 +0,0 @@
"""Tests for the content creation pipeline tool."""
from __future__ import annotations
import json
from pathlib import Path
from unittest.mock import MagicMock, patch
from cheddahbot.config import Config, ContentConfig
from cheddahbot.tools.content_creation import (
_build_optimization_prompt,
_build_phase1_prompt,
_build_phase2_prompt,
_finalize_optimization,
_find_cora_report,
_run_optimization,
_save_content,
_slugify,
_sync_clickup_optimization_complete,
continue_content,
create_content,
)
# ---------------------------------------------------------------------------
# _slugify
# ---------------------------------------------------------------------------
def test_slugify_basic():
assert _slugify("Plumbing Services") == "plumbing-services"
def test_slugify_special_chars():
assert _slugify("AC Repair & Maintenance!") == "ac-repair-maintenance"
def test_slugify_truncates():
long = "a" * 200
assert len(_slugify(long)) <= 80
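# Illustrative sketch only, not the project's code. The three tests above pin down
# lowercase output, hyphen-separated words, dropped punctuation, and an 80-character cap.
# `slugify_sketch` and its `max_len` parameter are hypothetical names; the real
# `_slugify` may well be implemented differently.

```python
import re


def slugify_sketch(text: str, max_len: int = 80) -> str:
    """Hypothetical stand-in for _slugify, written to satisfy the tests above."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    # Truncate, then drop a hyphen that truncation may have left at the end.
    return slug[:max_len].rstrip("-")
```

# Stripping hyphens after truncation keeps a slug from ending in "-" when the cut
# lands on a word boundary. That is a design choice of this sketch, not a documented
# behavior of the tested helper.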
# ---------------------------------------------------------------------------
# _build_phase1_prompt
# ---------------------------------------------------------------------------
class TestBuildPhase1Prompt:
def test_contains_trigger_keywords(self):
prompt = _build_phase1_prompt(
"https://example.com/plumbing",
"plumbing services",
"service page",
"",
"",
)
assert "on-page optimization" in prompt
assert "plumbing services" in prompt
assert "https://example.com/plumbing" in prompt
def test_includes_cora_path(self):
prompt = _build_phase1_prompt(
"https://example.com",
"keyword",
"blog post",
"Z:/cora/report.xlsx",
"",
)
assert "Z:/cora/report.xlsx" in prompt
assert "Cora SEO report" in prompt
def test_includes_capabilities_default(self):
default = "Verify on website."
prompt = _build_phase1_prompt(
"https://example.com",
"keyword",
"service page",
"",
default,
)
assert default in prompt
assert "company capabilities" in prompt
def test_no_cora_no_capabilities(self):
prompt = _build_phase1_prompt(
"https://example.com",
"keyword",
"service page",
"",
"",
)
assert "Cora SEO report" not in prompt
assert "company capabilities" not in prompt
# ---------------------------------------------------------------------------
# _build_phase2_prompt
# ---------------------------------------------------------------------------
class TestBuildPhase2Prompt:
def test_contains_outline(self):
outline = "## Section 1\nContent here."
prompt = _build_phase2_prompt(
"https://example.com",
"plumbing",
outline,
"",
)
assert outline in prompt
assert "writing phase" in prompt
assert "plumbing" in prompt
def test_includes_cora_path(self):
prompt = _build_phase2_prompt(
"https://example.com",
"keyword",
"outline text",
"Z:/cora/report.xlsx",
)
assert "Z:/cora/report.xlsx" in prompt
def test_no_cora(self):
prompt = _build_phase2_prompt(
"https://example.com",
"keyword",
"outline text",
"",
)
assert "Cora SEO report" not in prompt
# ---------------------------------------------------------------------------
# _find_cora_report
# ---------------------------------------------------------------------------
class TestFindCoraReport:
def test_empty_inbox(self, tmp_path):
assert _find_cora_report("keyword", str(tmp_path)) == ""
def test_nonexistent_path(self):
assert _find_cora_report("keyword", "/nonexistent/path") == ""
def test_empty_keyword(self, tmp_path):
assert _find_cora_report("", str(tmp_path)) == ""
def test_exact_match(self, tmp_path):
report = tmp_path / "plumbing services.xlsx"
report.touch()
result = _find_cora_report("plumbing services", str(tmp_path))
assert result == str(report)
def test_substring_match(self, tmp_path):
report = tmp_path / "plumbing-services-city.xlsx"
report.touch()
result = _find_cora_report("plumbing services", str(tmp_path))
# "plumbing services" matches "plumbing-services-city" once separators are normalized
assert result == str(report)
def test_word_overlap(self, tmp_path):
report = tmp_path / "residential-plumbing-repair.xlsx"
report.touch()
result = _find_cora_report("plumbing repair", str(tmp_path))
assert result == str(report)
def test_skips_temp_files(self, tmp_path):
(tmp_path / "~$report.xlsx").touch()
(tmp_path / "actual-report.xlsx").touch()
result = _find_cora_report("actual report", str(tmp_path))
assert "~$" not in result
assert "actual-report" in result
def test_no_match(self, tmp_path):
(tmp_path / "completely-unrelated.xlsx").touch()
result = _find_cora_report("plumbing services", str(tmp_path))
assert result == ""
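# Illustrative sketch only. The tests above suggest a matching order of exact name,
# then substring, then word overlap, with Excel lock files ("~$...") skipped. The
# function name `find_report_sketch`, the `min_overlap` threshold, and the 0.9
# substring weight are all assumptions made for this example; the real
# `_find_cora_report` may score candidates differently.

```python
import re
from pathlib import Path


def _normalize(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to a single space."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_report_sketch(keyword: str, inbox: str, min_overlap: float = 0.5) -> str:
    """Hypothetical matcher: return the best .xlsx path for keyword, or ""."""
    wanted = _normalize(keyword)
    folder = Path(inbox)
    if not wanted or not folder.is_dir():
        return ""

    wanted_words = set(wanted.split())
    best_path, best_score = "", 0.0
    for path in sorted(folder.glob("*.xlsx")):
        if path.name.startswith("~$"):  # Excel lock/temp file, never a report
            continue
        stem = _normalize(path.stem)
        if stem == wanted:
            return str(path)  # an exact match wins immediately
        if wanted in stem:
            score = 0.9  # assumed weight for a substring hit
        else:
            # Fraction of keyword words that also appear in the file name.
            score = len(wanted_words & set(stem.split())) / len(wanted_words)
        if score >= min_overlap and score > best_score:
            best_path, best_score = str(path), score
    return best_path
```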
# ---------------------------------------------------------------------------
# _save_content
# ---------------------------------------------------------------------------
class TestSaveContent:
def _make_config(self, outline_dir: str = "") -> Config:
cfg = Config()
cfg.content = ContentConfig(outline_dir=outline_dir)
return cfg
def test_saves_to_primary_path(self, tmp_path):
cfg = self._make_config(str(tmp_path / "outlines"))
path = _save_content("# Outline", "plumbing services", "outline.md", cfg)
assert "outlines" in path
assert Path(path).read_text(encoding="utf-8") == "# Outline"
def test_falls_back_to_local(self, tmp_path):
# Point to an invalid network path
cfg = self._make_config("\\\\nonexistent\\share\\outlines")
with patch(
"cheddahbot.tools.content_creation._LOCAL_CONTENT_DIR",
tmp_path / "local",
):
path = _save_content("# Outline", "plumbing", "outline.md", cfg)
assert str(tmp_path / "local") in path
assert Path(path).read_text(encoding="utf-8") == "# Outline"
def test_empty_outline_dir_uses_local(self, tmp_path):
cfg = self._make_config("")
with patch(
"cheddahbot.tools.content_creation._LOCAL_CONTENT_DIR",
tmp_path / "local",
):
path = _save_content("content", "keyword", "outline.md", cfg)
assert str(tmp_path / "local") in path
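# Illustrative sketch only. The tests above exercise a "primary directory first,
# local directory as fallback" save. `save_with_fallback_sketch` and its parameters
# are hypothetical; the real `_save_content` takes a Config and derives the folder
# name from the keyword, which is simplified here to a ready-made `slug`.

```python
from pathlib import Path


def save_with_fallback_sketch(
    content: str, slug: str, filename: str, primary_dir: str, local_dir: str
) -> str:
    """Write content under primary_dir if possible, otherwise under local_dir."""
    # An empty primary_dir means "no network location configured".
    candidates = [Path(primary_dir)] if primary_dir else []
    candidates.append(Path(local_dir))

    last_error: OSError | None = None
    for base in candidates:
        try:
            target = base / slug / filename
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
            return str(target)
        except OSError as exc:  # unreachable share, permissions, bad path, ...
            last_error = exc
    raise OSError(f"could not save {filename} anywhere") from last_error
```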
# ---------------------------------------------------------------------------
# create_content — Phase 1
# ---------------------------------------------------------------------------
class TestCreateContentPhase1:
def _make_ctx(self, tmp_db, tmp_path):
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
agent = MagicMock()
agent.execute_task.return_value = "## Generated Outline\nSection 1..."
return {
"agent": agent,
"config": cfg,
"db": tmp_db,
"clickup_task_id": "task123",
}
def test_requires_keyword(self, tmp_db):
ctx = {"agent": MagicMock(), "config": Config(), "db": tmp_db}
assert create_content(keyword="", ctx=ctx).startswith("Error:")
def test_requires_context(self):
assert create_content(keyword="kw", url="http://x", ctx=None).startswith("Error:")
def test_phase1_runs_for_new_content(self, tmp_db, tmp_path):
ctx = self._make_ctx(tmp_db, tmp_path)
result = create_content(
keyword="plumbing services",
ctx=ctx,
)
assert "Phase 1 Complete" in result
assert "outline" in result.lower()
ctx["agent"].execute_task.assert_called_once()
call_args = ctx["agent"].execute_task.call_args
assert call_args.kwargs.get("skip_permissions") is True
def test_phase1_saves_outline_file(self, tmp_db, tmp_path):
ctx = self._make_ctx(tmp_db, tmp_path)
create_content(
keyword="plumbing services",
ctx=ctx,
)
# The outline should have been saved
outline_dir = tmp_path / "outlines" / "plumbing-services"
assert outline_dir.exists()
saved = (outline_dir / "outline.md").read_text(encoding="utf-8")
assert saved == "## Generated Outline\nSection 1..."
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_phase1_syncs_clickup(self, mock_get_client, tmp_db, tmp_path):
mock_client = MagicMock()
mock_get_client.return_value = mock_client
ctx = self._make_ctx(tmp_db, tmp_path)
create_content(
keyword="plumbing services",
ctx=ctx,
)
# Verify outline review status was set and OutlinePath was stored
mock_client.update_task_status.assert_any_call("task123", "outline review")
mock_client.set_custom_field_by_name.assert_called_once()
call_args = mock_client.set_custom_field_by_name.call_args
assert call_args[0][0] == "task123"
assert call_args[0][1] == "OutlinePath"
def test_phase1_includes_clickup_sync_marker(self, tmp_db, tmp_path):
ctx = self._make_ctx(tmp_db, tmp_path)
result = create_content(
keyword="test keyword",
ctx=ctx,
)
assert "## ClickUp Sync" in result
# ---------------------------------------------------------------------------
# create_content — Phase 2
# ---------------------------------------------------------------------------
class TestCreateContentPhase2:
def _setup_phase2(self, tmp_db, tmp_path):
"""Set up outline file and return (ctx, outline_path)."""
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
# Create the outline file
outline_dir = tmp_path / "outlines" / "plumbing-services"
outline_dir.mkdir(parents=True)
outline_file = outline_dir / "outline.md"
outline_file.write_text("## Approved Outline\nSection content here.", encoding="utf-8")
agent = MagicMock()
agent.execute_task.return_value = "# Full Content\nParagraph..."
ctx = {
"agent": agent,
"config": cfg,
"db": tmp_db,
"clickup_task_id": "task456",
}
return ctx, str(outline_file)
def _make_phase2_client(self, outline_path):
"""Create a mock ClickUp client that triggers Phase 2 detection."""
mock_client = MagicMock()
mock_task = MagicMock()
mock_task.status = "outline approved"
mock_client.get_task.return_value = mock_task
mock_client.get_custom_field_by_name.return_value = outline_path
return mock_client
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_phase2_detects_outline_approved_status(self, mock_get_client, tmp_db, tmp_path):
ctx, outline_path = self._setup_phase2(tmp_db, tmp_path)
mock_get_client.return_value = self._make_phase2_client(outline_path)
result = create_content(
keyword="plumbing services",
ctx=ctx,
)
assert "Phase 2 Complete" in result
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_phase2_reads_outline(self, mock_get_client, tmp_db, tmp_path):
ctx, outline_path = self._setup_phase2(tmp_db, tmp_path)
mock_get_client.return_value = self._make_phase2_client(outline_path)
create_content(
keyword="plumbing services",
ctx=ctx,
)
call_args = ctx["agent"].execute_task.call_args
prompt = call_args.args[0] if call_args.args else call_args.kwargs.get("prompt", "")
assert "Approved Outline" in prompt
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_phase2_saves_content_file(self, mock_get_client, tmp_db, tmp_path):
ctx, outline_path = self._setup_phase2(tmp_db, tmp_path)
mock_get_client.return_value = self._make_phase2_client(outline_path)
create_content(
keyword="plumbing services",
ctx=ctx,
)
content_file = tmp_path / "outlines" / "plumbing-services" / "final-content.md"
assert content_file.exists()
assert content_file.read_text(encoding="utf-8") == "# Full Content\nParagraph..."
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_phase2_syncs_clickup_complete(self, mock_get_client, tmp_db, tmp_path):
ctx, outline_path = self._setup_phase2(tmp_db, tmp_path)
mock_client = self._make_phase2_client(outline_path)
mock_get_client.return_value = mock_client
create_content(
keyword="plumbing services",
ctx=ctx,
)
# Verify ClickUp was synced to internal review
mock_client.update_task_status.assert_any_call("task456", "internal review")
mock_client.add_comment.assert_called()
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_phase2_includes_clickup_sync_marker(self, mock_get_client, tmp_db, tmp_path):
ctx, outline_path = self._setup_phase2(tmp_db, tmp_path)
mock_get_client.return_value = self._make_phase2_client(outline_path)
result = create_content(
keyword="plumbing services",
ctx=ctx,
)
assert "## ClickUp Sync" in result
# ---------------------------------------------------------------------------
# continue_content
# ---------------------------------------------------------------------------
class TestContinueContent:
def test_requires_keyword(self, tmp_db):
ctx = {"agent": MagicMock(), "db": tmp_db, "config": Config()}
assert continue_content(keyword="", ctx=ctx).startswith("Error:")
def test_no_matching_entry(self, tmp_db):
ctx = {"agent": MagicMock(), "db": tmp_db, "config": Config()}
result = continue_content(keyword="nonexistent", ctx=ctx)
assert "No outline awaiting review" in result
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_finds_and_runs_phase2(self, mock_get_client, tmp_db, tmp_path):
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
cfg.clickup.space_id = "sp1"
# Create outline file
outline_dir = tmp_path / "outlines" / "plumbing-services"
outline_dir.mkdir(parents=True)
outline_file = outline_dir / "outline.md"
outline_file.write_text("## Outline", encoding="utf-8")
# Mock ClickUp client — returns a task matching the keyword
mock_client = MagicMock()
mock_task = MagicMock()
mock_task.id = "task789"
mock_task.custom_fields = {
"Keyword": "plumbing services",
"IMSURL": "https://example.com",
}
mock_client.get_tasks_from_space.return_value = [mock_task]
mock_client.get_custom_field_by_name.return_value = str(outline_file)
mock_get_client.return_value = mock_client
agent = MagicMock()
agent.execute_task.return_value = "# Full content"
ctx = {"agent": agent, "db": tmp_db, "config": cfg}
result = continue_content(keyword="plumbing services", ctx=ctx)
assert "Phase 2 Complete" in result
# ---------------------------------------------------------------------------
# Error propagation
# ---------------------------------------------------------------------------
class TestErrorPropagation:
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_phase1_execution_error_syncs_clickup(self, mock_get_client, tmp_db, tmp_path):
mock_client = MagicMock()
mock_get_client.return_value = mock_client
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
agent = MagicMock()
agent.execute_task.side_effect = RuntimeError("CLI crashed")
ctx = {
"agent": agent,
"config": cfg,
"db": tmp_db,
"clickup_task_id": "task_err",
}
result = create_content(
keyword="test",
ctx=ctx,
)
assert "Error:" in result
# Verify ClickUp was notified of the failure
mock_client.update_task_status.assert_any_call("task_err", "error")
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_phase1_error_return_syncs_clickup(self, mock_get_client, tmp_db, tmp_path):
mock_client = MagicMock()
mock_get_client.return_value = mock_client
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
agent = MagicMock()
agent.execute_task.return_value = "Error: something went wrong"
ctx = {
"agent": agent,
"config": cfg,
"db": tmp_db,
"clickup_task_id": "task_err2",
}
result = create_content(
keyword="test",
ctx=ctx,
)
assert result.startswith("Error:")
# Verify ClickUp was notified of the failure
mock_client.update_task_status.assert_any_call("task_err2", "error")
# ---------------------------------------------------------------------------
# _build_optimization_prompt
# ---------------------------------------------------------------------------
class TestBuildOptimizationPrompt:
def test_contains_url_and_keyword(self):
prompt = _build_optimization_prompt(
url="https://example.com/plumbing",
keyword="plumbing services",
cora_path="Z:/cora/report.xlsx",
work_dir="/tmp/work",
scripts_dir="/scripts",
)
assert "https://example.com/plumbing" in prompt
assert "plumbing services" in prompt
def test_contains_cora_path(self):
prompt = _build_optimization_prompt(
url="https://example.com",
keyword="kw",
cora_path="Z:/cora/report.xlsx",
work_dir="/tmp/work",
scripts_dir="/scripts",
)
assert "Z:/cora/report.xlsx" in prompt
def test_contains_all_script_commands(self):
prompt = _build_optimization_prompt(
url="https://example.com",
keyword="kw",
cora_path="Z:/cora/report.xlsx",
work_dir="/tmp/work",
scripts_dir="/scripts",
)
assert "competitor_scraper.py" in prompt
assert "test_block_prep.py" in prompt
assert "test_block_generator.py" in prompt
assert "test_block_validate.py" in prompt
def test_contains_step8_instructions(self):
prompt = _build_optimization_prompt(
url="https://example.com",
keyword="kw",
cora_path="Z:/cora/report.xlsx",
work_dir="/tmp/work",
scripts_dir="/scripts",
)
assert "optimization_instructions.md" in prompt
assert "Heading Changes" in prompt
assert "Entity Integration Points" in prompt
assert "Meta Tag Updates" in prompt
assert "Priority Ranking" in prompt
def test_service_page_note(self):
prompt = _build_optimization_prompt(
url="https://example.com",
keyword="kw",
cora_path="Z:/cora/report.xlsx",
work_dir="/tmp/work",
scripts_dir="/scripts",
is_service_page=True,
capabilities_default="Check website.",
)
assert "service page" in prompt
assert "Check website." in prompt
def test_no_service_page_note_by_default(self):
prompt = _build_optimization_prompt(
url="https://example.com",
keyword="kw",
cora_path="Z:/cora/report.xlsx",
work_dir="/tmp/work",
scripts_dir="/scripts",
)
assert "service page" not in prompt.lower().split("step")[0]
def test_all_eight_steps_present(self):
prompt = _build_optimization_prompt(
url="https://example.com",
keyword="kw",
cora_path="Z:/cora/report.xlsx",
work_dir="/tmp/work",
scripts_dir="/scripts",
)
for step_num in range(1, 9):
assert f"Step {step_num}" in prompt
# ---------------------------------------------------------------------------
# _run_optimization
# ---------------------------------------------------------------------------
class TestRunOptimization:
def _make_ctx(self, tmp_db, tmp_path):
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
agent = MagicMock()
agent.execute_task.return_value = "Optimization complete"
return {
"agent": agent,
"config": cfg,
"db": tmp_db,
"clickup_task_id": "opt_task_1",
}
def test_fails_without_cora_report(self, tmp_db, tmp_path):
ctx = self._make_ctx(tmp_db, tmp_path)
result = _run_optimization(
agent=ctx["agent"],
config=ctx["config"],
ctx=ctx,
task_id="opt_task_1",
url="https://example.com",
keyword="plumbing services",
cora_path="",
)
assert "Error:" in result
assert "Cora report" in result
@patch("cheddahbot.tools.content_creation._sync_clickup_fail")
def test_syncs_clickup_on_missing_cora(self, mock_fail, tmp_db, tmp_path):
ctx = self._make_ctx(tmp_db, tmp_path)
_run_optimization(
agent=ctx["agent"],
config=ctx["config"],
ctx=ctx,
task_id="opt_task_1",
url="https://example.com",
keyword="plumbing services",
cora_path="",
)
mock_fail.assert_called_once()
assert mock_fail.call_args[0][1] == "opt_task_1"
@patch("cheddahbot.tools.content_creation._finalize_optimization")
@patch("cheddahbot.tools.content_creation._sync_clickup_start")
def test_creates_work_dir_and_calls_execute(
self, mock_start, mock_finalize, tmp_db, tmp_path
):
ctx = self._make_ctx(tmp_db, tmp_path)
mock_finalize.return_value = "finalized"
with patch(
"cheddahbot.tools.content_creation._LOCAL_CONTENT_DIR",
tmp_path / "content",
):
result = _run_optimization(
agent=ctx["agent"],
config=ctx["config"],
ctx=ctx,
task_id="opt_task_1",
url="https://example.com/plumbing",
keyword="plumbing services",
cora_path="Z:/cora/report.xlsx",
)
ctx["agent"].execute_task.assert_called_once()
mock_start.assert_called_once_with(ctx, "opt_task_1")
mock_finalize.assert_called_once()
assert result == "finalized"
@patch("cheddahbot.tools.content_creation._sync_clickup_fail")
@patch("cheddahbot.tools.content_creation._sync_clickup_start")
def test_syncs_clickup_on_execution_error(
self, mock_start, mock_fail, tmp_db, tmp_path
):
ctx = self._make_ctx(tmp_db, tmp_path)
ctx["agent"].execute_task.side_effect = RuntimeError("CLI crashed")
with patch(
"cheddahbot.tools.content_creation._LOCAL_CONTENT_DIR",
tmp_path / "content",
):
result = _run_optimization(
agent=ctx["agent"],
config=ctx["config"],
ctx=ctx,
task_id="opt_task_1",
url="https://example.com",
keyword="plumbing services",
cora_path="Z:/cora/report.xlsx",
)
assert "Error:" in result
mock_fail.assert_called_once()
# ---------------------------------------------------------------------------
# _finalize_optimization
# ---------------------------------------------------------------------------
class TestFinalizeOptimization:
def _make_config(self, outline_dir: str = "") -> Config:
cfg = Config()
cfg.content = ContentConfig(outline_dir=outline_dir)
return cfg
def test_errors_on_missing_test_block(self, tmp_path):
work_dir = tmp_path / "work"
work_dir.mkdir()
# Only create instructions, not test_block.html
(work_dir / "optimization_instructions.md").write_text("instructions")
cfg = self._make_config()
result = _finalize_optimization(
ctx=None,
config=cfg,
task_id="",
keyword="kw",
url="https://example.com",
work_dir=work_dir,
exec_result="done",
)
assert "Error:" in result
assert "test_block.html" in result
def test_errors_on_missing_instructions(self, tmp_path):
work_dir = tmp_path / "work"
work_dir.mkdir()
# Only create test_block, not instructions
(work_dir / "test_block.html").write_text("<div>block</div>")
cfg = self._make_config()
result = _finalize_optimization(
ctx=None,
config=cfg,
task_id="",
keyword="kw",
url="https://example.com",
work_dir=work_dir,
exec_result="done",
)
assert "Error:" in result
assert "optimization_instructions.md" in result
def test_succeeds_with_required_files(self, tmp_path):
work_dir = tmp_path / "work"
work_dir.mkdir()
(work_dir / "test_block.html").write_text("<div>block</div>")
(work_dir / "optimization_instructions.md").write_text("# Instructions")
cfg = self._make_config()
result = _finalize_optimization(
ctx=None,
config=cfg,
task_id="",
keyword="plumbing services",
url="https://example.com",
work_dir=work_dir,
exec_result="all done",
)
assert "Optimization Complete" in result
assert "plumbing services" in result
assert "test_block.html" in result
def test_copies_to_network_path(self, tmp_path):
work_dir = tmp_path / "work"
work_dir.mkdir()
(work_dir / "test_block.html").write_text("<div>block</div>")
(work_dir / "optimization_instructions.md").write_text("# Instructions")
net_dir = tmp_path / "network"
cfg = self._make_config(str(net_dir))
_finalize_optimization(
ctx=None,
config=cfg,
task_id="",
keyword="plumbing services",
url="https://example.com",
work_dir=work_dir,
exec_result="done",
)
assert (net_dir / "plumbing-services" / "test_block.html").exists()
assert (net_dir / "plumbing-services" / "optimization_instructions.md").exists()
@patch("cheddahbot.tools.content_creation._sync_clickup_optimization_complete")
def test_syncs_clickup_when_task_id_present(self, mock_sync, tmp_path, tmp_db):
work_dir = tmp_path / "work"
work_dir.mkdir()
(work_dir / "test_block.html").write_text("<div>block</div>")
(work_dir / "optimization_instructions.md").write_text("# Instructions")
cfg = self._make_config()
ctx = {"config": cfg, "db": tmp_db}
_finalize_optimization(
ctx=ctx,
config=cfg,
task_id="task_fin",
keyword="kw",
url="https://example.com",
work_dir=work_dir,
exec_result="done",
)
mock_sync.assert_called_once()
call_kwargs = mock_sync.call_args.kwargs
assert call_kwargs["task_id"] == "task_fin"
assert "test_block.html" in call_kwargs["found_files"]
assert "optimization_instructions.md" in call_kwargs["found_files"]
# ---------------------------------------------------------------------------
# _sync_clickup_optimization_complete
# ---------------------------------------------------------------------------
class TestSyncClickupOptimizationComplete:
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_uploads_files_and_posts_comment(self, mock_get_client, tmp_path):
mock_client = MagicMock()
mock_get_client.return_value = mock_client
work_dir = tmp_path / "work"
work_dir.mkdir()
tb_path = work_dir / "test_block.html"
tb_path.write_text("<div>block</div>")
inst_path = work_dir / "optimization_instructions.md"
inst_path.write_text("# Instructions")
val_path = work_dir / "validation_report.json"
val_path.write_text(json.dumps({"summary": "All metrics improved."}))
cfg = Config()
ctx = {"config": cfg}
found_files = {
"test_block.html": tb_path,
"optimization_instructions.md": inst_path,
"validation_report.json": val_path,
}
_sync_clickup_optimization_complete(
ctx=ctx,
config=cfg,
task_id="task_sync",
keyword="plumbing",
url="https://example.com",
found_files=found_files,
work_dir=work_dir,
)
# 3 file uploads
assert mock_client.upload_attachment.call_count == 3
# Comment posted
mock_client.add_comment.assert_called_once()
comment = mock_client.add_comment.call_args[0][1]
assert "plumbing" in comment
assert "All metrics improved." in comment
assert "Next Steps" in comment
# Status set to the configured review status
mock_client.update_task_status.assert_called_once_with(
"task_sync", cfg.clickup.review_status
)
@patch("cheddahbot.tools.content_creation._get_clickup_client")
def test_handles_no_validation_report(self, mock_get_client, tmp_path):
mock_client = MagicMock()
mock_get_client.return_value = mock_client
work_dir = tmp_path / "work"
work_dir.mkdir()
tb_path = work_dir / "test_block.html"
tb_path.write_text("<div>block</div>")
inst_path = work_dir / "optimization_instructions.md"
inst_path.write_text("# Instructions")
cfg = Config()
ctx = {"config": cfg}
found_files = {
"test_block.html": tb_path,
"optimization_instructions.md": inst_path,
}
_sync_clickup_optimization_complete(
ctx=ctx,
config=cfg,
task_id="task_sync2",
keyword="kw",
url="https://example.com",
found_files=found_files,
work_dir=work_dir,
)
# 2 uploads (no validation_report.json)
assert mock_client.upload_attachment.call_count == 2
mock_client.add_comment.assert_called_once()
def test_noop_without_task_id(self, tmp_path):
"""No ClickUp sync when task_id is empty."""
work_dir = tmp_path / "work"
work_dir.mkdir()
cfg = Config()
# Should not raise
_sync_clickup_optimization_complete(
ctx={"config": cfg},
config=cfg,
task_id="",
keyword="kw",
url="https://example.com",
found_files={},
work_dir=work_dir,
)
# ---------------------------------------------------------------------------
# create_content — Routing (URL → optimization vs new content → phases)
# ---------------------------------------------------------------------------
class TestCreateContentRouting:
@patch("cheddahbot.tools.content_creation._run_optimization")
def test_explicit_optimization_routes_correctly(self, mock_opt, tmp_db, tmp_path):
"""When content_type='on page optimization', routes to _run_optimization."""
mock_opt.return_value = "## Optimization Complete"
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
ctx = {
"agent": MagicMock(),
"config": cfg,
"db": tmp_db,
"clickup_task_id": "routing_test",
}
result = create_content(
keyword="plumbing services",
url="https://example.com/plumbing",
content_type="on page optimization",
ctx=ctx,
)
mock_opt.assert_called_once()
assert result == "## Optimization Complete"
@patch("cheddahbot.tools.content_creation._run_optimization")
def test_explicit_new_content_with_url_routes_to_phase1(self, mock_opt, tmp_db, tmp_path):
"""Content Creation with URL should go to Phase 1, NOT optimization."""
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
agent = MagicMock()
agent.execute_task.return_value = "## Outline"
ctx = {
"agent": agent,
"config": cfg,
"db": tmp_db,
"clickup_task_id": "",
}
result = create_content(
keyword="new keyword",
url="https://example.com/future-page",
content_type="new content",
ctx=ctx,
)
mock_opt.assert_not_called()
assert "Phase 1 Complete" in result
@patch("cheddahbot.tools.content_creation._run_optimization")
def test_optimization_without_url_returns_error(self, mock_opt, tmp_db, tmp_path):
"""On Page Optimization without URL should return an error."""
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
ctx = {
"agent": MagicMock(),
"config": cfg,
"db": tmp_db,
"clickup_task_id": "",
}
result = create_content(
keyword="plumbing services",
url="",
content_type="on page optimization",
ctx=ctx,
)
mock_opt.assert_not_called()
assert "Error" in result
assert "URL" in result
@patch("cheddahbot.tools.content_creation._run_optimization")
def test_fallback_url_routes_to_optimization(self, mock_opt, tmp_db, tmp_path):
"""When content_type is empty and URL present, falls back to optimization."""
mock_opt.return_value = "## Optimization Complete"
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
ctx = {
"agent": MagicMock(),
"config": cfg,
"db": tmp_db,
"clickup_task_id": "routing_test",
}
result = create_content(
keyword="plumbing services",
url="https://example.com/plumbing",
content_type="",
ctx=ctx,
)
mock_opt.assert_called_once()
assert result == "## Optimization Complete"
@patch("cheddahbot.tools.content_creation._run_optimization")
def test_new_content_still_calls_phase1(self, mock_opt, tmp_db, tmp_path):
"""Regression: new content (no URL, no content_type) still goes through _run_phase1."""
cfg = Config()
cfg.content = ContentConfig(outline_dir=str(tmp_path / "outlines"))
agent = MagicMock()
agent.execute_task.return_value = "## Generated Outline\nContent..."
ctx = {
"agent": agent,
"config": cfg,
"db": tmp_db,
"clickup_task_id": "",
}
create_content(
keyword="new topic",
url="",
ctx=ctx,
)
mock_opt.assert_not_called()
agent.execute_task.assert_called_once()
# Verify it's the phase 1 prompt (new content path)
call_args = agent.execute_task.call_args
prompt = call_args.args[0] if call_args.args else call_args.kwargs.get("prompt", "")
assert "new content creation project" in prompt
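The routing tests above pin down how `create_content` chooses a path from `url` and `content_type`. A minimal sketch of that decision table follows. The helper name `choose_route` and its return labels are hypothetical; the real function's internals are not shown in this diff.

```python
def choose_route(url: str, content_type: str) -> str:
    """Hypothetical helper: return "phase1", "optimization", or "error"."""
    kind = content_type.strip().lower()
    if kind == "on page optimization":
        # Optimization needs an existing page, so a missing URL is an error.
        return "optimization" if url else "error"
    if kind == "new content":
        # An explicit "new content" type wins even when a URL is supplied.
        return "phase1"
    # No content_type given: fall back on whether a URL is present.
    return "optimization" if url else "phase1"
```

Each branch corresponds to one of the four tests: explicit new content with a URL, optimization without a URL, the URL fallback, and the no-URL regression case.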


@ -1,233 +0,0 @@
"""Tests for the Cora distribution watcher (scheduler._distribute_cora_file)."""
from __future__ import annotations
from dataclasses import dataclass, field
from pathlib import Path
from unittest.mock import MagicMock, patch
from cheddahbot.config import AutoCoraConfig, Config, ContentConfig, LinkBuildingConfig
@dataclass
class FakeTask:
"""Minimal ClickUp task stub for distribution tests."""
id: str = "fake_id"
name: str = ""
task_type: str = ""
status: str = "running cora"
custom_fields: dict = field(default_factory=dict)
def _make_scheduler(tmp_path, *, lb_folder="", content_inbox="", human_inbox=""):
"""Build a Scheduler with temp paths and mocked dependencies."""
from cheddahbot.scheduler import Scheduler
config = Config()
config.link_building = LinkBuildingConfig(watch_folder=lb_folder)
config.content = ContentConfig(cora_inbox=content_inbox)
config.autocora = AutoCoraConfig(cora_human_inbox=human_inbox, enabled=True)
config.clickup.enabled = True
config.clickup.space_id = "sp1"
config.clickup.api_token = "tok"
db = MagicMock()
agent = MagicMock()
sched = Scheduler(config=config, db=db, agent=agent)
return sched
KW_FIELDS = {"Keyword": "ac drive repair"}
def _drop_xlsx(folder: Path, name: str = "ac-drive-repair.xlsx") -> Path:
"""Create a dummy xlsx file in the given folder."""
folder.mkdir(parents=True, exist_ok=True)
p = folder / name
p.write_bytes(b"fake-xlsx-data")
return p
# ── Distribution logic tests ──
def test_distribute_lb_only(tmp_path):
"""LB task matched → copies to cora-inbox only."""
human = tmp_path / "human"
lb = tmp_path / "lb"
content = tmp_path / "content"
xlsx = _drop_xlsx(human)
sched = _make_scheduler(
tmp_path, lb_folder=str(lb), content_inbox=str(content), human_inbox=str(human)
)
tasks = [FakeTask(name="LB task", task_type="Link Building", custom_fields=KW_FIELDS)]
with patch.object(sched, "_get_clickup_client") as mock_client:
mock_client.return_value.get_tasks_from_overall_lists.return_value = tasks
sched._distribute_cora_file(xlsx)
assert (lb / xlsx.name).exists()
assert not (content / xlsx.name).exists()
assert (human / "processed" / xlsx.name).exists()
assert not xlsx.exists()
def test_distribute_content_only(tmp_path):
"""Content task matched → copies to content-cora-inbox only."""
human = tmp_path / "human"
lb = tmp_path / "lb"
content = tmp_path / "content"
xlsx = _drop_xlsx(human)
sched = _make_scheduler(
tmp_path, lb_folder=str(lb), content_inbox=str(content), human_inbox=str(human)
)
tasks = [FakeTask(name="CC task", task_type="Content Creation", custom_fields=KW_FIELDS)]
with patch.object(sched, "_get_clickup_client") as mock_client:
mock_client.return_value.get_tasks_from_overall_lists.return_value = tasks
sched._distribute_cora_file(xlsx)
assert not (lb / xlsx.name).exists()
assert (content / xlsx.name).exists()
assert (human / "processed" / xlsx.name).exists()
def test_distribute_mixed(tmp_path):
"""Both LB and Content tasks matched → copies to both inboxes."""
human = tmp_path / "human"
lb = tmp_path / "lb"
content = tmp_path / "content"
xlsx = _drop_xlsx(human)
sched = _make_scheduler(
tmp_path, lb_folder=str(lb), content_inbox=str(content), human_inbox=str(human)
)
tasks = [
FakeTask(name="LB task", task_type="Link Building", custom_fields=KW_FIELDS),
FakeTask(name="CC task", task_type="Content Creation", custom_fields=KW_FIELDS),
]
with patch.object(sched, "_get_clickup_client") as mock_client:
mock_client.return_value.get_tasks_from_overall_lists.return_value = tasks
sched._distribute_cora_file(xlsx)
assert (lb / xlsx.name).exists()
assert (content / xlsx.name).exists()
assert (human / "processed" / xlsx.name).exists()
def test_distribute_no_match(tmp_path):
"""No matching tasks → file stays in inbox, not moved to processed."""
human = tmp_path / "human"
lb = tmp_path / "lb"
content = tmp_path / "content"
xlsx = _drop_xlsx(human)
sched = _make_scheduler(
tmp_path, lb_folder=str(lb), content_inbox=str(content), human_inbox=str(human)
)
with patch.object(sched, "_get_clickup_client") as mock_client:
mock_client.return_value.get_tasks_from_overall_lists.return_value = []
sched._distribute_cora_file(xlsx)
assert xlsx.exists() # Still in inbox
assert not (human / "processed" / xlsx.name).exists()
def test_distribute_opo_task(tmp_path):
"""On Page Optimization task → copies to content inbox."""
human = tmp_path / "human"
lb = tmp_path / "lb"
content = tmp_path / "content"
xlsx = _drop_xlsx(human)
sched = _make_scheduler(
tmp_path, lb_folder=str(lb), content_inbox=str(content), human_inbox=str(human)
)
tasks = [FakeTask(name="OPO task", task_type="On Page Optimization", custom_fields=KW_FIELDS)]
with patch.object(sched, "_get_clickup_client") as mock_client:
mock_client.return_value.get_tasks_from_overall_lists.return_value = tasks
sched._distribute_cora_file(xlsx)
assert not (lb / xlsx.name).exists()
assert (content / xlsx.name).exists()
# ── Scan tests ──
def test_scan_skips_processed(tmp_path):
"""Files already in processed/ are skipped."""
human = tmp_path / "human"
lb = tmp_path / "lb"
content = tmp_path / "content"
# File in both top-level and processed/
_drop_xlsx(human)
_drop_xlsx(human / "processed")
sched = _make_scheduler(
tmp_path, lb_folder=str(lb), content_inbox=str(content), human_inbox=str(human)
)
with patch.object(sched, "_distribute_cora_file") as mock_dist:
sched._scan_cora_human_inbox()
mock_dist.assert_not_called()
def test_scan_skips_temp_files(tmp_path):
"""Office temp files (~$...) are skipped."""
human = tmp_path / "human"
lb = tmp_path / "lb"
content = tmp_path / "content"
_drop_xlsx(human, name="~$ac-drive-repair.xlsx")
sched = _make_scheduler(
tmp_path, lb_folder=str(lb), content_inbox=str(content), human_inbox=str(human)
)
with patch.object(sched, "_distribute_cora_file") as mock_dist:
sched._scan_cora_human_inbox()
mock_dist.assert_not_called()
def test_scan_empty_inbox(tmp_path):
"""Empty inbox → no-op."""
human = tmp_path / "human"
human.mkdir()
sched = _make_scheduler(tmp_path, human_inbox=str(human))
with patch.object(sched, "_distribute_cora_file") as mock_dist:
sched._scan_cora_human_inbox()
mock_dist.assert_not_called()
def test_distribute_copy_failure_no_move(tmp_path):
"""If copy fails, original is NOT moved to processed."""
human = tmp_path / "human"
xlsx = _drop_xlsx(human)
sched = _make_scheduler(tmp_path, lb_folder="/nonexistent/network/path", human_inbox=str(human))
tasks = [FakeTask(name="LB task", task_type="Link Building", custom_fields=KW_FIELDS)]
with (
patch.object(sched, "_get_clickup_client") as mock_client,
patch("cheddahbot.scheduler.shutil.copy2", side_effect=OSError("network down")),
):
mock_client.return_value.get_tasks_from_overall_lists.return_value = tasks
sched._distribute_cora_file(xlsx)
assert xlsx.exists() # Original untouched
assert not (human / "processed" / xlsx.name).exists()
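The distribution tests above imply a copy-then-archive flow: copy to every inbox whose task type matched, and move the original into `processed/` only if all copies succeeded. A rough sketch under those assumptions is below. The function name `distribute`, its parameters, and the two type sets are hypothetical, and keyword matching against ClickUp tasks is left out.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Assumed grouping of task types, taken from the test docstrings only.
LB_TYPES = {"Link Building"}
CONTENT_TYPES = {"Content Creation", "On Page Optimization"}


def distribute(xlsx: Path, task_types: list[str], lb_dir: Path, content_dir: Path) -> bool:
    """Copy xlsx to each matching inbox; archive it only if every copy worked."""
    targets: list[Path] = []
    if any(t in LB_TYPES for t in task_types):
        targets.append(lb_dir)
    if any(t in CONTENT_TYPES for t in task_types):
        targets.append(content_dir)
    if not targets:
        return False  # no matching task: leave the file in the inbox

    try:
        for folder in targets:
            folder.mkdir(parents=True, exist_ok=True)
            shutil.copy2(xlsx, folder / xlsx.name)
    except OSError:
        # A failed copy (e.g. network share down) leaves the original untouched.
        # Copies that already succeeded are not rolled back in this sketch.
        return False

    processed = xlsx.parent / "processed"
    processed.mkdir(exist_ok=True)
    shutil.move(str(xlsx), str(processed / xlsx.name))
    return True
```

Returning early on `OSError` is what keeps the file eligible for a retry on the next scan, which is the property `test_distribute_copy_failure_no_move` checks.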


@ -10,27 +10,21 @@ from cheddahbot.db import Database
 class TestConversationsAgentName:
     """Conversations are tagged by agent_name for per-agent history filtering."""
-    def _add_msg(self, db, conv_id):
-        """Add a dummy message so list_conversations() includes this conv."""
-        db.add_message(conv_id, "user", "hello")
     def test_create_with_default_agent_name(self, tmp_db):
         tmp_db.create_conversation("conv1")
-        self._add_msg(tmp_db, "conv1")
         convs = tmp_db.list_conversations()
         assert len(convs) == 1
         assert convs[0]["agent_name"] == "default"
     def test_create_with_custom_agent_name(self, tmp_db):
         tmp_db.create_conversation("conv1", agent_name="writer")
-        self._add_msg(tmp_db, "conv1")
         convs = tmp_db.list_conversations()
         assert convs[0]["agent_name"] == "writer"
     def test_list_filters_by_agent_name(self, tmp_db):
-        for cid, agent in [("c1", "default"), ("c2", "writer"), ("c3", "default")]:
-            tmp_db.create_conversation(cid, agent_name=agent)
-            self._add_msg(tmp_db, cid)
+        tmp_db.create_conversation("c1", agent_name="default")
+        tmp_db.create_conversation("c2", agent_name="writer")
+        tmp_db.create_conversation("c3", agent_name="default")
         default_convs = tmp_db.list_conversations(agent_name="default")
         writer_convs = tmp_db.list_conversations(agent_name="writer")
@ -41,16 +35,14 @@ class TestConversationsAgentName:
         assert len(all_convs) == 3
     def test_list_without_filter_returns_all(self, tmp_db):
-        for cid, agent in [("c1", "a"), ("c2", "b")]:
-            tmp_db.create_conversation(cid, agent_name=agent)
-            self._add_msg(tmp_db, cid)
+        tmp_db.create_conversation("c1", agent_name="a")
+        tmp_db.create_conversation("c2", agent_name="b")
         convs = tmp_db.list_conversations()
         assert len(convs) == 2
     def test_list_returns_agent_name_in_results(self, tmp_db):
         tmp_db.create_conversation("c1", agent_name="researcher")
-        self._add_msg(tmp_db, "c1")
         convs = tmp_db.list_conversations()
         assert "agent_name" in convs[0]
         assert convs[0]["agent_name"] == "researcher"
@ -60,7 +52,6 @@ class TestConversationsAgentName:
         db_path = tmp_path / "test_migrate.db"
         db1 = Database(db_path)
         db1.create_conversation("c1", agent_name="ops")
-        db1.add_message("c1", "user", "hello")
         # Re-init on same DB file triggers migration again
         db2 = Database(db_path)
         convs = db2.list_conversations()


@ -2,6 +2,7 @@
 from __future__ import annotations
+import json
 import subprocess
 from unittest.mock import MagicMock, patch
@ -227,36 +228,23 @@ class TestFuzzyKeywordMatch:
     def test_exact_match(self):
         assert _fuzzy_keyword_match("precision cnc", "precision cnc") is True
-    def test_no_match_without_llm(self):
-        """Without an llm_check, non-exact strings return False."""
-        assert _fuzzy_keyword_match("shaft", "shafts") is False
-        assert _fuzzy_keyword_match("shaft manufacturing", "custom shaft manufacturing") is False
+    def test_substring_match_a_in_b(self):
+        assert _fuzzy_keyword_match("cnc machining", "precision cnc machining services") is True
+    def test_substring_match_b_in_a(self):
+        assert _fuzzy_keyword_match("precision cnc machining services", "cnc machining") is True
+    def test_word_overlap(self):
+        assert _fuzzy_keyword_match("precision cnc machining", "cnc machining precision") is True
+    def test_no_match(self):
+        assert _fuzzy_keyword_match("precision cnc", "web design agency") is False
     def test_empty_strings(self):
         assert _fuzzy_keyword_match("", "test") is False
         assert _fuzzy_keyword_match("test", "") is False
         assert _fuzzy_keyword_match("", "") is False
-    def test_llm_check_called_on_mismatch(self):
-        """When strings differ, llm_check is called and its result is returned."""
-        llm_yes = lambda a, b: True
-        llm_no = lambda a, b: False
-        assert _fuzzy_keyword_match("shaft", "shafts", llm_check=llm_yes) is True
-        assert _fuzzy_keyword_match("shaft", "shafts", llm_check=llm_no) is False
-    def test_llm_check_not_called_on_exact(self):
-        """Exact match should not call llm_check."""
-        def boom(a, b):
-            raise AssertionError("should not be called")
-        assert _fuzzy_keyword_match("shaft", "shaft", llm_check=boom) is True
-    def test_no_substring_match_without_llm(self):
-        """Substring matching is gone — different keywords must not match."""
-        assert _fuzzy_keyword_match("shaft manufacturing", "custom shaft manufacturing") is False
-        assert _fuzzy_keyword_match("cnc machining", "precision cnc machining services") is False
 class TestNormalizeForMatch:
     def test_lowercase_and_strip(self):
@ -560,6 +548,16 @@ class TestScanCoraFolder:
         assert "Processed" in result
         assert "old.xlsx" in result
+    def test_shows_kv_status(self, mock_ctx, tmp_path):
+        mock_ctx["config"].link_building.watch_folder = str(tmp_path)
+        (tmp_path / "tracked.xlsx").write_text("fake")
+        db = mock_ctx["db"]
+        db.kv_set("linkbuilding:watched:tracked.xlsx", json.dumps({"status": "completed"}))
+        result = scan_cora_folder(ctx=mock_ctx)
+        assert "completed" in result
 # ---------------------------------------------------------------------------
 # ClickUp state machine tests
@ -584,6 +582,12 @@ class TestClickUpStateMachine:
         mock_ctx["clickup_task_id"] = "task_abc"
         mock_ctx["config"].clickup.enabled = True
+        # Pre-set executing state
+        mock_ctx["db"].kv_set(
+            "clickup:task:task_abc:state",
+            json.dumps({"state": "executing"}),
+        )
         ingest_proc = subprocess.CompletedProcess(
             args=[], returncode=0, stdout=ingest_success_stdout, stderr=""
         )
@ -599,9 +603,10 @@ class TestClickUpStateMachine:
         assert "ClickUp Sync" in result
-        # Verify ClickUp API was called for completion
-        cu.add_comment.assert_called()
-        cu.update_task_status.assert_called()
+        # Verify KV state was updated
+        raw = mock_ctx["db"].kv_get("clickup:task:task_abc:state")
+        state = json.loads(raw)
+        assert state["state"] == "completed"
     @patch("cheddahbot.tools.linkbuilding._run_blm_command")
     @patch("cheddahbot.tools.linkbuilding._get_clickup_client")
@ -614,6 +619,14 @@ class TestClickUpStateMachine:
         mock_ctx["clickup_task_id"] = "task_fail"
         mock_ctx["config"].clickup.enabled = True
+        mock_ctx["config"].clickup.skill_map = {
+            "Link Building": {"error_status": "internal review"}
+        }
+        mock_ctx["db"].kv_set(
+            "clickup:task:task_fail:state",
+            json.dumps({"state": "executing"}),
+        )
         mock_cmd.return_value = subprocess.CompletedProcess(
             args=[], returncode=1, stdout="Error", stderr="crash"
@ -625,6 +638,6 @@ class TestClickUpStateMachine:
         )
         assert "Error" in result
-        # Verify ClickUp API was called for failure
-        cu.add_comment.assert_called()
-        cu.update_task_status.assert_called()
+        raw = mock_ctx["db"].kv_get("clickup:task:task_fail:state")
+        state = json.loads(raw)
+        assert state["state"] == "failed"
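The `llm_check` tests in the `TestFuzzyKeywordMatch` hunk of this file describe a matcher that only trusts exact equality and defers everything else to an optional callback. A minimal sketch of that contract, assuming a public name `fuzzy_keyword_match` and no normalization step (the real `_fuzzy_keyword_match` may normalize its inputs first):

```python
from __future__ import annotations

from typing import Callable


def fuzzy_keyword_match(
    a: str,
    b: str,
    llm_check: Callable[[str, str], bool] | None = None,
) -> bool:
    """Hypothetical stand-in for the matcher exercised by the llm_check tests."""
    if not a or not b:
        return False  # empty strings never match
    if a == b:
        return True  # exact match short-circuits; the callback is not consulted
    if llm_check is None:
        return False  # no substring or word-overlap fallback in this variant
    return bool(llm_check(a, b))
```

The other side of the hunk tests a different variant that accepts substring and word-overlap matches instead; the sketch above does not cover it.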


@ -1,410 +0,0 @@
"""Tests for the ntfy.sh push notification sender."""
from __future__ import annotations
from unittest.mock import MagicMock, patch
import httpx
from cheddahbot.ntfy import NtfyChannel, NtfyNotifier
# ---------------------------------------------------------------------------
# NtfyChannel routing
# ---------------------------------------------------------------------------
class TestNtfyChannel:
def test_accepts_matching_category_and_pattern(self):
ch = NtfyChannel(
name="human_action",
server="https://ntfy.sh",
topic="test-topic",
categories=["clickup", "autocora"],
include_patterns=["completed", "SUCCESS"],
)
assert ch.accepts("ClickUp task completed: **Acme PR**", "clickup") is True
assert ch.accepts("AutoCora SUCCESS: **keyword**", "autocora") is True
def test_rejects_wrong_category(self):
ch = NtfyChannel(
name="human_action",
server="https://ntfy.sh",
topic="test-topic",
categories=["clickup"],
include_patterns=["completed"],
)
assert ch.accepts("Some autocora message completed", "autocora") is False
def test_rejects_non_matching_pattern(self):
ch = NtfyChannel(
name="human_action",
server="https://ntfy.sh",
topic="test-topic",
categories=["clickup"],
include_patterns=["completed"],
)
assert ch.accepts("Executing ClickUp task: **Acme PR**", "clickup") is False
def test_no_include_patterns_accepts_all_in_category(self):
ch = NtfyChannel(
name="all_clickup",
server="https://ntfy.sh",
topic="test-topic",
categories=["clickup"],
)
assert ch.accepts("Any message at all", "clickup") is True
def test_exclude_patterns_take_priority(self):
ch = NtfyChannel(
name="test",
server="https://ntfy.sh",
topic="test-topic",
categories=["clickup"],
include_patterns=["task"],
exclude_patterns=["Executing"],
)
assert ch.accepts("Executing ClickUp task", "clickup") is False
assert ch.accepts("ClickUp task completed", "clickup") is True
def test_case_insensitive_patterns(self):
ch = NtfyChannel(
name="test",
server="https://ntfy.sh",
topic="test-topic",
categories=["autocora"],
include_patterns=["success"],
)
assert ch.accepts("AutoCora SUCCESS: **kw**", "autocora") is True
def test_empty_topic_filtered_by_notifier(self):
ch = NtfyChannel(
name="empty", server="https://ntfy.sh", topic="",
categories=["clickup"],
)
notifier = NtfyNotifier([ch])
assert notifier.enabled is False
# ---------------------------------------------------------------------------
# NtfyNotifier
# ---------------------------------------------------------------------------
class TestNtfyNotifier:
@patch("cheddahbot.ntfy.httpx.post")
def test_notify_posts_to_matching_channel(self, mock_post):
mock_post.return_value = MagicMock(status_code=200)
ch = NtfyChannel(
name="human_action",
server="https://ntfy.sh",
topic="my-topic",
categories=["clickup"],
include_patterns=["completed"],
)
notifier = NtfyNotifier([ch])
notifier.notify("ClickUp task completed: **Acme PR**", "clickup")
# Wait for daemon thread
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
mock_post.assert_called_once()
call_args = mock_post.call_args
assert call_args[0][0] == "https://ntfy.sh/my-topic"
assert call_args[1]["headers"]["Title"] == "CheddahBot [clickup]"
assert call_args[1]["headers"]["Priority"] == "high"
@patch("cheddahbot.ntfy.httpx.post")
def test_notify_skips_non_matching_channel(self, mock_post):
ch = NtfyChannel(
name="errors",
server="https://ntfy.sh",
topic="err-topic",
categories=["clickup"],
include_patterns=["failed"],
)
notifier = NtfyNotifier([ch])
notifier.notify("ClickUp task completed: **Acme PR**", "clickup")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
mock_post.assert_not_called()
@patch("cheddahbot.ntfy.httpx.post")
def test_notify_routes_to_multiple_channels(self, mock_post):
mock_post.return_value = MagicMock(status_code=200)
ch1 = NtfyChannel(
name="all", server="https://ntfy.sh", topic="all-topic",
categories=["clickup"],
)
ch2 = NtfyChannel(
name="errors", server="https://ntfy.sh", topic="err-topic",
categories=["clickup"], include_patterns=["failed"],
)
notifier = NtfyNotifier([ch1, ch2])
notifier.notify("ClickUp task failed: **Acme**", "clickup")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
assert mock_post.call_count == 2
@patch("cheddahbot.ntfy.httpx.post")
def test_webhook_error_is_swallowed(self, mock_post):
mock_post.side_effect = httpx.ConnectError("connection refused")
ch = NtfyChannel(
name="test", server="https://ntfy.sh", topic="topic",
categories=["clickup"],
)
notifier = NtfyNotifier([ch])
# Should not raise
notifier.notify("ClickUp task completed: **test**", "clickup")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
@patch("cheddahbot.ntfy.httpx.post")
def test_4xx_is_logged_not_raised(self, mock_post):
mock_post.return_value = MagicMock(status_code=400, text="Bad Request")
ch = NtfyChannel(
name="test", server="https://ntfy.sh", topic="topic",
categories=["clickup"],
)
notifier = NtfyNotifier([ch])
notifier.notify("ClickUp task completed: **test**", "clickup")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
def test_enabled_property(self):
ch = NtfyChannel(
name="test", server="https://ntfy.sh", topic="topic",
categories=["clickup"],
)
assert NtfyNotifier([ch]).enabled is True
assert NtfyNotifier([]).enabled is False
# ---------------------------------------------------------------------------
# Post format
# ---------------------------------------------------------------------------
class TestPostFormat:
@patch("cheddahbot.ntfy.httpx.post")
def test_includes_tags_header(self, mock_post):
mock_post.return_value = MagicMock(status_code=200)
ch = NtfyChannel(
name="test", server="https://ntfy.sh", topic="topic",
categories=["clickup"], tags="white_check_mark",
)
notifier = NtfyNotifier([ch])
notifier.notify("task completed", "clickup")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
headers = mock_post.call_args[1]["headers"]
assert headers["Tags"] == "white_check_mark"
@patch("cheddahbot.ntfy.httpx.post")
def test_omits_tags_header_when_empty(self, mock_post):
mock_post.return_value = MagicMock(status_code=200)
ch = NtfyChannel(
name="test", server="https://ntfy.sh", topic="topic",
categories=["clickup"], tags="",
)
notifier = NtfyNotifier([ch])
notifier.notify("task completed", "clickup")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
headers = mock_post.call_args[1]["headers"]
assert "Tags" not in headers
@patch("cheddahbot.ntfy.httpx.post")
def test_custom_server_url(self, mock_post):
mock_post.return_value = MagicMock(status_code=200)
ch = NtfyChannel(
name="test", server="https://my-ntfy.example.com",
topic="topic", categories=["clickup"],
)
notifier = NtfyNotifier([ch])
notifier.notify("task completed", "clickup")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
assert mock_post.call_args[0][0] == "https://my-ntfy.example.com/topic"
@patch("cheddahbot.ntfy.httpx.post")
def test_message_sent_as_body(self, mock_post):
mock_post.return_value = MagicMock(status_code=200)
ch = NtfyChannel(
name="test", server="https://ntfy.sh", topic="topic",
categories=["clickup"],
)
notifier = NtfyNotifier([ch])
notifier.notify("Hello **world**", "clickup")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
assert mock_post.call_args[1]["content"] == b"Hello **world**"
@patch("cheddahbot.ntfy.httpx.post")
def test_priority_header(self, mock_post):
mock_post.return_value = MagicMock(status_code=200)
ch = NtfyChannel(
name="test", server="https://ntfy.sh", topic="topic",
categories=["clickup"], priority="urgent",
)
notifier = NtfyNotifier([ch])
notifier.notify("task completed", "clickup")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
assert mock_post.call_args[1]["headers"]["Priority"] == "urgent"
# ---------------------------------------------------------------------------
# Dedup window
# ---------------------------------------------------------------------------
def _make_channel(**overrides) -> NtfyChannel:
defaults = dict(
name="errors",
server="https://ntfy.sh",
topic="test-topic",
categories=["alert"],
)
defaults.update(overrides)
return NtfyChannel(**defaults)
class TestDedup:
def test_first_message_goes_through(self):
notifier = NtfyNotifier([_make_channel()])
assert notifier._check_and_track("errors", "task X skipped") is True
def test_duplicate_permanently_suppressed(self):
notifier = NtfyNotifier([_make_channel()])
assert notifier._check_and_track("errors", "task X skipped") is True
assert notifier._check_and_track("errors", "task X skipped") is False
def test_duplicate_still_suppressed_after_day_rollover(self):
notifier = NtfyNotifier([_make_channel()])
assert notifier._check_and_track("errors", "task X skipped") is True
# Dedup memory persists even across date rollover
with patch.object(notifier, "_today", return_value="2099-01-01"):
assert notifier._check_and_track("errors", "task X skipped") is False
def test_different_messages_not_deduped(self):
notifier = NtfyNotifier([_make_channel()])
assert notifier._check_and_track("errors", "task A skipped") is True
assert notifier._check_and_track("errors", "task B skipped") is True
def test_same_message_different_channel_not_deduped(self):
notifier = NtfyNotifier([_make_channel()])
assert notifier._check_and_track("errors", "task X skipped") is True
assert notifier._check_and_track("alerts", "task X skipped") is True
# ---------------------------------------------------------------------------
# Daily cap
# ---------------------------------------------------------------------------
class TestDailyCap:
def test_sends_up_to_cap(self):
notifier = NtfyNotifier([_make_channel()], daily_cap=3)
for i in range(3):
assert notifier._check_and_track("errors", f"msg {i}") is True
assert notifier._check_and_track("errors", "msg 3") is False
def test_cap_resets_on_new_day(self):
notifier = NtfyNotifier([_make_channel()], daily_cap=2)
assert notifier._check_and_track("errors", "msg 0") is True
assert notifier._check_and_track("errors", "msg 1") is True
assert notifier._check_and_track("errors", "msg 2") is False
with patch.object(notifier, "_today", return_value="2099-01-01"):
assert notifier._check_and_track("errors", "msg 2") is True
# ---------------------------------------------------------------------------
# 429 backoff
# ---------------------------------------------------------------------------
class TestRateLimitBackoff:
def test_429_suppresses_rest_of_day(self):
notifier = NtfyNotifier([_make_channel()])
notifier._mark_rate_limited()
assert notifier._check_and_track("errors", "new message") is False
def test_429_resets_next_day(self):
notifier = NtfyNotifier([_make_channel()])
notifier._mark_rate_limited()
assert notifier._check_and_track("errors", "blocked") is False
with patch.object(notifier, "_today", return_value="2099-01-01"):
assert notifier._check_and_track("errors", "unblocked") is True
def test_post_sets_rate_limit_on_429(self):
channel = _make_channel()
notifier = NtfyNotifier([channel])
mock_resp = MagicMock(status_code=429, text="Rate limited")
with patch("cheddahbot.ntfy.httpx.post", return_value=mock_resp):
notifier._post(channel, "test msg", "alert")
assert notifier._rate_limited_until == notifier._today()
# ---------------------------------------------------------------------------
# Notify integration with dedup
# ---------------------------------------------------------------------------
class TestNotifyDedup:
@patch("cheddahbot.ntfy.httpx.post")
def test_notify_skips_deduped_messages(self, mock_post):
mock_post.return_value = MagicMock(status_code=200)
channel = _make_channel()
notifier = NtfyNotifier([channel])
notifier.notify("same msg", "alert")
notifier.notify("same msg", "alert")
import threading
for t in threading.enumerate():
if t.daemon and t.is_alive():
t.join(timeout=2)
# Only one post — second was deduped
mock_post.assert_called_once()
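The dedup, daily-cap, and 429 tests above describe three gates that a message must pass before it is posted. One way those gates could fit together is sketched below. The class `NotificationGate`, its method names, the default cap of 200, and the choice of a single cap shared across channels are all assumptions for illustration; this is not the actual `NtfyNotifier` code.

```python
from __future__ import annotations

from datetime import date
from typing import Callable


class NotificationGate:
    """Hypothetical gate: permanent dedup, per-day cap, per-day 429 backoff."""

    def __init__(self, daily_cap: int = 200, today: Callable[[], str] | None = None) -> None:
        self._daily_cap = daily_cap
        # Injectable clock so tests can simulate a date rollover.
        self._today = today or (lambda: date.today().isoformat())
        self._seen: set[tuple[str, str]] = set()  # never cleared: dedup is permanent
        self._count_day = ""
        self._count = 0
        self._rate_limited_day = ""

    def mark_rate_limited(self) -> None:
        """Record a 429 response; suppresses sends for the rest of today."""
        self._rate_limited_day = self._today()

    def check_and_track(self, channel: str, message: str) -> bool:
        """Return True and record the send if the message may go out."""
        today = self._today()
        if self._rate_limited_day == today:
            return False
        if (channel, message) in self._seen:
            return False
        if self._count_day != today:
            # First send attempt of a new day resets the counter.
            self._count_day = today
            self._count = 0
        if self._count >= self._daily_cap:
            return False
        # Only accepted messages enter dedup memory, so a message that was
        # blocked by the cap or a 429 can still be sent on a later day.
        self._seen.add((channel, message))
        self._count += 1
        return True
```

Recording a message only after it passes every gate is what lets `test_cap_resets_on_new_day` and `test_429_resets_next_day` send a previously blocked message once the date changes.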


@ -552,7 +552,7 @@ class TestSubmitPressRelease:
             headline="Advanced Industrial Expands PEEK Machining",
             company_name="Advanced Industrial",
             pr_text=REALISTIC_PR_TEXT,
-            keyword="PEEK machining",
+            topic="PEEK machining",
             target_url="https://advancedindustrial.com/peek",
             ctx=submit_ctx,
         )
@ -575,7 +575,7 @@ class TestSubmitPressRelease:
             headline="Advanced Industrial Expands PEEK Machining",
             company_name="Advanced Industrial",
             pr_text=REALISTIC_PR_TEXT,
-            keyword="PEEK machining",
+            topic="PEEK machining",
             branded_url="https://linkedin.com/company/advanced-industrial",
             ctx=submit_ctx,
         )
@ -598,7 +598,7 @@ class TestSubmitPressRelease:
             headline="Advanced Industrial Expands PEEK Machining",
             company_name="Advanced Industrial",
             pr_text=REALISTIC_PR_TEXT,
-            keyword="PEEK machining",
+            topic="PEEK machining",
             branded_url="GBP",
             ctx=submit_ctx,
         )
@ -694,7 +694,7 @@ class TestSubmitPressRelease:
             headline="Advanced Industrial Expands PEEK Machining",
             company_name="Advanced Industrial",
             pr_text=LONG_PR_TEXT,
-            keyword="PEEK machining",
+            topic="PEEK machining",
             target_url="https://example.com/peek",
             ctx=submit_ctx,
         )


@ -2,6 +2,7 @@
 from __future__ import annotations
+import json
 from dataclasses import dataclass, field
 from datetime import UTC, datetime
 from unittest.mock import MagicMock
@ -17,7 +18,7 @@ _PR_MAPPING = {
     "auto_execute": True,
     "field_mapping": {
         "topic": "task_name",
-        "company_name": "Client",
+        "company_name": "Customer",
     },
 }
@ -35,7 +36,6 @@ class _FakeClickUpConfig:
     error_status: str = "error"
     task_type_field_name: str = "Work Category"
     default_auto_execute: bool = True
-    poll_task_types: list[str] = field(default_factory=lambda: ["Press Release"])
     skill_map: dict = field(default_factory=lambda: {"Press Release": _PR_MAPPING})
     enabled: bool = True
@ -69,7 +69,7 @@ def _now_ms():
     return int(datetime.now(UTC).timestamp() * 1000)
-_FIELDS = {"Client": "Acme"}
+_FIELDS = {"Customer": "Acme"}
 # ── Tests ──
@ -104,6 +104,55 @@ class TestPollClickup:
         mock_client.discover_field_filter.return_value = field_filter
         return mock_client
+    def test_skips_task_already_completed(self, tmp_db):
+        """Tasks with completed state should be skipped."""
+        config = _FakeConfig()
+        agent = MagicMock()
+        scheduler = Scheduler(config, tmp_db, agent)
+        state = {"state": "completed", "clickup_task_id": "t1"}
+        tmp_db.kv_set("clickup:task:t1:state", json.dumps(state))
+        due = str(_now_ms() + 86400000)
+        task = _make_task(
+            "t1",
+            "PR for Acme",
+            "Press Release",
+            due_date=due,
+            custom_fields=_FIELDS,
+        )
+        scheduler._clickup_client = self._make_mock_client(
+            tasks=[task],
+        )
+        scheduler._poll_clickup()
+        scheduler._clickup_client.update_task_status.assert_not_called()
+    def test_skips_task_already_failed(self, tmp_db):
+        """Tasks with failed state should be skipped."""
+        config = _FakeConfig()
+        agent = MagicMock()
+        scheduler = Scheduler(config, tmp_db, agent)
+        state = {"state": "failed", "clickup_task_id": "t1"}
+        tmp_db.kv_set("clickup:task:t1:state", json.dumps(state))
+        due = str(_now_ms() + 86400000)
+        task = _make_task(
+            "t1",
+            "PR for Acme",
+            "Press Release",
+            due_date=due,
+        )
+        scheduler._clickup_client = self._make_mock_client(
+            tasks=[task],
+        )
+        scheduler._poll_clickup()
+        scheduler._clickup_client.update_task_status.assert_not_called()
     def test_skips_task_with_no_due_date(self, tmp_db):
         """Tasks with no due date should be skipped."""
         config = _FakeConfig()
@ -150,11 +199,11 @@ class TestExecuteTask:
"""Test the simplified _execute_task method.""" """Test the simplified _execute_task method."""
def test_success_flow(self, tmp_db): def test_success_flow(self, tmp_db):
"""Successful execution: tool called, automation underway set.""" """Successful execution: state=completed."""
config = _FakeConfig() config = _FakeConfig()
agent = MagicMock() agent = MagicMock()
agent._tools = MagicMock() agent._tools = MagicMock()
agent._tools.execute.return_value = "Pipeline completed successfully" agent._tools.execute.return_value = "## ClickUp Sync\nDone"
scheduler = Scheduler(config, tmp_db, agent) scheduler = Scheduler(config, tmp_db, agent)
mock_client = MagicMock() mock_client = MagicMock()
@ -175,10 +224,51 @@ class TestExecuteTask:
"t1", "t1",
"automation underway", "automation underway",
) )
agent._tools.execute.assert_called_once()
raw = tmp_db.kv_get("clickup:task:t1:state")
state = json.loads(raw)
assert state["state"] == "completed"
def test_success_fallback_path(self, tmp_db):
"""Scheduler uploads docx and sets review status."""
config = _FakeConfig()
agent = MagicMock()
agent._tools = MagicMock()
agent._tools.execute.return_value = "Press releases done.\n**Docx:** `output/pr.docx`"
scheduler = Scheduler(config, tmp_db, agent)
mock_client = MagicMock()
mock_client.update_task_status.return_value = True
mock_client.upload_attachment.return_value = True
mock_client.add_comment.return_value = True
scheduler._clickup_client = mock_client
due = str(_now_ms() + 86400000)
task = _make_task(
"t1",
"PR for Acme",
"Press Release",
due_date=due,
custom_fields=_FIELDS,
)
scheduler._execute_task(task)
mock_client.update_task_status.assert_any_call(
"t1",
"internal review",
)
mock_client.upload_attachment.assert_called_once_with(
"t1",
"output/pr.docx",
)
raw = tmp_db.kv_get("clickup:task:t1:state")
state = json.loads(raw)
assert state["state"] == "completed"
assert "output/pr.docx" in state["deliverable_paths"]
def test_failure_flow(self, tmp_db): def test_failure_flow(self, tmp_db):
"""Failed: error comment posted, status set to 'error'.""" """Failed: state=failed, error comment, status set to 'error'."""
config = _FakeConfig() config = _FakeConfig()
agent = MagicMock() agent = MagicMock()
agent._tools = MagicMock() agent._tools = MagicMock()
@ -205,6 +295,11 @@ class TestExecuteTask:
comment_text = mock_client.add_comment.call_args[0][1] comment_text = mock_client.add_comment.call_args[0][1]
assert "failed" in comment_text.lower() assert "failed" in comment_text.lower()
raw = tmp_db.kv_get("clickup:task:t1:state")
state = json.loads(raw)
assert state["state"] == "failed"
assert "API timeout" in state["error"]
class TestFieldFilterDiscovery: class TestFieldFilterDiscovery:
"""Test _discover_field_filter caching.""" """Test _discover_field_filter caching."""
@ -232,62 +327,3 @@ class TestFieldFilterDiscovery:
mock_client.discover_field_filter.reset_mock() mock_client.discover_field_filter.reset_mock()
scheduler._poll_clickup() scheduler._poll_clickup()
mock_client.discover_field_filter.assert_not_called() mock_client.discover_field_filter.assert_not_called()
class TestActiveExecutions:
"""Test the active execution registry."""
def test_register_and_get(self, tmp_db):
config = _FakeConfig()
scheduler = Scheduler(config, tmp_db, MagicMock())
scheduler._register_execution("t1", "Task One", "write_press_releases")
active = scheduler.get_active_executions()
assert "t1" in active
assert active["t1"]["name"] == "Task One"
assert active["t1"]["tool"] == "write_press_releases"
assert "started_at" in active["t1"]
assert "thread" in active["t1"]
def test_unregister(self, tmp_db):
config = _FakeConfig()
scheduler = Scheduler(config, tmp_db, MagicMock())
scheduler._register_execution("t1", "Task One", "write_press_releases")
scheduler._unregister_execution("t1")
assert scheduler.get_active_executions() == {}
def test_unregister_nonexistent_is_noop(self, tmp_db):
config = _FakeConfig()
scheduler = Scheduler(config, tmp_db, MagicMock())
# Should not raise
scheduler._unregister_execution("nonexistent")
assert scheduler.get_active_executions() == {}
def test_multiple_executions(self, tmp_db):
config = _FakeConfig()
scheduler = Scheduler(config, tmp_db, MagicMock())
scheduler._register_execution("t1", "Task One", "write_press_releases")
scheduler._register_execution("t2", "Task Two", "run_cora_backlinks")
active = scheduler.get_active_executions()
assert len(active) == 2
assert "t1" in active
assert "t2" in active
def test_get_returns_snapshot(self, tmp_db):
"""get_active_executions returns a copy, not a reference."""
config = _FakeConfig()
scheduler = Scheduler(config, tmp_db, MagicMock())
scheduler._register_execution("t1", "Task One", "tool_a")
snapshot = scheduler.get_active_executions()
scheduler._unregister_execution("t1")
# Snapshot should still have t1
assert "t1" in snapshot
# But live state should be empty
assert scheduler.get_active_executions() == {}

@ -1,42 +1,32 @@
"""Tests for scheduler helper functions. """Tests for scheduler helper functions."""
Note: _extract_docx_paths was removed as part of KV store elimination.
The scheduler no longer handles docx extraction; tools own their own sync.
"""
from __future__ import annotations from __future__ import annotations
from cheddahbot.scheduler import _extract_docx_paths
class TestLoopTimestamps:
"""Test that loop timestamps use in-memory storage."""
def test_initial_timestamps_are_none(self): class TestExtractDocxPaths:
from unittest.mock import MagicMock def test_extracts_paths_from_realistic_output(self):
result = (
"Press releases generated successfully!\n\n"
"**Docx:** `output/press_releases/acme-corp-launch.docx`\n"
"**Docx:** `output/press_releases/acme-corp-expansion.docx`\n"
"Files saved to output/press_releases/"
)
paths = _extract_docx_paths(result)
from cheddahbot.scheduler import Scheduler assert len(paths) == 2
assert paths[0] == "output/press_releases/acme-corp-launch.docx"
assert paths[1] == "output/press_releases/acme-corp-expansion.docx"
config = MagicMock() def test_returns_empty_list_when_no_paths(self):
db = MagicMock() result = "Task completed successfully. No files generated."
agent = MagicMock() paths = _extract_docx_paths(result)
sched = Scheduler(config, db, agent)
timestamps = sched.get_loop_timestamps() assert paths == []
assert timestamps["heartbeat"] is None
assert timestamps["poll"] is None
assert timestamps["clickup"] is None
def test_timestamps_update_in_memory(self): def test_only_matches_docx_extension(self):
from unittest.mock import MagicMock result = "**Docx:** `report.docx`\n**PDF:** `report.pdf`\n**Docx:** `summary.txt`\n"
paths = _extract_docx_paths(result)
from cheddahbot.scheduler import Scheduler assert paths == ["report.docx"]
config = MagicMock()
db = MagicMock()
agent = MagicMock()
sched = Scheduler(config, db, agent)
sched._loop_timestamps["heartbeat"] = "2026-02-27T12:00:00+00:00"
timestamps = sched.get_loop_timestamps()
assert timestamps["heartbeat"] == "2026-02-27T12:00:00+00:00"
# Ensure db.kv_set was never called
db.kv_set.assert_not_called()
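
The `TestExtractDocxPaths` cases above pin down the behaviour expected of `_extract_docx_paths`: it collects backtick-quoted paths that follow a `**Docx:**` label and end in `.docx`, in order of appearance. The real implementation in `cheddahbot.scheduler` is not shown in this diff, so the following is only a minimal sketch of one way to satisfy those tests. The function name `extract_docx_paths_sketch` and the regular expression are assumptions made for illustration.

```python
import re

# Hypothetical pattern (not taken from the repository): a "**Docx:**" label,
# optional whitespace, then a backtick-quoted path that ends in ".docx".
_DOCX_LINE = re.compile(r"\*\*Docx:\*\*\s*`([^`]+\.docx)`")


def extract_docx_paths_sketch(result: str) -> list[str]:
    """Return every .docx path announced in a tool's text output, in order."""
    return _DOCX_LINE.findall(result)


if __name__ == "__main__":
    sample = (
        "Press releases generated successfully!\n\n"
        "**Docx:** `output/press_releases/acme-corp-launch.docx`\n"
        "**Docx:** `output/press_releases/acme-corp-expansion.docx`\n"
        "Files saved to output/press_releases/"
    )
    for path in extract_docx_paths_sketch(sample):
        print(path)
```

Because the character class excludes backticks, a match cannot run past the closing backtick, so a line such as ``**Docx:** `summary.txt` `` is ignored rather than partially matched.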

48
uv.lock
@ -325,16 +325,12 @@ dependencies = [
{ name = "edge-tts" }, { name = "edge-tts" },
{ name = "gradio" }, { name = "gradio" },
{ name = "httpx" }, { name = "httpx" },
{ name = "jinja2" },
{ name = "numpy" }, { name = "numpy" },
{ name = "openai" }, { name = "openai" },
{ name = "openpyxl" },
{ name = "python-docx" }, { name = "python-docx" },
{ name = "python-dotenv" }, { name = "python-dotenv" },
{ name = "python-multipart" },
{ name = "pyyaml" }, { name = "pyyaml" },
{ name = "sentence-transformers" }, { name = "sentence-transformers" },
{ name = "sse-starlette" },
] ]
[package.dev-dependencies] [package.dev-dependencies]
@ -360,16 +356,12 @@ requires-dist = [
{ name = "edge-tts", specifier = ">=6.1" }, { name = "edge-tts", specifier = ">=6.1" },
{ name = "gradio", specifier = ">=5.0" }, { name = "gradio", specifier = ">=5.0" },
{ name = "httpx", specifier = ">=0.27" }, { name = "httpx", specifier = ">=0.27" },
{ name = "jinja2", specifier = ">=3.1.6" },
{ name = "numpy", specifier = ">=1.24" }, { name = "numpy", specifier = ">=1.24" },
{ name = "openai", specifier = ">=1.30" }, { name = "openai", specifier = ">=1.30" },
{ name = "openpyxl", specifier = ">=3.1.5" },
{ name = "python-docx", specifier = ">=1.2.0" }, { name = "python-docx", specifier = ">=1.2.0" },
{ name = "python-dotenv", specifier = ">=1.0" }, { name = "python-dotenv", specifier = ">=1.0" },
{ name = "python-multipart", specifier = ">=0.0.22" },
{ name = "pyyaml", specifier = ">=6.0" }, { name = "pyyaml", specifier = ">=6.0" },
{ name = "sentence-transformers", specifier = ">=3.0" }, { name = "sentence-transformers", specifier = ">=3.0" },
{ name = "sse-starlette", specifier = ">=3.3.3" },
] ]
[package.metadata.requires-dev] [package.metadata.requires-dev]
@ -572,15 +564,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/bf/89/92ac6b154ab87d236c15e5e0c73cb99be58efb1ea3eb9318c266bf9a36bf/edge_tts-7.2.7-py3-none-any.whl", hash = "sha256:ac11d9e834347e5ee62cbe72e8a56ffd65d3c4e795be14b1e593b72cf6480dd9", size = 30556, upload-time = "2025-12-12T20:54:26.956Z" }, { url = "https://files.pythonhosted.org/packages/bf/89/92ac6b154ab87d236c15e5e0c73cb99be58efb1ea3eb9318c266bf9a36bf/edge_tts-7.2.7-py3-none-any.whl", hash = "sha256:ac11d9e834347e5ee62cbe72e8a56ffd65d3c4e795be14b1e593b72cf6480dd9", size = 30556, upload-time = "2025-12-12T20:54:26.956Z" },
] ]
[[package]]
name = "et-xmlfile"
version = "2.0.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/d3/38/af70d7ab1ae9d4da450eeec1fa3918940a5fafb9055e934af8d6eb0c2313/et_xmlfile-2.0.0.tar.gz", hash = "sha256:dab3f4764309081ce75662649be815c4c9081e88f0837825f90fd28317d4da54", size = 17234, upload-time = "2024-10-25T17:25:40.039Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/c1/8b/5fe2cc11fee489817272089c4203e679c63b570a5aaeb18d852ae3cbba6a/et_xmlfile-2.0.0-py3-none-any.whl", hash = "sha256:7a91720bc756843502c3b7504c77b8fe44217c85c537d85037f0f536151b2caa", size = 18059, upload-time = "2024-10-25T17:25:39.051Z" },
]
[[package]] [[package]]
name = "fastapi" name = "fastapi"
version = "0.129.0" version = "0.129.0"
@ -1569,18 +1552,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/cc/56/0a89092a453bb2c676d66abee44f863e742b2110d4dbb1dbcca3f7e5fc33/openai-2.21.0-py3-none-any.whl", hash = "sha256:0bc1c775e5b1536c294eded39ee08f8407656537ccc71b1004104fe1602e267c", size = 1103065, upload-time = "2026-02-14T00:11:59.603Z" }, { url = "https://files.pythonhosted.org/packages/cc/56/0a89092a453bb2c676d66abee44f863e742b2110d4dbb1dbcca3f7e5fc33/openai-2.21.0-py3-none-any.whl", hash = "sha256:0bc1c775e5b1536c294eded39ee08f8407656537ccc71b1004104fe1602e267c", size = 1103065, upload-time = "2026-02-14T00:11:59.603Z" },
] ]
[[package]]
name = "openpyxl"
version = "3.1.5"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "et-xmlfile" },
]
sdist = { url = "https://files.pythonhosted.org/packages/3d/f9/88d94a75de065ea32619465d2f77b29a0469500e99012523b91cc4141cd1/openpyxl-3.1.5.tar.gz", hash = "sha256:cf0e3cf56142039133628b5acffe8ef0c12bc902d2aadd3e0fe5878dc08d1050", size = 186464, upload-time = "2024-06-28T14:03:44.161Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/c0/da/977ded879c29cbd04de313843e76868e6e13408a94ed6b987245dc7c8506/openpyxl-3.1.5-py2.py3-none-any.whl", hash = "sha256:5282c12b107bffeef825f4617dc029afaf41d0ea60823bbb665ef3079dc79de2", size = 250910, upload-time = "2024-06-28T14:03:41.161Z" },
]
[[package]] [[package]]
name = "orjson" name = "orjson"
version = "3.11.7" version = "3.11.7"
@ -2562,19 +2533,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/46/2c/1462b1d0a634697ae9e55b3cecdcb64788e8b7d63f54d923fcd0bb140aed/soupsieve-2.8.3-py3-none-any.whl", hash = "sha256:ed64f2ba4eebeab06cc4962affce381647455978ffc1e36bb79a545b91f45a95", size = 37016, upload-time = "2026-01-20T04:27:01.012Z" }, { url = "https://files.pythonhosted.org/packages/46/2c/1462b1d0a634697ae9e55b3cecdcb64788e8b7d63f54d923fcd0bb140aed/soupsieve-2.8.3-py3-none-any.whl", hash = "sha256:ed64f2ba4eebeab06cc4962affce381647455978ffc1e36bb79a545b91f45a95", size = 37016, upload-time = "2026-01-20T04:27:01.012Z" },
] ]
[[package]]
name = "sse-starlette"
version = "3.3.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "anyio" },
{ name = "starlette" },
]
sdist = { url = "https://files.pythonhosted.org/packages/14/2f/9223c24f568bb7a0c03d751e609844dce0968f13b39a3f73fbb3a96cd27a/sse_starlette-3.3.3.tar.gz", hash = "sha256:72a95d7575fd5129bd0ae15275ac6432bb35ac542fdebb82889c24bb9f3f4049", size = 32420, upload-time = "2026-03-17T20:05:55.529Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/78/e2/b8cff57a67dddf9a464d7e943218e031617fb3ddc133aeeb0602ff5f6c85/sse_starlette-3.3.3-py3-none-any.whl", hash = "sha256:c5abb5082a1cc1c6294d89c5290c46b5f67808cfdb612b7ec27e8ba061c22e8d", size = 14329, upload-time = "2026-03-17T20:05:54.35Z" },
]
[[package]] [[package]]
name = "starlette" name = "starlette"
version = "0.52.1" version = "0.52.1"
@ -2741,12 +2699,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/0f/8b/4b61d6e13f7108f36910df9ab4b58fd389cc2520d54d81b88660804aad99/torch-2.10.0-2-cp311-none-macosx_11_0_arm64.whl", hash = "sha256:418997cb02d0a0f1497cf6a09f63166f9f5df9f3e16c8a716ab76a72127c714f", size = 79423467, upload-time = "2026-02-10T21:44:48.711Z" }, { url = "https://files.pythonhosted.org/packages/0f/8b/4b61d6e13f7108f36910df9ab4b58fd389cc2520d54d81b88660804aad99/torch-2.10.0-2-cp311-none-macosx_11_0_arm64.whl", hash = "sha256:418997cb02d0a0f1497cf6a09f63166f9f5df9f3e16c8a716ab76a72127c714f", size = 79423467, upload-time = "2026-02-10T21:44:48.711Z" },
{ url = "https://files.pythonhosted.org/packages/d3/54/a2ba279afcca44bbd320d4e73675b282fcee3d81400ea1b53934efca6462/torch-2.10.0-2-cp312-none-macosx_11_0_arm64.whl", hash = "sha256:13ec4add8c3faaed8d13e0574f5cd4a323c11655546f91fbe6afa77b57423574", size = 79498202, upload-time = "2026-02-10T21:44:52.603Z" }, { url = "https://files.pythonhosted.org/packages/d3/54/a2ba279afcca44bbd320d4e73675b282fcee3d81400ea1b53934efca6462/torch-2.10.0-2-cp312-none-macosx_11_0_arm64.whl", hash = "sha256:13ec4add8c3faaed8d13e0574f5cd4a323c11655546f91fbe6afa77b57423574", size = 79498202, upload-time = "2026-02-10T21:44:52.603Z" },
{ url = "https://files.pythonhosted.org/packages/ec/23/2c9fe0c9c27f7f6cb865abcea8a4568f29f00acaeadfc6a37f6801f84cb4/torch-2.10.0-2-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:e521c9f030a3774ed770a9c011751fb47c4d12029a3d6522116e48431f2ff89e", size = 79498254, upload-time = "2026-02-10T21:44:44.095Z" }, { url = "https://files.pythonhosted.org/packages/ec/23/2c9fe0c9c27f7f6cb865abcea8a4568f29f00acaeadfc6a37f6801f84cb4/torch-2.10.0-2-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:e521c9f030a3774ed770a9c011751fb47c4d12029a3d6522116e48431f2ff89e", size = 79498254, upload-time = "2026-02-10T21:44:44.095Z" },
{ url = "https://files.pythonhosted.org/packages/36/ab/7b562f1808d3f65414cd80a4f7d4bb00979d9355616c034c171249e1a303/torch-2.10.0-3-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:ac5bdcbb074384c66fa160c15b1ead77839e3fe7ed117d667249afce0acabfac", size = 915518691, upload-time = "2026-03-11T14:15:43.147Z" },
{ url = "https://files.pythonhosted.org/packages/b3/7a/abada41517ce0011775f0f4eacc79659bc9bc6c361e6bfe6f7052a6b9363/torch-2.10.0-3-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:98c01b8bb5e3240426dcde1446eed6f40c778091c8544767ef1168fc663a05a6", size = 915622781, upload-time = "2026-03-11T14:17:11.354Z" },
{ url = "https://files.pythonhosted.org/packages/ab/c6/4dfe238342ffdcec5aef1c96c457548762d33c40b45a1ab7033bb26d2ff2/torch-2.10.0-3-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:80b1b5bfe38eb0e9f5ff09f206dcac0a87aadd084230d4a36eea5ec5232c115b", size = 915627275, upload-time = "2026-03-11T14:16:11.325Z" },
{ url = "https://files.pythonhosted.org/packages/d8/f0/72bf18847f58f877a6a8acf60614b14935e2f156d942483af1ffc081aea0/torch-2.10.0-3-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:46b3574d93a2a8134b3f5475cfb98e2eb46771794c57015f6ad1fb795ec25e49", size = 915523474, upload-time = "2026-03-11T14:17:44.422Z" },
{ url = "https://files.pythonhosted.org/packages/f4/39/590742415c3030551944edc2ddc273ea1fdfe8ffb2780992e824f1ebee98/torch-2.10.0-3-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:b1d5e2aba4eb7f8e87fbe04f86442887f9167a35f092afe4c237dfcaaef6e328", size = 915632474, upload-time = "2026-03-11T14:15:13.666Z" },
{ url = "https://files.pythonhosted.org/packages/b6/8e/34949484f764dde5b222b7fe3fede43e4a6f0da9d7f8c370bb617d629ee2/torch-2.10.0-3-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:0228d20b06701c05a8f978357f657817a4a63984b0c90745def81c18aedfa591", size = 915523882, upload-time = "2026-03-11T14:14:46.311Z" },
{ url = "https://files.pythonhosted.org/packages/78/89/f5554b13ebd71e05c0b002f95148033e730d3f7067f67423026cc9c69410/torch-2.10.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:3282d9febd1e4e476630a099692b44fdc214ee9bf8ee5377732d9d9dfe5712e4", size = 145992610, upload-time = "2026-01-21T16:25:26.327Z" }, { url = "https://files.pythonhosted.org/packages/78/89/f5554b13ebd71e05c0b002f95148033e730d3f7067f67423026cc9c69410/torch-2.10.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:3282d9febd1e4e476630a099692b44fdc214ee9bf8ee5377732d9d9dfe5712e4", size = 145992610, upload-time = "2026-01-21T16:25:26.327Z" },
{ url = "https://files.pythonhosted.org/packages/ae/30/a3a2120621bf9c17779b169fc17e3dc29b230c29d0f8222f499f5e159aa8/torch-2.10.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:a2f9edd8dbc99f62bc4dfb78af7bf89499bca3d753423ac1b4e06592e467b763", size = 915607863, upload-time = "2026-01-21T16:25:06.696Z" }, { url = "https://files.pythonhosted.org/packages/ae/30/a3a2120621bf9c17779b169fc17e3dc29b230c29d0f8222f499f5e159aa8/torch-2.10.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:a2f9edd8dbc99f62bc4dfb78af7bf89499bca3d753423ac1b4e06592e467b763", size = 915607863, upload-time = "2026-01-21T16:25:06.696Z" },
{ url = "https://files.pythonhosted.org/packages/6f/3d/c87b33c5f260a2a8ad68da7147e105f05868c281c63d65ed85aa4da98c66/torch-2.10.0-cp311-cp311-win_amd64.whl", hash = "sha256:29b7009dba4b7a1c960260fc8ac85022c784250af43af9fb0ebafc9883782ebd", size = 113723116, upload-time = "2026-01-21T16:25:21.916Z" }, { url = "https://files.pythonhosted.org/packages/6f/3d/c87b33c5f260a2a8ad68da7147e105f05868c281c63d65ed85aa4da98c66/torch-2.10.0-cp311-cp311-win_amd64.whl", hash = "sha256:29b7009dba4b7a1c960260fc8ac85022c784250af43af9fb0ebafc9883782ebd", size = 113723116, upload-time = "2026-01-21T16:25:21.916Z" },