# CheddahBot

A personal AI assistant built in Python with a Gradio web UI. CheddahBot supports multiple LLM providers (hot-swappable at runtime), a 4-layer memory system, 15+ built-in tools, a task scheduler with heartbeat, voice chat, and the ability for the agent to create new tools and skills on the fly.

The UI runs as a Progressive Web App (PWA), so you can install it on your phone and use it like a native app.

---
## Table of Contents

- [Features](#features)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [Provider Setup](#provider-setup)
- [Architecture](#architecture)
- [Memory System](#memory-system)
- [Tools Reference](#tools-reference)
- [Meta-Tools: Runtime Tool and Skill Creation](#meta-tools-runtime-tool-and-skill-creation)
- [Scheduler and Heartbeat](#scheduler-and-heartbeat)
- [Voice Chat](#voice-chat)
- [Identity System](#identity-system)
- [Known Issues and Limitations](#known-issues-and-limitations)

---
## Features

- **Multi-model support** -- Claude (via Claude Code CLI), OpenRouter (GPT-4o, Gemini, Mistral, Llama, and more), Ollama (local), LM Studio (local). All hot-swappable from the UI dropdown at any time.
- **Gradio web UI** -- Clean chat interface with model switcher, conversation history, file uploads, microphone input, and camera. Launches as a PWA for mobile use.
- **4-layer memory** -- Identity files (SOUL.md, USER.md), long-term memory (MEMORY.md), daily logs (YYYY-MM-DD.md), and semantic search over all memory via sentence-transformer embeddings.
- **15+ built-in tools** -- File operations, shell commands, web search, URL fetching, Python code execution, image analysis, CSV/JSON processing, memory management, task scheduling.
- **Meta-tools** -- The agent can create entirely new tools and multi-step skills at runtime. New tools are written as Python modules and hot-loaded without restarting.
- **Task scheduler** -- Cron-based recurring tasks and one-time scheduled prompts. Includes a heartbeat system that periodically runs a proactive checklist.
- **Voice chat** -- Speech-to-text via Whisper (local or API) and text-to-speech via edge-tts. Record audio, get a spoken response.
- **Persistent storage** -- SQLite database for conversations, messages, scheduled tasks, and key-value storage. All conversations are saved and browsable.
- **Streaming responses** -- Responses stream token-by-token in the chat UI for all OpenAI-compatible providers.

---
## Quick Start

### Prerequisites

- Python 3.11 or later
- (Optional) Node.js / npm -- only needed if using the Claude Code CLI provider
- (Optional) Ollama or LM Studio -- for local model inference
- (Optional) ffmpeg -- for video frame extraction

### Install

```bash
# Clone the repository
git clone <your-repo-url> CheddahBot
cd CheddahBot

# Create a virtual environment (recommended)
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS/Linux

# Install dependencies
pip install -r requirements.txt
```
### Configure

Copy or edit the `.env` file in the project root:

```env
# Required for OpenRouter (recommended primary provider)
OPENROUTER_API_KEY=your-key-here

# Optional overrides
# CHEDDAH_DEFAULT_MODEL=claude-sonnet-4-20250514
# CHEDDAH_HOST=0.0.0.0
# CHEDDAH_PORT=7860
```

Get an OpenRouter API key at https://openrouter.ai/keys.

### Run

```bash
python -m cheddahbot
```

The Gradio UI will launch at `http://localhost:7860` by default. On your local network it will also be accessible at `http://<your-ip>:7860`. The PWA can be installed from the browser on mobile devices.

---
## Configuration

CheddahBot loads configuration in this priority order: environment variables (highest), then `config.yaml`, then built-in defaults.
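The override order above can be sketched as a small resolver. This is an illustrative stand-in, not the actual `config.py` loader; the `DEFAULTS` values and the `resolve` helper are assumptions for the example:

```python
import os

# Hypothetical defaults for illustration -- the real defaults live in config.py.
DEFAULTS = {"host": "127.0.0.1", "port": 7860, "default_model": "claude-sonnet-4-20250514"}

def resolve(key: str, yaml_values: dict):
    """Resolve one config key: CHEDDAH_<KEY> env var > config.yaml > default."""
    env_val = os.environ.get(f"CHEDDAH_{key.upper()}")
    if env_val is not None:
        return env_val          # environment variable wins
    if key in yaml_values:
        return yaml_values[key]  # then config.yaml
    return DEFAULTS[key]         # then built-in default
```

So with `CHEDDAH_PORT=9000` in the environment, `resolve("port", {"port": 8080})` returns `"9000"` even though `config.yaml` says 8080.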
### config.yaml

Located at the project root. Controls server settings, memory parameters, scheduler timing, local model endpoints, and shell safety settings.

```yaml
# Default model to use on startup
default_model: "claude-sonnet-4-20250514"

# Gradio server settings
host: "0.0.0.0"
port: 7860

# Memory settings
memory:
  max_context_messages: 50              # Messages kept in the LLM context window
  flush_threshold: 40                   # Auto-summarize when message count exceeds this
  embedding_model: "all-MiniLM-L6-v2"   # Sentence-transformer model for semantic search
  search_top_k: 5                       # Number of semantic search results returned

# Scheduler settings
scheduler:
  heartbeat_interval_minutes: 30
  poll_interval_seconds: 60

# Local model endpoints (auto-detected)
ollama_url: "http://localhost:11434"
lmstudio_url: "http://localhost:1234"

# Safety settings
shell:
  blocked_commands:
    - "rm -rf /"
    - "format"
    - ":(){:|:&};:"
  require_approval: false  # If true, shell commands need user confirmation
```
### .env

Environment variables with the `CHEDDAH_` prefix override `config.yaml` values:

| Variable | Description |
|---|---|
| `OPENROUTER_API_KEY` | Your OpenRouter API key (recommended) |
| `CHEDDAH_DEFAULT_MODEL` | Override the default model ID |
| `CHEDDAH_HOST` | Override the Gradio server host |
| `CHEDDAH_PORT` | Override the Gradio server port |
| `GMAIL_USERNAME` | Gmail address for sending emails (enables email tool) |
| `GMAIL_APP_PASSWORD` | Gmail app password ([create one here](https://myaccount.google.com/apppasswords)) |
| `EMAIL_DEFAULT_TO` | Default recipient for the `email_file` tool |

### Identity Files

See the [Identity System](#identity-system) section below.

---
## Provider Setup

CheddahBot routes model requests to different backends based on the selected model ID. You can switch models at any time from the dropdown in the UI.

### OpenRouter (Recommended)

OpenRouter is the recommended primary provider. It gives full control over system prompts, supports tool/function calling, and provides access to a wide range of models through a single API key -- including Claude, GPT-4o, Gemini, Mistral, Llama, and many others.

1. Sign up at https://openrouter.ai and create an API key.
2. Set `OPENROUTER_API_KEY` in your `.env` file.
3. Select any OpenRouter model from the UI dropdown.
Pre-configured OpenRouter models:

| Model ID | Display Name |
|---|---|
| `openai/gpt-4o` | GPT-4o |
| `openai/gpt-4o-mini` | GPT-4o Mini |
| `google/gemini-2.0-flash-001` | Gemini 2.0 Flash |
| `google/gemini-2.5-pro-preview` | Gemini 2.5 Pro |
| `mistralai/mistral-large` | Mistral Large |
| `meta-llama/llama-3.3-70b-instruct` | Llama 3.3 70B |

You can use any model ID supported by OpenRouter -- the ones above are just the pre-populated dropdown entries.
### Ollama (Local, Free)

Ollama is fully supported for running local models with no API key required.

1. Install Ollama from https://ollama.com.
2. Pull a model: `ollama pull llama3.1` (or any model you want).
3. Start Ollama (it runs on `http://localhost:11434` by default).
4. Click the **Refresh** button in the CheddahBot UI. Your Ollama models will appear in the dropdown with an `[Ollama]` prefix.

Model IDs follow the format `local/ollama/<model-name>` (e.g., `local/ollama/llama3.1`).
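The refresh step boils down to querying Ollama's `/api/tags` endpoint and prefixing each model name. A minimal sketch of that mapping, assuming the documented `{"models": [{"name": ...}, ...]}` response shape (the real discovery code lives in `llm.py`):

```python
def ollama_model_ids(tags_response: dict) -> list:
    """Map an Ollama /api/tags response to CheddahBot-style model IDs.

    Illustrative helper: in practice the response comes from an HTTP GET
    to http://localhost:11434/api/tags.
    """
    return [f"local/ollama/{m['name']}" for m in tags_response.get("models", [])]
```

For example, a response listing `llama3.1` and `mistral` yields `["local/ollama/llama3.1", "local/ollama/mistral"]`.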
### LM Studio (Local)

LM Studio provides a local OpenAI-compatible API.

1. Install LM Studio from https://lmstudio.ai.
2. Load a model and start the local server (default: `http://localhost:1234`).
3. Click **Refresh** in the CheddahBot UI. Your LM Studio models appear with an `[LM Studio]` prefix.

Model IDs follow the format `local/lmstudio/<model-id>`.
### Claude Code CLI

Claude models (Sonnet, Opus, Haiku) are routed through the Claude Code CLI (`claude -p`), which uses your Anthropic Max subscription.

1. Install Claude Code: `npm install -g @anthropic-ai/claude-code`
2. Make sure `claude` is available in your PATH.
3. Claude models will appear in the dropdown by default.

**Important caveat:** The Claude Code CLI is designed as a coding assistant. When invoked via `claude -p`, it does not fully respect custom system prompts -- it applies its own internal system prompt on top of whatever you provide. This means the personality defined in `SOUL.md` and the tool-use instructions may not be followed reliably when using Claude via this path. This is a known limitation of the CLI integration.

**Recommendation:** If you want full control over system prompts and behavior (which is important for the identity system, memory injection, and tool calling to work properly), use Claude models through OpenRouter instead. OpenRouter supports Claude models with standard OpenAI-compatible API semantics, giving you complete control over the system prompt.

---
## Architecture

### Directory Structure

```
CheddahBot/
  config.yaml            # Main configuration file
  .env                   # API keys and environment overrides
  requirements.txt       # Python dependencies
  identity/
    SOUL.md              # Agent personality definition
    USER.md              # User profile (filled in by you)
    HEARTBEAT.md         # Proactive checklist for heartbeat cycle
  memory/                # Runtime memory files (gitignored)
    MEMORY.md            # Long-term learned facts
    YYYY-MM-DD.md        # Daily logs
    embeddings.db        # Vector embeddings for semantic search
  data/
    cheddahbot.db        # SQLite database (conversations, tasks, KV store)
    uploads/             # User-uploaded files
    generated/           # Agent-generated files (TTS output, etc.)
    skills/              # User/agent-created skill modules
  cheddahbot/
    __main__.py          # Entry point (python -m cheddahbot)
    config.py            # Configuration loader
    db.py                # SQLite persistence layer
    llm.py               # Model-agnostic LLM adapter
    router.py            # System prompt builder and message formatter
    agent.py             # Core agent loop (LLM + tools + memory)
    memory.py            # 4-layer memory system
    ui.py                # Gradio web interface
    scheduler.py         # Task scheduler and heartbeat
    media.py             # Audio/video processing (STT, TTS, video frames)
    providers/           # Reserved for future custom providers
    tools/
      __init__.py        # Tool registry, @tool decorator, auto-discovery
      file_ops.py        # File read/write/edit/search tools
      shell.py           # Shell command execution
      web.py             # Web search and URL fetching
      code_exec.py       # Python code execution (sandboxed subprocess)
      calendar_tool.py   # Memory and scheduling tools
      image.py           # Image analysis via vision-capable LLM
      data_proc.py       # CSV and JSON processing
      build_tool.py      # Meta-tool: create new tools at runtime
      build_skill.py     # Meta-tool: create new skills at runtime
    skills/
      __init__.py        # Skill registry, @skill decorator, dynamic loader
```
### Module Responsibilities

**`__main__.py`** -- Application entry point. Initializes configuration, database, LLM adapter, agent, memory system, tool system, and scheduler in sequence, then launches the Gradio UI.

**`config.py`** -- Loads configuration from `.env`, `config.yaml`, and built-in defaults using a layered override approach. Defines dataclasses for `Config`, `MemoryConfig`, `SchedulerConfig`, and `ShellConfig`. Creates required data directories on startup.

**`db.py`** -- SQLite persistence layer using WAL mode for concurrent access. Manages conversations, messages (with tool call metadata), scheduled tasks, task run logs, and a general-purpose key-value store. Thread-safe via `threading.local()`.

**`llm.py`** -- Model-agnostic LLM adapter that routes requests to the appropriate backend based on the model ID. Claude models go through the Claude Code CLI subprocess. All other models (OpenRouter, Ollama, LM Studio) go through the OpenAI Python SDK against the appropriate base URL. Handles streaming, tool call accumulation, and model discovery for local providers.

**`router.py`** -- Builds the system prompt by concatenating identity files (SOUL.md, USER.md), memory context, tool descriptions, and core instructions. Also handles formatting conversation history into the LLM message format.

**`agent.py`** -- The core agent loop. On each user message: stores the message, builds the system prompt with memory context, calls the LLM, checks for tool calls, executes tools, feeds results back to the LLM, and repeats (up to 10 iterations). Handles streaming output to the UI. Triggers memory auto-flush when conversation length exceeds the configured threshold.

**`memory.py`** -- Implements the 4-layer memory system (see [Memory System](#memory-system) below). Manages long-term memory files, daily logs, embedding-based semantic search, conversation summarization, and reindexing.

**`ui.py`** -- Gradio interface with a chat panel, model dropdown with refresh, new chat button, multimodal input (text, file upload, microphone), voice chat accordion, conversation history browser, and settings section. Supports streaming responses.

**`scheduler.py`** -- Background thread that polls for due scheduled tasks (cron-based or one-time) and executes them by sending prompts to the agent. Includes a separate heartbeat thread that periodically reads `HEARTBEAT.md` and asks the agent to act on any items that need attention.

**`media.py`** -- Audio and video processing. Speech-to-text via local Whisper or OpenAI Whisper API. Text-to-speech via edge-tts (free, no API key). Video frame extraction via ffmpeg.

**`tools/__init__.py`** -- Tool registry with a `@tool` decorator for registering functions, automatic parameter schema extraction from type hints, OpenAI function-calling schema generation, auto-discovery of tool modules via `pkgutil`, and runtime execution with context injection.

**`skills/__init__.py`** -- Skill registry with a `@skill` decorator, dynamic loading from `.py` files in the `skills/` directory, and runtime execution.

---
## Memory System

CheddahBot uses a 4-layer memory architecture that gives the agent both persistent knowledge and contextual awareness.

### Layer 1: Identity (SOUL.md + USER.md)

Static files in `identity/` that define who the agent is and who the user is. These are loaded into the system prompt on every request. See [Identity System](#identity-system).

### Layer 2: Long-Term Memory (MEMORY.md)

A Markdown file at `memory/MEMORY.md` containing timestamped facts, preferences, and instructions the agent has learned. The agent writes to this file using the `remember_this` tool. The most recent 2000 characters are injected into the system prompt.

Example entries:

```
- [2025-06-15 14:30] User prefers tabs over spaces
- [2025-06-15 15:00] User's project deadline is June 30th
```

### Layer 3: Daily Logs (YYYY-MM-DD.md)

Date-stamped Markdown files in `memory/` that capture timestamped notes, conversation summaries, and heartbeat actions for each day. The agent writes to these using the `log_note` tool. Today's log (up to 1500 characters) is injected into the system prompt.

When conversation length exceeds the configured `flush_threshold` (default 40 messages), older messages are automatically summarized and moved to the daily log.
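The flush decision can be sketched as a simple split. This is a hypothetical helper, not the real `memory.py` API; the `keep` count of 20 is an assumption for the example:

```python
def split_for_flush(messages: list, flush_threshold: int = 40, keep: int = 20):
    """Once the conversation exceeds flush_threshold, hand the oldest
    messages off for summarization and keep only the most recent ones
    in context. Returns (to_summarize, remaining)."""
    if len(messages) <= flush_threshold:
        return [], messages          # below threshold: nothing to flush
    return messages[:-keep], messages[-keep:]
```

With 50 messages and the default threshold of 40, the oldest 30 would be summarized into the daily log and the newest 20 stay in the context window.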
### Layer 4: Semantic Search (Embeddings)

All memory entries are indexed using sentence-transformer embeddings (`all-MiniLM-L6-v2` by default) and stored in `memory/embeddings.db`. On each user message, a semantic search is performed against the index, and the top-k most relevant memory fragments are injected into the system prompt.

If `sentence-transformers` is not installed, the system falls back to a keyword-based search over the Markdown files.

The `reindex_all()` method rebuilds the entire embedding index from all memory files.
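The keyword fallback mentioned above can be sketched as term-overlap scoring. This is an illustrative stand-in (the exact scoring in `memory.py` may differ):

```python
def keyword_search(query: str, entries: list, top_k: int = 5) -> list:
    """Rank memory lines by how many query terms they contain;
    entries matching no terms are dropped."""
    terms = {t.lower() for t in query.split()}
    scored = []
    for entry in entries:
        words = entry.lower().split()
        score = sum(1 for t in terms if t in words)
        if score:
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]
```

Unlike the embedding index, this only matches literal words, so "deadline" will not retrieve an entry that says "due date" -- which is why semantic search is preferred when available.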
---

## Tools Reference

Tools are registered using the `@tool` decorator and auto-discovered at startup. They are exposed to the LLM via OpenAI-compatible function-calling schema. The agent can chain multiple tool calls in a single response (up to 10 iterations).
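A minimal sketch of that registration pattern -- a decorator that extracts a parameter schema from type hints. The real registry in `tools/__init__.py` is richer (auto-discovery, context injection); the type mapping here is deliberately tiny:

```python
import inspect

TOOLS = {}

def tool(fn):
    """Register fn and derive an OpenAI-style function schema from its
    signature (assumed shape, for illustration)."""
    params = {}
    for name, p in inspect.signature(fn).parameters.items():
        params[name] = {"type": {int: "integer", str: "string"}.get(p.annotation, "string")}
    TOOLS[fn.__name__] = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": params},
    }
    return fn

@tool
def read_file(path: str, max_chars: int = 50_000) -> str:
    """Read the contents of a file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()[:max_chars]
```

After import, `TOOLS["read_file"]` holds a schema the LLM can call against, with `path` typed as a string and `max_chars` as an integer.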
### Files

| Tool | Description |
|---|---|
| `read_file(path)` | Read the contents of a file (up to 50K chars) |
| `write_file(path, content)` | Write content to a file (creates or overwrites) |
| `edit_file(path, old_text, new_text)` | Replace the first occurrence of text in a file |
| `list_directory(path)` | List files and folders with sizes |
| `search_files(pattern, directory)` | Search for files matching a glob pattern |
| `search_in_files(query, directory, extension)` | Search for text content across files |

### Shell

| Tool | Description |
|---|---|
| `run_command(command, timeout)` | Execute a shell command (with safety checks, max 120s) |

Blocked patterns include `rm -rf /`, `format c:`, fork bombs, `dd if=/dev/zero`, `mkfs.`, and writes to `/dev/sda`.
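The safety check amounts to matching the command against that blocklist. A simplified sketch (substring matching; the actual check in `shell.py` may be stricter):

```python
# Illustrative subset of the blocked patterns listed above.
BLOCKED_PATTERNS = ["rm -rf /", "format c:", ":(){", "dd if=/dev/zero", "mkfs.", "/dev/sda"]

def is_blocked(command: str) -> bool:
    """Return True if the command contains any known-dangerous pattern."""
    lowered = command.lower()
    return any(pattern in lowered for pattern in BLOCKED_PATTERNS)
```

Note that substring matching is easy to evade (e.g. by quoting or indirection), which is why the README's [Known Issues](#known-issues-and-limitations) section stresses that this is not a sandbox.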
### Web

| Tool | Description |
|---|---|
| `web_search(query, max_results)` | Search the web via DuckDuckGo (no API key needed) |
| `fetch_url(url)` | Fetch and extract text content from a URL (HTML parsed, scripts/nav stripped) |

### Code

| Tool | Description |
|---|---|
| `run_python(code, timeout)` | Execute Python code in a subprocess (max 60s) |

### Memory

| Tool | Description |
|---|---|
| `remember_this(text)` | Save a fact or instruction to long-term memory (MEMORY.md) |
| `search_memory(query)` | Semantic search through saved memories |
| `log_note(text)` | Add a timestamped note to today's daily log |

### Scheduling

| Tool | Description |
|---|---|
| `schedule_task(name, prompt, schedule)` | Schedule a recurring (cron) or one-time (`once:YYYY-MM-DDTHH:MM`) task |
| `list_tasks()` | List all scheduled tasks with status |

### Media

| Tool | Description |
|---|---|
| `analyze_image(path, question)` | Analyze an image using the current vision-capable LLM |

### Data

| Tool | Description |
|---|---|
| `read_csv(path, max_rows)` | Read a CSV file and display as a formatted table |
| `read_json(path)` | Read and pretty-print a JSON file |
| `query_json(path, json_path)` | Extract data from JSON using dot-notation (`data.users.0.name`) |
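The dot-notation used by `query_json` can be sketched as a simple traversal -- numeric segments index lists, other segments key into dicts (assumed semantics, matching the `data.users.0.name` example above):

```python
import json

def query_json_path(data, json_path: str):
    """Walk a parsed JSON structure along a dot-separated path."""
    current = data
    for segment in json_path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]  # numeric segment: list index
        else:
            current = current[segment]       # otherwise: dict key
    return current
```

For `{"users": [{"name": "Ada"}]}`, the path `users.0.name` resolves to `"Ada"`.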
### Content

| Tool | Description |
|---|---|
| `write_press_releases(topic, company_name, ...)` | Full autonomous PR pipeline: generates headlines, writes 2 press releases with JSON-LD schemas, saves `.txt` + `.docx` files |

### Delivery

| Tool | Description |
|---|---|
| `email_file(file_path, to, subject)` | Email a file as an attachment via Gmail SMTP. Auto-converts `.txt` to `.docx` before sending |

### Meta

| Tool | Description |
|---|---|
| `build_tool(name, description, code)` | Create a new tool module at runtime (see below) |
| `build_skill(name, description, steps)` | Create a new multi-step skill at runtime (see below) |

---
## Meta-Tools: Runtime Tool and Skill Creation

One of CheddahBot's distinctive features is that the agent can extend its own capabilities at runtime by writing new tools and skills.

### build_tool

The `build_tool` meta-tool allows the agent to create a new tool by writing Python code with the `@tool` decorator. The code is saved as a new module in the `cheddahbot/tools/` directory and hot-loaded immediately -- no restart required.

For example, if you ask "create a tool that counts words in a file", the agent will:

1. Write a Python function with the `@tool` decorator.
2. Save it to `cheddahbot/tools/word_counter.py`.
3. Import and register it at runtime.
4. Start using the new tool immediately.

The generated module includes the necessary imports automatically. Tool names must be valid Python identifiers and cannot overwrite existing modules.
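A sketch of what such a generated module might look like. This is illustrative: a real generated module would import the project's `@tool` decorator from `cheddahbot.tools`, so a no-op stand-in is defined here to keep the sketch self-contained:

```python
def tool(fn):
    # Stand-in for cheddahbot's @tool decorator (illustration only).
    return fn

@tool
def count_words(path: str) -> str:
    """Count the words in a text file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        n = len(f.read().split())
    return f"{path}: {n} words"
```

Once hot-loaded, the function's name, docstring, and type hints give the registry everything it needs to expose `count_words` to the LLM.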
### build_skill

The `build_skill` meta-tool creates multi-step skills -- higher-level operations that combine multiple actions. Skills are saved to the `skills/` directory and loaded via the skill registry.

Skills use the `@skill` decorator from the skills module and can orchestrate complex workflows.
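A hypothetical skill, to show the shape: a stand-in `@skill` decorator (the real one comes from the skills module) wrapping a small two-step workflow -- condense some text, then format it as a timestamped log entry like the ones in `MEMORY.md`:

```python
from datetime import datetime

def skill(fn):
    # Stand-in for cheddahbot's @skill decorator (illustration only).
    return fn

@skill
def summarize_and_log(text: str) -> str:
    """Hypothetical two-step skill: truncate long text, then format it
    as a timestamped memory-log line."""
    summary = text if len(text) <= 60 else text[:57] + "..."
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    return f"- [{stamp}] {summary}"
```

Real skills can of course call the registered tools (web search, file writes, email) between steps; this sketch only shows the decorator-plus-workflow structure.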
---

## Scheduler and Heartbeat

### Scheduled Tasks

The scheduler runs as a background thread that polls the database for due tasks every 60 seconds (configurable via `scheduler.poll_interval_seconds`).

Tasks can be created by the agent using the `schedule_task` tool:

- **Cron schedule** -- Standard cron expressions (e.g., `0 9 * * *` for daily at 9 AM). The next run time is calculated after each execution.
- **One-time** -- Use the format `once:YYYY-MM-DDTHH:MM`. The task is automatically disabled after it runs.

When a task fires, its prompt is sent to the agent via `respond_to_prompt`, and the result is logged to the `task_run_logs` table.
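Interpreting the two schedule formats can be sketched with the standard library. This is a hypothetical helper, not `scheduler.py` itself: it handles the `once:` format directly and punts on cron expressions, which need a cron parser such as `croniter`:

```python
from datetime import datetime

def parse_once_schedule(schedule: str, now: datetime):
    """Return the due datetime for a one-time schedule, or None if the
    schedule is a cron expression or the one-time moment has passed."""
    if schedule.startswith("once:"):
        due = datetime.fromisoformat(schedule[len("once:"):])
        return due if due > now else None  # past one-time tasks never fire
    return None  # cron expression: delegate to a cron library
```

So `once:2025-06-02T09:00` parses to a concrete datetime, while `0 9 * * *` is left for the cron path.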
### Heartbeat

The heartbeat is a separate background thread that runs on a configurable interval (default: every 30 minutes). On each cycle, it:

1. Reads `identity/HEARTBEAT.md` -- a checklist of things to proactively check.
2. Sends the checklist to the agent as a prompt.
3. If the agent determines nothing needs attention, it responds with `HEARTBEAT_OK` and no action is taken.
4. If the agent takes action, the result is logged to the daily memory log.

The default heartbeat checklist includes checking for failed scheduled tasks, reviewing pending reminders, and checking disk space. You can customize `HEARTBEAT.md` with any proactive checks you want.

---
## Voice Chat

CheddahBot supports a full voice conversation loop: speak, get a spoken response.

### Speech-to-Text (STT)

Audio input is transcribed using Whisper. The system tries local Whisper first (if the `whisper` package is installed), then falls back to the OpenAI Whisper API.

Audio can be provided in two ways:

- **Microphone input** in the main chat -- audio files are automatically detected and transcribed, with the transcript appended to the message.
- **Voice Chat accordion** -- a dedicated record-and-respond mode.

Supported audio formats: WAV, MP3, OGG, WebM, M4A.

### Text-to-Speech (TTS)

Responses are spoken using edge-tts, which is free and requires no API key. The default voice is `en-US-AriaNeural`. TTS output is saved to `data/generated/voice_response.mp3` and played back automatically in the Voice Chat panel.

Install edge-tts:

```bash
pip install edge-tts
```

### Video Frame Extraction

The media module also supports extracting key frames from video files using ffmpeg (used internally for video analysis workflows). Requires `ffmpeg` and `ffprobe` in your PATH.

---
## Identity System

CheddahBot's identity is defined by three Markdown files in the `identity/` directory.

### SOUL.md

Defines the agent's personality, boundaries, and behavioral quirks. This is injected at the top of every system prompt.

Default personality traits:

- Direct and no-nonsense but warm
- Uses humor when appropriate
- Proactive -- suggests things before being asked
- Remembers and references past conversations naturally

Edit this file to customize the agent's personality to your liking.

### USER.md

Your user profile. Contains your name, how you want to be addressed, your technical level, primary language, current projects, communication preferences, and anything else you want the agent to know about you.

Fill this in after installation -- the more context you provide, the more personalized the agent's responses will be.

### HEARTBEAT.md

A checklist of proactive tasks for the heartbeat system. Each item is something the agent should check on periodically. See [Scheduler and Heartbeat](#scheduler-and-heartbeat).

---
## Known Issues and Limitations

### Claude Code CLI System Prompt

The Claude Code CLI (`claude -p`) is designed as a coding assistant and applies its own internal system prompt. Custom system prompts passed via `--system-prompt` are appended but do not override the built-in behavior. This means:

- The `SOUL.md` personality may not be followed reliably.
- Tool-use instructions may be ignored or overridden.
- The agent may behave more like a coding assistant than a personal assistant.

**Workaround:** Use Claude models through OpenRouter instead of the CLI. OpenRouter provides standard API access to Claude with full system prompt control.

### Claude Code CLI Does Not Support Streaming

The Claude Code CLI integration uses `subprocess.Popen` with `communicate()`, which means the entire response is collected before being displayed. There is no token-by-token streaming for Claude CLI responses. OpenRouter, Ollama, and LM Studio all support true streaming.

### Claude Code CLI Tool Calling

Tool calling is not supported through the Claude Code CLI path. The `--tools ""` flag is passed to disable Claude Code's built-in tools, and CheddahBot's own tools are described in the system prompt rather than via function-calling schema. This makes tool use unreliable with the CLI backend. Again, OpenRouter is the recommended provider for full tool support.

### Embedding Model Download

The first time the memory system initializes, it downloads the `all-MiniLM-L6-v2` sentence-transformer model (approximately 80 MB). This requires an internet connection and may take a moment. Subsequent starts use the cached model.

If `sentence-transformers` is not installed, the memory system falls back to keyword-based search. Semantic search will not be available, but everything else works.

### Shell Command Safety

The shell tool blocks a set of known dangerous command patterns, but it is not a full sandbox. Commands run with the same permissions as the CheddahBot process. Exercise caution with the `run_command` tool, especially on production machines.

### Conversation Context Window

The system keeps the most recent 50 messages (configurable via `memory.max_context_messages`) in the LLM context window. Older messages are summarized and moved to the daily log when the count exceeds `flush_threshold` (default 40). Very long conversations may lose fine-grained detail from earlier messages.

### Single Conversation at a Time

The agent maintains one active conversation at a time in memory. You can start a new chat (which creates a new conversation in the database) and browse past conversations in the history panel, but there is no multi-user or multi-session support.

### Local Model Limitations

Ollama and LM Studio models vary widely in their ability to follow tool-calling schemas. Smaller models may not reliably use tools. For best results with local models, use models that are known to support function calling (e.g., Llama 3.1+ instruct variants).