CheddahBot
A personal AI assistant built in Python with a Gradio web UI. CheddahBot supports multiple LLM providers (hot-swappable at runtime), a 4-layer memory system, 15+ built-in tools, a task scheduler with heartbeat, voice chat, and the ability for the agent to create new tools and skills on the fly.
The UI runs as a Progressive Web App (PWA), so you can install it on your phone and use it like a native app.
Table of Contents
- Features
- Quick Start
- Configuration
- Provider Setup
- Architecture
- Memory System
- Tools Reference
- Meta-Tools: Runtime Tool and Skill Creation
- Scheduler and Heartbeat
- Voice Chat
- Identity System
- Known Issues and Limitations
Features
- Multi-model support -- Claude (via Claude Code CLI), OpenRouter (GPT-4o, Gemini, Mistral, Llama, and more), Ollama (local), LM Studio (local). All hot-swappable from the UI dropdown at any time.
- Gradio web UI -- Clean chat interface with model switcher, conversation history, file uploads, microphone input, and camera. Launches as a PWA for mobile use.
- 4-layer memory -- Identity files (SOUL.md, USER.md), long-term memory (MEMORY.md), daily logs (YYYY-MM-DD.md), and semantic search over all memory via sentence-transformer embeddings.
- 15+ built-in tools -- File operations, shell commands, web search, URL fetching, Python code execution, image analysis, CSV/JSON processing, memory management, task scheduling.
- Meta-tools -- The agent can create entirely new tools and multi-step skills at runtime. New tools are written as Python modules and hot-loaded without restarting.
- Task scheduler -- Cron-based recurring tasks and one-time scheduled prompts. Includes a heartbeat system that periodically runs a proactive checklist.
- Voice chat -- Speech-to-text via Whisper (local or API) and text-to-speech via edge-tts. Record audio, get a spoken response.
- Persistent storage -- SQLite database for conversations, messages, scheduled tasks, and key-value storage. All conversations are saved and browsable.
- Streaming responses -- Responses stream token-by-token in the chat UI for all OpenAI-compatible providers.
Quick Start
Prerequisites
- Python 3.11 or later
- (Optional) Node.js / npm -- only needed if using the Claude Code CLI provider
- (Optional) Ollama or LM Studio -- for local model inference
- (Optional) ffmpeg -- for video frame extraction
Install
```shell
# Clone the repository
git clone <your-repo-url> CheddahBot
cd CheddahBot

# Create a virtual environment (recommended)
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS/Linux

# Install dependencies
pip install -r requirements.txt
```
Configure
Copy or edit the `.env` file in the project root:

```
# Required for OpenRouter (recommended primary provider)
OPENROUTER_API_KEY=your-key-here

# Optional overrides
# CHEDDAH_DEFAULT_MODEL=claude-sonnet-4-20250514
# CHEDDAH_HOST=0.0.0.0
# CHEDDAH_PORT=7860
```
Get an OpenRouter API key at https://openrouter.ai/keys.
Run
```shell
python -m cheddahbot
```
The Gradio UI launches at http://localhost:7860 by default. On your local network it is also accessible at `http://<your-ip>:7860`. The PWA can be installed from the browser on mobile devices.
Configuration
CheddahBot loads configuration in this priority order: environment variables (highest), then config.yaml, then built-in defaults.
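This layering can be sketched as a simple merge, highest priority applied last. The keys, defaults, and coercion here are illustrative, not the actual `config.py` implementation:

```python
import os

# Built-in defaults (lowest priority); keys and values are illustrative
DEFAULTS = {"host": "127.0.0.1", "port": 7860, "default_model": "openai/gpt-4o-mini"}

def load_config(yaml_values):
    """Merge defaults, config.yaml values, and CHEDDAH_* env vars, in that order."""
    merged = dict(DEFAULTS)
    # config.yaml values override the defaults
    merged.update({k: v for k, v in yaml_values.items() if v is not None})
    # Environment variables (highest priority): CHEDDAH_PORT -> "port", etc.
    for key in merged:
        env_val = os.environ.get(f"CHEDDAH_{key.upper()}")
        if env_val is not None:
            merged[key] = type(merged[key])(env_val)  # coerce to the existing type
    return merged
```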
config.yaml
Located at the project root. Controls server settings, memory parameters, scheduler timing, local model endpoints, and shell safety settings.
```yaml
# Default model to use on startup
default_model: "claude-sonnet-4-20250514"

# Gradio server settings
host: "0.0.0.0"
port: 7860

# Memory settings
memory:
  max_context_messages: 50   # Messages kept in the LLM context window
  flush_threshold: 40        # Auto-summarize when message count exceeds this
  embedding_model: "all-MiniLM-L6-v2"  # Sentence-transformer model for semantic search
  search_top_k: 5            # Number of semantic search results returned

# Scheduler settings
scheduler:
  heartbeat_interval_minutes: 30
  poll_interval_seconds: 60

# Local model endpoints (auto-detected)
ollama_url: "http://localhost:11434"
lmstudio_url: "http://localhost:1234"

# Safety settings
shell:
  blocked_commands:
    - "rm -rf /"
    - "format"
    - ":(){:|:&};:"
  require_approval: false  # If true, shell commands need user confirmation
```
.env
Environment variables with the `CHEDDAH_` prefix override `config.yaml` values; the remaining variables configure specific integrations:

| Variable | Description |
|---|---|
| `OPENROUTER_API_KEY` | Your OpenRouter API key (recommended) |
| `CHEDDAH_DEFAULT_MODEL` | Override the default model ID |
| `CHEDDAH_HOST` | Override the Gradio server host |
| `CHEDDAH_PORT` | Override the Gradio server port |
| `GMAIL_USERNAME` | Gmail address for sending emails (enables the email tool) |
| `GMAIL_APP_PASSWORD` | Gmail app password |
| `EMAIL_DEFAULT_TO` | Default recipient for the `email_file` tool |
Identity Files
See the Identity System section below.
Provider Setup
CheddahBot routes model requests to different backends based on the selected model ID. You can switch models at any time from the dropdown in the UI.
OpenRouter (Recommended)
OpenRouter is the recommended primary provider. It gives full control over system prompts, supports tool/function calling, and provides access to a wide range of models through a single API key -- including Claude, GPT-4o, Gemini, Mistral, Llama, and many others.
- Sign up at https://openrouter.ai and create an API key.
- Set `OPENROUTER_API_KEY` in your `.env` file.
- Select any OpenRouter model from the UI dropdown.
Pre-configured OpenRouter models:
| Model ID | Display Name |
|---|---|
| `openai/gpt-4o` | GPT-4o |
| `openai/gpt-4o-mini` | GPT-4o Mini |
| `google/gemini-2.0-flash-001` | Gemini 2.0 Flash |
| `google/gemini-2.5-pro-preview` | Gemini 2.5 Pro |
| `mistralai/mistral-large` | Mistral Large |
| `meta-llama/llama-3.3-70b-instruct` | Llama 3.3 70B |
You can use any model ID supported by OpenRouter -- the ones above are just the pre-populated dropdown entries.
Ollama (Local, Free)
Ollama is fully supported for running local models with no API key required.
- Install Ollama from https://ollama.com.
- Pull a model: `ollama pull llama3.1` (or any model you want).
- Start Ollama (it runs on `http://localhost:11434` by default).
- Click the Refresh button in the CheddahBot UI. Your Ollama models will appear in the dropdown with an `[Ollama]` prefix.
Model IDs follow the format `local/ollama/<model-name>` (e.g., `local/ollama/llama3.1`).
LM Studio (Local)
LM Studio provides a local OpenAI-compatible API.
- Install LM Studio from https://lmstudio.ai.
- Load a model and start the local server (default: `http://localhost:1234`).
- Click Refresh in the CheddahBot UI. Your LM Studio models appear with an `[LM Studio]` prefix.
Model IDs follow the format `local/lmstudio/<model-id>`.
Claude Code CLI
Claude models (Sonnet, Opus, Haiku) are routed through the Claude Code CLI (`claude -p`), which uses your Anthropic Max subscription.
- Install Claude Code: `npm install -g @anthropic-ai/claude-code`
- Make sure `claude` is available in your `PATH`.
- Claude models will appear in the dropdown by default.
Important caveat: The Claude Code CLI is designed as a coding assistant. When invoked via claude -p, it does not fully respect custom system prompts -- it applies its own internal system prompt on top of whatever you provide. This means the personality defined in SOUL.md and the tool-use instructions may not be followed reliably when using Claude via this path. This is a known limitation of the CLI integration.
Recommendation: If you want full control over system prompts and behavior (which is important for the identity system, memory injection, and tool calling to work properly), use Claude models through OpenRouter instead. OpenRouter supports Claude models with standard OpenAI-compatible API semantics, giving you complete control over the system prompt.
Architecture
Directory Structure
```
CheddahBot/
  config.yaml            # Main configuration file
  .env                   # API keys and environment overrides
  requirements.txt       # Python dependencies
  identity/
    SOUL.md              # Agent personality definition
    USER.md              # User profile (filled in by you)
    HEARTBEAT.md         # Proactive checklist for heartbeat cycle
  memory/                # Runtime memory files (gitignored)
    MEMORY.md            # Long-term learned facts
    YYYY-MM-DD.md        # Daily logs
    embeddings.db        # Vector embeddings for semantic search
  data/
    cheddahbot.db        # SQLite database (conversations, tasks, KV store)
    uploads/             # User-uploaded files
    generated/           # Agent-generated files (TTS output, etc.)
    skills/              # User/agent-created skill modules
  cheddahbot/
    __main__.py          # Entry point (python -m cheddahbot)
    config.py            # Configuration loader
    db.py                # SQLite persistence layer
    llm.py               # Model-agnostic LLM adapter
    router.py            # System prompt builder and message formatter
    agent.py             # Core agent loop (LLM + tools + memory)
    memory.py            # 4-layer memory system
    ui.py                # Gradio web interface
    scheduler.py         # Task scheduler and heartbeat
    media.py             # Audio/video processing (STT, TTS, video frames)
    providers/           # Reserved for future custom providers
    tools/
      __init__.py        # Tool registry, @tool decorator, auto-discovery
      file_ops.py        # File read/write/edit/search tools
      shell.py           # Shell command execution
      web.py             # Web search and URL fetching
      code_exec.py       # Python code execution (sandboxed subprocess)
      calendar_tool.py   # Memory and scheduling tools
      image.py           # Image analysis via vision-capable LLM
      data_proc.py       # CSV and JSON processing
      build_tool.py      # Meta-tool: create new tools at runtime
      build_skill.py     # Meta-tool: create new skills at runtime
    skills/
      __init__.py        # Skill registry, @skill decorator, dynamic loader
```
Module Responsibilities
`__main__.py` -- Application entry point. Initializes configuration, database, LLM adapter, agent, memory system, tool system, and scheduler in sequence, then launches the Gradio UI.
`config.py` -- Loads configuration from `.env`, `config.yaml`, and built-in defaults using a layered override approach. Defines dataclasses for `Config`, `MemoryConfig`, `SchedulerConfig`, and `ShellConfig`. Creates required data directories on startup.
`db.py` -- SQLite persistence layer using WAL mode for concurrent access. Manages conversations, messages (with tool call metadata), scheduled tasks, task run logs, and a general-purpose key-value store. Thread-safe via `threading.local()`.
`llm.py` -- Model-agnostic LLM adapter that routes requests to the appropriate backend based on the model ID. Claude models go through the Claude Code CLI subprocess; all other models (OpenRouter, Ollama, LM Studio) go through the OpenAI Python SDK against the appropriate base URL. Handles streaming, tool call accumulation, and model discovery for local providers.
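This routing can be sketched with the model ID prefixes described in the provider sections; the actual dispatch logic in `llm.py` may differ in names and details:

```python
OPENROUTER_BASE = "https://openrouter.ai/api/v1"

def resolve_backend(model_id):
    """Return (backend, base_url) for a model ID. Claude IDs go to the CLI
    subprocess; local/* prefixes go to the matching OpenAI-compatible server;
    everything else defaults to OpenRouter."""
    if model_id.startswith("claude-"):
        return ("claude-cli", None)  # invoked via `claude -p`, no HTTP base URL
    if model_id.startswith("local/ollama/"):
        return ("openai-sdk", "http://localhost:11434/v1")
    if model_id.startswith("local/lmstudio/"):
        return ("openai-sdk", "http://localhost:1234/v1")
    return ("openai-sdk", OPENROUTER_BASE)
```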
`router.py` -- Builds the system prompt by concatenating identity files (SOUL.md, USER.md), memory context, tool descriptions, and core instructions. It also formats conversation history into the LLM message format.
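A rough sketch of that concatenation, with illustrative section headers (the real `router.py` may format the prompt differently):

```python
from pathlib import Path

def build_system_prompt(identity_dir, memory_context, tool_docs):
    """Concatenate identity files, memory context, and tool descriptions
    in the order described above. Section headers are illustrative."""
    parts = []
    for name in ("SOUL.md", "USER.md"):
        f = Path(identity_dir) / name
        if f.exists():
            parts.append(f.read_text(encoding="utf-8"))
    if memory_context:
        parts.append("## Memory\n" + memory_context)
    if tool_docs:
        parts.append("## Tools\n" + tool_docs)
    # Core instructions always go last
    parts.append("Follow the identity above in every response.")
    return "\n\n".join(parts)
```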
`agent.py` -- The core agent loop. On each user message it stores the message, builds the system prompt with memory context, calls the LLM, checks for tool calls, executes tools, feeds results back to the LLM, and repeats (up to 10 iterations). Handles streaming output to the UI and triggers memory auto-flush when conversation length exceeds the configured threshold.
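The loop can be sketched as follows; `call_llm` and `execute_tool` are stand-ins for the real LLM adapter and tool-registry calls, and the message shapes are simplified:

```python
MAX_ITERATIONS = 10  # matches the iteration cap described above

def agent_turn(call_llm, execute_tool, messages):
    """Minimal sketch of the agent loop: call the LLM, run any tool calls,
    feed results back, repeat. call_llm returns (text, tool_calls) where
    tool_calls is a list of (name, args) pairs."""
    text = ""
    for _ in range(MAX_ITERATIONS):
        text, tool_calls = call_llm(messages)
        if not tool_calls:
            return text  # plain answer -- the turn is done
        for name, args in tool_calls:
            result = execute_tool(name, args)
            # Tool results are appended so the next LLM call can see them
            messages.append({"role": "tool", "name": name, "content": result})
    return text  # give up after the iteration cap
```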
`memory.py` -- Implements the 4-layer memory system (see Memory System below). Manages long-term memory files, daily logs, embedding-based semantic search, conversation summarization, and reindexing.
`ui.py` -- Gradio interface with a chat panel, model dropdown with refresh, new chat button, multimodal input (text, file upload, microphone), voice chat accordion, conversation history browser, and settings section. Supports streaming responses.
`scheduler.py` -- Background thread that polls for due scheduled tasks (cron-based or one-time) and executes them by sending prompts to the agent. Includes a separate heartbeat thread that periodically reads HEARTBEAT.md and asks the agent to act on any items that need attention.
`media.py` -- Audio and video processing. Speech-to-text via local Whisper or the OpenAI Whisper API. Text-to-speech via edge-tts (free, no API key). Video frame extraction via ffmpeg.
`tools/__init__.py` -- Tool registry with a `@tool` decorator for registering functions, automatic parameter schema extraction from type hints, OpenAI function-calling schema generation, auto-discovery of tool modules via `pkgutil`, and runtime execution with context injection.
`skills/__init__.py` -- Skill registry with a `@skill` decorator, dynamic loading from `.py` files in the `skills/` directory, and runtime execution.
Memory System
CheddahBot uses a 4-layer memory architecture that gives the agent both persistent knowledge and contextual awareness.
Layer 1: Identity (SOUL.md + USER.md)
Static files in identity/ that define who the agent is and who the user is. These are loaded into the system prompt on every request. See Identity System.
Layer 2: Long-Term Memory (MEMORY.md)
A Markdown file at memory/MEMORY.md containing timestamped facts, preferences, and instructions the agent has learned. The agent writes to this file using the remember_this tool. The most recent 2000 characters are injected into the system prompt.
Example entries:
- [2025-06-15 14:30] User prefers tabs over spaces
- [2025-06-15 15:00] User's project deadline is June 30th
Layer 3: Daily Logs (YYYY-MM-DD.md)
Date-stamped Markdown files in memory/ that capture timestamped notes, conversation summaries, and heartbeat actions for each day. The agent writes to these using the log_note tool. Today's log (up to 1500 characters) is injected into the system prompt.
When conversation length exceeds the configured flush_threshold (default 40 messages), older messages are automatically summarized and moved to the daily log.
Layer 4: Semantic Search (Embeddings)
All memory entries are indexed using sentence-transformer embeddings (all-MiniLM-L6-v2 by default) and stored in memory/embeddings.db. On each user message, a semantic search is performed against the index, and the top-k most relevant memory fragments are injected into the system prompt.
If sentence-transformers is not installed, the system falls back to a keyword-based search over the Markdown files.
The reindex_all() method rebuilds the entire embedding index from all memory files.
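The keyword fallback can be pictured as a simple word-overlap scorer. This is an illustrative sketch, not the code in `memory.py`:

```python
def keyword_search(query, documents, top_k=5):
    """Fallback used when sentence-transformers is unavailable: score each
    memory fragment by how many query words it contains, highest first."""
    words = set(query.lower().split())
    scored = []
    for doc in documents:
        score = sum(1 for w in words if w in doc.lower())
        if score:  # drop fragments with no overlap at all
            scored.append((score, doc))
    scored.sort(key=lambda pair: -pair[0])
    return [doc for _, doc in scored[:top_k]]
```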
Tools Reference
Tools are registered using the @tool decorator and auto-discovered at startup. They are exposed to the LLM via OpenAI-compatible function-calling schema. The agent can chain multiple tool calls in a single response (up to 10 iterations).
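A minimal sketch of how such a decorator can build a function-calling schema from type hints. This is illustrative; the real registry in `tools/__init__.py` handles more cases (defaults, required fields, context injection):

```python
import inspect

TOOL_REGISTRY = {}
_TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool(fn):
    """Register a function and derive an OpenAI-style function-calling
    schema from its signature and docstring."""
    params = {}
    for name, p in inspect.signature(fn).parameters.items():
        params[name] = {"type": _TYPE_MAP.get(p.annotation, "string")}
    TOOL_REGISTRY[fn.__name__] = {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": params},
        },
    }
    return fn

@tool
def read_file(path: str) -> str:
    """Read the contents of a file."""
    with open(path, encoding="utf-8") as f:
        return f.read(50_000)  # 50K-char cap, as in the tools table
```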
Files
| Tool | Description |
|---|---|
| `read_file(path)` | Read the contents of a file (up to 50K chars) |
| `write_file(path, content)` | Write content to a file (creates or overwrites) |
| `edit_file(path, old_text, new_text)` | Replace the first occurrence of text in a file |
| `list_directory(path)` | List files and folders with sizes |
| `search_files(pattern, directory)` | Search for files matching a glob pattern |
| `search_in_files(query, directory, extension)` | Search for text content across files |
Shell
| Tool | Description |
|---|---|
| `run_command(command, timeout)` | Execute a shell command (with safety checks, max 120s) |
Blocked patterns include `rm -rf /`, `format c:`, fork bombs, `dd if=/dev/zero`, `mkfs.`, and writes to `/dev/sda`.
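A denylist check of this kind can be sketched as a substring scan (patterns taken from the list above; the exact matching in `shell.py` may differ):

```python
# Patterns from the list above; a denylist, not a sandbox -- anything
# that slips past still runs with the process's full permissions.
BLOCKED_PATTERNS = ["rm -rf /", "format c:", ":(){", "dd if=/dev/zero", "mkfs.", "/dev/sda"]

def is_blocked(command):
    """Return True if the command matches any known dangerous pattern."""
    lowered = command.lower()
    return any(p in lowered for p in BLOCKED_PATTERNS)
```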
Web
| Tool | Description |
|---|---|
| `web_search(query, max_results)` | Search the web via DuckDuckGo (no API key needed) |
| `fetch_url(url)` | Fetch and extract text content from a URL (HTML parsed, scripts/nav stripped) |
Code
| Tool | Description |
|---|---|
| `run_python(code, timeout)` | Execute Python code in a subprocess (max 60s) |
Memory
| Tool | Description |
|---|---|
| `remember_this(text)` | Save a fact or instruction to long-term memory (MEMORY.md) |
| `search_memory(query)` | Semantic search through saved memories |
| `log_note(text)` | Add a timestamped note to today's daily log |
Scheduling
| Tool | Description |
|---|---|
| `schedule_task(name, prompt, schedule)` | Schedule a recurring (cron) or one-time (`once:YYYY-MM-DDTHH:MM`) task |
| `list_tasks()` | List all scheduled tasks with status |
Media
| Tool | Description |
|---|---|
| `analyze_image(path, question)` | Analyze an image using the current vision-capable LLM |
Data
| Tool | Description |
|---|---|
| `read_csv(path, max_rows)` | Read a CSV file and display as a formatted table |
| `read_json(path)` | Read and pretty-print a JSON file |
| `query_json(path, json_path)` | Extract data from JSON using dot-notation (`data.users.0.name`) |
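Dot-notation resolution of this kind can be sketched in a few lines (illustrative, not the actual `data_proc.py` code):

```python
import json

def query_json_path(data, json_path):
    """Resolve a dot-notation path like 'data.users.0.name' against parsed
    JSON. Numeric segments index into lists; other segments are dict keys."""
    current = data
    for segment in json_path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current

doc = json.loads('{"data": {"users": [{"name": "Ada"}, {"name": "Linus"}]}}')
```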
Content
| Tool | Description |
|---|---|
| `write_press_releases(topic, company_name, ...)` | Full autonomous PR pipeline: generates headlines, writes 2 press releases with JSON-LD schemas, saves .txt + .docx files |
Delivery
| Tool | Description |
|---|---|
| `email_file(file_path, to, subject)` | Email a file as an attachment via Gmail SMTP. Auto-converts .txt to .docx before sending |
Meta
| Tool | Description |
|---|---|
| `build_tool(name, description, code)` | Create a new tool module at runtime (see below) |
| `build_skill(name, description, steps)` | Create a new multi-step skill at runtime (see below) |
Meta-Tools: Runtime Tool and Skill Creation
One of CheddahBot's distinctive features is that the agent can extend its own capabilities at runtime by writing new tools and skills.
build_tool
The build_tool meta-tool allows the agent to create a new tool by writing Python code with the @tool decorator. The code is saved as a new module in the cheddahbot/tools/ directory and hot-loaded immediately -- no restart required.
Example: if you ask "create a tool that counts words in a file", the agent will:
- Write a Python function with the `@tool` decorator.
- Save it to `cheddahbot/tools/word_counter.py`.
- Import and register it at runtime.
- The new tool is immediately available for use.
The generated module includes the necessary imports automatically. Tool names must be valid Python identifiers and cannot overwrite existing modules.
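For illustration, a module of the kind `build_tool` might generate for the word-count request above could look like this. The `tool` decorator below is a no-op stand-in so the sketch runs on its own; in CheddahBot it would come from the tools registry:

```python
# Hypothetical module the agent might save as cheddahbot/tools/word_counter.py.
# Stand-in decorator so this sketch is self-contained; the real one
# registers the function and generates its function-calling schema.
def tool(fn):
    return fn

@tool
def count_words(path: str) -> str:
    """Count the words in a text file."""
    with open(path, encoding="utf-8") as f:
        n = len(f.read().split())
    return f"{path} contains {n} words"
```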
build_skill
The build_skill meta-tool creates multi-step skills -- higher-level operations that combine multiple actions. Skills are saved to the skills/ directory and loaded via the skill registry.
Skills use the @skill decorator from the skills module and can orchestrate complex workflows.
Scheduler and Heartbeat
Scheduled Tasks
The scheduler runs as a background thread that polls the database for due tasks every 60 seconds (configurable via scheduler.poll_interval_seconds).
Tasks can be created by the agent using the schedule_task tool:
- Cron schedule -- Standard cron expressions (e.g., `0 9 * * *` for daily at 9 AM). The next run time is calculated after each execution.
- One-time -- Use the format `once:YYYY-MM-DDTHH:MM`. The task is automatically disabled after it runs.
When a task fires, its prompt is sent to the agent via respond_to_prompt, and the result is logged to the task_run_logs table.
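The `once:` format can be parsed with nothing but the standard library; this is a sketch, and cron expressions (which need a cron parser) are out of scope here:

```python
from datetime import datetime

def parse_once_schedule(schedule):
    """Parse a 'once:YYYY-MM-DDTHH:MM' schedule into its run time.
    Cron expressions are handled elsewhere, so they return None here."""
    if schedule.startswith("once:"):
        return datetime.strptime(schedule[len("once:"):], "%Y-%m-%dT%H:%M")
    return None

def is_due(schedule, now):
    """A one-time task is due once 'now' has reached its timestamp."""
    run_at = parse_once_schedule(schedule)
    return run_at is not None and now >= run_at
```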
Heartbeat
The heartbeat is a separate background thread that runs on a configurable interval (default: every 30 minutes). On each cycle, it:
- Reads `identity/HEARTBEAT.md` -- a checklist of things to proactively check.
- Sends the checklist to the agent as a prompt.
- If the agent determines nothing needs attention, it responds with `HEARTBEAT_OK` and no action is taken.
- If the agent takes action, the result is logged to the daily memory log.
The default heartbeat checklist includes checking for failed scheduled tasks, reviewing pending reminders, and checking disk space. You can customize HEARTBEAT.md with any proactive checks you want.
Voice Chat
CheddahBot supports a full voice conversation loop: speak, get a spoken response.
Speech-to-Text (STT)
Audio input is transcribed using Whisper. The system tries local Whisper first (if the whisper package is installed), then falls back to the OpenAI Whisper API.
Audio can be provided in two ways:
- Microphone input in the main chat -- audio files are automatically detected and transcribed, with the transcript appended to the message.
- Voice Chat accordion -- a dedicated record-and-respond mode.
Supported audio formats: WAV, MP3, OGG, WebM, M4A.
Text-to-Speech (TTS)
Responses are spoken using edge-tts, which is free and requires no API key. The default voice is en-US-AriaNeural. TTS output is saved to data/generated/voice_response.mp3 and played back automatically in the Voice Chat panel.
Install edge-tts:

```shell
pip install edge-tts
```
Video Frame Extraction
The media module also supports extracting key frames from video files using ffmpeg (used internally for video analysis workflows). Requires ffmpeg and ffprobe in your PATH.
Identity System
CheddahBot's identity is defined by three Markdown files in the identity/ directory.
SOUL.md
Defines the agent's personality, boundaries, and behavioral quirks. This is injected at the top of every system prompt.
Default personality traits:
- Direct and no-nonsense but warm
- Uses humor when appropriate
- Proactive -- suggests things before being asked
- Remembers and references past conversations naturally
Edit this file to customize the agent's personality to your liking.
USER.md
Your user profile. Contains your name, how you want to be addressed, your technical level, primary language, current projects, communication preferences, and anything else you want the agent to know about you.
Fill this in after installation -- the more context you provide, the more personalized the agent's responses will be.
HEARTBEAT.md
A checklist of proactive tasks for the heartbeat system. Each item is something the agent should check on periodically. See Scheduler and Heartbeat.
Known Issues and Limitations
Claude Code CLI System Prompt
The Claude Code CLI (`claude -p`) is designed as a coding assistant and applies its own internal system prompt. Custom system prompts passed via `--system-prompt` are appended but do not override the built-in behavior. This means:
- The SOUL.md personality may not be followed reliably.
- Tool-use instructions may be ignored or overridden.
- The agent may behave more like a coding assistant than a personal assistant.
Workaround: Use Claude models through OpenRouter instead of the CLI. OpenRouter provides standard API access to Claude with full system prompt control.
Claude Code CLI Does Not Support Streaming
The Claude Code CLI integration uses subprocess.Popen with communicate(), which means the entire response is collected before being displayed. There is no token-by-token streaming for Claude CLI responses. OpenRouter, Ollama, and LM Studio all support true streaming.
Claude Code CLI Tool Calling
Tool calling is not supported through the Claude Code CLI path. The `--tools ""` flag is passed to disable Claude Code's built-in tools, and CheddahBot's own tools are described in the system prompt rather than via function-calling schema. This makes tool use unreliable with the CLI backend. Again, OpenRouter is the recommended provider for full tool support.
Embedding Model Download
The first time the memory system initializes, it downloads the all-MiniLM-L6-v2 sentence-transformer model (approximately 80 MB). This requires an internet connection and may take a moment. Subsequent starts use the cached model.
If sentence-transformers is not installed, the memory system falls back to keyword-based search. Semantic search will not be available but everything else works.
Shell Command Safety
The shell tool blocks a set of known dangerous command patterns, but it is not a full sandbox. Commands run with the same permissions as the CheddahBot process. Exercise caution with the run_command tool, especially on production machines.
Conversation Context Window
The system keeps the most recent 50 messages (configurable via memory.max_context_messages) in the LLM context window. Older messages are summarized and moved to the daily log when the count exceeds flush_threshold (default 40). Very long conversations may lose fine-grained detail from earlier messages.
Single Conversation at a Time
The agent maintains one active conversation at a time in memory. You can start a new chat (which creates a new conversation in the database) and browse past conversations in the history panel, but there is no multi-user or multi-session support.
Local Model Limitations
Ollama and LM Studio models vary widely in their ability to follow tool-calling schemas. Smaller models may not reliably use tools. For best results with local models, use models that are known to support function calling (e.g., Llama 3.1+ instruct variants).