CheddahBot
A personal AI assistant built in Python with a Gradio web UI. CheddahBot supports multiple LLM providers (hot-swappable at runtime), a 4-layer memory system, 15+ built-in tools, a task scheduler with heartbeat, voice chat, and the ability for the agent to create new tools and skills on the fly.
The UI runs as a Progressive Web App (PWA), so you can install it on your phone and use it like a native app.
Table of Contents
- Features
- Quick Start
- Configuration
- Provider Setup
- Architecture
- Memory System
- Tools Reference
- Meta-Tools: Runtime Tool and Skill Creation
- Scheduler and Heartbeat
- Voice Chat
- Identity System
- Known Issues and Limitations
Features
- Multi-model support -- Claude (via Claude Code CLI), OpenRouter (GPT-4o, Gemini, Mistral, Llama, and more), Ollama (local), LM Studio (local). All hot-swappable from the UI dropdown at any time.
- Gradio web UI -- Clean chat interface with model switcher, conversation history, file uploads, microphone input, and camera. Launches as a PWA for mobile use.
- 4-layer memory -- Identity files (SOUL.md, USER.md), long-term memory (MEMORY.md), daily logs (YYYY-MM-DD.md), and semantic search over all memory via sentence-transformer embeddings.
- 15+ built-in tools -- File operations, shell commands, web search, URL fetching, Python code execution, image analysis, CSV/JSON processing, memory management, task scheduling.
- Meta-tools -- The agent can create entirely new tools and multi-step skills at runtime. New tools are written as Python modules and hot-loaded without restarting.
- Task scheduler -- Cron-based recurring tasks and one-time scheduled prompts. Includes a heartbeat system that periodically runs a proactive checklist.
- Voice chat -- Speech-to-text via Whisper (local or API) and text-to-speech via edge-tts. Record audio, get a spoken response.
- Persistent storage -- SQLite database for conversations, messages, scheduled tasks, and key-value storage. All conversations are saved and browsable.
- Streaming responses -- Responses stream token-by-token in the chat UI for all OpenAI-compatible providers.
Quick Start
Prerequisites
- Python 3.11 or later
- (Optional) Node.js / npm -- only needed if using the Claude Code CLI provider
- (Optional) Ollama or LM Studio -- for local model inference
- (Optional) ffmpeg -- for video frame extraction
Install
```shell
# Clone the repository
git clone <your-repo-url> CheddahBot
cd CheddahBot

# Create a virtual environment (recommended)
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS/Linux

# Install dependencies
pip install -r requirements.txt
```
Configure
Copy or edit the `.env` file in the project root:

```
# Required for OpenRouter (recommended primary provider)
OPENROUTER_API_KEY=your-key-here

# Optional overrides
# CHEDDAH_DEFAULT_MODEL=claude-sonnet-4-20250514
# CHEDDAH_HOST=0.0.0.0
# CHEDDAH_PORT=7860
```
Get an OpenRouter API key at https://openrouter.ai/keys.
Run
```shell
python -m cheddahbot
```
The Gradio UI launches at http://localhost:7860 by default. On your local network it is also accessible at `http://<your-ip>:7860`. The PWA can be installed from the browser on mobile devices.
Configuration
CheddahBot loads configuration in this priority order: environment variables (highest), then config.yaml, then built-in defaults.
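This layering can be sketched as a simple merge, highest priority applied last. The keys, defaults, and coercion here are illustrative, not the actual `config.py` implementation:

```python
import os

# Built-in defaults (lowest priority); keys and values are illustrative
DEFAULTS = {"host": "127.0.0.1", "port": 7860, "default_model": "openai/gpt-4o-mini"}

def load_config(yaml_values):
    """Merge defaults, config.yaml values, and CHEDDAH_* env vars, in that order."""
    merged = dict(DEFAULTS)
    # config.yaml values override the defaults
    merged.update({k: v for k, v in yaml_values.items() if v is not None})
    # Environment variables (highest priority): CHEDDAH_PORT -> "port", etc.
    for key in merged:
        env_val = os.environ.get(f"CHEDDAH_{key.upper()}")
        if env_val is not None:
            merged[key] = type(merged[key])(env_val)  # coerce to the existing type
    return merged
```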
config.yaml
Located at the project root. Controls server settings, memory parameters, scheduler timing, local model endpoints, and shell safety settings.
```yaml
# Default model to use on startup
default_model: "claude-sonnet-4-20250514"

# Gradio server settings
host: "0.0.0.0"
port: 7860

# Memory settings
memory:
  max_context_messages: 50   # Messages kept in the LLM context window
  flush_threshold: 40        # Auto-summarize when message count exceeds this
  embedding_model: "all-MiniLM-L6-v2"  # Sentence-transformer model for semantic search
  search_top_k: 5            # Number of semantic search results returned

# Scheduler settings
scheduler:
  heartbeat_interval_minutes: 30
  poll_interval_seconds: 60

# Local model endpoints (auto-detected)
ollama_url: "http://localhost:11434"
lmstudio_url: "http://localhost:1234"

# Safety settings
shell:
  blocked_commands:
    - "rm -rf /"
    - "format"
    - ":(){:|:&};:"
  require_approval: false  # If true, shell commands need user confirmation
```
.env
Environment variables with the `CHEDDAH_` prefix override `config.yaml` values; the remaining variables configure specific integrations:

| Variable | Description |
|---|---|
| `OPENROUTER_API_KEY` | Your OpenRouter API key (recommended) |
| `CHEDDAH_DEFAULT_MODEL` | Override the default model ID |
| `CHEDDAH_HOST` | Override the Gradio server host |
| `CHEDDAH_PORT` | Override the Gradio server port |
| `GMAIL_USERNAME` | Gmail address for sending emails (enables the email tool) |
| `GMAIL_APP_PASSWORD` | Gmail app password |
| `EMAIL_DEFAULT_TO` | Default recipient for the `email_file` tool |
Identity Files
See the Identity System section below.
Provider Setup
CheddahBot routes model requests to different backends based on the selected model ID. You can switch models at any time from the dropdown in the UI.
OpenRouter (Recommended)
OpenRouter is the recommended primary provider. It gives full control over system prompts, supports tool/function calling, and provides access to a wide range of models through a single API key -- including Claude, GPT-4o, Gemini, Mistral, Llama, and many others.
- Sign up at https://openrouter.ai and create an API key.
- Set `OPENROUTER_API_KEY` in your `.env` file.
- Select any OpenRouter model from the UI dropdown.
Pre-configured OpenRouter models:
| Model ID | Display Name |
|---|---|
| `openai/gpt-4o` | GPT-4o |
| `openai/gpt-4o-mini` | GPT-4o Mini |
| `google/gemini-2.0-flash-001` | Gemini 2.0 Flash |
| `google/gemini-2.5-pro-preview` | Gemini 2.5 Pro |
| `mistralai/mistral-large` | Mistral Large |
| `meta-llama/llama-3.3-70b-instruct` | Llama 3.3 70B |
You can use any model ID supported by OpenRouter -- the ones above are just the pre-populated dropdown entries.
Ollama (Local, Free)
Ollama is fully supported for running local models with no API key required.
- Install Ollama from https://ollama.com.
- Pull a model: `ollama pull llama3.1` (or any model you want).
- Start Ollama (it runs on `http://localhost:11434` by default).
- Click the Refresh button in the CheddahBot UI. Your Ollama models will appear in the dropdown with an `[Ollama]` prefix.
Model IDs follow the format `local/ollama/<model-name>` (e.g., `local/ollama/llama3.1`).
LM Studio (Local)
LM Studio provides a local OpenAI-compatible API.
- Install LM Studio from https://lmstudio.ai.
- Load a model and start the local server (default: `http://localhost:1234`).
- Click Refresh in the CheddahBot UI. Your LM Studio models appear with an `[LM Studio]` prefix.
Model IDs follow the format `local/lmstudio/<model-id>`.
Claude Code CLI
Claude models (Sonnet, Opus, Haiku) are routed through the Claude Code CLI (`claude -p`), which uses your Anthropic Max subscription.
- Install Claude Code: `npm install -g @anthropic-ai/claude-code`
- Make sure `claude` is available in your `PATH`.
- Claude models will appear in the dropdown by default.
Important caveat: The Claude Code CLI is designed as a coding assistant. When invoked via claude -p, it does not fully respect custom system prompts -- it applies its own internal system prompt on top of whatever you provide. This means the personality defined in SOUL.md and the tool-use instructions may not be followed reliably when using Claude via this path. This is a known limitation of the CLI integration.
Recommendation: If you want full control over system prompts and behavior (which is important for the identity system, memory injection, and tool calling to work properly), use Claude models through OpenRouter instead. OpenRouter supports Claude models with standard OpenAI-compatible API semantics, giving you complete control over the system prompt.
Architecture
Directory Structure
```
CheddahBot/
  config.yaml            # Main configuration file
  .env                   # API keys and environment overrides
  requirements.txt       # Python dependencies
  identity/
    SOUL.md              # Agent personality definition
    USER.md              # User profile (filled in by you)
    HEARTBEAT.md         # Proactive checklist for heartbeat cycle
  memory/                # Runtime memory files (gitignored)
    MEMORY.md            # Long-term learned facts
    YYYY-MM-DD.md        # Daily logs
    embeddings.db        # Vector embeddings for semantic search
  data/
    cheddahbot.db        # SQLite database (conversations, tasks, KV store)
    uploads/             # User-uploaded files
    generated/           # Agent-generated files (TTS output, etc.)
    skills/              # User/agent-created skill modules
  cheddahbot/
    __main__.py          # Entry point (python -m cheddahbot)
    config.py            # Configuration loader
    db.py                # SQLite persistence layer
    llm.py               # Model-agnostic LLM adapter
    router.py            # System prompt builder and message formatter
    agent.py             # Core agent loop (LLM + tools + memory)
    memory.py            # 4-layer memory system
    ui.py                # Gradio web interface
    scheduler.py         # Task scheduler and heartbeat
    media.py             # Audio/video processing (STT, TTS, video frames)
    providers/           # Reserved for future custom providers
    tools/
      __init__.py        # Tool registry, @tool decorator, auto-discovery
      file_ops.py        # File read/write/edit/search tools
      shell.py           # Shell command execution
      web.py             # Web search and URL fetching
      code_exec.py       # Python code execution (sandboxed subprocess)
      calendar_tool.py   # Memory and scheduling tools
      image.py           # Image analysis via vision-capable LLM
      data_proc.py       # CSV and JSON processing
      build_tool.py      # Meta-tool: create new tools at runtime
      build_skill.py     # Meta-tool: create new skills at runtime
    skills/
      __init__.py        # Skill registry, @skill decorator, dynamic loader
```
Module Responsibilities
`__main__.py` -- Application entry point. Initializes configuration, database, LLM adapter, agent, memory system, tool system, and scheduler in sequence, then launches the Gradio UI.
`config.py` -- Loads configuration from `.env`, `config.yaml`, and built-in defaults using a layered override approach. Defines dataclasses for `Config`, `MemoryConfig`, `SchedulerConfig`, and `ShellConfig`. Creates required data directories on startup.
`db.py` -- SQLite persistence layer using WAL mode for concurrent access. Manages conversations, messages (with tool call metadata), scheduled tasks, task run logs, and a general-purpose key-value store. Thread-safe via `threading.local()`.
`llm.py` -- Model-agnostic LLM adapter that routes requests to the appropriate backend based on the model ID. Claude models go through the Claude Code CLI subprocess; all other models (OpenRouter, Ollama, LM Studio) go through the OpenAI Python SDK against the appropriate base URL. Handles streaming, tool call accumulation, and model discovery for local providers.
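This routing can be sketched with the model ID prefixes described in the provider sections; the actual dispatch logic in `llm.py` may differ in names and details:

```python
OPENROUTER_BASE = "https://openrouter.ai/api/v1"

def resolve_backend(model_id):
    """Return (backend, base_url) for a model ID. Claude IDs go to the CLI
    subprocess; local/* prefixes go to the matching OpenAI-compatible server;
    everything else defaults to OpenRouter."""
    if model_id.startswith("claude-"):
        return ("claude-cli", None)  # invoked via `claude -p`, no HTTP base URL
    if model_id.startswith("local/ollama/"):
        return ("openai-sdk", "http://localhost:11434/v1")
    if model_id.startswith("local/lmstudio/"):
        return ("openai-sdk", "http://localhost:1234/v1")
    return ("openai-sdk", OPENROUTER_BASE)
```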
`router.py` -- Builds the system prompt by concatenating identity files (SOUL.md, USER.md), memory context, tool descriptions, and core instructions. It also formats conversation history into the LLM message format.
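A rough sketch of that concatenation, with illustrative section headers (the real `router.py` may format the prompt differently):

```python
from pathlib import Path

def build_system_prompt(identity_dir, memory_context, tool_docs):
    """Concatenate identity files, memory context, and tool descriptions
    in the order described above. Section headers are illustrative."""
    parts = []
    for name in ("SOUL.md", "USER.md"):
        f = Path(identity_dir) / name
        if f.exists():
            parts.append(f.read_text(encoding="utf-8"))
    if memory_context:
        parts.append("## Memory\n" + memory_context)
    if tool_docs:
        parts.append("## Tools\n" + tool_docs)
    # Core instructions always go last
    parts.append("Follow the identity above in every response.")
    return "\n\n".join(parts)
```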
`agent.py` -- The core agent loop. On each user message it stores the message, builds the system prompt with memory context, calls the LLM, checks for tool calls, executes tools, feeds results back to the LLM, and repeats (up to 10 iterations). Handles streaming output to the UI and triggers memory auto-flush when conversation length exceeds the configured threshold.
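The loop can be sketched as follows; `call_llm` and `execute_tool` are stand-ins for the real LLM adapter and tool-registry calls, and the message shapes are simplified:

```python
MAX_ITERATIONS = 10  # matches the iteration cap described above

def agent_turn(call_llm, execute_tool, messages):
    """Minimal sketch of the agent loop: call the LLM, run any tool calls,
    feed results back, repeat. call_llm returns (text, tool_calls) where
    tool_calls is a list of (name, args) pairs."""
    text = ""
    for _ in range(MAX_ITERATIONS):
        text, tool_calls = call_llm(messages)
        if not tool_calls:
            return text  # plain answer -- the turn is done
        for name, args in tool_calls:
            result = execute_tool(name, args)
            # Tool results are appended so the next LLM call can see them
            messages.append({"role": "tool", "name": name, "content": result})
    return text  # give up after the iteration cap
```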
`memory.py` -- Implements the 4-layer memory system (see Memory System below). Manages long-term memory files, daily logs, embedding-based semantic search, conversation summarization, and reindexing.
`ui.py` -- Gradio interface with a chat panel, model dropdown with refresh, new chat button, multimodal input (text, file upload, microphone), voice chat accordion, conversation history browser, and settings section. Supports streaming responses.
`scheduler.py` -- Background thread that polls for due scheduled tasks (cron-based or one-time) and executes them by sending prompts to the agent. Includes a separate heartbeat thread that periodically reads HEARTBEAT.md and asks the agent to act on any items that need attention.
`media.py` -- Audio and video processing. Speech-to-text via local Whisper or the OpenAI Whisper API. Text-to-speech via edge-tts (free, no API key). Video frame extraction via ffmpeg.
`tools/__init__.py` -- Tool registry with a `@tool` decorator for registering functions, automatic parameter schema extraction from type hints, OpenAI function-calling schema generation, auto-discovery of tool modules via `pkgutil`, and runtime execution with context injection.
`skills/__init__.py` -- Skill registry with a `@skill` decorator, dynamic loading from `.py` files in the `skills/` directory, and runtime execution.
Memory System
CheddahBot uses a 4-layer memory architecture that gives the agent both persistent knowledge and contextual awareness.
Layer 1: Identity (SOUL.md + USER.md)
Static files in identity/ that define who the agent is and who the user is. These are loaded into the system prompt on every request. See Identity System.
Layer 2: Long-Term Memory (MEMORY.md)
A Markdown file at memory/MEMORY.md containing timestamped facts, preferences, and instructions the agent has learned. The agent writes to this file using the remember_this tool. The most recent 2000 characters are injected into the system prompt.
Example entries:
- [2025-06-15 14:30] User prefers tabs over spaces
- [2025-06-15 15:00] User's project deadline is June 30th
Layer 3: Daily Logs (YYYY-MM-DD.md)
Date-stamped Markdown files in memory/ that capture timestamped notes, conversation summaries, and heartbeat actions for each day. The agent writes to these using the log_note tool. Today's log (up to 1500 characters) is injected into the system prompt.
When conversation length exceeds the configured flush_threshold (default 40 messages), older messages are automatically summarized and moved to the daily log.
Layer 4: Semantic Search (Embeddings)
All memory entries are indexed using sentence-transformer embeddings (all-MiniLM-L6-v2 by default) and stored in memory/embeddings.db. On each user message, a semantic search is performed against the index, and the top-k most relevant memory fragments are injected into the system prompt.
If sentence-transformers is not installed, the system falls back to a keyword-based search over the Markdown files.
The reindex_all() method rebuilds the entire embedding index from all memory files.
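The keyword fallback can be pictured as a simple word-overlap scorer. This is an illustrative sketch, not the code in `memory.py`:

```python
def keyword_search(query, documents, top_k=5):
    """Fallback used when sentence-transformers is unavailable: score each
    memory fragment by how many query words it contains, highest first."""
    words = set(query.lower().split())
    scored = []
    for doc in documents:
        score = sum(1 for w in words if w in doc.lower())
        if score:  # drop fragments with no overlap at all
            scored.append((score, doc))
    scored.sort(key=lambda pair: -pair[0])
    return [doc for _, doc in scored[:top_k]]
```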
Tools Reference
Tools are registered using the @tool decorator and auto-discovered at startup. They are exposed to the LLM via OpenAI-compatible function-calling schema. The agent can chain multiple tool calls in a single response (up to 10 iterations).
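A minimal sketch of how such a decorator can build a function-calling schema from type hints. This is illustrative; the real registry in `tools/__init__.py` handles more cases (defaults, required fields, context injection):

```python
import inspect

TOOL_REGISTRY = {}
_TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool(fn):
    """Register a function and derive an OpenAI-style function-calling
    schema from its signature and docstring."""
    params = {}
    for name, p in inspect.signature(fn).parameters.items():
        params[name] = {"type": _TYPE_MAP.get(p.annotation, "string")}
    TOOL_REGISTRY[fn.__name__] = {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": params},
        },
    }
    return fn

@tool
def read_file(path: str) -> str:
    """Read the contents of a file."""
    with open(path, encoding="utf-8") as f:
        return f.read(50_000)  # 50K-char cap, as in the tools table
```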
Files
| Tool | Description |
|---|---|
| `read_file(path)` | Read the contents of a file (up to 50K chars) |
| `write_file(path, content)` | Write content to a file (creates or overwrites) |
| `edit_file(path, old_text, new_text)` | Replace the first occurrence of text in a file |
| `list_directory(path)` | List files and folders with sizes |
| `search_files(pattern, directory)` | Search for files matching a glob pattern |
| `search_in_files(query, directory, extension)` | Search for text content across files |
Shell
| Tool | Description |
|---|---|
| `run_command(command, timeout)` | Execute a shell command (with safety checks, max 120s) |
Blocked patterns include `rm -rf /`, `format c:`, fork bombs, `dd if=/dev/zero`, `mkfs.`, and writes to `/dev/sda`.
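A denylist check of this kind can be sketched as a substring scan (patterns taken from the list above; the exact matching in `shell.py` may differ):

```python
# Patterns from the list above; a denylist, not a sandbox -- anything
# that slips past still runs with the process's full permissions.
BLOCKED_PATTERNS = ["rm -rf /", "format c:", ":(){", "dd if=/dev/zero", "mkfs.", "/dev/sda"]

def is_blocked(command):
    """Return True if the command matches any known dangerous pattern."""
    lowered = command.lower()
    return any(p in lowered for p in BLOCKED_PATTERNS)
```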
Web
| Tool | Description |
|---|---|
| `web_search(query, max_results)` | Search the web via DuckDuckGo (no API key needed) |
| `fetch_url(url)` | Fetch and extract text content from a URL (HTML parsed, scripts/nav stripped) |
Code
| Tool | Description |
|---|---|
| `run_python(code, timeout)` | Execute Python code in a subprocess (max 60s) |
Memory
| Tool | Description |
|---|---|
| `remember_this(text)` | Save a fact or instruction to long-term memory (MEMORY.md) |
| `search_memory(query)` | Semantic search through saved memories |
| `log_note(text)` | Add a timestamped note to today's daily log |
Scheduling
| Tool | Description |
|---|---|
| `schedule_task(name, prompt, schedule)` | Schedule a recurring (cron) or one-time (`once:YYYY-MM-DDTHH:MM`) task |
| `list_tasks()` | List all scheduled tasks with status |
Media
| Tool | Description |
|---|---|
| `analyze_image(path, question)` | Analyze an image using the current vision-capable LLM |
Data
| Tool | Description |
|---|---|
| `read_csv(path, max_rows)` | Read a CSV file and display as a formatted table |
| `read_json(path)` | Read and pretty-print a JSON file |
| `query_json(path, json_path)` | Extract data from JSON using dot-notation (`data.users.0.name`) |
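Dot-notation resolution of this kind can be sketched in a few lines (illustrative, not the actual `data_proc.py` code):

```python
import json

def query_json_path(data, json_path):
    """Resolve a dot-notation path like 'data.users.0.name' against parsed
    JSON. Numeric segments index into lists; other segments are dict keys."""
    current = data
    for segment in json_path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current

doc = json.loads('{"data": {"users": [{"name": "Ada"}, {"name": "Linus"}]}}')
```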
Content
| Tool | Description |
|---|---|
| `write_press_releases(topic, company_name, ...)` | Full autonomous PR pipeline: generates headlines, writes 2 press releases with JSON-LD schemas, saves .txt + .docx files |
Delivery
| Tool | Description |
|---|---|
| `email_file(file_path, to, subject)` | Email a file as an attachment via Gmail SMTP. Auto-converts .txt to .docx before sending |
Meta
| Tool | Description |
|---|---|
| `build_tool(name, description, code)` | Create a new tool module at runtime (see below) |
| `build_skill(name, description, steps)` | Create a new multi-step skill at runtime (see below) |
Meta-Tools: Runtime Tool and Skill Creation
One of CheddahBot's distinctive features is that the agent can extend its own capabilities at runtime by writing new tools and skills.
build_tool
The build_tool meta-tool allows the agent to create a new tool by writing Python code with the @tool decorator. The code is saved as a new module in the cheddahbot/tools/ directory and hot-loaded immediately -- no restart required.
Example: if you ask "create a tool that counts words in a file", the agent will:
- Write a Python function with the `@tool` decorator.
- Save it to `cheddahbot/tools/word_counter.py`.
- Import and register it at runtime.
- The new tool is immediately available for use.
The generated module includes the necessary imports automatically. Tool names must be valid Python identifiers and cannot overwrite existing modules.
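For illustration, a module of the kind `build_tool` might generate for the word-count request above could look like this. The `tool` decorator below is a no-op stand-in so the sketch runs on its own; in CheddahBot it would come from the tools registry:

```python
# Hypothetical module the agent might save as cheddahbot/tools/word_counter.py.
# Stand-in decorator so this sketch is self-contained; the real one
# registers the function and generates its function-calling schema.
def tool(fn):
    return fn

@tool
def count_words(path: str) -> str:
    """Count the words in a text file."""
    with open(path, encoding="utf-8") as f:
        n = len(f.read().split())
    return f"{path} contains {n} words"
```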
build_skill
The build_skill meta-tool creates multi-step skills -- higher-level operations that combine multiple actions. Skills are saved to the skills/ directory and loaded via the skill registry.
Skills use the @skill decorator from the skills module and can orchestrate complex workflows.
Scheduler and Heartbeat
Scheduled Tasks
The scheduler runs as a background thread that polls the database for due tasks every 60 seconds (configurable via scheduler.poll_interval_seconds).
Tasks can be created by the agent using the schedule_task tool:
- Cron schedule -- Standard cron expressions (e.g., `0 9 * * *` for daily at 9 AM). The next run time is calculated after each execution.
- One-time -- Use the format `once:YYYY-MM-DDTHH:MM`. The task is automatically disabled after it runs.
When a task fires, its prompt is sent to the agent via respond_to_prompt, and the result is logged to the task_run_logs table.
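The `once:` format can be parsed with nothing but the standard library; this is a sketch, and cron expressions (which need a cron parser) are out of scope here:

```python
from datetime import datetime

def parse_once_schedule(schedule):
    """Parse a 'once:YYYY-MM-DDTHH:MM' schedule into its run time.
    Cron expressions are handled elsewhere, so they return None here."""
    if schedule.startswith("once:"):
        return datetime.strptime(schedule[len("once:"):], "%Y-%m-%dT%H:%M")
    return None

def is_due(schedule, now):
    """A one-time task is due once 'now' has reached its timestamp."""
    run_at = parse_once_schedule(schedule)
    return run_at is not None and now >= run_at
```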
Heartbeat
The heartbeat is a separate background thread that runs on a configurable interval (default: every 30 minutes). On each cycle, it:
- Reads `identity/HEARTBEAT.md` -- a checklist of things to proactively check.
- Sends the checklist to the agent as a prompt.
- If the agent determines nothing needs attention, it responds with `HEARTBEAT_OK` and no action is taken.
- If the agent takes action, the result is logged to the daily memory log.
The default heartbeat checklist includes checking for failed scheduled tasks, reviewing pending reminders, and checking disk space. You can customize HEARTBEAT.md with any proactive checks you want.
Voice Chat
CheddahBot supports a full voice conversation loop: speak, get a spoken response.
Speech-to-Text (STT)
Audio input is transcribed using Whisper. The system tries local Whisper first (if the whisper package is installed), then falls back to the OpenAI Whisper API.
Audio can be provided in two ways:
- Microphone input in the main chat -- audio files are automatically detected and transcribed, with the transcript appended to the message.
- Voice Chat accordion -- a dedicated record-and-respond mode.
Supported audio formats: WAV, MP3, OGG, WebM, M4A.
Text-to-Speech (TTS)
Responses are spoken using edge-tts, which is free and requires no API key. The default voice is en-US-AriaNeural. TTS output is saved to data/generated/voice_response.mp3 and played back automatically in the Voice Chat panel.
Install edge-tts:

```shell
pip install edge-tts
```
Video Frame Extraction
The media module also supports extracting key frames from video files using ffmpeg (used internally for video analysis workflows). Requires ffmpeg and ffprobe in your PATH.
Identity System
CheddahBot's identity is defined by three Markdown files in the identity/ directory.
SOUL.md
Defines the agent's personality, boundaries, and behavioral quirks. This is injected at the top of every system prompt.
Default personality traits:
- Direct and no-nonsense but warm
- Uses humor when appropriate
- Proactive -- suggests things before being asked
- Remembers and references past conversations naturally
Edit this file to customize the agent's personality to your liking.
USER.md
Your user profile. Contains your name, how you want to be addressed, your technical level, primary language, current projects, communication preferences, and anything else you want the agent to know about you.
Fill this in after installation -- the more context you provide, the more personalized the agent's responses will be.
HEARTBEAT.md
A checklist of proactive tasks for the heartbeat system. Each item is something the agent should check on periodically. See Scheduler and Heartbeat.
Known Issues and Limitations
Claude Code CLI System Prompt
The Claude Code CLI (`claude -p`) is designed as a coding assistant and applies its own internal system prompt. Custom system prompts passed via `--system-prompt` are appended but do not override the built-in behavior. This means:
- The SOUL.md personality may not be followed reliably.
- Tool-use instructions may be ignored or overridden.
- The agent may behave more like a coding assistant than a personal assistant.
Workaround: Use Claude models through OpenRouter instead of the CLI. OpenRouter provides standard API access to Claude with full system prompt control.
Claude Code CLI Does Not Support Streaming
The Claude Code CLI integration uses subprocess.Popen with communicate(), which means the entire response is collected before being displayed. There is no token-by-token streaming for Claude CLI responses. OpenRouter, Ollama, and LM Studio all support true streaming.
Claude Code CLI Tool Calling
Tool calling is not supported through the Claude Code CLI path. The `--tools ""` flag is passed to disable Claude Code's built-in tools, and CheddahBot's own tools are described in the system prompt rather than via function-calling schema. This makes tool use unreliable with the CLI backend. Again, OpenRouter is the recommended provider for full tool support.
Embedding Model Download
The first time the memory system initializes, it downloads the all-MiniLM-L6-v2 sentence-transformer model (approximately 80 MB). This requires an internet connection and may take a moment. Subsequent starts use the cached model.
If sentence-transformers is not installed, the memory system falls back to keyword-based search. Semantic search will not be available but everything else works.
Shell Command Safety
The shell tool blocks a set of known dangerous command patterns, but it is not a full sandbox. Commands run with the same permissions as the CheddahBot process. Exercise caution with the run_command tool, especially on production machines.
Conversation Context Window
The system keeps the most recent 50 messages (configurable via memory.max_context_messages) in the LLM context window. Older messages are summarized and moved to the daily log when the count exceeds flush_threshold (default 40). Very long conversations may lose fine-grained detail from earlier messages.
Single Conversation at a Time
The agent maintains one active conversation at a time in memory. You can start a new chat (which creates a new conversation in the database) and browse past conversations in the history panel, but there is no multi-user or multi-session support.
Local Model Limitations
Ollama and LM Studio models vary widely in their ability to follow tool-calling schemas. Smaller models may not reliably use tools. For best results with local models, use models that are known to support function calling (e.g., Llama 3.1+ instruct variants).