CheddahBot Architecture

System Overview

CheddahBot is a personal AI assistant built in Python. It exposes a Gradio-based web UI, routes user messages through an agent loop backed by a model-agnostic LLM adapter, persists conversations in SQLite, maintains a 4-layer memory system with optional semantic search, and provides an extensible tool registry that the LLM can invoke mid-conversation. A background scheduler handles cron-based tasks and periodic heartbeat checks.

Data Flow Diagram

User (browser)
     |
     v
+-----------+      +------------+      +--------------+
| Gradio UI | ---> |   Agent    | ---> |  LLM Adapter |
|  (ui.py)  |      | (agent.py) |      |   (llm.py)   |
+-----------+      +-----+------+      +------+-------+
                         |                    |
            +------------+-------+     +------+---------+
            |            |       |     | Claude CLI     |
            v            v       v     | OpenRouter     |
      +---------+  +---------+ +-----+ | Ollama         |
      | Router  |  | Tools   | | DB  | | LM Studio      |
      |(router) |  |(tools/) | |(db) | +----------------+
      +-----+---+  +-----+---+ +-----+
            |            |
    +-------+--+    +----+----+
    | Identity |    | Memory  |
    | SOUL.md  |    | System  |
    | USER.md  |    |(memory) |
    +----------+    +---------+

  1. The user submits text (or voice / files) through the Gradio interface.
  2. ui.py hands the message to Agent.respond().
  3. The agent stores the user message in SQLite, builds a system prompt via router.py (loading identity files and memory context), and formats the conversation history.
  4. The agent sends messages to LLMAdapter.chat() which dispatches to the correct provider backend.
  5. The LLM response streams back. If it contains tool-call requests, the agent executes them through ToolRegistry.execute(), appends the results, and loops back to step 4 (up to 10 iterations).
  6. The final assistant response is stored in the database and streamed to the UI.
  7. After responding, the agent checks whether the conversation has exceeded the flush threshold; if so, the memory system summarizes older messages into the daily log.

Module-by-Module Breakdown

__main__.py -- Entry Point

File: cheddahbot/__main__.py

Orchestrates startup in this order:

  1. load_config() -- loads configuration from env vars / YAML / defaults.
  2. Database(config.db_path) -- opens (or creates) the SQLite database.
  3. LLMAdapter(...) -- initializes the model-agnostic LLM client.
  4. Agent(config, db, llm) -- creates the core agent.
  5. MemorySystem(config, db) -- initializes the memory system and injects it into the agent via agent.set_memory().
  6. ToolRegistry(config, db, agent) -- auto-discovers and loads all tool modules, then injects via agent.set_tools().
  7. Scheduler(config, db, agent) -- starts two daemon threads (task poller and heartbeat).
  8. create_ui(agent, config, llm) -- builds the Gradio Blocks app and launches it on the configured host/port.

Each subsystem (memory, tools, scheduler) is wrapped in a try/except so the application degrades gracefully if optional dependencies are missing.


config.py -- Configuration

File: cheddahbot/config.py

Defines four dataclasses:

| Dataclass | Key Fields |
|---|---|
| Config | default_model, host, port, ollama_url, lmstudio_url, openrouter_api_key, plus derived paths (root_dir, data_dir, identity_dir, memory_dir, skills_dir, db_path) |
| MemoryConfig | max_context_messages (50), flush_threshold (40), embedding_model ("all-MiniLM-L6-v2"), search_top_k (5) |
| SchedulerConfig | heartbeat_interval_minutes (30), poll_interval_seconds (60) |
| ShellConfig | blocked_commands, require_approval (False) |

load_config() applies three layers of configuration in priority order:

  1. Dataclass defaults (lowest priority).
  2. config.yaml at the project root (middle priority).
  3. Environment variables with the CHEDDAH_ prefix, plus OPENROUTER_API_KEY (highest priority).

The function also ensures required data directories exist on disk.
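The layering can be sketched as follows. The dataclass is heavily trimmed and the default values are illustrative stand-ins, not the project's actual defaults:

```python
import os
from dataclasses import dataclass, fields

# Hypothetical trimmed-down Config; the real dataclass has many more fields.
@dataclass
class Config:
    default_model: str = "claude-sonnet-4"
    host: str = "127.0.0.1"
    port: int = 7860

def load_config(yaml_values=None):
    """Apply the three layers: defaults < config.yaml < CHEDDAH_* env vars."""
    cfg = Config()  # layer 1: dataclass defaults
    for key, value in (yaml_values or {}).items():  # layer 2: parsed config.yaml
        if hasattr(cfg, key):
            setattr(cfg, key, value)
    for f in fields(cfg):  # layer 3: environment variables win
        raw = os.environ.get(f"CHEDDAH_{f.name.upper()}")
        if raw is not None:
            setattr(cfg, f.name, f.type(raw))  # coerce to the annotated type
    return cfg
```

A value set in config.yaml overrides the dataclass default, and a CHEDDAH_-prefixed environment variable overrides both.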


db.py -- Database Layer

File: cheddahbot/db.py

A thin wrapper around SQLite using thread-local connections (one connection per thread), WAL journal mode, and foreign keys.

Key methods:

  • create_conversation(conv_id, title) -- insert a new conversation row.
  • list_conversations(limit) -- return recent conversations ordered by updated_at.
  • add_message(conv_id, role, content, ...) -- insert a message and touch the conversation's updated_at.
  • get_messages(conv_id, limit) -- return messages in chronological order.
  • count_messages(conv_id) -- count messages for flush-threshold checks.
  • add_scheduled_task(name, prompt, schedule) -- persist a scheduled task.
  • get_due_tasks() -- return tasks whose next_run is in the past or NULL.
  • update_task_next_run(task_id, next_run) -- update the next execution time.
  • log_task_run(task_id, result, error) -- record the outcome of a task run.
  • kv_set(key, value) / kv_get(key) -- generic key-value store.
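The thread-local connection pattern plus the kv_store helpers can be sketched like this; the schema bootstrap is reduced to a single table for illustration:

```python
import sqlite3
import threading

class Database:
    """One SQLite connection per thread, WAL mode, foreign keys on (sketch)."""

    def __init__(self, db_path: str):
        self.db_path = db_path
        self._local = threading.local()
        # Simplified bootstrap; the real code creates all five tables.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv_store (key TEXT PRIMARY KEY, value TEXT)"
        )

    @property
    def conn(self) -> sqlite3.Connection:
        # Lazily open one connection per calling thread.
        if not hasattr(self._local, "conn"):
            conn = sqlite3.connect(self.db_path)
            conn.row_factory = sqlite3.Row
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute("PRAGMA foreign_keys=ON")
            self._local.conn = conn
        return self._local.conn

    def kv_set(self, key: str, value: str) -> None:
        self.conn.execute(
            "INSERT INTO kv_store (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, value),
        )
        self.conn.commit()

    def kv_get(self, key: str):
        row = self.conn.execute(
            "SELECT value FROM kv_store WHERE key = ?", (key,)
        ).fetchone()
        return row["value"] if row else None
```

Because each thread lazily opens its own connection to the same database file, scheduler threads and Gradio request threads never share a sqlite3.Connection object.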

agent.py -- Core Agent Loop

File: cheddahbot/agent.py

Contains the Agent class, the central coordinator.

Key members:

  • conv_id -- current conversation ID (a 12-character hex string).
  • _memory -- optional MemorySystem reference.
  • _tools -- optional ToolRegistry reference.

Primary method: respond(user_input, files)

This is a Python generator that yields text chunks for streaming. The detailed flow is described in the next section.

Helper: respond_to_prompt(prompt)

Non-streaming wrapper that collects all chunks and returns a single string. Used by the scheduler and heartbeat for internal prompts.
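In sketch form, the wrapper just drains the generator; the canned chunks here stand in for real LLM output:

```python
class Agent:
    def respond(self, user_input: str):
        """Streaming path: a generator of text chunks (simplified stand-in)."""
        for chunk in ("Scheduled ", "task ", "done."):
            yield chunk

    def respond_to_prompt(self, prompt: str) -> str:
        """Non-streaming wrapper used by the scheduler and heartbeat."""
        return "".join(self.respond(prompt))
```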


router.py -- System Prompt Builder

File: cheddahbot/router.py

Two functions:

  1. build_system_prompt(identity_dir, memory_context, tools_description) -- assembles the full system prompt by concatenating these sections separated by horizontal rules:

    • Contents of identity/SOUL.md
    • Contents of identity/USER.md
    • Memory context string (from the memory system)
    • Tools description listing (from the tool registry)
    • A fixed "Instructions" section with core behavioral directives.
  2. format_messages_for_llm(system_prompt, history, max_messages) -- converts raw database rows into the [{role, content}] format expected by the LLM. The system prompt becomes the first message. Tool results are converted to user messages prefixed with [Tool Result]. History is trimmed to the most recent max_messages entries.
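A minimal sketch of that conversion, with plain dicts standing in for sqlite3.Row objects:

```python
def format_messages_for_llm(system_prompt, history, max_messages=50):
    """Convert DB rows into the [{role, content}] chat format (sketch)."""
    messages = [{"role": "system", "content": system_prompt}]
    for row in history[-max_messages:]:  # keep only the most recent entries
        role, content = row["role"], row["content"]
        if role == "tool":
            # Tool results are replayed as user messages with a marker prefix.
            messages.append({"role": "user", "content": f"[Tool Result]\n{content}"})
        else:
            messages.append({"role": role, "content": content})
    return messages
```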


llm.py -- LLM Adapter

File: cheddahbot/llm.py

Described in detail in a dedicated section below.


memory.py -- Memory System

File: cheddahbot/memory.py

Described in detail in a dedicated section below.


media.py -- Audio/Video Processing

File: cheddahbot/media.py

Three utility functions:

  • transcribe_audio(path) -- Speech-to-text. Tries local Whisper first, then falls back to the OpenAI Whisper API.
  • text_to_speech(text, output_path, voice) -- Text-to-speech via edge-tts (free, no API key). Defaults to the en-US-AriaNeural voice.
  • extract_video_frames(video_path, max_frames) -- Extracts key frames from video using ffprobe (to get duration) and ffmpeg (to extract JPEG frames).

scheduler.py -- Scheduler and Heartbeat

File: cheddahbot/scheduler.py

Described in detail in a dedicated section below.


ui.py -- Gradio Web Interface

File: cheddahbot/ui.py

Builds a Gradio Blocks application with:

  • A model dropdown (populated from llm.list_available_models()) with a refresh button and a "New Chat" button.
  • A gr.Chatbot widget for the conversation (500px height, copy buttons).
  • A gr.MultimodalTextbox supporting text, file upload, and microphone input.
  • A "Voice Chat" accordion for record-and-respond audio interaction.
  • A "Conversation History" accordion showing past conversations from the database.
  • A "Settings" accordion with guidance on editing identity and config files.

Event wiring:

  • Model dropdown change calls llm.switch_model().
  • Refresh button re-discovers local models.
  • Message submit calls agent.respond() in streaming mode, updating the chatbot widget with each chunk.
  • Audio files attached to messages are transcribed via media.transcribe_audio() before being sent to the agent.
  • Voice Chat records audio, transcribes it, gets a text response from the agent, converts it to speech via media.text_to_speech(), and plays it back.

tools/__init__.py -- Tool Registry

File: cheddahbot/tools/__init__.py

Described in detail in a dedicated section below.


skills/__init__.py -- Skill Registry

File: cheddahbot/skills/__init__.py

Defines a parallel registry for "skills" (multi-step operations). Key pieces:

  • SkillDef -- dataclass holding name, description, func.
  • @skill(name, description) -- decorator that registers a skill in the global _SKILLS dict.
  • load_skill(path) -- dynamically loads a .py file as a module (triggering any @skill decorators inside it).
  • discover_skills(skills_dir) -- loads all .py files from the skills directory.
  • list_skills() / run_skill(name, **kwargs) -- query and execute skills.
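The registry pattern is small enough to sketch in full; the greet skill is a made-up example:

```python
from dataclasses import dataclass
from typing import Callable

_SKILLS: dict = {}

@dataclass
class SkillDef:
    name: str
    description: str
    func: Callable

def skill(name: str, description: str):
    """Register the decorated function in the global _SKILLS dict."""
    def decorator(func):
        _SKILLS[name] = SkillDef(name, description, func)
        return func
    return decorator

def list_skills():
    return [(s.name, s.description) for s in _SKILLS.values()]

def run_skill(name: str, **kwargs):
    return _SKILLS[name].func(**kwargs)

# Example skill registered at import time, just as load_skill() would trigger it.
@skill("greet", "Say hello to someone")
def greet(who: str) -> str:
    return f"Hello, {who}!"
```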

providers/__init__.py -- Provider Extensions

File: cheddahbot/providers/__init__.py

Reserved for future custom provider implementations. Currently empty.


The Agent Loop in Detail

When Agent.respond(user_input) is called, the following sequence occurs:

1. ensure_conversation()
   |-- Creates a new conversation in the DB if one doesn't exist
   |
2. db.add_message(conv_id, "user", user_input)
   |-- Persists the user's message
   |
3. Build system prompt
   |-- memory.get_context(user_input)  --> memory context string
   |-- tools.get_tools_schema()        --> OpenAI-format JSON schemas
   |-- tools.get_tools_description()   --> human-readable tool list
   |-- router.build_system_prompt(identity_dir, memory_context, tools_description)
   |
4. Load conversation history from DB
   |-- db.get_messages(conv_id, limit=max_context_messages)
   |-- router.format_messages_for_llm(system_prompt, history, max_messages)
   |
5. AGENT LOOP (up to MAX_TOOL_ITERATIONS = 10):
   |
   |-- llm.chat(messages, tools=tools_schema, stream=True)
   |     |-- Yields {"type":"text","content":"..."} chunks --> streamed to user
   |     |-- Yields {"type":"tool_use","name":"...","input":{...}} chunks
   |
   |-- If no tool_calls: store assistant message, BREAK
   |
   |-- If tool_calls present:
   |     |-- Store assistant message with tool_calls metadata
   |     |-- For each tool call:
   |     |     |-- yield "Using tool: <name>" indicator
   |     |     |-- tools.execute(name, input) --> result string
   |     |     |-- yield tool result (truncated to 2000 chars)
   |     |     |-- db.add_message(conv_id, "tool", result)
   |     |     |-- Append result to messages as user message
   |     |-- Continue loop (LLM sees tool results and can respond or call more tools)
   |
6. After loop: check if memory flush is needed
   |-- If message count > flush_threshold:
   |     |-- memory.auto_flush(conv_id)

The loop allows the LLM to chain up to 10 consecutive tool calls before being cut off. Each tool result is injected back into the conversation as a user message so the LLM can reason about it in the next iteration.
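The control flow above can be condensed into a sketch; fake_llm is a stand-in that requests one tool call and then answers, and the real loop additionally streams chunks and persists messages:

```python
MAX_TOOL_ITERATIONS = 10

def fake_llm(messages, tools=None):
    """Stub LLM: asks for a tool once, then gives a final answer."""
    if not any(m["content"].startswith("[Tool Result]") for m in messages):
        yield {"type": "tool_use", "name": "get_time", "input": {}}
    else:
        yield {"type": "text", "content": "It is noon."}

def agent_loop(messages, execute_tool):
    """Simplified step 5: loop until the LLM stops calling tools."""
    chunks = []
    for _ in range(MAX_TOOL_ITERATIONS):
        tool_calls = []
        for event in fake_llm(messages):
            if event["type"] == "text":
                chunks.append(event["content"])
            elif event["type"] == "tool_use":
                tool_calls.append(event)
        if not tool_calls:
            break  # final answer reached, no more tools requested
        for call in tool_calls:
            result = execute_tool(call["name"], call["input"])
            # Feed the result back as a user message for the next iteration.
            messages.append({"role": "user", "content": f"[Tool Result]\n{result}"})
    return "".join(chunks)
```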


LLM Adapter Design

File: cheddahbot/llm.py

Provider Routing

The LLMAdapter supports four provider paths. The active provider is determined by examining the current model ID:

| Model ID Pattern | Provider | Backend |
|---|---|---|
| claude-* | claude | Claude Code CLI (subprocess) |
| local/ollama/<model> | ollama | Ollama HTTP API (OpenAI-compat) |
| local/lmstudio/<model> | lmstudio | LM Studio HTTP API (OpenAI-compat) |
| Anything else | openrouter | OpenRouter API (OpenAI-compat) |

The chat() Method

This is the single entry point. It accepts a list of messages, an optional tools schema, and a stream flag. It returns a generator yielding dictionaries:

  • {"type": "text", "content": "..."} -- a text chunk to display.
  • {"type": "tool_use", "id": "...", "name": "...", "input": {...}} -- a tool invocation request.

Claude Code CLI Path (_chat_claude_sdk)

For Claude models, CheddahBot shells out to the claude CLI binary (the Claude Code SDK):

  1. Separates system prompt, conversation history, and the latest user message from the messages list.
  2. Builds a full system prompt by appending conversation history under a "Conversation So Far" heading.
  3. Invokes claude -p <prompt> --model <model> --output-format json --system-prompt <system>.
  4. The CLAUDECODE environment variable is stripped from the subprocess environment to avoid nested-session errors.
  5. Parses the JSON output and yields the result field as a text chunk.
  6. On Windows, shell=True is used for compatibility with npm-installed binaries.
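Steps 3 and 4 can be sketched as command and environment construction; the helper name is hypothetical, and the real code passes the result to subprocess.run:

```python
import os

def build_claude_invocation(prompt: str, model: str, system_prompt: str):
    """Assemble the claude CLI command and a sanitized environment (sketch)."""
    cmd = [
        "claude", "-p", prompt,
        "--model", model,
        "--output-format", "json",
        "--system-prompt", system_prompt,
    ]
    env = os.environ.copy()
    env.pop("CLAUDECODE", None)  # strip to avoid nested-session errors
    return cmd, env
```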

OpenAI-Compatible Path (_chat_openai_sdk)

For OpenRouter, Ollama, and LM Studio, the adapter uses the openai Python SDK:

  1. _resolve_endpoint(provider) returns the base URL and API key:
    • OpenRouter: https://openrouter.ai/api/v1 with the configured API key.
    • Ollama: http://localhost:11434/v1 with dummy key "ollama".
    • LM Studio: http://localhost:1234/v1 with dummy key "lm-studio".
  2. _resolve_model_id(provider) strips the local/ollama/ or local/lmstudio/ prefix from the model ID.
  3. Creates an openai.OpenAI client with the resolved base URL and API key.
  4. In streaming mode: iterates over client.chat.completions.create(stream=True), accumulates tool call arguments across chunks (indexed by tc.index), yields text deltas immediately, and yields completed tool calls at the end of the stream.
  5. In non-streaming mode: makes a single call and yields text and tool calls from the response.
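The accumulation logic in step 4 can be sketched against mock chunks shaped like the SDK's streamed objects (SimpleNamespace stands in for the real response classes):

```python
import json
from types import SimpleNamespace as NS

def collect_stream(chunks):
    """Yield text deltas immediately; assemble tool calls indexed by tc.index."""
    pending = {}  # index -> {"id", "name", "arguments"}
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            yield {"type": "text", "content": delta.content}
        for tc in getattr(delta, "tool_calls", None) or []:
            slot = pending.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
            if tc.id:
                slot["id"] = tc.id
            if tc.function.name:
                slot["name"] = tc.function.name
            if tc.function.arguments:
                slot["arguments"] += tc.function.arguments  # fragments concatenate
    for slot in pending.values():  # completed tool calls at end of stream
        yield {"type": "tool_use", "id": slot["id"], "name": slot["name"],
               "input": json.loads(slot["arguments"] or "{}")}
```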

Model Discovery

  • discover_local_models() -- probes the Ollama tags endpoint and LM Studio models endpoint (3-second timeout each) and returns ModelInfo objects.
  • list_available_models() -- returns a combined list of hardcoded Claude models, hardcoded OpenRouter models (if an API key is configured), and dynamically discovered local models.

Model Switching

switch_model(model_id) updates current_model. The provider property re-evaluates on every access, so switching models also implicitly switches providers.


Memory System

File: cheddahbot/memory.py

The 4 Layers

Layer 1: Identity      -- identity/SOUL.md, identity/USER.md
                          (loaded by router.py into the system prompt)

Layer 2: Long-term     -- memory/MEMORY.md
                          (persisted facts and instructions, appended over time)

Layer 3: Daily logs    -- memory/YYYY-MM-DD.md
                          (timestamped entries per day, including auto-flush summaries)

Layer 4: Semantic      -- memory/embeddings.db
                          (SQLite with vector embeddings for similarity search)

How Memory Context is Built

MemorySystem.get_context(query) is called once per agent turn. It assembles a string from:

  1. Long-term memory -- the last 2000 characters of MEMORY.md.
  2. Today's log -- the last 1500 characters of today's date file.
  3. Semantic search results -- the top-k most similar entries to the user's query, formatted as a bulleted list.

This string is injected into the system prompt by router.py under the heading "Relevant Memory".

  • The embedding model is all-MiniLM-L6-v2 from sentence-transformers (lazy loaded, thread-safe via a lock).
  • _index_text(text, doc_id) -- encodes the text into a vector and stores it in memory/embeddings.db (table: embeddings with columns id TEXT, text TEXT, vector BLOB).
  • search(query, top_k) -- encodes the query, loads all vectors from the database, computes cosine similarity against each one, sorts by score, and returns the top-k results.
  • If sentence-transformers is not installed, _fallback_search() performs simple case-insensitive substring matching across all .md files in the memory directory.
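The brute-force search over float32 blobs can be sketched with the standard library alone; the helper names here are illustrative, not the module's actual internals:

```python
import math
import sqlite3
import struct

def to_blob(vec):
    """Pack a vector as raw float32 bytes, matching the vector BLOB column."""
    return struct.pack(f"{len(vec)}f", *vec)

def from_blob(blob):
    return struct.unpack(f"{len(blob) // 4}f", blob)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(conn, query_vec, top_k=5):
    """Score every stored vector against the query and return the best matches."""
    rows = conn.execute("SELECT id, text, vector FROM embeddings").fetchall()
    scored = [(cosine(query_vec, from_blob(vec)), text) for _, text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```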

Writing to Memory

  • remember(text) -- appends a timestamped entry to memory/MEMORY.md and indexes it for semantic search. Exposed to the LLM via the remember_this tool.
  • log_daily(text) -- appends a timestamped entry to today's daily log file and indexes it. Exposed via the log_note tool.

Auto-Flush

When Agent.respond() finishes, it checks db.count_messages(conv_id). If the count exceeds config.memory.flush_threshold (default 40):

  1. auto_flush(conv_id) loads up to 200 messages.
  2. All but the last 10 are selected for summarization.
  3. A summary string is built from the selected messages (truncated to 1000 chars).
  4. The summary is appended to the daily log via log_daily().

This prevents conversations from growing unbounded while preserving context in the daily log for future semantic search.
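A sketch of the selection-and-truncation step; the "summary" here is a plain transcript excerpt rather than an LLM-generated summary:

```python
def auto_flush(messages, keep_recent=10, max_chars=1000):
    """Fold all but the most recent messages into a daily-log entry (sketch)."""
    old = messages[:-keep_recent] if len(messages) > keep_recent else []
    if not old:
        return None  # nothing to flush yet
    lines = [f"{m['role']}: {m['content']}" for m in old]
    summary = "\n".join(lines)[:max_chars]  # truncate to 1000 chars
    return f"[Conversation flush]\n{summary}"
```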

Reindexing

reindex_all() clears all embeddings and re-indexes every line (longer than 10 characters) from every .md file in the memory directory. This can be called to rebuild the search index from scratch.


Tool System

File: cheddahbot/tools/__init__.py (registry) and cheddahbot/tools/*.py (tool modules)

The @tool Decorator

```python
from cheddahbot.tools import tool

@tool("my_tool_name", "Description of what this tool does", category="general")
def my_tool_name(param1: str, param2: int = 10) -> str:
    return f"Result: {param1}, {param2}"
```

The decorator:

  1. Creates a ToolDef object containing the function, name, description, category, and auto-extracted parameter schema.
  2. Registers it in the global _TOOLS dictionary keyed by name.
  3. Attaches the ToolDef as func._tool_def on the original function.

Parameter Schema Generation

_extract_params(func) inspects the function signature using inspect:

  • Skips parameters named self or ctx.
  • Maps type annotations to JSON Schema types: str -> "string", int -> "integer", float -> "number", bool -> "boolean", list -> "array". Unannotated parameters default to "string".
  • Parameters without defaults are marked as required.
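A sketch of that extraction logic:

```python
import inspect

_TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean", list: "array"}

def extract_params(func):
    """Derive a JSON Schema fragment from a function signature (sketch)."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        if name in ("self", "ctx"):
            continue  # reserved / injected parameters are never exposed
        json_type = _TYPE_MAP.get(param.annotation, "string")  # default: string
        properties[name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means required
    return {"type": "object", "properties": properties, "required": required}
```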

Schema Output

ToolDef.to_openai_schema() returns the tool definition in OpenAI function-calling format:

```json
{
  "type": "function",
  "function": {
    "name": "tool_name",
    "description": "...",
    "parameters": {
      "type": "object",
      "properties": { ... },
      "required": [ ... ]
    }
  }
}
```

Auto-Discovery

When ToolRegistry.__init__() is called, _discover_tools() uses pkgutil.iter_modules to find every .py file in cheddahbot/tools/ (skipping files starting with _). Each module is imported via importlib.import_module, which triggers the @tool decorators and populates the global registry.

Tool Execution

ToolRegistry.execute(name, args):

  1. Looks up the ToolDef in the global _TOOLS dict.
  2. Inspects the function signature for a ctx parameter. If present, injects a context dictionary containing config, db, agent, and memory.
  3. Calls the function with the provided arguments.
  4. Returns the result as a string (or "Done." if the function returns None).
  5. Catches all exceptions and returns "Tool error: ...".

Meta-Tools

Two special tools enable runtime extensibility:

build_tool (in cheddahbot/tools/build_tool.py):

  • Accepts name, description, and code (Python source using the @tool decorator).
  • Writes a new .py file into cheddahbot/tools/.
  • Hot-imports the module via importlib.import_module, which triggers the @tool decorator and registers the new tool immediately.
  • If the import fails, the file is deleted.

build_skill (in cheddahbot/tools/build_skill.py):

  • Accepts name, description, and steps (Python source using the @skill decorator).
  • Writes a new .py file into the configured skills/ directory.
  • Calls skills.load_skill() to dynamically import it.

Scheduler and Heartbeat Design

File: cheddahbot/scheduler.py

The Scheduler class starts two daemon threads at application boot.

Task Poller Thread

  • Runs in _poll_loop(), sleeping for poll_interval_seconds (default 60) between iterations.
  • Each iteration calls _run_due_tasks():
    1. Queries db.get_due_tasks() for tasks where next_run is NULL or in the past.
    2. For each due task, calls agent.respond_to_prompt(task["prompt"]) to generate a response.
    3. Logs the result via db.log_task_run().
    4. If the schedule is "once:<datetime>", the task is disabled.
    5. Otherwise, the schedule is treated as a cron expression: croniter is used to calculate the next run time, which is saved via db.update_task_next_run().
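The next-run calculation in steps 4 and 5 can be sketched as follows; croniter is imported lazily here, mirroring its status as an optional third-party dependency:

```python
from datetime import datetime

def compute_next_run(schedule: str, now: datetime):
    """Return the next run time, or None to disable a one-shot task (sketch)."""
    if schedule.startswith("once:"):
        return None  # "once:<datetime>" tasks never repeat
    from croniter import croniter  # third-party dependency, imported lazily
    return croniter(schedule, now).get_next(datetime)
```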

Heartbeat Thread

  • Runs in _heartbeat_loop(), sleeping for heartbeat_interval_minutes (default 30) between iterations.
  • Waits 60 seconds before the first heartbeat to let the system initialize.
  • Each iteration calls _run_heartbeat():
    1. Reads identity/HEARTBEAT.md.
    2. Sends the checklist to the agent as a prompt: "HEARTBEAT CHECK. Review this checklist and take action if needed."
    3. If the response contains "HEARTBEAT_OK", no action is logged.
    4. Otherwise, the response is logged to the daily log via memory.log_daily().

Thread Safety

Both threads are daemon threads (they die when the main process exits). The _stop_event threading event can be set to gracefully shut down both loops. The database layer uses thread-local connections, so concurrent access from the scheduler threads and the Gradio request threads is safe.


Database Schema

The SQLite database (data/cheddahbot.db) contains five tables:

conversations

| Column | Type | Notes |
|---|---|---|
| id | TEXT | Primary key (hex) |
| title | TEXT | Display title |
| created_at | TEXT | ISO 8601 UTC |
| updated_at | TEXT | ISO 8601 UTC |

messages

| Column | Type | Notes |
|---|---|---|
| id | INTEGER | Autoincrement primary key |
| conv_id | TEXT | Foreign key to conversations.id |
| role | TEXT | "user", "assistant", or "tool" |
| content | TEXT | Message body |
| tool_calls | TEXT | JSON array of {name, input} (nullable) |
| tool_result | TEXT | Name of the tool that produced this result (nullable) |
| model | TEXT | Model ID used for this response (nullable) |
| created_at | TEXT | ISO 8601 UTC |

Index: idx_messages_conv on (conv_id, created_at).

scheduled_tasks

| Column | Type | Notes |
|---|---|---|
| id | INTEGER | Autoincrement primary key |
| name | TEXT | Human-readable task name |
| prompt | TEXT | The prompt to send to the agent |
| schedule | TEXT | Cron expression or "once:<datetime>" |
| enabled | INTEGER | 1 = active, 0 = disabled |
| next_run | TEXT | ISO 8601 UTC (nullable) |
| created_at | TEXT | ISO 8601 UTC |

task_run_logs

| Column | Type | Notes |
|---|---|---|
| id | INTEGER | Autoincrement primary key |
| task_id | INTEGER | Foreign key to scheduled_tasks.id |
| started_at | TEXT | ISO 8601 UTC |
| finished_at | TEXT | ISO 8601 UTC (nullable) |
| result | TEXT | Agent response (nullable) |
| error | TEXT | Error message if failed (nullable) |

kv_store

| Column | Type | Notes |
|---|---|---|
| key | TEXT | Primary key |
| value | TEXT | Arbitrary value |

Embeddings Database

A separate SQLite file at memory/embeddings.db holds one table:

embeddings

| Column | Type | Notes |
|---|---|---|
| id | TEXT | Primary key (e.g. "daily:2026-02-14:08:30") |
| text | TEXT | The original text that was embedded |
| vector | BLOB | Raw float32 bytes of the embedding vector |

Identity Files

Three Markdown files in the identity/ directory define the agent's personality, user context, and background behavior.

identity/SOUL.md

Defines the agent's personality, communication style, boundaries, and quirks. This is loaded first into the system prompt, making it the most prominent identity influence on every response.

Contents are read by router.build_system_prompt() at the beginning of each agent turn.

identity/USER.md

Contains a user profile template: name, technical level, primary language, current projects, and communication preferences. The user edits this file to customize how the agent addresses them and what context it assumes.

Loaded by router.build_system_prompt() immediately after SOUL.md.

identity/HEARTBEAT.md

A checklist of items to review on each heartbeat cycle. The scheduler reads this file and sends it to the agent as a prompt every heartbeat_interval_minutes (default 30 minutes). The agent processes the checklist and either confirms "HEARTBEAT_OK" or takes action and logs it.

Loading Order in the System Prompt

The system prompt assembled by router.build_system_prompt() concatenates these sections, separated by \n\n---\n\n:

  1. SOUL.md contents
  2. USER.md contents
  3. Memory context (long-term + daily log + semantic search results)
  4. Tools description (categorized list of available tools)
  5. Core instructions (hardcoded behavioral directives)