# CheddahBot Architecture
## System Overview
CheddahBot is a personal AI assistant built in Python. It exposes a Gradio-based
web UI, routes user messages through an agent loop backed by a model-agnostic LLM
adapter, persists conversations in SQLite, maintains a 4-layer memory system with
optional semantic search, and provides an extensible tool registry that the LLM
can invoke mid-conversation. A background scheduler handles cron-based tasks and
periodic heartbeat checks.
### Data Flow Diagram
```
User (browser)
      |
      v
+-----------+      +------------+      +-------------+
| Gradio UI | ---> |   Agent    | ---> | LLM Adapter |
|  (ui.py)  |      | (agent.py) |      |  (llm.py)   |
+-----------+      +-----+------+      +------+------+
                         |                    |
           +-------------+-----------+  +--+-------------+
           |             |           |     | Claude CLI     |
           v             v           v     | OpenRouter     |
      +---------+   +---------+  +-------+ | Ollama         |
      | Router  |   |  Tools  |  |  DB   | | LM Studio      |
      |(router) |   |(tools/) |  |(db.py)| +----------------+
      +----+----+   +----+----+  +-------+
           |             |
      +----+-----+  +----+----+
      | Identity |  |  Memory |
      | SOUL.md  |  |  System |
      | USER.md  |  | (memory)|
      +----------+  +---------+
```
1. The user submits text (or voice / files) through the Gradio interface.
2. `ui.py` hands the message to `Agent.respond()`.
3. The agent stores the user message in SQLite, builds a system prompt via
`router.py` (loading identity files and memory context), and formats the
conversation history.
4. The agent sends messages to `LLMAdapter.chat()` which dispatches to the
correct provider backend.
5. The LLM response streams back. If it contains tool-call requests, the agent
executes them through `ToolRegistry.execute()`, appends the results, and loops
back to step 4 (up to 10 iterations).
6. The final assistant response is stored in the database and streamed to the UI.
7. After responding, the agent checks whether the conversation has exceeded the
flush threshold; if so, the memory system summarizes older messages into the
daily log.
---
## Module-by-Module Breakdown
### `__main__.py` -- Entry Point
**File:** `cheddahbot/__main__.py`
Orchestrates startup in this order:
1. `load_config()` -- loads configuration from env vars / YAML / defaults.
2. `Database(config.db_path)` -- opens (or creates) the SQLite database.
3. `LLMAdapter(...)` -- initializes the model-agnostic LLM client.
4. `Agent(config, db, llm)` -- creates the core agent.
5. `MemorySystem(config, db)` -- initializes the memory system and injects it
into the agent via `agent.set_memory()`.
6. `ToolRegistry(config, db, agent)` -- auto-discovers and loads all tool
modules, then injects via `agent.set_tools()`.
7. `Scheduler(config, db, agent)` -- starts two daemon threads (task poller and
heartbeat).
8. `create_ui(agent, config, llm)` -- builds the Gradio Blocks app and launches
it on the configured host/port.
Each subsystem (memory, tools, scheduler) is wrapped in a try/except so the
application degrades gracefully if optional dependencies are missing.
---
### `config.py` -- Configuration
**File:** `cheddahbot/config.py`
Defines four dataclasses:
| Dataclass | Key Fields |
|------------------|---------------------------------------------------------------|
| `Config` | `default_model`, `host`, `port`, `ollama_url`, `lmstudio_url`, `openrouter_api_key`, plus derived paths (`root_dir`, `data_dir`, `identity_dir`, `memory_dir`, `skills_dir`, `db_path`) |
| `MemoryConfig` | `max_context_messages` (50), `flush_threshold` (40), `embedding_model` ("all-MiniLM-L6-v2"), `search_top_k` (5) |
| `SchedulerConfig` | `heartbeat_interval_minutes` (30), `poll_interval_seconds` (60) |
| `ShellConfig` | `blocked_commands`, `require_approval` (False) |
`load_config()` applies three layers of configuration in priority order:
1. Dataclass defaults (lowest priority).
2. `config.yaml` at the project root (middle priority).
3. Environment variables with the `CHEDDAH_` prefix, plus `OPENROUTER_API_KEY`
(highest priority).
The function also ensures required data directories exist on disk.
---
### `db.py` -- Database Layer
**File:** `cheddahbot/db.py`
A thin wrapper around SQLite using thread-local connections (one connection per
thread), WAL journal mode, and foreign keys.
**Key methods:**
- `create_conversation(conv_id, title)` -- insert a new conversation row.
- `list_conversations(limit)` -- return recent conversations ordered by
`updated_at`.
- `add_message(conv_id, role, content, ...)` -- insert a message and touch the
conversation's `updated_at`.
- `get_messages(conv_id, limit)` -- return messages in chronological order.
- `count_messages(conv_id)` -- count messages for flush-threshold checks.
- `add_scheduled_task(name, prompt, schedule)` -- persist a scheduled task.
- `get_due_tasks()` -- return tasks whose `next_run` is in the past or NULL.
- `update_task_next_run(task_id, next_run)` -- update the next execution time.
- `log_task_run(task_id, result, error)` -- record the outcome of a task run.
- `kv_set(key, value)` / `kv_get(key)` -- generic key-value store.
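A minimal sketch of the thread-local connection pattern, showing only the `kv_store` methods (the real class creates all five tables and exposes many more methods):

```python
import sqlite3
import threading


class Database:
    """One SQLite connection per thread; WAL journaling and foreign keys on."""

    def __init__(self, path: str = ":memory:"):
        self.path = path
        self._local = threading.local()

    @property
    def conn(self) -> sqlite3.Connection:
        # Lazily open one connection per thread.
        if not hasattr(self._local, "conn"):
            conn = sqlite3.connect(self.path)
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute("PRAGMA foreign_keys=ON")
            # Only kv_store is shown here; the real schema has five tables.
            conn.execute(
                "CREATE TABLE IF NOT EXISTS kv_store (key TEXT PRIMARY KEY, value TEXT)"
            )
            self._local.conn = conn
        return self._local.conn

    def kv_set(self, key: str, value: str) -> None:
        self.conn.execute(
            "INSERT INTO kv_store (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, value),
        )
        self.conn.commit()

    def kv_get(self, key: str):
        row = self.conn.execute(
            "SELECT value FROM kv_store WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None
```

Because each thread gets its own connection, the scheduler and Gradio request threads never share a cursor, which sidesteps SQLite's cross-thread restrictions.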
---
### `agent.py` -- Core Agent Loop
**File:** `cheddahbot/agent.py`
Contains the `Agent` class, the central coordinator.
**Key members:**
- `conv_id` -- current conversation ID (a 12-character hex string).
- `_memory` -- optional `MemorySystem` reference.
- `_tools` -- optional `ToolRegistry` reference.
**Primary method: `respond(user_input, files)`**
This is a Python generator that yields text chunks for streaming. The detailed
flow is described in "The Agent Loop in Detail" below.
**Helper: `respond_to_prompt(prompt)`**
Non-streaming wrapper that collects all chunks and returns a single string. Used
by the scheduler and heartbeat for internal prompts.
---
### `router.py` -- System Prompt Builder
**File:** `cheddahbot/router.py`
Two functions:
1. `build_system_prompt(identity_dir, memory_context, tools_description)` --
assembles the full system prompt by concatenating these sections separated by
horizontal rules:
- Contents of `identity/SOUL.md`
- Contents of `identity/USER.md`
- Memory context string (from the memory system)
- Tools description listing (from the tool registry)
- A fixed "Instructions" section with core behavioral directives.
2. `format_messages_for_llm(system_prompt, history, max_messages)` --
converts raw database rows into the `[{role, content}]` format expected by
the LLM. The system prompt becomes the first message. Tool results are
converted to user messages prefixed with `[Tool Result]`. History is trimmed
to the most recent `max_messages` entries.
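The conversion can be sketched roughly like this, assuming history rows behave like dicts (the real code reads database rows):

```python
def format_messages_for_llm(system_prompt: str, history: list[dict],
                            max_messages: int = 50) -> list[dict]:
    """Convert DB rows into the [{role, content}] list the LLM expects."""
    messages = [{"role": "system", "content": system_prompt}]
    for row in history[-max_messages:]:  # keep only the most recent entries
        role, content = row["role"], row["content"]
        if role == "tool":
            # Tool results are folded back in as user messages.
            role, content = "user", f"[Tool Result] {content}"
        messages.append({"role": role, "content": content})
    return messages
```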
---
### `llm.py` -- LLM Adapter
**File:** `cheddahbot/llm.py`
Described in detail in a dedicated section below.
---
### `memory.py` -- Memory System
**File:** `cheddahbot/memory.py`
Described in detail in a dedicated section below.
---
### `media.py` -- Audio/Video Processing
**File:** `cheddahbot/media.py`
Three utility functions:
- `transcribe_audio(path)` -- Speech-to-text. Tries local Whisper first, then
falls back to the OpenAI Whisper API.
- `text_to_speech(text, output_path, voice)` -- Text-to-speech via `edge-tts`
(free, no API key). Defaults to the `en-US-AriaNeural` voice.
- `extract_video_frames(video_path, max_frames)` -- Extracts key frames from
video using `ffprobe` (to get duration) and `ffmpeg` (to extract JPEG frames).
---
### `scheduler.py` -- Scheduler and Heartbeat
**File:** `cheddahbot/scheduler.py`
Described in detail in a dedicated section below.
---
### `ui.py` -- Gradio Web Interface
**File:** `cheddahbot/ui.py`
Builds a Gradio Blocks application with:
- A model dropdown (populated from `llm.list_available_models()`) with a refresh
button and a "New Chat" button.
- A `gr.Chatbot` widget for the conversation (500px height, copy buttons).
- A `gr.MultimodalTextbox` supporting text, file upload, and microphone input.
- A "Voice Chat" accordion for record-and-respond audio interaction.
- A "Conversation History" accordion showing past conversations from the
database.
- A "Settings" accordion with guidance on editing identity and config files.
**Event wiring:**
- Model dropdown change calls `llm.switch_model()`.
- Refresh button re-discovers local models.
- Message submit calls `agent.respond()` in streaming mode, updating the chatbot
widget with each chunk.
- Audio files attached to messages are transcribed via `media.transcribe_audio()`
before being sent to the agent.
- Voice Chat records audio, transcribes it, gets a text response from the agent,
converts it to speech via `media.text_to_speech()`, and plays it back.
---
### `tools/__init__.py` -- Tool Registry
**File:** `cheddahbot/tools/__init__.py`
Described in detail in a dedicated section below.
---
### `skills/__init__.py` -- Skill Registry
**File:** `cheddahbot/skills/__init__.py`
Defines a parallel registry for "skills" (multi-step operations). Key pieces:
- `SkillDef` -- dataclass holding `name`, `description`, `func`.
- `@skill(name, description)` -- decorator that registers a skill in the global
`_SKILLS` dict.
- `load_skill(path)` -- dynamically loads a `.py` file as a module (triggering
any `@skill` decorators inside it).
- `discover_skills(skills_dir)` -- loads all `.py` files from the skills
directory.
- `list_skills()` / `run_skill(name, **kwargs)` -- query and execute skills.
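The registry pattern can be sketched as follows; this is an illustrative reimplementation of the decorator and lookup pieces, not the module's exact code:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SkillDef:
    name: str
    description: str
    func: Callable


_SKILLS: dict[str, SkillDef] = {}


def skill(name: str, description: str):
    """Register the decorated function in the global skill table."""
    def wrapper(func):
        _SKILLS[name] = SkillDef(name, description, func)
        return func  # the function itself is unchanged
    return wrapper


def list_skills() -> list[str]:
    return sorted(_SKILLS)


def run_skill(name: str, **kwargs):
    return _SKILLS[name].func(**kwargs)
```

Because registration happens at import time, simply loading a skill file with `load_skill()` is enough to make its skills callable.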
---
### `providers/__init__.py` -- Provider Extensions
**File:** `cheddahbot/providers/__init__.py`
Reserved for future custom provider implementations. Currently empty.
---
## The Agent Loop in Detail
When `Agent.respond(user_input)` is called, the following sequence occurs:
```
1. ensure_conversation()
   |-- Creates a new conversation in the DB if one doesn't exist
   |
2. db.add_message(conv_id, "user", user_input)
   |-- Persists the user's message
   |
3. Build system prompt
   |-- memory.get_context(user_input)  --> memory context string
   |-- tools.get_tools_schema()        --> OpenAI-format JSON schemas
   |-- tools.get_tools_description()   --> human-readable tool list
   |-- router.build_system_prompt(identity_dir, memory_context, tools_description)
   |
4. Load conversation history from DB
   |-- db.get_messages(conv_id, limit=max_context_messages)
   |-- router.format_messages_for_llm(system_prompt, history, max_messages)
   |
5. AGENT LOOP (up to MAX_TOOL_ITERATIONS = 10):
   |
   |-- llm.chat(messages, tools=tools_schema, stream=True)
   |     |-- Yields {"type":"text","content":"..."} chunks --> streamed to user
   |     |-- Yields {"type":"tool_use","name":"...","input":{...}} chunks
   |
   |-- If no tool_calls: store assistant message, BREAK
   |
   |-- If tool_calls present:
   |     |-- Store assistant message with tool_calls metadata
   |     |-- For each tool call:
   |     |     |-- yield "Using tool: <name>" indicator
   |     |     |-- tools.execute(name, input) --> result string
   |     |     |-- yield tool result (truncated to 2000 chars)
   |     |     |-- db.add_message(conv_id, "tool", result)
   |     |     |-- Append result to messages as user message
   |     |-- Continue loop (LLM sees tool results and can respond or call more tools)
   |
6. After loop: check if memory flush is needed
   |-- If message count > flush_threshold:
         |-- memory.auto_flush(conv_id)
```
The loop allows the LLM to chain up to 10 consecutive tool calls before being
cut off. Each tool result is injected back into the conversation as a user
message so the LLM can reason about it in the next iteration.
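Stripped of persistence and error handling, the loop's shape is roughly as below; the two callables are stand-ins for `llm.chat` and `ToolRegistry.execute`, and the message formats follow the chunk dictionaries described above:

```python
MAX_TOOL_ITERATIONS = 10


def agent_loop(llm_chat, execute_tool, messages):
    """Stream text chunks; run requested tools and feed results back, up to 10 rounds."""
    for _ in range(MAX_TOOL_ITERATIONS):
        tool_calls = []
        for chunk in llm_chat(messages):
            if chunk["type"] == "text":
                yield chunk["content"]      # streamed straight to the UI
            elif chunk["type"] == "tool_use":
                tool_calls.append(chunk)
        if not tool_calls:
            return                          # final answer: leave the loop
        for call in tool_calls:
            yield f"\n[Using tool: {call['name']}]\n"
            result = execute_tool(call["name"], call["input"])
            # Results re-enter the conversation as user messages,
            # so the next iteration's LLM call can reason about them.
            messages.append({"role": "user", "content": f"[Tool Result] {result}"})
```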
---
## LLM Adapter Design
**File:** `cheddahbot/llm.py`
### Provider Routing
The `LLMAdapter` supports four provider paths. The active provider is determined
by examining the current model ID:
| Model ID Pattern | Provider | Backend |
|-----------------------------|---------------|----------------------------------|
| `claude-*` | `claude` | Claude Code CLI (subprocess) |
| `local/ollama/<model>` | `ollama` | Ollama HTTP API (OpenAI-compat) |
| `local/lmstudio/<model>` | `lmstudio` | LM Studio HTTP API (OpenAI-compat) |
| Anything else | `openrouter` | OpenRouter API (OpenAI-compat) |
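The routing rule reduces to a few prefix checks; a sketch of the dispatch logic (function name illustrative):

```python
def resolve_provider(model_id: str) -> str:
    """Map a model ID onto one of the four provider backends."""
    if model_id.startswith("claude-"):
        return "claude"
    if model_id.startswith("local/ollama/"):
        return "ollama"
    if model_id.startswith("local/lmstudio/"):
        return "lmstudio"
    return "openrouter"  # everything else goes over OpenRouter
```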
### The `chat()` Method
This is the single entry point. It accepts a list of messages, an optional tools
schema, and a stream flag. It returns a generator yielding dictionaries:
- `{"type": "text", "content": "..."}` -- a text chunk to display.
- `{"type": "tool_use", "id": "...", "name": "...", "input": {...}}` -- a tool
invocation request.
### Claude Code CLI Path (`_chat_claude_sdk`)
For Claude models, CheddahBot shells out to the `claude` CLI binary (the Claude
Code SDK):
1. Separates system prompt, conversation history, and the latest user message
from the messages list.
2. Builds a full system prompt by appending conversation history under a
"Conversation So Far" heading.
3. Invokes `claude -p <prompt> --model <model> --output-format json --system-prompt <system>`.
4. The `CLAUDECODE` environment variable is stripped from the subprocess
environment to avoid nested-session errors.
5. Parses the JSON output and yields the `result` field as a text chunk.
6. On Windows, `shell=True` is used for compatibility with npm-installed
binaries.
### OpenAI-Compatible Path (`_chat_openai_sdk`)
For OpenRouter, Ollama, and LM Studio, the adapter uses the `openai` Python SDK:
1. `_resolve_endpoint(provider)` returns the base URL and API key:
- OpenRouter: `https://openrouter.ai/api/v1` with the configured API key.
- Ollama: `http://localhost:11434/v1` with dummy key `"ollama"`.
- LM Studio: `http://localhost:1234/v1` with dummy key `"lm-studio"`.
2. `_resolve_model_id(provider)` strips the `local/ollama/` or
`local/lmstudio/` prefix from the model ID.
3. Creates an `openai.OpenAI` client with the resolved base URL and API key.
4. In streaming mode: iterates over `client.chat.completions.create(stream=True)`,
accumulates tool call arguments across chunks (indexed by `tc.index`), yields
text deltas immediately, and yields completed tool calls at the end of the
stream.
5. In non-streaming mode: makes a single call and yields text and tool calls from
the response.
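Step 4's accumulation logic can be sketched like this, using plain dicts in place of the SDK's delta objects. The key point is that one tool call's JSON arguments arrive split across many chunks and must be concatenated under the call's stream index:

```python
def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments, keyed by their stream index."""
    calls: dict[int, dict] = {}
    for delta in deltas:               # one delta per streamed chunk
        for tc in delta:
            slot = calls.setdefault(
                tc["index"], {"id": "", "name": "", "arguments": ""}
            )
            if tc.get("id"):           # id/name appear once, on the first fragment
                slot["id"] = tc["id"]
            if tc.get("name"):
                slot["name"] = tc["name"]
            slot["arguments"] += tc.get("arguments", "")  # JSON arrives in pieces
    return [calls[i] for i in sorted(calls)]
```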
### Model Discovery
- `discover_local_models()` -- probes the Ollama tags endpoint and LM Studio
models endpoint (3-second timeout each) and returns `ModelInfo` objects.
- `list_available_models()` -- returns a combined list of hardcoded Claude
models, hardcoded OpenRouter models (if an API key is configured), and
dynamically discovered local models.
### Model Switching
`switch_model(model_id)` updates `current_model`. The `provider` property
re-evaluates on every access, so switching models also implicitly switches
providers.
---
## Memory System
**File:** `cheddahbot/memory.py`
### The 4 Layers
```
Layer 1: Identity   -- identity/SOUL.md, identity/USER.md
                       (loaded by router.py into the system prompt)
Layer 2: Long-term  -- memory/MEMORY.md
                       (persisted facts and instructions, appended over time)
Layer 3: Daily logs -- memory/YYYY-MM-DD.md
                       (timestamped entries per day, including auto-flush summaries)
Layer 4: Semantic   -- memory/embeddings.db
                       (SQLite with vector embeddings for similarity search)
```
### How Memory Context is Built
`MemorySystem.get_context(query)` is called once per agent turn. It assembles a
string from:
1. **Long-term memory** -- the last 2000 characters of `MEMORY.md`.
2. **Today's log** -- the last 1500 characters of today's date file.
3. **Semantic search results** -- the top-k most similar entries to the user's
query, formatted as a bulleted list.
This string is injected into the system prompt by `router.py` under the heading
"Relevant Memory".
### Embedding and Search
- The embedding model is `all-MiniLM-L6-v2` from `sentence-transformers` (lazy
loaded, thread-safe via a lock).
- `_index_text(text, doc_id)` -- encodes the text into a vector and stores it in
`memory/embeddings.db` (table: `embeddings` with columns `id TEXT`, `text TEXT`,
`vector BLOB`).
- `search(query, top_k)` -- encodes the query, loads all vectors from the
database, computes cosine similarity against each one, sorts by score, and
returns the top-k results.
- If `sentence-transformers` is not installed, `_fallback_search()` performs
simple case-insensitive substring matching across all `.md` files in the memory
directory.
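A pure-Python sketch of the ranking step (the real code encodes with sentence-transformers and reads vectors back from the BLOB column):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query_vec, rows, top_k=5):
    """Rank stored (doc_id, text, vector) rows by similarity to the query."""
    scored = [(cosine(query_vec, vec), doc_id, text) for doc_id, text, vec in rows]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]
```

This brute-force scan over every stored vector is fine at personal-assistant scale; a vector index would only matter with far more entries.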
### Writing to Memory
- `remember(text)` -- appends a timestamped entry to `memory/MEMORY.md` and
indexes it for semantic search. Exposed to the LLM via the `remember_this`
tool.
- `log_daily(text)` -- appends a timestamped entry to today's daily log file and
indexes it. Exposed via the `log_note` tool.
### Auto-Flush
When `Agent.respond()` finishes, it checks `db.count_messages(conv_id)`. If the
count exceeds `config.memory.flush_threshold` (default 40):
1. `auto_flush(conv_id)` loads up to 200 messages.
2. All but the last 10 are selected for summarization.
3. A summary string is built from the selected messages (truncated to 1000
chars).
4. The summary is appended to the daily log via `log_daily()`.
This prevents conversations from growing unbounded while preserving context in
the daily log for future semantic search.
### Reindexing
`reindex_all()` clears all embeddings and re-indexes every line longer than 10
characters from every `.md` file in the memory directory. Call it to rebuild
the search index from scratch.
---
## Tool System
**File:** `cheddahbot/tools/__init__.py` (registry) and `cheddahbot/tools/*.py`
(tool modules)
### The `@tool` Decorator
```python
from cheddahbot.tools import tool

@tool("my_tool_name", "Description of what this tool does", category="general")
def my_tool_name(param1: str, param2: int = 10) -> str:
    return f"Result: {param1}, {param2}"
```
The decorator:
1. Creates a `ToolDef` object containing the function, name, description,
category, and auto-extracted parameter schema.
2. Registers it in the global `_TOOLS` dictionary keyed by name.
3. Attaches the `ToolDef` as `func._tool_def` on the original function.
### Parameter Schema Generation
`_extract_params(func)` inspects the function signature using `inspect`:
- Skips parameters named `self` or `ctx`.
- Maps type annotations to JSON Schema types: `str` -> `"string"`, `int` ->
`"integer"`, `float` -> `"number"`, `bool` -> `"boolean"`, `list` ->
`"array"`. Unannotated parameters default to `"string"`.
- Parameters without defaults are marked as required.
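A sketch of how this extraction typically works (the function name and type map here are illustrative):

```python
import inspect

_TYPE_MAP = {str: "string", int: "integer", float: "number",
             bool: "boolean", list: "array"}


def extract_params(func) -> dict:
    """Build a JSON-Schema-style parameter object from a function signature."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        if name in ("self", "ctx"):
            continue  # ctx is injected by the registry, never by the LLM
        # Unannotated parameters fall back to "string".
        json_type = _TYPE_MAP.get(param.annotation, "string")
        properties[name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default => the LLM must supply it
    return {"type": "object", "properties": properties, "required": required}
```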
### Schema Output
`ToolDef.to_openai_schema()` returns the tool definition in OpenAI
function-calling format:
```json
{
  "type": "function",
  "function": {
    "name": "tool_name",
    "description": "...",
    "parameters": {
      "type": "object",
      "properties": { ... },
      "required": [ ... ]
    }
  }
}
```
### Auto-Discovery
When `ToolRegistry.__init__()` is called, `_discover_tools()` uses
`pkgutil.iter_modules` to find every `.py` file in `cheddahbot/tools/` (skipping
files starting with `_`). Each module is imported via `importlib.import_module`,
which triggers the `@tool` decorators and populates the global registry.
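A minimal sketch of the discovery scan; the import step is omitted here, since the real registry follows it with `importlib.import_module` on each name:

```python
import pkgutil


def discover_tool_modules(tools_dir: str) -> list[str]:
    """List importable tool modules, skipping private files like _helpers.py."""
    names = []
    for info in pkgutil.iter_modules([tools_dir]):
        if not info.name.startswith("_"):
            names.append(info.name)
    return sorted(names)
```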
### Tool Execution
`ToolRegistry.execute(name, args)`:
1. Looks up the `ToolDef` in the global `_TOOLS` dict.
2. Inspects the function signature for a `ctx` parameter. If present, injects a
context dictionary containing `config`, `db`, `agent`, and `memory`.
3. Calls the function with the provided arguments.
4. Returns the result as a string (or `"Done."` if the function returns `None`).
5. Catches all exceptions and returns `"Tool error: ..."`.
### Meta-Tools
Two special tools enable runtime extensibility:
**`build_tool`** (in `cheddahbot/tools/build_tool.py`):
- Accepts `name`, `description`, and `code` (Python source using the `@tool`
decorator).
- Writes a new `.py` file into `cheddahbot/tools/`.
- Hot-imports the module via `importlib.import_module`, which triggers the
`@tool` decorator and registers the new tool immediately.
- If the import fails, the file is deleted.
**`build_skill`** (in `cheddahbot/tools/build_skill.py`):
- Accepts `name`, `description`, and `steps` (Python source using the `@skill`
decorator).
- Writes a new `.py` file into the configured `skills/` directory.
- Calls `skills.load_skill()` to dynamically import it.
---
## Scheduler and Heartbeat Design
**File:** `cheddahbot/scheduler.py`
The `Scheduler` class starts two daemon threads at application boot.
### Task Poller Thread
- Runs in `_poll_loop()`, sleeping for `poll_interval_seconds` (default 60)
between iterations.
- Each iteration calls `_run_due_tasks()`:
1. Queries `db.get_due_tasks()` for tasks where `next_run` is NULL or in the
past.
2. For each due task, calls `agent.respond_to_prompt(task["prompt"])` to
generate a response.
3. Logs the result via `db.log_task_run()`.
4. If the schedule is `"once:<datetime>"`, the task is disabled.
5. Otherwise, the schedule is treated as a cron expression: `croniter` is used
to calculate the next run time, which is saved via
`db.update_task_next_run()`.
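Steps 4-5 above can be sketched as a pure decision function; `cron_next` stands in for croniter so the sketch carries no third-party dependency, and the name is hypothetical:

```python
from datetime import datetime


def plan_after_run(schedule: str, now: datetime, cron_next=None):
    """Decide a task's fate after it runs.

    Returns (enabled, next_run). One-shot "once:<datetime>" tasks are
    disabled; anything else is treated as a cron expression and handed to
    cron_next (croniter in the real code) to compute the next fire time.
    """
    if schedule.startswith("once:"):
        return False, None          # one-shot: disable, never reschedule
    return True, cron_next(schedule, now)
```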
### Heartbeat Thread
- Runs in `_heartbeat_loop()`, sleeping for `heartbeat_interval_minutes`
(default 30) between iterations.
- Waits 60 seconds before the first heartbeat to let the system initialize.
- Each iteration calls `_run_heartbeat()`:
1. Reads `identity/HEARTBEAT.md`.
2. Sends the checklist to the agent as a prompt: "HEARTBEAT CHECK. Review this
checklist and take action if needed."
3. If the response contains `"HEARTBEAT_OK"`, no action is logged.
4. Otherwise, the response is logged to the daily log via
`memory.log_daily()`.
### Thread Safety
Both threads are daemon threads (they die when the main process exits). The
`_stop_event` threading event can be set to gracefully shut down both loops. The
database layer uses thread-local connections, so concurrent access from the
scheduler threads and the Gradio request threads is safe.
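The stop-event pattern can be sketched as below; `make_poller` is a hypothetical helper, not the actual `Scheduler` code. Using `Event.wait()` as the sleep means shutdown takes effect immediately instead of waiting out the full interval:

```python
import threading


def make_poller(stop_event: threading.Event, interval: float, work) -> threading.Thread:
    """Build a daemon thread that runs `work` every `interval` seconds until stopped."""
    def loop():
        # wait() doubles as an interruptible sleep: it returns True
        # (ending the loop) as soon as stop_event is set.
        while not stop_event.wait(interval):
            work()
    return threading.Thread(target=loop, daemon=True)
```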
---
## Database Schema
The SQLite database (`data/cheddahbot.db`) contains five tables:
### `conversations`
| Column | Type | Notes |
|--------------|------|--------------------|
| `id` | TEXT | Primary key (hex) |
| `title` | TEXT | Display title |
| `created_at` | TEXT | ISO 8601 UTC |
| `updated_at` | TEXT | ISO 8601 UTC |
### `messages`
| Column | Type | Notes |
|---------------|---------|--------------------------------------------|
| `id` | INTEGER | Autoincrement primary key |
| `conv_id` | TEXT | Foreign key to `conversations.id` |
| `role` | TEXT | `"user"`, `"assistant"`, or `"tool"` |
| `content` | TEXT | Message body |
| `tool_calls` | TEXT | JSON array of `{name, input}` (nullable) |
| `tool_result` | TEXT | Name of the tool that produced this result (nullable) |
| `model` | TEXT | Model ID used for this response (nullable) |
| `created_at` | TEXT | ISO 8601 UTC |
Index: `idx_messages_conv` on `(conv_id, created_at)`.
### `scheduled_tasks`
| Column | Type | Notes |
|--------------|---------|---------------------------------------|
| `id` | INTEGER | Autoincrement primary key |
| `name` | TEXT | Human-readable task name |
| `prompt` | TEXT | The prompt to send to the agent |
| `schedule` | TEXT | Cron expression or `"once:<datetime>"`|
| `enabled` | INTEGER | 1 = active, 0 = disabled |
| `next_run` | TEXT | ISO 8601 UTC (nullable) |
| `created_at` | TEXT | ISO 8601 UTC |
### `task_run_logs`
| Column | Type | Notes |
|---------------|---------|------------------------------------|
| `id` | INTEGER | Autoincrement primary key |
| `task_id` | INTEGER | Foreign key to `scheduled_tasks.id`|
| `started_at` | TEXT | ISO 8601 UTC |
| `finished_at` | TEXT | ISO 8601 UTC (nullable) |
| `result` | TEXT | Agent response (nullable) |
| `error` | TEXT | Error message if failed (nullable) |
### `kv_store`
| Column | Type | Notes |
|---------|------|-----------------|
| `key` | TEXT | Primary key |
| `value` | TEXT | Arbitrary value |
### Embeddings Database
A separate SQLite file at `memory/embeddings.db` holds one table:
### `embeddings`
| Column | Type | Notes |
|----------|------|--------------------------------------|
| `id` | TEXT | Primary key (e.g. `"daily:2026-02-14:08:30"`) |
| `text` | TEXT | The original text that was embedded |
| `vector` | BLOB | Raw float32 bytes of the embedding vector |
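Round-tripping such a BLOB is a one-liner with `struct`; this sketch assumes little-endian float32 (the exact serialization used by `memory.py` is not specified here):

```python
import struct


def vector_to_blob(vec: list[float]) -> bytes:
    """Pack an embedding as raw little-endian float32 bytes for the BLOB column."""
    return struct.pack(f"<{len(vec)}f", *vec)


def blob_to_vector(blob: bytes) -> list[float]:
    """Unpack a BLOB back into floats (4 bytes per float32 component)."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```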
---
## Identity Files
Three Markdown files in the `identity/` directory define the agent's personality,
user context, and background behavior.
### `identity/SOUL.md`
Defines the agent's personality, communication style, boundaries, and quirks.
This is loaded first into the system prompt, making it the most prominent
identity influence on every response.
Contents are read by `router.build_system_prompt()` at the beginning of each
agent turn.
### `identity/USER.md`
Contains a user profile template: name, technical level, primary language,
current projects, and communication preferences. The user edits this file to
customize how the agent addresses them and what context it assumes.
Loaded by `router.build_system_prompt()` immediately after SOUL.md.
### `identity/HEARTBEAT.md`
A checklist of items to review on each heartbeat cycle. The scheduler reads this
file and sends it to the agent as a prompt every `heartbeat_interval_minutes`
(default 30 minutes). The agent processes the checklist and either confirms
"HEARTBEAT_OK" or takes action and logs it.
### Loading Order in the System Prompt
The system prompt assembled by `router.build_system_prompt()` concatenates these
sections, separated by `\n\n---\n\n`:
1. SOUL.md contents
2. USER.md contents
3. Memory context (long-term + daily log + semantic search results)
4. Tools description (categorized list of available tools)
5. Core instructions (hardcoded behavioral directives)