# CheddahBot Tools Reference

## Overview

CheddahBot uses an extensible tool system that allows the LLM to invoke Python
functions during a conversation. Tools are registered via the `@tool` decorator
and auto-discovered at startup. The LLM receives tool schemas in OpenAI
function-calling format and can request tool invocations; the agent executes
each call and feeds the result back into the conversation.
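
As a rough illustration of the function-calling format, the schema for
`read_file` might look like the dictionary below. The field layout follows the
general OpenAI convention; the exact output of CheddahBot's registry may differ,
and the variable names `read_file_schema` and `requested_call` are hypothetical.

```python
import json

# Illustrative only: built by hand from the `read_file` parameter table,
# not copied from CheddahBot's actual output.
read_file_schema = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read the contents of a file.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the file"},
            },
            "required": ["path"],
        },
    },
}

# A tool call requested by the model carries the tool name plus
# JSON-encoded arguments, which the agent decodes before executing.
requested_call = {"name": "read_file", "arguments": json.dumps({"path": "README.md"})}
args = json.loads(requested_call["arguments"])
```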

---

## Registered Tools

### Category: files

#### `read_file`

Read the contents of a file.

| Parameter | Type   | Required | Description      |
|-----------|--------|----------|------------------|
| `path`    | string | Yes      | Path to the file |

Returns the file contents as a string. Files larger than 50,000 characters are
truncated. Returns an error message if the file is not found or is not a regular
file.

---

#### `write_file`

Write content to a file (creates or overwrites).

| Parameter | Type   | Required | Description                    |
|-----------|--------|----------|--------------------------------|
| `path`    | string | Yes      | Path to the file               |
| `content` | string | Yes      | Content to write to the file   |

Creates parent directories automatically if they do not exist.

---

#### `edit_file`

Replace text in a file (first occurrence).

| Parameter  | Type   | Required | Description              |
|------------|--------|----------|--------------------------|
| `path`     | string | Yes      | Path to the file         |
| `old_text` | string | Yes      | Text to find and replace |
| `new_text` | string | Yes      | Replacement text         |

Replaces only the first occurrence of `old_text`. Returns an error if the file
does not exist or the text is not found.

---

#### `list_directory`

List files and folders in a directory.

| Parameter | Type   | Required | Description                          |
|-----------|--------|----------|--------------------------------------|
| `path`    | string | No       | Directory path (defaults to `"."`)   |

Returns up to 200 entries, sorted with directories first. Each entry shows the
name and file size.

---

#### `search_files`

Search for files matching a glob pattern.

| Parameter   | Type   | Required | Description                          |
|-------------|--------|----------|--------------------------------------|
| `pattern`   | string | Yes      | Glob pattern (e.g. `"**/*.py"`)      |
| `directory` | string | No       | Root directory (defaults to `"."`)   |

Returns up to 100 matching file paths.

---

#### `search_in_files`

Search for text content across files.

| Parameter   | Type   | Required | Description                          |
|-------------|--------|----------|--------------------------------------|
| `query`     | string | Yes      | Text to search for (case-insensitive)|
| `directory` | string | No       | Root directory (defaults to `"."`)   |
| `extension` | string | No       | File extension filter (e.g. `".py"`) |

Returns up to 50 matches in `file:line: content` format. Skips files larger
than 1 MB.

---

### Category: shell

#### `run_command`

Execute a shell command and return output.

| Parameter | Type    | Required | Description                              |
|-----------|---------|----------|------------------------------------------|
| `command` | string  | Yes      | Shell command to execute                 |
| `timeout` | integer | No       | Timeout in seconds (default 30, max 120) |

Includes safety checks that block dangerous patterns:
- `rm -rf /`
- `format c:`
- `:(){:|:&};:` (fork bomb)
- `dd if=/dev/zero`
- `mkfs.`
- `> /dev/sda`

Output is truncated to 10,000 characters. Returns stdout, stderr, and exit code.
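
The checks described above could be sketched roughly as follows. The pattern
list and limits come from this section; the substring-matching strategy, the
lower timeout bound of 1 second, and the helper names (`is_blocked`,
`clamp_timeout`, `truncate`) are assumptions for illustration.

```python
# Patterns copied from the list above; how CheddahBot matches them is assumed.
BLOCKED_PATTERNS = [
    "rm -rf /",
    "format c:",
    ":(){:|:&};:",
    "dd if=/dev/zero",
    "mkfs.",
    "> /dev/sda",
]
DEFAULT_TIMEOUT = 30
MAX_TIMEOUT = 120
MAX_OUTPUT_CHARS = 10_000


def is_blocked(command: str) -> bool:
    """Return True if the command contains any blocked pattern."""
    lowered = command.lower()
    return any(pattern in lowered for pattern in BLOCKED_PATTERNS)


def clamp_timeout(timeout: int = DEFAULT_TIMEOUT) -> int:
    """Keep the timeout within 1..MAX_TIMEOUT seconds (lower bound assumed)."""
    return max(1, min(timeout, MAX_TIMEOUT))


def truncate(output: str) -> str:
    """Cut output down to the documented character limit."""
    return output[:MAX_OUTPUT_CHARS]
```

A plain substring check like this one is coarse: it would also reject
`rm -rf /tmp/build`, so a pattern blocklist is a safety net rather than a
sandbox.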

---

### Category: web

#### `web_search`

Search the web using DuckDuckGo.

| Parameter     | Type    | Required | Description                        |
|---------------|---------|----------|------------------------------------|
| `query`       | string  | Yes      | Search query                       |
| `max_results` | integer | No       | Number of results (default 5)      |

Uses DuckDuckGo HTML search (no API key required). Returns formatted results
with title, URL, and snippet.

---

#### `fetch_url`

Fetch and extract text content from a URL.

| Parameter | Type   | Required | Description    |
|-----------|--------|----------|----------------|
| `url`     | string | Yes      | URL to fetch   |

For HTML pages: strips script, style, nav, footer, and header elements, then
extracts text (truncated to 15,000 characters). For JSON responses: returns raw
JSON (truncated to 15,000 characters). For other content types: returns raw text
(truncated to 5,000 characters).

---

### Category: code

#### `run_python`

Execute Python code and return the output.

| Parameter | Type    | Required | Description                              |
|-----------|---------|----------|------------------------------------------|
| `code`    | string  | Yes      | Python code to execute                   |
| `timeout` | integer | No       | Timeout in seconds (default 30, max 60)  |

Writes the code to a temporary file and runs it as a subprocess using the same
Python interpreter that CheddahBot is running on. The temp file is deleted after
execution. Output is truncated to 10,000 characters.
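
A minimal sketch of that flow, assuming standard-library calls only. The
function name `run_python_sketch` and the timeout message are hypothetical;
the real tool may format its output differently.

```python
import os
import subprocess
import sys
import tempfile


def run_python_sketch(code: str, timeout: int = 30) -> str:
    """Write code to a temp file, run it with the current interpreter, clean up."""
    timeout = min(timeout, 60)  # documented maximum
    with tempfile.NamedTemporaryFile(
        "w", suffix=".py", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(code)
        tmp_path = tmp.name
    try:
        proc = subprocess.run(
            [sys.executable, tmp_path],  # same interpreter as the host process
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        output = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        output = f"Timed out after {timeout} seconds."
    finally:
        os.unlink(tmp_path)  # the temp file is always removed
    return output[:10_000]  # documented output limit
```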

---

### Category: memory

#### `remember_this`

Save an important fact or instruction to long-term memory.

| Parameter | Type   | Required | Description                     |
|-----------|--------|----------|---------------------------------|
| `text`    | string | Yes      | The fact or instruction to save |

Appends a timestamped entry to `memory/MEMORY.md` and indexes it in the
embedding database for future semantic search.

---

#### `search_memory`

Search through saved memories.

| Parameter | Type   | Required | Description       |
|-----------|--------|----------|-------------------|
| `query`   | string | Yes      | Search query text |

Performs semantic search (or keyword fallback) over all indexed memory entries.
Returns results with similarity scores.

---

#### `log_note`

Add a timestamped note to today's daily log.

| Parameter | Type   | Required | Description            |
|-----------|--------|----------|------------------------|
| `text`    | string | Yes      | Note text to log       |

Appends to `memory/YYYY-MM-DD.md` (today's date) and indexes the text for
semantic search.

---

### Category: scheduling

#### `schedule_task`

Schedule a recurring or one-time task.

| Parameter  | Type   | Required | Description                                       |
|------------|--------|----------|---------------------------------------------------|
| `name`     | string | Yes      | Human-readable task name                          |
| `prompt`   | string | Yes      | The prompt to send to the agent when the task runs|
| `schedule` | string | Yes      | Cron expression or `"once:YYYY-MM-DDTHH:MM"`      |

Examples:
- `schedule="0 9 * * *"` -- every day at 9:00 AM UTC
- `schedule="once:2026-03-01T14:00"` -- one-time execution
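
One way to tell the two schedule forms apart is sketched below. The helper
`parse_schedule` is hypothetical, and treating a cron expression as exactly
five whitespace-separated fields is an assumption; the scheduler's real
validation is not described here.

```python
from datetime import datetime


def parse_schedule(schedule: str):
    """Classify a schedule string as a one-time run or a cron expression."""
    if schedule.startswith("once:"):
        # "once:YYYY-MM-DDTHH:MM" -> a single datetime
        return "once", datetime.fromisoformat(schedule[len("once:"):])
    fields = schedule.split()
    if len(fields) != 5:  # assumed: standard five-field cron
        raise ValueError(f"Expected 5 cron fields, got {len(fields)}")
    return "cron", fields
```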

---

#### `list_tasks`

List all scheduled tasks.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| (none)    |      |          |             |

Returns all tasks with their ID, name, schedule, enabled status, and next run
time.

---

### Category: media

#### `analyze_image`

Describe or analyze an image file.

| Parameter  | Type   | Required | Description                                    |
|------------|--------|----------|------------------------------------------------|
| `path`     | string | Yes      | Path to the image file                         |
| `question` | string | No       | Question about the image (default: "Describe this image in detail.") |

Reads the image, base64-encodes it, and sends it to the current LLM as a
multimodal message. Supports PNG, JPEG, GIF, WebP, and BMP formats. Requires a
vision-capable model.

---

### Category: data

#### `read_csv`

Read a CSV file and return summary or specific rows.

| Parameter  | Type    | Required | Description                             |
|------------|---------|----------|-----------------------------------------|
| `path`     | string  | Yes      | Path to the CSV file                    |
| `max_rows` | integer | No       | Maximum rows to display (default 20)    |

Returns the data formatted as a Markdown table, with a count of total rows if
the file is larger than `max_rows`.
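
The Markdown rendering could look roughly like this. To stay self-contained
the sketch takes CSV text instead of a file path, and the name
`csv_to_markdown` and the wording of the row-count line are assumptions.

```python
import csv
import io


def csv_to_markdown(text: str, max_rows: int = 20) -> str:
    """Render CSV text as a Markdown table, showing at most max_rows data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return "Empty CSV."
    header, data = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for row in data[:max_rows]:
        lines.append("| " + " | ".join(row) + " |")
    if len(data) > max_rows:
        # Only report the total when rows were left out.
        lines.append(f"Showing {max_rows} of {len(data)} rows.")
    return "\n".join(lines)
```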

---

#### `read_json`

Read and pretty-print a JSON file.

| Parameter | Type   | Required | Description           |
|-----------|--------|----------|-----------------------|
| `path`    | string | Yes      | Path to the JSON file |

Returns the JSON content pretty-printed with 2-space indentation. Truncated to
15,000 characters.

---

#### `query_json`

Extract data from a JSON file using a dot-notation path.

| Parameter   | Type   | Required | Description                          |
|-------------|--------|----------|--------------------------------------|
| `path`      | string | Yes      | Path to the JSON file                |
| `json_path` | string | Yes      | Dot-notation path (e.g. `"data.users.0.name"`) |

Supports `*` as a wildcard for arrays. For example, `"results.*.id"` returns
the full array at `results`.
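
Dot-notation lookup with numeric list indices can be sketched as below. The
helper `resolve_path` is hypothetical, wildcard handling is deliberately left
out, and invalid paths simply raise `KeyError`, `IndexError`, or `ValueError`
rather than returning a formatted error message.

```python
import json


def resolve_path(data, json_path: str):
    """Walk a parsed JSON value along a dot-separated path."""
    current = data
    for part in json_path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segment indexes into a list
        else:
            current = current[part]  # otherwise treat the segment as a key
    return current


doc = json.loads('{"data": {"users": [{"name": "Ada"}, {"name": "Grace"}]}}')
first_name = resolve_path(doc, "data.users.0.name")
```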

---

### Category: delivery

#### `email_file`

Email a file as an attachment via Gmail SMTP.

| Parameter   | Type   | Required | Description                                      |
|-------------|--------|----------|--------------------------------------------------|
| `file_path` | string | Yes      | Path to the file to send                         |
| `to`        | string | No       | Recipient address (defaults to `EMAIL_DEFAULT_TO`)|
| `subject`   | string | No       | Email subject (defaults to filename)             |

If the file is `.txt`, it is automatically converted to `.docx` before sending.
Requires `GMAIL_USERNAME` and `GMAIL_APP_PASSWORD` in `.env`.

---

### Category: content

#### `write_press_releases`

Full autonomous press-release pipeline.

| Parameter         | Type   | Required | Description                                    |
|-------------------|--------|----------|------------------------------------------------|
| `topic`           | string | Yes      | Press release topic                            |
| `company_name`    | string | Yes      | Company name                                   |
| `url`             | string | No       | Reference URL for context                      |
| `lsi_terms`       | string | No       | LSI keywords to integrate                      |
| `required_phrase` | string | No       | Exact phrase to include once                   |

Generates 7 headlines, has the LLM pick the best 2, writes a full press release
for each (600-750 words), generates a JSON-LD schema for each release, and saves
all files. Output includes `.txt`, `.docx` (Google Docs-ready), and `.json`
schema files in `data/generated/press_releases/{company}/`.

---

### Category: meta

#### `build_tool`

Create a new tool from a description. The agent writes Python code with the
`@tool` decorator.

| Parameter     | Type   | Required | Description                                  |
|---------------|--------|----------|----------------------------------------------|
| `name`        | string | Yes      | Tool name in snake_case                      |
| `description` | string | Yes      | What the tool does                           |
| `code`        | string | Yes      | Python code with `@tool` decorator           |

See the dedicated section below for details.

---

#### `build_skill`

Create a new multi-step skill from a description.

| Parameter     | Type   | Required | Description                                  |
|---------------|--------|----------|----------------------------------------------|
| `name`        | string | Yes      | Skill name in snake_case                     |
| `description` | string | Yes      | What the skill does                          |
| `steps`       | string | Yes      | Python code with `@skill` decorator          |

See the dedicated section below for details.

---

## How to Create Custom Tools Using `@tool`

### Step 1: Create a Python file in `cheddahbot/tools/`

The file name does not matter (as long as it does not start with `_`), but by
convention it should describe the category of tools it contains.

### Step 2: Import the decorator and define your function

```python
"""My custom tools."""
from __future__ import annotations
from . import tool


@tool("greet_user", "Greet a user by name", category="social")
def greet_user(name: str, enthusiasm: int = 1) -> str:
    exclamation = "!" * enthusiasm
    return f"Hello, {name}{exclamation}"
```

### Decorator Signature

```python
@tool(name: str, description: str, category: str = "general")
```

- `name` -- The tool name the LLM will use to invoke it. Must be unique across
  all registered tools.
- `description` -- A short description shown to the LLM in the system prompt
  and in the tool schema.
- `category` -- A grouping label for organizing tools in the system prompt.

### Function Requirements

- The return type should be `str`. If the function returns a non-string value,
  it is converted via `str()`. If it returns `None`, the result is `"Done."`.
- Type annotations on parameters are used to generate the JSON Schema:
  - `str` -> `"string"`
  - `int` -> `"integer"`
  - `float` -> `"number"`
  - `bool` -> `"boolean"`
  - `list` -> `"array"` (of strings)
  - No annotation -> `"string"`
- Parameters with default values are optional in the schema. Parameters without
  defaults are required.
- To access the agent's runtime context (config, database, memory system, agent
  instance), add a `ctx: dict = None` parameter. The tool registry will
  automatically inject a dictionary with keys `"config"`, `"db"`, `"agent"`,
  and `"memory"`.
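
The mapping rules above can be approximated with `inspect.signature`. This is
a sketch of the idea, not the registry's actual code: the name `params_schema`
is hypothetical, and hiding `ctx` from the generated schema is an assumption.

```python
import inspect

# Annotation name -> JSON Schema type, as listed above.
TYPE_MAP = {
    "str": "string",
    "int": "integer",
    "float": "number",
    "bool": "boolean",
    "list": "array",
}


def params_schema(func) -> dict:
    """Build a JSON Schema 'parameters' object from a function signature."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        if name == "ctx":
            continue  # assumed: ctx is injected, so it is not shown to the LLM
        ann = param.annotation
        if ann is inspect.Parameter.empty:
            json_type = "string"  # no annotation -> "string"
        else:
            # Annotations may be strings under `from __future__ import annotations`.
            type_name = ann if isinstance(ann, str) else getattr(ann, "__name__", "str")
            json_type = TYPE_MAP.get(type_name, "string")
        prop = {"type": json_type}
        if json_type == "array":
            prop["items"] = {"type": "string"}
        properties[name] = prop
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default -> required
    return {"type": "object", "properties": properties, "required": required}


def greet_user(name: str, enthusiasm: int = 1, ctx: dict = None) -> str:
    return f"Hello, {name}" + "!" * enthusiasm


schema = params_schema(greet_user)
```

Here `name` ends up required and `enthusiasm` optional, matching the rule that
only parameters without defaults are required.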

### Step 3: Restart CheddahBot

The tool module is auto-discovered on startup. No additional registration code
is needed.

### Full Example with Context Access

```python
"""Tools that interact with the database."""
from __future__ import annotations
from . import tool


@tool("count_conversations", "Count total conversations in the database", category="stats")
def count_conversations(ctx: dict = None) -> str:
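    # ctx is injected by the tool registry; guard against a missing database.
    if not ctx or not ctx.get("db"):
        return "Database not available."
    # Reads the private `_conn` handle directly. Looking up `row['cnt']` by name
    # only works if rows support column-name access (e.g. `sqlite3.Row`).
    row = ctx["db"]._conn.execute("SELECT COUNT(*) AS cnt FROM conversations").fetchone()
    return f"Total conversations: {row['cnt']}"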


@tool("get_setting", "Retrieve a value from the key-value store", category="config")
def get_setting(key: str, ctx: dict = None) -> str:
    if not ctx or not ctx.get("db"):
        return "Database not available."
    value = ctx["db"].kv_get(key)
    if value is None:
        return f"No value found for key: {key}"
    return f"{key} = {value}"
```

---

## How `build_tool` (Meta-Tool) Works

The `build_tool` tool allows the LLM to create new tools at runtime without
restarting the application. This is the mechanism by which you can ask the agent
"create a tool that does X" and it will write, save, and hot-load the tool.

### Internal Process

1. **Validation** -- The tool name must be a valid Python identifier.
2. **Code wrapping** -- The provided `code` parameter is wrapped in a module
   template that adds the necessary `from . import tool` import.
3. **File creation** -- The module is written to
   `cheddahbot/tools/<name>.py`. If a file with that name already exists, the
   operation is rejected.
4. **Hot-loading** -- `importlib.import_module()` imports the new module. This
   triggers the `@tool` decorator inside the code, which registers the tool in
   the global `_TOOLS` dictionary.
5. **Cleanup on failure** -- If the import fails (syntax error, import error,
   etc.), the file is deleted to avoid leaving broken modules.
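
The five steps above can be sketched as a single function. This is a
simplified stand-in, not the real `build_tool`: the name
`build_module_sketch`, the return strings, and `MODULE_HEADER` are
hypothetical, and the sketch imports a top-level module from a directory that
must already be on `sys.path` instead of a module inside the `cheddahbot.tools`
package.

```python
import importlib
from pathlib import Path

# Stand-in for the module template; the real one adds `from . import tool`.
MODULE_HEADER = "# auto-generated module\n"


def build_module_sketch(name: str, code: str, tools_dir: Path) -> str:
    # 1. Validation: the name must be a valid Python identifier.
    if not name.isidentifier():
        return f"Invalid name: {name}"
    # 3. File creation: refuse to overwrite an existing module.
    target = tools_dir / f"{name}.py"
    if target.exists():
        return f"Rejected: {target.name} already exists"
    # 2. Code wrapping: prepend the template header to the supplied code.
    target.write_text(MODULE_HEADER + code, encoding="utf-8")
    try:
        # 4. Hot-loading: importing runs the module body (and any decorators).
        importlib.invalidate_caches()
        importlib.import_module(name)
    except Exception as exc:
        # 5. Cleanup on failure: do not leave a broken module on disk.
        target.unlink(missing_ok=True)
        return f"Failed to load: {exc}"
    return "loaded"
```

Calling `importlib.invalidate_caches()` before the import matters here because
the file was created after the import system last scanned the directory.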

### What the LLM Generates

When the LLM calls `build_tool`, it provides:

- `name`: e.g. `"word_count"`
- `description`: e.g. `"Count words in a text string"`
- `code`: The complete function definition, including the `@tool` decorator:

```python
@tool("word_count", "Count words in a text string", category="text")
def word_count(text: str) -> str:
    count = len(text.split())
    return f"Word count: {count}"
```

The `build_tool` function wraps this in the necessary imports and writes it to
disk.

### Persistence

Because the tool is saved as a `.py` file in the tools directory, it survives
application restarts. On the next startup, auto-discovery will find and load it
like any other built-in tool.

---

## How `build_skill` Works

The `build_skill` tool creates multi-step skills -- higher-level operations that
can orchestrate multiple actions.

### Internal Process

1. **Validation** -- The skill name must be a valid Python identifier.
2. **Code wrapping** -- The provided `steps` parameter is wrapped in a module
   template that adds `from cheddahbot.skills import skill`.
3. **File creation** -- The module is written to
   `skills/<name>.py` (the project-level skills directory, not inside the
   package).
4. **Dynamic loading** -- `skills.load_skill()` uses
   `importlib.util.spec_from_file_location` to load the module from the file
   path, triggering the `@skill` decorator.
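
Loading a module straight from a file path generally follows the pattern
below. The wrapper name `load_module_from_path` is hypothetical; what
`skills.load_skill()` does beyond these standard `importlib.util` calls is not
shown here.

```python
import importlib.util
from pathlib import Path


def load_module_from_path(name: str, path: Path):
    """Load a Python file as a module without it being on sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Executing the module body is what runs any decorators it contains.
    spec.loader.exec_module(module)
    return module
```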

### The `@skill` Decorator

```python
from cheddahbot.skills import skill

@skill("my_skill", "Description of what this skill does")
def my_skill(**kwargs) -> str:
    # Multi-step logic here
    return "Skill completed."
```

Skills are registered in the global `_SKILLS` dictionary and can be listed
with `skills.list_skills()` and executed with `skills.run_skill(name, **kwargs)`.
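
A decorator-backed registry of this kind can be sketched in a few lines. This
is a stand-in for `cheddahbot.skills`, not its source: the stored record shape,
the sorted return value of `list_skills`, and the unknown-skill message are
assumptions.

```python
from typing import Callable

_SKILLS: dict[str, dict] = {}


def skill(name: str, description: str):
    """Register the decorated function under the given skill name."""
    def decorator(func: Callable) -> Callable:
        _SKILLS[name] = {"description": description, "func": func}
        return func
    return decorator


def list_skills() -> list[str]:
    return sorted(_SKILLS)


def run_skill(name: str, **kwargs) -> str:
    if name not in _SKILLS:
        return f"Unknown skill: {name}"
    return str(_SKILLS[name]["func"](**kwargs))


@skill("shout", "Upper-case a message")
def shout(**kwargs) -> str:
    return kwargs.get("message", "").upper()
```

Because registration happens when the decorator runs, simply importing or
executing a skill module is enough to make the skill available.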

### Difference Between Tools and Skills

| Aspect     | Tools                                    | Skills                                |
|------------|------------------------------------------|---------------------------------------|
| Invoked by | The LLM (via function calling)           | Code or agent internally              |
| Schema     | OpenAI function-calling JSON schema      | No schema; free-form kwargs           |
| Location   | `cheddahbot/tools/` (inside the package) | `skills/` (project-level directory)   |
| Purpose    | Single focused operations                | Multi-step workflows                  |

---

## Example: Creating a Custom Tool Manually

Suppose you want a tool that converts temperatures between Fahrenheit and
Celsius.

### 1. Create the file

Create `cheddahbot/tools/temperature.py`:

```python
"""Temperature conversion tools."""
from __future__ import annotations
from . import tool


@tool("convert_temperature", "Convert temperature between Fahrenheit and Celsius", category="utility")
def convert_temperature(value: float, from_unit: str = "F") -> str:
    """Convert a temperature value.

    Args:
        value: The temperature value to convert
        from_unit: Source unit - 'F' for Fahrenheit, 'C' for Celsius
    """
    from_unit = from_unit.upper()
    if from_unit == "F":
        celsius = (value - 32) * 5 / 9
        return f"{value} F = {celsius:.1f} C"
    elif from_unit == "C":
        fahrenheit = value * 9 / 5 + 32
        return f"{value} C = {fahrenheit:.1f} F"
    else:
        return f"Unknown unit: {from_unit}. Use 'F' or 'C'."
```

### 2. Restart CheddahBot

The tool is auto-discovered. The LLM will see it in the system prompt:

```
### Utility
- **convert_temperature**(value, from_unit): Convert temperature between Fahrenheit and Celsius
```

### 3. Use it in conversation

> User: "What is 72 degrees Fahrenheit in Celsius?"
>
> The LLM calls `convert_temperature(value=72.0, from_unit="F")` and receives
> `"72.0 F = 22.2 C"`, which it incorporates into its response.

---

## Example: Asking the Agent to Create a Tool via `build_tool`

You can ask the agent to create a tool for you in natural language. The agent
will use the `build_tool` meta-tool to generate and register it.

### Conversation

> **User:** Create a tool that counts the number of lines in a file.
>
> **Agent:** I'll create that tool for you.
>
> *[Uses tool: build_tool]*
> ```
> name: "count_lines"
> description: "Count the number of lines in a file"
> code:
>   @tool("count_lines", "Count the number of lines in a file", category="files")
>   def count_lines(path: str) -> str:
>       from pathlib import Path
>       p = Path(path).resolve()
>       if not p.exists():
>           return f"File not found: {path}"
>       if not p.is_file():
>           return f"Not a file: {path}"
>       try:
>           content = p.read_text(encoding="utf-8", errors="replace")
>           line_count = len(content.splitlines())
>           return f"{path}: {line_count} lines"
>       except Exception as e:
>           return f"Error: {e}"
> ```
> *Tool 'count_lines' created and loaded successfully.*
>
> Done. The `count_lines` tool is now available. You can ask me to count lines
> in any file, or I'll use it automatically when relevant.

### What Happened Behind the Scenes

1. The agent decided `build_tool` was the right tool for the request.
2. It generated valid Python code including the `@tool` decorator.
3. `build_tool` wrote `cheddahbot/tools/count_lines.py` with the wrapped code.
4. `importlib.import_module` loaded the module, triggering registration.
5. The tool is now available for the LLM to use in the current session and all
   future sessions.

### Verifying the Tool Exists

After creation, the tool appears in:

- The system prompt's tools section (on the next turn).
- The output of `ToolRegistry.get_tools_schema()`.
- The file system at `cheddahbot/tools/count_lines.py`.

---

## Tool Registry API Summary

### `ToolRegistry(config, db, agent)`

Constructor. Auto-discovers and imports all tool modules.

### `ToolRegistry.get_tools_schema() -> list[dict]`

Returns all tools as OpenAI function-calling JSON schema objects.

### `ToolRegistry.get_tools_description() -> str`

Returns a human-readable Markdown string listing all tools organized by
category. This is injected into the system prompt.

### `ToolRegistry.execute(name, args) -> str`

Executes a tool by name with the given arguments. Returns the result as a
string. Automatically injects the `ctx` context dict if the tool function
accepts one.
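
Context injection and result coercion might work along these lines. The
function `execute_sketch` is a hypothetical illustration built from the
behavior described in this document (inject `ctx` when accepted, convert
non-strings with `str()`, return `"Done."` for `None`); error handling is
omitted.

```python
import inspect


def execute_sketch(func, args: dict, ctx: dict) -> str:
    """Call a tool function, injecting ctx only if its signature accepts it."""
    call_args = dict(args)
    if "ctx" in inspect.signature(func).parameters:
        call_args["ctx"] = ctx
    result = func(**call_args)
    if result is None:
        return "Done."
    return str(result)


def get_model(key: str, ctx: dict = None) -> str:
    return ctx["config"][key]


def add(a: int, b: int) -> int:
    return a + b


def touch() -> None:
    return None


runtime_ctx = {"config": {"model": "example-model"}, "db": None, "agent": None, "memory": None}
```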

### `ToolRegistry.register_external(tool_def)`

Manually registers a `ToolDef` object. Used for programmatic tool registration
outside the `@tool` decorator.