dnd-helpers/README.md

# D&D Helpers

D&D Helpers is designed to solve the "missing notes" problem. It focuses on the automatic extraction of game-relevant data from live conversation, turning spoken dialogue into structured records.

## Core Objective: Automated Data Capture

The primary goal is to listen to game sessions and automatically identify and record critical information into structured files, while ignoring the "noise" of out-of-character (OOC) conversation.

### The Pipeline

1. **Listen**: Capture audio and convert it to text via Speech-to-Text (STT).
2. **Filter**: An LLM analyzes the transcript to strip away OOC nonsense and non-game-relevant chatter.
3. **Extract**: The system identifies key events and routes them to the appropriate destination:
   - **Lore**: Narrative details, NPC introductions, and world-building are appended to Markdown files.
   - **Character State & Inventory**: Changes to health, status effects, and loot are updated in JSON files.
4. **Confirm**: A human-in-the-loop system suggests these updates via a CLI tool, allowing the user to confirm, edit, or reject the change before it is committed.

## Features

### Data Trackers

- **Lore Tracker**: A personal wiki for your campaign's lore, NPCs, and locations. Stored in Markdown for rich text and easy version control.
- **Character & Inventory Tracker**: A centralized record of character identity, stats, effects, and gear. Stored in JSON for portability and VTT compatibility.

### Summarizer

Distill long sessions into concise highlights. Use LLMs to summarize recorded transcripts into a brief "The Story So Far" document.

## Interface & Usage

### CLI

The primary interface for confirming automated updates and querying current game state.

#### Command Line Arguments

Use these flags to manage data ingestion and run the live capture pipeline.

##### RAG Ingestion
Use these flags to add external documents to the RAG (Retrieval-Augmented Generation) system.

| Flag | Description |
| :--- | :--- |
| `--ingest-pdf <path>` | Path to a PDF file to ingest |
| `--ingest-file <path>` | Path to a markdown file to ingest |
| `--ingest-dir <path>` | Path to a directory of markdown files to ingest |

##### LLM Configuration
These flags allow you to override the environment variables for the LLM backend.

| Flag | Description |
| :--- | :--- |
| `--llm-backend <backend>` | Backend to use (`openai`, `ollama`, or `vllm`) |
| `--llm-model <model>` | The model name to use |
| `--llm-api-key <key>` | API key for the LLM backend |
| `--llm-base-url <url>` | Base URL for the LLM backend |

##### Pipeline Execution
| Flag | Description |
| :--- | :--- |
| `--run-pipeline` | Starts the main orchestration pipeline (TUI + STT + LLM) |

##### Example Command

To run the live orchestration pipeline using the configuration specified in your `env.sh`, you can use:

```bash
python main.py --run-pipeline \
  --llm-backend vllm \
  --llm-model google/gemma-4-26b-a4b-it \
  --llm-api-key no-key-required \
  --whisper-model medium \
  --llm-base-url https://vllm.tipsy.codes/v1
```

### Text Editors

Since data is stored in Markdown and JSON, you can use any editor (VS Code, Vim, Obsidian) to manually refine your campaign data.

## Technical Stack

- **Language**: Python 3.10+
- **Data Persistence**: Local JSON and Markdown files.
- **AI Backend**: vLLM / OpenAI API compatible endpoints (via `openai` Python library).
- **STT Engine**: OpenAI Whisper (local) for high-accuracy transcription.
- **Audio Capture (Linux)**:
  - `sounddevice` or `PyAudio` for microphone and system audio capture.
  - `ffmpeg` for audio stream processing and format conversion.
- **Interface**:
  - `Textual` or `Rich` for a modern, intuitive Terminal User Interface (TUI).
  - `Click` or `Typer` for command-line argument parsing.