Files
dnd-helpers/README.md
T
charles da5ab1bb44 Refactor STT pipeline and CLI documentation
Split the STT worker into a collector and a transcription worker
to offload heavy processing to a background thread. Add the
`--whisper-model` flag and implement LLM latency logging. Expand
the README with comprehensive CLI usage instructions.
2026-05-31 15:04:41 -07:00

92 lines
3.7 KiB
Markdown

# D&D Helpers
D&D Helpers is designed to solve the "missing notes" problem. It focuses on the automatic extraction of game-relevant data from live conversation, turning spoken dialogue into structured records.
## Core Objective: Automated Data Capture
The primary goal is to listen to game sessions and automatically identify and record critical information into structured files, while ignoring the "noise" of out-of-character (OOC) conversation.
### The Pipeline
1. **Listen**: Capture audio and convert it to text via Speech-to-Text (STT).
2. **Filter**: An LLM analyzes the transcript to strip away OOC nonsense and non-game-relevant chatter.
3. **Extract**: The system identifies key events and routes them to the appropriate destination:
- **Lore**: Narrative details, NPC introductions, and world-building are appended to Markdown files.
- **Character State & Inventory**: Changes to health, status effects, and loot are updated in JSON files.
4. **Confirm**: A human-in-the-loop system suggests these updates via a CLI tool, allowing the user to confirm, edit, or reject the change before it is committed.
## Features
### Data Trackers
- **Lore Tracker**: A personal wiki for your campaign's lore, NPCs, and locations. Stored in Markdown for rich text and easy version control.
- **Character & Inventory Tracker**: A centralized record of character identity, stats, effects, and gear. Stored in JSON for portability and VTT compatibility.
### Summarizer
Distill long sessions into concise highlights. Use LLMs to summarize recorded transcripts into a brief "The Story So Far" document.
## Interface & Usage
### CLI
The primary interface for confirming automated updates and querying current game state.
#### Command Line Arguments
Use these flags to manage data ingestion and run the live capture pipeline.
##### RAG Ingestion
Use these flags to add external documents to the RAG (Retrieval-Augmented Generation) system.
| Flag | Description |
| :--- | :--- |
| `--ingest-pdf <path>` | Path to a PDF file to ingest |
| `--ingest-file <path>` | Path to a markdown file to ingest |
| `--ingest-dir <path>` | Path to a directory of markdown files to ingest |
##### LLM Configuration
These flags allow you to override the environment variables for the LLM backend.
| Flag | Description |
| :--- | :--- |
| `--llm-backend <backend>` | Backend to use (`openai`, `ollama`, or `vllm`) |
| `--llm-model <model>` | The model name to use |
| `--llm-api-key <key>` | API key for the LLM backend |
| `--llm-base-url <url>` | Base URL for the LLM backend |
##### Pipeline Execution
| Flag | Description |
| :--- | :--- |
| `--run-pipeline` | Starts the main orchestration pipeline (TUI + STT + LLM) |
##### Example Command
To run the live orchestration pipeline using the configuration specified in your `env.sh`, you can use:
```bash
python main.py --run-pipeline \
--llm-backend vllm \
--llm-model google/gemma-4-26b-a4b-it \
--llm-api-key no-key-required \
--whisper-model medium \
--llm-base-url https://vllm.tipsy.codes/v1
```
### Text Editors
Since data is stored in Markdown and JSON, you can use any editor (VS Code, Vim, Obsidian) to manually refine your campaign data.
## Technical Stack
- **Language**: Python 3.10+
- **Data Persistence**: Local JSON and Markdown files.
- **AI Backend**: vLLM / OpenAI API compatible endpoints (via `openai` Python library).
- **STT Engine**: OpenAI Whisper (local) for high-accuracy transcription.
- **Audio Capture (Linux)**:
- `sounddevice` or `PyAudio` for microphone and system audio capture.
- `ffmpeg` for audio stream processing and format conversion.
- **Interface**:
- `Textual` or `Rich` for a modern, intuitive Terminal User Interface (TUI).
- `Click` or `Typer` for command-line argument parsing.