# D&D Helpers

D&D Helpers is designed to solve the "missing notes" problem. It focuses on the automatic extraction of game-relevant data from live conversation, turning spoken dialogue into structured records.

## Core Objective: Automated Data Capture

The primary goal is to listen to game sessions and automatically identify and record critical information into structured files, while ignoring the "noise" of out-of-character (OOC) conversation.

### The Pipeline

1. **Listen**: Capture audio and convert it to text via Speech-to-Text (STT).
2. **Filter**: An LLM analyzes the transcript and strips out OOC talk and other chatter that is not relevant to the game.
3. **Extract**: The system identifies key events and routes them to the appropriate destination:
   - **Lore**: Narrative details, NPC introductions, and world-building are appended to Markdown files.
   - **Character State & Inventory**: Changes to health, status effects, and loot are updated in JSON files.
4. **Confirm**: A human-in-the-loop step suggests each update via a CLI tool, so the user can confirm, edit, or reject the change before it is committed.

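The routing and confirmation steps can be pictured with a small sketch. Everything named here (`Suggestion`, `route`, `review`, and the destination file names) is hypothetical and only illustrates the flow described above, not the project's actual code.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical destinations; the real project may name its files differently.
DESTINATIONS = {"lore": "lore.md", "character": "characters.json"}


@dataclass(frozen=True)
class Suggestion:
    kind: str     # "lore" or "character"
    summary: str  # proposed text or change, as extracted by the LLM


def route(suggestion: Suggestion) -> str:
    """Return the file a suggestion would be written to."""
    if suggestion.kind not in DESTINATIONS:
        raise ValueError(f"unknown suggestion kind: {suggestion.kind!r}")
    return DESTINATIONS[suggestion.kind]


def review(
    suggestion: Suggestion, decision: str, edited: Optional[str] = None
) -> Optional[Suggestion]:
    """Apply a user decision: 'confirm', 'edit', or 'reject'.

    Returns the suggestion to commit, or None if it was rejected.
    """
    if decision == "confirm":
        return suggestion
    if decision == "edit":
        if edited is None:
            raise ValueError("an edit needs replacement text")
        return replace(suggestion, summary=edited)
    if decision == "reject":
        return None
    raise ValueError(f"unknown decision: {decision!r}")
```

Nothing is written to disk until `review` returns a suggestion, which mirrors the rule that no change is committed without user approval.
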
## Features

### Data Trackers

- **Lore Tracker**: A personal wiki for your campaign's lore, NPCs, and locations. Stored in Markdown for rich text and easy version control.
- **Character & Inventory Tracker**: A centralized record of character identity, stats, effects, and gear. Stored in JSON for portability and VTT compatibility.

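As a rough illustration of the two storage formats, the sketch below appends a lore entry to a Markdown file and merges a change into a JSON character record. The helper names and the JSON layout (one object keyed by character name) are assumptions made for this example; the project's real schema may differ.

```python
import json
from pathlib import Path


def append_lore(path: Path, heading: str, text: str) -> None:
    """Append a lore entry to a Markdown file as a new section."""
    with path.open("a", encoding="utf-8") as f:
        f.write(f"\n## {heading}\n\n{text}\n")


def update_character(path: Path, name: str, changes: dict) -> dict:
    """Merge changes into one character's record and write the file back.

    Assumed layout: a single JSON object keyed by character name.
    """
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    record = data.setdefault(name, {})
    record.update(changes)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return record
```

Because both files are plain text, each confirmed update shows up as a small, readable diff under version control.
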
### Summarizer

Distill long sessions into concise highlights. Use LLMs to summarize recorded transcripts into a brief "The Story So Far" document.

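Long transcripts may not fit in a single LLM request, so one possible approach is to summarize in chunks and then combine the partial summaries. The sketch below assumes this two-pass strategy; the function names, prompts, and the `llm` callable (any function that maps a prompt string to a reply string) are illustrative, not the project's implementation.

```python
from typing import Callable, List


def chunk_transcript(lines: List[str], max_chars: int) -> List[str]:
    """Group transcript lines into chunks of at most max_chars characters.

    A single line longer than max_chars becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in lines:
        added = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize(lines: List[str], llm: Callable[[str], str], max_chars: int = 4000) -> str:
    """Summarize each chunk, then combine the partial summaries."""
    partials = [
        llm("Summarize this session excerpt:\n" + chunk)
        for chunk in chunk_transcript(lines, max_chars)
    ]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    return llm("Combine these notes into 'The Story So Far':\n" + "\n".join(partials))
```

Passing the model call in as a plain callable keeps the chunking logic testable without a running LLM backend.
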
## Interface & Usage

### CLI

The CLI is the primary interface for confirming automated updates and querying the current game state.

#### Command Line Arguments

Use these flags to manage data ingestion and run the live capture pipeline.

##### RAG Ingestion

Use these flags to add external documents to the RAG (Retrieval-Augmented Generation) system.

| Flag | Description |
| :--- | :--- |
| `--ingest-pdf <path>` | Path to a PDF file to ingest |
| `--ingest-file <path>` | Path to a Markdown file to ingest |
| `--ingest-dir <path>` | Path to a directory of Markdown files to ingest |

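This README does not describe how ingestion works internally. As one plausible preprocessing step, the sketch below collects Markdown files from a directory and splits each file into heading-delimited chunks for indexing. The function names, the recursive directory search, and the chunking rule are all assumptions.

```python
from pathlib import Path
from typing import List, Tuple


def find_markdown_files(directory: Path) -> List[Path]:
    """Return all .md files under directory (recursively), sorted for stable order."""
    return sorted(p for p in directory.rglob("*.md") if p.is_file())


def split_by_heading(text: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) chunks at lines starting with '#'.

    Simplification: a '#' line inside a fenced code block is also treated
    as a heading.
    """
    chunks: List[Tuple[str, str]] = []
    heading = ""
    body: List[str] = []

    def flush() -> None:
        # Keep a chunk if it has a heading or any non-blank body text.
        if heading or any(line.strip() for line in body):
            chunks.append((heading, "\n".join(body).strip()))

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Chunking by heading keeps each retrieved passage tied to a named NPC, location, or rule section, which tends to suit wiki-style lore notes.
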
##### LLM Configuration

These flags allow you to override the environment variables for the LLM backend.

| Flag | Description |
| :--- | :--- |
| `--llm-backend <backend>` | Backend to use (`openai`, `ollama`, or `vllm`) |
| `--llm-model <model>` | The model name to use |
| `--llm-api-key <key>` | API key for the LLM backend |
| `--llm-base-url <url>` | Base URL for the LLM backend |

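The override behaviour amounts to a simple precedence rule: a flag wins when it is given, otherwise the environment value is used. A minimal sketch follows; the environment variable names are hypothetical placeholders, so check your `env.sh` for the names the project actually reads.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names, used only for illustration.
ENV_NAMES = {
    "backend": "LLM_BACKEND",
    "model": "LLM_MODEL",
    "api_key": "LLM_API_KEY",
    "base_url": "LLM_BASE_URL",
}


def resolve_llm_config(
    flags: Mapping[str, Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Prefer a flag value when given, otherwise fall back to the environment."""
    config = {}
    for key, env_name in ENV_NAMES.items():
        value = flags.get(key)
        config[key] = value if value is not None else env.get(env_name)
    return config
```

For example, with `LLM_BACKEND=ollama` in the environment and `--llm-backend vllm` on the command line, the resolved backend is `vllm`, while settings without a flag keep their environment values.
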
##### Pipeline Execution

| Flag | Description |
| :--- | :--- |
| `--run-pipeline` | Starts the main orchestration pipeline (TUI + STT + LLM) |
| `--whisper-model <model>` | Whisper model to use for STT (for example, `medium`) |

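For orientation, here is how the documented flags could be declared with the standard library's `argparse`. This is only a sketch: the project may use a different parser, and the help strings, defaults, and the placement of `--whisper-model` (taken from this README's example command) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags documented in this README (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")

    ingest = parser.add_argument_group("RAG ingestion")
    ingest.add_argument("--ingest-pdf", metavar="PATH", help="PDF file to ingest")
    ingest.add_argument("--ingest-file", metavar="PATH", help="Markdown file to ingest")
    ingest.add_argument("--ingest-dir", metavar="PATH", help="Directory of Markdown files")

    llm = parser.add_argument_group("LLM configuration")
    llm.add_argument("--llm-backend", choices=["openai", "ollama", "vllm"])
    llm.add_argument("--llm-model", metavar="MODEL")
    llm.add_argument("--llm-api-key", metavar="KEY")
    llm.add_argument("--llm-base-url", metavar="URL")

    run = parser.add_argument_group("Pipeline execution")
    run.add_argument("--run-pipeline", action="store_true",
                     help="Start the orchestration pipeline (TUI + STT + LLM)")
    # Also used in this README's example command.
    run.add_argument("--whisper-model", metavar="MODEL")
    return parser
```

Flags that are not passed parse to `None` (or `False` for `--run-pipeline`), which makes it easy to tell "not given" apart from an explicit value.
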
##### Example Command

To run the live orchestration pipeline using the configuration specified in your `env.sh`, you can use:

```bash
python main.py --run-pipeline \
  --llm-backend vllm \
  --llm-model google/gemma-4-26b-a4b-it \
  --llm-api-key no-key-required \
  --whisper-model medium \
  --llm-base-url https://vllm.tipsy.codes/v1
```

### Text Editors

Since data is stored in Markdown and JSON, you can use any editor (VS Code, Vim, Obsidian) to manually refine your campaign data.

## Technical Stack

- **Language**: Python 3.10+
- **Data Persistence**: Local JSON and Markdown files.
- **AI Backend**: vLLM / OpenAI API compatible endpoints (via `openai` Python library).
- **STT Engine**: OpenAI Whisper (local) for high-accuracy transcription.
- **Audio Capture (Linux)**:
  - `sounddevice` or `PyAudio` for microphone and system audio capture.
  - `ffmpeg` for audio stream processing and format conversion.
- **Interface**:
  - `Textual` or `Rich` for a modern, intuitive Terminal User Interface (TUI).
  - `Click` or `Typer` for command-line argument parsing.