Removes the speaker diarization pipeline and alignment model from the STT module to reduce resource usage and complexity. The transcription API remains compatible by returning 'Unknown' as the speaker ID for all transcribed segments. - Removed DiarizationPipeline and align_model from Transcriber - Simplified transcribe method to return basic transcription segments - Updated logging and docstrings to reflect changes
D&D Helpers
D&D Helpers is designed to solve the "missing notes" problem. It focuses on the automatic extraction of game-relevant data from live conversation, turning spoken dialogue into structured records.
Core Objective: Automated Data Capture
The primary goal is to listen to game sessions and automatically identify and record critical information into structured files, while ignoring the "noise" of out-of-character (OOC) conversation.
The Pipeline
- Listen: Capture audio and convert it to text via Speech-to-Text (STT).
- Filter: An LLM analyzes the transcript to strip away OOC nonsense and non-game-relevant chatter.
- Extract: The system identifies key events and routes them to the appropriate destination:
- Lore: Narrative details, NPC introductions, and world-building are appended to Markdown files.
- Character State & Inventory: Changes to health, status effects, and loot are updated in JSON files.
- Confirm: A human-in-the-loop system suggests these updates via a CLI tool, allowing the user to confirm, edit, or reject the change before it is committed.
Features
Data Trackers
- Lore Tracker: A personal wiki for your campaign's lore, NPCs, and locations. Stored in Markdown for rich text and easy version control.
- Character & Inventory Tracker: A centralized record of character identity, stats, effects, and gear. Stored in JSON for portability and VTT compatibility.
Summarizer
Distill long sessions into concise highlights. Use LLMs to summarize recorded transcripts into a brief "The Story So Far" document.
Interface & Usage
CLI
The primary interface for confirming automated updates and querying current game state.
Command Line Arguments
Use these flags to manage data ingestion and run the live capture pipeline.
RAG Ingestion
Use these flags to add external documents to the RAG (Retrieval-Augmented Generation) system.
| Flag | Description |
|---|---|
--ingest-pdf <path> |
Path to a PDF file to ingest |
--ingest-file <path> |
Path to a markdown file to ingest |
--ingest-dir <path> |
Path to a directory of markdown files to ingest |
LLM Configuration
These flags allow you to override the environment variables for the LLM backend.
| Flag | Description |
|---|---|
--llm-backend <backend> |
Backend to use (openai, ollama, or vllm) |
--llm-model <model> |
The model name to use |
--llm-api-key <key> |
API key for the LLM backend |
--llm-base-url <url> |
Base URL for the LLM backend |
Pipeline Execution
| Flag | Description |
|---|---|
--run-pipeline |
Starts the main orchestration pipeline (TUI + STT + LLM) |
Example Command
To run the live orchestration pipeline using the configuration specified in your env.sh, you can use:
python main.py --run-pipeline \
--llm-backend vllm \
--llm-model google/gemma-4-26b-a4b-it \
--llm-api-key no-key-required \
--whisper-model medium \
--llm-base-url https://vllm.tipsy.codes/v1
Text Editors
Since data is stored in Markdown and JSON, you can use any editor (VS Code, Vim, Obsidian) to manually refine your campaign data.
Technical Stack
- Language: Python 3.10+
- Data Persistence: Local JSON and Markdown files.
- AI Backend: vLLM / OpenAI API compatible endpoints (via
openaiPython library). - STT Engine: OpenAI Whisper (local) for high-accuracy transcription.
- Audio Capture (Linux):
sounddeviceorPyAudiofor microphone and system audio capture.ffmpegfor audio stream processing and format conversion.
- Interface:
TextualorRichfor a modern, intuitive Terminal User Interface (TUI).ClickorTyperfor command-line argument parsing.