charles da5ab1bb44 Refactor STT pipeline and CLI documentation
Split the STT worker into a collector and a transcription worker
to offload heavy processing to a background thread. Add the
`--whisper-model` flag and implement LLM latency logging. Expand
the README with comprehensive CLI usage instructions.
2026-05-31 15:04:41 -07:00

D&D Helpers

D&D Helpers is designed to solve the "missing notes" problem. It focuses on the automatic extraction of game-relevant data from live conversation, turning spoken dialogue into structured records.

Core Objective: Automated Data Capture

The primary goal is to listen to game sessions and automatically identify and record critical information into structured files, while ignoring the "noise" of out-of-character (OOC) conversation.

The Pipeline

  1. Listen: Capture audio and convert it to text via Speech-to-Text (STT).
  2. Filter: An LLM analyzes the transcript and removes OOC and other non-game chatter.
  3. Extract: The system identifies key events and routes them to the appropriate destination:
    • Lore: Narrative details, NPC introductions, and world-building are appended to Markdown files.
    • Character State & Inventory: Changes to health, status effects, and loot are updated in JSON files.
  4. Confirm: A human-in-the-loop system suggests these updates via a CLI tool, allowing the user to confirm, edit, or reject the change before it is committed.
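The Extract and Confirm steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code:

- `ProposedUpdate`, `route`, and `confirm_and_commit` are hypothetical names.
- The two update kinds ("lore" and "state") are assumed from the routing described above.
- The `ask` parameter defaults to `input` but accepts any prompt function, which keeps the confirmation gate testable.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class ProposedUpdate:
    """One change suggested by the extraction step (hypothetical shape)."""
    kind: str        # "lore" or "state" (assumed categories)
    target: str      # file name relative to the campaign directory
    payload: object  # Markdown text for lore, dict of fields for state


def route(update: ProposedUpdate, root: Path) -> Path:
    """Write an approved update to Markdown (lore) or JSON (state)."""
    path = root / update.target
    if update.kind == "lore":
        # Lore is appended so earlier notes are never overwritten.
        with path.open("a", encoding="utf-8") as f:
            f.write(str(update.payload).rstrip() + "\n\n")
    elif update.kind == "state":
        # State is shallow-merged into the existing JSON record, if any.
        record = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        record.update(update.payload)
        path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    else:
        raise ValueError(f"unknown update kind: {update.kind!r}")
    return path


def confirm_and_commit(
    update: ProposedUpdate, root: Path, ask: Callable[[str], str] = input
) -> Optional[Path]:
    """Human-in-the-loop gate: 'y' commits, 'e' edits lore text first, anything else rejects."""
    answer = ask(f"Apply {update.kind} update to {update.target}? [y/n/e] ").strip().lower()
    if answer == "e" and update.kind == "lore":
        update.payload = ask("Replacement text: ")
        answer = "y"
    if answer == "y":
        return route(update, root)
    return None  # rejected: nothing is written
```

Appending lore and merging state are deliberately different. Markdown notes accumulate over a campaign, while a JSON record should always reflect the latest values. In this sketch, editing is only supported for lore.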

Features

Data Trackers

  • Lore Tracker: A personal wiki for your campaign's lore, NPCs, and locations. Stored in Markdown for rich text and easy version control.
  • Character & Inventory Tracker: A centralized record of character identity, stats, effects, and gear. Stored in JSON for portability and compatibility with virtual tabletops (VTTs).
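A JSON character record like the one described above could be modeled as below. The schema (`name`, `max_hp`, `hp`, `effects`, `inventory`) is a hypothetical example. It is not the project's actual format or any VTT's import format. The sketch only shows how a record can be updated in memory and round-tripped through JSON.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Character:
    """Hypothetical character record; field names are illustrative only."""
    name: str
    max_hp: int
    hp: int
    effects: list[str] = field(default_factory=list)         # e.g. "poisoned"
    inventory: dict[str, int] = field(default_factory=dict)  # item -> quantity

    def change_hp(self, delta: int) -> None:
        # Negative delta is damage, positive is healing; clamp to [0, max_hp].
        self.hp = max(0, min(self.max_hp, self.hp + delta))

    def add_item(self, item: str, qty: int = 1) -> None:
        self.inventory[item] = self.inventory.get(item, 0) + qty

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> Character:
        return cls(**json.loads(text))
```

Because the record is plain JSON, a hand edit in any text editor survives the next `from_json` load. The only requirement is that the keys still match the fields.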

Summarizer

Distill long sessions into concise highlights. Use LLMs to summarize recorded transcripts into a brief "The Story So Far" document.
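One common way to summarize a transcript longer than a model's context window is map-reduce: summarize fixed-size chunks, then summarize the summaries. The sketch below assumes that approach; the README does not say which strategy the Summarizer uses.

- `complete` is a hypothetical stand-in for a single LLM call. With the openai library, it might wrap `client.chat.completions.create(...)`.
- `max_chars` is an arbitrary character budget, not a token count.

```python
from __future__ import annotations

from typing import Callable


def chunk_lines(lines: list[str], max_chars: int) -> list[str]:
    """Group transcript lines into chunks of at most max_chars characters.

    A single line longer than max_chars becomes a chunk on its own.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in lines:
        if current and size + len(line) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1  # +1 for the joining newline
    if current:
        chunks.append("\n".join(current))
    return chunks


def story_so_far(lines: list[str], complete: Callable[[str], str], max_chars: int = 8000) -> str:
    # Map: summarize each chunk separately so no single request is too long.
    partials = [
        complete("Summarize the in-character events in this transcript excerpt:\n" + chunk)
        for chunk in chunk_lines(lines, max_chars)
    ]
    # Reduce: merge the partial summaries into one "The Story So Far" document.
    return complete("Combine these summaries into 'The Story So Far':\n" + "\n".join(partials))
```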

Interface & Usage

CLI

The primary interface for confirming automated updates and querying current game state.

Command Line Arguments

Use these flags to manage data ingestion and run the live capture pipeline.

RAG Ingestion

Use these flags to add external documents to the RAG (Retrieval-Augmented Generation) system.

Flag                      Description
--ingest-pdf <path>       Path to a PDF file to ingest
--ingest-file <path>      Path to a Markdown file to ingest
--ingest-dir <path>       Path to a directory of Markdown files to ingest
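
Before Markdown can be retrieved by a RAG system, it is usually split into chunks. The sketch below is minimal and rests on two assumptions the README does not confirm: chunking happens at headings, and `--ingest-dir` traverses directories recursively. `split_markdown` and `collect_markdown` are hypothetical helpers, and the embedding and storage steps are left out.

```python
from __future__ import annotations

import re
from pathlib import Path

# A Markdown ATX heading: 1-6 '#' characters followed by whitespace.
HEADING = re.compile(r"^#{1,6}\s")


def split_markdown(text: str, source: str) -> list[dict]:
    """Split a Markdown document into heading-delimited chunks with source metadata.

    Note: '#' lines inside code fences are not special-cased in this sketch.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            groups.append(current)
            current = []
        current.append(line)
    if current:
        groups.append(current)
    texts = ("\n".join(g).strip() for g in groups)
    return [{"source": source, "text": t} for t in texts if t]


def collect_markdown(directory: str) -> list[dict]:
    """Chunk every .md file under a directory (recursion is an assumption)."""
    out: list[dict] = []
    for path in sorted(Path(directory).rglob("*.md")):
        out.extend(split_markdown(path.read_text(encoding="utf-8"), str(path)))
    return out
```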

LLM Configuration

These flags allow you to override the environment variables for the LLM backend.

Flag                      Description
--llm-backend <backend>   Backend to use (openai, ollama, or vllm)
--llm-model <model>       Model name to use
--llm-api-key <key>       API key for the LLM backend
--llm-base-url <url>      Base URL for the LLM backend
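
The override behavior amounts to "use the flag if given, otherwise the environment variable". The sketch below shows this with the standard-library argparse. The Technical Stack lists Click or Typer; argparse is swapped in here only to keep the example dependency-free. The environment variable names are hypothetical, so check env.sh for the real ones.

```python
from __future__ import annotations

import argparse
import os
from typing import Mapping, Optional

# Hypothetical environment variable names; the real ones may differ.
ENV_VARS = {
    "llm_backend": "LLM_BACKEND",
    "llm_model": "LLM_MODEL",
    "llm_api_key": "LLM_API_KEY",
    "llm_base_url": "LLM_BASE_URL",
}


def parse_llm_config(
    argv: Optional[list[str]] = None, env: Optional[Mapping[str, str]] = None
) -> dict:
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Only the flag value is validated against choices; env values are not.
    parser.add_argument("--llm-backend", choices=["openai", "ollama", "vllm"])
    parser.add_argument("--llm-model")
    parser.add_argument("--llm-api-key")
    parser.add_argument("--llm-base-url")
    # parse_known_args ignores unrelated flags such as --run-pipeline.
    args, _unknown = parser.parse_known_args(argv)
    # A flag wins when given; otherwise fall back to the environment variable.
    return {key: getattr(args, key) or env.get(var) for key, var in ENV_VARS.items()}
```

The resolved `llm_base_url` and `llm_api_key` could then be passed to `openai.OpenAI(base_url=..., api_key=...)`. That is how the openai library targets a vLLM server or another OpenAI-compatible endpoint.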

Pipeline Execution

Flag                      Description
--run-pipeline            Start the main orchestration pipeline (TUI + STT + LLM)
--whisper-model <model>   Whisper model to use for speech-to-text (e.g. medium)

Example Command

To run the live orchestration pipeline with the LLM and Whisper settings passed explicitly as flags (which override the corresponding values from your env.sh), you can use:

python main.py --run-pipeline \
  --llm-backend vllm \
  --llm-model google/gemma-4-26b-a4b-it \
  --llm-api-key no-key-required \
  --whisper-model medium \
  --llm-base-url https://vllm.tipsy.codes/v1

Text Editors

Since data is stored in Markdown and JSON, you can use any editor (VS Code, Vim, Obsidian) to manually refine your campaign data.

Technical Stack

  • Language: Python 3.10+
  • Data Persistence: Local JSON and Markdown files.
  • AI Backend: vLLM or other OpenAI-API-compatible endpoints (via the openai Python library).
  • STT Engine: OpenAI Whisper (local) for high-accuracy transcription.
  • Audio Capture (Linux):
    • sounddevice or PyAudio for microphone and system audio capture.
    • ffmpeg for audio stream processing and format conversion.
  • Interface:
    • Textual or Rich for a modern, intuitive Terminal User Interface (TUI).
    • Click or Typer for command-line argument parsing.
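The commit note at the top of this page describes splitting the STT worker into a collector and a transcription worker. That split can be sketched as a producer-consumer pair joined by a `queue.Queue`. This is an assumed design, not the project's code:

- `collector`, `transcription_worker`, and `run` are hypothetical names.
- `transcribe` stands in for a Whisper call, such as `whisper.load_model("medium").transcribe(audio)["text"]` from the openai-whisper package.

```python
from __future__ import annotations

import queue
import threading
from typing import Callable, Iterable

_STOP = object()  # sentinel telling the worker to exit


def collector(chunks: Iterable[bytes], q: queue.Queue) -> None:
    """Cheap producer: only enqueues raw audio chunks (format is an assumption)."""
    for chunk in chunks:
        q.put(chunk)  # unbounded queue, so this never blocks on transcription
    q.put(_STOP)


def transcription_worker(q: queue.Queue, transcribe: Callable[[bytes], str], out: list[str]) -> None:
    """Heavy consumer: runs Whisper-style transcription on a background thread."""
    while True:
        chunk = q.get()
        if chunk is _STOP:
            break
        out.append(transcribe(chunk))


def run(chunks: Iterable[bytes], transcribe: Callable[[bytes], str]) -> list[str]:
    q: queue.Queue = queue.Queue()
    out: list[str] = []
    worker = threading.Thread(target=transcription_worker, args=(q, transcribe, out), daemon=True)
    worker.start()
    collector(chunks, q)  # capture stays on the calling thread
    worker.join()         # wait until every queued chunk is transcribed
    return out
```

In a live setup, an audio-capture callback would feed the collector instead of a list (for example, a `sounddevice.InputStream` callback). Because the queue is unbounded, a worker that falls behind accumulates a backlog rather than blocking capture.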