T

charles 284c50acd8 refactor(stt): remove speaker identification (diarization) from transcriber

Removes the speaker diarization pipeline and alignment model from the STT module to reduce resource usage and complexity.
The transcription API remains compatible by returning 'Unknown' as the speaker ID for all transcribed segments.

- Removed DiarizationPipeline and align_model from Transcriber
- Simplified transcribe method to return basic transcription segments
- Updated logging and docstrings to reflect changes

2026-06-06 20:52:04 -07:00

data/lore

feat: implement core D&D helpers logic and system architecture

2026-05-25 22:14:58 -07:00

personas

Formalize agent architecture and refine personas

2026-05-25 20:55:13 -07:00

src

refactor(stt): remove speaker identification (diarization) from transcriber

2026-06-06 20:52:04 -07:00

tests

feat: implement RAG capabilities and Context Pane integration

2026-05-26 22:07:12 -07:00

.env

Add LLM backend support and improve debugging observability

2026-05-28 23:06:25 -07:00

.gitignore

Stable state

2026-05-27 22:30:20 -07:00

.python-version

Migrate to WhisperX for speaker diarization

2026-05-26 21:48:30 -07:00

AGENTS.md

Formalize agent architecture and refine personas

2026-05-25 20:55:13 -07:00

main.py

Update main.py

2026-06-05 23:10:39 -07:00

pyproject.toml

Migrate to WhisperX for speaker diarization

2026-05-26 21:48:30 -07:00

README.md

Refactor STT pipeline and CLI documentation

2026-05-31 15:04:41 -07:00

requirements.txt

Add LLM backend support and improve debugging observability

2026-05-28 23:06:25 -07:00

README.md

D&D Helpers

D&D Helpers is designed to solve the "missing notes" problem. It focuses on the automatic extraction of game-relevant data from live conversation, turning spoken dialogue into structured records.

Core Objective: Automated Data Capture

The primary goal is to listen to game sessions and automatically identify and record critical information into structured files, while ignoring the "noise" of out-of-character (OOC) conversation.

The Pipeline

Listen: Capture audio and convert it to text via Speech-to-Text (STT).
Filter: An LLM analyzes the transcript to strip away OOC nonsense and non-game-relevant chatter.
Extract: The system identifies key events and routes them to the appropriate destination:
- Lore: Narrative details, NPC introductions, and world-building are appended to Markdown files.
- Character State & Inventory: Changes to health, status effects, and loot are updated in JSON files.
Confirm: A human-in-the-loop system suggests these updates via a CLI tool, allowing the user to confirm, edit, or reject the change before it is committed.

Features

Data Trackers

Lore Tracker: A personal wiki for your campaign's lore, NPCs, and locations. Stored in Markdown for rich text and easy version control.
Character & Inventory Tracker: A centralized record of character identity, stats, effects, and gear. Stored in JSON for portability and VTT compatibility.

Summarizer

Distill long sessions into concise highlights. Use LLMs to summarize recorded transcripts into a brief "The Story So Far" document.

Interface & Usage

CLI

The primary interface for confirming automated updates and querying current game state.

Command Line Arguments

Use these flags to manage data ingestion and run the live capture pipeline.

RAG Ingestion

Use these flags to add external documents to the RAG (Retrieval-Augmented Generation) system.

Flag	Description
`--ingest-pdf <path>`	Path to a PDF file to ingest
`--ingest-file <path>`	Path to a markdown file to ingest
`--ingest-dir <path>`	Path to a directory of markdown files to ingest

LLM Configuration

These flags allow you to override the environment variables for the LLM backend.

Flag	Description
`--llm-backend <backend>`	Backend to use (`openai`, `ollama`, or `vllm`)
`--llm-model <model>`	The model name to use
`--llm-api-key <key>`	API key for the LLM backend
`--llm-base-url <url>`	Base URL for the LLM backend

Pipeline Execution

Flag	Description
`--run-pipeline`	Starts the main orchestration pipeline (TUI + STT + LLM)

Example Command

To run the live orchestration pipeline using the configuration specified in your env.sh, you can use:

python main.py --run-pipeline \
  --llm-backend vllm \
  --llm-model google/gemma-4-26b-a4b-it \
  --llm-api-key no-key-required \
  --whisper-model medium \
  --llm-base-url https://vllm.tipsy.codes/v1

Text Editors

Since data is stored in Markdown and JSON, you can use any editor (VS Code, Vim, Obsidian) to manually refine your campaign data.

Technical Stack

Language: Python 3.10+
Data Persistence: Local JSON and Markdown files.
AI Backend: vLLM / OpenAI API compatible endpoints (via openai Python library).
STT Engine: OpenAI Whisper (local) for high-accuracy transcription.
Audio Capture (Linux):
- sounddevice or PyAudio for microphone and system audio capture.
- ffmpeg for audio stream processing and format conversion.
Interface:
- Textual or Rich for a modern, intuitive Terminal User Interface (TUI).
- Click or Typer for command-line argument parsing.