Ghost Pepper: Local Hold-to-Talk Speech-to-Text for macOS

Item: Ghost Pepper
Rating: 3
Author: altexs

Source: GitHub — matthartman/ghost-pepper | Author: matthartman | Published: 2025-01-15 Category: product-announcement | Credibility: medium

Executive Summary

Ghost Pepper is an open-source, MIT-licensed macOS menu bar app for Apple Silicon (M1+) that performs fully local speech-to-text via WhisperKit and then cleans up the raw transcript with a locally-run Qwen LLM via LLM.swift — no cloud, no subscription.
The project’s stated differentiator is its two-stage pipeline: WhisperKit handles transcription (models ranging 75 MB to 1.4 GB), and a small Qwen model (0.8B–4B) strips filler words and handles self-corrections before pasting the result into the active application.
It launched into a saturated market of macOS Whisper-based dictation apps; the main technical novelty is the on-device LLM cleanup step, which can misfire when the transcribed text resembles an AI prompt.

Critical Analysis

Claim: “100% local — transcriptions are never sent to any server”

Evidence quality: verifiable (open-source MIT code, no network calls in repository)
Assessment: The claim is credible given the architecture: WhisperKit and LLM.swift operate entirely on-device. No network permission is requested in the app. Debug logs are in-memory only and cleared on quit. The code is open-source and auditable.
Counter-argument: “Local” is only as trustworthy as the model download path. Models are fetched from Hugging Face on first run — the download itself is a network event. Users in high-sensitivity environments should verify the cached models match known checksums before trusting an air-gapped claim.
References:
- WhisperKit GitHub — argmaxinc/WhisperKit
- LLM.swift GitHub — eastriverlee/LLM.swift

Claim: “LLM cleanup removes filler words and handles self-corrections”

Evidence quality: anecdotal
Assessment: The concept is sound — a small, fast LLM can strip “uh”, “um”, and mid-sentence restarts from a raw transcript more gracefully than regex. The Qwen 0.8B model completes in 1–2 seconds, making the round-trip acceptable for casual dictation.
Counter-argument: Community testing on Hacker News revealed a reproducible failure mode: when the transcribed speech resembles an AI instruction (e.g., “create tests and ensure all tests pass”), the cleanup LLM tries to execute the instruction rather than return cleaned text. This is a prompt injection boundary issue that the default system prompt does not guard against. The customizable prompt in Settings partially mitigates this, but requires user awareness.
References:
- Hacker News discussion — Show HN: Ghost Pepper
- Ghost Pepper — Show HN thread community comments

Claim: “Powered by WhisperKit” — implies state-of-the-art accuracy

Evidence quality: benchmark (third-party)
Assessment: WhisperKit delivers strong accuracy on Apple Silicon. An ICML 2025 paper from Argmax (WhisperKit’s creators) reported 2.2% WER using the Large v3 Turbo model on the Neural Engine. The 2025 edge speech benchmark by Ionio showed standard Whisper at 19.96% WER, with distil variants performing significantly better. Ghost Pepper defaults to smaller models (tiny.en, small.en) which trade accuracy for speed — users who choose tiny.en (~75 MB) accept noticeably higher error rates.
Counter-argument: For most dictation use cases, 5–10% WER is unacceptable without cleanup. The LLM post-processing step is load-bearing for quality, not just polish. Competing apps like Hex ship with Parakeet v3 (NVIDIA model), which achieves lower WER than Whisper at similar sizes. Ghost Pepper’s model roster does not currently include Parakeet as a first-class option.
References:

Claim: “A free and open source alternative to SuperWhisper”

Evidence quality: case-study
Assessment: Accurate as a positioning statement. SuperWhisper is commercial (tiered pricing), Ghost Pepper is MIT-licensed free software. For users who want to avoid a subscription, Ghost Pepper fulfills the same core use case.
Counter-argument: The comparison understates how saturated this category is. The Hacker News comment section described the thread as “a support group for people who have each independently built the same app.” Other notable OSS alternatives include MacWhisper (free tier), TypeWhisper, OpenWhispr, Wordbird, and VoiceInk ($25 one-time). Ghost Pepper’s two-stage LLM cleanup is a genuine differentiator, but most users will not perceive the difference versus a well-tuned regex cleanup pass. The r/macapps subreddit reportedly gates new Whisper dictation submissions to require demonstrated differentiation.
References:
- SuperWhisper Alternatives Reviewed 2026 — Voibe
- Choosing the Right AI Dictation App for Mac — AFadingThought

Credibility Assessment

Author background: matthartman — a GitHub username with no publicly visible corporate affiliation or developer history beyond this project. No publication record or conference presence found. The project is a personal side project.
Publication bias: GitHub README / self-published. No peer-reviewed evaluation. The “Show HN” thread is the only external signal.
Verdict: medium — the code is open-source and auditable (which improves credibility over closed-source alternatives), but performance claims rest on the upstream WhisperKit benchmarks, not independent testing of the ghost-pepper application itself. The LLM cleanup failure mode is real and underdocumented.

Referenced in catalog