Skip to content

Ghost Pepper: Local Hold-to-Talk Speech-to-Text for macOS

matthartman April 7, 2026 product-announcement medium credibility
View source

Ghost Pepper: Local Hold-to-Talk Speech-to-Text for macOS

Source: GitHub — matthartman/ghost-pepper | Author: matthartman | Published: 2025-01-15 Category: product-announcement | Credibility: medium

Executive Summary

  • Ghost Pepper is an open-source, MIT-licensed macOS menu bar app for Apple Silicon (M1+) that performs fully local speech-to-text via WhisperKit and then cleans up the raw transcript with a locally-run Qwen LLM via LLM.swift — no cloud, no subscription.
  • The project’s stated differentiator is its two-stage pipeline: WhisperKit handles transcription (models ranging 75 MB to 1.4 GB), and a small Qwen model (0.8B–4B) strips filler words and handles self-corrections before pasting the result into the active application.
  • It launched into a saturated market of macOS Whisper-based dictation apps; the main technical novelty is the on-device LLM cleanup step, which can misfire when the transcribed text resembles an AI prompt.

Critical Analysis

Claim: “100% local — transcriptions are never sent to any server”

  • Evidence quality: verifiable (open-source MIT code, no network calls in repository)
  • Assessment: The claim is credible given the architecture: WhisperKit and LLM.swift operate entirely on-device. No network permission is requested in the app. Debug logs are in-memory only and cleared on quit. The code is open-source and auditable.
  • Counter-argument: “Local” is only as trustworthy as the model download path. Models are fetched from Hugging Face on first run — the download itself is a network event. Users in high-sensitivity environments should verify the cached models match known checksums before trusting an air-gapped claim.
  • References:

Claim: “LLM cleanup removes filler words and handles self-corrections”

  • Evidence quality: anecdotal
  • Assessment: The concept is sound — a small, fast LLM can strip “uh”, “um”, and mid-sentence restarts from a raw transcript more gracefully than regex. The Qwen 0.8B model completes in 1–2 seconds, making the round-trip acceptable for casual dictation.
  • Counter-argument: Community testing on Hacker News revealed a reproducible failure mode: when the transcribed speech resembles an AI instruction (e.g., “create tests and ensure all tests pass”), the cleanup LLM tries to execute the instruction rather than return cleaned text. This is a prompt injection boundary issue that the default system prompt does not guard against. The customizable prompt in Settings partially mitigates this, but requires user awareness.
  • References:

Claim: “Powered by WhisperKit” — implies state-of-the-art accuracy

  • Evidence quality: benchmark (third-party)
  • Assessment: WhisperKit delivers strong accuracy on Apple Silicon. An ICML 2025 paper from Argmax (WhisperKit’s creators) reported 2.2% WER using the Large v3 Turbo model on the Neural Engine. The 2025 edge speech benchmark by Ionio showed standard Whisper at 19.96% WER, with distil variants performing significantly better. Ghost Pepper defaults to smaller models (tiny.en, small.en) which trade accuracy for speed — users who choose tiny.en (~75 MB) accept noticeably higher error rates.
  • Counter-argument: For most dictation use cases, 5–10% WER is unacceptable without cleanup. The LLM post-processing step is load-bearing for quality, not just polish. Competing apps like Hex ship with Parakeet v3 (NVIDIA model), which achieves lower WER than Whisper at similar sizes. Ghost Pepper’s model roster does not currently include Parakeet as a first-class option.
  • References:

Claim: “A free and open source alternative to SuperWhisper”

  • Evidence quality: case-study
  • Assessment: Accurate as a positioning statement. SuperWhisper is commercial (tiered pricing), Ghost Pepper is MIT-licensed free software. For users who want to avoid a subscription, Ghost Pepper fulfills the same core use case.
  • Counter-argument: The comparison understates how saturated this category is. The Hacker News comment section described the thread as “a support group for people who have each independently built the same app.” Other notable OSS alternatives include MacWhisper (free tier), TypeWhisper, OpenWhispr, Wordbird, and VoiceInk ($25 one-time). Ghost Pepper’s two-stage LLM cleanup is a genuine differentiator, but most users will not perceive the difference versus a well-tuned regex cleanup pass. The r/macapps subreddit reportedly gates new Whisper dictation submissions to require demonstrated differentiation.
  • References:

Credibility Assessment

  • Author background: matthartman — a GitHub username with no publicly visible corporate affiliation or developer history beyond this project. No publication record or conference presence found. The project is a personal side project.
  • Publication bias: GitHub README / self-published. No peer-reviewed evaluation. The “Show HN” thread is the only external signal.
  • Verdict: medium — the code is open-source and auditable (which improves credibility over closed-source alternatives), but performance claims rest on the upstream WhisperKit benchmarks, not independent testing of the ghost-pepper application itself. The LLM cleanup failure mode is real and underdocumented.