
LLM.swift


At a Glance

Minimal open-source Swift library for on-device LLM inference on Apple platforms, wrapping llama.cpp with GGUF model support, streaming generation, and a @Generatable macro for type-safe structured output.

  • Type: open-source
  • Pricing: free (open-source)
  • License: MIT
  • Adoption fit: small teams
  • Top alternatives: Ollama, Apple MLX Swift, llama.cpp (direct), LocalLLMClient

Source: eastriverlee/LLM.swift | License: MIT | Type: open-source

What It Does

LLM.swift is a lightweight Swift package by eastriverlee that wraps llama.cpp to provide on-device LLM inference across Apple platforms (macOS, iOS, watchOS, tvOS, visionOS). It exposes a readable Swift-native API for loading GGUF-quantized models, generating streaming text, managing conversation history, and producing type-safe structured output via a @Generatable macro that generates JSON schemas from Swift structs.

The library fills a niche: embedding an LLM directly inside a Swift app without an external process, network call, or the weight of Ollama’s daemon. It is primarily used in apps that need a small, local LLM for a specific narrow task — text cleanup, classification, or guided generation — rather than general-purpose chat.
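The load-then-respond flow described above is short in practice. A minimal sketch, assuming the `LLM(from:template:)` initializer, `Template.chatML`, `respond(to:)`, and the `output` property follow the project README; verify the exact signatures against the current release. Running it requires the LLM.swift package plus a bundled GGUF file, and the model filename here is hypothetical:

```swift
import LLM

// Hypothetical model filename; bundle any llama.cpp-compatible GGUF file.
let url = Bundle.main.url(forResource: "qwen2.5-1.5b-instruct-q4_k_m",
                          withExtension: "gguf")!

// Template and initializer as shown in the README; exact signatures may differ.
let bot = LLM(from: url, template: .chatML("You are a concise assistant."))

// Generates a reply; tokens stream into `bot.output` as they are produced.
await bot.respond(to: "Remove the filler words: um, so, it basically works.")
print(bot.output)
```

This is a sketch, not a definitive implementation; in a real app the model load belongs off the main thread, since mapping a multi-gigabyte GGUF file is not instant.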

Key Features

  • llama.cpp backend: runs any GGUF-quantized model compatible with llama.cpp, including Qwen, Mistral, Gemma, and others
  • @Generatable macro: annotate Swift structs and enums to auto-generate JSON schemas for constrained structured output
  • AsyncStream-based token streaming for responsive UI updates
  • Configurable conversation history with token limit management
  • Multiple prompt templates out of the box (ChatML, Gemma, etc.)
  • Customizable preprocessing, postprocessing, and update callbacks
  • Models can be bundled in the app binary or downloaded at runtime from Hugging Face
  • Targets iOS, macOS, watchOS, tvOS, and visionOS
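The AsyncStream-based streaming in the list above follows a standard Swift concurrency pattern. A Foundation-only sketch of the consumption side, with a stand-in token generator in place of a real model (no LLM.swift APIs are used here):

```swift
import Foundation

// Stand-in for a model's token stream: yields a few fixed "tokens" and finishes.
// A real backend would yield each decoded token as inference produces it.
func fakeTokenStream() -> AsyncStream<String> {
    AsyncStream { continuation in
        for token in ["On", "-", "device", " ", "inference"] {
            continuation.yield(token)
        }
        continuation.finish()
    }
}

// In an app, each yielded token would append to an observable property
// to drive incremental UI updates; here we just accumulate into a string.
var text = ""
for await token in fakeTokenStream() {
    text += token
}
print(text)   // On-device inference
```

The same consumption loop works unchanged whether the stream is backed by a real decoder or a test double, which is what makes the AsyncStream surface convenient for unit-testing UI code.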

Use Cases

  • In-app text post-processing pipeline (e.g., filler-word removal in a dictation app) where a tiny Qwen or Mistral model runs entirely on-device
  • Structured data extraction from user input without a cloud API — forms, classification, or entity recognition inside a native app
  • Offline AI assistant features in apps that must pass App Store privacy nutrition label review without declaring network data use for AI inference
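For the structured-extraction use case, @Generatable's end result is a typed Swift value rather than raw text. A Foundation-only sketch of the decoding end, using plain Codable in place of the macro and a hard-coded JSON string as a stand-in for hypothetical constrained model output:

```swift
import Foundation

// Plays the role of a @Generatable-annotated type; with the macro, the
// JSON schema constraining generation would be derived from this struct.
struct ExtractedContact: Codable {
    let name: String
    let email: String
}

// Stand-in for model output; @Generatable's schema constraint is what
// makes JSON of exactly this shape a safe assumption in the real library.
let modelOutput = #"{"name": "Ada Lovelace", "email": "ada@example.com"}"#

let contact = try! JSONDecoder().decode(ExtractedContact.self,
                                        from: Data(modelOutput.utf8))
print(contact.name)   // Ada Lovelace
```

The point of constrained output is that the `try!` above stops being reckless: generation that cannot deviate from the schema removes the usual "LLM returned almost-JSON" failure mode.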

Adoption Level Analysis

Small teams (<20 engineers): Good fit for teams building native macOS or iOS apps that want to embed a small on-device LLM. Minimal dependencies, readable code, Swift Package Manager installation. Not suitable for teams that need Python tooling or cross-platform support.

Medium orgs (20–200 engineers): Narrow fit — only relevant for the Apple native app portion of a product stack. Teams building cross-platform or server-side AI features will use other runtimes (Ollama, llama.cpp directly, MLX). LLM.swift is a component, not an AI infrastructure platform.

Enterprise (200+ engineers): Does not fit as AI infrastructure. Enterprise use cases typically require managed inference, audit logging, model version control, and cross-platform support — none of which LLM.swift provides.

Alternatives

| Alternative | Key difference | Prefer when… |
| --- | --- | --- |
| Ollama | External daemon, REST API, wider model zoo | You want a reusable local server shared across apps |
| Apple MLX Swift | Apple-native, better throughput on M-series via Metal | You need maximum token-generation speed on Apple Silicon |
| llama.cpp (direct) | More control, C/C++ binding required | You need fine-grained control over batching and memory |
| LocalLLMClient | Swift package wrapping both llama.cpp and MLX | You want a unified API supporting MLX models too |

Notes & Caveats

  • Solo-maintainer project. LLM.swift is maintained by a single developer (eastriverlee) with no organizational backing. Longevity and security response time are uncertain.
  • Apple-only. Hard dependency on Apple platforms via Swift Package Manager and CoreML/Metal paths in llama.cpp. Not usable outside the Apple ecosystem.
  • Mobile model size constraints. The library recommends staying at or below roughly 3B parameters on mobile. Sub-1B models (like Qwen 0.8B) are appropriate for narrow tasks on older hardware but show noticeable quality degradation versus larger models.
  • Prompt injection risk in post-processing pipelines. When LLM.swift is used to process untrusted input (e.g., speech transcription), the model can misinterpret the content as an instruction. Robust system prompt design is required — the default template does not guard against this. Ghost Pepper’s Hacker News thread documented this failure mode specifically.
  • MLX alternative gaining ground. Apple’s MLX framework (and LocalLLMClient) is increasingly preferred for Apple Silicon inference due to better throughput on M-series chips. LLM.swift’s llama.cpp backend will likely be slower for generation-heavy workloads compared to a well-tuned MLX backend.
