An LLM you can pocket
Stack: Rust + llama.cpp + static SPA
Platforms: macOS, Linux, Windows
Footprint: ~10 GB on an exFAT stick
Why I built it
I wanted a chat model I could carry into a room, a coffee shop, or someone else's laptop without leaving a trace. Cloud chat leaks every prompt to a vendor. Local installs leave weights, daemons, and history scattered across the host. USBuddy collapses both problems: the model runs from the stick, the chat UI runs in the browser tab, and yanking the drive kills the process. Nothing persists on the machine you borrowed.
Plug, chat, unplug
Double-click the launcher on the drive root and a ChatGPT-style SPA opens at localhost:8765.
The runtime spawns llama-server against a GGUF model on the stick, with no install on the host.
Idle for five minutes and the model unloads from RAM. Quit and the process exits cleanly.
Pull the drive mid-session and the host has no service, no temp files, no scheduled task to clean up.
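The five-minute idle unload can be pictured as a small timer the runtime checks between requests. This is a minimal sketch, not USBuddy's actual code: the IdleTracker type and its method names are hypothetical, and the caller passes the clock in so the logic stays easy to test.

```rust
use std::time::{Duration, Instant};

/// Tracks the time of the last chat request so a runtime can decide
/// when to unload the model. `IdleTracker` is a hypothetical name,
/// not taken from the USBuddy source.
pub struct IdleTracker {
    last_activity: Instant,
    timeout: Duration,
}

impl IdleTracker {
    /// Starts the idle clock at `now` with the given timeout.
    pub fn new(now: Instant, timeout: Duration) -> Self {
        Self { last_activity: now, timeout }
    }

    /// Call on every chat request to reset the idle clock.
    pub fn touch(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// True once no request has arrived for at least `timeout`.
    pub fn should_unload(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_activity) >= self.timeout
    }
}
```

A runtime built this way would create the tracker with Duration::from_secs(300), call touch on each request, and poll should_unload from a background loop before dropping the model from RAM.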
Safety without signing fees
Every write to the drive is atomic and survives a yank without corrupting state.
Model files are checked against the catalog's sha256 on every launch.
A RAM-fit advisor reads each model's KV-cache shape from its GGUF header and refuses to load anything that would not fit in RAM. A model that spills into swap is a common way for weights and conversation context to end up on the host's disk.
Releases ship with SHA256SUMS, a CycloneDX SBOM, and SLSA build provenance instead of Apple Developer ID or Authenticode certs.
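The RAM-fit check mentioned above comes down to simple arithmetic once the KV-cache shape is known. The sketch below assumes the standard transformer estimate (keys plus values, per layer, per KV head, per token) and a fixed headroom margin. The struct, the function names, and the headroom policy are illustrative assumptions, not USBuddy's actual advisor.

```rust
/// KV-cache shape as it might be read from GGUF metadata.
/// Field names are illustrative, not USBuddy's actual types.
pub struct KvShape {
    pub n_layers: u64,
    pub n_kv_heads: u64,
    pub head_dim: u64,
}

/// Bytes needed for keys plus values across all layers at `n_ctx` tokens.
/// The leading 2 accounts for storing both a key and a value tensor.
pub fn kv_cache_bytes(shape: &KvShape, n_ctx: u64, bytes_per_elem: u64) -> u64 {
    2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * n_ctx * bytes_per_elem
}

/// True when weights + KV cache + a safety margin fit in available RAM.
/// Overflow in the sum is treated as "does not fit".
pub fn fits_in_ram(
    weights_bytes: u64,
    kv_bytes: u64,
    available_bytes: u64,
    headroom_bytes: u64,
) -> bool {
    weights_bytes
        .checked_add(kv_bytes)
        .and_then(|n| n.checked_add(headroom_bytes))
        .map_or(false, |needed| needed <= available_bytes)
}
```

For a shape of 32 layers, 8 KV heads, and head dimension 128, an 8192-token context with 16-bit cache entries comes to 2 × 32 × 8 × 128 × 8192 × 2 bytes, which is exactly 1 GiB. An advisor along these lines would add that to the size of the weights and compare the total against free RAM before launching.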
Three installers, one core
GUI installer for the click-through path: pick a drive, pick a model, go.
TUI installer for SSH sessions and headless setups.
CLI installer for scripting and CI, with the same Rust core under the hood.
Models in the catalog
Qwen 2.5 7B Instruct and Qwen 2.5 Coder 7B for general chat and coding.
Mistral 7B v0.3 and Llama 3.1 8B (gated) for broader coverage.
Dolphin 2.9.4 for uncensored research use.
Drop any .gguf into the drive's models/ directory and it shows up as a community model.
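Community-model discovery could be as simple as a directory scan at startup. The sketch below is one way to do it with the Rust standard library. The function name is hypothetical, and it assumes models sit directly inside models/ rather than in subfolders.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns every `.gguf` file directly inside `models_dir`, sorted by path.
/// The extension match is case-insensitive; directories are skipped.
/// `discover_gguf_models` is a hypothetical name, not USBuddy's API.
pub fn discover_gguf_models(models_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(models_dir)? {
        let path = entry?.path();
        let is_gguf = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| ext.eq_ignore_ascii_case("gguf"));
        if is_gguf && path.is_file() {
            found.push(path);
        }
    }
    // Sort so the model list is stable across launches and filesystems.
    found.sort();
    Ok(found)
}
```

Any file found this way that has no catalog entry would be listed as a community model, without the catalog's sha256 check to vouch for it.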
Stack
Rust workspace (installer + runtime)
llama.cpp / llama-server
Static SPA served from RAM
Metal, CUDA, Vulkan, ROCm autodetection
exFAT drive layout
CycloneDX SBOM + SLSA provenance
GGUF model catalog with sha256 integrity
What it changes
USBuddy turns a $20 USB stick into a portable, private assistant that runs on whatever machine you happen to be sitting at. No account, no install, no residue. For travelers, journalists, field engineers, and anyone who works on borrowed hardware, that is a different threat model than either cloud chat or a local Ollama install can offer.