USBuddy

A private LLM that lives on a USB drive. Plug in, chat at localhost, unplug. The host keeps no model, no history, no daemon.

Rust
llama.cpp
Local LLM
Privacy
GGUF
Cross-platform
Open Source
GitHub

An LLM you can pocket

Stack
Rust + llama.cpp + static SPA
Platforms
macOS, Linux, Windows
Footprint
~10 GB on an exFAT stick

Why I built it

I wanted a chat model I could carry into a room, a coffee shop, or someone else's laptop without leaving a trace. Cloud chat leaks every prompt to a vendor. Local installs leave weights, daemons, and history scattered across the host. USBuddy collapses both problems: the model runs from the stick, the chat UI runs in the browser tab, and yanking the drive kills the process. Nothing persists on the machine you borrowed.

Plug, chat, unplug

Double-click the launcher on the drive root and a ChatGPT-style SPA opens at localhost:8765.
The runtime spawns llama-server against a GGUF model on the stick, no install on the host.
Idle for five minutes and the model unloads from RAM. Quit and the process exits cleanly.
Pull the drive mid-session and the host has no service, no temp files, no scheduled task to clean up.
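
The launch and idle-unload steps above can be sketched roughly as follows. This is a minimal illustration, not USBuddy's actual runtime: the names server_args, spawn_server, and IdleTimer are hypothetical, and the only llama-server options assumed are --model, --host, and --port. The five-minute limit and port 8765 come from the description above.

```rust
use std::path::Path;
use std::process::{Child, Command};
use std::time::{Duration, Instant};

/// Build the llama-server argument list for a model stored on the drive.
/// Binding to 127.0.0.1 keeps the server reachable only from the host itself.
fn server_args(model: &Path, port: u16) -> Vec<String> {
    vec![
        "--model".to_string(),
        model.display().to_string(),
        "--host".to_string(),
        "127.0.0.1".to_string(),
        "--port".to_string(),
        port.to_string(),
    ]
}

/// Spawn llama-server as a child process (hypothetical helper).
/// `binary` would point at the llama-server executable on the stick.
fn spawn_server(binary: &Path, model: &Path, port: u16) -> std::io::Result<Child> {
    Command::new(binary).args(server_args(model, port)).spawn()
}

/// Tracks the time of the last chat request so the caller can decide
/// when to unload the model (assumed design, not the real one).
struct IdleTimer {
    last_activity: Instant,
    limit: Duration,
}

impl IdleTimer {
    fn new(now: Instant, limit: Duration) -> Self {
        IdleTimer { last_activity: now, limit }
    }

    /// Record activity, e.g. when a chat request arrives.
    fn touch(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// True once `limit` has elapsed since the last recorded activity.
    fn expired_at(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_activity) >= self.limit
    }
}
```

A supervising loop could call touch on every request and, once expired_at returns true, kill the Child returned by spawn_server so the weights leave RAM.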

Safety without signing fees

Every write to the drive is atomic and survives a yank without corrupting state.
Model files are checked against the catalog's sha256 on every launch.
A RAM-fit advisor reads each model's KV-cache shape from its GGUF header and refuses to load any model that would not fit in memory and would spill to the host's disk, which is the most common way local LLMs leak weights to the host.
Releases ship with SHA256SUMS, a CycloneDX SBOM, and SLSA build provenance instead of Apple Developer ID or Authenticode certs.
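
To make the RAM-fit check concrete, here is a rough sketch of the arithmetic such an advisor might perform. It uses the standard KV-cache size estimate (keys plus values, per layer, per context position) and is not USBuddy's actual code: the KvShape struct, the function names, and the overhead parameter are assumptions, and in a real run the shape values would be read from GGUF metadata rather than passed in by hand.

```rust
/// Attention shape of a model. Supplied by hand here; an advisor would
/// read the equivalent values from the GGUF header.
#[derive(Debug, Clone, Copy)]
struct KvShape {
    n_layers: u64,
    n_kv_heads: u64,
    head_dim: u64,
}

/// Estimated KV-cache size in bytes:
/// 2 (keys and values) * layers * context length * KV heads * head dim * bytes per element.
fn kv_cache_bytes(shape: KvShape, n_ctx: u64, bytes_per_elem: u64) -> u64 {
    2 * shape.n_layers * n_ctx * shape.n_kv_heads * shape.head_dim * bytes_per_elem
}

/// Outcome of the fit check (hypothetical type).
#[derive(Debug, PartialEq)]
enum Fit {
    Ok,
    TooLarge { needed: u64, available: u64 },
}

/// Refuse the load when weights + KV cache + a safety margin exceed free RAM.
fn check_fit(weights: u64, kv_cache: u64, overhead: u64, available: u64) -> Fit {
    let needed = weights + kv_cache + overhead;
    if needed <= available {
        Fit::Ok
    } else {
        Fit::TooLarge { needed, available }
    }
}
```

For a model with 32 layers, 8 KV heads, and a head dimension of 128, an 8192-token context at 2 bytes per element works out to 1 GiB of KV cache, which is added to the size of the weights before comparing against available memory.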

Three installers, one core

GUI installer for the click-through path: pick a drive, pick a model, go.
TUI installer for SSH sessions and headless setups.
CLI installer for scripting and CI, with the same Rust core under the hood.

Models on the catalog

Qwen 2.5 7B Instruct and Qwen 2.5 Coder 7B for general chat and coding.
Mistral 7B v0.3 and Llama 3.1 8B (gated) for broader coverage.
Dolphin 2.9.4 for uncensored research use.
Drop any .gguf into the drive's models/ directory and it shows up as a community model.
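
Discovering community models could be as simple as scanning the models/ directory for .gguf files. The sketch below shows one way to do that with the Rust standard library. The function name find_gguf_models and the case-insensitive extension match are assumptions, and the real catalog logic may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return every regular file in `models_dir` whose extension is "gguf"
/// (matched case-insensitively), sorted for a stable listing.
fn find_gguf_models(models_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(models_dir)? {
        let path = entry?.path();
        let is_gguf = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| ext.eq_ignore_ascii_case("gguf"))
            .unwrap_or(false);
        // Skip directories and anything that is not a .gguf file.
        if is_gguf && path.is_file() {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}
```

Files found this way that are absent from the built-in catalog would be the ones listed as community models.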

Stack

Rust workspace (installer + runtime)
llama.cpp / llama-server
Static SPA served from RAM
Metal, CUDA, Vulkan, ROCm autodetection
exFAT drive layout
CycloneDX SBOM + SLSA provenance
GGUF model catalog with sha256 integrity

What it changes

USBuddy turns a $20 USB stick into a portable, private assistant that runs on whatever machine you happen to be sitting at. No account, no install, no residue. For travelers, journalists, field engineers, and anyone who works on borrowed hardware, that is a different threat model than either cloud chat or a local Ollama install can offer.
