USBuddy

A private LLM that lives on a USB drive. Plug in, chat at localhost, unplug. The host keeps no model, no history, no daemon.

Rust

llama.cpp

Local LLM

Privacy

GGUF

Cross-platform

Open Source

An LLM you can pocket

Stack: Rust + llama.cpp + static SPA

Platforms: macOS, Linux, Windows

Footprint: ~10 GB on an exFAT stick

Why I built it

I wanted a chat model I could carry into a room, a coffee shop, or someone else's laptop without leaving a trace. Cloud chat sends every prompt to a vendor. Local installs leave weights, daemons, and history scattered across the host. USBuddy addresses both problems: the model runs from the stick, the chat UI runs in a browser tab, and yanking the drive kills the process. Nothing persists on the machine you borrowed.

Plug, chat, unplug

Double-click the launcher on the drive root and a ChatGPT-style SPA opens at localhost:8765.

The runtime spawns llama-server against a GGUF model on the stick, with nothing installed on the host.

Idle for five minutes and the model unloads from RAM. Quit and the process exits cleanly.

Pull the drive mid-session and the host has no service, no temp files, no scheduled task to clean up.
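The five-minute idle unload described above can be pictured as a small timer that is reset on every request. The sketch below is an illustration only: `IdleTracker` and its methods are hypothetical names, not USBuddy's actual API, and the caller is assumed to pass in the current time so the logic stays easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical idle tracker; the name and API are illustrative only.
pub struct IdleTracker {
    last_activity: Instant,
    timeout: Duration,
}

impl IdleTracker {
    /// Start tracking at `now` with the given idle timeout.
    pub fn new(now: Instant, timeout: Duration) -> Self {
        Self { last_activity: now, timeout }
    }

    /// Record activity (for example, a chat request) at time `now`.
    pub fn touch(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// True once `timeout` has elapsed since the last recorded activity.
    pub fn should_unload(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_activity) >= self.timeout
    }
}
```

A runtime could create the tracker with `Duration::from_secs(300)`, call `touch` on each request, and poll `should_unload` from a background loop before releasing the model.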

Safety without signing fees

Every write to the drive is atomic and survives a yank without corrupting state.
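A common way to get this kind of yank-safe write is the write-to-temp, sync, then rename pattern. The sketch below shows that general pattern with the Rust standard library. It is not taken from USBuddy's source: `write_atomic` and the `.tmp` suffix are hypothetical, and how strong the guarantee is on exFAT depends on the operating system and the drive.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: replace `dest` with `bytes` so that a reader sees
/// either the old file or the new one, not a half-written mix.
pub fn write_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Stage the data in a sibling file so the rename stays on one filesystem.
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    // Ask the OS to flush the staged contents to the device before renaming.
    file.sync_all()?;
    drop(file);

    // Swap the staged file into place over any existing `dest`.
    fs::rename(&tmp, dest)
}
```

If the drive is pulled before the rename, the old file is still intact and only a stray `.tmp` file remains, which a later launch could delete.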

Model files are checked against the catalog's sha256 on every launch.

A RAM-fit advisor reads each model's KV-cache shape from its GGUF header and refuses to load anything that would not fit in memory. A model that spills to disk is the most common way local LLMs leak weights to the host.
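To make the RAM-fit check concrete, here is a minimal sketch of the usual KV-cache size estimate for a transformer: one K and one V tensor per layer, each sized by context length, KV heads, and head dimension. The struct, function names, and headroom rule are assumptions for illustration; the source does not give the advisor's actual formula or thresholds.

```rust
/// Attention shape of a model, as it might be read from GGUF metadata.
/// Field names are illustrative, not GGUF key names.
pub struct KvShape {
    pub n_layers: u64,
    pub n_kv_heads: u64,
    pub head_dim: u64,
}

/// Bytes needed for the KV cache: one K and one V tensor per layer,
/// each holding `n_ctx * n_kv_heads * head_dim` elements.
pub fn kv_cache_bytes(shape: &KvShape, n_ctx: u64, bytes_per_elem: u64) -> u64 {
    2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * n_ctx * bytes_per_elem
}

/// Hypothetical fit rule: weights plus KV cache plus headroom must not
/// exceed the RAM currently available. Overflow counts as "does not fit".
pub fn fits_in_ram(
    weights_bytes: u64,
    kv_bytes: u64,
    headroom_bytes: u64,
    available_bytes: u64,
) -> bool {
    weights_bytes
        .checked_add(kv_bytes)
        .and_then(|n| n.checked_add(headroom_bytes))
        .map_or(false, |need| need <= available_bytes)
}
```

For a model with 32 layers, 8 KV heads, and a head dimension of 128 (the published Llama 3.1 8B configuration), an 8192-token context with 16-bit cache entries comes to 1 GiB of KV cache on top of the weights.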

Releases ship with SHA256SUMS, a CycloneDX SBOM, and SLSA build provenance instead of Apple Developer ID or Authenticode certs.

Three installers, one core

GUI installer for the click-through path: pick a drive, pick a model, go.

TUI installer for SSH sessions and headless setups.

CLI installer for scripting and CI, with the same Rust core under the hood.

Models on the catalog

Qwen 2.5 7B Instruct and Qwen 2.5 Coder 7B for general chat and coding.

Mistral 7B v0.3 and Llama 3.1 8B (gated) for broader coverage.

Dolphin 2.9.4 for uncensored research use.

Drop any .gguf into the drive's models/ directory and it shows up as a community model.
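Picking up dropped-in models could work roughly as sketched below: scan the directory for `.gguf` files and keep those that begin with the four-byte GGUF magic, ASCII `GGUF`. The function names are hypothetical and USBuddy's real discovery logic may do more, such as reading metadata.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// True if the file starts with the 4-byte GGUF magic, ASCII "GGUF".
pub fn has_gguf_magic(path: &Path) -> io::Result<bool> {
    let mut magic = [0u8; 4];
    let mut file = File::open(path)?;
    match file.read_exact(&mut magic) {
        Ok(()) => Ok(&magic == b"GGUF"),
        // Files shorter than four bytes cannot be GGUF models.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

/// Hypothetical scan: files in `models_dir` named `*.gguf` that carry the magic.
pub fn find_community_models(models_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(models_dir)? {
        let path = entry?.path();
        let is_gguf_name = path
            .extension()
            .map_or(false, |ext| ext.eq_ignore_ascii_case("gguf"));
        if is_gguf_name && path.is_file() && has_gguf_magic(&path)? {
            found.push(path);
        }
    }
    // Sort for a stable listing order in the UI.
    found.sort();
    Ok(found)
}
```

Checking the magic as well as the extension keeps renamed or truncated files out of the model list before any expensive load is attempted.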

Stack

Rust workspace (installer + runtime)

llama.cpp / llama-server

Static SPA served from RAM

Metal, CUDA, Vulkan, ROCm autodetection

exFAT drive layout

CycloneDX SBOM + SLSA provenance

GGUF model catalog with sha256 integrity

What it changes

USBuddy turns a $20 USB stick into a portable, private assistant that runs on whatever machine you happen to be sitting at. No account, no install, no residue. For travelers, journalists, field engineers, and anyone who works on borrowed hardware, that is a different threat model than either cloud chat or a local Ollama install can offer.

