How to Get Gemma 4 26B Running on a Mac Mini with Ollama

Source: DEV Community
So you picked up a Mac mini with the idea of running local LLMs, pulled Gemma 4 26B through Ollama, and... it either crawls at 2 tokens per second or just refuses to load. I've been there. Let me walk you through what's actually going on and how to fix it.

## The Problem: "Why Is This So Slow?"

The Mac mini with Apple Silicon is genuinely great hardware for local inference. Unified memory means the GPU can access your full RAM pool; no separate VRAM needed. But out of the box, macOS doesn't allocate enough memory to the GPU for a 26B-parameter model, and Ollama's defaults aren't tuned for your specific hardware. The result? The model either fails to load, gets killed by the OOM reaper, or runs painfully slowly because half the layers fall back to CPU inference.

## Step 0: Check Your Hardware

Before anything else, verify what you're working with:

```bash
# Check your chip and memory
sysctl -n machdep.cpu.brand_string
sysctl -n hw.memsize | awk '{print $1/1024/1024/1024 " GB"}'
# Check how muc
```
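Before pulling anything, it's worth a quick back-of-the-envelope check on whether a 26B model will even fit. This sketch estimates the weight footprint at a few common quantization levels; the bits-per-weight figures are rough approximations, and it ignores KV-cache and context overhead entirely:

```shell
#!/bin/sh
# Rough weight-only footprint estimate for a 26B-parameter model.
# Bits-per-weight values below are approximations for common GGUF quants.
PARAMS=26  # billions of parameters

estimate_gib() {
    # args: label, bits-per-weight (x10 to avoid shell float math)
    # GiB ~= params(B) * 1e9 * bpw / 8 / 2^30
    label=$1; bpw_x10=$2
    gib=$(( PARAMS * 1000000000 * bpw_x10 / 10 / 8 / 1073741824 ))
    echo "${label}: ~${gib} GiB (weights only)"
}

estimate_gib "Q8_0  " 85   # ~8.5 bits/weight
estimate_gib "Q4_K_M" 48   # ~4.8 bits/weight
estimate_gib "Q4_0  " 45   # ~4.5 bits/weight
```

On a 16 GB machine, even the 4-bit quants are tight once you add context; 32 GB or more is where a 26B model gets comfortable.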
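A commonly cited workaround for the GPU allocation ceiling described above is raising the wired-memory limit with the `iogpu.wired_limit_mb` sysctl on recent Apple Silicon macOS releases (verify the key exists on your OS version before relying on it). Here's a sketch that computes a conservative value, assuming you leave about 8 GB for macOS itself; the fallback value for non-macOS systems is just a placeholder so the script runs anywhere:

```shell
#!/bin/sh
# Sketch: suggest a GPU wired-memory limit from total RAM.
# Assumption: iogpu.wired_limit_mb is the relevant key on recent
# Apple Silicon macOS; falls back to a hypothetical 32 GB machine
# when hw.memsize is unavailable (e.g. on Linux).
TOTAL_BYTES=$(sysctl -n hw.memsize 2>/dev/null || echo 34359738368)
TOTAL_MB=$(( TOTAL_BYTES / 1024 / 1024 ))
RESERVE_MB=8192                      # leave ~8 GB for the OS and other apps
GPU_MB=$(( TOTAL_MB - RESERVE_MB ))
echo "Suggested GPU wired limit: ${GPU_MB} MB"
# To apply (requires admin rights, resets on reboot):
#   sudo sysctl iogpu.wired_limit_mb=${GPU_MB}
```

Note that the setting does not persist across reboots, so you'd need a launch daemon or a manual re-run after restarting.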