Local AI in Practice - Part 2 of 3

The models passed the speed test. Then I asked them for JSON. Why the fastest model lost its lead the moment I needed structured output, and what I had to build to make any of them reliable.

Why speed is not enough

In Part 1 of this series, Llama 3.2:3b won every speed metric: 13.4 tokens per second, lowest latency, fewest tokens generated. On paper, an obvious choice. ...

April 10, 2026 · 10 min · Rajesh Kancharla