Local AI Models Are Surprisingly Good at Code Generation
I had an unexpected discovery recently that I wanted to share with you.
I needed a small script to help anonymize some test data. Nothing fancy, but on my way to ChatGPT I ended up in the "wrong window" and threw the prompt at Ollama's gpt-oss instead.
It surprised me by coming back with a solid result, quickly.
Small local models are the future, and being able to run them on hardware you already own is a political statement.
No remote API calls. No burning through tokens or the environment. No company watching. Like a cowboy, just me and my machine.
The Experiment
I ended up testing the same prompt across several different local models and grading the results:
"I need a script that will give me at least 1042 distinct but made up show names. They should be funny and grammatically correct and written in TypeScript"
I expected gpt-oss:20b to be the best of the lot, but surprisingly the 5-month-old llama3.2 crushed everything on the time dimension.
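If you want to run a comparison like this yourself, the timing part is easy to automate against Ollama's local HTTP API; the quality grading I did by hand. The sketch below is my own guess at how you might wire up the loop (the model list and timing approach are assumptions, not the actual harness from the repo):

```typescript
// Rough sketch: send one prompt to several locally pulled Ollama models and time each run.
// Assumes the Ollama server is running on its default port (11434) and Node 18+ for global fetch.
const prompt =
  "I need a script that will give me at least 1042 distinct but made up show names. " +
  "They should be funny and grammatically correct and written in TypeScript";

// Only the two models named in this post; add whatever tags you have pulled locally.
const models = ["gpt-oss:20b", "llama3.2"];

async function timeModel(model: string): Promise<void> {
  const start = Date.now();
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  const data = (await res.json()) as { response: string };
  const seconds = ((Date.now() - start) / 1000).toFixed(1);
  console.log(`${model}: ${seconds}s, ${data.response.length} characters of output`);
}

async function main(): Promise<void> {
  for (const model of models) {
    await timeModel(model);
  }
}

main().catch(console.error);
```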
Key Findings
- 4 out of 7 models produced working results on the first (and only) try
- gpt-oss:20b delivered the highest quality code (4/5 rating)
- llama3.2 was the fastest by a significant margin
- 3 models produced code that failed to run (mostly syntax errors)
The most surprising takeaway? These small, local models are good enough to be useful and powerful enough to matter for everyday coding tasks.
Why This Matters
We're entering an era where you don't need to send every coding question to a remote API. Local models give you:
- Privacy: No company watching your prompts
- Speed: No network latency
- Cost: No token burning
- Independence: Your hardware, your rules
The test code and full analysis are available on GitHub, and the complete post with all the detailed results and code quality breakdowns is on my site.
Read the full post with detailed results and code examples →
What local models have you been experimenting with? I'd love to hear about your experiences.