Do non-reasoning LLMs scale?
I tested 8B vs 14B vs 32B LLMs
Hello community.
I was wondering whether normal LLMs, you know, those non-reasoning models, are as good as Large Reasoning Models (LRMs) if I just let them run longer on complex logical tasks. Does performance scale when you go from an 8B to a 32B model?
For the 8B model I used the distilled version of DeepSeek R1, that is, Qwen3 8B trained on the reasoning traces distilled from R1. This applies only to the 8B; the 14B and 32B are pure non-reasoning models.
The result is, unfortunately, impressive: the 32B model can handle higher complexity and provides better reasoning traces and results than the smaller 8B model.
You can watch the live video of my tests and the generated reasoning traces and form your own impression of when an 8B LLM is enough and when it is better to move up to a model with stronger reasoning capability.
With new technology, though, we are working to improve the reasoning performance of smaller LLMs so that we can run them locally, with better privacy protection. More information on how to implement this will follow in a later post/video.

