This is a statement I sent to a reporter asking for comments on DeepSeek:

The DeepSeek-R1 paper represents an interesting technical breakthrough that aligns with where many of us believe AI development needs to go - away from brute-force approaches toward more targeted, efficient architectures.

First, there's the remarkable engineering pragmatism. Working with H800 GPUs, whose chip-to-chip interconnect bandwidth is constrained due to U.S. export controls, the team achieved impressive results through extreme optimization. They went as far as dedicating 20 of the 132 streaming multiprocessors on each H800 specifically to cross-chip communication - something that required dropping down to PTX (NVIDIA's low-level GPU assembly language) because it couldn't be done in CUDA. This level of hardware optimization demonstrates how engineering constraints can drive innovation.

Their success with model distillation - getting strong results with smaller 7B and 14B parameter models - is particularly significant. Instead of following the trend of ever-larger models that try to do everything, they showed how more focused, efficient architectures can achieve state-of-the-art results in specific domains. This targeted approach makes more sense than training massive models that attempt to understand everything from quantum gravity to Python code.

But the more fundamental contribution is their insight into model reasoning. Think about how humans solve complex multiplication - say 143 × 768. We don't memorize the answer; we break it down systematically. The key innovation in DeepSeek-R1 is using pure reinforcement learning to help models discover this kind of step-by-step reasoning naturally, without supervised training. This is crucial because it shifts the model from uncertain "next token prediction" to confident, systematic reasoning. When solving problems step by step, the model develops highly concentrated (confident) token distributions at each step, rather than making uncertain leaps to final answers.
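To make the "concentrated token distribution" point concrete, here's a tiny illustration - not from the paper, just standard information theory: the Shannon entropy of a next-token distribution is low when one token dominates (as in a well-rehearsed arithmetic step) and maximal when the model is effectively guessing. The probability values are made up for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; lower = more concentrated (confident)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A "confident" distribution: one token dominates, as in a worked step.
confident = [0.97, 0.01, 0.01, 0.01]
# An "uncertain" distribution: the model is effectively guessing.
uncertain = [0.25, 0.25, 0.25, 0.25]

print(entropy(confident))  # low entropy, well under 1 bit
print(entropy(uncertain))  # 2.0 bits, the maximum for 4 outcomes
```

Step-by-step reasoning keeps the model in the low-entropy regime at each token, instead of forcing one high-entropy leap to a final answer.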
It's similar to how humans gain confidence through methodical problem-solving rather than guesswork. If I asked you what 143 × 768 is, you might guess an incorrect answer (maybe in the right ballpark) - but if I give you pencil and paper and you write it out the way you learnt to do multiplication, you will arrive at the correct answer. So chain-of-thought reasoning is an example of "algorithms" encoded in the training data that can be explored to transform these models from "stochastic parrots" into thinking machines. Their work shows how combining focused architectures with systematic reasoning capabilities can lead to more efficient and capable AI systems, even under hardware constraints. This could point the way toward developing AI that's not just bigger, but smarter and more targeted in its approach.
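The pencil-and-paper procedure is exactly the kind of "algorithm" being described. A minimal sketch of schoolbook long multiplication (the function name is my own, purely illustrative):

```python
def long_multiply(a, b):
    """Multiply the way it's taught on paper: one partial product
    per digit of b, shifted by its place value, then summed."""
    steps = []
    for place, digit in enumerate(reversed(str(b))):
        steps.append(a * int(digit) * 10 ** place)
    return steps, sum(steps)

steps, answer = long_multiply(143, 768)
print(steps)   # partial products: 143*8, 143*60, 143*700
print(answer)  # 109824
```

Each partial product is a small, high-confidence step; the final answer falls out of composing them, which is the analogy to chain-of-thought token generation.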
Call it frugal innovation, or necessity being the mother of invention: if you don't have unlimited access to resources, you're going to find another way of achieving a goal. The idea that AI was only possible with the vast processing power that's been thrown at it is no different from the heady dot-com days, when companies exploding with new business solved their scaling problems by "throwing tin at it" - as if scalability could only be achieved with more, and ever larger, servers. But we're in such a hype cycle that the market first swung towards the OpenAI and Nvidia model, and now someone has come along with something different; in the end the truth will doubtless land somewhere in the middle. I wonder when we'll hit the "let's slow down a little and have a realistic think about this" stage of the cycle.
But at the same time, the more specific the thinking that is baked in via the reward functions of RL, the better the model will do on that category of questions, without any necessary improvement on a different category except through some pattern similarities. This approach could work well for categories of problems where a clear reward function can be defined, but not for others...
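As a sketch of what a "clear reward function" can look like for verifiable tasks - the function name and answer format here are illustrative assumptions, not taken from the DeepSeek-R1 implementation - a rule-based check simply extracts a final answer and compares it to ground truth, with no learned reward model:

```python
import re

def math_reward(completion: str, ground_truth: str) -> float:
    """Rule-based reward for a verifiable math task (hypothetical format):
    the model is asked to put its final answer inside \\boxed{...}.
    Reward is 1.0 for an exact match, else 0.0 - no partial credit."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth else 0.0

print(math_reward(r"... so the answer is \boxed{109824}", "109824"))  # 1.0
print(math_reward(r"I think it's roughly \boxed{110000}", "109824"))  # 0.0
```

Math and code lend themselves to this kind of checkable signal; open-ended writing, judgment calls, or taste do not, which is exactly the limitation described above.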
I've been saying this for years: we need to focus on smaller, high-quality models instead of massive, inefficient ones. The problem is that smaller, precise models don't attract as much investment because they can be developed at a fraction of the cost. Many AI companies ride the hype wave, securing massive public and private grants with the same tired narrative: more money, bigger datasets, better outcomes. It's a flawed mindset that prioritizes scale over substance. 🤦🏼‍♂️
| AI/ML strategy for healthcare, financial, and industrial applications | CEO, MyCellome LLC | Recreational AI-assisted vibe coder :-) |
One quick caveat - if you go back to AI/ML 1.0, reinforcement learning (by the very nature of the algorithms, and its past use in robotics and control) learns "motor tapes" over the k-dimensional curve-fitted Y = F(X) space. So learning and generalization are "piecewise", not "global". Extensive testing therefore needs to be performed before a model can be "certified" as able to generalize adequately over the entire space, and to be sure that it is not extrapolating in parts. The fact that the models are open source (and over 500 use cases have already been built on Hugging Face) will hopefully stress-test this very quickly.