DeepSeek’s $6 Million AI Shake-Up: How a Startup Outdid OpenAI’s Billion-Dollar Models 🤖
Nvidia’s 16% Stock Plunge, a Chinese Upstart You’ve Never Heard Of, and the Future of AI 🚀
Nvidia’s stock took a 16% dive today, dragging down other tech giants along the way. The unexpected culprit? A Chinese startup called DeepSeek, which has just changed the way we think about AI.
This relatively unknown company created an AI model called DeepSeek-R1 that not only rivals OpenAI’s top-tier models in reasoning, maths, writing, and coding, but sometimes even outperforms them. And here’s the jaw-dropper 🤯: DeepSeek says it trained the model for roughly $6 million, while OpenAI is backed by almost $20 billion in funding.
So, how on Earth did they manage that? Let’s break it down 🔎.
1. Smarter Training Without the Usual Playbook
Traditional AI models rely on massive human-labelling budgets and vast datasets. DeepSeek took a different path—focusing on a three-step training process that kept costs low and efficiency high:
Step 1: Cold Start with a Tiny Dataset
DeepSeek kicked things off with a small, curated dataset emphasising chain-of-thought reasoning. Instead of billions of data points, they used just a few thousand, carefully picked to teach the model how to think step by step.
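To make that concrete, here’s a minimal sketch (in Python) of what one cold-start example might look like. The field names and the sample problem are hypothetical illustrations; DeepSeek hasn’t published its exact data format.

```python
# Hypothetical shape of one cold-start chain-of-thought example.
# Field names are illustrative; DeepSeek's actual format is not public.
cold_start_example = {
    "prompt": "A train travels 120 km in 1.5 hours. What is its average speed?",
    "chain_of_thought": (
        "Average speed is distance divided by time. "
        "120 km / 1.5 h = 80 km/h."
    ),
    "answer": "80 km/h",
}

# A few thousand examples like this, rather than billions of raw web tokens,
# seed the model's step-by-step reasoning style before reinforcement learning.
cold_start_dataset = [cold_start_example]  # ...plus thousands more
```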
Step 2: Rule-Based Reinforcement Learning
Unlike OpenAI, which employs human evaluators (expensive and time-consuming), DeepSeek created a rule-based system: does the maths check out? Does the code run? If yes, reward the model; if not, no reward. This streamlined approach worked just as well, but at a fraction of the usual cost.
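As a rough Python sketch of what such rule-based rewards could look like: the answer-extraction heuristic and function names here are simplifications of the idea, not DeepSeek’s actual reward code.

```python
import re
import subprocess
import sys
import tempfile

def math_reward(model_output: str, reference_answer: str) -> float:
    """Reward 1.0 if the model's final stated answer matches the reference.
    A toy extraction heuristic; real verifiers normalise answers more carefully."""
    match = re.search(r"answer\s*[:=]?\s*(\S+)", model_output, re.IGNORECASE)
    if match and match.group(1).rstrip(".") == reference_answer:
        return 1.0
    return 0.0

def code_reward(generated_code: str, test_code: str) -> float:
    """Reward 1.0 if the generated code passes its tests (exit code 0), else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n" + test_code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0

# "Does the maths check out?" -> reward or no reward.
print(math_reward("Reasoning... so the answer: 42", "42"))  # 1.0
```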
Step 3: Self-Improvement via Rejection Sampling
DeepSeek-R1 generated multiple responses, kept the best ones, and learned from them. This technique, known as rejection sampling, yielded high-quality answers across reasoning, maths, and writing tasks.
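A toy illustration of that loop, with `generate` and `score` as stand-ins for the real model and the reward checks:

```python
import random

def generate(prompt: str) -> str:
    """Stand-in for sampling a completion from the model at some temperature."""
    return f"candidate response to '{prompt}' #{random.randint(0, 9999)}"

def score(response: str) -> float:
    """Stand-in for the quality check: rule-based rewards, or a grader model."""
    return random.random()

def rejection_sample(prompt: str, n: int = 16, keep: int = 2) -> list[str]:
    """Sample n candidates, rank them, and keep only the best few.
    The surviving (prompt, response) pairs become fine-tuning data."""
    candidates = [generate(prompt) for _ in range(n)]
    return sorted(candidates, key=score, reverse=True)[:keep]

# The kept responses are fed back to the model as supervised training data.
sft_data = [(p, r) for p in ["prompt A", "prompt B"] for r in rejection_sample(p)]
```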
The upshot? DeepSeek-R1 smashed benchmarks like AIME and MATH-500, scoring 79.8% on AIME and 97.3% on MATH-500, right on par with OpenAI’s o1.
2. Distillation: Smaller Models, Big Returns
DeepSeek also used distillation to craft leaner, more efficient versions of R1. Here’s the gist, with a toy sketch after the list:
The main model (R1) acts as a “teacher,” while smaller models (like Qwen-7B or Llama-70B) learn to mirror its reasoning.
These compact models are cheaper to run but still deliver stellar results.
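Conceptually, the pipeline looks something like the sketch below; `teacher_generate` and `finetune` are placeholders for the large R1 model and an ordinary supervised training loop, not real APIs.

```python
def teacher_generate(prompt: str) -> str:
    """Stand-in for the large R1 'teacher' writing out a full reasoning trace."""
    return f"<think>step-by-step reasoning for: {prompt}</think> final answer"

def finetune(student: dict, corpus: list[tuple[str, str]]) -> dict:
    """Stand-in for ordinary supervised next-token training on the student."""
    student["examples_seen"] = student.get("examples_seen", 0) + len(corpus)
    return student

# The teacher generates reasoning traces; a smaller student (e.g. a Qwen-7B
# or Llama-70B checkpoint) is then fine-tuned to imitate those traces.
prompts = ["prompt A", "prompt B"]
corpus = [(p, teacher_generate(p)) for p in prompts]
student = finetune({"name": "student-7B"}, corpus)
```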
In fact, DeepSeek’s distilled models cost 20–40 times less to operate than OpenAI’s equivalents. Experiments that cost $300 on OpenAI’s o1 can be done for about $10 with R1. That’s a huge saving 💸, making advanced AI accessible to far more organisations.
3. No Fancy GPUs, No Problem
Here’s where things get really interesting: DeepSeek didn’t have cutting-edge Nvidia GPUs (thanks to export restrictions). Instead of giving up, they improvised:
They embraced sparse activation, a mixture-of-experts style design that activates only a sliver of the model’s parameters for each token, which reduced computational load while keeping performance high (see the sketch after this list).
Through careful optimisation, they dodged the need for pricey hardware like Nvidia’s H100 GPUs (which can cost over $30,000 each).
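Here’s a toy top-k mixture-of-experts layer in PyTorch that illustrates the general idea; the sizes and routing scheme are illustrative, not DeepSeek’s actual architecture.

```python
import torch
import torch.nn.functional as F

def sparse_moe_forward(x, gate, experts, k=2):
    """Route each token to its top-k experts, so only a small fraction of the
    layer's parameters does any work for a given token."""
    scores = F.softmax(gate(x), dim=-1)              # (tokens, n_experts)
    topk_scores, topk_idx = scores.topk(k, dim=-1)   # (tokens, k)
    out = torch.zeros_like(x)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = topk_idx[:, slot] == e            # tokens routed to expert e
            if mask.any():
                weight = topk_scores[mask][:, slot].unsqueeze(-1)
                out[mask] = out[mask] + weight * expert(x[mask])
    return out

# Toy setup: 8 experts, 2 active per token -> only ~25% of expert weights used.
dim = 16
experts = [torch.nn.Linear(dim, dim) for _ in range(8)]
gate = torch.nn.Linear(dim, len(experts))
tokens = torch.randn(4, dim)
output = sparse_moe_forward(tokens, gate, experts)
```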
This approach allowed them to train a world-class model on less powerful kit, proving that clever engineering can trump brute force.
The Concerns
DeepSeek’s methods are undeniably impressive, but they also raise a few red flags 🚩:
Privacy Issues: Their data sources are unclear, leading to whispers that certain privacy safeguards may have been skipped to cut costs.
Ethical Risks: When speed and affordability dominate, what about transparency and fairness? Neglecting these could cause unintended harm.
Broader AI Impact: OpenAI’s big-budget, closed-off approach is now challenged. Will this make AI more democratic, or will it encourage rushed models with hidden flaws?
Why This Matters
DeepSeek isn’t just building a cheaper AI model; they’re rewriting the playbook. Here’s why it’s a big deal:
Cost Efficiency
They developed R1 for a reported $6 million. In contrast, Meta’s Llama-3.1 reportedly cost around $60 million to train, and OpenAI’s budget stretches into the billions.
Accessibility
While OpenAI’s solutions often demand pricey infrastructure, DeepSeek’s distilled models can run on consumer-level GPUs, offering smaller businesses and researchers a chance to play.
Open-Source Collaboration
DeepSeek released R1 as an open-weight model, letting anyone study and refine it. By tapping into community contributions, they cut R&D costs and speed up innovation.
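For example, assuming you have the Hugging Face transformers library (plus accelerate) installed and enough GPU memory, loading one of the released distilled checkpoints takes only a few lines; the model ID below is the publicly released 7B distill, but swap in whichever variant suits your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Open-weight distilled checkpoint released by DeepSeek on Hugging Face.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```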
The Big Picture
DeepSeek has shown that you don’t need a bottomless budget to push AI to its limits. With a bit of ingenuity and openness, they’ve levelled the playing field. Is this a one-time shock, or the dawn of an AI era where lean, efficient methods steal the show?
Either way, the established heavyweights need to pay attention—this underdog story may just redefine the future of AI.
What do you think? Let me know!