We Have Been Inundated
We've been handling a continuous stream of expert network calls since January 16th, 2025, as everyone scrambles to analyze DeepSeek's developments and their far-reaching implications. There have also been deep internal discussions at the companies we work for, where DeepSeek is already being deployed and evaluated. After a close reading of the DeepSeek-V3 and DeepSeek-R1 papers, and in-depth discussion in our calls, we share our views on the questions we are asked most often.
Key Questions from Our Clients That We Will Address Here
GPU vs. ASIC Distribution: How will the balance between GPUs and ASICs evolve in the coming years for both AI training and inference? What are the expected shifts in hardware allocation across different AI workloads?
AI Training vs. AI Inference Data Centers: How will the industry differentiate between training-focused vs. inference-optimized data centers? What will be the impact on capital expenditures (CapEx), and which hardware and infrastructure components (e.g., networking, cooling, power) will be most affected?
Industry Impact: Which companies and sectors will feel the greatest impact from DeepSeek’s advancements? This includes key players such as:
NVIDIA
Vertiv (data center infrastructure and power solutions)
Hyperscalers (AWS, Google, Azure, Meta, Tencent, etc.)
MongoDB/Elasticsearch
Training data providers (Scale AI, Innodata, etc.)
Synopsys/Cadence (impact on AI-driven chip design and verification)
Synthetic Data vs. Human Data: As AI shifts toward self-learning models and reinforcement learning, what role will synthetic data play in future training processes? How will data needs evolve between Supervised Fine-Tuning (SFT) data and Reinforcement Learning (RL) data?
Edge AI vs. Cloud AI: Will AI inference remain centralized in the cloud, or will advancements in hardware efficiency and model distillation push AI inference closer to edge devices? How will the economics of cloud-based vs. on-device AI evolve in the next 3–5 years?
How Did DeepSeek Outperform o1 at a Fraction of the Cost?
How was DeepSeek able to scale so efficiently while Western teams struggled with higher costs and longer development cycles?
What architectural and training innovations allowed them to leapfrog existing approaches?
MoE vs. Dense Architectures: What are the trade-offs between Mixture-of-Experts (MoE) and Dense models in terms of efficiency, scalability, training cost, and inference performance? How will this influence the future direction of AI model architectures?
Rights of Use: What are the legal, ethical, and competitive implications of AI companies using outputs from closed-source models to improve open-source alternatives?
Please note: The insights presented in this article are derived from confidential consultations our team has conducted with clients across private equity, hedge funds, startups, and investment banks, facilitated through specialized expert networks. Due to our agreements with these networks, we cannot reveal specific names from these discussions. Therefore, we offer a summarized version of these insights, ensuring valuable content while upholding our confidentiality commitments.
Let's go through these questions one by one:
1. GPU vs. ASIC Distribution
Given DeepSeek's success, a central question is how the mix of GPUs vs. ASICs will evolve for both AI training and inference.