DeepSeek: How Chinese AI innovators are challenging the status quo

U.S. controls on exports of advanced semiconductors were aimed at slowing China’s progress in AI, but they may have inadvertently spurred innovation. Unable to rely on the newest hardware, companies like Hangzhou-based DeepSeek have been forced to find creative ways to do more with less.

What is more, China is pursuing an open-source strategy and emerging as one of the world’s biggest providers of powerful, fully open-source AI models.

This month, DeepSeek released its R1 model, which uses advanced techniques such as pure reinforcement learning to create a model that is not only among the most capable in the world but also fully open source, available for anyone to examine, modify, and build upon.
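The "pure reinforcement learning" approach means the model is trained against verifiable, rule-based rewards rather than a learned human-preference model. A minimal sketch of such a reward function (the function name, tag format, and reward values are hypothetical illustrations, not DeepSeek's actual code):

```python
import re

def rule_based_reward(response: str, ground_truth: str) -> float:
    """Toy rule-based reward: no learned reward model, just
    verifiable checks on the model's output."""
    reward = 0.0
    # Format reward: the model is asked to wrap its final answer in tags.
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match:
        reward += 0.1  # small bonus for following the required format
        # Accuracy reward: the extracted answer must match the ground truth.
        if match.group(1).strip() == ground_truth.strip():
            reward += 1.0
    return reward

# During RL training, the policy is updated to make
# high-reward responses more likely.
print(rule_based_reward("<answer>42</answer>", "42"))  # 1.1
```

Because the reward is computed mechanically from the answer itself, this style of training scales without human labelers, which is part of what makes it cheap.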

DeepSeek-R1 shows that China is not out of the AI race and could in fact become a major force in global AI progress thanks to its unexpected open-source strategy. Through competitive open-source models, Chinese companies can strengthen their global influence and potentially shape international AI standards and practices. Open-source projects also attract talent and resources from around the world to contribute to Chinese AI progress. The strategy further allows China to extend its technological reach into emerging countries, potentially embedding its AI systems (and, by extension, its values and standards) into the global digital infrastructure.

DeepSeek-R1’s performance is comparable to OpenAI’s best reasoning models across a variety of tasks, including mathematics, coding, and complex reasoning. For example, on the AIME 2024 math benchmark, DeepSeek-R1 scored 79.8%, compared to OpenAI o1’s 79.2%. On the MATH-500 benchmark, DeepSeek-R1 achieved 97.3% versus o1’s 96.4%. In coding tasks, DeepSeek-R1 reached the 96.3rd percentile on Codeforces, while o1 achieved the 96.6th percentile. It is important to keep in mind that benchmark results can be flawed and should not be over-interpreted.

But most notably, DeepSeek achieved this largely through innovation rather than relying on the newest computer chips.

DeepSeek introduced MLA (Multi-Head Latent Attention), which reduces memory usage to just 5% to 13% of that of the commonly used Multi-Head Attention (MHA) architecture. MHA is a widely used technique in AI for processing multiple streams of data in parallel, but it requires a lot of memory.
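The memory saving comes from what each architecture must cache during generation: standard MHA stores full per-head key and value vectors for every past token, while MLA stores only a small compressed latent vector per token and reconstructs keys and values from it. A back-of-the-envelope sketch (the head counts and latent size below are illustrative assumptions, not DeepSeek's actual configuration):

```python
def kv_cache_elements_mha(n_heads: int, head_dim: int, seq_len: int) -> int:
    # Standard MHA caches one key and one value vector per head, per token.
    return 2 * n_heads * head_dim * seq_len

def kv_cache_elements_mla(latent_dim: int, seq_len: int) -> int:
    # MLA caches a single compressed latent vector per token, from which
    # keys and values are reconstructed on the fly.
    return latent_dim * seq_len

# Hypothetical sizes: 32 heads of width 128 vs. a 512-dim latent.
mha = kv_cache_elements_mha(n_heads=32, head_dim=128, seq_len=4096)
mla = kv_cache_elements_mla(latent_dim=512, seq_len=4096)
print(f"MLA cache is {mla / mha:.1%} of MHA's")  # 6.2%
```

With these illustrative numbers, the latent cache is about 6% of the MHA cache, consistent with the 5%–13% range reported above.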

To make their model even more efficient, DeepSeek created the DeepSeekMoESparse framework. “MoE” stands for Mixture of Experts, meaning the model uses only a small subset of its components (or “experts”) for each task, rather than running the entire network. The “sparse” part refers to how only the necessary experts are activated, saving computing power and lowering costs.
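The core mechanic of a sparse MoE layer can be sketched in a few lines: a gating function scores all experts, only the top few are run, and their outputs are combined by the gate's weights. This toy NumPy version (with made-up expert count and dimensions) is a generic sparse-MoE sketch, not DeepSeek's implementation:

```python
import numpy as np

def moe_forward(x, experts, gate_weights, top_k=2):
    """Sparse Mixture-of-Experts: route the input to only the top_k
    highest-scoring experts instead of running all of them."""
    scores = gate_weights @ x                   # one gating score per expert
    top = np.argsort(scores)[-top_k:]           # indices of the chosen experts
    e = np.exp(scores[top] - scores[top].max())
    probs = e / e.sum()                         # softmax over chosen experts only
    # Only the selected experts execute; the rest are skipped entirely.
    return sum(p * experts[i](x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
# 8 toy "experts", each a simple linear map on a 4-dim input.
experts = [lambda x, W=rng.standard_normal((4, 4)): W @ x for _ in range(8)]
gate_weights = rng.standard_normal((8, 4))
y = moe_forward(rng.standard_normal(4), experts, gate_weights, top_k=2)
print(y.shape)  # (4,)
```

Here only 2 of 8 experts run per input, so roughly a quarter of the layer's parameters are touched; production MoE models push this ratio much lower.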

The DeepSeek-R1 architecture has 671 billion parameters, but only 37 billion are activated per token, demonstrating remarkable computing efficiency. The company published a comprehensive technical report on GitHub, offering transparency into the model architecture and training process. The accompanying open-source release includes the model architecture, training details, and related components, allowing researchers to fully understand and build on its design.
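The figures above imply that only a small fraction of the model participates in any single forward pass:

```python
total_params = 671e9   # total parameters in DeepSeek-R1
active_params = 37e9   # parameters activated per token (sparse MoE routing)
print(f"{active_params / total_params:.1%} of parameters active per token")  # 5.5%
```

So each token pays the compute cost of a ~37B-parameter model while drawing on the capacity of a 671B-parameter one.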

These innovations make the DeepSeek model both powerful and significantly more cost-efficient than its competitors. This has already sparked a price war over AI inference in China, one that could spread to the rest of the world.

DeepSeek charges a small fraction of what OpenAI-o1 costs for API usage. This dramatic reduction in costs could potentially democratize access to advanced AI capabilities, allowing smaller organizations and individual researchers to leverage powerful AI tools that were previously out of reach.

DeepSeek has also pioneered distilling the capabilities of its large models into smaller, more efficient ones. These distilled models, ranging from 1.5 billion to 70 billion parameters, are also open source, providing the research community with powerful, efficient tools for further innovation.
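The idea behind distillation is to train a small "student" model to mimic the output distribution of a large "teacher". The classic objective is the KL divergence between temperature-softened teacher and student distributions; a minimal sketch (DeepSeek's released distilled models were reportedly fine-tuned on R1-generated outputs, but the underlying principle is the same):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T   # temperature T > 1 softens the distribution
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between softened teacher and student distributions --
    the classic knowledge-distillation objective."""
    p = softmax(teacher_logits, T)   # soft targets from the large model
    q = softmax(student_logits, T)   # small model's predictions
    return float(np.sum(p * np.log(p / q)))

# The student minimizes this loss, learning the teacher's full output
# distribution rather than just hard labels. It is zero when they match:
print(round(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]), 6))  # 0.0
```

Because the soft targets carry more information per example than one-hot labels, a much smaller model can recover a surprising share of the teacher's capability.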

By making its models freely available for commercial use, distillation, and modification, DeepSeek generates goodwill within the global AI community and potentially sets new standards for transparency in AI development.

DeepSeek was founded by Liang Wenfeng, 40, one of China’s leading quantitative investors. His hedge fund, High-Flyer, funds the company’s AI research.

In a rare interview with Chinese media, DeepSeek founder Liang warned OpenAI: “In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed-source approach cannot prevent others from catching up.”

DeepSeek is part of a developing trend of Chinese corporations contributing to the global open-source AI movement, countering the belief that China’s tech sector is primarily focused on imitation rather than innovation.

In September, the Chinese company Alibaba introduced more than a hundred new open-source AI models as part of the Qwen 2.5 family, which supports more than 29 languages. Chinese search giant Baidu offers the Ernie series, Zhipu AI the GLM series, and MiniMax the MiniMax-01 family, all offering competitive performance at much lower prices than leading American models.

As China continues to invest in and promote open-source AI development, while navigating the challenges posed by export controls, the global technology landscape will likely see further shifts in power dynamics, collaboration models, and innovation trajectories. The success of this strategy could position China as a major force in shaping the future of AI, with far-reaching consequences for technological progress, economic competitiveness, and geopolitical influence.
