If graphics processing units (GPUs) are the food of artificial intelligence, then Chinese companies have faced what can only be called a famine since the US banned sales of advanced chips to China. Now, almost two years after that decision, how have Chinese companies managed to work around the ban and return to the AI race with minimal resources?
It is no secret that today's AI boom rests on hundreds of thousands of graphics processing units (GPUs). That demand has driven the astonishing rise in Nvidia's valuation and share price, and contributed enormously to the rapid development of major AI platforms such as ChatGPT.
For example, Meta's latest AI model, Llama 3, required 16,000 of Nvidia's advanced H100 GPUs to train, and Meta aims to grow its stock to 600,000 such GPUs before the end of the year.
But since October 2022, when the United States banned the sale of powerful, advanced processors to China, Chinese companies have been struggling to stay in this fierce race. Some have even resorted, according to reports, to the black market and other parallel markets to buy these precious chips. The majority, however, have turned to a different strategy: squeezing the most out of the resources they already have, in some interesting ways.
Take DeepSeek, a Hangzhou-based Chinese AI startup, as an example. In September, the company launched version 2.5 of its model, which rivals leading open-source models at programming tasks yet runs on 10,000 of Nvidia's older GPUs: a large number for a Chinese company, but small by US standards.
DeepSeek compensates for its smaller, older hardware with out-of-the-box technical ideas. For example, its model uses a mixture-of-experts design: multiple sub-networks of "experts", each specializing in a particular kind of problem. The model routes each input to the appropriate expert, which improves speed and reduces processing time, yielding better results in less time.
And although DeepSeek's model has 236 billion parameters in total (the values a model learns during training), it activates only about a tenth of them to process any given input, compressing new information as it goes. That lets it handle large datasets more efficiently than traditional dense models of the same size.
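The routing idea described above can be sketched in a few lines of Python. This is a toy illustration only, not DeepSeek's actual architecture: the expert count, dimensions, and random weights are all stand-ins, and each "expert" is reduced to a single weight matrix. The point it demonstrates is that a top-k router touches only a small fraction of the model's total parameters per input.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8  # toy value; production MoE models use far more experts
TOP_K = 2        # only this many experts are consulted per input
DIM = 16         # toy hidden dimension

# Each "expert" is reduced to a single weight matrix in this sketch.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
# The router scores how relevant each expert is to a given input.
router = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x):
    """Route the input to its top-k experts and mix their outputs."""
    scores = x @ router                  # one relevance score per expert
    top = np.argsort(scores)[-TOP_K:]    # indices of the k best-scoring experts
    # Softmax over just the selected experts' scores -> mixing weights.
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()
    # Only TOP_K of the NUM_EXPERTS weight matrices are ever touched.
    out = sum(wi * (x @ experts[i]) for wi, i in zip(w, top))
    return out, top

x = rng.standard_normal(DIM)
out, used = moe_forward(x)
print(f"experts used: {sorted(used.tolist())} of {NUM_EXPERTS}")
print(f"active parameter fraction: {TOP_K / NUM_EXPERTS:.2f}")
```

With 2 of 8 experts active, only a quarter of the expert parameters participate in any one forward pass; scaling the same pattern up is how a 236-billion-parameter model can activate only a tenth of itself per input.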
It's not just DeepSeek. Many companies have found innovative answers to the resource shortage, such as MiniCPM, a model developed by Tsinghua University and the startup ModelBest with just 2.4 billion parameters. Despite that relatively small size, the model delivers performance comparable to a 13-billion-parameter model, nearly six times its nominal capacity. Like DeepSeek, it relies on expert sub-networks and input compression to punch above its weight.
Nor is this trend limited to Chinese companies. For example, JEST, a training method Google unveiled last July, feeds a model small samples of high-quality data before letting it ingest larger, lower-quality batches, making training 13 times faster and 10 times more efficient than conventional methods, Google claims.
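The quality-first ordering described above can be illustrated with a minimal sketch. This is not Google's actual algorithm (which jointly selects batches using a learned reference model); here a hypothetical `quality` function simply penalizes repetitive text, and the training schedule puts the small high-quality slice before the noisier bulk.

```python
def quality(example: str) -> float:
    """Hypothetical stand-in for a learned quality score: penalize repetition."""
    words = example.split()
    return len(set(words)) / max(len(words), 1)

# Toy corpus standing in for a real training set.
corpus = [
    "the cat sat on the mat",
    "clean well written reference sentence",
    "spam spam spam spam spam",
    "another varied informative example sentence",
]

# Rank by quality, then schedule: high-quality slice first, noisy bulk after.
ranked = sorted(corpus, key=quality, reverse=True)
high_quality = ranked[:2]   # small, clean sample fed to the model first
bulk = ranked[2:]           # larger, lower-quality remainder
schedule = high_quality + bulk

print("first example:", schedule[0])
print("last example:", schedule[-1])
```

The repetitive "spam" line scores lowest and lands at the end of the schedule, while the cleanest sentences are seen first, mirroring the small-clean-before-large-noisy ordering the article describes.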
The difference between American and Chinese companies is that the latter do not have the luxury of choosing how to deal with resource scarcity. As Nathan Benaich of the AI investment fund Air Street Capital puts it, "A scarcity mindset definitely drives efficiency." Or, as the saying goes: necessity is the mother of invention.