With powerful hardware, such as 10,000 NVIDIA H100 AI accelerator chips, it would take about three months to train a large model similar to OpenAI's ChatGPT, which reportedly has 1.8 trillion parameters. With less powerful hardware, such as 10,000 NVIDIA V100 chips, training a similar model would take around two years.
The A800 loophole
Large models are developing very quickly, with a new generation arriving roughly every six months. On older hardware, training a large model takes about two years, by which time it is several generations behind. For specific applications, then, one might have to settle for smaller models. Small models can be useful for businesses, but at a national level, and especially given China's strategic focus, being unable to develop and control large AI models independently is seen as a serious problem.
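The timing argument can be made concrete with a little arithmetic. The sketch below uses only the approximate figures quoted in this article (three months on 10,000 H100s, two years on 10,000 V100s, a new model generation every six months); the function names are illustrative, not taken from any library.

```python
# Approximate figures quoted in the article.
H100_TRAINING_MONTHS = 3    # 10,000 NVIDIA H100 chips
V100_TRAINING_MONTHS = 24   # 10,000 NVIDIA V100 chips
GENERATION_MONTHS = 6       # a new model generation roughly every six months


def generations_elapsed(training_months: float,
                        generation_months: float = GENERATION_MONTHS) -> float:
    """Number of model generations released while one training run is in progress."""
    return training_months / generation_months


def implied_speedup(slow_months: float, fast_months: float) -> float:
    """Throughput ratio implied by two training times at the same chip count."""
    return slow_months / fast_months


if __name__ == "__main__":
    print(generations_elapsed(H100_TRAINING_MONTHS))   # 0.5
    print(generations_elapsed(V100_TRAINING_MONTHS))   # 4.0
    print(implied_speedup(V100_TRAINING_MONTHS, H100_TRAINING_MONTHS))  # 8.0
```

On these numbers, a V100 cluster would finish a model about four generations late, while an H100 cluster finishes within a single generation.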
From a geopolitical perspective, the race to build large AI models is the nuclear arms race of our time. If "military-grade" AI is the nuclear weapon, then computing chips are the enriched uranium, and semiconductor manufacturing equipment is the centrifuges.
In the first version of the Advanced Computing Export Control Rule in October 2022, the U.S. government set the threshold for advanced computing chips at "more than 4,800 TOPS of computing power on a single card and more than 600 GB/s of bidirectional transfer rate".
TOPS, short for tera (trillion) operations per second, measures how many operations an AI chip can perform in one second.
The "4,800 TOPS + 600 GB/s" threshold was evidently modeled on the NVIDIA A100, the most advanced accelerator card on the market at the time.
As it turned out, shortly after the new regulations were released in 2022, NVIDIA launched the A800, a chip with the same computing power as the A100 but a lower transfer rate of 400 GB/s. The A800 is a "China special", made purely to get around the export rules. It was designed to sit just below the threshold so that it could be legally supplied to the Chinese market, where it became the best-performing product that Chinese users could buy in large quantities. In this way NVIDIA recovered as much as it could of the revenue lost from no longer being able to sell the A100 to China.
In another time and place, NVIDIA's maneuver would not have caused much concern: after all, what it did was fully legal. It is also typical of how multinational corporations behaved in the era of rapid globalization, pursuing maximum profit right up to the legal limit.
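The logic of the loophole is that the 2022 rule applied only when both conditions were met at once. The sketch below is a hedged illustration of that two-part test, not the legal text: the thresholds are the ones quoted in this article, they are treated as inclusive (an assumption), the A100's transfer rate is taken to sit exactly on the 600 GB/s line (an inference from the article, not a stated figure), and the shared compute figure is a placeholder, since the article says only that the two cards have the same computing power.

```python
from dataclasses import dataclass

# Thresholds from the October 2022 rule as quoted in the article.
TOPS_THRESHOLD = 4800            # computing power per card
BANDWIDTH_THRESHOLD_GBPS = 600   # bidirectional transfer rate, GB/s


@dataclass(frozen=True)
class Card:
    name: str
    tops: float            # computing power figure used by the rule
    bandwidth_gbps: float  # bidirectional transfer rate in GB/s


def is_restricted_2022(card: Card) -> bool:
    """A card is covered only if it meets BOTH thresholds (inclusive: an assumption)."""
    return (card.tops >= TOPS_THRESHOLD
            and card.bandwidth_gbps >= BANDWIDTH_THRESHOLD_GBPS)


# Placeholder compute value: the article says only that both cards share
# the same computing power, at or above the line.
SHARED_TOPS = 4800

a100 = Card("A100", SHARED_TOPS, 600)  # 600 GB/s is an assumption
a800 = Card("A800", SHARED_TOPS, 400)  # 400 GB/s per the article

for card in (a100, a800):
    print(card.name, "restricted" if is_restricted_2022(card) else "exportable")
```

Because the test is a conjunction, lowering a single input below its threshold is enough to flip the result, which is what the A800 did with bandwidth.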
NVIDIA cards are almost synonymous with power today, and are a form of currency in the eyes of AI developers.
Although the A800's transfer rate is lower than the A100's, in some workloads the difference in bandwidth does not significantly affect results. As a tech giant with a quarter of its global business coming from China, NVIDIA is a natural target of public scrutiny, and the A800's "cleverness" has drawn heavy criticism in the United States. For many, the A800 exposes the loopholes in U.S. export controls, and those loopholes must be plugged.
Export control

Another feature of this export control adjustment is the introduction of a new metric: performance density. Performance density is defined as the amount of computing power per unit of physical area of a chip.
The density metric is also likely intended to stop China from building large-scale computing clusters with new chiplet technology as a way around export controls. With chiplets, developers can use die-to-die packaging to increase the number of processing units on each chip, and thereby its actual computing power. The BR100 chip, for example, uses die-to-die technology to achieve high computing power.
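A minimal sketch of the metric as this article defines it (computing power divided by chip area) helps show why it matters for chiplets. The die figures are hypothetical, chosen only to keep the arithmetic easy, and the packaging model assumes that combining dies simply adds up both their compute and their area, which is a simplification.

```python
def performance_density(compute: float, die_area_mm2: float) -> float:
    """Computing power per unit of chip area."""
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return compute / die_area_mm2


def package(dies: list[tuple[float, float]]) -> tuple[float, float]:
    """Combine (compute, area) pairs by summing both (simplifying assumption)."""
    total_compute = sum(compute for compute, _ in dies)
    total_area = sum(area for _, area in dies)
    return total_compute, total_area


# Hypothetical die: (compute, area in mm^2). Not a real product.
single_die = (1000.0, 400.0)
two_die_package = package([single_die, single_die])

print(performance_density(*single_die))        # 2.5
print(two_die_package[0])                      # 2000.0
print(performance_density(*two_die_package))   # 2.5
```

Under these assumptions, packaging doubles total compute while density stays the same. A limit on total compute alone looks only at the first number, whereas a density limit applies equally to the small die and to the package built from it.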
Chiplet technology had raised high hopes in China, where many saw it as a chance to break NVIDIA's technological hegemony. This rule change will make it much harder for Chinese companies to keep developing such products.
Under the new restrictions, all mainstream AI computing chips become subject to export control.
Among older products, the only ones that fall outside the threshold are the NVIDIA V100 and Google's TPUv3. As mentioned earlier, large models advance a generation roughly every six months, so training the latest large model on a five-year-old chip would take years and has almost no practical value.
Compared with version 1.0 released last year, version 2.0 of the Advanced Computing Export Controls is greatly improved in its consideration and design of technical details. A great deal of research went into it and considerable resources were mobilized, beyond what is usual for U.S. non-military intelligence departments. It is worth remembering that when the first version of the chip export controls was introduced in 2022, ChatGPT had not yet been released.
In the past 12 months, AI research around the world has evolved at the fastest rate in history. And today's export controls largely reflect the importance and impact of these technological advances.
The U.S. chose to release this heavy news before the meeting between President Joe Biden and his Chinese counterpart Xi Jinping, disregarding China's possible backlash and signaling that "everything else can be negotiated, but chips are not negotiable".
More loopholes
However, NVIDIA has not given up. Just when most people thought the new regulations would cut off its Chinese AI card business completely, NVIDIA showed perseverance.
Total computing power and computing power density are just two pieces of the larger AI puzzle. Another common bottleneck in many real-world applications today is transmission bandwidth. Large computing clusters are often limited by card-to-card data throughput and cannot make full use of each card's computing performance.
One of NVIDIA's chips, the H20, has a peak performance of 2,368 TPP (total processing performance). At that level the export rules set a performance density limit of 3.2, and the H20 stays within it, so it is not subject to export control restrictions and can be legally sold to Chinese customers. The H20 has significantly larger memory capacity and increased transmission bandwidth. Although its computing power is far below that of the H100, its 96 GB of memory exceeds the H100's 80 GB, and its transfer rate reaches 900 GB/s through NVIDIA's proprietary NVLink technology, the same as the H100's. Overall, the H20's product strategy is to match or exceed the H100 in memory capacity and bandwidth while keeping computing power within the legal limits.
In some ways, the H20 may work better than the H100. In training scenarios, however, which are of more concern at the government level, the H20 still performs decidedly worse than the H100. But absolute performance is not the most important thing here: after the release of the new regulations, the H20 remains the best product that Chinese users can legally buy in bulk. It is similar to last year's A800 and H800: everyone knows it is a watered-down version, but it is still the best choice.
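The two figures in this paragraph can be combined into a rough back-of-the-envelope check. The sketch below takes them at face value (2,368 TPP, and a density limit read here as 3.2) and assumes that density is TPP divided by die area in square millimeters. Both the units and the helper name are assumptions for illustration, and the H20's actual die area is not given in this article.

```python
def min_die_area_mm2(tpp: float, density_limit: float) -> float:
    """Die area at which tpp / area equals the density limit.

    A chip with this TPP stays under the limit only if its die is larger.
    """
    if density_limit <= 0:
        raise ValueError("density limit must be positive")
    return tpp / density_limit


H20_TPP = 2368        # per the article
DENSITY_LIMIT = 3.2   # per the article, read as a decimal

area = min_die_area_mm2(H20_TPP, DENSITY_LIMIT)
print(round(area, 1))  # 740.0
```

On these assumptions, a chip rated at 2,368 TPP would need a die larger than about 740 square millimeters to stay under the density limit. Put differently, spreading modest computing power over a large die is one way to remain on the permitted side of a density rule.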
The plot twist is amazing. NVIDIA demonstrated an unparalleled ability to "see the wood for the trees", by finding a tiny gap in the rules. This also shows that policy tools can indeed control cutting-edge technology, but that there is often a loophole.
The current chip war is naturally a game between China and the United States. But with NVIDIA, as is often the case with big multinationals, the game broadens to the U.S. government vs big business.
Winners and losers
Every time the U.S. renews its technology sanctions, new winners and losers emerge in China's related industries.
Being directly sanctioned means not only losing access to U.S. technology but also suffering reputational damage, losing suppliers and partners, and facing higher financing costs.
For high-performance chips, the U.S. has adopted the "Foreign Direct Product" rule. That is, even if these chips are designed entirely by Chinese engineers at Chinese companies in China, as long as any U.S. technology is used in the design process (most commonly EDA software, which almost no design company can avoid), the resulting products are subject to U.S. export controls.
In other words, if a Chinese company wants to use or create high-quality AI chips, it must obtain a license from the U.S. Department of Commerce, and the probability of getting that license is not high.
Computing power is national power, so whoever holds the cards will be rich.