But not everyone will agree with this wording, because the MLPerf 0.7 results published today can be read in more than one way. NVIDIA, for example, claims the fastest supercomputer among commercially available solutions (based on the A100, of course), whereas Google used the as-yet-unannounced TPU v4.
When it comes to machine intelligence systems, the workload is usually either running an already trained neural network or training a new one. Training requires orders of magnitude more computing power and calls for powerful multi-core systems. The MLPerf test suite is often used to assess performance. The full list of MLPerf 0.7 participants, with detailed results, is available on the MLPerf project website.
Google has been developing its own machine learning accelerators for a long time: back in 2017, we described one of the first TPU models, capable of quickly multiplying 256 × 256 matrices. More recently, the third version of the TPU set a number of records in training neural networks. That record-breaking system was built on Cloud TPU Pod modules, each of which contained more than 1000 Google TPU chips and delivered over 100 PFLOPS.
Google’s main competitor in this area is NVIDIA, which also takes the development of AI accelerators very seriously. Even V100-based solutions competed easily with Google’s TPU v3, and the newest A100, based on the Ampere architecture, demonstrated even higher performance in MLPerf Training.
However, Google is not going to give up: the Google Research division has published new MLPerf Training 0.7 results obtained on the yet-to-be-announced TPU v4 tensor coprocessors. TPU v4 did not beat the A100 in every test, but the contest was a close one: in some scenarios NVIDIA was still faster, while in others Google’s design took the lead.
NVIDIA, in turn, reports 16 records for the latest DGX A100 and separately notes that its products are available for purchase and can run any MLPerf test or real workload, while competitors’ results are often either incomplete or obtained on hardware that is experimental or impossible to acquire right now.
The tests used implementations of AI models in TensorFlow, JAX, PyTorch, XLA and Lingvo. Four of the eight models were trained in less than 30 seconds, which is a very impressive result. For comparison, in 2015 a similar training run would have taken more than three weeks on the most advanced hardware of the time. Overall, TPU v4 promises to be 2.7 times faster than TPU v3, but only the official announcement of the fourth iteration of Google’s coprocessor will dot the i’s.
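To put that comparison in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it takes "more than three weeks" as exactly 21 days and "less than 30 seconds" as exactly 30 seconds, so the result is a lower bound, and the function and variable names are illustrative.

```python
# Rough lower bound on the training speedup quoted above.
# Assumption: "more than three weeks" is taken as exactly 21 days,
# and "less than 30 seconds" as exactly 30 seconds.
SECONDS_PER_DAY = 24 * 60 * 60


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the old one."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return old_seconds / new_seconds


training_2015 = 21 * SECONDS_PER_DAY  # three weeks, in seconds
training_2020 = 30                    # thirty seconds

min_speedup = speedup(training_2015, training_2020)
print(f"At least {min_speedup:,.0f}x faster")  # At least 60,480x faster
```

Note that this compares whole systems rather than individual chips, so it says nothing about per-chip gains.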
For more information on the MLPerf 0.7 tests, see the official Google Cloud blog. It also has details about TPU-based systems, but for now that information covers only the third version of the chip. What is known so far is that the fourth-generation TPU is more than twice as fast in matrix multiplication, has a faster memory subsystem and features an improved interconnect.