Specifications of the BR100

This GPU is built on a 7nm process node and packs 77 billion transistors (just 3 billion shy of NVIDIA’s H100). The chip is packaged using TSMC’s 2.5D CoWoS technology. As for the memory, this monstrosity is powered by 64GB of HBM2e with a bandwidth of around 2.3TB/s. The chip size comes out to be around 1074mm².
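For a rough sense of scale, the figures quoted above can be turned into a transistor density. This is only a back-of-the-envelope sketch: the transistor count and die area are the approximate numbers from this article, and the constant and function names are made up for illustration.

```python
# Approximate figures quoted in the article (not official datasheet values).
BR100_TRANSISTORS = 77e9      # 77 billion transistors
BR100_DIE_AREA_MM2 = 1074     # "around 1074mm²"


def transistor_density_millions_per_mm2(transistors, area_mm2):
    """Return transistor density in millions of transistors per mm²."""
    return transistors / area_mm2 / 1e6


density = transistor_density_millions_per_mm2(BR100_TRANSISTORS, BR100_DIE_AREA_MM2)
print(f"BR100 density: roughly {density:.1f} million transistors per mm²")
```

Since both inputs are rounded, the result should be read as a ballpark figure only.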

An Architectural Overview

As stated above, the GPU features an MCM (multi-chip module) design consisting of 2 chiplets, each powered by 16 SPCs (Streaming Processing Clusters). Every SPC consists of 16 EUs (Execution Units), and every 4 EUs form a Compute Unit (CU).

Chiplets: 2
SPCs: 2×16 = 32
EUs: 32×16 = 512
CUs: 512/4 = 128
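The totals above follow directly from the per-level counts, and the arithmetic can be sketched in a few lines of Python. The constant and function names here are purely illustrative; only the numbers come from the article.

```python
# Per-level counts as described in the article.
CHIPLETS = 2
SPCS_PER_CHIPLET = 16
EUS_PER_SPC = 16
EUS_PER_CU = 4  # 4 EUs form one Compute Unit


def br100_unit_counts():
    """Multiply out the hierarchy: chiplets -> SPCs -> EUs -> CUs."""
    spcs = CHIPLETS * SPCS_PER_CHIPLET
    eus = spcs * EUS_PER_SPC
    cus = eus // EUS_PER_CU
    return {"chiplets": CHIPLETS, "spcs": spcs, "eus": eus, "cus": cus}


print(br100_unit_counts())
# {'chiplets': 2, 'spcs': 32, 'eus': 512, 'cus': 128}
```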

Inside each SPC, we find 16 EUs. A closer look shows that each EU consists of 16 streaming processing cores (together forming a V-core) and a T-core, or Tensor core. The 16 streaming processing cores (1 V-core) handle FP32, FP16, INT32, and INT16 computations.
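Extending the same arithmetic one level down gives the chip-wide core totals. This sketch assumes exactly one V-core and one T-core per EU, which is how the description above reads; the names are hypothetical and the 512-EU total is the figure from the breakdown earlier in the article.

```python
# Figures from the article; one V-core and one T-core per EU is an assumption
# based on the wording "16 streaming processing cores (V-core) and a T-core".
TOTAL_EUS = 512
STREAMING_CORES_PER_EU = 16
V_CORES_PER_EU = 1
T_CORES_PER_EU = 1


def br100_core_totals(total_eus=TOTAL_EUS):
    """Return chip-wide core counts derived from the per-EU layout."""
    return {
        "streaming_cores": total_eus * STREAMING_CORES_PER_EU,
        "v_cores": total_eus * V_CORES_PER_EU,
        "t_cores": total_eus * T_CORES_PER_EU,
    }


print(br100_core_totals())
# {'streaming_cores': 8192, 'v_cores': 512, 't_cores': 512}
```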

BR100 vs A100

Compared with last-gen’s Ampere-based A100, the BR100 is around 2.6x faster in select benchmarks. This goes to show how quickly China is accelerating in the GPU department. However, sorry to be a killjoy, but the Hopper-based H100 is around 2-3x faster in the same benchmarks. Its Tensor cores can boost this lead to around 30x in various tests.

General Use

The GPU is meant for China’s AI department and is said to mimic human behavior with its enhanced AI performance. This is so that China can rely on its own technology.

Featured Image Credit: ferdibtk at Freepik
