NVIDIA Volta GV100 12nm FinFET GPU Unveiled – Tesla V100 Detailed

by Adeel Younas

NVIDIA Volta has been announced at GTC 2017, and boy, it's a beast. The next-gen graphics processing unit is the world's first chip to make use of the industry-leading TSMC 12nm FinFET process, so let's cover the full details of this compute powerhouse.

NVIDIA Volta GV100 Unveiled – Tesla V100 With 5120 CUDA Cores, 16 GB HBM2, and 12nm FinFET Process

At last year's GTC, NVIDIA announced the Pascal-based GP100 GPU, which was at the time the fastest graphics chip designed for supercomputers. This year the company is taking the next leap in graphics performance with the Volta-based GV100 GPU, and we are going to take a deep look into this next-gen GPU designed for artificial intelligence and deep learning.



First of all, we need to talk about the workloads this specific chip is designed to handle. The NVIDIA Volta GV100 GPU is designed to power the most computationally intensive HPC, AI, and graphics workloads.

The GV100 GPU packs 21.1 billion transistors into a die size of 815mm². It is fabricated on TSMC's new 12nm FFN high-performance manufacturing process, customized for NVIDIA. This makes it much bigger than the 610mm² Pascal GP100 GPU. NVIDIA Volta GV100 delivers considerably more computing performance and adds many new features compared to its predecessor, the Pascal GP100 GPU, and its architecture family.


The chip itself is a behemoth, featuring a brand-new architecture that is insane in terms of raw specifications. The NVIDIA Volta GV100 GPU is composed of six GPCs (Graphics Processing Clusters) with a total of 84 Volta streaming multiprocessors (SMs) and 42 TPCs. Each of the 84 SMs carries 64 CUDA cores, so we are looking at a total of 5376 CUDA cores on the complete die. All 5376 CUDA cores can be used for both FP32 and INT32 instructions, while there is also a total of 2688 FP64 cores. Aside from these, we are looking at 672 Tensor Cores and 336 texture units.
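Those totals follow directly from the SM counts; here is a quick back-of-envelope sanity check in Python (the 80-SM figure is the cut-down configuration NVIDIA announced for the shipping Tesla V100, not the full 84-SM die):

```python
# Sanity-check the full GV100 die configuration described above.
GPCS = 6
SMS_PER_GPC = 14            # 6 GPCs x 14 SMs = 84 SMs total
CUDA_CORES_PER_SM = 64
TENSOR_CORES_PER_SM = 8

full_die_sms = GPCS * SMS_PER_GPC                      # 84
full_die_cuda = full_die_sms * CUDA_CORES_PER_SM       # 5376
full_die_tensor = full_die_sms * TENSOR_CORES_PER_SM   # 672

# The shipping Tesla V100 uses a cut-down die: 80 of 84 SMs enabled.
tesla_v100_cuda = 80 * CUDA_CORES_PER_SM               # 5120

print(full_die_sms, full_die_cuda, full_die_tensor, tesla_v100_cuda)
```

This is also why the die total (5376 CUDA cores) differs from the 5120 cores listed for the Tesla V100 product in the comparison table below.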

The memory architecture is also updated, with eight 512-bit memory controllers making up a 4096-bit bus interface that supports up to 16GB of HBM2 VRAM. The memory runs at 900 MHz, delivering an increased transfer rate of 900 GB/s compared to 720 GB/s on Pascal GP100. Each memory controller is attached to 768 KB of L2 cache, for a total of 6 MB of L2 cache across the entire chip.
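The peak-bandwidth figure follows from bus width times data rate; a rough sketch (HBM2 is double-data-rate, so a full 900 MHz clock would actually slightly overshoot the quoted 900 GB/s, which implies a marginally lower effective pin rate):

```python
# Peak HBM2 bandwidth = bus width (bytes) x transfers per clock x clock.
bus_width_bits = 8 * 512     # eight 512-bit controllers -> 4096-bit bus
memory_clock_hz = 900e6      # 900 MHz memory clock
data_rate = 2                # DDR: two transfers per clock

peak_gb_s = bus_width_bits / 8 * data_rate * memory_clock_hz / 1e9
print(peak_gb_s)  # ~921.6 GB/s; NVIDIA's quoted 900 GB/s corresponds
                  # to a slightly lower effective rate (~1.75 Gbps/pin)
```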

NVIDIA Tesla Graphics Cards Comparison

| Tesla Graphics Card Name | NVIDIA Tesla M2090 | NVIDIA Tesla K40 | NVIDIA Tesla K80 | NVIDIA Tesla P100 | NVIDIA Tesla V100 |
|---|---|---|---|---|---|
| GPU Process | 40nm | 28nm | 28nm | 16nm | 12nm |
| GPU Name | GF110 | GK110 | GK210 x 2 | GP100 | GV100 |
| Die Size | 520mm² | 561mm² | 561mm² | 610mm² | 815mm² |
| Transistor Count | 3.00 Billion | 7.08 Billion | 7.08 Billion | 15.3 Billion | 21.1 Billion |
| CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 2496 CCs (13 CUs) x 2 | 3840 CCs | 5120 CCs |
| Core Clock | Up To 650 MHz | Up To 875 MHz | Up To 875 MHz | Up To 1480 MHz | Up To 1455 MHz |
| FP32 Compute | 1.33 TFLOPs | 4.29 TFLOPs | 8.74 TFLOPs | 10.6 TFLOPs | 15.0 TFLOPs |
| FP64 Compute | 0.66 TFLOPs | 1.43 TFLOPs | 2.91 TFLOPs | 5.30 TFLOPs | 7.50 TFLOPs |
| VRAM Size | 6 GB | 12 GB | 12 GB x 2 | 16 GB | 16 GB |
| VRAM Bus | 384-bit | 384-bit | 384-bit x 2 | 4096-bit | 4096-bit |
| VRAM Speed | 3.7 GHz | 6 GHz | 5 GHz | 700 MHz | 900 MHz |
| Memory Bandwidth | 177.6 GB/s | 288 GB/s | 240 GB/s | 720 GB/s | 900 GB/s |
| Maximum TDP | 250W | 300W | 235W | 300W | 300W |
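The FP32 figures in the table follow from cores x clock: each CUDA core can retire one fused multiply-add (two FLOPs) per clock. A quick sketch using the V100's boost clock (the older cards' rated TFLOPs correspond to lower base clocks, so this formula overshoots for them):

```python
def fp32_tflops(cuda_cores, clock_ghz):
    # One fused multiply-add (FMA) per core per clock = 2 FLOPs.
    return cuda_cores * 2 * clock_ghz / 1000.0

v100 = fp32_tflops(5120, 1.455)
print(v100)  # ~14.9 TFLOPs, matching the ~15.0 TFLOPs quoted above
```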

Key Features

Key compute features of the NVIDIA Volta GV100 based Tesla V100 include the following:

  • New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning: Volta features a major redesign of the SM processor architecture at the center of the GPU. The new Volta SM is 50% more energy efficient than the previous-generation Pascal design, enabling major boosts in FP32 and FP64 performance in the same power envelope. New Tensor Cores designed specifically for deep learning deliver up to 12x higher peak TFLOPs for training. With independent, parallel integer and floating-point data paths, the Volta SM is also much more efficient on workloads with a mix of computation and addressing calculations. Volta's new independent thread scheduling capability enables finer-grain synchronization and cooperation between parallel threads. Finally, a new combined L1 data cache and shared memory subsystem significantly improves performance while also simplifying programming.
  • Second-Generation NVLink: The second generation of NVIDIA's NVLink high-speed interconnect delivers higher bandwidth, more links, and improved scalability for multi-GPU and multi-GPU/CPU system configurations. GV100 supports up to six NVLink links at 25 GB/s per direction each, for a total of 300 GB/s of bidirectional bandwidth. NVLink now supports CPU mastering and cache coherence capabilities with IBM POWER9 CPU-based servers. The new NVIDIA DGX-1 with V100 AI supercomputer uses NVLink to deliver greater scalability for ultra-fast deep learning training.
  • HBM2 Memory: Faster, Higher Efficiency Volta’s highly tuned 16GB HBM2 memory subsystem delivers 900 GB/sec peak memory bandwidth. The combination of both a new generation HBM2 memory from Samsung, and a new generation memory controller in Volta provides 1.5x delivered memory bandwidth versus Pascal GP100 and greater than 95% memory bandwidth efficiency running many workloads.
  • Volta Multi-Process Service: Volta Multi-Process Service (MPS) is a new feature of the Volta GV100 architecture providing hardware acceleration of critical components of the CUDA MPS server, enabling improved performance, isolation, and better quality of service (QoS) for multiple compute applications sharing the GPU. Volta also triples the maximum number of MPS clients, from 16 on Pascal to 48 on Volta.
  • Enhanced Unified Memory and Address Translation Services: Unified Memory technology in Volta GV100 includes new access counters to allow more accurate migration of memory pages to the processor that accesses them most frequently, improving efficiency for memory ranges shared between processors. On IBM Power platforms, new Address Translation Services (ATS) support allows the GPU to access the CPU's page tables directly.
  • Cooperative Groups and New Cooperative Launch APIs: Cooperative Groups is a new programming model introduced in CUDA 9 for organizing groups of communicating threads. Cooperative Groups allows developers to express the granularity at which threads are communicating, helping them to express richer, more efficient parallel decompositions. Basic Cooperative Groups functionality is supported on all NVIDIA GPUs since Kepler. Pascal and Volta include support for new Cooperative Launch APIs that support synchronization amongst CUDA thread blocks. Volta adds support for new synchronization patterns.
  • Maximum Performance and Maximum Efficiency Modes: In Maximum Performance mode, the Tesla V100 accelerator will operate unconstrained up to its TDP (Thermal Design Power) level of 300W to accelerate applications that require the fastest computational speed and highest data throughput. Maximum Efficiency mode allows data center managers to tune the power usage of their Tesla V100 accelerators for optimal performance per watt. A not-to-exceed power cap can be set across all GPUs in a rack, reducing power consumption dramatically while still obtaining excellent rack performance.
  • Volta Optimized Software: New versions of deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others harness the performance of Volta to deliver dramatically faster training times and higher multi-node training performance. Volta-optimized versions of GPU-accelerated libraries such as cuDNN, cuBLAS, and TensorRT leverage the new features of the Volta GV100 architecture to deliver higher performance for both deep learning and High-Performance Computing (HPC) applications. The NVIDIA CUDA Toolkit version 9.0 includes new APIs and support for Volta features to provide even easier programmability.
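To make the Tensor Core feature concrete: each Tensor Core performs one 4x4 matrix multiply-accumulate per clock, D = A x B + C, with FP16 inputs and FP32 accumulation. Real code would go through cuDNN/cuBLAS or CUDA's warp matrix (WMMA) API; the pure-Python emulation below is illustrative only, showing the mixed-precision arithmetic involved:

```python
import struct

def to_fp16(x):
    # Round a float to FP16 precision, as Tensor Core inputs are stored.
    return struct.unpack('e', struct.pack('e', x))[0]

def tensor_core_mma(a, b, c):
    """Emulate one Tensor Core op: D = A*B + C on 4x4 matrices.
    A and B are treated as FP16; accumulation happens in full precision
    (standing in for the FP32 accumulate of the real hardware)."""
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = c[i][j]                 # FP32 accumulator
            for k in range(4):
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = acc
    return d
```

Each such op is 64 fused multiply-adds (128 FLOPs) per clock, which is where the large deep-learning throughput gains over plain FP32 CUDA cores come from.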


| Flagship GPU | Vega 10 | Navi 10? | NVIDIA GP100 | NVIDIA GV100 |
|---|---|---|---|---|
| GPU Process | FinFET | 7nm FinFET? | TSMC 16nm FinFET | TSMC 12nm FinFET |
| GPU Transistors | 15-18 Billion | TBC | 15.3 Billion | 21.1 Billion |
| Memory (Consumer Cards) | HBM2 | Next-Gen Memory | GDDR5X/HBM2 | GDDR6/HBM2? |
| Memory (Dual-Chip Professional/HPC) | HBM2 | Next-Gen Memory | HBM2 | HBM2 |
| HBM2 Bandwidth | 512 GB/s (Instinct MI25) | >1 TB/s? | 732 GB/s (Peak) | 900 GB/s |
| Graphics Architecture | Next Compute Unit (Vega) | Next Compute Unit (Navi) | 5th Gen Pascal CUDA | 6th Gen Volta CUDA |
| Successor of (GPU) | Radeon RX 500 Series? | Radeon RX 600 Series? | GM200 (Maxwell) | GP100 (Pascal) |



Share your thoughts on the NVIDIA Volta GV100 12nm FinFET GPU in the comments section below.

