NVIDIA Volta GV100 12nm FinFET GPU Unveiled – Tesla V100 Detailed

NVIDIA Volta has been announced at GTC 2017 and boy, it's a beast. The next-gen graphics processing unit is the world's first chip built on TSMC's industry-leading 12nm FinFET process, so let's cover all the details of this compute powerhouse.

NVIDIA Volta GV100 Unveiled – Tesla V100 With 5120 CUDA Cores, 16 GB HBM2, and 12nm FinFET Process

At last year's GTC, NVIDIA announced the Pascal-based GP100 GPU, then the fastest graphics chip designed for supercomputers. This year the company is taking the next leap in graphics performance with its Volta-based GV100 GPU, and we are going to take a deep look at this next-gen GPU designed for artificial intelligence and deep learning.

First of all, we need to talk about the workloads this specific chip is designed to handle. The NVIDIA Volta GV100 GPU is designed to power the most computationally intensive HPC, AI, and graphics workloads.

The GV100 GPU packs 21.1 billion transistors into a die size of 815mm2. It is fabricated on TSMC's new 12nm FFN high-performance manufacturing process, customized for NVIDIA. This makes it much bigger than the 610mm2 Pascal GP100 GPU. NVIDIA Volta GV100 delivers considerably more computing performance and adds many new features compared to its predecessor, the Pascal GP100 GPU and its architecture family.

The chip itself is a behemoth, featuring a brand new architecture with insane raw specifications. The NVIDIA Volta GV100 GPU is composed of six GPCs (Graphics Processing Clusters). It has a total of 84 Volta streaming multiprocessors (SMs) and 42 TPCs. Each of the 84 SMs comes with 64 CUDA cores, so we are looking at a total of 5376 CUDA cores on the complete die. All 5376 CUDA cores can be used for FP32 and INT32 instructions, while there are also a total of 2688 FP64 cores. Aside from these, we are looking at 672 Tensor Cores and 336 texture units.
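The unit counts fall straight out of the chip's hierarchy. A quick back-of-the-envelope check, assuming the per-SM figures quoted above:

```python
# Full GV100 die hierarchy (figures from the article)
GPCS = 6
TPCS_PER_GPC = 7    # 42 TPCs total
SMS_PER_TPC = 2     # two SMs per TPC
FP32_PER_SM = 64    # also usable as INT32 cores
FP64_PER_SM = 32
TENSOR_PER_SM = 8
TEX_PER_SM = 4

sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
print(sms)                   # 84 SMs
print(sms * FP32_PER_SM)     # 5376 FP32/INT32 CUDA cores
print(sms * FP64_PER_SM)     # 2688 FP64 cores
print(sms * TENSOR_PER_SM)   # 672 Tensor Cores
print(sms * TEX_PER_SM)      # 336 texture units
```

Note these are full-die numbers; the shipping Tesla V100 enables 80 of the 84 SMs, which is where its 5120-CUDA-core figure comes from.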

The memory architecture has also been updated with eight 512-bit memory controllers, adding up to a 4096-bit bus interface that supports up to 16GB of HBM2 VRAM. The memory runs at 900 MHz, which delivers an increased transfer rate of 900 GB/s compared to 720 GB/s on Pascal GP100. Each memory controller is attached to 768 KB of L2 cache, for a total of 6 MB of L2 cache across the entire chip.
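The bandwidth figure follows from bus width times per-pin data rate. A minimal sketch, assuming an effective HBM2 data rate of roughly 1.75 Gb/s per pin (the exact rate is not stated in the article):

```python
BUS_WIDTH_BITS = 4096   # eight 512-bit HBM2 memory controllers
PIN_RATE_GBPS = 1.75    # assumed effective data rate per pin

# bits per second across the whole bus, converted to bytes
bandwidth_gbs = BUS_WIDTH_BITS * PIN_RATE_GBPS / 8
print(bandwidth_gbs)    # 896.0 GB/s, i.e. roughly the quoted 900 GB/s
```

The same arithmetic with Pascal GP100's slower HBM2 (about 1.4 Gb/s per pin) lands near its 720 GB/s figure, which is why the wider bus alone is not the whole story: the pin rate matters just as much.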

NVIDIA Tesla Graphics Cards Comparison

| Tesla Graphics Card Name | NVIDIA Tesla M2090 | NVIDIA Tesla K40 | NVIDIA Tesla K80 | NVIDIA Tesla P100 | NVIDIA Tesla V100 |
|---|---|---|---|---|---|
| GPU Process | 40nm | 28nm | 28nm | 16nm | 12nm |
| GPU Name | GF110 | GK110 | GK210 x 2 | GP100 | GV100 |
| Die Size | 520mm2 | 561mm2 | 561mm2 | 610mm2 | 815mm2 |
| Transistor Count | 3.00 Billion | 7.08 Billion | 7.08 Billion | 15.3 Billion | 21.1 Billion |
| CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 2496 CCs (13 CUs) x 2 | 3840 CCs | 5120 CCs |
| Core Clock | Up To 650 MHz | Up To 875 MHz | Up To 875 MHz | Up To 1480 MHz | Up To 1455 MHz |
| FP32 Compute | 1.33 TFLOPs | 4.29 TFLOPs | 8.74 TFLOPs | 10.6 TFLOPs | 15.0 TFLOPs |
| FP64 Compute | 0.66 TFLOPs | 1.43 TFLOPs | 2.91 TFLOPs | 5.30 TFLOPs | 7.50 TFLOPs |
| VRAM Size | 6 GB | 12 GB | 12 GB x 2 | 16 GB | 16 GB |
| VRAM Bus | 384-bit | 384-bit | 384-bit x 2 | 4096-bit | 4096-bit |
| VRAM Speed | 3.7 GHz | 6 GHz | 5 GHz | 700 MHz | 900 MHz |
| Memory Bandwidth | 177.6 GB/s | 288 GB/s | 240 GB/s | 720 GB/s | 900 GB/s |
| Maximum TDP | 250W | 300W | 235W | 300W | 300W |

Key Features

Key compute features of the NVIDIA Volta GV100 based Tesla V100 include the following:

  • New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning Volta features a major new redesign of the SM processor architecture that is at the center of the GPU. The new Volta SM is 50% more energy efficient than the previous generation Pascal design, enabling major boosts in FP32 and FP64 performance in the same power envelope. New Tensor Cores designed specifically for deep learning deliver up to 12x higher peak TFLOPs for training. With independent, parallel integer and floating point data paths, the Volta SM is also much more efficient on workloads with a mix of computation and addressing calculations. Volta’s new independent thread scheduling capability enables finer-grain synchronization and cooperation between parallel threads. Finally, a new combined L1 Data Cache and Shared Memory subsystem significantly improves performance while also simplifying programming.
  • Second-Generation NVLink The second generation of NVIDIA’s NVLink high-speed interconnect delivers higher bandwidth, more links, and improved scalability for multi-GPU and multi-GPU/CPU system configurations. GV100 supports up to 6 NVLink links at 25 GB/s for a total of 300 GB/s. NVLink now supports CPU mastering and cache coherence capabilities with IBM Power 9 CPU-based servers. The new NVIDIA DGX-1 with V100 AI supercomputer uses NVLink to deliver greater scalability for ultra-fast deep learning training.
  • HBM2 Memory: Faster, Higher Efficiency Volta’s highly tuned 16GB HBM2 memory subsystem delivers 900 GB/sec peak memory bandwidth. The combination of both a new generation HBM2 memory from Samsung, and a new generation memory controller in Volta provides 1.5x delivered memory bandwidth versus Pascal GP100 and greater than 95% memory bandwidth efficiency running many workloads.
  • Volta Multi-Process Service Volta Multi-Process Service (MPS) is a new feature of the Volta GV100 architecture providing hardware acceleration of critical components of the CUDA MPS server, enabling improved performance, isolation, and better quality of service (QoS) for multiple compute applications sharing the GPU. Volta MPS also triples the maximum number of MPS clients from 16 on Pascal to 48 on Volta.
  • Enhanced Unified Memory and Address Translation Services GV100 Unified Memory technology in Volta GV100 includes new access counters to allow more accurate migration of memory pages to the processor that accesses the pages most frequently, improving efficiency for accessing memory ranges shared between processors. On IBM Power platforms, new Address Translation Services (ATS) support allows the GPU to access the CPU’s page tables directly.
  • Cooperative Groups and New Cooperative Launch APIs Cooperative Groups is a new programming model introduced in CUDA 9 for organizing groups of communicating threads. Cooperative Groups allows developers to express the granularity at which threads are communicating, helping them to express richer, more efficient parallel decompositions. Basic Cooperative Groups functionality is supported on all NVIDIA GPUs since Kepler. Pascal and Volta include support for new Cooperative Launch APIs that support synchronization amongst CUDA thread blocks. Volta adds support for new synchronization patterns.
  • Maximum Performance and Maximum Efficiency Modes In Maximum Performance mode, the Tesla V100 accelerator will operate unconstrained up to its TDP (Thermal Design Power) level of 300W to accelerate applications that require the fastest computational speed and highest data throughput. Maximum Efficiency Mode allows data center managers to tune power usage of their Tesla V100 accelerators to operate with optimal performance per watt. A not-to-exceed power cap can be set across all GPUs in a rack, reducing power consumption dramatically, while still obtaining excellent rack performance.
  • Volta Optimized Software New versions of deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others harness the performance of Volta to deliver dramatically faster training times and higher multi-node training performance. Volta-optimized versions of GPU-accelerated libraries such as cuDNN, cuBLAS, and TensorRT leverage the new features of the Volta GV100 architecture to deliver higher performance for both deep learning and High-Performance Computing (HPC) applications. The NVIDIA CUDA Toolkit version 9.0 includes new APIs and support for Volta features to provide even easier programmability.
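The headline throughput figures above can be sanity-checked with the standard peak-FLOPs formula: cores x 2 FLOPs per clock (a fused multiply-add counts as two operations) x boost clock. A quick sketch using the Tesla V100 numbers from the comparison table:

```python
CUDA_CORES = 5120          # Tesla V100 (80 of 84 SMs enabled)
BOOST_CLOCK_GHZ = 1.455
FLOPS_PER_CORE_PER_CLOCK = 2   # one FMA = two floating point operations

fp32_tflops = CUDA_CORES * FLOPS_PER_CORE_PER_CLOCK * BOOST_CLOCK_GHZ / 1000
print(round(fp32_tflops, 1))   # ~14.9, matching the quoted 15.0 TFLOPs

# FP64 runs at half the FP32 rate (one FP64 core per two FP32 cores)
fp64_tflops = fp32_tflops / 2
print(round(fp64_tflops, 2))   # ~7.45, matching the quoted 7.50 TFLOPs
```

The small gap against the quoted 15.0/7.50 TFLOPs comes from rounding in the published boost clock; the structure of the calculation is the point here, not the last decimal.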


| GPU Family | AMD Vega | AMD Navi | NVIDIA Pascal | NVIDIA Volta |
|---|---|---|---|---|
| Flagship GPU | Vega 10 | Navi 10? | NVIDIA GP100 | NVIDIA GV100 |
| GPU Process | 14nm FinFET | 7nm FinFET? | TSMC 16nm FinFET | TSMC 12nm FinFET |
| GPU Transistors | 15-18 Billion | TBC | 15.3 Billion | 21.1 Billion |
| Memory (Consumer Cards) | HBM2 | Next-Gen Memory | GDDR5X/HBM2 | GDDR6/HBM2? |
| Memory (Dual-Chip Professional/HPC) | HBM2 | Next-Gen Memory | HBM2 | HBM2 |
| HBM2 Bandwidth | 512 GB/s (Instinct MI25) | >1 TB/s? | 732 GB/s (Peak) | 900 GB/s |
| Graphics Architecture | Next Compute Unit (Vega) | Next Compute Unit (Navi) | 5th Gen Pascal CUDA | 6th Gen Volta CUDA |
| Successor of (GPU) | Radeon RX 500 Series? | Radeon RX 600 Series? | GM200 (Maxwell) | GP100 (Pascal) |
| Launch | 2017 | 2018 | 2016 | 2017 |



Share your thoughts on the NVIDIA Volta GV100 12nm FinFET GPU in the comments section below.

Adeel Younas (https://techwafer.com)
Editor-in-Chief at TechWafer (previously The Tech); an entrepreneur, blogger, developer, and freelancer.
