MI100 architecture. The AMD Instinct MI100, announced November 16, 2020, is the first compute GPU built on AMD's CDNA architecture, targeting HPC, AI, and machine-learning applications for sectors such as oil and gas and academia. Each Compute Unit (CU) contains four Matrix Core units and four 16-wide SIMD units, for a total of 64 shader cores per CU.

The accelerator features 32 GB of high-bandwidth HBM2 memory at a clock rate of 1.2 GHz, arranged as four stacks behind a 4,096-bit memory interface, and delivers up to 11.5 TFLOPS of peak FP64 performance for HPC and up to 46.1 TFLOPS of peak FP32 Matrix performance for AI and machine-learning workloads. A dual-slot card, the MI100 draws power from two 8-pin connectors, with a 300 W rating. A PCIe Gen 4 x16 link (16 GT/s) connects the GPU to (one of) the host processors, as shown in the structure diagram of the MI100-generation Instinct accelerator. The underlying GPU is code-named Arcturus.

AMD made a major design shift in GPU architecture starting with the Instinct MI100 products, with a new focus on compute-intensive use cases such as HPC and AI/ML training. The later MI200 accelerator, with up to a 4x advantage in HPC performance over competitive GPUs, became the first data-center GPU to deliver 383 teraflops of theoretical mixed-precision peak performance.
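The per-CU arithmetic works out as follows. This is a quick check combining the figures above with the 120-CU total mentioned later in this material:

```python
# Per-CU resources of the MI100 (CDNA 1), as described above.
simds_per_cu = 4        # 16-wide SIMD units per Compute Unit
lanes_per_simd = 16     # work-items processed per SIMD per cycle
shader_cores_per_cu = simds_per_cu * lanes_per_simd

cus = 120               # Compute Units enabled on the MI100
stream_processors = cus * shader_cores_per_cu

print(shader_cores_per_cu)  # 64
print(stream_processors)    # 7680
```

The 7,680 total matches the stream-processor count quoted later for the MI100.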
For GPU compute applications, OpenCL version 2.1 can be used, and the GPU driver shipped with ROCm should be installed. Testing conducted by AMD performance labs as of October 30, 2020, covered three platforms with software versions typical for the launch dates of the Radeon Instinct MI25 (2018), MI50 (2019), and AMD Instinct MI100 (2020), running TensorFlow ResNet-50 at FP16 with batch size 128. The MI100 is essentially a ground-up design focused on high performance and high memory bandwidth for HPC and, to a lesser extent, for AI.

AMD's container hub offers a range of images supporting Radeon Instinct MI50, AMD Instinct MI100, and AMD Instinct MI200 accelerators, including applications such as Chroma, CP2K, LAMMPS, NAMD, and OpenMM, along with the popular ML frameworks TensorFlow and PyTorch; new containers are continually being added.

For maximum MI100 GPU performance on systems with AMD EPYC 7002 series processors (code name "Rome") and AMI system BIOS, a validated configuration of system BIOS settings is available. These settings must be used for the qualification process and should be set as default values in the system BIOS; analogous settings apply to other platforms.
The 32 GB of HBM2 memory is clocked at an effective 2.4 Gb/s per pin; together with the 4,096-bit memory interface this creates a bandwidth of 1,229 GB/s. Based on the 2nd Gen AMD CDNA architecture, the follow-on AMD Instinct MI200 accelerators deliver a further leap in HPC and AI performance over competitive data-center GPUs.

The CDNA architecture employs the terms Compute Unit (CU), CU core, and Matrix Core, as opposed to SM, CUDA core, and tensor core for NVIDIA GPUs. CDNA is supported by AMD ROCm, an open software stack that includes a broad set of programming models, tools, compilers, libraries, and runtimes for AI and HPC solution development targeting AMD Instinct accelerators.

Built on the CDNA architecture, the AMD Instinct MI100 GPU enables a new class of accelerated systems for HPC and AI when paired with 2nd Gen AMD EPYC processors. Among the key highlights is the all-new CDNA architecture itself, engineered to power AMD GPUs for the exascale era and at the heart of the MI100. Pre-launch benchmarks from July 2020 suggested the MI100 would deliver around 13% better FP32 performance than the NVIDIA Ampere A100 and over twice the performance of Volta V100 GPUs.
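The bandwidth figure can be reproduced from the bus width and data rate. This is a quick check; HBM2's double-data-rate signaling (two transfers per memory clock) is assumed:

```python
# Peak HBM2 bandwidth of the MI100 from bus width and per-pin data rate.
bus_width_bits = 4096      # four HBM2 stacks, 1,024 bits each
memory_clock_ghz = 1.2     # memory clock from the text
transfers_per_clock = 2    # HBM2 is double data rate
data_rate_gtps = memory_clock_ghz * transfers_per_clock  # 2.4 GT/s per pin

bandwidth_gbs = (bus_width_bits / 8) * data_rate_gtps
print(bandwidth_gbs)  # 1228.8, i.e. the ~1,229 GB/s quoted above
```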
Review hardware aspects of the AMD Instinct MI300 series of GPU accelerators and the CDNA 3 architecture: a 4 MB L2 cache sits on each XCD and serves all of that die's CUs, and the multi-chip design enables dense compute and high-bandwidth memory integration. The MI300 series accelerators are built for extreme scalability and compute performance, running on everything from individual servers to the largest clusters.

Powered by the AMD CDNA architecture, the MI100 accelerators deliver a giant leap in compute and interconnect performance over the prior Radeon Instinct generation. Dell has documented performance AI workloads using PowerEdge servers and AMD Instinct MI100 GPUs. While tuning guides are a good starting point, developers are encouraged to perform their own performance testing.

The MI100 GPU is the first to incorporate AMD's Compute DNA (CDNA) architecture, with 120 CUs organized into four arrays, an organization broadly similar to NVIDIA's Volta/Ampere designs.
Arcturus uses the CDNA 1.0 architecture and is made on a 7 nm production process at TSMC. The microarchitecture of the AMD Instinct accelerators is based on the AMD CDNA architecture, which targets compute applications such as high-performance computing (HPC) and AI & machine learning (ML) that run on everything from individual servers to the world's largest exascale supercomputers.

AMD Instinct is AMD's brand of data-center GPUs. Compared to the Radeon brand of mainstream consumer/gamer products, the Instinct line is intended to accelerate deep-learning, artificial-neural-network, and high-performance-computing/GPGPU applications. CDNA (Compute DNA) is a compute-centered GPU microarchitecture designed by AMD for data centers; it is one successor to the Graphics Core Next (GCN) microarchitecture, the other being the consumer-focused RDNA (Radeon DNA). An evolution of GCN, CDNA includes new matrix core engines that boost computational throughput for different numerical formats, which helps reduce data-movement overhead while enhancing power efficiency.

Powered by CDNA, the AMD Instinct MI100 was at launch the world's fastest HPC GPU and the first to break the 10 TFLOPS FP64 barrier. Compared to the last generation of AMD accelerators, it offers HPC applications almost 3.5x faster performance (FP32 matrix) and AI applications a nearly 7x boost in throughput (FP16). The MI200 generation delivers more than double the FP32 throughput of the MI100 and more than four times the FP64 throughput, with the compute architecture of each MI200 GCD taking an iterative, evolutionary approach over the MI100. Further out, the MI300X has eight XCDs, giving it 304 total Compute Units.
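The MI100's vector rates quoted in this material (23.1 TFLOPS FP32, 11.5 TFLOPS FP64, the first GPU past 10 TFLOPS FP64) follow from the standard peak-FLOPS formula, using the 120 CUs, 64 FP32 lanes per CU, and 1,502 MHz boost clock cited elsewhere in the text:

```python
cus = 120
fp32_lanes_per_cu = 64
flops_per_fma = 2          # one fused multiply-add = 2 FLOPs
boost_clock_ghz = 1.502

fp32_tflops = cus * fp32_lanes_per_cu * flops_per_fma * boost_clock_ghz / 1000
fp64_tflops = fp32_tflops / 2   # CDNA 1 FP64 vector rate is half of FP32

print(round(fp32_tflops, 1))  # 23.1
print(round(fp64_tflops, 1))  # 11.5
```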
The MI100 delivers 23.1 teraflops in single-precision (FP32) operations. The GPU operates at a base frequency of 1,000 MHz, boosting up to 1,502 MHz, with memory running at 1,200 MHz. In training-throughput measurements, AMD MI100 throughput at FP32 precision is comparable to or slightly higher than (1.1x/1.2x for one/two GPUs) that of V100 GPUs for a UNet test.

The AMD CDNA 2 architecture incorporates 112 physical compute units per GCD, divided into four arrays; the initial products enable 104 of these (AMD Instinct MI250 and MI210) or 110 (AMD Instinct MI250X). In the CDNA 3 generation, every XCD physically has 40 Compute Units, 38 of which are enabled per XCD on the MI300X. AMD Instinct MI300 series accelerators offer Matrix Core technologies and support for a broad range of precisions, from the highly efficient INT8 and FP8 (including sparsity support for AI) to the most demanding FP64 for HPC.

The Instinct MI100 accelerator, announced as the SC20 supercomputing conference was in full swing, was the first step in revealing where at least some of the HPC and AI market would be heading in the coming years. AMD started Radeon Instinct MI100 sales on November 16, 2020.
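The CU counts quoted above multiply out as follows. The two-GCD-per-MI250X-package layout is an assumption implied by the 220-CU total mentioned later:

```python
# CDNA 2: 110 CUs enabled per GCD, two GCDs per MI250X package (assumed).
print(2 * 110)   # 220 CUs on the MI250X

# CDNA 3: eight XCDs on the MI300X, 40 CUs fabricated and 38 enabled per XCD.
print(8 * 40)    # 320 CUs fabricated
print(8 * 38)    # 304 CUs enabled
```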
The peak memory bandwidth of the attached HBM2 is 1.228 TB/s at a memory clock frequency of 1.2 GHz. Since the introduction of AMD's CDNA architecture, generalized matrix multiplication (GEMM) computations are hardware-accelerated through Matrix Core processing units. The AMD Instinct MI200 accelerator family took initial steps toward advanced packaging.

The MI100 die is split into four compute engines, each fed by an Asynchronous Compute Engine; each compute engine is divided into two shader engines that hold the Compute Units. If a GPU is not listed in AMD's support table, that GPU is not officially supported by AMD.

The microarchitecture of the AMD Instinct MI250 accelerators is based on the AMD CDNA 2 architecture, which targets compute applications such as HPC, artificial intelligence (AI), and machine learning (ML) running on everything from individual servers to the world's largest exascale supercomputers.

At the node level, a typical system comprises two AMD EPYC processors and up to eight AMD Instinct accelerators. Supported by accelerated compute platforms from Dell, HPE, Gigabyte, and Supermicro, the MI100, combined with AMD EPYC CPUs and ROCm 4.0 software, is designed to propel new discoveries ahead of the exascale era.
Fabricated on a 7 nm process, the MI100 features 7,680 stream processors. Calculations conducted by AMD Performance Labs as of September 18, 2020, for the AMD Instinct MI100 (32 GB HBM2) accelerator (PCIe), designed with AMD CDNA architecture 7 nm FinFET process technology at a 1,502 MHz peak clock, resulted in 32 GB of HBM2 memory capacity, 1.2288 TB/s of peak theoretical memory bandwidth, and 46.1 TFLOPS of peak FP32 Matrix performance for AI and machine-learning workloads.

What really matters is the bang for the buck of these devices, so analysts have compared NVIDIA A100 street prices against estimates for AMD MI200 pricing. In December 2022, Samsung built a claimed first-of-its-kind supercomputer containing AMD data-center GPUs affixed with its processing-in-memory chips, which the company said can significantly improve the performance and energy efficiency of training large AI models.
The execution units of the GPU are its Compute Units (CUs). With 46.1 TFLOPS of peak FP32 Matrix performance for AI and machine learning, the company billed the Instinct MI100 as the world's fastest HPC accelerator for scientific research: a GPU aimed at speeding up AI software and math-heavy workloads for supercomputers and high-end servers. The need for HPC and AI/ML was shifting rapidly, and AMD was the first GPU vendor to address this trend with a dedicated compute architecture.

An example deployment is a user-access node containing one AMD "MI100" Instinct GPU, one HPE Slingshot-10 interconnect port, two 10 GbE Ethernet NICs for user access, and two 480 GB SSDs; the processors are the same as on the compute nodes, but the internal node architecture is different.

AMD's MI100 launch benchmark platform (2020) was a Gigabyte G482-Z51-00 system comprising dual-socket AMD EPYC 7702 64-core processors and an AMD Instinct MI100 GPU, with the ROCm 3.10 driver, 512 GB of DDR4, and RHEL 8.
To meet the AMD HPC and AI energy-efficiency goals requires the next level of thinking around architecture, memory, and interconnects. This chapter briefly reviews hardware aspects of the AMD Instinct MI100 accelerators and the CDNA architecture that is the foundation of these GPUs; the accompanying tuning guidance is based on the AMD EPYC 7003-series processor family (former code name "Milan").

AMD's launch material suggests the MI100 reaches roughly 185 TFLOPS of peak FP16 matrix throughput, alongside up to 11.5 TFLOPS (FP64) and 23.1 TFLOPS (FP32) for vector operations. In an August 2022 presentation, AMD attributed the roughly 2x improvement from MI100 to MI250X to three groups of contributors:
- Design frequency increase: leveraged CPU expertise; streamlined micro-architecture and design.
- Power optimizations: tuned for low-voltage operation; minimized clock and data-movement power.
- Architecture innovations: efficient matrix data paths; extensive operand reuse.

In the professional GPU market, NVIDIA had been the undisputed leader since the release of the A100 SXM4 model in the spring of 2020 with 40 GB of HBM2e memory. The AMD Instinct brand replaced AMD's FirePro S brand in 2016, and the MI100 is the first GPU accelerator based on the company's CDNA architecture. The later MI300 APU uses state-of-the-art die stacking and chiplet technology.
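The roughly 185 TFLOPS FP16 matrix figure is consistent with the standard peak-rate formula, assuming (as commonly cited for CDNA 1, not stated in this document) 1,024 FP16 matrix FLOPs per CU per clock at the 1,502 MHz boost clock:

```python
cus = 120
fp16_matrix_flops_per_cu_clk = 1024  # assumed MFMA rate for CDNA 1
boost_clock_ghz = 1.502

fp16_matrix_tflops = cus * fp16_matrix_flops_per_cu_clk * boost_clock_ghz / 1000
print(round(fp16_matrix_tflops, 1))  # 184.6 -- the "about 185 TFLOPS" figure
```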
With 64 GB of memory and the CDNA 2 architecture, the MI210 offers a formidable combination of performance and capabilities compared to AMD's own Instinct MI100 or NVIDIA's competing parts. The MI100's HBM2 runs at 1.2 GHz and delivers an ultra-high 1.23 TB/s of memory bandwidth to support large data sets. Under ROCm, the MI100 reports itself by its LLVM target name, gfx908 (for example, gfx908:sramecc+:xnack-).

Reference tables provide an overview of the hardware specifications for AMD Instinct accelerators and AMD Radeon PRO and Radeon GPUs; for more information about the terms used, see the specific documents and guides, or the HIP programming-model documentation. The AMD CDNA 2 architecture adds 2nd Gen Matrix Cores accelerating FP64 and FP32 matrix operations.
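Target IDs such as gfx908:sramecc+:xnack- encode the base architecture plus per-feature on/off flags. A small illustrative parser follows; the function name is mine, not a ROCm API:

```python
def parse_target_id(target_id: str):
    """Split an LLVM target ID such as 'gfx908:sramecc+:xnack-'
    into the base architecture and its on/off feature flags."""
    parts = target_id.split(":")
    arch = parts[0]
    features = {}
    for feat in parts[1:]:
        name, state = feat[:-1], feat[-1]   # trailing '+' or '-'
        features[name] = (state == "+")
    return arch, features

arch, features = parse_target_id("gfx908:sramecc+:xnack-")
print(arch)      # gfx908
print(features)  # {'sramecc': True, 'xnack': False}
```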
The AMD Instinct MI250X, at the heart of the first exascale system, was enabled by the AMD CDNA 2 architecture and advanced packaging, as well as AMD Infinity Fabric connecting the Instinct GPUs and an optimized AMD EPYC CPU with cache coherence.

In July 2021, the code-named Instinct MI200 "Aldebaran" compute GPU was expected to more than double the performance of the Instinct MI100 (184.6 FP16 TFLOPS, 23.1 FP32 TFLOPS, 11.5 FP64 TFLOPS). There is $100 million in non-recurring engineering funds in the Frontier system alone to try to close some of the ROCm-CUDA software gap. Early analysis of the MI100 launch figures also noted how far the matrix engines raise throughput above the vector rates: matrix FP32 is around 46 TFLOPS where vector FP32 is about 23 TFLOPS, and FP16 matrix throughput is four times the packed-FP16 vector rate.

This document is intended for organizations interested in simplifying and accelerating deep-learning solutions with advanced computing and scale-out data-management solutions.
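The "more than four times the FP64 throughput" claim for the MI200 generation can be sanity-checked with the same peak-rate formula. The full-rate FP64 figure (128 FLOPs per CU per clock) and the 1.7 GHz clock below are assumptions based on commonly cited CDNA 2 numbers, not values from this document:

```python
mi250x_cus = 220
fp64_flops_per_cu_clk = 128   # CDNA 2 full-rate FP64 vector FMA (assumed)
boost_clock_ghz = 1.7         # commonly cited MI250X peak engine clock (assumed)

mi250x_fp64_tflops = mi250x_cus * fp64_flops_per_cu_clk * boost_clock_ghz / 1000
print(round(mi250x_fp64_tflops, 1))         # 47.9
print(round(mi250x_fp64_tflops / 11.5, 1))  # 4.2, vs. the MI100's 11.5 TFLOPS
```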
Like the NVIDIA A100, the AMD MI100 utilizes fast HBM2 memory. AMD claims that the Instinct MI100, using the company's first-generation CDNA architecture, offers unmatched FP64 and FP32 performance for HPC workloads, coming in at 11.5 and 23.1 TFLOPS respectively; the AMD Radeon Instinct MI100 block diagram shows the 32 GB of HBM2 connected over a 4,096-bit memory interface. For scale, the later MI300X's 304 Compute Units are a large increase over the MI250X's 220 CUs.

The AMD Instinct MI100 accelerator has been designed in lock-step with AMD's award-winning 2nd Gen AMD EPYC processors, built on the AMD Infinity Architecture, to deliver true heterogeneous compute capabilities. CDNA is used mostly in the AMD Instinct line of data-center graphics cards; combining these innovations with partners' system offerings and the open and portable AMD ROCm software stack completes the platform.
Matrix multiplication is a fundamental aspect of linear algebra and a ubiquitous computation within high-performance computing (HPC) applications. With a die size of 750 mm² and a transistor count of 25,600 million, the MI100 is a very big chip. Fundamentally, the MI200 uses an updated and enhanced version of the GPU that powered the MI100; AMD calls the architecture CDNA 2, mirroring the RDNA 2 versus RDNA shift on the consumer side.
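Since GEMM performance claims recur throughout this material, a tiny reference implementation makes the 2*M*N*K FLOP count concrete. This is a pure-Python sketch for illustration only; real workloads would use the hardware-accelerated Matrix Cores via a BLAS library or ML framework:

```python
def matmul(A, B):
    """Naive GEMM C = A @ B for list-of-lists matrices,
    counting the 2*M*N*K floating-point operations performed."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    flops = 0
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i][k] * B[k][j]   # one multiply + one add
                flops += 2
            C[i][j] = acc
    return C, flops

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C, flops = matmul(A, B)
print(C)      # [[19.0, 22.0], [43.0, 50.0]]
print(flops)  # 16 == 2 * M * N * K with M = N = K = 2
```

Dividing such a FLOP count by a peak rate (e.g., the MI100's 46.1 TFLOPS FP32 Matrix) gives a lower bound on runtime, which is how the throughput figures in this material translate to application performance.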