ARM launches the ARMv9 architecture with SVE2 and new security features
Earlier today, as part of its Vision Day event, ARM disclosed details about its new ARMv9 architecture, which the company expects to power more than 300 billion chips this decade.
The last major revision to ARM’s ISA was ARMv8, which was introduced in October 2011 along with the 64-bit AArch64 instruction set. ARM has since extended ARMv8 with new features, such as Memory Tagging in ARMv8.5. ARMv9 keeps AArch64 as its baseline instruction set but extends it with new features aimed at improving security and performance.
According to ARM, here are the major new features of the ARMv9-A architecture:
- SVE2 (Scalable Vector Extension 2): extending the benefits of scalable vectors to many more use cases
- Realm Management Extension (RME): extending Confidential Compute on ARM platforms to all developers
- Branch Record Buffer Extension (BRBE): providing profiling information for uses such as AutoFDO (automatic feedback-directed optimization)
- Embedded Trace Extension (ETE) and Trace Buffer Extension (TRBE): enhanced trace capabilities for ARMv9
- Transactional Memory Extension (TME): hardware transactional memory support for the ARM architecture
For a deeper dive into the high-level changes coming with ARMv9, I recommend reading Andrei Frumusanu’s reporting over at AnandTech. Here’s a summary of the key changes you should be aware of.
NEON succeeded by SVE2
NEON is ARM’s Advanced SIMD (single instruction, multiple data) architecture extension. With SIMD, a single instruction operates on multiple data items in parallel. Those items are packed into registers that hold vectors of bits, and in NEON’s case each register is 128 bits wide.
The Scalable Vector Extension (SVE) is an optional extension for ARMv8.2 and later. It expands AArch64’s vector processing capabilities to meet the demands of high-performance computing (HPC) and machine learning. Importantly, it allows vector registers anywhere from 128 to 2048 bits long, with the actual length chosen by each CPU implementation. For software developers, the benefit of a variable vector length is that code only needs to be compiled once to take full advantage of future CPUs with longer vector registers. The same code can also run on CPUs with shorter vector registers, such as low-power designs for IoT devices.
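To make the "compile once, run on any vector length" idea concrete, here is a minimal Python sketch of how a vector-length-agnostic loop is structured. It is a conceptual model under stated assumptions, not real SVE code. The function name `vla_add` and its `vl_bits` parameter are hypothetical stand-ins for the hardware vector length. The loop never hard-codes a lane count: each iteration processes as many 32-bit elements as the "CPU" offers, and a per-lane predicate masks off out-of-range elements at the end of the array.

```python
def vla_add(a, b, vl_bits):
    """Element-wise add, strip-mined like a vector-length-agnostic SVE loop.

    vl_bits is a hypothetical stand-in for the hardware vector length.
    Returns the sums and the number of loop iterations that were needed.
    """
    # This sketch only checks the 128-2048-bit range and 128-bit granularity.
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("vector length must be 128-2048 bits in steps of 128")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    lanes = vl_bits // 32  # number of 32-bit elements per vector register
    out = [0.0] * len(a)
    i = 0
    iterations = 0
    while i < len(a):
        # Predicate: lane k is active only while i + k is in bounds, similar in
        # spirit to SVE's WHILELT, so no separate scalar tail loop is needed.
        active = [i + k < len(a) for k in range(lanes)]
        for k, on in enumerate(active):
            if on:
                out[i + k] = a[i + k] + b[i + k]
        i += lanes  # advance by however many lanes this "CPU" provides
        iterations += 1
    return out, iterations


if __name__ == "__main__":
    data = [float(x) for x in range(10)]
    print(vla_add(data, data, 128)[1])  # 3 iterations with 128-bit vectors
    print(vla_add(data, data, 512)[1])  # 1 iteration with 512-bit vectors
```

The same unchanged function produces identical results on a 128-bit and a 512-bit "machine". Only the number of iterations changes, which is the property SVE is designed to give compiled binaries.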
Because SVE targeted HPC workloads and was not as versatile an instruction set as NEON, ARM introduced SVE2 in early 2019 to address these shortcomings. SVE2 added instructions for workloads that still relied on NEON, such as DSP. With ARMv9, SVE2 now succeeds NEON as a baseline feature of ARMv9 CPUs.
Machine learning improvements
ARM expects machine learning workloads to become increasingly common over the next decade, which is why recent revisions of ARMv8 added new matrix multiplication instructions. These instructions will be baseline features of ARMv9 CPUs, allowing smaller ML workloads to run directly on the CPU rather than on dedicated accelerators. Dedicated accelerators remain the better choice when performance or power efficiency is the priority, but not every device has one.
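For a sense of what such an instruction computes, here is a small Python model of an 8-bit integer matrix multiply-accumulate. It is loosely based on ARMv8.6’s SMMLA instruction, which, as we understand it, multiplies a 2x8 matrix of signed 8-bit integers by an 8x2 matrix and adds the 2x2 product to 32-bit accumulators. The function name `int8_mmla` and its nested-list layout are illustrative assumptions. The sketch does not model how operands are packed into vector registers.

```python
def int8_mmla(acc, a, b):
    """Multiply a 2x8 int8 matrix by an 8x2 int8 matrix and add to a 2x2 accumulator.

    acc: 2x2 nested list of 32-bit accumulators
    a:   2x8 nested list of signed 8-bit values
    b:   8x2 nested list of signed 8-bit values
    Returns a new 2x2 list; the inputs are not modified.
    """
    if len(a) != 2 or any(len(r) != 8 for r in a):
        raise ValueError("a must be 2x8")
    if len(b) != 8 or any(len(r) != 2 for r in b):
        raise ValueError("b must be 8x2")
    if any(not -128 <= v <= 127 for m in (a, b) for r in m for v in r):
        raise ValueError("operands must fit in a signed 8-bit integer")
    out = [row[:] for row in acc]
    for i in range(2):
        for j in range(2):
            # Each output element is an 8-way dot product of row i of a and
            # column j of b. Hardware widens the products to 32 bits; Python
            # ints are unbounded, so this model never wraps around on overflow.
            out[i][j] += sum(a[i][k] * b[k][j] for k in range(8))
    return out
```

One call performs 32 multiply-adds on low-precision data, the kind of dense arithmetic that quantized neural-network inference spends most of its time on.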
ARMv9’s Confidential Compute Architecture
In an effort to improve security, ARMv9 introduces a new Confidential Compute Architecture (CCA). As AnandTech explains, CCA is a shift away from today’s software model, in which secure applications running on a device must trust the OS and hypervisor beneath them. That model is built on more privileged software tiers being able to monitor the execution of less privileged ones, which becomes a problem when the OS or hypervisor is compromised.
CCA addresses this by dynamically creating “realms”: secure, containerized execution environments that are opaque to the OS and hypervisor. Apps within a realm can attest their trustworthiness to a “realm manager”, a piece of code a fraction of the size of a hypervisor that is solely responsible for resource allocation and scheduling. Realms shorten the chain of trust, allowing secure applications to run on any device regardless of the underlying OS and without being exposed to that OS’s security flaws.
Source: ARM. Via: AnandTech.
According to AnandTech, ARM didn’t detail exactly how “realms” are separated from the OS and hypervisor, but the publication speculates that the separation relies on hardware-backed address spaces that can’t interact with each other.
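The difference between the two trust models in this section can be summarized as a toy access rule. The Python sketch below is purely conceptual. The tier names, the `can_inspect` function, and the simple three-level ordering are hypothetical simplifications. It says nothing about how ARM actually enforces realm isolation in hardware, which, as noted above, ARM hasn’t detailed.

```python
# Privilege tiers, least to most privileged (a simplified, hypothetical ordering).
TIERS = ["app", "os", "hypervisor"]


def can_inspect(observer, target, target_in_realm=False):
    """Toy rule: may software tier `observer` look inside `target`?

    Traditional model: a tier can inspect itself and any less privileged tier.
    Realm model: a target running inside a realm is opaque to every other tier,
    including the OS and hypervisor; only the realm itself can see inside.
    """
    if observer not in TIERS or target not in TIERS:
        raise ValueError("unknown tier")
    if target_in_realm:
        return observer == target
    return TIERS.index(observer) >= TIERS.index(target)
```

In the traditional model, `can_inspect("hypervisor", "app")` is true, so a compromised hypervisor exposes every app beneath it. Once the app runs inside a realm, the same call returns false.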
Future ARM CPU and GPU designs
Although it isn’t directly related to ARMv9, ARM also shared performance projections for its future v9-based CPU designs. Over the next two generations of mobile core designs, ARM expects a cumulative IPC gain of 30%, which, as AnandTech explains, works out to roughly 14% per generation. That is a slower rate of improvement than in previous years.
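The 14% figure follows from compounding. Assuming equal gains in each of the two generations, the per-generation gain $g$ satisfies:

```latex
(1 + g)^2 = 1.30 \quad\Longrightarrow\quad g = \sqrt{1.30} - 1 \approx 0.140
```

That is about 14% per generation, slightly less than the 15% you would get by simply dividing 30% by two.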
We’ve also seen CPU implementations from companies like Qualcomm, Samsung, and Huawei fall short of the performance ARM projects for its new core designs. ARM acknowledges this in a slide showing how vendors can raise CPU performance by improving the memory path, caches, or clock frequencies.
Source: ARM. Via: AnandTech.
Still, ARMv9 promises to bring welcome improvements to performance, security, and machine learning when new CPUs based on the ISA ship in commercial devices in early 2022.
As for future Mali GPUs, ARM has disclosed that it is working on technologies such as variable rate shading (VRS) and ray tracing. Both features have become common in high-end PC GPUs and in the ninth generation of video game consoles, such as Sony’s PlayStation 5 and Microsoft’s Xbox Series X/S.
Featured image credits: ARM via AnandTech