ARM announces the Cortex-A77 CPU core with 20-35% performance improvements
At its annual TechDay event, ARM announced the Cortex-A77 CPU core. The announcement comes alongside that of the ARM Mali-G77 GPU, the first GPU to use the brand new “Valhall” GPU architecture. The two products succeed last year’s Cortex-A76 CPU and Mali-G76 GPU, respectively.
UK-based ARM, purchased by Japan-based Softbank in 2016, is one of the most important companies in the technology industry. Every smartphone in the world is powered by ARM’s instruction set. Qualcomm uses a semi-custom “Made for Cortex” license that allows the company to incorporate customized variants of ARM’s CPU IP in its products (for example, the Kryo 485 Gold is a semi-custom variant of the Cortex-A76). Huawei’s HiSilicon group was another high-profile licensee of ARM’s CPU IP, using stock versions of ARM’s CPU cores, whereas Samsung Systems LSI and Apple use fully custom cores on top of ARM’s instruction set. Samsung and HiSilicon also license ARM’s Mali GPUs for their in-house SoCs, while Qualcomm and Apple opt to go with their custom GPU solutions (for example, Qualcomm uses its own Adreno GPUs).
This is why a new ARM announcement has significant implications for the smartphone industry. The good news is that ARM has been on a roll for a while now when it comes to new CPU microarchitectures. The Cortex-A72, Cortex-A73, and Cortex-A75 were all respectable designs that made up for the mistakes of the Cortex-A57. Last year’s Cortex-A76, however, went a step further: it promised “laptop-class performance” with a 35% performance improvement over the already capable Cortex-A75. Accordingly, Qualcomm promised a 45% performance improvement with the Snapdragon 855, the largest performance bump of any Snapdragon SoC in history.
The Cortex-A76 was a high performer in IPC, PPA (power, performance, and area), and efficiency. It had the best PPA in the industry with a small die area. It did benefit from TSMC’s excellent 7nm FinFET process, but the IPC improvements it brought also made their mark. It managed to outperform Samsung’s Exynos M3 custom core in the Exynos 9810, despite having a narrower decode width (4-wide vs. 6-wide). Even this year’s release of the Exynos M4 core in the Exynos 9820 wasn’t enough to take away ARM’s lead (although it did close the gap), as the Cortex-A76 still enjoys a performance and efficiency advantage over the Exynos M4. (The Exynos was also let down by an inferior manufacturing process: 8nm LPP vs. 7nm FinFET.) In particular, the energy efficiency of the Cortex-A76 has been found to be incredible. SoCs using the Cortex-A76 include flagship SoCs such as the HiSilicon Kirin 980 and the Qualcomm Snapdragon 855, but we have also started seeing it in mid-range SoCs in the form of the Qualcomm Snapdragon 675 and the Snapdragon 730/730G. Its impact on performance has been substantial.
In the mobile space, the Cortex-A76 is still inferior to Apple’s custom cores, as seen in the Apple A11 and the Apple A12, in terms of instructions per clock (IPC). ARM, however, has shown no sign of slowing its rate of improvement. Last August, the company unveiled its CPU core roadmap with a “Deimos” core for 2019 and a “Hercules” core for 2020, both of them based on the Cortex-A76. Impressively, the company promised a 20-25% compound annual growth rate (CAGR) in performance with each new core in the Austin core family. ARM is speeding forward.
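To get a feel for what a 20-25% compound annual growth rate means across the roadmap, here is a rough sketch in Python. It assumes the Cortex-A76 as a normalized baseline of 1.0 and one compounding step per yearly core generation. The function name `compounded_performance` and the baseline are my own illustrative choices, not anything ARM has published.

```python
def compounded_performance(annual_gain: float, generations: int,
                           baseline: float = 1.0) -> float:
    """Relative performance after `generations` yearly steps of compound growth.

    Assumption: a baseline of 1.0 stands for the Cortex-A76. It is a
    normalized, illustrative figure rather than a measured score.
    """
    return baseline * (1.0 + annual_gain) ** generations


if __name__ == "__main__":
    # One step for "Deimos" (2019), two steps for "Hercules" (2020).
    for gain in (0.20, 0.25):
        for steps, core in ((1, "Deimos"), (2, "Hercules")):
            relative = compounded_performance(gain, steps)
            print(f"{gain:.0%} per year -> {core}: {relative:.2f}x the A76")
```

Under those assumptions, two generations of compounding would put “Hercules” at roughly 1.44x to 1.56x the performance of the A76.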
The Cortex-A77 is the “Deimos” CPU core, and it will be making its way to late 2019 and early 2020 flagship SoCs. It’s an evolution of the Cortex-A76 and the second iteration of the Austin core family. The CPU is a direct microarchitectural successor of the A76, and most of its core features are the same, so vendors will be able to upgrade the SoC IP without a lot of effort. In terms of architecture, it remains an ARMv8.2 CPU core that is meant to be paired with a Cortex-A55 “little” core inside a DynamIQ Shared Unit (DSU) cluster.
The cache sizes of the Cortex-A77 are: 64KB L1 instruction and data caches, a 256KB or 512KB L2 cache, and up to 4MB of shared L3 cache. The performance gains will have to come from microarchitectural improvements, as the frequency of the core isn’t expected to change (ARM still targets 3GHz like the A76, but as with the A76, it’s likely we will see vendors ship designs with lower-clocked cores). The process improvements for the next generation of SoCs aren’t expected to be as major as they were in 2018. (TSMC has moved to a 7nm EUV process this year, which will likely be the basis of the next Kirin and Snapdragon chipsets.)
The Cortex-A77, therefore, has an improved microarchitecture that results in 20%-35% performance improvements. The A76 was different from its predecessors in terms of the architecture, and it was meant to serve as a baseline for the next two designs in the Austin core family: the Cortex-A77 in 2019, and “Hercules” in 2020.
ARM’s primary goals were to increase the IPC of the architecture and to keep delivering the best PPA (power, performance, and area) in the industry. The die area and energy efficiency advantages of the A76 will remain advantages for the A77.
In terms of the microarchitecture, ARM has changed quite a lot. On the front-end, the core has a higher fetch bandwidth with a doubling of the branch predictor capability and a new macro-OP cache structure acting as an L0 instruction cache. Further down the pipeline, there is a new integer ALU pipeline along with revamped load/store queues and issue capability. There are also dynamic code optimizations in tow, and they are explained in detail in ARM’s blog post. The decode width remains at 4-wide.
The back-end of the core also contains improvements, and I recommend readers look at AnandTech’s coverage for a lot more detail. ARM has added an additional integer ALU. Data prefetchers have also been improved, which is good news considering that the A76 already had superb prefetchers according to AnandTech. New prefetching engines have been added to improve prefetching accuracy. The prefetcher changes relate to the memory subsystem of the core, which is a fundamental aspect of CPU performance. The performance of a memory subsystem is characterized by its latency and bandwidth.
ARM promises 20-35% performance improvements for the Cortex-A77
According to ARM, the Cortex-A77 has a 20% single-threaded IPC improvement over its predecessor in Geekbench 4, 23% in SPECint2006, 35% in SPECfp2006, 20% in SPECint2017, and 25% in SPECfp2017. All of these figures are projections for a 7nm process at a 3GHz frequency. If these improvements play out, the next-generation SoCs could power some amazing performance and battery life experiences on future smartphones. The FP improvements, in particular, are a significant generational improvement. Of course, the A77 won’t be without competition, as Samsung will be back with the Exynos M5 in 2020, and before that, Apple’s A13 is certain to power the new iPhones.
ARM also states that the energy efficiency of the A77 will remain the same as that of the A76. What this means is that at peak performance, the CPU cores will use the same amount of energy (measured in joules) to complete a task. However, power and energy are two different concepts. The A77 will have increased power usage that scales linearly with the increased performance. This may lead to problems with TDP limits in phones. To counteract this, we are already seeing the major vendors adopt unconventional big + medium + little core configurations (2+2+4 in HiSilicon’s case, and 1+3+4 in Qualcomm’s case). The A77 will also be 17% bigger than the A76, which means that it’s still on track to have best-in-class PPA.
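The difference between energy and power is easy to gloss over, so here is a small Python sketch of the arithmetic. Power is energy divided by time, so a core that spends the same energy on a task but finishes it sooner draws more power while it runs. The 2 J task, the 1 second baseline, and the helper names are hypothetical values picked for illustration. Only the 20% speedup comes from ARM’s projections.

```python
def average_power_watts(energy_joules: float, seconds: float) -> float:
    """Average power is energy divided by the time taken to spend it."""
    return energy_joules / seconds


def time_after_speedup(baseline_seconds: float, speedup: float) -> float:
    """Task duration when performance improves by `speedup` (1.20 = 20% faster)."""
    return baseline_seconds / speedup


if __name__ == "__main__":
    # Hypothetical workload: not a measured figure for any real core.
    task_energy_joules = 2.0  # same energy per task on both cores
    baseline_seconds = 1.0    # assumed completion time on the older core
    speedup = 1.20            # lower end of ARM's projected uplift

    a76_power = average_power_watts(task_energy_joules, baseline_seconds)
    a77_seconds = time_after_speedup(baseline_seconds, speedup)
    a77_power = average_power_watts(task_energy_joules, a77_seconds)

    print(f"A76-like core: {a76_power:.2f} W for {baseline_seconds:.2f} s")
    print(f"A77-like core: {a77_power:.2f} W for {a77_seconds:.2f} s")
    print(f"Power ratio: {a77_power / a76_power:.2f}x")
```

With equal energy per task, the power ratio equals the speedup. That is why a core that is 20% faster at the same efficiency would draw about 20% more power while it is running, and why it could run into a phone’s thermal limits sooner.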
I have been a great fan of the A76’s implementations, as it just works so well even in mid-range SoCs such as the Snapdragon 675. The Snapdragon 855 and the Kirin 980 are both highly performant flagship SoCs, and I can’t wait to see the level of improvements brought by the A77’s implementations in the next-generation SoCs. ARM states that its major clients are still heavily focusing on having the best PPA, and it’s easy to see that the company delivers the best solutions in this regard.
When will we see the A77 in a SoC? Prior to the recent tumultuous events with Huawei, I would have expected the HiSilicon Kirin 985 to feature the A77 as well as the Mali-G77 GPU for a true next-generation SoC in 2019. However, with ARM deciding to cut ties with Huawei, I doubt this is still possible, unless the combustible situation with Huawei is resolved in the coming weeks. Qualcomm’s next flagship Snapdragon SoC probably won’t ship to consumers until the first quarter of 2020, so consumers looking to use ARM’s newest CPU core may have to wait a while.