ARM announces the Mali-G77 GPU with new “Valhall” GPU architecture and 1.4x performance improvements
ARM has announced the Mali-G77 GPU alongside the Cortex-A77 CPU at its annual TechDay. While the Cortex-A77 is a significant generational advancement over its predecessor, the Cortex-A76, the Mali-G77 GPU is something different entirely. It’s the first GPU in ARM’s Mali lineup to bring a new GPU architecture since the Mali-G71, which brought the Bifrost architecture in 2016. The Mali-G77 brings the brand new “Valhall” architecture.
Although ARM’s CPU IP has historically been quite competitive in the wider smartphone landscape, the company’s Mali lineup of GPUs has struggled to compete with best-in-class solutions over the years. Time and time again, Mali GPUs proved inferior to Qualcomm’s Adreno and Imagination Technologies’ PowerVR GPUs in terms of performance and power efficiency. The Bifrost architecture succeeded the Midgard architecture, switching from a vector design to a scalar one. Unfortunately, the switch did not close the performance and power efficiency gap, which seemed to be growing larger. The Mali-G71 and the Mali-G72 suffered from excessively high power consumption and throttling, which made them inferior to Qualcomm’s Adreno GPUs and Apple’s custom GPU (starting with the Apple A11).
The poor GPU performance became such a significant issue that vendors were staring down the prospect of only minor GPU gains from one generation to the next. The Exynos 9810’s Mali-G72MP18 GPU was only a mild improvement over its predecessor, for instance. Huawei’s HiSilicon Group struggled with the Mali GPUs to a much larger extent. The HiSilicon Kirin 960 and the Kirin 970 were let down by GPUs that consumed abnormally high amounts of power while delivering comparatively little performance, to the extent that Huawei was forced to introduce an unconventional throttling mechanism, which led to benchmark cheating being discovered on several Huawei phones last year.
Last year’s Mali-G76 did, thankfully, provide substantial improvements on both the performance and power efficiency fronts. Using a 10-core version of the Mali-G76, HiSilicon was able to promise 46% performance improvements, and even though the company achieved those numbers, it still wasn’t able to take the crown for GPU performance (peak or sustained) or for power efficiency. Samsung System LSI implemented a 12-core version of the GPU in the Exynos 9820, and ended up narrowing the gap to the Qualcomm Snapdragon 855’s Adreno 640 GPU. Qualcomm’s Adreno GPUs have remained the class leaders in the Android market, but Apple went one better last year with the Apple A12’s custom GPU. Apple was able to beat Qualcomm in terms of both peak and sustained performance, and the company showcased competitive power efficiency as well. Currently, the A12’s GPU remains the leader, while the Snapdragon 855’s Adreno 640 GPU is placed second on most benchmarks.
In the face of this competitive environment, ARM needed to step up to meet the challenge.
The result is the Mali-G77 and the new Valhall architecture. ARM says that it delivers a 30% increase in performance density, a 30% improvement in energy efficiency, and a 60% improvement for machine learning (ML). ARM expects Mali-G77-based devices to deliver 40% better peak graphics performance.
The company expects the Mali-G77 to bring more high-end gaming to mobile phones, and notes that 2018 was the year when revenues from mobile gaming overtook revenues on console and PC-based gaming for the first time.
With respect to ML, ARM says that the Mali-G77 gives devices the capability to perform “increasingly complex” ML tasks faster on the device itself, thanks to a 60% performance density improvement. ARM argues that this is better than sending those tasks to the cloud for processing, which raises more security concerns and results in lower performance and higher latency.
The new Valhall architecture is the basis of the Mali-G77 and future Mali GPUs. ARM says that the following features of Valhall make it a “novel architecture”:
- “A new superscalar engine, which delivers another leap in energy efficiency and performance density
- A simplified scalar ISA with a new instruction set that is more compiler friendly
- New dynamic scheduling of instructions
- Reworked datastructures better aligned to modern APIs, such as Vulkan.
“While there are many different advancements and new features, the two key ones are the execution engine and texture mapper in Mali-G77.”
The wide execution engines of the Mali-G77 improve performance density by sharing control across a larger number of lanes, according to ARM. The Mali-G76 has 8-wide warps and a total of 24 FMA lanes per shader core, while the Mali-G77 has 16-wide warps and 32 FMA lanes per shader core (one execution engine per core, made up of two clusters of 16 FMA lanes each). This results in 33% more compute in the same area when compared to the G76, according to the company.
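The 33% figure follows directly from the lane counts quoted above. The short Python sketch below only reproduces that arithmetic; the function and variable names are made up for illustration, and the "same area" part of the claim is ARM's own statement that this calculation does not verify.

```python
from fractions import Fraction

def lanes_per_core(clusters: int, lanes_per_cluster: int) -> int:
    """FMA lanes in one shader core, given its cluster layout."""
    return clusters * lanes_per_cluster

# Figures taken from the article: 24 FMA lanes per Mali-G76 shader core,
# and two clusters of 16 FMA lanes in the single Mali-G77 execution engine.
g76_lanes = 24
g77_lanes = lanes_per_core(clusters=2, lanes_per_cluster=16)

# Relative increase in FMA lanes per shader core.
uplift = Fraction(g77_lanes, g76_lanes) - 1

print(f"Mali-G77 lanes per core: {g77_lanes}")
print(f"Increase over Mali-G76: {float(uplift):.0%}")
```

The ratio 32/24 is exactly 4/3, which is where the one-third increase comes from.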
ARM also states that the improved gaming performance of the Mali-G77 is linked to the quad texture mapper, which processes four texels per cycle. That is twice the throughput of the Mali-G76 and four times that of the G72. It’s said to provide improvements across the board in both high-fidelity and casual gaming, but it will have an especially large impact on texture-heavy games. The compute capability of the G77 has been increased, so the texture capability also needed to be increased to keep the machine balanced, according to ARM. The end goal? Deliver more performance per square millimeter than before.
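To put the texels-per-cycle figures in context, here is a rough Python sketch of how per-core throughput scales into a peak texel rate. The Mali-G76 and Mali-G72 values are derived from the 2x and 4x ratios quoted above, and the core count and clock speed are purely hypothetical numbers chosen for illustration, not the specifications of any announced chip.

```python
# Texels per cycle per shader core. The Mali-G77 value is from the article;
# the other two are derived from the stated 2x and 4x ratios.
TEXELS_PER_CYCLE = {"Mali-G72": 1, "Mali-G76": 2, "Mali-G77": 4}

def peak_texel_rate(gpu: str, cores: int, clock_mhz: int) -> float:
    """Theoretical peak texel rate in gigatexels per second."""
    return TEXELS_PER_CYCLE[gpu] * cores * clock_mhz / 1000

# Hypothetical configuration: 10 shader cores at 600 MHz for every GPU,
# so that only the per-core texture throughput differs.
rates = {gpu: peak_texel_rate(gpu, cores=10, clock_mhz=600)
         for gpu in TEXELS_PER_CYCLE}

for gpu, rate in rates.items():
    print(f"{gpu}: {rate} Gtexels/s")
```

At identical core counts and clocks, the peak rates differ only by the per-core ratios, which is why ARM ties the new texture mapper to texture-heavy workloads in particular. Real-world rates depend on memory bandwidth and filtering modes, which this sketch ignores.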
The rest of the Mali-G77’s design has been optimized to match the new 16-wide execution engines and the quad texture mapper. This optimization includes a redesign of the load/store cache (LSC) and the attribute pipe, with a focus on performance density and energy efficiency.
ARM says that it has a “significant focus” on improving energy efficiency, and claims that the Mali-G77 can do the same work using 50% of the energy of the Mali-G72 from two years ago. According to the company, the Valhall architecture and the Mali-G77 boost energy efficiency across all workloads, leading to an improvement of 1.3x across “a wide range of content,” which means that users should get longer battery life on premium devices.
ARM states that dynamic instruction scheduling is now handled in hardware to enable better performance. The dynamic scheduler is said to decide which instructions to execute from which warps, and the work is then issued to independent parallel ALUs in superscalar style.
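The general idea of picking ready instructions from multiple warps and issuing them to parallel ALUs can be illustrated with a toy model. The Python sketch below is not ARM's scheduler: the fixed warp priority order, the per-instruction latencies, the two-ALU limit, and all names are assumptions made purely for illustration.

```python
from collections import deque

def schedule(warps, num_alus=2):
    """Toy cycle-by-cycle scheduler (illustrative only, not Valhall's logic).

    warps maps a warp id to a list of (instruction name, latency in cycles).
    Instructions within a warp issue in order, and a warp cannot issue again
    until its previous instruction has finished. Returns one list per cycle
    containing the (warp id, instruction name) pairs issued in that cycle.
    """
    queues = {w: deque(instrs) for w, instrs in warps.items()}
    ready_at = {w: 0 for w in warps}  # first cycle each warp may issue again
    timeline = []
    cycle = 0
    while any(queues.values()):
        issued = []
        for w in queues:  # fixed priority order, for simplicity
            if len(issued) == num_alus:
                break  # every ALU already has work this cycle
            if queues[w] and ready_at[w] <= cycle:
                name, latency = queues[w].popleft()
                ready_at[w] = cycle + latency
                issued.append((w, name))
        timeline.append(issued)
        cycle += 1
    return timeline

# Hypothetical workload: three warps with made-up instruction latencies.
warps = {
    "w0": [("fma", 2), ("add", 1)],
    "w1": [("tex", 3), ("mul", 1)],
    "w2": [("fma", 1)],
}
timeline = schedule(warps, num_alus=2)
for cycle, issued in enumerate(timeline):
    print(cycle, issued)
```

In this toy run, the scheduler fills the stall behind w0 and w1 with work from w2, which is the kind of latency hiding that issuing from whichever warp is ready is meant to achieve.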
Lastly, ARM notes that the Valhall architecture continues the evolution of ARM Frame Buffer Compression with AFBC 1.3. It brings some new features, which are detailed in ARM’s blog post.
ARM has some big promises for the Mali-G77, proclaiming that it will bring significant performance improvements in complex AR and ML, and provide “uncompromising graphics performance and increased efficiency.” If the claims play out, we may finally see an ARM Mali GPU going head to head with, or even bettering, the Adreno GPU of a given generation, and the mobile GPU market may just become quite a bit more competitive.