ARM’s Cortex-X custom CPU program may finally make Android flagship performance competitive with Apple

Each year in May, UK-based ARM, owned by Japan-based SoftBank, announces its new mobile IP (intellectual property) for use in mobile devices. This IP consists of new CPU cores as well as new GPUs. ARM’s instruction set is used in every smartphone in the world, which makes it a crucially important company. In terms of CPU core architectures, from 2021, every major Android chip vendor of note will use ARM’s stock CPU IP (as Samsung System LSI has given up on its Exynos M custom cores). That’s why it’s doubly important that ARM gets things right. This year, ARM has announced the Cortex-A78 CPU architecture and the Mali-G78 GPU, the successors of the Cortex-A77 CPU and the Mali-G77 GPU respectively. While these announcements were expected, what was not expected was for ARM to announce another CPU core in the form of the Cortex-X. For years, tech reviewers and users have bemoaned the fact that Apple’s CPU architectures are multiple years ahead of ARM’s Cortex-A series. With the Cortex-X CPU program and the Cortex-X1, this may finally change in 2021.

ARM knows that its customers are demanding more solutions and products based on different needs in different product segments. The Cortex-A76, for example, is used in flagship SoCs as well as in some lower mid-range SoCs. Its maximum performance was not as high as that of Apple’s competing cores because ARM needed to focus on PPA (performance, power, and area) first. Energy efficiency was a higher priority for the company than absolute performance.

With the Cortex-X1, this changes.

ARM has announced the Cortex-X Custom (CXC) program. This program entails close collaboration with ARM engineering teams and ARM’s program partners, who can shape a final CPU product to meet specific market demands. ARM notes that this allows program partners to define their own performance points outside of the “usual Cortex-A envelope of PPA”. The final custom CPU, designed and built by ARM, will be delivered under the ARM Cortex-X brand. The first CPU as part of the CXC program is the ARM Cortex-X1 CPU.

ARM is very proud of the Cortex-X1, saying that it’s the most powerful Cortex CPU to date. It brings 30% peak performance improvement over the current Cortex-A77. It’s said to bring “ultimate performance” for next-generation custom solutions. The CPU came in response to partners who wanted to maximize performance in line with their own use cases.

The Cortex-X1, as expected, is also faster than the newly announced Cortex-A78, which slots in below it. The wording is important here. ARM says that it provides performance uplifts over the Cortex-A78, with up to 22% single-thread integer performance improvements. The word “uplifts” refers to the fact that the improvements relate to short bursts of high performance, which are best for reactivity and responsiveness, according to ARM. This will supposedly enable the highest performance ever for smartphones and large-screen devices, but going by the numbers, the Cortex-X1 still won’t be able to match the upcoming Apple A14, with which it will compete. It may be able to score on par with 2019’s Apple A13, though.
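
To see how the two headline figures relate, here is a rough back-of-the-envelope sketch in Python. It assumes that the 30% (X1 over A77) and 22% (X1 over A78) numbers were measured under comparable conditions and that they compound multiplicatively, which ARM's announcement as quoted here does not confirm. The helper name implied_uplift is purely illustrative.

```python
# Figures quoted from ARM in this article.
X1_OVER_A77 = 0.30  # Cortex-X1 peak performance uplift vs Cortex-A77
X1_OVER_A78 = 0.22  # Cortex-X1 single-thread integer uplift vs Cortex-A78 ("up to")


def implied_uplift(total: float, part: float) -> float:
    """Return the uplift u such that (1 + part) * (1 + u) == 1 + total.

    Assumption: both percentages describe the same kind of workload,
    so they can be chained multiplicatively.
    """
    return (1 + total) / (1 + part) - 1


# Uplift of the Cortex-A78 over the Cortex-A77 implied by the two figures.
a78_over_a77 = implied_uplift(X1_OVER_A77, X1_OVER_A78)
print(f"Implied Cortex-A78 uplift over Cortex-A77: {a78_over_a77:.1%}")
```

Under those assumptions, the two claims together imply only a single-digit percentage gap in peak performance between the Cortex-A78 and the Cortex-A77, which fits the A78's positioning as the efficiency-focused core.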

The Cortex-X1 offers 2x machine learning (ML) performance improvements over the A77. This is a notable improvement, and it comes as part of ARM’s wider push for more local compute performance.

The DynamIQ cluster of 4x Cortex-A78 and 4x Cortex-A55 cores provides 20% sustained performance improvements over the 4x Cortex-A77 and 4x Cortex-A55 cluster. For more information on the 20% claim, check out our article. (Yes, ARM didn’t announce a successor to the Cortex-A55, unfortunately. It may come next year.) The Cortex-X1, on the other hand, enables greater scalability while boosting peak performance. Partners adding 1x Cortex-X1 as part of the DynamIQ cluster alongside 3x Cortex-A78 and 4x Cortex-A55 will get a 30% improvement in peak performance over the previous generation, which is a feat worth noting. The A78 is specially made for efficiency, so when combined with the Cortex-X1, the combo will deliver the best sustained and peak performance. Flagship Android phones will get a lot faster.

ARM says the key markets for solutions with the Cortex-X1 are smartphones and new form factors (foldable phones and big, multi-screen devices). The X1 provides a quicker UX with faster app loading times and improved web page scrolling responsiveness. AI and ML-based experiences will get better with the improvement in ML performance. The X1 will, predictably, also improve use cases such as productivity, communication, security, digital immersion, camera-based features, advanced gaming, and XR experiences.

ARM Cortex-X1 – CPU architecture

Cortex-X1’s architecture is where things get interesting. It has numerous microarchitectural upgrades that provide that peak performance boost. The Cortex-A76, which was announced in 2018, upgraded the instruction decode width to 4-wide from the 3-wide of the Cortex-A75, which, in turn, had increased from the 2-wide width of the Cortex-A73. However, the Cortex-A77 opted to keep the decode width constant at 4-wide. Apple’s A-series chips are big and wide, as the decode width of all A-series chips since the A11 has been 7-wide, which is wider than even desktop CPU architectures. ARM has taken a step closer to Apple with the Cortex-X1, as the decode bandwidth has been increased by 25% to 5 instructions decoded per cycle.

Moreover, ARM says the MOP cache throughput has been increased by 33% to 8 MOPs per cycle. The Cortex-X1’s Neon engine gets two additional pipes that double its compute capacity over the A78. In terms of cache sizes, the X1 supports 64kB L1 and up to 1MB L2 cache, while the DynamIQ cluster has been upgraded to now support 8MB of L3 for ultimate performance. The larger L3 can also be used by the A78 when it is used in combination with the Cortex-X1.
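
The front-end percentages above are easy to sanity-check. The short Python sketch below recomputes them from the widths given in the article. Note that the previous MOP cache throughput of 6 MOPs per cycle is not stated by ARM here; it is inferred from the "33% to 8" figure and should be treated as an assumption.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100


# Decode width: 4-wide on the Cortex-A76/A77, 5-wide on the Cortex-X1.
decode_gain = pct_increase(4, 5)

# MOP cache throughput: 8 MOPs/cycle on the X1.
# The baseline of 6 MOPs/cycle is inferred from the quoted 33%, not stated.
mop_gain = pct_increase(6, 8)

print(f"Decode width gain: {decode_gain:.0f}%")
print(f"MOP cache throughput gain: {mop_gain:.0f}%")
```

Both results line up with ARM's quoted 25% and 33% figures.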

The Cortex-X1 is the first example of a Cortex CPU produced under the CXC program. The very purpose of the CXC program is to push performance into an envelope outside of the Cortex-A PPA. That’s because all that increased performance comes at a cost. The Cortex-X1 is 1.5x the size of the Cortex-A78. This means it has worse PPA as well as worse energy efficiency. Thus it’s unlikely to be found in any mid-range or budget phone, as it will likely be restricted to high-end flagship phones. Giving partners a CPU that is specific to their market needs will set the Cortex-X line apart from the roadmap of the Cortex-A CPUs. It should be noted here that program partners will not be able to directly customize any CPU under the CXC program. Instead, the CXC program is essentially the successor of the “Built for Cortex” license, where ARM makes modifications at partners’ request and designs the CPU IP to be sold to the partner. In this way, ARM says it will meet the needs of the ever-expanding ecosystem.
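
The area cost can be put into rough numbers. The Python sketch below divides the quoted performance ratio by the quoted size ratio to estimate performance per unit of area relative to the Cortex-A78. It assumes the "up to 22%" integer figure is a fair stand-in for overall performance, which is a simplification, so the result is only indicative.

```python
# Ratios relative to the Cortex-A78, taken from the figures in this article.
X1_PERF_RATIO = 1.22  # "up to" 22% single-thread integer uplift (best case)
X1_AREA_RATIO = 1.5   # the Cortex-X1 is 1.5x the size of the Cortex-A78

# Performance per unit area of the X1, normalised to the A78 (= 1.0).
perf_per_area = X1_PERF_RATIO / X1_AREA_RATIO

print(f"X1 performance per area vs A78: {perf_per_area:.2f}x")
print(f"That is about {(1 - perf_per_area):.0%} lower than the A78")
```

Even in the best case, the X1 delivers noticeably less performance per unit of silicon than the A78, which is why it sits outside the usual Cortex-A PPA envelope.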

The Cortex-X1’s target clock speed is 3GHz. ARM has been targeting 3GHz since the A76, and the clock speed has notably failed to materialize. With the upcoming advent of 5nm SoCs, though, ARM is hopeful that vendors will finally ship ARM’s big core design at 3GHz. ARM notes that all performance estimates were based on SPECint2006, which is an industry-standard benchmark.

Outlook

The Cortex-X1’s announcement is exciting for aspiring buyers of flagship Android phones in 2021. For the first time since 2013 and the Apple A7, ARM will be able to get close to Apple’s A-series chips in terms of peak performance. Even if the Cortex-X1 doesn’t match the A14, ARM will be closer to Apple than it has been at any point in the last seven years.

The upcoming Qualcomm Snapdragon 875 will probably incorporate both the Cortex-X1 and the Cortex-A78 as its “Prime Core” and “Performance Cores” respectively. HiSilicon is in no position to adopt ARM’s newest IP as TSMC has been barred from supplying it chips, so Huawei phones won’t feature the new CPU cores this year, and probably not even early next year. Notably, Samsung is in a strong position to adopt the Cortex-X1 + Cortex-A78 as part of the next flagship Exynos SoC, which will succeed the Exynos 990. Samsung released a statement in which it said it was “very excited” to see the new direction ARM is taking with the Cortex-X Custom program. The Cortex-X1 essentially makes up for Samsung’s failed custom cores venture. It is to be hoped that next year, the Exynos-powered Galaxy S21/S30 phones will finally be free of CPU performance deficits, major or minor, against the Snapdragon-powered competition. Finally, it’s uncertain whether MediaTek will adopt the Cortex-X1. The Dimensity 1000’s successor may adopt the A78 only, or it could go for the X1 plus A78 combo in order to compete head-on with Qualcomm. We will have to wait to see how things play out next year.

The future for CPU performance in Android looks bright even as one major CPU chip producer stands on the brink of closure.


Sources: ARM (1, 2), AnandTech