Date: 2024-07-16

Key Takeaways

Arm's energy-efficient computing approach has given its partners an edge in power consumption and performance.

Intel aims to challenge Arm's efficiency claim with Lunar Lake, but improving x86 will require significant work.

While Intel believes x86 can match Arm's efficiency, hardware limitations may hinder immediate gains.

Arm is often lauded for its efficiency; after all, there's a reason it's become the most pervasive architecture keeping portable devices going for longer. It's been in our smartphones for well over a decade (sans a brief x86 stint that very few phones took part in), and laptops are starting to make the transition thanks to the extended battery life that those machines can get.

With that said, Intel claims with Lunar Lake that it can "bust the myth" that Arm is more efficient than x86. However, is Arm more efficient than x86? Is there a myth? Or is it all just marketing? When I asked Arm for their stance on the efficiency of the architecture, they gave me the following comment.

Arm has always concentrated on energy-efficient computing from its very inception, and this has enabled Arm's partners to derive solutions that consume lower power than their competition while delivering on the performance that is required. This approach has been key to the use of Arm in mobile phones and IoT, and why Arm is so pervasive. Arm's approach of providing highly performant, energy-efficient cores has enabled partners to develop solutions that are more energy-efficient than the solutions that those partners have been able to develop with other architectures - NVIDIA is a great example; what they are doing with Grace Blackwell is one of the best examples for how a partner has been able to leverage the flexibility of Arm, innovate and customize on top of Arm technology to reduce its power consumption by 25x, while also increasing performance by 30x per GPU, compared to NVIDIA H100 GPUs using competitive architectures for LLMs.

Intel did not respond to a request for comment.

x86 has benefits over Arm, but efficiency isn't one of them

If x86 can be power efficient too, then Intel will get the best of both worlds

First and foremost, x86 is an extremely powerful architecture. x86 processors follow the Complex Instruction Set Computing (CISC) design philosophy, which provides a larger set of more complex instructions that cost more power to decode and execute. Some x86 instructions also take multiple cycles to execute, which can raise power draw and lower efficiency.

Its more complex instructions also mean that x86 can have a more complex pipeline. For example, x86 uses variable-length instructions of 1 to 15 bytes, whereas standard Arm instructions are a fixed length (though Thumb code can mix 16-bit and 32-bit instructions). Branch prediction is also significantly more important in an x86 processor, because its complex instructions are often converted into simpler RISC-like micro-ops before they execute. x86 branch predictors are highly advanced as a result, since a mispredict and the subsequent stall can be significantly more taxing than the equivalent stall on Arm.
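To make the decoding difference concrete, here is a minimal Python sketch. It assumes a toy variable-length format in which the first byte of every instruction stores its total length; that is not how real x86 encoding works, and the function names are made up for illustration. The point is only that fixed-width instruction boundaries are known in advance, while variable-length boundaries have to be discovered one instruction at a time.

```python
from typing import List

FIXED_WIDTH = 4  # bytes per instruction in a fixed-width encoding (assumed for this sketch)


def fixed_width_boundaries(code: bytes) -> List[int]:
    """Every start offset is known up front: 0, 4, 8, ...

    No offset depends on the contents of an earlier instruction, so a
    decoder could look at all of them at the same time.
    """
    return list(range(0, len(code), FIXED_WIDTH))


def toy_variable_boundaries(code: bytes) -> List[int]:
    """Toy variable-length format (NOT real x86 encoding).

    The first byte of each instruction holds its total length, 1 to 15
    bytes. The start of instruction N+1 is unknown until instruction N
    has been examined, so the scan is inherently sequential.
    """
    offsets = []
    pos = 0
    while pos < len(code):
        length = code[pos]
        if not 1 <= length <= 15:
            raise ValueError(f"bad length byte {length} at offset {pos}")
        offsets.append(pos)
        pos += length
    return offsets
```

Real x86 front ends work around this with extra hardware such as length pre-decoding and micro-op caches, which is part of the added complexity described above.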

On top of that, Arm needs fewer transistors per instruction, which is part of why it has a lower power requirement. These are just some of the ways that Arm manages to be efficient, but there are a ton of minor differences between the two architectures that give Arm an advantage. However, fewer transistors per instruction also means reduced complexity, which is where x86 can shine as a powerful architecture capable of meeting huge computational demand.

Arm's efficiency is an architectural advantage

Intel has a lot of work to bring x86 on par

To make x86 as efficient as Arm, there's a lot of work that Intel would need to do. For starters, the instruction set itself is costly in terms of power, as the fetch, decode, and execute cycle is more complex on x86 than it is on Arm. Combining simple instructions into a single micro-op can also help, especially when it comes to reducing overheads.
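As a rough illustration of what combining instructions into one micro-op looks like, the Python sketch below fuses a compare or test with the conditional jump that immediately follows it, a pairing that modern x86 cores are known to fuse. It is a toy model that works on mnemonics only; the `fuse` function and the `FUSIBLE_FIRST` set are hypothetical names, and real fusion rules are considerably more restrictive.

```python
from typing import List

# Mnemonics that this toy model allows as the first half of a fused pair.
FUSIBLE_FIRST = {"cmp", "test"}


def fuse(instrs: List[str]) -> List[str]:
    """Merge each cmp/test plus following conditional jump into one op."""
    out = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        is_cond_jump = nxt is not None and nxt.startswith("j") and nxt != "jmp"
        if cur in FUSIBLE_FIRST and is_cond_jump:
            out.append(f"{cur}+{nxt}")  # one fused op instead of two
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


program = ["mov", "cmp", "jne", "add", "test", "jz", "cmp"]
fused = fuse(program)
print(len(program), "instructions ->", len(fused), "ops:", fused)
```

Each fused pair means one fewer op for the rest of the pipeline to track, which is where the reduction in overhead comes from.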

In contrast, Arm's RISC architecture is a massive advantage, particularly given that each Arm instruction is designed to be faster and easier to execute. Arm instructions are also a fixed length, which makes decoding simpler, and the use of 16-bit Thumb instructions can shrink code size and reduce the memory it occupies. Because Thumb instructions are smaller, fewer memory fetches are required for execution, and more instructions fit into the processor's cache.
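The code-density argument is simple arithmetic, sketched below in Python. The 64-byte cache line and the 1,000-instruction program are assumptions picked for illustration, and the sketch treats every Thumb instruction as 16 bits even though real Thumb code can mix in 32-bit instructions.

```python
CACHE_LINE_BYTES = 64  # assumed line size, chosen for illustration


def instructions_per_line(instr_bytes: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """How many instructions of a given size fit in one cache line."""
    return line_bytes // instr_bytes


def line_fetches(n_instructions: int, instr_bytes: int,
                 line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Cache-line fetches needed to stream n instructions once (ceiling division)."""
    total_bytes = n_instructions * instr_bytes
    return -(-total_bytes // line_bytes)


for name, size in (("32-bit Arm", 4), ("16-bit Thumb", 2)):
    print(name, instructions_per_line(size), "per line,",
          line_fetches(1000, size), "line fetches for 1000 instructions")
```

Under these assumptions, halving the instruction size doubles the number of instructions per cache line and roughly halves the fetches. In practice the gain is smaller, because Thumb code often needs more instructions to do the same work.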

On top of that, Arm chips are often a part of a wider system-on-chip, rather than a separate CPU that interfaces with the rest of the computer through a motherboard. Those direct connections that an Arm CPU can have to the memory controller, GPU, and other important aspects of the computer's hardware can lend themselves to efficiency gains too. This is exactly how Apple's unified memory works, and is a contributor to the company's excellent battery life.

Intel can make x86 more efficient, but the company is quiet about how

At the moment, all we have is faith

Intel is confident that x86 can be just as efficient as Arm, but the problem is that several hardware limitations make that difficult to do. There are definitely improvements to be made to the x86 architecture, but I'd be surprised if we see efficiency gains out of Lunar Lake that actually rival the best of what Arm can do. I don't want to write off Intel entirely and say they can't do it, but I asked Intel how it plans on "busting the myth" and hadn't received an answer at the time of publishing.

I'm hopeful that Intel can do it. Not because I'm an Intel fan, but because competition is great for everyone. Intel was a dominant force in the CPU landscape for years, but then AMD caught up, and now new players are starting to enter the fray, too. Intel is truly on the back foot for the first time in a long, long time, and it wouldn't just be a fantastic comeback to launch an x86 processor that can go toe to toe with Arm; it would be a phoenix rising from the ashes.

We're still a while away from seeing Lunar Lake actually reach devices, so we won't be able to make judgments until the end of the year and beyond. Intel says that shipments will start in Q3 and Q4 of 2024, but we expect it will be like Meteor Lake, where the real consumer access will come in 2025. While Intel has also been boisterous about its efficiency compared to Arm, the launch didn't tell us much about what will actually reach consumers.

What are the SKUs? Do the eight cores in Lunar Lake refer to the top-end SKU, or to all SKUs? Is the NPU performance, advertised as up to 48 TOPS, going to come to all of the SKUs? There are a lot of questions with very few answers, and the biggest question mark of all is efficiency. We'll need to wait and see, and while I'm excited to see Intel back at the top again, there's no doubt that the company has been pushing for some inconceivable achievements. While the company is on track to meet its ludicrous goal of "five nodes in four years," it's just yet another example of Intel suddenly playing catch-up when compared to the rest of the industry.