Date: 2024-05-29

Key Takeaways

AI TOPS, or Tera Operations Per Second, doesn't tell the full story of a processor's power due to varying precision levels of operations.

Most AI tasks currently run on CPUs rather than NPUs, making AI TOPS more about marketing than real-world performance.

When buying an "AI PC," prioritize a powerful graphics card over AI TOPS, as GPUs remain the best option for powering AI workloads.

In recent years, the term “AI TOPS” has become a buzzword in the tech industry, particularly in discussions surrounding artificial intelligence and machine learning. But what does it actually mean, and why is it often considered a largely meaningless term? Whenever you see "TOPS" in reference to AI, there's a lot more happening behind the scenes than most people realize.

TOPS stands for “Tera Operations Per Second” or “Trillion Operations Per Second.” It’s a measure of computational power, indicating how many trillion operations a processor can handle in a single second. On paper, this seems like a straightforward metric to compare the performance of different AI chips or processors. However, the reality is a bit more complex.
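
To make the definition concrete, here is a minimal Python sketch of how a peak TOPS figure is commonly derived on paper. It assumes the widespread convention that one multiply-accumulate (MAC) counts as two operations. The function name `peak_tops` and the example MAC count and clock speed are hypothetical and are not taken from any vendor's datasheet.

```python
def peak_tops(mac_units: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Theoretical peak throughput in tera-operations per second.

    Assumes every MAC unit completes one multiply-accumulate per clock
    cycle, and that each MAC counts as `ops_per_mac` operations.
    """
    return mac_units * ops_per_mac * clock_hz / 1e12


# Hypothetical NPU: 16,384 MAC units running at 1.4 GHz.
example = peak_tops(mac_units=16_384, clock_hz=1.4e9)
print(f"{example:.1f} TOPS")  # prints "45.9 TOPS"
```

Note that this is a ceiling, not a measurement: it says nothing about how often the MAC units are actually kept busy by a real workload.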

The reality of measuring TOPS

There's a lot more than meets the eye

When discussing AI TOPS, context is everything. AI tasks can vary widely in terms of complexity and resource requirements. For instance, some tasks may be heavily reliant on matrix multiplications, while others might focus more on convolutions (such as in the case of Convolutional Neural Networks, or CNNs). The type of operations being performed can significantly impact the effective performance of a processor, rendering a simple TOPS number somewhat superficial.

Even setting that aside, there's more to it. A critical aspect that's often overlooked when discussing AI TOPS is the precision of the operations being performed. Precision refers to the format and accuracy of the data being processed. In AI and machine learning, different tasks may require different levels of precision, typically measured in bits, such as 8-bit, 16-bit, or 32-bit operations. As a primer, here are the most common precision levels.

• 8-bit Precision (INT8/FP8): This lower precision is often adequate for many AI tasks, particularly for inference tasks like image recognition or object detection. Using lower precision reduces the computational power and memory required, enabling faster processing speeds.

• 16-bit Precision (INT16/FP16): This level of precision strikes a balance between speed and accuracy. It's commonly used in AI model training to speed up the process without significantly sacrificing accuracy.

• 32-bit Precision (INT32/FP32): Operations at this higher precision are more accurate but also much more computationally demanding. This level of precision is necessary for tasks requiring utmost accuracy, such as scientific computations and certain types of neural network training.

The problem with using TOPS as a performance metric is that it doesn’t specify the precision of the operations being counted. A processor might achieve high TOPS by performing a large number of low-precision operations, which may not be directly comparable to another processor performing fewer high-precision operations. Most of the industry now seems to be standardizing around INT8, but companies like Nvidia and Apple sometimes don't specify what precision they use in their calculations.
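
As a rough illustration of why the precision matters, the Python sketch below rescales TOPS claims to a common INT8 baseline. It rests on a loudly simplified assumption: that throughput halves each time the bit width doubles. Real hardware does not always scale this cleanly, and the chips, figures, and names (`TopsClaim`, `int8_equivalent`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TopsClaim:
    chip: str
    tops: float  # advertised tera-operations per second
    bits: int    # precision the figure was quoted at


def int8_equivalent(claim: TopsClaim) -> float:
    """Rescale a TOPS claim to INT8.

    ASSUMPTION: throughput is inversely proportional to bit width,
    which is a simplification and not a property of any specific chip.
    """
    return claim.tops * claim.bits / 8


# Hypothetical marketing figures, not real products.
claims = [
    TopsClaim("Chip A", tops=40.0, bits=8),
    TopsClaim("Chip B", tops=20.0, bits=16),
]

for c in claims:
    print(f"{c.chip}: {c.tops:.0f} TOPS at {c.bits}-bit "
          f"~ {int8_equivalent(c):.0f} INT8-equivalent TOPS")
```

Under that assumption, Chip B's headline number is half of Chip A's even though the two would be roughly equal at the same precision, which is exactly the comparison a bare TOPS figure hides.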

Hype versus real-world performance

Why use an NPU when the CPU works just fine?

A little-known fact about AI is that most of it currently runs on CPUs rather than NPUs. While this may change in the future, software developers know they can rely on CPUs to run their models and algorithms. With the standardization of CPUs, there’s no need to account for the myriad configurations found in NPUs, which simplifies development.

As a result, AI TOPS are often more about marketing than substance. In many cases, developers aren’t even utilizing the NPUs that companies heavily promote. More crucial factors to consider are the processor’s architecture, power efficiency, and how well it handles specific tasks.

For instance, most AI applications on Macs don’t utilize Apple’s Neural Engine. Tools like LM Studio and most open implementations of Stable Diffusion run on the CPU and GPU instead. This makes Apple’s Neural Engine TOPS numbers less relevant, as they don’t reflect the actual AI workloads these devices handle.

The situation is different with Intel and AMD. Both companies actively support the development and use of their NPUs. Intel already natively supports over 500 models on its NPUs and helps developers add support for their own. AMD is similarly proactive. In contrast, Apple only exposes the Neural Engine in Apple Silicon through Core ML, an API that doesn’t fully support technologies that are essential for models like LLMs, such as quantization.

Moreover, due to unified memory, Apple’s GPUs can access the same memory space at the same speed as the Neural Engine, so storage and memory-related bottlenecks persist regardless of how AI models are executed. Consequently, comparing Apple’s NPU to others doesn’t hold much significance, even though the company continues to share TOPS numbers comparable to industry standards.
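
The point about memory bottlenecks can be illustrated with a back-of-the-envelope estimate in Python. This sketch assumes a deliberately simple model of LLM text generation: each generated token reads every weight from memory once and costs roughly two operations per parameter. The function name `seconds_per_token` and all of the figures are hypothetical.

```python
def seconds_per_token(params: float, bytes_per_param: float,
                      tops: float, bandwidth_gb_s: float) -> dict:
    """Estimate per-token time for compute and for memory traffic.

    ASSUMPTIONS: ~2 operations per parameter per token, all weights read
    once per token, peak TOPS and peak bandwidth are fully achieved.
    """
    compute_s = 2 * params / (tops * 1e12)
    memory_s = params * bytes_per_param / (bandwidth_gb_s * 1e9)
    bound = "memory" if memory_s > compute_s else "compute"
    return {"compute_s": compute_s, "memory_s": memory_s, "bound": bound}


# Hypothetical: 7-billion-parameter model at 8-bit weights,
# a 40 TOPS accelerator, and 100 GB/s of memory bandwidth.
est = seconds_per_token(params=7e9, bytes_per_param=1,
                        tops=40, bandwidth_gb_s=100)
print(est["bound"])                                    # prints "memory"
print(f"compute: {est['compute_s'] * 1000:.2f} ms")    # prints "compute: 0.35 ms"
print(f"memory:  {est['memory_s'] * 1000:.2f} ms")     # prints "memory:  70.00 ms"
```

In this simplified model, doubling the TOPS figure would leave the per-token time essentially unchanged, because moving the weights takes far longer than the arithmetic.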

Should TOPS influence your next buying decision?

Absolutely not

While AI TOPS can provide a rough idea of a processor's computational power when it comes to artificial intelligence, it’s far from a definitive measure of its real-world performance. It's often used in marketing to differentiate one product from another without actually meaning anything, to the point that it simply muddies the waters around AI and what does and doesn't matter.

If you're looking to buy an "AI PC" of any kind, the best one you can buy is one with a powerful graphics card, even though Copilot+ has its own requirements. GPUs are still the best way to power pretty much all AI workloads, including LLMs and other on-device tasks. Companies like Qualcomm, Intel, and AMD are pushing for in-CPU NPUs, but that's not the best way to experience AI.