Qualcomm Hexagon 685 DSP is a Boon for Machine Learning

The Snapdragon 845, the newest system-on-chip in Qualcomm’s Snapdragon family, is a powerhouse of a processor. It boasts speedy CPU cores, a third-generation Spectra image signal processor (ISP), and an architecture that’s 30 percent more power-efficient than the previous generation. But arguably its most impressive component is a co-processor, the Hexagon 685 DSP, that’s tailor-made for artificial intelligence and machine learning.

Just what makes Qualcomm’s Hexagon 685 DSP tick?

The Hexagon DSP architecture in the Snapdragon 835. Source: Qualcomm

“Vector math is the foundation of deep learning.” – Travis Lanier, Senior Director of Product Management at Qualcomm

To understand what makes the Hexagon DSP so unique, it helps to know that AI is driven by the kind of math college engineering majors are intimately familiar with. Machine learning involves computation on large vectors and matrices, which poses a challenge for smartphone, tablet, and PC processors. It’s hard for general-purpose chips to run algorithms like stochastic gradient descent, the sort of algorithms at the core of AI-powered apps, quickly and efficiently. Qualcomm’s Hexagon DSP was introduced in part to solve this problem, and it’s great at handling image and sensor data, especially photography.
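To make the “vector math” point concrete, here is a minimal sketch of a single linear neuron trained with stochastic gradient descent. It assumes a toy linear model with a squared-error loss; the data and the names `dot`, `predict`, and `sgd_step` are hypothetical and chosen only for illustration. Both the prediction and the update reduce to dot products and element-wise vector operations, the kind of work a wide vector unit is built to accelerate.

```python
def dot(a, b):
    """Dot product: the multiply-accumulate loop at the heart of neural networks."""
    return sum(x * y for x, y in zip(a, b))


def predict(weights, bias, features):
    """A single linear neuron: one dot product plus a bias."""
    return dot(weights, features) + bias


def sgd_step(weights, bias, features, target, lr):
    """One stochastic gradient descent update on a single example.

    The loss is 0.5 * (prediction - target) ** 2, so the gradient with
    respect to each weight is error * feature, and with respect to the
    bias it is simply error.
    """
    error = predict(weights, bias, features) - target
    new_weights = [w - lr * error * x for w, x in zip(weights, features)]
    new_bias = bias - lr * error
    return new_weights, new_bias


if __name__ == "__main__":
    # Hypothetical training data generated by target = 2*x0 - 3*x1 + 1.
    samples = [([1.0, 0.0], 3.0), ([0.0, 1.0], -2.0),
               ([1.0, 1.0], 0.0), ([2.0, 1.0], 2.0)]
    weights, bias = [0.0, 0.0], 0.0
    for epoch in range(2000):
        for features, target in samples:
            weights, bias = sgd_step(weights, bias, features, target, lr=0.05)
    print([round(w, 2) for w in weights], round(bias, 2))
```

Real networks chain enormous numbers of these multiply-accumulate steps, which is why hardware that performs many of them per cycle pays off.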

But the Hexagon DSP is capable of much more than sprucing up selfies. Its Hexagon Vector eXtensions (HVX) contexts, which we’ll cover in more detail later, give it the advantages of both general-purpose processors and fixed-function cores: the Hexagon 685 DSP is terrifically efficient at computing the math behind on-device machine learning while retaining the flexibility of more programmable processors.

AI chips like the Hexagon 685 DSP, which are sometimes referred to as “neural processing units”, “neural engines”, or “machine learning cores”, are tailored specifically to AI algorithms’ mathematical needs. They’re much more rigid in design than traditional CPUs, and they contain special instructions and arrangements (in the Hexagon 685 DSP’s case, the aforementioned HVX architecture) that accelerate certain scalar and vector operations, gains that become noticeable in large-scale workloads.

The Snapdragon 845’s Hexagon 685 DSP can process thousands of bits of vector data per clock cycle, compared with the hundreds of bits per cycle an average CPU core can handle. That’s by design. With four parallel scalar threads executing Very Long Instruction Word (VLIW) packets and multiple HVX contexts, the DSP can drive several execution units from a single instruction packet and blaze through integer and fixed-point operations.
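A back-of-the-envelope calculation shows where “thousands of bits per cycle” can come from. The widths are assumptions rather than figures from this article: a 1,024-bit HVX vector (HVX’s 128-byte mode), two HVX contexts running in parallel, one vector operation per unit per cycle, and a 128-bit Arm NEON register standing in for a typical CPU core’s SIMD unit. The `lanes` helper is hypothetical.

```python
# Assumed vector widths in bits (illustrative assumptions, not official specs).
HVX_VECTOR_BITS = 1024   # one HVX vector in 128-byte mode
HVX_CONTEXTS = 2         # assumed number of HVX contexts operating in parallel
NEON_VECTOR_BITS = 128   # one Arm NEON register on a CPU core


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    return vector_bits // element_bits


if __name__ == "__main__":
    print(f"bits per cycle: HVX {HVX_VECTOR_BITS * HVX_CONTEXTS}, NEON {NEON_VECTOR_BITS}")
    for element_bits in (8, 16, 32):
        hvx = lanes(HVX_VECTOR_BITS, element_bits) * HVX_CONTEXTS
        neon = lanes(NEON_VECTOR_BITS, element_bits)
        print(f"{element_bits}-bit elements: HVX {hvx} lanes vs NEON {neon} lanes")
```

Narrow integer and fixed-point elements pack more lanes into each register, so under these assumptions the gap in elements processed per cycle grows as the data type shrinks.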

Rather than pushing performance through raw MHz, the Hexagon 685’s design aims for a high amount of work per cycle at a reduced clock speed. It includes hardware multithreading that pairs well with VLIW, because multithreading hides pipeline latencies and enables better utilization of VLIW packets. Multithreading also means the DSP can service multiple offload sessions (that is, concurrent apps for audio, camera, computer vision, and so on) and accelerate those tasks at the same time, so applications don’t have to fight for execution time.
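A toy cycle-level model shows how interleaving hardware threads hides pipeline latency. It assumes, hypothetically, a three-cycle delay before a thread can issue its next dependent packet and strict round-robin interleaving, where each cycle belongs to one thread in turn. Neither the latency nor the policy is a published Hexagon specification, and `interleaved_utilization` is an illustrative name.

```python
def interleaved_utilization(num_threads: int, latency: int, cycles: int = 1200) -> float:
    """Fraction of cycles that issue a packet under strict round-robin interleaving.

    Cycle c belongs to thread c % num_threads. After a thread issues, it must
    wait `latency` cycles before its next dependent packet is ready; if its
    slot comes around sooner, that slot is wasted.
    """
    ready_at = [0] * num_threads  # earliest cycle each thread can issue again
    issued = 0
    for cycle in range(cycles):
        thread = cycle % num_threads
        if ready_at[thread] <= cycle:
            ready_at[thread] = cycle + latency
            issued += 1
    return issued / cycles


if __name__ == "__main__":
    # Hypothetical three-cycle latency between dependent packets.
    for threads in (1, 2, 3, 4):
        busy = interleaved_utilization(threads, latency=3)
        print(f"{threads} thread(s): {busy:.0%} of cycles busy")
```

In this model, once there are at least as many threads as cycles of latency, every issue slot is filled even though each individual thread spends most of its time waiting.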

Source: Qualcomm

But those aren’t the Hexagon DSP’s only strengths. Its instruction set architecture (ISA) handles control code more efficiently than traditional VLIW designs, and it employs clever tricks to recover performance from idle and stalled threads. It also implements zero-latency round-robin thread scheduling, meaning the DSP’s threads can issue new instructions immediately after completing the previous VLIW packet.
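Recovering performance from stalled threads can be illustrated with a toy cycle-level model. In the hypothetical setup below, threads 0 to 2 can issue on every turn they get, while thread 3 stalls for six cycles after each packet (think of a cache miss). A strict round-robin scheduler leaves thread 3’s empty turns unused; a scheduler that passes a stalled thread’s turn to the next ready thread keeps the core busy. The `issue_trace` helper, the stall lengths, and the one-packet-per-cycle assumption are all illustrative and do not describe Hexagon’s actual scheduling hardware.

```python
from typing import List, Optional


def issue_trace(latencies: List[int], cycles: int, skip_stalled: bool) -> List[Optional[int]]:
    """Return which thread issues a packet on each cycle (None means an empty slot).

    latencies[t] is how many cycles thread t must wait after issuing before it
    can issue again, a stand-in for pipeline or memory stalls.
    """
    n = len(latencies)
    ready_at = [0] * n      # earliest cycle each thread can issue again
    next_thread = 0         # where the round-robin scan resumes
    trace: List[Optional[int]] = []
    for cycle in range(cycles):
        if skip_stalled:
            # Start after the last issuer and take the first thread that is ready.
            candidates = [(next_thread + k) % n for k in range(n)]
        else:
            # Strict round-robin: only the thread that owns this cycle may issue.
            candidates = [cycle % n]
        chosen = next((t for t in candidates if ready_at[t] <= cycle), None)
        if chosen is not None:
            ready_at[chosen] = cycle + latencies[chosen]
            next_thread = (chosen + 1) % n
        trace.append(chosen)
    return trace


if __name__ == "__main__":
    latencies = [1, 1, 1, 6]  # hypothetical: thread 3 stalls for 6 cycles per packet
    for skip in (False, True):
        trace = issue_trace(latencies, cycles=24, skip_stalled=skip)
        busy = sum(t is not None for t in trace) / len(trace)
        label = "skip stalled" if skip else "strict"
        row = "".join("-" if t is None else str(t) for t in trace)
        print(f"{label:>12}: {row}  ({busy:.0%} busy)")
```

In the printed traces, each `-` marks a wasted issue slot; only the strict policy produces them for this workload.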

Source: Qualcomm

None of this is new, to be clear. Qualcomm introduced the first Hexagon DSP with HVX, the Hexagon 680 (QDSP6 v6), alongside the Snapdragon 820 in 2015, and the Hexagon 680 was followed by the ever-so-slightly improved Hexagon 682. But the latest generation is the most sophisticated yet, and Qualcomm says it delivers up to three times the overall performance of the Snapdragon 835’s DSP.

That’s thanks in large part to HVX, which works very well for image processing (think augmented reality, computer vision, video, and pictures). The DSP’s HVX contexts can be controlled by any two of its scalar threads, and the HVX units and scalar units can be used simultaneously, resulting in substantial performance gains and concurrency.

Here’s Qualcomm’s explanation:

“Say you’re processing on the mobile CPU in control code mode and you switch to computational mode on the coprocessor. If you need any control code, you have to stop and go back from the coprocessor to the main CPU. With Hexagon, both the control code processor on the DSP and the computational code processor on HVX can run at the same time for tight coupling of control and computational code. That allows the DSP to take the result of an HVX computation and use it in a control code decision in the next clock cycle.”
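The tight coupling Qualcomm describes can be illustrated with a simple cost model. In this hypothetical sketch, each image tile costs one cycle of vector work and one cycle for the scalar branch that inspects the result, while `handoff_cycles` is an invented per-direction cost for passing control between a separate coprocessor and the main CPU. Setting it to zero models control and vector units that run side by side, so a result feeds the next decision immediately. The function name, tile data, and cycle counts are assumptions for illustration only, not measurements.

```python
from typing import List, Optional, Tuple


def scan_tiles(tiles: List[List[int]], threshold: int,
               handoff_cycles: int) -> Tuple[Optional[int], int]:
    """Scan tiles until one's activity score exceeds `threshold`.

    Returns the index of the first matching tile (or None) and the total
    modeled cycle count. handoff_cycles=0 models tightly coupled control and
    vector units; a positive value models a round trip to a separate
    processor for every control decision.
    """
    cycles = 0
    for index, tile in enumerate(tiles):
        score = sum(abs(x) for x in tile)  # stands in for a vector reduction
        cycles += 1                        # vector work on the tile
        cycles += 2 * handoff_cycles       # hand result to control code and back
        cycles += 1                        # scalar branch on the result
        if score > threshold:
            return index, cycles
    return None, cycles


if __name__ == "__main__":
    # Hypothetical data: nine flat tiles, then one with strong activity.
    tiles = [[0, 1, 0, 1]] * 9 + [[5, -5, 5, -5]]
    for handoff in (0, 20):
        print(f"handoff={handoff}:", scan_tiles(tiles, threshold=10, handoff_cycles=handoff))
```

With `handoff_cycles=0`, each tile costs two modeled cycles; with a handoff of 20 cycles, each decision costs 42, so the round trips rather than the math dominate the runtime.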

HVX affords another big advantage in image sensor processing. Snapdragon devices with the Hexagon 685 DSP can stream data directly from the imaging sensor into the DSP’s local memory (L2 cache), bypassing the device’s DDR memory controller. That reduces latency, of course, but it also improves battery life, because the rest of the Snapdragon processor is designed to stay idle throughout the operation.

The DSP is specifically optimized for 16-bit floating-point networks, and it’s controlled through Qualcomm’s machine learning software, the Snapdragon Neural Processing Engine.

“We’ve [taken] it very seriously,” a Qualcomm spokesperson said. “We’ve been working with partners for the last three years to have them utilize […] our silicon for AI and imaging.”

Those partners include Google, which used the Hexagon DSP’s image-processing capabilities to power the HDR+ algorithm on the Pixel and Pixel 2, for example. While Google has introduced its own Pixel Visual Core as well, it’s worth noting that devices with the Hexagon 685 DSP are the ones that see the best results with the famous Google Camera port, in part because of HVX utilization (as we’ve confirmed). Facebook, another partner, worked closely with Qualcomm to accelerate Messenger’s real-time camera filters and effects.

Oppo has optimized its face unlock technology for the Hexagon 685 DSP, and Lenovo has developed its Landmark Detection feature around it.

One reason for the platform’s wealth of support is its simplicity. Qualcomm’s extensive Hexagon SDK supports the Halide language for high-performance image processing, and developers don’t need to worry about machine learning training frameworks: in most cases, implementing a model is as simple as making an API call.

“We’re not […] competing with the likes of IBM and Nvidia [in AI], but we have areas that developers can tap into — and already have,” Qualcomm told XDA Developers.

Hexagon vs. the Competition

The Snapdragon 845’s Hexagon 685 DSP comes as an increasing number of original equipment manufacturers (OEMs) pursue mobile and on-device AI solutions of their own. Huawei’s Kirin 970, the system-on-chip inside the Mate 10 and Mate 10 Pro, has a “neural processing unit” (NPU) that can reportedly recognize more than 2,000 images per second at just 1/50th the power consumption of an average smartphone CPU. And the Apple A11 Bionic system-on-chip in the iPhone 8, iPhone 8 Plus, and iPhone X has a “Neural Engine” that handles real-time facial modeling and performs up to 600 billion operations per second.

But Qualcomm says that the Hexagon’s platform agnosticism gives it an advantage. Unlike Apple and Huawei, which largely force developers to use proprietary APIs, Qualcomm sought to support some of the most popular open-source frameworks from the get-go. For example, it worked with Google to optimize TensorFlow, Google’s machine learning platform, for the Hexagon 685 DSP; Qualcomm says TensorFlow runs up to eight times faster and 25 times more power-efficiently on it than on non-Hexagon devices.

Source: Qualcomm

A demo shows the gains that GoogLeNet, Google’s Inception deep neural network (a convolutional network for image classification and object detection), achieves on Qualcomm’s DSP architecture. The same TensorFlow-powered image recognition app runs on two smartphones: One runs the app on the CPU, and the other runs it on Qualcomm’s Hexagon DSP. The DSP-accelerated phone captured more images per second, identified objects faster, and had higher confidence in its conclusions about what each object was than the CPU-only phone.

Google also uses the Hexagon DSP to accelerate Project Tango, its augmented reality platform for smartphones. Lenovo’s Phab 2 Pro, Asus’s ZenFone AR, and other devices with Tango’s depth-sensing IR module and image-tracking cameras take advantage of Qualcomm’s Heterogeneous Processing Architecture, which delegates processing tasks among the Snapdragon chipset’s Hexagon DSP, sensor hub, and image signal processor (ISP). The result is a “less than 10 percent” overhead on the system-on-chip’s CPU, according to Qualcomm.

“As far as we know, we’re the only mobile guys out there who [are] optimizing for performance and power efficiency,” a Qualcomm spokesperson said.

Of course, competitors are also working to expand their sphere of influence and foster developer support on their platforms. The Kirin 970’s neural chip launched with support for TensorFlow and Caffe (the open-source deep learning framework that originated at UC Berkeley) in addition to Huawei’s Kirin APIs, with TensorFlow Lite and Caffe2 integration on the way later this year. And Huawei worked with Microsoft to optimize Microsoft’s AI-powered Translator app for the Mate 10.

But Qualcomm has another advantage: Reach. The chipmaker commanded 42 percent of the smartphone chip market in the first half of 2017, followed by Apple and MediaTek with 18 percent each, according to Strategy Analytics. Suffice it to say, it’s not shaking in its boots just yet.

And Qualcomm predicts it’ll only grow. The chipmaker projects $160 billion in revenue from AI software technologies like computer vision by 2025, and it sees the smartphone market, which is expected to reach 8.6 billion units shipped by 2021, as the largest platform for those technologies.

As the Hexagon 685 DSP and other “tertiary” improvements continue making their way down to mid-range hardware, it will also be easier for Qualcomm chips to bring on-device machine learning to all sorts of devices in the near future. Qualcomm also offers a handy SDK (no need to fiddle with DSP assembly language) that lets developers take advantage of the Hexagon 685 DSP and HVX in their applications and services.

“There’s a need for these dedicated processing units for neural processing, but you also need to expand it, so you can support [open source] frameworks,” a Qualcomm spokesperson said. “If you don’t create that ecosystem, there’s no way […] developers can create on it.”
