How Qualcomm is Improving the Camera Experiences on Android Phones with its Spectra ISPs
As the maker of system-on-chips (SoCs) that power much of the world’s smartphones and wearables, US-based Qualcomm is undoubtedly one of the giants of the chipmaking industry. The Snapdragon line of SoCs, for example, is used by nearly every major Android device maker for flagship, mid-range, and budget smartphones. Qualcomm gets plaudits every year at the company’s annual Tech Summit for advancements in the CPU, GPU, and AI fields, as it incorporates ARM’s new CPU microarchitectures and complements them with yearly improvements in its custom GPUs. However, its advancements in the field of cameras tend to go under the radar.
This doesn’t mean, however, that Qualcomm’s work in smartphone cameras is unimportant. On the contrary, Qualcomm’s Spectra ISPs in its Snapdragon SoCs help make much of modern smartphone cameras possible with increased computational processing power, features such as 8K video recording, HDR10 video, support for high-megapixel QCFA cameras, and much, much more. Qualcomm has promoted the Spectra 380 ISP in the Snapdragon 855 as the world’s first CV-ISP, and it has promoted the world’s first 4K HDR video recording features, which have themselves now been supplemented by second-generation 4K HDR10+ video recording. The Spectra 480 ISP in the latest-generation Snapdragon 865 is highly capable – it can process two gigapixels per second, a 40% increase over its predecessor. It’s a piece of intellectual property (IP) that differentiates Qualcomm from its competitors in the mobile chip vendor space.
While Qualcomm explains most of the headline features in its press releases and product keynotes, up until now, consumers haven’t had a chance to learn most of the low-level details that make these things work.
That’s why we at XDA Developers were happy to accept an offer to speak with Judd Heape, Senior Director, Product Management at Qualcomm. XDA’s Editor-in-Chief, Mishaal Rahman, and I had an interview with Judd in June 2020 to learn and see how Qualcomm is pushing the goalposts with smartphone photography and video recording. We spoke about topics including AI image processing, multi-frame noise reduction (MFNR), AV1, Dolby Vision video recording, pixel binning in high-megapixel cameras, and much more. Let’s take a look at Judd’s insights on each topic, one by one:
AI image processing workloads
Mishaal Rahman: I’ll start off with one of the ones that Idrees had, which is an interesting one, and which I was also interested in. So we’re wondering what are the AI image processing workloads that Qualcomm uses in the Spectra ISP and to what degree are they customizable by device makers?
Judd Heape: Yeah, so we look at a lot of AI workloads and there are some AI that can run in the ISP itself like, for example, our next generation 3A: auto exposure, auto white balance, and auto focus are AI based.
But we also look at a few other AI workloads, which would run outside of the ISP, in one of the other computing elements. So in particular we look at things like: we have an AI based noise reduction core which runs externally from the ISP, in the AI engine (AIE) part of the chip.
Also, we have things like face detection, which is a full deep learning engine that also runs in the AIE complex, but of course assists the camera. And there’s other things we are working on other than face detection and denoising; we’re also looking at doing things like an auto adjustment of snapshots using AI that would automatically set parameters per scene based on HDR content, we’d process to modify shadow and highlights and color and that sort of thing.
One of our partners, Morpho, just won a huge AI workload award at the Embedded Vision Summit this year. Independent software vendor partners also have a lot of really intense AI-based algorithms and those can range from anything like smooth camera transition, like what Arcsoft does, (I mentioned that at the last Snapdragon Tech Summit which is AI-based), to Morpho’s semantic segmentation engine. Morpho’s solution is an AI engine that understands different parts of the scene, like what is you know, fabric versus skin versus sky and grass and building and that sort of thing and then the ISP can take that information and process those pixels differently for texture and noise and color for example.
Qualcomm’s statement: For ML & AI we’re also not announcing any new updates for the features of face detection and “3A” (AE, AF and AWB) today, either. However, as Judd said, we are committed, going forward, to bringing more ML/AI capability to the camera, including these two feature areas.
Multi-frame noise reduction
Idrees Patel: Qualcomm has been mentioning multi-frame noise reduction as a feature. I would like to know more detail about it as in how the image stacking works. Is it similar in any way to like what Google is doing with their HDR+ technology or is it completely different?
Judd Heape: It’s similar but different. Imagine the camera doing a burst and capturing five to seven frames in rapid succession. Then the ISP engine takes a look at those frames and picks the best one (called the “anchor frame”) for focus and clarity and then it can pick 3-4 frames on either side of that frame and then average them all together. It tries to pick frames that are close enough together so that there’s very little movement.
And when it settles on those frames, it then averages them together to discern what is different, for example, what is actual image data versus what is noise data. So when you have more and more information, from more and more frames, you can actually do simple things like look at the differences between the frames. The differences are probably noise, whereas what’s equal in the frames are probably image data.
So we can do that real-time frame combining to reduce noise. Now, you can also do the same thing with low light and HDR and that’s a lot like what Google is probably doing. We’re not privy to their algorithm. But they’re using multi-frame techniques to increase sensitivity so that you can better “see”; once you’ve reduced the noise floor, you can now look at doing more local tone mapping, or adding gain to the image without adding more noise.
So that’s how they handle low light, as well as HDR. Enhancements to the multi-frame noise reduction feature will be coming from Qualcomm, which will also include low light and HDR. But that is something we’ll roll out shortly.
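To make the anchor-frame idea concrete, here is a minimal sketch in Python. It is our own illustration, not Qualcomm’s algorithm: the function names, the crude sharpness score, and the max_diff threshold are all assumptions. It picks the sharpest frame of a burst as the anchor, then averages the other frames into it pixel by pixel, skipping samples that differ too much from the anchor (which are more likely movement than noise).

```python
def sharpness(frame):
    """Crude focus score: total absolute difference between horizontal neighbours."""
    return sum(abs(row[i + 1] - row[i]) for row in frame for i in range(len(row) - 1))


def merge_burst(frames, max_diff=30):
    """Average a burst of grayscale frames (lists of rows) around an anchor frame.

    The sharpest frame is the anchor. For every pixel, samples from the other
    frames are averaged in only if they are within max_diff of the anchor value;
    larger differences are treated as motion rather than noise and skipped.
    (Hypothetical helper for illustration; max_diff is an arbitrary threshold.)
    """
    anchor = max(frames, key=sharpness)
    merged = []
    for y, row in enumerate(anchor):
        merged_row = []
        for x, ref in enumerate(row):
            # The anchor's own sample always qualifies, so samples is never empty.
            samples = [f[y][x] for f in frames if abs(f[y][x] - ref) <= max_diff]
            merged_row.append(sum(samples) / len(samples))
        merged.append(merged_row)
    return merged


burst = [[[100, 104]], [[96, 100]], [[104, 96]]]  # three noisy 1x2 frames
print(merge_burst(burst))
```

Averaging N frames of the same static scene reduces random noise by roughly the square root of N, which is why a lower noise floor leaves room for the extra gain and local tone mapping Judd mentions.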
Mishaal Rahman: So you mentioned rolling out this feature shortly. Is that coming in like an update to the BSP for partners?
Judd Heape: In our next-generation products, through a software addition, we will have the ability to engage with – actually it’s happening right now on the next generation products – we’re engaging with customers right now to do more multi-frame techniques beyond noise reduction, but also to handle HDR and low-light situations. It is using the same base ISP HW engine, but we’re adding more software to handle these multi-frames for more than just noise reduction.
So it’s not something that has rolled out but we’re engaging with some key lead customers on those features.
Super resolution for video
Mishaal Rahman: So, something that I heard at the Tech Summit (actually, I think it was in an interview with Android Authority) is that Qualcomm is planning to extend super resolution to video as a software solution for partners, and that this would apparently be rolling out in an update. I’m wondering if you have any updates to share on this feature.
Judd Heape: Yes, so that’s a feature that we’ve had the ability to do for a while, and it’s just now rolling out. I wouldn’t say it’s in a software update, but I would say it’s kind of like an added benefit of the existing multi-frame, low-light feature capability. We are engaging with some specific lead customers on that feature. So yeah, video super resolution is something in another generation or so we will have it as what we call a plan of record feature where it actually is built into the software code base for [the] camera. But right now, it’s more on the level of specific customer engagements for that new feature.
High-megapixel Quad Bayer sensors
Idrees Patel: Let’s talk about Quad Bayer sensors. Since 2019, many phones now have 48MP, 64MP, and now even 108MP sensors. These are Quad Bayer sensors; you don’t actually have true color resolution of 48 or 64 or 108MP. One thing I wanted to ask was how is the ISP differing in terms of image processing for these Quad Bayer or Nona Bayer Sensors (4-in-1 or 9-in-1 pixel binning), when compared to traditional sensors, which don’t have any pixel binning.
Judd Heape: Yeah, so of course the benefit of these quad CFA (Quad Color Filter Array) sensors is the ability in bright light to run them at full resolution, and then the ISP can process them at a full 108 megapixels or 64 megapixels or whatever is available.
However, typically in most lighting situations, like indoor or dark, you have to bin because the sensor pixels are so tiny that you have to combine pixels to get the better light sensitivity. So I would say the majority of the time, especially if you’re shooting video or if you’re in low light for snapshot, you’re running in binned mode.
Now, the ISP can process the sensor either way. You can look at the sensor in binned mode in which case it’s just a regular Bayer image coming in, or it can look at it in full resolution mode in which the incoming data is quad CFA. And if it’s in that mode the ISP converts it to Bayer.
So we’re doing – what we call – “remosaicing”. This is doing some interpolation of the quad CFA image to make it look like full resolution Bayer again. And that is typically done in software for snapshot, although we are going to eventually add this capability in the hardware to support video as well.
What is in the ISP hardware today is binning. So you can bin in the sensor and you can actually have the sensor decide if it’s going to output full or quarter or 1/9th resolution or you can bin in the ISP. And that’s a feature that we added in Snapdragon 865, actually. So if you bin in the ISP and then run the sensor at full resolution, that gives the ISP the ability to have both the full resolution image and the binned image at the same time. Therefore, it can use the smaller resolution or “binned” image for video (camcorder) and preview (viewfinder) and simultaneously use the full resolution image for full-size snapshot.
But again, that would be in the case of bright lighting conditions. But at least if you bin in the ISP, you have the ability to handle both the big and little image at the same time, and therefore, you can get simultaneous video and snapshot, you can also get full resolution ZSL; all without having to switch the sensor back and forth, which takes a considerable amount of time.
This is a really good feature. And as Quad CFA sensors and even you know, the 9x sensors and maybe even more come out, and as these sensors become more ubiquitous – we’re looking more and more to handle those sensors in the hardware, not just for binning but also for remosaicing.
And so the benefit of that is that if you do it in the hardware versus doing it in software you reduce the latency for your customers and therefore, your shot to shot times and your burst rates will be much faster. So as we march forward with new ISPs and new chips, you’ll start seeing a lot more of what we’re doing for these new types of sensors put into hardware.
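As a rough illustration of 4-in-1 binning, the Python sketch below averages each 2x2 same-colour block of a Quad CFA mosaic into a single pixel, producing a quarter-resolution image with a regular Bayer layout. The function names are ours, plain averaging is an assumption (real sensors and ISPs may sum or weight the pixels differently), and the image dimensions are assumed to be even. The second helper mirrors the idea of binning in the ISP: one full-resolution readout yields both the full and the binned image.

```python
def bin_quad_cfa(raw):
    """Average each 2x2 same-colour block of a Quad CFA mosaic into one pixel.

    The result has half the width and height (a quarter of the pixels) and a
    regular Bayer layout, because each 2x2 block sits under one colour filter.
    Assumes even width and height.
    """
    height, width = len(raw), len(raw[0])
    return [
        [
            (raw[y][x] + raw[y][x + 1] + raw[y + 1][x] + raw[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def isp_outputs(raw):
    """Return (full_resolution, binned) from a single full-resolution readout."""
    return raw, bin_quad_cfa(raw)


# One 4x4 Quad CFA tile: R block, G block / G block, B block.
tile = [
    [10, 12, 20, 22],
    [14, 16, 24, 26],
    [30, 32, 40, 42],
    [34, 36, 44, 46],
]
full, binned = isp_outputs(tile)
print(binned)
```

Remosaicing, the full-resolution path, is a harder interpolation problem and is not attempted here.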
ML-based facial recognition
Mishaal Rahman: So I think earlier you had mentioned that ML-based facial recognition is supported in the Spectra 480. That’s something that I actually heard at the Tech Summit. [That this is] one of the improvements from the 380 to the 480; that it’s part of the – there’s a new object detection block in the video analytics engine that’s used for facial recognition going forward.
Can you talk more about how much this improves facial recognition and what potential applications you see vendors using it for?
Judd Heape: Yeah actually, so you’re right in the embedded computer vision block, which is the “EVA” block, which we talked about at Tech Summit. That has a general object detection core in it which we’re using when the camera is running, we’re using that to detect faces. The techniques in that block are more traditional techniques, so the object recognition is done with traditional classifiers, but on top of that we do have a software engine running to actually improve the accuracy of that block.
So we’re using ML-based software to filter out the false positives, as the hardware might detect more things as faces in the scene, and then the ML software is saying, “okay that is a face”, or “that’s really not a face” and so it’s increasing the accuracy by a few percentage points by running that ML filter on top of the hardware.
I mentioned a lot of things about the future. Going forward in the future, what we plan to also do is run the actual entire face detection itself in ML or in deep learning mode in software. Especially, that will be true at the lower tiers, so for example in a tier where we don’t have the EVA hardware engine, we will start to phase in deep learning as detection, which is running in the AI engine of the chip and then later on, in the upper tiers in the 700-800 tiers we do have the EVA hardware to do this…
I will say in general though, we will be moving more toward ML approaches to do face detection and that would include both software in the medium term and hardware in the later term. I’m not going to disclose which products will have it but of course as we march forward in improving the ISP, we will be adding more and more hardware capability to do ML, for sure.
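The two-stage arrangement Judd describes, with a traditional hardware detector proposing faces and ML software rejecting the false positives, can be sketched in a few lines of Python. Everything here is hypothetical: the box format, the scoring function standing in for the ML model, and the 0.5 threshold are assumptions for illustration only.

```python
def confirm_faces(candidates, score_fn, threshold=0.5):
    """Keep only the candidate boxes that the second-stage model scores as faces.

    candidates: boxes proposed by the first (hardware) stage.
    score_fn:   stand-in for an ML model returning a face probability per box.
    """
    return [box for box in candidates if score_fn(box) >= threshold]


# Hypothetical first-stage output: (x, y, width, height) boxes.
proposed = [(40, 30, 64, 64), (200, 10, 32, 32), (120, 90, 48, 48)]

# Hypothetical second-stage scores; a real system would run a neural network here.
fake_scores = {proposed[0]: 0.97, proposed[1]: 0.12, proposed[2]: 0.81}

print(confirm_faces(proposed, fake_scores.get))
```

The first stage can afford to be over-eager because the second stage only has to examine a handful of candidate regions rather than the whole frame.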
Mishaal Rahman: Awesome. Well, I think it’s a given that the direction you’re going is bringing the 800 series’ machine learning improvements down to the lower tier, so I think that’s generally a given. But of course, no specifics you can give us on that. Thank you for the update.
Judd Heape: Face detection is something that we’re very passionate about. We want to improve these accuracies, you know generation over generation in all tiers all the way from 800 tier down to the 400 tier. ML is a big part of that.
Improvements in the image processing engine
Mishaal Rahman: Awesome. So one of the things that I briefly heard during the roundtable discussions after the Snapdragon Tech Summit was an improvement to the image processing engine. I heard that there’s been improved low middle frequency noise reduction, or LEANR, and that you’re applying a dynamic reverse gain map. Is that something you mentioned earlier in the conversation?
Judd Heape: Oh, okay. So I think you’re mixing two things together. Yeah, so there is the LEANR core, which is the core that works on noise reduction on more coarse grain, which helps in low light. That’s a new block which was added in Snapdragon 865 into the ISP, and that’s one thing.
The reverse gain map is something else. That is something else I mentioned at the round tables, but that is to reverse the effects of lens shading. So as you know, if you have a handset and it’s got a small lens; the center of the lens is gonna be bright and the edges are gonna be more vignetted; meaning they’re gonna be darker.
And so in years past in the ISP, what we’ve had is we’ve applied a static reverse gain map to get rid of those dark edges. And so that’s been in the ISP for quite some time. What we added in Snapdragon 865 though, is the ability for that gain map to change dynamically given the particular image frame, because if you apply a lot of gains to the edges what happens is the edges can get clipped, especially if you’re looking at bright light scenes outside, like blue sky can kind of turn white or the edges will clip due to a lot of gain.
So in the Snapdragon 865, that reverse gain map is not static; it’s dynamic. So we’re looking at the image and we say, “okay these parts of the image are being clipped and they shouldn’t be” so we can roll off the gain map naturally so that you don’t get bright fringes or halo effects or this sort of thing from correcting the lens shading. So that’s different from noise reduction, and they’re two different cores.
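The difference between a static and a dynamic reverse gain map can be shown with a small Python sketch. This is our own simplification, not the Snapdragon implementation: the radial gain formula, the strength parameter, and the hard per-pixel gain limit are assumptions (a real ISP would roll the gain off more smoothly). The static version multiplies every pixel by a fixed gain and clips each channel, which is how a blue sky near the edge drifts toward white; the dynamic version caps the gain so that the brightest channel just reaches the white level, preserving the colour ratios.

```python
WHITE = 255.0  # assumed 8-bit white level


def static_gain(x, y, width, height, strength=1.0):
    """Radial gain: 1.0 at the centre, 1.0 + strength at the corners.

    Assumes width and height are both greater than 1.
    """
    cx, cy = (width - 1) / 2, (height - 1) / 2
    r2 = ((x - cx) ** 2 + (y - cy) ** 2) / (cx ** 2 + cy ** 2)
    return 1.0 + strength * r2


def correct_pixel(rgb, gain, dynamic=True):
    """Apply lens-shading gain to one RGB pixel.

    With dynamic=True the gain is limited so the brightest channel does not
    exceed WHITE, which keeps the ratios between channels (the hue) intact.
    """
    if dynamic:
        peak = max(rgb)
        if peak > 0:
            gain = min(gain, WHITE / peak)
    return tuple(min(WHITE, channel * gain) for channel in rgb)


sky = (100, 150, 200)                      # bluish pixel in a corner of the frame
corner_gain = static_gain(0, 0, 5, 5)      # strongest correction, at the corner
print(correct_pixel(sky, corner_gain, dynamic=False))
print(correct_pixel(sky, corner_gain, dynamic=True))
```

With the static map, green and blue both clip to the white level and the sky loses its colour; with the limited gain, the pixel is brightened as far as it can go while staying blue.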
Low light photography and aggressive noise reduction
Idrees Patel: So one thing I wanted to ask about was low light photography. Like in the past few years, there have been a lot of [OEM-implemented] night modes, but one thing I’ve been noticing is that many device makers go for aggressive noise reduction, which reduces detail, to the point where even luminance noise is removed.
So my question is: is Qualcomm advising device makers not to do that, and is it something that their processing pipelines do, or is it something influenced by the ISP in the SoC?
Judd Heape: A lot of that has to do with tuning. If you don’t have multi-frame, or, I would say, if a very good image sensor with high sensitivity or optics with low f-numbers is not available, one way to get rid of noise in low light in particular is to apply more noise reduction. But what happens when you apply more noise reduction is that you lose details, so sharp edges become blurry. Now, you can get rid of that if you apply these multi-frame techniques. Or if you apply AI techniques, which can sort of figure out where edges of objects and faces are, and that sort of thing. So applying just brute force noise reduction in this day and age is not really the way to handle it because you end up losing detail.
What you want to do is do multi-frame techniques or AI techniques so that you can still apply noise reduction to more like interior areas of objects while keeping nice clean edges or keeping sharp edges on objects. So that’s what I would say: using either AI or multi-frame is the way to do the noise reduction and improve imagery in low light going forward.
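The trade-off between brute-force noise reduction and edge-preserving noise reduction can be seen even in one dimension. The Python sketch below is a classical stand-in, not the AI or multi-frame approach Judd recommends: it compares a plain three-tap box blur with a thresholded version that ignores neighbours differing too much from the centre pixel. The function names and the threshold of 20 are assumptions.

```python
def box_blur(signal):
    """Brute-force smoothing: average each sample with its immediate neighbours."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def edge_aware_blur(signal, threshold=20):
    """Average only with neighbours within threshold of the centre sample,
    so flat interiors are smoothed while sharp edges are left alone."""
    out = []
    for i, centre in enumerate(signal):
        window = [v for v in signal[max(0, i - 1): i + 2] if abs(v - centre) <= threshold]
        out.append(sum(window) / len(window))
    return out


row = [10, 12, 10, 100, 102, 100]  # dark region, sharp edge, bright region
print(box_blur(row))
print(edge_aware_blur(row))
```

The box blur drags the samples on either side of the edge toward each other, which is the loss of detail described above; the thresholded version smooths the small fluctuations while keeping the step intact.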
Idrees Patel: Yes, and that’s exactly what I wanted to hear. [It’s] because that’s the main thing that separates great smartphone cameras from middle-tier or budget-tier cameras.
Judd Heape: Yeah.
Idrees Patel: Great smartphone cameras know when to apply noise reduction and when not to.
Judd Heape: Exactly. Yeah, and like I said, the camera tuning is really done by our customers or OEMs, and some OEMs prefer a softer image with less noise. Some prefer to reveal more detail with maybe a little bit more noise.
And so it’s a trade-off and so you have limitations. And it’s like I said the best thing to do, is [to] get a better image sensor with higher sensitivity, bigger pixels or lower f-number optics, because then you get more light in from the start, this is always better. But if you can’t do that, then instead of just cranking up the noise reduction and losing detail, what you want to do is to use multi-frame or AI techniques.
AV1 decoding and encoding
Mishaal Rahman: So this is kind of a little bit separate from the other discussions we’re having about camera quality. One of the things that some people in the open source media codec community have been wondering is when Qualcomm will support AV1 decoding and possibly encoding. I know that one is a little bit of a stretch but Google is requiring 4K HDR and 8K TVs on Android 10 to support AV1 decoding and Netflix, YouTube, they’re starting the rollout of videos encoded in AV1. So it looks like a slow uptick of AV1 encoded videos. So we’re wondering when at least the decoding support will be available in Spectra.
Qualcomm’s statement: Per your question on AV1 – we have nothing to announce today. However, Snapdragon is currently capable of AV1 playback via software. Qualcomm is always working with partners on next-generation codecs via software and hardware making Snapdragon the leader in HDR codecs including capture and playback in HEIF, HLG, HDR10, HDR10+, and Dolby Vision. Of course, we realize to bring the best CODEC experiences to our customers, including support of high resolution and lowest power, that implementing these in HW is desired.
Video recording – motion compensation
Mishaal Rahman: So I don’t know if Idrees has any more questions but I did have one question about something that I read back at the Snapdragon Tech Summit. It’s about the motion compensated video core. I heard that there’s like improvements in the motion compensation engine, to reduce the noise when video recording. I was wondering if you can expand on what exactly it’s been improved and what’s been done.
Judd Heape: The EVA (Engine for Video Analytics) engine has been improved with a more dense motion map core so that the EVA engine, you know, for example is always looking at the incoming video and it has a core in there that’s doing motion estimation. What we’ve done is we’ve made that core a lot more accurate where it does it on almost a per pixel level rather than kind of like a more coarse block level and so we’re getting a lot more motion vectors out of the EVA engine in Snapdragon 865 than we did in previous generations. And what that means is that the video core doing encoding can use those motion vectors to be more accurate about the encode, but the ISP on the camera side also uses that information for noise reduction.
So as you know, for generations we’ve had motion compensated temporal filtering, which is really the active noise reduction during video, which averages frames over time to get rid of noise.
The problem with that technique, though, is if you have movement in the scene. Movement ends up just getting rejected from noise reduction because it can’t be handled or it gets smeared, and you get these ugly trails and artifacts on moving things. So, in motion compensated temporal filtering, what we’ve done in the past, since we didn’t have this dense motion map for local motion, is simply handle only the cases where you’re moving the camera, which is quite easy because everything’s moving globally.
But if you’re shooting something and you have an object moving WITHIN the scene, what we did before [was that] we just ignored those pixels because we couldn’t process them for noise, because it was a locally moving object. And therefore, if you averaged frame-by-frame, the object was in a different place every frame so you couldn’t really process it.
But on Snapdragon 865, because we have the more dense motion map and we have the ability to look at the motion vectors on almost a pixel by pixel basis, we’re actually able to process those locally moved pixels frame by frame for noise reduction, whereas before we couldn’t. I think I mentioned a metric in the talk. I don’t remember the number (it was 40%) but it was a large percentage of pixels on average for most videos that can now be processed for noise whereas in the previous generation, they couldn’t be. And that’s really in part due to having the ability to understand local motion and not just global motion.
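A one-dimensional Python sketch shows why per-pixel motion vectors matter for temporal filtering. It is a simplified illustration under our own assumptions (the function name, the fixed blend factor alpha, and the use of None for pixels without a reliable vector are all hypothetical), not the Snapdragon pipeline. Each pixel of the current frame is blended with the pixel of the previous frame that it came from, as indicated by its motion vector.

```python
def temporal_filter(prev, curr, motion, alpha=0.5):
    """Blend each pixel of curr with the pixel of prev it came from.

    motion[i] is the displacement (in pixels) of pixel i since the previous
    frame; None marks pixels with no reliable vector, which are left
    unfiltered. alpha is the weight given to the previous frame.
    """
    out = []
    for i, value in enumerate(curr):
        d = motion[i]
        src = None if d is None else i - d
        if src is None or not 0 <= src < len(prev):
            out.append(float(value))
        else:
            out.append(alpha * prev[src] + (1 - alpha) * value)
    return out


prev = [10, 10, 50, 10, 10]   # bright object at index 2
curr = [12, 12, 12, 52, 12]   # object moved one pixel right, plus noise

no_local_motion = [0, 0, 0, 0, 0]       # every pixel treated as static
per_pixel = [0, 0, None, 1, 0]          # object tracked; uncovered pixel skipped

print(temporal_filter(prev, curr, no_local_motion))
print(temporal_filter(prev, curr, per_pixel))
```

Without local motion information, the object is averaged with the background and leaves a ghost where it used to be. With a per-pixel vector, the object is averaged with its own earlier position, so it gets the same noise reduction as the static background.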
Video recording – HDR
Idrees Patel: Another question I have is about HDR video. This year, I am seeing many more device makers offer HDR10 video recording. So is it something that was promoted with the Snapdragon 865, or has it been there for a few generations?
Judd Heape: Oh yeah, so as we talked about it at Tech Summit, we’ve had HDR10, which is the video standard for HDR on the camera encode side for a few generations now, since Snapdragon 845, I believe, and we’ve constantly improved that.
So last year, we talked about HDR10+, which is 10-bit HDR recording, but instead of with static metadata it has dynamic metadata, so the metadata that’s captured by the camera during the scene is actually recorded in real time, so that when you play it back the playback engine understands if it was a dark room or a bright room, and it can compensate for that.
We also last year at Tech Summit talked about Dolby Vision capture, which is Dolby’s alternative to HDR10+. It’s very similar where they actually produce the dynamic metadata as well. So Snapdragon today can support all three of these formats: HDR10, HDR10+, and Dolby Vision capture. And so there’s really no constraint, our OEMs can choose whichever method they prefer. We’ve had customers using HDR10 for a while now, and we have last year and this year more and more customers picking up HDR10+. And I think in the future, you’ll see some adoption of Dolby Vision Capture as well.
So yeah, we’ve been promoting that heavily. HDR is really important to us, both on the snapshot side and on the video side. And like I said, we’ve been committed to the HDR10 and HDR10+ and now Dolby Vision formats, you know since Snapdragon 845 and now even recently Snapdragon 865 for Dolby Vision.
Mishaal Rahman: Also, I actually wasn’t sure if any vendors implemented Dolby Vision recording yet, but I guess that answers that question. [That’s] something we’ll see in the future.
Judd Heape: Of course – I can’t comment on which vendors are interested and that sort of thing. That would be a question for Dolby; it’s their feature and so if you wanted more information about that, I would suggest contacting Dolby. But to date, as far as I know, there’s been no handset that’s yet come out with Dolby Vision Capture.
Idrees Patel: Because you need display support as well. I’ve noticed that smartphone displays support HDR10 and HDR10+ but not Dolby Vision.
Judd Heape: Yeah actually, but Dolby Vision playback has been supported on Snapdragon in the past. It can work with a given display and the display doesn’t have to necessarily meet any specific criteria to be Dolby Vision compliant except that Dolby will grade the display and make sure that it has a certain color gamut, gamma, a certain bit depth, a certain brightness and a certain contrast ratio.
So, you know, you can buy an HDR10 display, but you can also buy a handset that supports Dolby Vision playback, but Dolby will have qualified that display to make sure it’s compliant with their strict requirements.
Collaboration with software vendors: Imint, Morpho, and Arcsoft
Mishaal Rahman: I guess just one question for me to follow up on, to do more research with, is about one company that we’ve talked to recently, Imint. They recently upgraded their Vidhance Stabilization software to work with the Spectra 480. I know you guys work with a lot of companies who also take advantage of the Spectra 480, the processing. I’m wondering if you’re able to disclose more examples of these technologies that have – or the partners that you’ve worked with, just so [it’s] something we could follow up on, to learn more about how Spectra 480 is being used in the field.
Judd Heape: We work with a lot of software vendors. Like what we mentioned in the past, Dolby is one of them. There are other ones like you mentioned, Imint/Vidhance for EIS (Electronic Image Stabilization). We also mentioned Morpho and Arcsoft before, we work with them very closely as well.
As far as how we work with them though, our policy is that we really want to work really closely with these independent software vendors and make sure that whatever they’re doing in software, that they’re able to leverage the hardware in Snapdragon to get the lowest power consumption possible.
So one of the things that we’re doing with these vendors is we’re making sure they have really good access to the HVX engine, or the Hexagon DSP core. They’re also using the EVA engine to get motion vectors and to use the hardware in the EVA engine for image manipulation so that they can perform image movement, translation and de-warping and that sort of thing in hardware rather than using the GPU to do that.
And so, we really work closely with these ISVs, especially the ones I mentioned in particular, to make sure that they’re not just putting everything in software on the CPU but they’re using things like the DSP and hardware accelerators in the EVA to get better performance and lower power consumption. So that’s really important to us as well because it gives our customers the best possible mixture of features and power consumption.
[Closing Comments from Judd]: I just wanted to say, thank you guys for all the really good questions. They’re really, really detailed. I’ve been at Qualcomm for about three years now and looking at our past, even beyond my tenure here where we started on Spectra before Snapdragon 845, we worked really hard to dramatically improve the ISP, and the camera, and just the overall experience over the past several years. I’m really excited about even what the future brings. And I’m excited about what we’ll announce at future Tech Summits that you guys can get to ask and write about. [Spectra Camera], probably, in my opinion, is one of the most exciting technologies at Qualcomm.
It was great to have a discussion with Judd about Qualcomm’s contributions to smartphone photography. We can have mixed feelings about the company and their patent licensing system, but Qualcomm’s mark on the smartphone industry is felt by everyone, whether you talk about patents, 4G and 5G, Wi-Fi, the Adreno GPUs, the Spectra ISPs, and the Snapdragon chips themselves, which are largely held to be the gold standard in the Android smartphone market.
There are still many pain points that have to be resolved in smartphone photography, but the future is bright as Qualcomm promises to make more advancements in the vast, growing fields of ML, which powers AI. Let’s see what Qualcomm has to announce in this field at the next Snapdragon Tech Summit.