Apple has been quietly working on generative AI for quite a while now, something company CEO Tim Cook confirmed in a recent earnings call. Nobody really knows what that means just yet, but a recent research paper published without much fanfare by engineers at the company can give us some clues. Apple's MM1 is a family of multimodal language models with up to 30 billion parameters, and it also comes in mixture-of-experts (MoE) variants that go up to 64 billion parameters.

What's especially interesting is that the researchers also tested a 3 billion parameter model, something that is more than capable of being run locally on a device with a modest amount of RAM. This would be perfect for powering an LLM that could supercharge Siri or even replace it entirely.
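To put "a modest amount of RAM" in rough numbers, here is a minimal back-of-the-envelope sketch. The precision levels (16-bit, 8-bit, 4-bit) are assumptions for illustration and do not come from Apple's paper, and the estimate covers the model weights only, not the image encoder, activations, or any runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical precision levels; the paper does not say how a shipped model would be stored.
for bits in (16, 8, 4):
    print(f"3B parameters at {bits}-bit: {weight_memory_gb(3e9, bits):.1f} GB")
```

At 16-bit precision the weights alone come to about 6 GB, which is why smaller models and lower-precision formats matter so much for phones.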

MM1 compared to competitors

The research paper gives us some clues of what to expect

In the research paper, the researchers demonstrate how MM1 can reason using images and follow instructions. In one example, MM1 is compared to Emu-Chat-37B and LLaVA-NeXT-34B to analyze beers on a table and calculate the price. MM1 is the only one to get the answer right, and can explain its reasoning for that in a simple manner.

What's especially interesting about the smaller models, though, is that these researchers claim that the 3B and 7B parameter models of MM1 outperform all competing similarly-sized models. From the paper:

On average, MM1-3B-Chat and MM1-7B-Chat outperforms all listed models of the same size, setting a new state of the art for these model sizes

Apple has also built a 3B parameter model using a Mixture of Experts, which enhances its performance further. We first saw Mixture of Experts come to the mainstream with Mixtral 8x7B, which delivered performance comparable to GPT-3.5 in a model with about 47B total parameters, only a fraction of which are used for any given token. That's also why MoE inference is typically quicker than running a dense model of the same total size, giving further advantages.
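The trade-off with Mixture of Experts is easier to see with some arithmetic: memory scales with the total parameter count, while per-token compute scales with the parameters that are actually active. The sketch below is only illustrative. The split between shared and per-expert parameters is a set of hypothetical round numbers chosen to land near Mixtral's roughly 47B total, and is not an official breakdown of Mixtral or MM1.

```python
def moe_param_counts(shared: float, per_expert: float, num_experts: int, top_k: int):
    """Return (total, active-per-token) parameter counts for a simple MoE layout."""
    total = shared + num_experts * per_expert   # every expert must sit in memory
    active = shared + top_k * per_expert        # only top_k experts run per token
    return total, active


# Hypothetical figures: 8 experts, 2 routed per token, as in an 8x7B-style model.
total, active = moe_param_counts(shared=1.6e9, per_expert=5.6e9, num_experts=8, top_k=2)
print(f"total: {total / 1e9:.1f}B, active per token: {active / 1e9:.1f}B")
```

In other words, the hardware has to hold all of the experts, but each token only pays the compute cost of a much smaller model.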

In general, MM1 appears to compete favorably with the likes of LLaVA, Gemini Nano, and Emu2, particularly when it comes to Visual Question Answering, or VQA. This refers to the model's ability to answer questions about visual inputs, including identifying an image's contents and reasoning based on those contents. For smaller models that are designed to run on mobile devices especially, that's a massive advantage.

MM1 appears to be a strong performer for a number of reasons, but it's unclear if Apple will end up rolling it out to its own devices. Reports from Reuters suggest that the company is currently in talks with Google to utilize its Gemini technology, meaning that MM1 may simply be an internal test at this point. It may be that a future MM2 or MM3, for example, could be used in a future device, but the truth is that we never know with Apple.

What's especially interesting is that Macs and iPhones both have NPUs capable of running an LLM on-device, and Macs are some of the few computers that actually do. Intel and AMD are only starting to focus development on NPUs as part of their chipsets, but Apple's been here ever since its M1 chip first launched. It has a head start in that department, which may help with whatever Apple plans to do in the future.

Siri is in desperate need of an upgrade

It's always been worse, but now it's further behind

Siri has always been the worst of the digital assistants, but as competitors like Google grow, it falls even further behind. Generative AI is the next frontier of on-device assistants, and with companies like Samsung and Google growing their AI offerings on-device, Apple likely doesn't want to fall behind. The company has always marched to the beat of its own drum, but it still feels some outside pressure, and I imagine that its on-device offerings are really starting to show their age versus Copilot and Gemini.

It's hard to say when MM1 will come to a device, or whether a successor will instead. Apple hasn't launched this language model yet; the company has merely published its research paper with results from testing it. It may never see the light of day, and in true Apple fashion, it's possible that's exactly what happens. One thing is for sure though, and that's that Apple wants to be a key player in a growing industry.