Qualcomm says it’s working with Meta to optimize its LLaMA AI models to run on-device.
In the tweet announcing the effort, Qualcomm lists ‘XR’ as one of the device categories.
LLaMA is Meta’s family of open source large language models (LLMs), built on a transformer architecture similar to that of OpenAI’s closed source GPT series.
This week Meta released LLaMA 2, which benchmarks show outperforms all other open source large language models and even comes close to OpenAI’s GPT-3.5, the model powering the free version of ChatGPT.
Getting large language models to run at reasonable speeds on mobile chipsets, though, would be an enormous challenge, and may not happen any time soon – especially in VR, where the system also needs enough overhead to run tracking and rendering at a minimum of 72 frames per second.
Running even the smallest variant of LLaMA 2, the 7 billion parameter model, requires 28GB of RAM at full precision. Lately, tinkerers have been experimenting with running LLMs at lower precision (quantization), requiring as little as 3.5GB of RAM, but this significantly degrades output quality, and it still demands considerable CPU and/or GPU resources.
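The figures above follow from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A minimal sketch of that estimate (the function name is illustrative, and it ignores activations, context cache, and runtime overhead):

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate RAM (in GB, 10^9 bytes) needed just to store model weights."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS_7B = 7e9  # LLaMA 2's smallest variant

print(weight_memory_gb(PARAMS_7B, 32))  # full precision (32-bit floats): 28.0 GB
print(weight_memory_gb(PARAMS_7B, 4))   # 4-bit quantized: 3.5 GB
```

In practice an inference runtime needs additional memory beyond the weights themselves, so these numbers are a floor, not a budget.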
If Qualcomm and Meta can eventually manage to get a LLaMA model running on a Quest headset it would open up a range of breakthrough use cases.
It could enable truly next-generation NPCs: virtual characters you could actually hold a conversation with, and interact with to discover information in a game or experience. That could spark entirely new genres of experiences in headsets, more like Star Trek’s holodeck and less like current video games.
But still, there’s no indication that will be possible on-device any time soon. We’ve reached out to Meta and Qualcomm to ask for more specifics about their new partnership, and will update this article if we get a response.