That’s just not true. Most requests are handled on-device. If the system decides a request should go to ChatGPT, the user is prompted to agree and no data is stored on OpenAI’s servers. Plus, all of this is opt-in.
Literally impossible.
“Hey Siri, what’s the weather forecast for tomorrow.”
< The Farmer’s Almanac that is in my local model says it will rain tomorrow. >
I think there’s a larger picture at play here that is being missed.
Getting the weather is a standard feature for years now. Nothing AI about it.
What is “AI”, though, is this:
Hey Siri, what is the weather at my daughter’s recital coming up?
The AI processing, calculated on-device if what they claim is true, is:
The determination of who your daughter is.
What is a recital? An event? Are there any upcoming calendar events that match this concept?
Is the “daughter” associated with this event by description or invitation? Yes? OK, what’s the address?
Submit the zip code of the recital calendar event involving the kid to the weather API, and churn out a reply that includes all this information…
Well {Your phone contact name}, it looks like it will {remote weather response} during your {calendar event from phone} with {daughter from contacts} on {event date}.
That is the idea behind the split between on-device and cloud processing. The phone already has your contacts and calendar, so it does that work offline rather than educating an online server about your family, events, and location, and it requests the bare minimum from the internet: in this case nothing more than if you had opened the weather app yourself and put in a zip code.
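To make that concrete, here’s a rough sketch of the split being described. Every type, name, and the weather endpoint below are invented for illustration; the point is only which parts can stay on the phone and which single request has to go out:

```swift
import Foundation

// Illustrative sketch only: none of these types or endpoints are Apple's actual API.

struct Contact {
    let name: String
    let relationship: String
}

struct CalendarEvent {
    let title: String
    let date: Date
    let zipCode: String
    let attendees: [Contact]
}

// On-device: resolve "my daughter's recital" using only data already on the phone.
func resolveRecitalEvent(contacts: [Contact], calendar: [CalendarEvent]) -> (CalendarEvent, Contact)? {
    guard let daughter = contacts.first(where: { $0.relationship == "daughter" }) else { return nil }
    guard let event = calendar.first(where: { event in
        event.title.localizedCaseInsensitiveContains("recital")
            && event.date > Date()
            && event.attendees.contains(where: { $0.name == daughter.name })
    }) else { return nil }
    return (event, daughter)
}

// The only thing that leaves the device: a bare weather lookup for a zip code and a date,
// the same request the weather app would make if you typed the zip code in yourself.
func fetchForecast(zipCode: String, date: Date) async throws -> String {
    let url = URL(string: "https://example.com/forecast?zip=\(zipCode)&t=\(Int(date.timeIntervalSince1970))")!
    let (data, _) = try await URLSession.shared.data(from: url)
    return String(decoding: data, as: UTF8.self) // e.g. "rain showers"
}

// Back on-device: fill the reply template with local data plus the one remote value.
func composeReply(user: String, event: CalendarEvent, daughter: Contact, forecast: String) -> String {
    let day = event.date.formatted(date: .abbreviated, time: .omitted)
    return "Well \(user), it looks like it will \(forecast) during your \(event.title) with \(daughter.name) on \(day)."
}
```

The only outbound request carries a zip code and a date; who your daughter is and what the event is never leave the phone.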
Voice processing is AI and was done on Apple’s servers. Previously, only the wake phrase “Hey Siri” was handled locally. Onboard AI chips will allow this to be local. The actual queries will go to the servers. Phones do not have the power to run a useful LLM locally, at least not with the near-instantaneous response times phone users expect. A 56-watt M3 Max with 128 GB of RAM does around 8.5 tokens/second.
https://www.nonstopdev.com/llm-performance-on-m3-max/
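To put that number in perspective (rough back-of-the-envelope math on my part, not from the linked article): at 8.5 tokens/second, even a short two-sentence reply of roughly 60 tokens would take about seven seconds to generate, before any speech synthesis, which is nowhere near the instant response people expect from Siri.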
Perhaps this is why these features will only be available on the iPhone 15 Pro/Pro Max and newer? Gotta have those latest and greatest chips.
It will be fun to see how it all shakes out. If the AI can’t run most queries on the phone with all this advertising of local processing…there’ll be one hell of a lawsuit coming up.
EDIT: Finished looking for what I thought I remembered…
Additionally, Siri has been locally processed since iOS 15.
https://www.macrumors.com/how-to/use-on-device-siri-iphone-ipad/
I’m not guessing. I linked to the article about the M3, which is much more powerful than the A17 Pro in the 15 Pro and has the same NPU.
Forgive me, I’m no AI expert, so I can’t say exactly how many tokens per second the average Siri query would actually need, but I will say this:
Even in your article, only the largest model ran at around 8 tokens/second; the others ran much faster, and none of them were optimized for a specific task, they were just being benchmarked.
Would it be impossible for Apple to run a model optimized for the expected mobile tasks, and to leverage their own hardware more efficiently than we can, to meet their needs?
I imagine they cut out most worldly knowledge and use a lightweight model, which is why there is still a need to hand some requests off to ChatGPT or Apple’s servers. Would that let them trim Siri down to perform well enough on phones for most requests? They also advertised launching this on M1 and M2 chip devices, which are not M3 Max either…
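If I’m reading their setup right, the shape would be something like this (entirely my own guess at the routing, with invented names, not anything Apple has published):

```swift
// A guess at the shape of the design: a trimmed-down local model handles
// personal-context queries, and anything needing broad world knowledge is
// handed off, with the user's consent, to a bigger cloud model.

enum QueryRoute {
    case onDevice      // lightweight local model: intents, contacts, calendar, device actions
    case cloudHandoff  // larger remote model (e.g. ChatGPT) for open-ended world knowledge
}

func route(needsWorldKnowledge: Bool, userConsentsToCloud: Bool) -> QueryRoute {
    // "What's the weather at my daughter's recital?" needs no world knowledge, so it stays local.
    // "Explain the history of the Ottoman Empire" would be handed off, if the user agrees.
    (needsWorldKnowledge && userConsentsToCloud) ? .cloudHandoff : .onDevice
}
```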
The “AI” parts are what they’re saying happens on the device. This isn’t a gotcha.
Literally not what people are talking about. It’s the “AI” part of the task that doesn’t leave the device (unless it prompts to ask ChatGPT). Not that it can magically glean live info without making any request to the web…
Jeeze, fucking… get your shit straight, making me defend Apple… Fucking do better.