AI is leaving the cloud: the data behind the shift to on-device compute (2024–2029)

June 14, 2026 · 6 min read

For fifteen years, compute moved one way: into the cloud. Now the direction is reversing. A growing share of AI inference runs not on remote servers but on the user's own hardware: a laptop, a phone, or even the browser. This piece takes a neutral look at how real that shift is, what the largest analyst forecasts show, and the technical and economic reasons behind it. Every figure is attributed to a primary source.

What "on-device AI" means

To be precise, let's separate the terms:

  • On-device (edge) inference — the model runs on the device itself; data isn't sent to an external server.
  • NPU (Neural Processing Unit) — a dedicated block in the processor that accelerates AI at low power. The arrival of NPUs in mainstream hardware is what made local AI practical.
  • SLM (Small Language Model) — a compact language model (a few billion parameters) that can run on a phone or laptop without reaching a data center.

"AI is leaving the cloud" doesn't mean abandoning the cloud entirely — it means the center of gravity is moving: more tasks are handled locally, while the cloud remains for the heavy cases.
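The hybrid split described above can be pictured as a simple routing rule: handle a request on the device by default and escalate to the cloud only for heavy cases. The sketch below is a minimal illustration under stated assumptions, not how any particular platform decides. The `Request` fields, the `LOCAL_TOKEN_LIMIT` threshold and the `choose_backend` function are all hypothetical, and real products use their own criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    LOCAL = "on-device"
    CLOUD = "cloud"


@dataclass
class Request:
    prompt_tokens: int           # size of the input the model must read
    needs_world_knowledge: bool  # e.g. open-ended questions a small model may not cover
    online: bool                 # whether a network connection is available


# Hypothetical cutoff: above this, assume the small local model is a poor fit.
LOCAL_TOKEN_LIMIT = 4_000


def choose_backend(req: Request) -> Backend:
    """Prefer the device; send only the heavy cases to the cloud."""
    if not req.online:
        # No connection: the local model is the only option, even for heavy requests.
        return Backend.LOCAL
    if req.prompt_tokens > LOCAL_TOKEN_LIMIT or req.needs_world_knowledge:
        # Heavy case: escalate to a larger cloud model.
        return Backend.CLOUD
    # Default path: the request and its data stay on the device.
    return Backend.LOCAL


short_note = Request(prompt_tokens=300, needs_world_knowledge=False, online=True)
backend = choose_backend(short_note)
```

The order of the checks is a design choice: connectivity is tested first, so the app degrades to the local model rather than failing when offline.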

Data has already moved to the edge

The broadest indicator is where data gets created and processed at all. Back in 2019, less than 10% of enterprise data was created and processed outside a traditional centralized data center or cloud. Gartner forecast that this share would reach 75% by 2025.

Share of enterprise data created and processed outside a traditional data center, %

Source: Gartner — What Edge Computing Means for Infrastructure and Operations Leaders. 2019 vs. forecast for 2025.

This is the infrastructure backdrop: data physically moves closer to its source — the person and their devices. AI inference follows data by the same logic as any processing: computing where data is born is cheaper and faster than shipping it across the network.

Hardware: AI PCs and NPUs become the default

The main practical driver is the mass arrival of NPUs in consumer hardware. So-called AI PCs (computers with a dedicated neural accelerator) are turning from a premium niche into the standard configuration within two or three years.

Per IDC, AI PCs make up about 40% of worldwide PC shipments in 2025 and approach 60% by 2027. Counterpoint Research forecasts that NPU laptops will cross half of global shipments as soon as 2026.

AI PC (with NPU) share of worldwide PC shipments, %

Sources: IDC — Worldwide PC Forecast, Counterpoint Research — AI PCs to surpass half of global shipments in 2026, Computerworld.

Shipments measure only new machines. The installed base, the computers actually in use, is even more telling: by IDC's estimate, the AI PC share of that base grows from 5% in 2023 to 94% by 2028. Within a few years, an "ordinary" computer will be able to run AI locally by default.

AI PC share of the active installed base of personal computers, %

Source: IDC. Installed-base estimate, 2023 and forecast for 2028.

Silicon and smartphones: toward local AI by default

At the silicon level the trend is even sharper. Gartner forecasts that by 2029, integrated on-device AI will be present in more than 99% of PC microprocessors — up from roughly 15% in 2024. A neural accelerator stops being an option and becomes part of the processor's baseline architecture.

Share of PC microprocessors with integrated on-device AI, %

Source: Gartner — Top Predictions for IT Organizations and Users. 2024 and forecast for 2029.

Smartphones point the same way. Gartner expects that by 2027 compact models will run advanced generative AI directly on the phone with no cloud reliance. And it's not just a forecast — the largest platforms already ship such products:

Platform | What runs on the device
Apple Intelligence | The core model runs locally; only some heavy tasks go to the cloud (Private Cloud Compute)
Microsoft Copilot+ PC | A PC class with a mandatory NPU; a range of AI features run locally
Google Gemini Nano | The model runs fully on-device and offline; Android is the first mobile OS with a built-in on-device model

When the three companies whose platforms reach billions of people all move compute onto the device at the same time, on-device AI stops being a niche idea and becomes the direction of the whole industry.

Why this shift is durable

Several independent forces drive the trend — and none of them look temporary:

  1. Privacy as a requirement. When inference runs locally, the data being processed never leaves the device. For medicine, finance, law and personal notes, that is a legal and reputational matter, not just a convenience. Google explicitly calls Gemini Nano the most private option precisely because the data doesn't go to its servers.
  2. The economics of inference. Renting someone else's GPUs means paying for every request, so costs grow in step with usage. A device the user has already bought runs inference with essentially no marginal cost per request, beyond the energy it uses.
  3. Latency and offline use. A local model responds without a network round trip and keeps working without a connection, which is critical for real-time assistants.
  4. Hardware caught up. With NPUs becoming standard in new laptops and phones, the main technical barrier is gone: five years ago, running models locally was expensive and slow.
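The economic argument is essentially arithmetic: a provider's cloud bill grows in proportion to users and requests, while requests served on the device drop out of it. The sketch below makes that explicit. Every name and input in it (`monthly_cloud_bill`, the per-request price, the 30-day month, the 80% local share) is a hypothetical placeholder rather than a measured figure, and it deliberately ignores fixed costs on both sides.

```python
def monthly_cloud_bill(
    users: int,
    requests_per_user_per_day: float,
    price_per_request: float,
    local_share: float = 0.0,
    days: int = 30,
) -> float:
    """Provider's monthly GPU bill for inference (hypothetical cost model).

    Requests served on the user's device (local_share) add no per-request
    charge for the provider. Fixed costs such as model development, and the
    energy the device itself uses, are left out on purpose.
    """
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    cloud_requests = users * requests_per_user_per_day * days * (1.0 - local_share)
    return cloud_requests * price_per_request


# Hypothetical inputs, not real prices.
base = monthly_cloud_bill(users=1_000, requests_per_user_per_day=10, price_per_request=0.002)
doubled = monthly_cloud_bill(users=2_000, requests_per_user_per_day=10, price_per_request=0.002)
hybrid = monthly_cloud_bill(users=1_000, requests_per_user_per_day=10,
                            price_per_request=0.002, local_share=0.8)
```

Doubling the user base doubles the bill, while serving 80% of requests locally cuts it to a fifth of the all-cloud figure. That linear growth with usage is the scaling problem the economics point describes.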

What it changes

The shift to on-device AI changes not only where models run but the architecture of the products built around them. If data isn't required to leave the device, then a single centralized database holding every user's data stops being inevitable. And such a database is always a double risk: a single point of failure and a single point of access (for an attacker, for a leak, for a third-party data request). The more compute moves to the device, the fewer reasons to pool sensitive data in one place.

For users and companies this means something simple: privacy will increasingly be a consequence of architecture rather than a promise in a policy. And the winners will be products designed for a world where AI lives next to the person, not on someone else's server.


Disclosure: this piece was prepared by the maxOS team. We build call tools with on-device processing, so the topic is close to us. The data belongs to the independent sources listed above; we've tried to present it without distortion.
