The Power Bottleneck: When Edge AI Forces the Move to ASIC

Most teams start with off-the-shelf silicon for a simple reason: it gets you moving. You can build a prototype quickly, prove the concept, and keep your options open while you learn what customers actually want.

Then you add edge AI, and the product stops being “a device that runs code” and becomes “a power budget that must deliver an experience”. That is the pivot point. Once AI is in the loop, power becomes the bottleneck, and off-the-shelf often stops being the best long-term answer.

AI makes power critical, and that is the cleanest reason to move from off-the-shelf to an ASIC

Off-the-shelf is built for flexibility, not your constraints

General-purpose chips and modules are designed to serve lots of customers and lots of applications. That flexibility is why they are such a good starting point. You get mature software ecosystems, known supply channels, and predictable development workflows.

The catch is that flexibility comes with overhead. Off-the-shelf parts often include subsystems you do not need, support broader operating ranges than your device will ever see, and assume a generic architecture that may be a poor fit for your actual duty cycle. In early prototypes, this is fine. In a shipping product where battery life, heat, and size are non-negotiable, that overhead turns into cost.

With edge AI, that cost tends to show up first as power. 

Why AI turns power into the main requirement

A lot of teams assume the challenge with AI is “more compute”. Sometimes it is. But in many real devices, the bigger problem is everything around compute, especially data movement.

Edge AI systems move data repeatedly through sensor interfaces, memory, processing blocks, and communication links. Each transfer costs energy. Each chip-to-chip boundary adds overhead. Each time you wake up a block that was not designed for your exact power modes, you burn budget you never get back.
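To see why data movement dominates, a rough energy budget helps. The sketch below is a back-of-envelope estimate; the per-MAC and per-byte energies and the traffic volumes are illustrative assumptions, not measurements of any particular part. Even so, the off-chip traffic term tends to dwarf the arithmetic.

```python
# Back-of-envelope energy budget for one inference.
# All numbers are illustrative assumptions, not measurements of any specific chip:
# ~0.1 pJ per 8-bit MAC, ~100 pJ per byte fetched from external DRAM,
# ~1 pJ per byte read from on-chip SRAM.
MACS_PER_INFERENCE = 500e6   # e.g. a small vision model (assumed)
DRAM_BYTES = 20e6            # weights and activations spilled off-chip (assumed)
SRAM_BYTES = 200e6           # on-chip traffic (assumed)

E_MAC_PJ = 0.1
E_DRAM_PJ_PER_BYTE = 100.0
E_SRAM_PJ_PER_BYTE = 1.0

compute_uj = MACS_PER_INFERENCE * E_MAC_PJ / 1e6        # pJ -> microjoules
dram_uj = DRAM_BYTES * E_DRAM_PJ_PER_BYTE / 1e6
sram_uj = SRAM_BYTES * E_SRAM_PJ_PER_BYTE / 1e6
total_uj = compute_uj + dram_uj + sram_uj

print(f"compute: {compute_uj:.0f} uJ, DRAM: {dram_uj:.0f} uJ, SRAM: {sram_uj:.0f} uJ")
print(f"data movement share: {100 * (dram_uj + sram_uj) / total_uj:.0f}%")
```

With those assumed numbers, data movement accounts for roughly 98 percent of the energy per inference, which is why trimming the model alone rarely rescues the power budget.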

This is why AI features that look healthy on a dev kit can fall apart in the field. Battery life collapses. Thermal limits trigger throttling. Latency becomes inconsistent across temperature or supply conditions. The board grows because the device needs more power regulation, more memory, more margin. And the design becomes harder to validate because there are more parts and more failure modes.

The core point is simple. AI pushes you into a regime where the physics of power and heat dominates. When that happens, the product is no longer constrained by whether the code runs. It is constrained by whether the system can deliver the AI experience inside the energy and thermal envelope you promised.

Making the Power Cost of Edge AI Concrete

One reason the edge AI power problem is often underestimated is that “edge” sounds small. In practice, published measurements from real devices show that even modest inference workloads can consume multiple watts when implemented on general-purpose platforms.

For example, commonly used edge AI systems operate across a wide power range during inference:

  • The NVIDIA Jetson Orin Nano Series runs in a 7–25 W power envelope on typical workloads, depending on configuration [1].

  • Benchmarking studies show the Jetson Nano consumes approximately 7 W under load, a Raspberry Pi 5 with a Coral TPU around 8.3 W, and the Jetson Orin NX roughly 10.6 W under similar test conditions [2].

  • By contrast, ultra-low-power devices such as Coral Edge TPUs are designed for energy-efficient inference and often run at only a few watts, making them attractive for battery-powered systems [3].

These figures come from publicly accessible benchmarks and vendor documentation. The important point is not the precise number for every product, but that AI inference on edge AI silicon ranges from a few watts to tens of watts depending on architecture and configuration.
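To make those watt figures tangible, here is a quick battery-life calculation. The battery capacity and the ASIC figure are assumptions chosen for illustration; the other numbers echo the benchmarks cited above.

```python
# What a few watts of continuous inference means for a battery-powered product.
# Battery capacity and the ASIC figure are assumptions; the other power numbers
# mirror the published ranges cited above.
BATTERY_WH = 10.0   # e.g. roughly a 2700 mAh pack at 3.7 V (assumed)

platforms_w = {
    "general-purpose module (~7 W)": 7.0,
    "Pi 5 + Coral (~8.3 W)": 8.3,
    "Jetson Orin NX (~10.6 W)": 10.6,
    "workload-specific ASIC (~1.5 W, assumed)": 1.5,
}

for name, watts in platforms_w.items():
    hours = BATTERY_WH / watts
    print(f"{name}: ~{hours:.1f} h of continuous inference")
```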

Edge AI does not automatically mean low energy usage. Architecture matters.

Where the Power Actually Goes

In most general-purpose edge AI platforms, power is dominated by the AI compute and the data movement that feeds it, not by peak arithmetic alone. While sensors and control logic typically draw hundreds of milliwatts to a few watts, the AI accelerator and memory subsystem often account for the majority of system power during active inference [2].

This is why two devices running the same neural network might differ radically in battery life: the difference lies not in the model, but in the memory hierarchy, dataflow, and compute architecture.

Once AI is added, the system’s power profile is no longer shaped by average CPU load. It is shaped by worst-case inference paths and memory traffic patterns.
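A simple duty-cycle model makes the point. All of the numbers below are illustrative assumptions, but they show how quickly short inference bursts come to dominate the average.

```python
# Average system power once inference is in the loop is set by the duty cycle
# of the inference bursts, not by average CPU load. All figures are assumed.
IDLE_W = 0.15        # sensors plus control logic in a sleep-friendly state (assumed)
INFERENCE_W = 8.0    # active power during an inference burst (assumed)
INFERENCE_S = 0.05   # 50 ms per inference (assumed)
RATE_HZ = 2.0        # two inferences per second (assumed)

duty = INFERENCE_S * RATE_HZ                      # fraction of time spent inferring
avg_w = INFERENCE_W * duty + IDLE_W * (1 - duty)
print(f"duty cycle: {duty:.0%}, average power: {avg_w:.2f} W")
# Even at a 10% duty cycle, the bursts dominate:
# 8 W * 0.10 + 0.15 W * 0.90 ~= 0.94 W, roughly six times the idle floor.
```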

The real benefit of an ASIC is removing waste

It is tempting to think of an ASIC as a way to get “more performance”. In practice, the decisive advantage is usually that an ASIC lets you remove waste in the system.

You stop paying for features you do not use. You shorten signal paths. You reduce the energy lost in interconnects and chip-to-chip links. You simplify power management because the chip is designed around your real operating modes, not a generic set. And you integrate functions that would otherwise be spread across multiple devices, reducing component count and board complexity.

That integration matters twice. It reduces power, because there is less overhead pushing bits around the board. It also reduces practical product cost, because fewer components and fewer interfaces usually mean fewer things to buy, fewer things to qualify, and fewer things that can go wrong.

This is where the AI angle becomes the cleanest argument for custom silicon. AI makes power the bottleneck. The best lever you have for power is integration and workload-specific optimisation. ASIC is the mechanism that makes that lever available. 

Speed Versus Power Is the Hidden Tradeoff 

From a silicon perspective, system power follows a familiar relationship:

P ∝ C × V² × f

where C is switching capacitance, V is supply voltage, and f is clock frequency. General-purpose edge AI platforms are built to support a wide range of workloads and peak-performance scenarios. As a result, they often operate at higher voltages and frequencies than a fixed-workload system would require.

By contrast, workload-specific silicon can reduce supply voltage and clock frequency while still meeting latency targets, cutting overall energy per inference. This is why an optimised edge AI ASIC might achieve real-time inference at 1–2 W, while a general-purpose platform could draw 10–20 W or more to meet the same latency in practice [1][2].

The difference is not magic. It is the result of designing for “just enough performance” rather than maximum flexibility.
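To put rough numbers on that, the sketch below applies the C × V² × f relationship to two assumed operating points: a general-purpose part clocked for flexibility, and a fixed-workload design run just fast enough. The capacitance, voltage, and frequency values are illustrative, not datasheet figures.

```python
# Relative dynamic power under P proportional to C * V^2 * f.
# Operating points below are illustrative assumptions, not datasheet values.
def dynamic_power(c_rel, v, f_hz):
    """Relative dynamic power for switching capacitance c_rel, supply v, clock f."""
    return c_rel * v**2 * f_hz

general = dynamic_power(c_rel=1.0, v=1.0, f_hz=1.5e9)   # assumed baseline
custom  = dynamic_power(c_rel=0.6, v=0.7, f_hz=0.6e9)   # less logic, lower V and f

print(f"relative power: {custom / general:.2f}x of the general-purpose baseline")
# 0.6 * 0.49 * 0.4 ~= 0.12, i.e. roughly an 8x reduction before counting any
# architectural gains from reduced data movement.
```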

A sane way to decide: prove the bottleneck, then choose the minimum ASIC that fixes it

The biggest mistake teams make is treating ASIC as a binary jump. Either you stay off-the-shelf forever, or you design a monster chip that tries to do everything.

There is a more practical path. You start by proving where the power is going and what specifically is preventing you from shipping. When you do that honestly, the solution usually becomes clearer. Sometimes the right move is a different off-the-shelf part, or a better board architecture. But when the numbers keep telling the same story, you know you are past the point where firmware optimisation will save you.

Once you have proven the bottleneck, you can scope the ASIC to solve the bottleneck, not to satisfy an abstract dream of custom silicon. That often means focusing on the pieces that drive energy consumption and system complexity: the data path, the memory architecture, the interfaces, and the power management around the duty cycle.

This is also where the economic reality belongs. Custom silicon has upfront cost and lead time. It can be the right decision, but it needs to be the right decision for your expected volume and your product lifetime. The payoff is straightforward: per-unit cost and per-unit power improve when you stop shipping unnecessary capability.
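One way to keep that economic reality honest is a simple break-even model. Every figure below is a placeholder rather than a quote; the structure of the comparison is what matters.

```python
# Break-even sketch for the ASIC decision: NRE amortised against per-unit savings.
# Every number here is an assumption for illustration; substitute your own.
NRE_USD = 2_000_000            # design, IP, masks, bring-up (assumed)
BOM_SAVING_PER_UNIT = 6.00     # fewer chips, smaller board, less regulation (assumed)
EXPECTED_UNITS_PER_YEAR = 250_000
PRODUCT_LIFETIME_YEARS = 4

break_even_units = NRE_USD / BOM_SAVING_PER_UNIT
lifetime_units = EXPECTED_UNITS_PER_YEAR * PRODUCT_LIFETIME_YEARS

print(f"break-even at {break_even_units:,.0f} units")
print(f"expected lifetime volume: {lifetime_units:,} units")
print("worth modelling further" if lifetime_units > break_even_units
      else "stay off-the-shelf for now")
```

If the expected lifetime volume does not comfortably clear the break-even point, off-the-shelf is still the right call, as discussed further below.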

How to save money and reduce risk: start from a reference design

If you want the ASIC path to be a cost saver rather than a science project, do not start from a blank sheet.

Starting from a reference design changes the character of the effort. You are no longer inventing the entire architecture while also trying to meet schedule. You are taking a known-good foundation and customising the pieces that actually matter for your product. That means fewer unknowns in integration, earlier validation, and fewer late-stage surprises.

It also lets you pick your battles. If the power bottleneck is dominated by a particular data path or interface set, you can focus your custom work there. If the differentiator is security or a specific communications stack, you can scope around that. The point is to avoid paying NRE to customise parts of the chip that do not move the outcome.

This is how teams make the economics work. They use a reference design to compress the timeline, lower the number of new decisions, and reduce the risk of expensive rework.

The factor that is now impossible to ignore: geopolitics and compliance

A decade ago, many teams treated geopolitics as someone else’s problem. Today, if your product lifecycle is measured in years, supply stability and compliance are engineering inputs.

Export controls, shifting trade relationships, and regional industrial policy can affect where you can source IP, where you can manufacture and test, and which markets you can serve without friction. Even if your device is not a headline-grabber, the ecosystem it depends on is. That matters more for silicon than for most other components, because lead times are long and qualification cycles are slow.

The practical takeaway is not to predict the future. It is to design a program that is resilient to change. That means thinking early about manufacturing options, packaging and test strategy, second-source planning where possible, and a compliance approach that matches the markets you want to serve.

When you make an ASIC decision, you are making a multi-year commitment. The world has a habit of changing mid-commitment.

When off-the-shelf is still the right call

None of this is an argument that everyone should build custom silicon. Off-the-shelf remains the right answer when your AI feature is optional, when your power budget has real headroom, when volumes are too low to justify upfront investment, or when speed to market is the overriding priority.

Off-the-shelf is a superpower for learning. ASIC is a superpower for scaling.

The point is knowing when you have crossed the line. 

The question that decides it

Do not ask “should we build an ASIC”. Ask a more specific question:

Is our AI experience constrained by a power budget that off-the-shelf silicon cannot meet without unacceptable compromises?

If the honest answer is yes, the next step is not to daydream about a custom chip. It is to quantify the bottleneck, scope the minimum silicon change that removes the waste causing it, and reduce risk by starting from a reference design. When you do that, the ASIC path stops being a moonshot and becomes a straightforward engineering decision, driven by the same thing that drives every good product decision: the constraints that actually matter. 

Citations

  1. NVIDIA Jetson Orin Nano Series power envelope (7–25 W typical), from the Jetson datasheet (siliconhighway.com).

  2. Benchmarked power consumption of the Jetson Nano (~7 W), Raspberry Pi 5 + Coral (~8.3 W), and Jetson Orin NX (~10.6 W) (Georgia Southern Scholars).

  3. Comparison of edge AI device power characteristics highlighting Coral (lower power) versus Jetson platforms (rasimmax.com).

 
