What constitutes a voltage regulation module (VRM)? Introduction to Power Delivery for Chips

AI accelerators are demanding ever more power. The thermal design power (TDP) of the Nvidia H100 is 700 watts (W), while the world's most commonly installed data center CPUs, Intel's Skylake and Cascade Lake, have a TDP of less than 200 W. Next-generation chips will require even more power to support higher compute densities. This will push rack-level power beyond 200 kilowatts, compared with the 15-20 kilowatts available in traditional CPU server racks today.

As power increases, more challenges need to be addressed. In particular, higher power results in disproportionately higher transmission and conversion losses, i.e. wasted power. Since power is one of the largest expenses in a data center, reducing power loss is critical to improving total cost of ownership (TCO). As a result, power delivery networks are now being redesigned from the rack level down to the chip level to address this issue in high-power computing workloads such as AI training and inference.

The main purpose of advanced power delivery architectures is to improve efficiency. Today we're going to dive into the technical and competitive landscape of this topic. Power delivery company Vicor has benefited the most from this trend. Over the past decade, Vicor has grown from a supplier of commodity power components into a company involved in advanced data center power applications, with design wins in a variety of hyperscale data center rack-level power solutions and in AI accelerators from Nvidia, Google, AMD, Cerebras, Tesla, and Intel.

However, the power delivery market is dynamic, and Vicor's fortunes have changed rapidly in recent times. Over a year ago, we exclusively discovered and disclosed that Monolithic Power Systems (MPS) had replaced Vicor as the supplier for the Nvidia H100 GPU. The relationship with Vicor's second-largest customer has also become turbulent. In addition, hyperscale data center rack power solutions have changed considerably, with several new competitors entering (MPS, Delta, Renesas, ADI).

The story has been in flux, and Vicor's future role has become a battleground. Recent news, Vicor's lawsuit against a competitor, hyperscale data center deployments, and incredible statements from the company's leadership have given both sides of the debate plenty of ammunition.

Today we will cover the basics of power delivery, Vicor's technology leadership, our evaluation of Vicor's factorized power architecture and vertical power delivery technology, important design wins (including details on whether Vicor is in the H100 or TPUv5), Vicor's potential in the automotive industry, and the long-term impact on the company. We'll also share our thoughts on its four main competitors (MPS, Delta, Renesas, and ADI), as well as the ongoing legal battle.

Introduction to Power Delivery for Chips

Electricity is generated and transmitted on an alternating current (AC) grid at voltages of up to hundreds of thousands of volts. Computing and memory chips require stable, clean, low-voltage direct current (DC) power. Too high a voltage can overload and damage the chip's delicate circuitry. Too low a voltage and the chip's circuits won't switch properly. The transformer, power supply unit (PSU), and voltage regulation module (VRM) are tasked with delivering the correct type of power to the chip. Efficient power delivery becomes more challenging as power demands increase.

In a circuit like a GPU or CPU, there are four main values: power, current, voltage, and resistance. Power (P) is a measure of energy used per unit of time, usually expressed in watts (W). Current (I) is a measure of the rate at which electrons flow, usually expressed in amperes (A). Voltage (V) is the potential difference between two points. You can think of voltage as the pressure pushing electrons through a circuit.

Voltage is usually given in units of volts (V). Finally, we have resistance (R), usually given in ohms (Ω), which indicates how difficult it is for an electric current to flow through a material. To work with these values, we need two relationships. The first is the power equation, P = I * V, which simply means that power equals current times voltage. The second follows from combining the first with Ohm's law (V = I * R): P = R * I^2, which means that power is also equal to resistance times current squared.
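The two relationships can be checked numerically. The short Python sketch below is illustrative only: the function names and the sample values (20 amps at 12 volts, and 5 milliohms of wiring resistance) are assumptions chosen for the example, not figures for any particular product.

```python
def power_watts(current_a: float, voltage_v: float) -> float:
    """P = I * V: power delivered at a given current and voltage."""
    return current_a * voltage_v


def resistive_loss_watts(resistance_ohm: float, current_a: float) -> float:
    """P = R * I^2: power dissipated as heat in a resistance."""
    return resistance_ohm * current_a ** 2


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    current = 20.0   # amperes
    voltage = 12.0   # volts
    wiring = 0.005   # ohms (assumed 5 milliohms of wiring resistance)

    print(f"Delivered power: {power_watts(current, voltage):.1f} W")
    print(f"Wiring loss:     {resistive_loss_watts(wiring, current):.2f} W")
```

Note that doubling the current in `resistive_loss_watts` quadruples the loss, which is the property the rest of the discussion relies on.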

Silicon operates at low voltages of about 1V DC or less. In pursuit of power efficiency, designs are moving to lower clock speeds and lower operating voltages to operate at more efficient portions of the performance/power curve.

However, delivering power at low voltages and high currents results in large power losses (I^2R) from the resistance of the power lines. The key to minimizing power loss is to deliver power at higher voltages and lower currents, then step down the voltage as close to the active silicon as possible.
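To see why higher voltage helps, substitute I = P / V into the loss formula. For a fixed delivered power P and a fixed path resistance R (the symbol P_loss is introduced here only for this derivation):

```latex
P_{\text{loss}} = R \cdot I^{2} = R \left( \frac{P}{V} \right)^{2} = \frac{R \, P^{2}}{V^{2}}
```

For the same load and the same conductor, the loss therefore falls with the square of the delivery voltage.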

What constitutes a voltage regulation module (VRM)?

The VRM is an important combination of components that converts the input voltage from the system power supply unit (PSU) to the correct voltage to power the SoC. Typically, VRMs sit on the PCB that carries the chip, although in some rare cases these components may be located on the chip package itself, or even integrated in the silicon. A modern VRM consists of three main parts: capacitors, inductors, and power stages. A capacitor stores electrical energy and then releases it at a constant rate, smoothing the delivery of power to the processor. Inductors resist changes in current and prevent large current surges from killing the processor.

Finally, and arguably the most important part of the VRM, is the power stage, which converts the input voltage (say 12 volts) from the power supply unit (PSU) to the voltage required by the processor. On a CPU, the required voltage has traditionally been 1.2 to 1.8 volts, while on a GPU or a large FPGA, ASIC, or AI accelerator it ranges from 0.8 to 1.0 volts.

Higher power, lower efficiency

As the voltages used to power SoCs decrease in future architectures and process technologies, the current needs to increase by the same factor that the voltage decreases in order to maintain the same power. For example, let's look at an AMD Genoa CPU that draws 240 watts and operates at 1.2 volts. Stepping the input down from 12 volts to 1.2 volts (a 10x reduction) to feed the chip means that the current needs to increase from 20 amps at 12 volts to 200 amps at 1.2 volts (a 10x increase) to maintain the same power level.

In comparison, consider a 700-watt GPU that operates at 0.8 volts. If you step the input down from 12 volts to 0.8 volts (a 15x reduction) to feed the chip, the current needs to increase from about 58 amps at 12 volts to 875 amps at 0.8 volts (a 15x increase). GPUs draw much higher currents than CPUs, which consume less power. Higher current means higher resistive losses, as the formula P = R * I^2 shows (loss equals resistance times current squared).

As the voltage drops to 0.8 volts, resistive losses worsen dramatically: the current increases by a factor of 15, so for the same resistance the resistive losses grow quadratically, by a factor of 225 (15^2). This illustrates why efficiency loss has become a significant problem in recent generations of data center chips. The problem will only get worse as voltages continue to shrink and packages get larger, with higher demand for advanced packaging.
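The arithmetic for both chips can be reproduced in a few lines. This is a minimal sketch using the power and voltage figures quoted in the text; the function names are hypothetical, and the loss multiplier assumes the same resistance at both voltages, which is a simplification of a real power delivery network. Computed exactly, the 12-volt current for the 700-watt GPU comes out at about 58 amps.

```python
def current_amps(power_w: float, voltage_v: float) -> float:
    """Rearranged P = I * V: current needed to deliver power_w at voltage_v."""
    return power_w / voltage_v


def loss_multiplier(current_before_a: float, current_after_a: float) -> float:
    """Factor by which R * I^2 losses change for a fixed resistance R."""
    return (current_after_a / current_before_a) ** 2


if __name__ == "__main__":
    # (label, power in W, input voltage in V, core voltage in V), from the text.
    chips = [
        ("240 W CPU", 240.0, 12.0, 1.2),
        ("700 W GPU", 700.0, 12.0, 0.8),
    ]
    for label, power, v_in, v_core in chips:
        i_in = current_amps(power, v_in)
        i_core = current_amps(power, v_core)
        print(
            f"{label}: {i_in:.1f} A at {v_in} V -> {i_core:.1f} A at {v_core} V, "
            f"loss multiplier {loss_multiplier(i_in, i_core):.0f}x"
        )
```

The CPU case gives a 10x current increase and a 100x loss multiplier; the GPU case gives 15x and 225x.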

The rise of 48V

To solve this problem, the industry began to use higher input voltages. For a long time, 12-volt direct current (DC) has been the standard voltage provided by power supply units (PSUs). In the past, 12 volts worked well enough because power levels were lower and the resulting efficiency losses were negligible. Efficiency is now taking a double whammy as the industry demands SoCs that are both higher power and lower voltage, since each of these pushes current up. These efficiency losses outweigh the benefits of relatively cheap and ubiquitous 12-volt components.

Going from 12 volts to 48 volts means only 1/4 the current is needed, so losses will be reduced by a factor of 16 (4^2). This is why many companies are starting to switch to 48 volt power delivery networks. But what's the point if you end up going down to 1 volt?
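The factor of 16 can be verified with a small sketch. The 700-watt load is taken from the GPU example earlier in the article; the 1 milliohm path resistance and the function name are assumptions made up for illustration, and real distribution paths will differ.

```python
def distribution_loss_watts(
    power_w: float, bus_voltage_v: float, path_resistance_ohm: float
) -> float:
    """Resistive loss (R * I^2) when power_w is carried at bus_voltage_v."""
    current_a = power_w / bus_voltage_v
    return path_resistance_ohm * current_a ** 2


if __name__ == "__main__":
    LOAD_W = 700.0     # load power, from the GPU example in the text
    PATH_OHM = 0.001   # hypothetical 1 milliohm distribution path

    loss_12v = distribution_loss_watts(LOAD_W, 12.0, PATH_OHM)
    loss_48v = distribution_loss_watts(LOAD_W, 48.0, PATH_OHM)

    print(f"Loss at 12 V: {loss_12v:.2f} W")
    print(f"Loss at 48 V: {loss_48v:.2f} W")
    print(f"Reduction:    {loss_12v / loss_48v:.0f}x")
```

The ratio is independent of the assumed resistance and load: with both held constant, it depends only on the square of the voltage ratio.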

The answer is that the step-down from 48 volts to the SoC voltage happens closer to the chip, so the low-voltage, high-current path is shorter. Longer conductors have more resistance and therefore greater resistive losses. Keeping the input at 48 volts and only stepping it down as close as possible to the point of load therefore lowers overall resistive losses.
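A toy two-segment model makes the placement argument concrete. Everything here is an assumption for illustration: the 30 cm total path, the resistance of 10 micro-ohms per centimeter, the function names, and the idealized lossless converter. Only the 700-watt, 48-volt, and 0.8-volt figures come from the article.

```python
def segment_loss_watts(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """R * I^2 loss for one segment carrying power_w at voltage_v."""
    return resistance_ohm * (power_w / voltage_v) ** 2


def total_loss_watts(
    power_w: float,
    bus_v: float,
    core_v: float,
    total_length_cm: float,
    low_voltage_length_cm: float,
    ohm_per_cm: float,
) -> float:
    """Loss over a path split into a bus-voltage run and a core-voltage run.

    The converter sits low_voltage_length_cm from the chip and is treated
    as lossless, which real converters are not.
    """
    high_voltage_length_cm = total_length_cm - low_voltage_length_cm
    return (
        segment_loss_watts(power_w, bus_v, high_voltage_length_cm * ohm_per_cm)
        + segment_loss_watts(power_w, core_v, low_voltage_length_cm * ohm_per_cm)
    )


if __name__ == "__main__":
    # Hypothetical geometry: 30 cm path, 10 micro-ohms per centimeter.
    for distance_cm in (10.0, 5.0, 1.0):
        loss = total_loss_watts(700.0, 48.0, 0.8, 30.0, distance_cm, 1e-5)
        print(f"Converter {distance_cm:>4.1f} cm from chip: {loss:.2f} W lost")
```

In this model almost all of the loss comes from the low-voltage segment, so shortening that segment dominates the result even though the high-voltage run gets longer.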

Google was the first hyperscale cloud service provider to adopt 48-volt power in its data centers, and around 2016 it pushed to standardize 48-volt power through the Open Compute Project.

The Rise of Vicor

In response, chip companies and original equipment manufacturers (OEMs) put voltage regulation modules (VRMs) with 48-volt inputs on their motherboards. The main beneficiary was Vicor. While an established 48-volt ecosystem already existed in telecom equipment, it used a negative voltage, whereas data centers require positive voltages. Vicor is a major supplier of 48-volt VRMs for computing use cases.

To achieve this change, the power supply unit converts the 380 volts AC received by the racks to 48 volts DC. With data centers providing 48 volts at the rack, server motherboards also started to accept a 48-volt input and step it down directly. Alternatively, for a traditional 12-volt motherboard to work, an intermediate component is needed to step 48 volts down to 12 volts. Basically, you needed a part with either a 48-volt output or a 48-volt input, and Vicor was first to market.

In 2018, Vicor achieved its first win in mainstream commercial silicon with a design win in Nvidia's V100 SXM3 refresh, which used Vicor's 48-volt VRM components. Then came the A100, where the entire product line used Vicor components for the VRM. Google also adopted Vicor's products for its TPU (tensor processing unit) in a similar time frame to the V100. This further cemented Vicor's dominance in the 48-volt space and suggested that Vicor was the future of high-performance power delivery.

However, Vicor was subsequently replaced by Monolithic Power Systems (MPS) in the H100, breaking that winning streak, as first reported by SemiAnalysis. The scoop caused Vicor's stock price to drop more than 20% the day after the release, and another 30% over the following year, largely because Nvidia contributed such a large share of Vicor's revenue. To this day, Vicor has not shipped in significant volume into Nvidia's H100 program, which is ramping at large scale.

Last week, Vicor filed a lawsuit against rivals while its chief executive claimed that its products were re-entering the customer's base platform, resulting in a huge short squeeze. To be clear, the CEO told analysts more than a year ago that Vicor would re-enter the customer's base platform, but the orders have yet to come.