What are the most important specifications for the latest mobile and IoT devices? Performance matters, but battery life and heat management are just as crucial. No matter how powerful a chip is, if it drains its battery in an hour or gets as hot as a hand warmer, it won't survive in the market.
Gone are the days when RTL designers simply verified "functionality" and moved on. Now, designing for power consumption has become a critical skill.
In this article, we will cover Clock Gating and Operand Isolation, two of the most effective low-power design techniques that can be applied at the RTL stage.
1. The Power Consumption Formula: What Can We Reduce?
The power consumption of a semiconductor is broadly defined as the sum of dynamic power and static power.
- Static Power (Leakage): This is the power consumed by current that leaks even when the transistor is turned off. It depends heavily on process technology and voltage (supply and threshold voltage, for example multi-Vt cell selection), making it difficult for RTL designers to control.
- Dynamic Power (Switching): This is the power generated when a signal changes from 0 to 1 or 1 to 0 (toggle). This is a key target that RTL designers must reduce.
The formula for dynamic power is:
$$P_{dynamic} = \alpha \cdot C \cdot V^{2} \cdot f$$
- α (Activity Factor): How often the signal toggles. (A key target for RTL designers!)
- C (Capacitance): Load capacitance
- V: Supply voltage
- f: Operating frequency
Through our RTL code, we should aim to drive unnecessary switching (α) toward 0.
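To get a feel for how much leverage α provides, here is a rough worked example. It assumes the standard dynamic-power expression P = α · C · V² · f, and every number in it is made up for illustration rather than taken from a real chip.

```latex
\begin{align*}
P_{dynamic} &= \alpha \cdot C \cdot V^{2} \cdot f \\
\text{Before: } P &= 0.2 \times (1\,\text{nF}) \times (1.0\,\text{V})^{2} \times (500\,\text{MHz}) = 0.1\,\text{W} \\
\text{After: }  P &= 0.1 \times (1\,\text{nF}) \times (1.0\,\text{V})^{2} \times (500\,\text{MHz}) = 0.05\,\text{W}
\end{align*}
```

With C, V, and f unchanged, halving the activity factor halves the dynamic power. That is why removing unnecessary toggles at the RTL stage pays off directly.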
2. The King of Low Power: Clock Gating
In many digital circuits, the single biggest dynamic power drain is the clock network. The clock tree, spread across the chip, toggles hundreds of millions of times per second and burns power even when the data doesn't change.
(1) Problem situation: MUX Feedback Loop
In RTL, to hold the value of a register, it is often coded as follows:
// Update values only when data is valid, otherwise keep them
always @(posedge clk) begin
    if (load_en) q <= data_in;
    else q <= q; // In reality, a MUX is created and the output (q) is fed back to the input.
end

When this code is synthesized, a Multiplexer (MUX) is created. The problem is that even when load_en is 0 and the value does not change, the clock pin of the flip-flop continues to be supplied with a clock.
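To make the feedback path visible, the same register can be written with the MUX spelled out. This is only an illustrative sketch: the module name reg_mux_feedback and the 32-bit width are assumptions, not something the original snippet specifies.

```verilog
// Illustrative only: the enable-register pattern written as an
// explicit 2:1 MUX in front of a plain flip-flop.
module reg_mux_feedback (
    input  wire        clk,
    input  wire        load_en,
    input  wire [31:0] data_in,
    output reg  [31:0] q
);
    // 2:1 MUX: take new data when load_en is 1, otherwise recirculate q.
    wire [31:0] d = load_en ? data_in : q;

    // The flip-flop sees a clock edge on every cycle, even when d == q.
    always @(posedge clk)
        q <= d;
endmodule
```

Functionally this matches the if/else version. The point is that the clock pin keeps toggling while load_en is 0, even though the stored value never changes.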
(2) Solution: Integrated Clock Gating (ICG)
Clock gating is the process of completely cutting off the clock supplied to the flip-flop when the data does not need to change.
However, simply blocking the clock with an AND gate increases the risk of glitches and malfunctions. Therefore, in practice, a special cell called Integrated Clock Gating (ICG) is used.
- Structure: Latch (Level-sensitive) + AND Gate
- Principle: The latch is transparent while the clock is low and holds the captured Enable value while the clock is high. Even if Enable changes in the middle of the high period, the change cannot reach the AND gate, so no glitch appears on the gated clock.
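The difference between a bare AND gate and the latch-plus-AND structure can be seen in a small simulation. The sketch below is a behavioral model for simulation only, with a hypothetical module name (icg_demo) and made-up timing. Real designs instantiate the ICG cell from the standard-cell library instead of modeling it by hand.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch comparing naive AND gating with ICG-style gating.
module icg_demo;
    reg clk = 0;
    reg en  = 0;

    // Naive gating: en reaches the AND gate directly.
    wire gclk_naive = clk & en;

    // ICG-style gating: the latch is transparent while clk is low
    // and holds its value while clk is high.
    reg en_latched = 0;
    always @(clk or en)
        if (!clk) en_latched <= en;
    wire gclk_icg = clk & en_latched;

    always #5 clk = ~clk;   // 10 ns period, high during 5-10, 15-20, ...

    initial begin
        $monitor("%0t clk=%b en=%b naive=%b icg=%b",
                 $time, clk, en, gclk_naive, gclk_icg);
        #7  en = 1;   // t=7:  en rises while clk is high
        #20 en = 0;   // t=27: en falls while clk is high
        #20 $finish;
    end
endmodule
```

In this run, gclk_naive produces a shortened pulse from 7 ns to 10 ns and cuts another pulse off at 27 ns. gclk_icg only ever emits full-width clock pulses, because en is sampled during the low phase.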
(3) RTL Coding Guide
Fortunately, modern synthesis tools (such as Design Compiler) can insert ICG cells automatically when clock gating is enabled in the flow and the code is written in a gating-friendly style. This is called Auto Clock Gating.
// [Good] Code style that enables Auto Clock Gating
always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
        q <= 0;
    end else begin
        if (load_en) begin // This conditional statement becomes the enable signal of ICG.
            q <= data_in;
        end
        // If there is no else clause, the tool inserts ICG by judging it as 'maintaining the value'.
    end
end

Tip: A single if (en) conditional statement can stop the clock for an entire 32-bit register, resulting in significant power savings.
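Conceptually, after automatic clock gating the register behaves roughly like the structure sketched below. This is a hand-written behavioral approximation for illustration only. The module name gated_reg32 is hypothetical, and the actual netlist depends on the tool and the library's ICG cell. In practice we keep the plain if (load_en) style and let the tool insert the gating.

```verilog
// Illustrative sketch of what auto clock gating roughly produces:
// the enable moves from the data path (MUX) to the clock path (ICG).
module gated_reg32 (
    input  wire        clk,
    input  wire        rst_n,
    input  wire        load_en,
    input  wire [31:0] data_in,
    output reg  [31:0] q
);
    reg  en_latched;
    wire gclk;

    // Latch inside the ICG: transparent while clk is low.
    always @(clk or load_en)
        if (!clk) en_latched <= load_en;

    // All 32 flip-flops share this single gated clock.
    assign gclk = clk & en_latched;

    // No feedback MUX: the register simply loads on every gated edge.
    always @(posedge gclk or negedge rst_n) begin
        if (!rst_n) q <= 32'd0;
        else        q <= data_in;
    end
endmodule
```

One latch and one AND gate replace 32 feedback MUXes, and the clock pins of all 32 flip-flops stay quiet whenever load_en is 0.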
3. Preventing Heavy Operations: Operand Isolation (Data Gating)
The next largest power consumers after the clock are the datapaths, such as multipliers and adders.
(1) Problem situation
assign result = a * b; // multiplier

The code above causes numerous gates within the multiplier to switch whenever a or b changes. Even if the result value isn't currently being used (e.g., a different value is being selected by a MUX), the multiplier wastes power every time its inputs change.
(2) Solution: Fix the input to 0
This is a technique that prevents the values of inputs (a, b) from changing when the operation result is not needed. This is called Operand Isolation.
// [Concept Code]
wire [31:0] a_gated, b_gated;
// If the valid signal is 0, fix the input to 0 (using AND gate)
assign a_gated = a & {32{valid}};
assign b_gated = b & {32{valid}};
assign result = a_gated * b_gated;

This way, while valid stays at 0, both multiplier inputs are held at 0, so the switching activity inside the multiplier drops to essentially zero and power is saved. Modern synthesis tools can often perform this automatically, but for complex logic it is often best for designers to code it explicitly.
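The concept code can be wrapped into a complete module. The sketch below is one possible arrangement, not a definitive implementation. The module name mul_isolated, the width parameter W, and the registered output are all assumptions added for illustration.

```verilog
// Illustrative sketch: operand isolation on a multiplier whose result
// is registered only when valid is 1.
module mul_isolated #(
    parameter W = 16
) (
    input  wire           clk,
    input  wire           rst_n,
    input  wire           valid,
    input  wire [W-1:0]   a,
    input  wire [W-1:0]   b,
    output reg  [2*W-1:0] result
);
    // AND-gate isolation: operands are forced to 0 while valid is 0,
    // so changes on a and b do not propagate into the multiplier.
    wire [W-1:0]   a_gated = a & {W{valid}};
    wire [W-1:0]   b_gated = b & {W{valid}};
    wire [2*W-1:0] product = a_gated * b_gated;

    // The same valid signal enables the result register, which keeps
    // the code friendly to automatic clock gating as well.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)     result <= {2*W{1'b0}};
        else if (valid) result <= product;
    end
endmodule
```

Here a single valid signal serves both techniques: it isolates the multiplier inputs and acts as the enable for the result register. Note that the isolation gates add a little area and delay of their own, so the technique is worth applying mainly to large, frequently idle datapath blocks.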
4. Conclusion: Beyond “working chips” to “good chips”
A single line of code written by an RTL designer moves millions of transistors. Meticulously designing the load_en signal to improve clock gating efficiency, and controlling data flow to prevent unnecessary computation: this is the essence of high-quality RTL.