[RTL] Fan-in and Fan-out: The Hidden Causes of Timing Issues and Solutions

In RTL design, we often see a chip that is functionally correct yet fails to operate because of timing violations (setup/hold) that surface during the synthesis or P&R stage.

If you open the STA (Static Timing Analysis) report, you'll see warnings like "High Fan-out Net" and "Transition Time Violation." Warnings like these trace back to the fan-in and fan-out issues we'll be discussing today.

It is easy to think of these concepts as nothing more than the number of inputs and outputs of a logic gate. This post explains how they create physical delay on actual silicon, and how RTL designers can address that in code.

[Figure: Fan-in and fan-out explanation]

1. Fan-in: What if you hear too many voices at once?

Fan-in refers to the number of input signals accepted by one logic gate.

  • Definition: The number of inputs on one gate, i.e. the number of previous-stage signals feeding it.
  • Metaphor: A situation where multiple people (Input) talk to one person (Gate) at the same time.

(1) Physical problem: Series-connected transistors

Consider NAND and NOR gates in a CMOS process. As the number of inputs increases, more transistors are stacked in series inside the gate: the NMOS transistors in a NAND, the PMOS transistors in a NOR.

  • Increased resistance: Transistors connected in series increase the resistance (R).
  • Slowdown: As resistance increases, it becomes more difficult for current to flow, slowing down the switching speed of the gate (increasing RC Delay).
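
As a rough first-order sketch of the two bullets above, assume every transistor in the series stack contributes the same on-resistance and ignore the capacitance of the internal nodes. The symbols N (number of inputs), R_{on} and t_{pd} are introduced here for illustration and do not come from the original text.

```latex
% N series transistors, each with on-resistance R_{on}, discharging C_{load}
R_{stack} \approx N \cdot R_{on}

% 50% propagation delay of a simple RC stage (0.69 = \ln 2)
t_{pd} \approx 0.69 \cdot R_{stack} \cdot C_{load} = 0.69 \cdot N \cdot R_{on} \cdot C_{load}
```

Under these assumptions the delay grows at least linearly with the number of inputs. Once the internal node capacitances are counted as well, it grows faster than that, which is one reason very wide single gates are avoided.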

(2) RTL design perspective: Logic Depth

For RTL designers, High Fan-in usually appears in the form of complex conditional statements (Logic Depth).

// [High Fan-in Example] 
// When numerous conditions are combined with AND/OR, a huge complex gate is created.
assign valid = (cond1 & cond2 & cond3 & ... & cond16);

A synthesis tool does not map logic with this many inputs onto a single large gate. Instead, it breaks the logic into multiple levels of smaller gates, forming a tree structure. This increases the logic depth (the number of gate stages a signal passes through between registers), which can lead to setup time violations.
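
To put a number on it: with k-input gates, an N-input AND needs about ceil(log_k N) levels, so 16 conditions take 2 levels of 4-input gates or 4 levels of 2-input gates. The sketch below is one possible way to shorten the path, not the only fix: it places a pipeline register between the two levels. The module name, the cond bus (standing in for cond1 through cond16) and the partial register are assumptions made for this example.

```verilog
// Hypothetical sketch: 16-input AND split into two registered stages.
module valid_pipelined_sketch (
    input  wire        clk,
    input  wire [15:0] cond,   // stands in for cond1 ... cond16
    output reg         valid
);
    reg [3:0] partial;         // one bit per group of four conditions

    always @(posedge clk) begin
        // Stage 1: four 4-input ANDs, each result registered
        partial[0] <= &cond[3:0];
        partial[1] <= &cond[7:4];
        partial[2] <= &cond[11:8];
        partial[3] <= &cond[15:12];
        // Stage 2: AND of the four registered partial results
        valid <= &partial;
    end
endmodule
```

Each register-to-register path now contains a single 4-input AND. The cost is that valid arrives two clock cycles after cond changes, so this only works where the surrounding logic can tolerate the added latency.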

2. Fan-out: What if you carry too much weight on your own?

Fan-out refers to the number of next-stage gate inputs that one gate output must drive. In practice, it causes problems far more often, and more seriously, than fan-in.

  • Definition: The number of loads connected to one output.
  • Metaphor: Imagine a teacher (Driver) lecturing to hundreds of students (Load) by voice alone. It takes a long time for the voice (signal) to reach the back of the room.

(1) Physical problem: Increased capacitance

There is parasitic capacitance at the input of every gate, and capacitance also exists in the wiring.

$$Delay \propto \frac{C_{load}}{Drive\_Strength}$$

A high fan-out means a very large capacitance (C_load) to charge. A large water tank (capacitor) takes a long time to fill (reach a high voltage) even with the faucet (gate) fully open. This increases the transition time (rise/fall time) and adds delay to every path that passes through the net.
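
A small worked example makes the scale visible. The decomposition of C_{load} and all numbers below are hypothetical, chosen only for illustration: N is the fan-out, C_{in} the input capacitance of one load gate, and C_{wire} the wiring capacitance.

```latex
C_{load} = N \cdot C_{in} + C_{wire}

% Hypothetical numbers: C_{in} = 1\,\mathrm{fF}, wire capacitance ignored
N = 4:    \quad C_{load} \approx 4\,\mathrm{fF}
N = 1000: \quad C_{load} \approx 1000\,\mathrm{fF}

% Same driver in both cases, so Drive\_Strength cancels
\frac{Delay_{N=1000}}{Delay_{N=4}} \approx \frac{1000}{4} = 250
```

To first order, then, the same driver is about 250 times slower with 1000 loads than with 4, and the real gap is larger because the wire capacitance also grows with the number of loads.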

(2) Representative High Fan-out Nets

In RTL design, the following signals inherently have high fan-out:

  1. Clock: Connected to tens of thousands of flip-flops (handled by clock tree synthesis, CTS)
  2. Reset: Resets the entire chip (handled with a reset tree)
  3. Global Enable: A control signal that turns an entire module on and off

3. RTL design techniques to solve high fan-out problems

Synthesis tools (e.g. Design Compiler) and P&R tools can solve this problem to some extent by inserting buffers, but the fundamental solution must come from the RTL code level.

(1) Register Duplication

If one flip-flop has to drive 100 loads, create two flip-flops that perform the same operation and let each one drive 50 of the loads.

[Bad Case: High Fan-out]

reg global_en;
// 1000 modules look at only one global_en (Fan-out = 1000)
always @(posedge clk) begin
    if (global_en) data_out <= data_in;
end

[Good Case: Manual Duplication]

// 1. Direct replication in RTL (creating physically different registers)
reg global_en_1, global_en_2; 

always @(posedge clk) begin
    global_en_1 <= source_logic;
    global_en_2 <= source_logic; // Logically the same
end

// 2. Load Balancing
always @(posedge clk) begin
    if (global_en_1) block_A_data <= ...; // drives 500 loads
    if (global_en_2) block_B_data <= ...; // drives the other 500 loads
end
  • Note: Giving the registers different names in RTL doesn't guarantee that the synthesis tool keeps them separate. The tool might think, "Huh? This is the same logic?" and merge them back into one. To prevent this, use a tool-specific attribute such as (* keep = "true" *) or apply set_dont_touch in the synthesis script.

// Wrap the attribute in a macro so it can be changed in one place
`define keep_attib (* keep = "true" *)

`keep_attib reg r_sram_en;
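
The same idea can be generalized to any number of copies. The sketch below is a minimal illustration under stated assumptions: the module name, the COPIES and WIDTH parameters, and the data ports are hypothetical, and the keep attribute is tool-specific, so check that your synthesis tool honors it.

```verilog
// Hypothetical sketch: one enable replica per group of data bits.
module enable_replicas_sketch #(
    parameter COPIES = 4,   // assumed number of enable replicas
    parameter WIDTH  = 8    // assumed data width per group
) (
    input  wire                    clk,
    input  wire                    source_logic,
    input  wire [COPIES*WIDTH-1:0] data_in,
    output reg  [COPIES*WIDTH-1:0] data_out
);
    // Tool-specific attribute asking synthesis not to merge the replicas
    (* keep = "true" *) reg [COPIES-1:0] global_en_r;
    integer i;

    always @(posedge clk) begin
        // Every replica captures the same source value
        global_en_r <= {COPIES{source_logic}};
        // Each data group is gated only by its own replica
        for (i = 0; i < COPIES; i = i + 1)
            if (global_en_r[i])
                data_out[i*WIDTH +: WIDTH] <= data_in[i*WIDTH +: WIDTH];
    end
endmodule
```

With COPIES = 4, each replica sees roughly a quarter of the original load. The source signal itself now drives COPIES flip-flops instead of one, which is usually a much smaller load than the one being split.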

(2) Setting the Max Fanout Constraint

If modifying the code directly is difficult, you can impose a constraint on the synthesis tool instead. For example, "No net in this design may have a fan-out greater than 30!" The tool will then insert buffers or clone logic automatically to meet the limit.

  • Disadvantage: As buffers are added, the area increases, and path delay may increase slightly because each buffer adds its own delay.
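
Some tools also accept a fan-out limit as an attribute written directly in the RTL. The sketch below assumes a tool that honors a max_fanout attribute (Vivado documents one). In a Design Compiler flow the limit would instead go in the synthesis script, for example with set_max_fanout. The module and signal names are hypothetical.

```verilog
// Hypothetical sketch: per-signal fan-out limit as a synthesis attribute.
module max_fanout_sketch (
    input  wire        clk,
    input  wire        en_src,
    input  wire [31:0] data_in,
    output reg  [31:0] data_out
);
    // Tool-specific hint (assumed supported): keep the fan-out of
    // global_en at or below 30, replicating the register if needed.
    (* max_fanout = 30 *) reg global_en;

    always @(posedge clk)
        global_en <= en_src;

    always @(posedge clk)
        if (global_en) data_out <= data_in;
endmodule
```

Tools that do not recognize the attribute simply ignore it, so the simulated behavior is the same either way.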

(3) Creating a tree structure

For signals with extremely high fan-out where timing is not critical, such as a reset signal, create a buffer tree that spreads the signal out in stages so that no single driver sees the full load.
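
Plain buffers written in RTL are usually optimized away by synthesis, so the sketch below swaps in a registered tree instead: one root register feeds several leaf registers, and each leaf drives only its own group of loads. This is a substitution made for illustration, not necessarily what the original has in mind. The module name, the LEAVES parameter and the port names are hypothetical, and the keep attribute is tool-specific.

```verilog
// Hypothetical sketch: two-stage registered distribution tree.
module fanout_tree_sketch #(
    parameter LEAVES = 4                // assumed number of branches
) (
    input  wire              clk,
    input  wire              rst_src,   // assumed already synchronous to clk
    output wire [LEAVES-1:0] rst_leaf   // one copy per group of loads
);
    reg rst_root;
    // Tool-specific attribute asking synthesis not to merge the leaves
    (* keep = "true" *) reg [LEAVES-1:0] rst_leaf_r;

    always @(posedge clk) begin
        rst_root   <= rst_src;             // stage 1: single root register
        rst_leaf_r <= {LEAVES{rst_root}};  // stage 2: root drives only the leaves
    end

    assign rst_leaf = rst_leaf_r;
endmodule
```

The signal reaches the loads two clock cycles late, which is why this approach suits signals like reset, where an extra cycle or two does not matter.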

4. Conclusion: “How far does my signal go?”

An RTL engineer isn't someone who simply thinks about 0 and 1 logic. They need to be able to imagine how many gates a single assign statement or reg declaration they've written will drive in a real circuit.

  1. If fan-in is high: Reduce logic depth by splitting logic or adding pipelines.
  2. If fan-out is high: Duplicate registers or use the synthesis tool's fan-out constraints.

If you address the high fan-out issue upfront at the RTL stage, you will dramatically reduce the chances of receiving a call from your back-end engineer saying, “The timing isn’t right.”
