In RTL design, sooner or later you run into a ‘dilemma’.
“I added a pipeline register to fix the timing, but the data flow control broke and the performance (throughput) dropped.”
If data only flows forward, adding a register is sufficient. However, if the downstream module can be busy and needs to tell the upstream module to stop (backpressure), the situation is different.
In this article, we look at the Valid-Ready handshake protocol, which is the basis of standard interfaces such as AXI4-Stream, and at the RTL design principles of the skid buffer, a secret weapon for resolving timing violations on the Ready path.
1. Fundamentals of Data Flow Control: Valid-Ready Handshake
First, we need a protocol for reliably exchanging data between two modules. The industry-standard AMBA AXI4-Stream interface uses this approach.
- Source (Sender): The side sending the data. It sends a Valid signal and Data.
- Destination (Receiver): The side that receives data. It sends a Ready signal indicating that it is ready to receive.
Core Rules (The Handshake)
Data transfer occurs only in a clock cycle where Valid and Ready are both ‘1’ (High).
- Valid=1, Ready=0: The sender is presenting data, but the receiver is busy and cannot accept it. The sender must hold Valid and the current data unchanged until Ready becomes 1 (wait).
- Valid=0, Ready=1: The receiver is ready, but the sender has no data to send. The receiver waits.
- Valid=1, Ready=1: Handshake. The data is transferred, and new data can be presented on the next clock.
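The rules above can be sketched from the sender's side. The module below is a minimal, hypothetical source: the name counter_source and the counting payload are assumptions for illustration and are not part of AXI4-Stream. It presents a new word only after the previous one has been accepted.

```verilog
// Hypothetical sender: emits 0, 1, 2, ... and obeys the Valid-Ready rules.
module counter_source (
    input             clk,
    input             rst_n,
    output reg        o_valid,  // Valid, driven by the sender
    input             i_ready,  // Ready, driven by the receiver
    output reg [31:0] o_data
);
    // A transfer happens only in a cycle where Valid and Ready are both 1.
    wire xfer = o_valid && i_ready;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            o_valid <= 1'b0;
            o_data  <= 32'd0;
        end else begin
            // This source always has data, so Valid stays high after reset.
            o_valid <= 1'b1;
            // Hold the current word while Ready=0; advance only on a transfer.
            if (xfer) o_data <= o_data + 32'd1;
        end
    end
endmodule
```

Note that o_valid does not depend on i_ready in this sketch: the sender offers its data and then waits, which is also how AXI4-Stream expects a source to behave.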
2. Problem: Timing issue with Ready signal
Let's imagine this simple protocol as a long chain: Module A -> Module B -> Module C.
The problem is that Data and Valid flow forward, but the Ready signal flows backward.
- When module C becomes busy (Ready_C=0), module B must also stop (Ready_B=0), which will then stop module A (Ready_A=0).
- That is, the Ready signal forms a single combinational path that runs backward through every module to the front of the chain.
If this path becomes too long, a setup time violation occurs and the operating frequency cannot be raised. However, simply inserting a register (flip-flop) into the Ready signal delays the stop request by one clock: the upstream module sends one more word after the downstream module has already stopped accepting, and that word is lost.
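To make the backward path concrete, here is a sketch of an ordinary pipeline stage that registers only Valid and Data. The module name fwd_reg_stage and the WIDTH parameter are assumptions for illustration; the ports use the same i_/o_ naming as the listing in section 5.

```verilog
// Hypothetical pipeline stage: Valid/Data are registered, Ready is not.
module fwd_reg_stage #(
    parameter WIDTH = 32
) (
    input                  clk,
    input                  rst_n,
    // Upstream side
    input                  i_valid,
    output                 o_ready,
    input      [WIDTH-1:0] i_data,
    // Downstream side
    output reg             o_valid,
    input                  i_ready,
    output reg [WIDTH-1:0] o_data
);
    // Combinational: the Ready sent upstream depends directly on the Ready
    // received from downstream. This is the path that grows with chain length.
    assign o_ready = !o_valid || i_ready;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            o_valid <= 1'b0;
        end else if (o_ready) begin
            // The register is empty or is being drained this cycle: reload it.
            o_valid <= i_valid;
            if (i_valid) o_data <= i_data;
        end
        // Otherwise (o_valid=1 and i_ready=0): hold Valid and Data unchanged.
    end
endmodule
```

Chaining N such stages puts N levels of this logic between the last module's Ready and the first module's Ready, which is the path a skid buffer is meant to cut.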
3. Solution: What is a Skid Buffer?
A skid buffer is a small buffer circuit that acts as a ‘pipeline register’ while still fully supporting the ‘Valid-Ready handshake’.
How it works: "Receive first, then think about it."
A skid buffer places a register between the input and output to break the timing path. But what happens if the destination suddenly sends Ready=0?
- The skid buffer has an internal auxiliary storage register (shadow register).
- When the downstream side says stop (Ready_out=0), the skid buffer does not stop the upstream side (Source) immediately. Instead, it first saves (skids) the one word that is still arriving into the internal storage.
- Then, on the next clock, it sends Ready_in=0 to the upstream side to stop it.
In other words, it works like leaving enough room for the ‘skid distance’ of a car that brakes suddenly, so that no collision occurs.
4. Skid Buffer Types and Trade-offs
① Half-Bandwidth Skid Buffer (Simple Register)
This is the simplest form. When data arrives, it is stored in a register and sent.
- Characteristic: After handing its data off, the register must be emptied before it can receive the next data.
- Disadvantage: Valid and Ready ping-pong (alternate every clock), limiting maximum throughput to 50%. This is not suitable for high-speed designs that must process data every clock cycle.
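A minimal sketch of such a half-bandwidth buffer, assuming a single data register and a one-bit full flag. The module name half_bw_buffer and the WIDTH parameter are hypothetical.

```verilog
// Hypothetical half-bandwidth buffer: one register, registered Ready.
module half_bw_buffer #(
    parameter WIDTH = 32
) (
    input              clk,
    input              rst_n,
    // Upstream side
    input              i_valid,
    output             o_ready,
    input  [WIDTH-1:0] i_data,
    // Downstream side
    output             o_valid,
    input              i_ready,
    output [WIDTH-1:0] o_data
);
    reg             full;      // 1 while data_reg holds an unsent word
    reg [WIDTH-1:0] data_reg;

    assign o_valid = full;     // offer data only when the register is occupied
    assign o_ready = !full;    // accept data only when the register is empty
    assign o_data  = data_reg;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            full <= 1'b0;
        end else if (!full) begin
            // Empty: capture a word if the upstream side offers one.
            if (i_valid) begin
                data_reg <= i_data;
                full     <= 1'b1;
            end
        end else if (i_ready) begin
            // Occupied and accepted downstream: the register becomes empty.
            full <= 1'b0;
        end
    end
endmodule
```

Because o_ready comes straight from a flip-flop, the Ready path is cut. The price is that the buffer accepts a word in one cycle and hands it off in the next at best, so it moves at most one word every two clocks.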
② Full-Bandwidth Skid Buffer (True Skid Buffer)
This is the real Skid Buffer we need to use in practice.
- Structure: Main Register + Skid Register + MUX
- Operation:
  - Normally, data passes through the main register every clock (100% throughput).
  - When the downstream side stalls (Ready=0), the word that is already incoming is parked in the skid register.
  - When the downstream side becomes ready again, the parked word is sent before any new input is accepted, and the buffer returns to normal mode.
- Advantage: It breaks the timing path of the Ready signal (pipelining it) with no loss of throughput.
5. Core Verilog Implementation (Full-Bandwidth)
If you try to implement it yourself, the state machine (FSM) becomes complex. It usually has the following logical structure:
// Skid buffer: conceptual sketch
module skid_buffer (
    input         clk,
    input         rst_n,
    // Upstream (input side)
    input         i_valid,
    output        o_ready,
    input  [31:0] i_data,
    // Downstream (output side)
    output        o_valid,
    input         i_ready,
    output [31:0] o_data
);
    // Internal state: EMPTY, BUSY (normal), FULL (skid active)
    localparam [1:0] EMPTY = 2'd0, BUSY = 2'd1, FULL = 2'd2;
    reg [1:0]  state;
    reg [31:0] main_reg, skid_reg;
    // Output logic: both handshake outputs depend only on the state register
    assign o_valid = (state != EMPTY); // Valid if data exists
    assign o_ready = (state != FULL);  // Can accept unless full (key!)
    assign o_data  = main_reg;
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            state <= EMPTY;
        end else begin
            case (state)
                EMPTY: if (i_valid) begin // Empty -> load
                    main_reg <= i_data;
                    state    <= BUSY;
                end
                BUSY: begin // Normal mode
                    if (i_ready && i_valid) main_reg <= i_data; // Pass-through
                    else if (!i_ready && i_valid) begin // Backpressure -> skid
                        skid_reg <= i_data; // Save to skid
                        state    <= FULL;
                    end
                    else if (i_ready && !i_valid) state <= EMPTY; // Drain
                end
                FULL: if (i_ready) begin // main_reg was accepted this cycle
                    main_reg <= skid_reg; // Move skid to main
                    state    <= BUSY;
                end
                default: state <= EMPTY; // Recover from the unused encoding 2'd3
            endcase
        end
    end
endmodule
6. Conclusion: When should we use it?
Skid buffers are not a panacea. They cost extra area for the MUX and the additional register. Use them strategically in the following situations:
- Long Path Breaking: When modules are far apart and the routing delay is large.
- Timing Closure: When a setup violation occurs on the Ready signal path.
- Modular Design: When designing IP and you want to cut the timing dependency on the outside world (place a skid buffer on the output side).
References: ZipCPU