[RTL] RTL Valid-Ready Handshake and Skid Buffer

In RTL design, there inevitably comes a moment when you run into the following dilemma.

“I added a pipeline register to adjust the timing, but the data flow control got messed up and the performance (throughput) dropped.”

If data only ever flows forward, adding a register is sufficient. However, if the downstream module can be busy and must tell the upstream module to stop (backpressure), the situation is different.

This article covers the Valid-Ready Handshake protocol, which is the basis of standard interfaces such as AXI4-Stream, and the RTL design principles of the Skid Buffer, a secret weapon for resolving timing violations on the Ready path.

Related articles

✅[RTL] Fan-in and Fan-out: The Hidden Causes of Timing Issues and Solutions

1. Fundamentals of Data Flow Control: Valid-Ready Handshake

First, we need a protocol for reliably exchanging data between two modules. The industry-standard AMBA AXI4-Stream interface uses this approach.

Source (Sender): The side sending the data. It sends a Valid signal and Data.
Destination (Receiver): The side that receives data. It sends a Ready signal indicating that it is ready to receive.

Core Rules (The Handshake)

Data transfer occurs only when Valid and Ready are both ‘1’ (High).

Valid=1, Ready=0: The sender is presenting data, but the receiver is busy and cannot accept it. The sender must hold Valid and the current data unchanged until Ready becomes 1 (Wait).
Valid=0, Ready=1: The receiver is ready to receive, but the sender has no data to send. The receiver is waiting.
Valid=1, Ready=1: Handshake, data is transmitted. New data can be sent on the next clock.
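
The three cases above can be sketched as a tiny sender. This is a minimal illustration, not part of AXI4-Stream itself: the module name vr_counter_source, the 8-bit counter payload, and the signal names are assumptions made for the example. The point it shows is that the data only advances in a cycle where Valid and Ready are both high, and is otherwise held.

```verilog
// Minimal sketch of a sender that obeys the valid-ready rules.
// Module name, port names, and the counter payload are illustrative assumptions.
module vr_counter_source (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       ready,   // driven by the receiver
    output reg        valid,   // driven by this sender
    output reg  [7:0] data     // held stable while valid && !ready
);
    // A transfer happens only in a cycle where both signals are high.
    wire xfer = valid && ready;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            valid <= 1'b0;
            data  <= 8'd0;
        end else begin
            valid <= 1'b1;                  // this sender always has a word to offer
            if (xfer) data <= data + 8'd1;  // advance only after a completed transfer
        end
    end
endmodule
```

Because valid never drops while a word is still waiting, the receiver is free to deassert ready for any number of cycles without losing data.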

2. Problem: Timing issue with Ready signal

Let's imagine this simple protocol as a long chain: Module A -> Module B -> Module C.

The problem is that Data and Valid flow forward, but the Ready signal flows backward.

When module C becomes busy (Ready_C=0), module B must also stop (Ready_B=0), which in turn stops module A (Ready_A=0).
That is, the Ready signal forms a single combinational path that runs backward through every module to the front of the chain.

If this path becomes too long, a setup time violation occurs and the operating frequency cannot be raised. However, simply inserting a register (flip-flop) into the Ready path delays the stop request by one clock: the upstream sends one more word after it should already have stopped, and that word is lost.
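
To make the long Ready path concrete, here is a sketch of a common pipeline stage that registers only Valid and Data. The module name fwd_reg_stage and the parameter W are assumptions for illustration. Note that o_ready is still a combinational function of i_ready, so chaining several of these stages chains their Ready logic as well.

```verilog
// Sketch of a pipeline stage that registers valid/data only.
// Names (fwd_reg_stage, W) are illustrative assumptions.
module fwd_reg_stage #(parameter W = 32) (
    input  wire         clk,
    input  wire         rst_n,
    // Upstream (input side)
    input  wire         i_valid,
    output wire         o_ready,
    input  wire [W-1:0] i_data,
    // Downstream (output side)
    output reg          o_valid,
    input  wire         i_ready,
    output reg  [W-1:0] o_data
);
    // Accept a new word when the register is empty or is being drained
    // in this same cycle. i_ready reaches o_ready without passing a flip-flop.
    assign o_ready = !o_valid || i_ready;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            o_valid <= 1'b0;
        end else if (o_ready) begin
            o_valid <= i_valid;
            if (i_valid) o_data <= i_data;
        end
    end
endmodule
```

This stage keeps full throughput and fixes the forward (Valid/Data) timing, but it does nothing for the backward Ready path, which is exactly the path the Skid Buffer is meant to cut.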

3. Solution: What is a Skid Buffer?

Skid Buffer is a small buffer circuit that acts as a ‘pipeline register’ while also fully supporting the ‘Valid-Ready handshake’.

How it works: "Receive first, then think about it."

Skid Buffer places a register between the input and output to eliminate timing issues. But what happens if the destination suddenly sends Ready=0?

Skid Buffer has an internal auxiliary storage (Shadow Register).
When the downstream side signals a stop (Ready_out=0), the Skid Buffer does not stop the upstream (Source) immediately. Instead, it first saves (skids) the one word that is already in flight into the internal storage.
Then, on the next clock, it drives Ready_in=0 to stop the upstream.

In other words, it works like leaving enough room for the ‘skid distance’ so that a car that brakes suddenly does not cause a collision.

4. Skid Buffer Types and Trade-offs

Half-Bandwidth Skid Buffer (Simple Register)

This is the simplest form. When data arrives, it is stored in a register and sent.

Characteristic: After sending data out, the register must be emptied before it can accept the next data.
Disadvantage: Valid and Ready ping-pong, so maximum throughput is halved to 50%. This is not suitable for high-speed designs that must process data every clock cycle.
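
A minimal sketch of such a half-bandwidth stage follows. The module name half_bw_stage and the parameter W are assumptions for illustration. Ready depends only on the register state, which cuts the Ready timing path, but the stage can only be ready while it is empty, so it accepts at most one word every two clocks.

```verilog
// Sketch of a half-bandwidth register stage.
// Names (half_bw_stage, W) are illustrative assumptions.
module half_bw_stage #(parameter W = 32) (
    input  wire         clk,
    input  wire         rst_n,
    // Upstream (input side)
    input  wire         i_valid,
    output wire         o_ready,
    input  wire [W-1:0] i_data,
    // Downstream (output side)
    output reg          o_valid,
    input  wire         i_ready,
    output reg  [W-1:0] o_data
);
    // Ready comes straight from a flip-flop: no path from i_ready to o_ready.
    assign o_ready = !o_valid;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            o_valid <= 1'b0;
        end else if (o_ready && i_valid) begin   // empty: load a word
            o_valid <= 1'b1;
            o_data  <= i_data;
        end else if (o_valid && i_ready) begin   // full: word taken, go empty
            o_valid <= 1'b0;
        end
    end
endmodule
```

Even with i_valid and i_ready held high, the stage alternates between a load cycle and a drain cycle, which is the ping-pong behavior described above.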

Full-Bandwidth Skid Buffer (True Skid Buffer)

This is the real Skid Buffer we need to use in practice.

Structure: Main Register + Skid Register + MUX
Operation:
- Normally, data passes through the main register every clock (100% throughput).
- When the downstream stalls (Ready=0), the incoming data is parked temporarily in the backup (skid) register.
- When the downstream becomes ready again, the parked data is sent first and the buffer returns to normal mode.
Advantage: It breaks the timing path of the Ready signal (the path is pipelined) without any loss of throughput.

5. Core Verilog Implementation (Full-Bandwidth)

Implemented from scratch, the state machine (FSM) can get complicated. It usually has the following logical structure:

// Skid buffer sketch (conceptual code)
module skid_buffer (
    input  wire        clk,
    input  wire        rst_n,
    // Upstream (input side)
    input  wire        i_valid,
    output wire        o_ready,
    input  wire [31:0] i_data,
    // Downstream (output side)
    output wire        o_valid,
    input  wire        i_ready,
    output wire [31:0] o_data
);
    // Internal state
    localparam [1:0] S_EMPTY = 2'd0,  // no data held
                     S_BUSY  = 2'd1,  // main_reg holds data (normal mode)
                     S_FULL  = 2'd2;  // main_reg and skid_reg both hold data (skid active)

    reg [1:0]  state;
    reg [31:0] main_reg, skid_reg;

    // Output logic: every output depends only on registers, so there is
    // no combinational path from i_ready to o_ready.
    assign o_valid = (state != S_EMPTY); // valid whenever data is held
    assign o_ready = (state != S_FULL);  // can accept unless both registers are full (key point)
    assign o_data  = main_reg;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            state <= S_EMPTY;
        end else begin
            case (state)
                S_EMPTY: if (i_valid) begin          // Empty -> load
                    main_reg <= i_data;
                    state    <= S_BUSY;
                end
                S_BUSY: begin                        // Normal mode
                    if (i_ready && i_valid) begin
                        main_reg <= i_data;          // output taken, next word replaces it
                    end else if (!i_ready && i_valid) begin
                        skid_reg <= i_data;          // back-pressure: park the word in skid
                        state    <= S_FULL;
                    end else if (i_ready && !i_valid) begin
                        state    <= S_EMPTY;         // drain
                    end
                end
                S_FULL: if (i_ready) begin           // Full -> output taken
                    main_reg <= skid_reg;            // move skid word to main
                    state    <= S_BUSY;
                end
                default: state <= S_EMPTY;           // recover from the unused encoding 2'd3
            endcase
        end
    end
endmodule
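
One way to try the module out is a small self-checking testbench. The sketch below is an assumption-laden example rather than a complete verification environment: the testbench name skid_buffer_tb, the 200-cycle run length, and the use of $random for backpressure are all illustrative choices. It must be compiled together with the skid_buffer module above. The upstream side sends an incrementing counter, the downstream side toggles Ready pseudo-randomly, and the checker confirms that every word arrives exactly once and in order.

```verilog
`timescale 1ns/1ps
// Self-checking testbench sketch for skid_buffer (names and run length are assumptions).
module skid_buffer_tb;
    reg         clk = 1'b0, rst_n = 1'b0;
    reg         i_valid = 1'b0, i_ready = 1'b0;
    reg  [31:0] i_data = 32'd0;
    wire        o_ready, o_valid;
    wire [31:0] o_data;
    reg  [31:0] expected = 32'd0;   // next value the checker expects to receive
    integer     errors = 0;

    skid_buffer dut (
        .clk(clk), .rst_n(rst_n),
        .i_valid(i_valid), .o_ready(o_ready), .i_data(i_data),
        .o_valid(o_valid), .i_ready(i_ready), .o_data(o_data)
    );

    always #5 clk = ~clk;

    always @(posedge clk) begin
        if (rst_n) begin
            // Upstream: move to the next word only after a completed handshake.
            if (i_valid && o_ready) i_data <= i_data + 32'd1;
            // Downstream: every accepted word must be the next one in sequence.
            if (o_valid && i_ready) begin
                if (o_data !== expected) errors = errors + 1;
                expected <= expected + 32'd1;
            end
            // Pseudo-random backpressure (the LSB of $random is used).
            i_ready <= $random;
        end
    end

    initial begin
        #12 rst_n = 1'b1; i_valid = 1'b1;
        repeat (200) @(posedge clk);
        #1;
        if (errors == 0 && expected != 32'd0)
            $display("PASS: %0d words received in order", expected);
        else
            $display("FAIL: %0d errors, %0d words received", errors, expected);
        $finish;
    end
endmodule
```

Holding i_ready at 1 instead of randomizing it is a quick way to confirm the full-bandwidth claim: after the first load, one word should be accepted on every clock.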

6. Conclusion: When should we use it?

Skid buffers are not a panacea. They cost extra area because of the MUX and the additional register. They should be used strategically in the following situations:

Long Path Breaking: When the distance between modules is long and the routing delay is large.
Timing Closure: When a Setup Violation occurs in the Ready signal path.
Modular Design: When designing an IP block and you want to break its timing dependency on the outside world (place the Skid Buffer on the output side).

References: ZipCPU
