[RTL] Asynchronous FIFO design

In the last RTL CDC article, we learned about the 2-FF Synchronizer that synchronizes single-bit signals and the dangers of multi-bit signals.

Now, it's time to design an Asynchronous FIFO, the "ultimate" building block of CDC theory and one of the most widely used blocks in SoC and FPGA design. It's a crucial gateway for safely transferring large amounts of data between different clock domains.

Beyond simply copying and using code, we will delve into the internal principles of “why on earth do we need to convert pointers to Gray Code” and “what are the criteria for creating Full/Empty flags”.

1. RTL design challenges of asynchronous FIFOs

Async FIFO

The principle of FIFO (First-In-First-Out) is simple.

  • Write Pointer (wptr): Address to write data to. Increments by 1 when data arrives.
  • Read Pointer (rptr): Address to read data from. Increments by 1 when data is read out.
  • Empty: wptr == rptr
  • Full: When wptr has completed one lap and caught up with rptr.

In a synchronous FIFO, both pointers share the same clock, making comparison easy. However, in an asynchronous FIFO, the Write Clock domain and Read Clock domain are different.

Each side has to bring the other domain's pointer into its own clock domain before it can compare them, and this raises two serious problems.

  1. Metastability: The other domain's pointer may change at the very moment it is sampled, violating the setup/hold time of the sampling flip-flop.
  2. Data Incoherency (Multi-bit Skew): Pointers are composed of multiple bits. For example, when a binary pointer goes from 0111 to 1000, all four bits change at once. Because each bit has a different wiring delay (skew), the receiver may momentarily sample a value that never existed, such as 1111 or 0000.
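
To make the skew problem concrete, here is a minimal simulation-only sketch that lists every value a receiver could sample during the 0111 -> 1000 transition, assuming each changing bit may independently arrive early or late. The module name skew_candidates_tb and all signal names are made up for this illustration.

```verilog
// Illustrative testbench (hypothetical names): enumerate the values a
// receiver might capture while a 4-bit binary pointer moves 0111 -> 1000.
module skew_candidates_tb;
    localparam W = 4;

    reg [W-1:0] old_val, new_val, diff, m;
    integer     mask, count;

    initial begin
        old_val = 4'b0111;
        new_val = 4'b1000;
        diff    = old_val ^ new_val;   // bits that are changing (in flight)
        count   = 0;

        for (mask = 0; mask < (1 << W); mask = mask + 1) begin
            m = mask;                  // truncate the loop index to W bits
            // A candidate is reachable if it flips only bits that are changing.
            if ((m & ~diff) == {W{1'b0}}) begin
                $display("  may sample %b", old_val ^ m);
                count = count + 1;
            end
        end

        $display("%0d candidate values for %b -> %b", count, old_val, new_val);
    end
endmodule
```

Because all four bits are changing, every 4-bit pattern is a candidate, so the receiver could in principle see any of the 16 values.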

The key to solving this problem is Gray Code.

2. Solution: Gray Code

Gray code has the property that exactly one bit changes between adjacent values.

  • Binary: 0111 (7) -> 1000 (8) (4 bit change -> Danger!)
  • Gray: 0100 (7) -> 1100 (8) (1 bit change -> safe!)

With Gray code, even if metastability or skew occurs, only a single bit is in flight, so the receiver resolves the pointer to either the value before the change (7) or the value after the change (8). No erroneous intermediate value can appear, which prevents the pointer from jumping.
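
The one-bit property can be checked in simulation. The following is a small self-checking sketch, not part of the FIFO itself: it uses the usual binary-to-Gray conversion (b ^ (b >> 1)) and verifies that every adjacent pair of 4-bit counts, including the wrap from 15 back to 0, differs in exactly one bit. The names gray_adjacent_tb, bin2gray and popcount are chosen for this illustration.

```verilog
// Illustrative self-checking testbench (hypothetical names).
module gray_adjacent_tb;
    localparam W = 4;

    // Binary -> Gray: XOR each bit with its upper neighbour.
    function [W-1:0] bin2gray(input [W-1:0] b);
        bin2gray = b ^ (b >> 1);
    endfunction

    // Count the 1 bits in a vector.
    function integer popcount(input [W-1:0] v);
        integer k;
        begin
            popcount = 0;
            for (k = 0; k < W; k = k + 1)
                popcount = popcount + v[k];
        end
    endfunction

    integer     i, errors;
    reg [W-1:0] cur, nxt;

    initial begin
        errors = 0;
        for (i = 0; i < (1 << W); i = i + 1) begin
            cur = i;            // loop index truncated to W bits
            nxt = cur + 1'b1;   // wraps from 15 back to 0
            if (popcount(bin2gray(cur) ^ bin2gray(nxt)) != 1)
                errors = errors + 1;
        end

        $display("binary 7->8 flips %0d bits", popcount(4'b0111 ^ 4'b1000));
        $display("gray   7->8 flips %0d bits",
                 popcount(bin2gray(4'd7) ^ bin2gray(4'd8)));

        if (errors == 0) $display("PASS: every adjacent Gray pair differs by one bit");
        else             $display("FAIL: %0d adjacent pairs differ by more than one bit", errors);
    end
endmodule
```

The binary transition flips 4 bits while the Gray transition flips 1, which matches the 7 -> 8 example above.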

So the basic principles of Asynchronous FIFO design are:

“Convert my own domain's pointer into Gray code before passing it to the other domain.”

3. The Secret to Creating Full/Empty: Pessimistic Design

The most important concept in CDC design is pessimistic (conservative) design.

It takes roughly two to three destination-clock cycles for the other domain's pointer to pass through the 2-FF Synchronizer. In other words, the pointer each side sees is "not the current value, but a value from a little while ago."

(1) Empty Generation (Read Domain)

  • Empty is created in the Read domain.
  • Condition: rptr == synchronized_wptr
  • The Write Pointer keeps increasing, but because of the synchronization delay the Read domain does not see the increase right away.
  • Result: Even after one or two words have actually been written and the FIFO is no longer empty, it can still be judged as Empty.
  • Is it okay? Yes! Holding off a read because the FIFO looks empty only costs performance (throughput); it never causes a functional failure (underflow).

(2) Full Generation (Write Domain)

  • Full is created in the Write domain.
  • The Full condition in Gray Code is a bit more complicated. Comparing the Write Pointer with the synchronized Read Pointer:
    • The MSB must be different (the Write Pointer is exactly one lap ahead)
    • The 2nd MSB must also be different (a consequence of the reflected structure of Gray Code)
    • The remaining bits must be the same
  • The Read Pointer keeps advancing and freeing up space, but because of the synchronization delay the Write domain may still see the FIFO as full.
  • Result: In reality there is space, but the FIFO is judged to be full and writing stops.
  • Is it okay? Yes! Keeping Full asserted longer than strictly necessary is safe, because the goal is to prevent overflow.
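
The three Full rules above can be sanity-checked exhaustively. The sketch below assumes ASIZE = 4 (5-bit pointers with one extra lap bit) and confirms that the Gray-coded write pointer matches "top two bits inverted, rest equal" exactly when the binary write pointer is one full lap (16 entries) ahead of the read pointer. The testbench and its names (gray_full_check_tb, full_pat, diff) are illustrative only.

```verilog
// Illustrative self-checking testbench (hypothetical names).
module gray_full_check_tb;
    localparam ASIZE = 4;
    localparam PW    = ASIZE + 1;    // pointer width: address bits + lap bit

    function [PW-1:0] bin2gray(input [PW-1:0] b);
        bin2gray = b ^ (b >> 1);
    endfunction

    integer      r, w, errors;
    reg [PW-1:0] rbin, wbin, rgray, full_pat, diff;
    reg          match, is_full;

    initial begin
        errors = 0;
        for (r = 0; r < (1 << PW); r = r + 1) begin
            rbin     = r;
            rgray    = bin2gray(rbin);
            // Gray-domain Full pattern: invert MSB and 2nd MSB, keep the rest.
            full_pat = {~rgray[ASIZE:ASIZE-1], rgray[ASIZE-2:0]};

            for (w = 0; w < (1 << PW); w = w + 1) begin
                wbin    = w;
                diff    = wbin - rbin;               // wraps modulo 2^PW
                is_full = (diff == (1 << ASIZE));    // exactly one lap ahead
                match   = (bin2gray(wbin) == full_pat);
                if (match !== is_full)
                    errors = errors + 1;
            end
        end

        if (errors == 0) $display("PASS: Gray Full pattern matches one-lap-ahead for all pointer pairs");
        else             $display("FAIL: %0d mismatching pointer pairs", errors);
    end
endmodule
```

The check passes because adding 2^ASIZE flips only the binary MSB, and in b ^ (b >> 1) the binary MSB feeds exactly the top two Gray bits.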

4. Verilog Implementation (Full Source Code)

Here is the complete code, split into five parts (six modules in total), that you can use right away in practice.

(1) Asynchronous FIFO Top Module

module async_fifo #(
    parameter DSIZE = 8,  // Data Size
    parameter ASIZE = 4   // Address Size (Depth = 2^4 = 16)
)(
    input  wire             wclk, winc, wrst_n,
    input  wire             rclk, rinc, rrst_n,
    input  wire [DSIZE-1:0] wdata,
    output wire [DSIZE-1:0] rdata,
    output wire             wfull,
    output wire             rempty
);

    wire [ASIZE-1:0] waddr, raddr;
    wire [ASIZE:0]   wptr, rptr, wq2_rptr, rq2_wptr;

    // 1. Dual Port RAM
    fifomem #(DSIZE, ASIZE) u_mem (
        .wdata(wdata), .waddr(waddr), .wclk(wclk), .wclken(winc & ~wfull),
        .rdata(rdata), .raddr(raddr), .rclk(rclk), .rclken(rinc & ~rempty)
    );

    // 2. Synchronizers (2-FF)
    // Pass ASIZE down so the synchronizer width follows the top-level parameter
    sync_r2w #(ASIZE) u_sync_r2w (.wclk(wclk), .wrst_n(wrst_n), .rptr(rptr), .wq2_rptr(wq2_rptr));
    sync_w2r #(ASIZE) u_sync_w2r (.rclk(rclk), .rrst_n(rrst_n), .wptr(wptr), .rq2_wptr(rq2_wptr));

    // 3. Write Control Logic (Binary -> Gray, Full Logic)
    wptr_full #(ASIZE) u_wptr_full (
        .wclk(wclk), .wrst_n(wrst_n), .winc(winc),
        .wq2_rptr(wq2_rptr),
        .wfull(wfull), .waddr(waddr), .wptr(wptr)
    );

    // 4. Read Control Logic (Binary -> Gray, Empty Logic)
    rptr_empty #(ASIZE) u_rptr_empty (
        .rclk(rclk), .rrst_n(rrst_n), .rinc(rinc),
        .rq2_wptr(rq2_wptr),
        .rempty(rempty), .raddr(raddr), .rptr(rptr)
    );

endmodule

(2) FIFO Memory (Dual Port RAM)

module fifomem #(
    parameter DSIZE = 8, // Data Width
    parameter ASIZE = 4  // Address Width
)(
    input  wire             wclk, wclken,
    input  wire [ASIZE-1:0] waddr,
    input  wire [DSIZE-1:0] wdata,
    input  wire             rclk, rclken,
    input  wire [ASIZE-1:0] raddr,
    output wire [DSIZE-1:0] rdata
);
    // 2^ASIZE depth memory array
    reg [DSIZE-1:0] mem [0:(1<<ASIZE)-1];

    // Write Logic
    always @(posedge wclk) begin
        if (wclken) mem[waddr] <= wdata;
    end

    // Read Logic: asynchronous (combinational) read, so rclk and rclken are unused here.
    // Memory cells usually need no reset.
    assign rdata = mem[raddr];

endmodule

(3) Synchronizers (2-FF)

A 2-stage flip-flop for safe pointer passing.

// Read Pointer to Write Domain Synchronizer
module sync_r2w #(
    parameter ASIZE = 4
)(
    input  wire             wclk, wrst_n,
    input  wire [ASIZE:0]   rptr,
    output reg  [ASIZE:0]   wq2_rptr
);
    reg [ASIZE:0] wq1_rptr;

    always @(posedge wclk or negedge wrst_n) begin
        if (!wrst_n) begin
            wq1_rptr <= 0;
            wq2_rptr <= 0;
        end else begin
            wq1_rptr <= rptr;
            wq2_rptr <= wq1_rptr;
        end
    end
endmodule

// Write Pointer to Read Domain Synchronizer
module sync_w2r #(
    parameter ASIZE = 4
)(
    input  wire             rclk, rrst_n,
    input  wire [ASIZE:0]   wptr,
    output reg  [ASIZE:0]   rq2_wptr
);
    reg [ASIZE:0] rq1_wptr;

    always @(posedge rclk or negedge rrst_n) begin
        if (!rrst_n) begin
            rq1_wptr <= 0;
            rq2_wptr <= 0;
        end else begin
            rq1_wptr <= wptr;
            rq2_wptr <= rq1_wptr;
        end
    end
endmodule

(4) Read Control Logic (Empty Generation)

This is the logic for converting Binary to Gray and generating the Empty flag.

module rptr_empty #(
    parameter ASIZE = 4
)(
    input  wire             rclk, rrst_n, rinc,
    input  wire [ASIZE:0]   rq2_wptr, // Synced Write Pointer (Gray)
    output reg              rempty,
    output wire [ASIZE-1:0] raddr,
    output reg  [ASIZE:0]   rptr      // Read Pointer (Gray)
);
    reg  [ASIZE:0] rbin;
    wire [ASIZE:0] rbin_next, rgray_next;

    // 1. Binary Counter Update
    // Accept rinc only if it is not empty
    assign rbin_next = rbin + (rinc & ~rempty);
    assign raddr     = rbin[ASIZE-1:0];

    // 2. Binary to Gray Conversion
    assign rgray_next = (rbin_next >> 1) ^ rbin_next;

    // 3. Empty Flag Generation
    // Condition: Read Pointer(Gray) == Synced Write Pointer(Gray)
    wire rempty_val = (rgray_next == rq2_wptr);

    always @(posedge rclk or negedge rrst_n) begin
        if (!rrst_n) begin
            rbin   <= 0;
            rptr   <= 0;
            rempty <= 1; // It should be Empty in Reset state
        end else begin
            rbin   <= rbin_next;
            rptr   <= rgray_next;
            rempty <= rempty_val;
        end
    end
endmodule

(5) Write Control Logic (Full Generation)

This is the logic for converting Binary to Gray and generating the Full flag.

module wptr_full #(
    parameter ASIZE = 4
)(
    input  wire             wclk, wrst_n, winc,
    input  wire [ASIZE:0]   wq2_rptr, // Synced Read Pointer (Gray)
    output reg              wfull,
    output wire [ASIZE-1:0] waddr,
    output reg  [ASIZE:0]   wptr      // Write Pointer (Gray)
);

    reg  [ASIZE:0] wbin;
    wire [ASIZE:0] wbin_next, wgray_next;

    // 1. Binary Counter Update
    // Accept winc only when not full
    assign wbin_next = wbin + (winc & ~wfull);
    assign waddr     = wbin[ASIZE-1:0];

    // 2. Binary to Gray Conversion
    assign wgray_next = (wbin_next >> 1) ^ wbin_next;

    // 3. Full Flag Generation
    // Condition (Gray Code): MSB and 2nd MSB must be different, and the rest must be the same
    wire wfull_val = (wgray_next == {~wq2_rptr[ASIZE:ASIZE-1], wq2_rptr[ASIZE-2:0]});

    always @(posedge wclk or negedge wrst_n) begin
        if (!wrst_n) begin
            wbin  <= 0;
            wptr  <= 0;
            wfull <= 0;
        end else begin
            wbin  <= wbin_next;
            wptr  <= wgray_next;
            wfull <= wfull_val;
        end
    end
endmodule
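
To try the design out, a smoke test along the following lines can be used. This is only a sketch under a few assumptions: the six modules above are compiled together with it, the default parameters (DSIZE = 8, ASIZE = 4) are used, and the clock periods, word count N and the testbench name async_fifo_tb are arbitrary choices for illustration. It pushes incrementing bytes in on the write side and checks that they come out in the same order on the read side.

```verilog
`timescale 1ns/1ps
// Illustrative smoke test (hypothetical names); requires the modules above.
module async_fifo_tb;
    localparam DSIZE = 8;
    localparam N     = 100;              // number of words to push through

    reg              wclk = 0, rclk = 0;
    reg              wrst_n = 0, rrst_n = 0;
    reg              winc = 0, rinc = 0;
    reg  [DSIZE-1:0] wdata = 0;
    wire [DSIZE-1:0] rdata;
    wire             wfull, rempty;
    integer          sent = 0, received = 0, errors = 0;

    async_fifo #(.DSIZE(DSIZE), .ASIZE(4)) dut (
        .wclk(wclk), .winc(winc), .wrst_n(wrst_n),
        .rclk(rclk), .rinc(rinc), .rrst_n(rrst_n),
        .wdata(wdata), .rdata(rdata),
        .wfull(wfull), .rempty(rempty)
    );

    always #5 wclk = ~wclk;              // write clock, 10 ns period
    always #7 rclk = ~rclk;              // read clock, 14 ns period

    // Writer: drive on the falling edge, the FIFO samples on the rising edge.
    always @(negedge wclk) begin
        if (wrst_n && !wfull && sent < N) begin
            winc  <= 1'b1;
            wdata <= sent[DSIZE-1:0];
            sent  <= sent + 1;
        end else begin
            winc  <= 1'b0;
        end
    end

    // Reader: rdata already shows the head word whenever rempty is low.
    always @(negedge rclk) begin
        if (rrst_n && !rempty) begin
            if (rdata !== received[DSIZE-1:0]) errors = errors + 1;
            received = received + 1;
            rinc <= 1'b1;                // pop on the next rising edge
        end else begin
            rinc <= 1'b0;
        end
    end

    initial begin
        #33 wrst_n = 1; rrst_n = 1;      // release both resets
        wait (received == N);
        #20;
        if (errors == 0) $display("PASS: %0d words read back in order", received);
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end

    initial begin
        #100000 $display("FAIL: timeout"); $finish;
    end
endmodule
```

Driving winc and rinc on the falling clock edges keeps the stimulus away from the rising edges on which the FIFO samples, which avoids simulation races. Since the write clock is faster, the FIFO fills up and wfull throttles the writer, so both flags get exercised.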

5. Conclusion and Summary

Asynchronous FIFO may look like a data storage container on the surface, but it hides a sophisticated strategy to solve the CDC problem.

  1. Using Gray Code: Prevents invalid intermediate values when multi-bit pointers cross clock domains.
  2. Pessimistic Design: Because of synchronization delay, Full and Empty may stay asserted slightly longer than necessary, but they are never asserted late. (Safety First)
  3. 2^n Depth: To simplify the Full/Empty logic by taking advantage of the symmetry of Gray Code, the depth of the FIFO is usually set to a power of 2 (16, 32, 1024, etc.).
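
One way to see why a power-of-2 depth matters: a plain Gray-coded counter only keeps the one-bit-change property at the wrap point if it counts through a full power-of-2 range. The sketch below compares the wrap of a depth-8 pointer (counting 0..15) with a hypothetical depth-6 pointer (counting 0..11), using the same b ^ (b >> 1) conversion as the modules above. The names gray_wrap_tb and wrap_toggles are invented for this illustration.

```verilog
// Illustrative testbench (hypothetical names).
module gray_wrap_tb;
    localparam W = 4;

    function [W-1:0] bin2gray(input [W-1:0] b);
        bin2gray = b ^ (b >> 1);
    endfunction

    // Number of Gray bits that toggle when a counter wraps from `last` to 0.
    function integer wrap_toggles(input [W-1:0] last);
        integer     k;
        reg [W-1:0] d;
        begin
            d = bin2gray(last) ^ bin2gray({W{1'b0}});
            wrap_toggles = 0;
            for (k = 0; k < W; k = k + 1)
                wrap_toggles = wrap_toggles + d[k];
        end
    endfunction

    initial begin
        // Depth 8: the pointer (address + lap bit) counts 0..15.
        $display("depth 8: wrap 15 -> 0 toggles %0d Gray bit(s)", wrap_toggles(4'd15));
        // Depth 6: the pointer would count 0..11.
        $display("depth 6: wrap 11 -> 0 toggles %0d Gray bit(s)", wrap_toggles(4'd11));
    end
endmodule
```

The depth-8 wrap toggles 1 bit, while the depth-6 wrap toggles 3 bits, which would bring back the multi-bit skew problem at the wrap point.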

References: VLSI verify
