Round Robin Arbiter | Architecture and SystemVerilog RTL Code

Updated: May 16, 2024

If you have gone through my fixed priority arbiter article, you must have noticed one biggest drawback of that design. The presence of static priority for the blocks (requestor) means that in some conditions it is possible that a block having low priority might not get serviced at all. This is called starvation, and it might be ok in some designs, but in some designs, we might need to avoid that. One way to avoid it is by rotating the priority. Once a requesting block gets serviced, the highest priority is given to the next block. When that block gets serviced, highest priority is given to the next block, and the cycle goes on in a round-robin fashion. That is why this arbiter is called a Round Robin Arbiter and it is used to avoid starvation in a design. The maximum amount of time a requestor must wait depends on the number of requestors in this design.

There are various designs for a Round Robin Arbiter. This paper from Matt Weber is my go-to article whenever someone askes me for a good resource on Arbiters. I will try to explain the designs for round-robin arbiter mentioned in the paper.

Check out this code below:

always_comb begin
	case (pointer_reg)
		2'b00 :
			if (req[0]) grant = 4'b0001;
			else if (req[1]) grant = 4'b0010;
			else if (req[2]) grant = 4'b0100;
			else if (req[3]) grant = 4'b1000;
			else grant = 4'b0000;
		2'b01 :
			if (req[1]) grant = 4'b0010;
			else if (req[2]) grant = 4'b0100;
			else if (req[3]) grant = 4'b1000;
			else if (req[0]) grant = 4'b0001;
			else grant = 4'b0000;
      		2'b10 :
			if (req[2]) grant = 4'b0100;
			else if (req[3]) grant = 4'b1000;
			else if (req[0]) grant = 4'b0001;
			else if (req[1]) grant = 4'b0010;
			else grant = 4'b0000;
		2'b11 :
			if (req[3]) grant = 4'b1000;
			else if (req[0]) grant = 4'b0001;
			else if (req[1]) grant = 4'b0010;
			else if (req[2]) grant = 4'b0100;
			else grant = 4'b0000;
	endcase // case(req)
end

This code is for a RR Arbiter having a 4-bit request bus. In the case statement, we are rotating the priority every time a requestor gets serviced (notice the first if condition in all the 4 cases). The priority is rotated according to the variable pointer_reg, and pointer_reg changes whenever a requestor gets serviced. For example, if block number 0 is serviced, the highest priority will then be block number 1. When block number 1 is serviced, the highest priority will be given to block number 2. With this explanation, we can come up with the logic for evaluating pointer_reg value as follows:

logic [1:0] pointer_req, next_pointer_req;
  
  always @(posedge clock) begin
    if (reset) pointer_req <= '0;
    else       pointer_req <= next_pointer_req;
  end
  
  always_comb begin
    assign next_pointer_req = 2'b00;
    casez (gnt)
      4'b0001: next_pointer_req = 2'b01;
      4'b0010: next_pointer_req = 2'b10;
      4'b0100: next_pointer_req = 2'b11;
      4'b1000: next_pointer_req = 2'b00;
    endcase
  end

Just like fixed priority arbiter, this design is not scalable. The code for higher number of input requests will be tedious to write and error-prone. There are other designs mentioned in the paper that are scalable and quite easy to parameterize.

The Rotate + Priority + Rotate Design:

Picture taken from the same paper by Matt Weber

Take the design “Rotate + Priority + Rotate” for example. It uses a fixed priority arbiter and right or left shifts the input to mimic rotating the priorities. Assume the first input in 4’b1011, there is no shifting this time since this is the first input. The output of this input when we send it to fixed priority arbiter will be 4’b0001, assuming LSB has the highest priority. Assume the next input to be 4’b0011. Normally, the output for this case will also be 4’b0001, but not in the RR arbiter case. Since this is the second input, we are left shifting it by 1, so that the input to fixed priority arbiter now become 4’b0001, and the output become 4’b0001 as well. But this is not our final output since we had the input shifted “left” by 1 before calculating the output grant signal and now we must rotate it “right” by 1 to get the final output, so that the output now becomes 4’b0010. This is an amazingly simple but effective design. The logic for pointer which decides the left and right rotation can easily be coded in the same way I mentioned earlier. This design is not very fast as a lot of operations are needed to generate the grant output, but it takes very minimal area when synthesized.

The MUXed parallel priority arbiters:

Picture taken from the same paper by Matt Weber

As it is evident from the diagram, this design is quite big. This design also uses a fixed priority arbiter, one fixed priority arbiter for every input requestor. It only gets bigger with increase in number of input requestors. But on the bright side, it is extremely fast. Since the output grant for all the priorities are already calculated and we are just selecting one output from them. Let us try to code this design for a 4-bit input request bus.

module muxed_rr_arb(
  input logic clock,
  	          reset,
  input logic [3:0] req,
  output logic [3:0] gnt
);
  
  logic [3:0] mux_ip0, mux_ip1, mux_ip2, mux_ip3;
  
  //Instantiate fixed priority arbiter and calculate the grant output for shifted priorities
  fixed_pri_arbiter inst0 (.req(req),    .gnt(mux_ip0));
  fixed_pri_arbiter inst1 (.req(req>>1), .gnt(mux_ip1));
  fixed_pri_arbiter inst2 (.req(req>>2), .gnt(mux_ip2));
  fixed_pri_arbiter inst3 (.req(req>>3), .gnt(mux_ip3));
  
  //Select line pointer calculation
  logic [1:0] pointer_req, next_pointer_req;
  
  always @(posedge clock) begin
    if (reset) pointer_req <= '0;
    else 	 pointer_req <= next_pointer_req;
  end
  
  always_comb begin
    assign next_pointer_req = 2'b00;
    casez (gnt)
      4'b0001: next_pointer_req = 2'b01;
      4'b0010: next_pointer_req = 2'b10;
      4'b0100: next_pointer_req = 2'b11;
      4'b1000: next_pointer_req = 2'b00;
    endcase
  end
  
  //Final output
  always_comb begin
    case (pointer_req)
      2'b00: gnt = mux_ip0;
      2'b01: gnt = mux_ip1;
      2'b10: gnt = mux_ip2;
      2'b11: gnt = mux_ip3;
    endcase
  end
  
endmodule

You can easily parameterize this design by using a generate for loop for instantiating the fixed priority arbiters.

Two Simple Priority Arbiters with a Mask:

This is the last design mentioned in the paper and it gives the best of both worlds.

Picture taken from the same paper by Matt Weber

This design uses two fixed priority arbiters. One is used for calculating grant output for the request input as it is, another is used for calculating the grant output with masked request input. Masked request is calculated by ANDing the input request with mask variable whose value changes every time a request is serviced. The calculation of mask variable is also provided in the paper as follows:

always@(/*AUTOSENSE*/pointer_reg) begin
	case (pointer_reg) // synopsys full_case parallel_case
		2'b00: req_mask <= 4'b1111;
		2'b01: req_mask <= 4'b1110;
		2'b10: req_mask <= 4'b1100;
		2'b11: req_mask <= 4'b1000;
	endcase
end

This is a highly effective solution, and it gives the best performance out of all the designs mentioned in the paper. The RTL code for this design is discussed in Interview QnA page. I will also discuss how we can parameterize this design in the same article.

Weighted RR Arbiter:

This is also a Round Robin Arbiter but with a small change. The change is that the input requestors have weights associated with them. Normal RR Arbiter cycles over the requestors and gives one service opportunity to each of the requestor, whereas weighted RR arbiter offers each requestor a fixed number of opportunities as specified by their weights. For example, let us assume requestor 0 has a weight value of 3 and requestor 1 has a weight value of 2. If in first cycle, input is 4’b0011, the requestor 0 will be serviced. Now that the requestor 0 is serviced, its weight gets reduced to 2. Requestor 1 still has weight value of 2. Assume the input is same in second cycle, i.e., 4’b0011. Ideally this time, requestor 1 should have gotten serviced according to Round-Robin scheme, but since requestors 0 and 1 have same weights and we are assuming LSB having highest priority, again requestor 0 will get serviced and its weight gets reduced to 1. Now if we see the same input in third cycle, the requestor 1 has more weight value and hence requestor 1 will get serviced this time. A possible implementation of weighted RR arbiter is discussed in the Interview QnA page.

Let me know what you think of this article. See you in the next one!

SiliconCrafters

High Quality VLSI Design Articles

Round Robin Arbiter | Architecture and SystemVerilog RTL Code

Recent Posts

Never Miss a Post. Subscribe Now!