How To Fill A Register File In VHDL

There are many ways to create a shift register in VHDL, though not all of them are equal. You can dramatically reduce the number of consumed resources by choosing the right shift register implementation for your needs and FPGA architecture.

A shift register implements a FIFO of fixed length. Every time a new element enters the queue, it shifts the existing ones one place further away from the input. To understand the basics of the shift register, I recommend viewing the VHDL tutorial about the std_logic_vector.

This article will only consider the shift register, even though there exist data structures that use fewer resources for larger FIFOs. Read about how to create a ring buffer FIFO in block RAM to learn about such a solution that's not a shift register.

While any shift register is suitable for creating generic, smaller buffers, there are methods of efficiently creating larger ones. Many FPGAs have logic elements that can double as specialized shift register primitives. You can improve performance by orders of magnitude by being mindful of how you write your VHDL code.

The one-bit shift register with generic depth

Let's first take a look at different methods of creating a one-bit shift register. The input and output of this shift register are a single bit, a std_logic value. The depth is configurable through a generic constant.

We will use the entity declaration shown below for all of the following examples involving one-bit shift registers. To keep it simple, we're going to use the same entity for multiple architectures, even though the rst and enable inputs are unused by some of them.

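library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all; -- Needed by the shift_left examples

entity shift_reg_1_width is
  generic (
    sr_depth : integer
  );
  port (
    clk : in std_logic;
    rst : in std_logic; -- Optional
    enable : in std_logic; -- Optional
    sr_in : in std_logic;
    sr_out : out std_logic
  );
end;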

The three implementations that follow will synthesize into the same logic. These are the resources consumed for Xilinx, Intel (Altera), and Lattice FPGAs with the sr_depth generic set to 128.

Xilinx Vivado (Zynq): 4 LUTs (LUTRAM) + 2 FFs
Intel Quartus II (Cyclone V): 11 ALMs + 1 BRAM
Lattice iCEcube2 (iCE40): 128 FFs

Even though the shift register should require 128 flip-flops (FFs), we see that the resource usage reported by Vivado and Quartus is far less. Instead of using expensive FFs, the synthesis tools have used special built-in features of the logic blocks.

Lookup tables (LUTs) used in such a way are counted as "LUTRAM" in the Vivado resource usage report. In the Intel FPGA, adaptive logic modules (ALMs) and one block RAM (BRAM) are used instead of flip-flops. Intel uses a technology that they call ALTSHIFT_TAPS for implementing RAM-based shift registers.

The Lattice iCE40 FPGA, which I used in this example, doesn't have an alternative option for packing a one-bit shift register, so it's implemented entirely in 128 FFs. However, we shall see later in this article that the Lattice device can pack wider shift registers into block RAM.

Slicing the vector

The most straightforward way to create a shift register is to use vector slicing. Insert the new element at one end of the vector, while simultaneously shifting all of the others one place closer to the output side. Put the code in a clocked process and tap the last bit in the vector, and you have your shift register.

architecture slicing of shift_reg_1_width is
  signal sr : std_logic_vector(sr_depth - 2 downto 0);
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- Shift everything one place toward sr'high and insert the new bit
      sr <= sr(sr'high - 1 downto sr'low) & sr_in;
      -- sr_out is the final stage, giving sr_depth stages in total
      sr_out <= sr(sr'high);
    end if;
  end process;
end architecture slicing;

Using a for-loop

Just like vector slicing, the for-loop also works in all revisions of the VHDL language. This approach requires an additional line of code to assign the input to the vector. Remember that every iteration of the for-loop is executed in zero time in RTL code because there's no wait-statement inside of it. Therefore, this code is logically equivalent to the previous example.

architecture for_loop of shift_reg_1_width is
  signal sr : std_logic_vector(sr_depth - 2 downto 0);
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- Move each bit one place toward sr'high
      for i in sr'high downto sr'low + 1 loop
        sr(i) <= sr(i - 1);
      end loop;
      sr(sr'low) <= sr_in;
      sr_out <= sr(sr'high);
    end if;
  end process;
end architecture for_loop;

Using the shift_left function

While the previous examples work for vectors as well as arrays of any kind, the shift_left function only works with bit vectors. The shift_left function and its complementary shift_right function are defined in the ieee.numeric_std package. They take an unsigned or signed vector as the first parameter, and that's why they're not suitable for arrays of arbitrary data types.

architecture ieee_shift_left of shift_reg_1_width is
  signal sr : unsigned(sr_depth - 2 downto 0);
begin
  process(clk)
  begin
    if rising_edge(clk) then
      sr <= shift_left(sr, 1);
      -- This later assignment overrides bit 0 of the shifted vector
      sr(sr'low) <= sr_in;
      sr_out <= sr(sr'high);
    end if;
  end process;
end architecture ieee_shift_left;
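To illustrate the point that slicing works for arrays of any element type, here is a minimal sketch of a shift register whose elements are records holding a valid flag and a data byte, something shift_left can't handle. The entity shift_reg_record, its ports, and the record type are hypothetical names made up for this example, and sr_depth is assumed to be 3 or more.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: a shift register of records, built by slicing
entity shift_reg_record is
  generic (
    sr_depth : integer := 4  -- total delay in clock cycles (assumed >= 3)
  );
  port (
    clk       : in std_logic;
    in_valid  : in std_logic;
    in_data   : in std_logic_vector(7 downto 0);
    out_valid : out std_logic;
    out_data  : out std_logic_vector(7 downto 0)
  );
end;

architecture slicing of shift_reg_record is

  -- Element type that shift_left has no overload for
  type elem_type is record
    valid : std_logic;
    data  : std_logic_vector(7 downto 0);
  end record;

  type sr_type is array (sr_depth - 2 downto 0) of elem_type;
  signal sr : sr_type;

begin

  process(clk)
    variable new_elem : elem_type;
  begin
    if rising_edge(clk) then
      -- Bundle the inputs into one element
      new_elem := (valid => in_valid, data => in_data);

      -- The same slicing statement as for std_logic_vector
      sr <= sr(sr'high - 1 downto sr'low) & new_elem;

      -- Output register, as in the other examples
      out_valid <= sr(sr'high).valid;
      out_data <= sr(sr'high).data;
    end if;
  end process;

end architecture slicing;
```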

Enable input

(Figure: Xilinx FDCE flip-flop primitive)

Most FPGA architectures have flip-flops with an optional enable (E) or clock enable (CE) input. This functionality can't be utilized by any other logic when you are using the flip-flop for the shift register. Thus, the additional enable input won't consume extra resources.

Wrap the code that's responsible for shifting in an if enable = '1' then statement. Then, input and output from the shift register will still occur on the rising edge of the clock, but only when the enable input is asserted.

The code below shows the previous example with the enable input added to the implementation.

architecture with_enable of shift_reg_1_width is
  signal sr : unsigned(sr_depth - 2 downto 0);
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- Shift only when enable is asserted; otherwise hold all stages
      if enable = '1' then
        sr <= shift_left(sr, 1);
        sr(sr'low) <= sr_in;
        sr_out <= sr(sr'high);
      end if;
    end if;
  end process;
end architecture with_enable;

The drawback of having a reset value

You should be cautious about adding reset values to the shift register vector or output. The problem is that it prevents the synthesis tool from packing the shift register into LUTs or BRAM, because those primitives can't clear all of their stored bits in a single clock cycle. Consider the example below, which is the same as the first one in this article, but with synchronous reset added.

architecture slicing_with_rst of shift_reg_1_width is
  signal sr : std_logic_vector(sr_depth - 2 downto 0);
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        -- Clearing every stage forces an implementation in plain FFs
        sr <= (others => '0');
        sr_out <= '0';
      else
        sr <= sr(sr'high - 1 downto sr'low) & sr_in;
        sr_out <= sr(sr'high);
      end if;
    end if;
  end process;
end architecture slicing_with_rst;

Now, the resource usage for both the Xilinx and Intel FPGAs has jumped to 128 FFs. For the Intel FPGA, it makes the most sense to measure resource usage in ALMs, but the 33 ALMs contain the same 128 FFs as the other implementations.

Xilinx Vivado (Zynq): 128 FFs
Intel Quartus II (Cyclone V): 33 ALMs
Lattice iCEcube2 (iCE40): 128 FFs

The synchronous reset has forced the synthesis tool to implement the shift register entirely in FFs. Therefore, you should ask yourself if you really need to be able to reset the entire shift register at once.

Using a counter to reset the output

After reset, everything that's in the shift register is invalidated. Usually, you don't care what the invalid data is; the purpose of the reset is to avoid passing it on to downstream modules. An alternative to resetting the entire shift register is to keep track of where the valid data starts inside of it.

While invalid data is at the output, you forward your reset value. Then, when valid data reaches the output, you start sampling the actual output from the shift register. As long as you don't tap the shift register anywhere other than at the output, the behavior will be indistinguishable from a true synchronous reset.

The code below uses a counter signal to implement synchronous reset of the shift register output.

architecture rst_counter of shift_reg_1_width is
  signal sr : std_logic_vector(sr_depth - 2 downto 0);
  signal rst_counter : integer range 0 to sr_depth - 1;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        rst_counter <= 0;
        sr_out <= '0';
      else
        sr <= sr(sr'high - 1 downto sr'low) & sr_in;
        -- Forward the reset value until valid data reaches the output
        if rst_counter = sr_depth - 1 then
          sr_out <= sr(sr'high);
        else
          rst_counter <= rst_counter + 1;
          sr_out <= '0';
        end if;
      end if;
    end if;
  end process;
end architecture rst_counter;

As we can see from the resource usage below, Xilinx has gone back to using LUTRAM, and Intel to ALMs and a single BRAM primitive. The implementation uses a bit more logic for the new counter, but overall the saving is substantial. The exception is the Lattice device, which consumes more than before because it doesn't have primitives for optimizing this kind of shift register.

Xilinx Vivado (Zynq): 12 LUTs (4 LUTRAM) + 9 FFs
Intel Quartus II (Cyclone V): 17 ALMs + 1 BRAM
Lattice iCEcube2 (iCE40): 18 LUTs + 135 FFs

Shift register with generic depth and width

Let's continue by looking at an implementation of a shift register with configurable data width as well. The code below shows the entity used in the following examples. It has two generic inputs, one for setting the depth and one for the width of the data elements.

library ieee;
use ieee.std_logic_1164.all;

entity shift_reg_generic_width is
  generic (
    sr_depth : integer;
    sr_width : integer
  );
  port (
    clk : in std_logic;
    rst : in std_logic; -- Optional
    sr_in : in std_logic_vector(sr_width - 1 downto 0);
    sr_out : out std_logic_vector(sr_width - 1 downto 0)
  );
end;

In the examples below, we will assign 128 to the sr_depth generic and 16 to the sr_width generic.

Without reset

In the code below, we have converted the slicing example without reset to have configurable width as well as depth. We synthesize it with an input width of 16, meaning that it can store 16 times as many bits as the first example in this article. Let's see if the resource usage is multiplied by 16 too.

architecture slicing of shift_reg_generic_width is
  -- Array of data words instead of a vector of bits
  type sr_type is array (sr_depth - 2 downto 0)
    of std_logic_vector(sr_width - 1 downto 0);
  signal sr : sr_type;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      sr <= sr(sr'high - 1 downto sr'low) & sr_in;
      sr_out <= sr(sr'high);
    end if;
  end process;
end architecture slicing;

The resource utilization is listed below. Xilinx has gone from 4 to 64 LUTs, and from 2 to 32 FFs, exactly 16 times the resources. Intel, on the other hand, has risen from 11 ALMs to 20 ALMs, still using one BRAM. The reason for the small increase is that the BRAM can accommodate far more data than we are asking for in the first place; it just needs a bit more control logic.

Xilinx Vivado (Zynq): 64 LUTs (LUTRAM) + 32 FFs
Intel Quartus II (Cyclone V): 20 ALMs + 1 BRAM
Lattice iCEcube2 (iCE40): 18 LUTs + 1 BRAM + 25 FFs
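The 16-fold jump for Xilinx can be reproduced with a little arithmetic. This assumes that each LUT in the Zynq device can hold a 32-bit shift register (the SRLC32E primitive in 7-series and Zynq-7000 fabric), and that the 2 FFs in the one-bit result are stages of the 128-deep chain:

```latex
\begin{aligned}
\text{SRL stages per bit}   &= 128 - 2 = 126 \\
\text{SRL LUTs per bit}     &= \lceil 126 / 32 \rceil = 4 \\
\text{LUTs, 16-bit width}   &= 16 \times 4 = 64 \\
\text{FFs, 16-bit width}    &= 16 \times 2 = 32
\end{aligned}
```

This matches the 64 LUTs and 32 FFs that Vivado reports, which suggests that each bit lane of the wide shift register was mapped independently.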

Finally, the Lattice FPGA has managed to pack the wider shift register into block RAM. Its resource usage is now on par with Xilinx and Intel.

With smart reset counter

The last example in this article is a shift register with generic width and depth, using synchronous reset. The code below shows the implementation, which uses the reset counter that we discussed earlier in this article.

architecture counter_rst of shift_reg_generic_width is
  type sr_type is array (sr_depth - 2 downto 0)
    of std_logic_vector(sr_width - 1 downto 0);
  signal sr : sr_type;
  signal rst_counter : integer range 0 to sr_depth - 1;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        rst_counter <= 0;
        sr_out <= (others => '0');
      else
        sr <= sr(sr'high - 1 downto sr'low) & sr_in;
        -- Output zeros until valid data reaches the end of the shift register
        if rst_counter = sr_depth - 1 then
          sr_out <= sr(sr'high);
        else
          rst_counter <= rst_counter + 1;
          sr_out <= (others => '0');
        end if;
      end if;
    end if;
  end process;
end architecture counter_rst;

We can see from the listing below that the Xilinx FPGA needs eight additional regular LUTs and seven FFs for implementing the counter reset. Intel Quartus II still somehow reports the same resource usage as without reset. The Lattice FPGA consumes 24 more LUTs and 31 more FFs for implementing the counter, but the shift register still fits in one BRAM.

Xilinx Vivado (Zynq): 72 LUTs (64 LUTRAM) + 39 FFs
Intel Quartus II (Cyclone V): 20 ALMs + 1 BRAM
Lattice iCEcube2 (iCE40): 42 LUTs + 1 BRAM + 56 FFs


Controlling the RAM usage with synthesis attributes

The synthesis tools will try to use the FPGA primitives that give the best device utilization. The cheapest resource is BRAM, then comes distributed RAM or LUTRAM, and finally FFs, which are the most valuable.

But you can override the automatic choice by using a synthesis attribute, also known as a pragma or compiler directive. The different FPGA vendors have their own sets of VHDL attributes. To specify a desired primitive type, you define the attribute in the declarative region of the architecture (between the architecture ... is line and begin), referencing your shift register array or vector by name.

Xilinx Vivado

The Xilinx UG901 user guide lists all synthesis attributes that are recognized by Vivado. The shreg_extract attribute and the srl_style attribute are the ones that control shift register synthesis.

Setting the shreg_extract attribute to "no" disables all shift register optimization. This setting acts like a master switch, overriding other SRL synthesis settings. You can also assign "yes" to shreg_extract, but this is the default setting anyway.

attribute shreg_extract : string;
-- attribute shreg_extract of sr : signal is "yes";
attribute shreg_extract of sr : signal is "no";

Remember to replace sr with the name of your shift register signal.

The next attribute of interest is srl_style. It's a request to the synthesis tool to implement the shift register in a specific type of primitive. Note that this is not a magic pill. For example, you can't force the synthesis tool to implement the shift register in block RAM when you insist on having reset values. It's not possible.

attribute srl_style : string;
--attribute srl_style of sr : signal is "register";
--attribute srl_style of sr : signal is "srl";
--attribute srl_style of sr : signal is "srl_reg";
--attribute srl_style of sr : signal is "reg_srl";
--attribute srl_style of sr : signal is "reg_srl_reg";
attribute srl_style of sr : signal is "block";

The possible values are:

register: Only use registers (aka flip-flops)
srl: Only use SRL structures
srl_reg: Use an SRL structure with one trailing register
reg_srl: Use an SRL structure with one preceding register
reg_srl_reg: Use an SRL structure with preceding and trailing registers
block: Use block RAM
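To show where such an attribute lives, here is a minimal, self-contained sketch of the vector-slicing shift register with srl_style placed in the declarative region of the architecture, after the signal it refers to. The entity name shift_reg_srl_style and the choice of "srl_reg" are assumptions for illustration only. Vivado is the tool that acts on the attribute; simulators treat it as an ordinary user-defined attribute with no effect on behavior.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-bit shift register carrying a Vivado srl_style request
entity shift_reg_srl_style is
  generic (
    sr_depth : integer := 8
  );
  port (
    clk    : in std_logic;
    sr_in  : in std_logic;
    sr_out : out std_logic
  );
end;

architecture rtl of shift_reg_srl_style is

  signal sr : std_logic_vector(sr_depth - 2 downto 0);

  -- Declarative region: the attribute must come after the signal it names
  attribute srl_style : string;
  attribute srl_style of sr : signal is "srl_reg";

begin

  process(clk)
  begin
    if rising_edge(clk) then
      sr <= sr(sr'high - 1 downto sr'low) & sr_in;
      sr_out <= sr(sr'high);
    end if;
  end process;

end architecture rtl;
```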

Intel Quartus II

Shift register optimization in Intel FPGAs can be turned on or off by setting one of the following synthesis attributes. The default value is auto.

attribute altera_attribute : string;
-- attribute altera_attribute of sr :
--   signal is "-name AUTO_SHIFT_REGISTER_RECOGNITION always";
-- attribute altera_attribute of sr :
--   signal is "-name AUTO_SHIFT_REGISTER_RECOGNITION off";
attribute altera_attribute of sr :
  signal is "-name AUTO_SHIFT_REGISTER_RECOGNITION auto";

Refer to the Quartus Prime Settings File Reference Manual for a more detailed explanation of this and other synthesis attributes.

Lattice iCEcube2 / Synplify Pro

Lattice iCEcube2 uses Synopsys Synplify Pro as its synthesis engine. Therefore, these attributes will also work with tools from other vendors that use Synplify Pro.

attribute syn_srlstyle : string;
-- attribute syn_srlstyle of sr : signal is "registers";
-- attribute syn_srlstyle of sr : signal is "distributed";
attribute syn_srlstyle of sr : signal is "block_ram";

The attribute lets you select between block RAM, registers, and distributed RAM (if available on the chip).

Final remarks

Shift registers are convenient for implementing small FIFOs, among other things. However, when creating large FIFOs, you should consider using a different data structure like a ring buffer or an AXI FIFO, which are more suitable for block RAM.
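For a fixed delay like the shift registers in this article, a ring buffer boils down to a RAM and a single pointer that wraps around; the data never moves. The sketch below is a hypothetical ring_buffer_delay entity with the same sr_depth and sr_width generics as shift_reg_generic_width (rst omitted). It assumes read-first behavior, so each word is read just before it's overwritten, which should give the same sr_depth-cycle latency as the slicing example. Whether a particular tool maps it to block RAM depends on the tool, device, and depth, so check the utilization report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical fixed-delay line built as a ring buffer in RAM
entity ring_buffer_delay is
  generic (
    sr_depth : integer := 128;  -- delay in clock cycles (assumed >= 2)
    sr_width : integer := 16
  );
  port (
    clk    : in std_logic;
    sr_in  : in std_logic_vector(sr_width - 1 downto 0);
    sr_out : out std_logic_vector(sr_width - 1 downto 0)
  );
end;

architecture rtl of ring_buffer_delay is

  -- sr_depth - 1 RAM words plus the sr_out register give sr_depth cycles
  type ram_type is array (0 to sr_depth - 2)
    of std_logic_vector(sr_width - 1 downto 0);
  signal ram : ram_type;

  -- Only this pointer moves, not the stored data
  signal ptr : integer range 0 to sr_depth - 2 := 0;

begin

  process(clk)
  begin
    if rising_edge(clk) then
      -- Read-first: sr_out gets the word written sr_depth - 1 cycles ago,
      -- and the new input overwrites it on the same clock edge
      sr_out <= ram(ptr);
      ram(ptr) <= sr_in;

      -- Wrap the pointer around the end of the RAM
      if ptr = sr_depth - 2 then
        ptr <= 0;
      else
        ptr <= ptr + 1;
      end if;
    end if;
  end process;

end architecture rtl;
```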

I should add that it's difficult to compare resource usage between vendors. The architectures are different, LUTs have different sizes, and the reports from the place and route (PAR) tools come in various formats.

You need to take the target architecture into consideration when creating a shift register. That's what you should take away from this article, not that one FPGA is better than the other.

I'm from Norway, but I live in Bangkok, Thailand. Before I started VHDLwhiz, I worked as an FPGA engineer in the defence industry. I earned my master's degree in informatics at the University of Oslo.