

# Design guidelines for timing closure

**Thomas Zerrer** 



### Slowest speed grades for Xilinx devices

| Product version | FPGA family         | Gen 1 (2.5 Gbps)<br>x1 / x2 / x4 / x8 | Gen 2 (5 Gbps)<br>x1 / x2 / x4 / x8 | Gen 3 (8 Gbps)<br>x1 / x2 / x4 | Gen 3 (8 Gbps)<br>x8 | Gen 4 (16 Gbps)<br>x1 / x2 / x4 <sup>(\*\*\*)</sup> |
|-----------------|---------------------|---------------------------------------|-------------------------------------|--------------------------------|----------------------|-----------------------------------------------------|
| 64-bit          | Artix-7             | -1                                    | -2 for x1/x2 <sup>(\*)</sup>        |                                |                      |                                                     |
| 64-bit          | Kintex-7            | -1 / -2 for x8 <sup>(\*\*)</sup>      | -1 / -2 for x4                      |                                |                      |                                                     |
| 256-bit         | Artix-7             | -1                                    | -2                                  |                                |                      |                                                     |
| 256-bit         | Kintex-7            | -1                                    | -1 / -2 for x8                      |                                |                      |                                                     |
| 256-bit         | Virtex-7            | -1                                    | -1 / -2 for x8                      | -1 / -2 for x4                 | -3 / -2 <sup>X</sup> |                                                     |
| 256-bit         | UltraScale          | -1                                    | -1                                  | -1                             | -2 / -1 <sup>X</sup> |                                                     |
| 256-bit         | UltraScale+ / MPSoC | -1                                    | -1                                  | -1                             | -1                   | -1                                                  |

Table 1

If more channels are used, a faster speed grade may be required. In this case, contact Smartlogic for a recommendation.

- This table lists the minimum speed grade required for the IP core and for the Xilinx hard IP.
- Speed grades marked with X meet timing only with a maximum of 2 DMA write and 2 DMA read interfaces. Speed grades without X were tested experimentally with 9 write and 8 read interfaces.
- The following link speed / link width combinations need special attention to achieve timing closure:
  - 64-bit core: Gen 2 x4
  - 256-bit core: Gen 3 x8

<sup>(\*)</sup> Gen 2 x4 is not supported by the 64-bit version on Artix-7; use the 256-bit version instead. Artix-7 does not support x8 links.

<sup>(\*\*)</sup> Gen 2 x8 is not supported by the 64-bit version on Kintex-7; use the 256-bit version instead.

<sup>(\*\*\*)</sup> Xilinx supports Gen 4 only on specific devices. Check the device data sheet to confirm Gen 4 support.

<sup>(X)</sup> The speed grade is supported, but with limitations (maximum of 2 read and 2 write channels). See Chapter 5.5 for details.

This table has been validated with 8 independent read and 9 independent write channels except for speedgrades marked with (X).
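The channel limits above can be captured in a small lookup for design-rule checks in a build flow. The following Python sketch is a hypothetical helper, not part of the IP core deliverables: it encodes only the two Gen 3 x8 entries of Table 1 that carry an X mark (Virtex-7 and UltraScale) and assumes Xilinx speed grades are compared by magnitude (-3 is faster than -2).

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding of the X-marked Gen 3 x8 entries of Table 1:
# family -> (minimum grade without limits, minimum grade with X limits)
GEN3_X8_GRADES: Dict[str, Tuple[int, int]] = {
    "Virtex-7": (3, 2),
    "UltraScale": (2, 1),
}

VALIDATED = (9, 8)   # (write, read) interfaces tested for grades without X
X_LIMITED = (2, 2)   # (write, read) interfaces allowed for grades with X


def channel_limits(family: str, grade: int) -> Optional[Tuple[int, int]]:
    """Return (max write, max read) DMA interfaces for a Gen 3 x8 design,
    or None if the speed grade is slower than any entry in Table 1."""
    full, limited = GEN3_X8_GRADES[family]
    speed = abs(grade)  # Xilinx: larger magnitude means faster
    if speed >= full:
        return VALIDATED
    if speed >= limited:
        return X_LIMITED
    return None


print(channel_limits("Virtex-7", -2))    # (2, 2)
print(channel_limits("UltraScale", -2))  # (9, 8)
```

Returning `None` for grades below the table entries makes an unsupported combination explicit instead of silently applying the X limits.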

### Slowest speed grades for Intel devices

| Product version | FPGA family | Gen 1 (2.5 Gbps)<br>x1 / x2 / x4 / x8 | Gen 2 (5 Gbps)<br>x1 / x2 / x4 / x8 | Gen 3 (8 Gbps)<br>x1 / x2 / x4 | Gen 3 (8 Gbps)<br>x8 | Gen 4 (16 Gbps)<br>x1 / x2 / x4 <sup>(\*\*\*)</sup> |
|-----------------|-------------|---------------------------------------|-------------------------------------|--------------------------------|----------------------|-----------------------------------------------------|
| 256-bit         | Arria 10    | -3                                    | -3                                  | -3                             | -1 / -2 <sup>X</sup> |                                                     |
| 256-bit         | Cyclone V   | -8                                    | -7 <sup>(\*)</sup>                  |                                |                      |                                                     |
| 256-bit         | Stratix 10  | -3                                    | -3                                  | -3                             | -2                   | -2                                                  |
| 256-bit         | Cyclone 10  | -6                                    | -6 <sup>(\*)</sup>                  |                                |                      |                                                     |

Table 2

If more channels are used, a faster speed grade may be required. In this case, contact Smartlogic for a recommendation.

- This table lists the minimum speed grade required for the IP core and the Intel hard IP.
- Speed grades marked with X meet timing only with a maximum of 2 DMA write and 2 DMA read interfaces. Speed grades without X were tested experimentally with 9 write and 8 read interfaces.
- The following link speed / link width combination needs special attention to achieve timing closure:
  - 256-bit core: Gen 3 x8

<sup>(\*)</sup> Cyclone V and Cyclone 10 do not support x8 links.

<sup>(\*\*\*)</sup> Intel supports Gen 4 only on specific devices. Check the device data sheet to confirm Gen 4 support.

<sup>(X)</sup> The speed grade is supported, but with limitations (maximum of 2 read and 2 write channels). See Chapter 5.5 of the user guide (UG) for details.

This table has been validated with 8 independent read and 9 independent write channels except for speedgrades marked with (X).
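When comparing a device against Tables 1 and 2, note that the two vendors number speed grades in opposite directions: for Xilinx a larger number is faster (-2 is faster than -1), while for Intel a smaller number is faster (-2 is faster than -3). A minimal Python sketch of the comparison follows; the function name is hypothetical, and suffixed grades (for example low-power variants) are not modeled.

```python
def meets_min_grade(vendor: str, device_grade: int, min_grade: int) -> bool:
    """Return True if device_grade is at least as fast as min_grade.

    Grades are written as in Tables 1 and 2, e.g. -2 or -7.
    Xilinx: a larger magnitude is faster (-2 is faster than -1).
    Intel:  a smaller magnitude is faster (-2 is faster than -3).
    """
    dev, req = abs(device_grade), abs(min_grade)
    if vendor == "xilinx":
        return dev >= req
    if vendor == "intel":
        return dev <= req
    raise ValueError(f"unknown vendor: {vendor!r}")


# Arria 10, Gen 3 x4 requires -3: a -2 device qualifies
print(meets_min_grade("intel", -2, -3))   # True
# Cyclone V, Gen 2 requires -7: a -8 device does not
print(meets_min_grade("intel", -8, -7))   # False
# Kintex-7 (256-bit), Gen 2 x8 requires -2: a -1 device does not
print(meets_min_grade("xilinx", -1, -2))  # False
```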





The upstream side of the core has up to 16 AXI Stream interfaces. The number of interfaces is user-configurable, and each AXI Stream interface has its own data FIFO whose depth is set at compile time.

Each FIFO can be built from either block RAM (BRAM) or distributed RAM (MLAB RAM on Intel devices).

| RAM type               | Typical FIFO depth | Timing            | Comment                                                                                                   |
|------------------------|--------------------|-------------------|-----------------------------------------------------------------------------------------------------------|
| Distributed / MLAB RAM | 4-6                | Fast clock-to-out | Use this RAM type for timing-critical designs.                                                            |
| BRAM                   | 9+                 | Slow clock-to-out | Not recommended for timing-critical designs. A depth of 9 may work, but timing closure is not guaranteed. |





- If a distributed / MLAB FIFO depth of 6 is not sufficient, add an additional AXI Stream FIFO in the data path in front of the core's S\_AXIS interface.
- If the user clock is below 250 MHz, the timing for this FIFO is relaxed, and it should be possible to build it from BRAMs. A further advantage of this FIFO is that it is a single-clock-domain FIFO.
- Suitable FIFOs are available in the FPGA vendor's IP catalog.
- If no AXI Stream FIFO is available, a standard FIFO can be used instead: on the user side, drive TREADY with the inverse of the FIFO's full flag; on the core side, drive TVALID with the inverse of the FIFO's empty flag and read the FIFO whenever TVALID and the core's TREADY are both high. Make sure that the FIFO is configured as a first-word fall-through ("Fallthrough") FIFO, so that the data word is already present at the output while TVALID is asserted.
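The handshake wiring for a standard FIFO used as an AXI Stream FIFO can be checked with a behavioral model before writing HDL. The following Python sketch models a first-word fall-through FIFO with TREADY = not full on the user side and TVALID = not empty on the core side; the class and function names are hypothetical, and the model ignores vendor-specific details such as the extra cycles a real FIFO may need before its empty flag deasserts.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple


class FwftFifo:
    """Behavioral model of a first-word fall-through FIFO (hypothetical)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.mem: Deque[int] = deque()

    @property
    def s_tready(self) -> bool:
        # User side: accept data while the FIFO is not full
        return len(self.mem) < self.depth

    @property
    def m_tvalid(self) -> bool:
        # Core side: data is valid whenever the FIFO is not empty
        return len(self.mem) > 0

    def clock(self, s_tvalid: bool, s_tdata: int,
              m_tready: bool) -> Tuple[bool, Optional[int]]:
        """Advance one clock edge; return (word accepted, word delivered)."""
        # Both handshakes are evaluated on the state before the edge
        push = s_tvalid and self.s_tready
        pop = self.m_tvalid and m_tready   # read enable = TVALID and TREADY
        delivered = self.mem.popleft() if pop else None
        if push:
            self.mem.append(s_tdata)
        return push, delivered


def simulate(words: Sequence[int], depth: int,
             core_ready: Sequence[bool]) -> List[int]:
    """Feed `words` through the FIFO while the core applies `core_ready`."""
    fifo = FwftFifo(depth)
    received: List[int] = []
    sent = 0
    for ready in core_ready:
        s_tvalid = sent < len(words)
        s_tdata = words[sent] if s_tvalid else 0
        pushed, delivered = fifo.clock(s_tvalid, s_tdata, ready)
        sent += pushed
        if delivered is not None:
            received.append(delivered)
    return received


# The core stalls for 10 cycles, then accepts a word every cycle
print(simulate(list(range(20)), 4, [False] * 10 + [True] * 30))
```

Because the producer only pushes while TREADY (not full) is high and the FIFO is read only on a completed TVALID/TREADY handshake, no word is lost or duplicated even while the core stalls.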





- The TDEST inputs of each AXI Stream interface can be used to reduce the number of physical interfaces of the IP core while keeping the number of destination data buffers in host memory.
- It is sometimes overlooked that each AXI Stream slave interface can reach ALL destination data buffers (up to 64).
- The number of interfaces can therefore be reduced by building a FIFO multiplexer structure in user logic. This greatly eases timing closure on the critical 250 MHz paths.
- Note: The TDEST inputs are only available in the HCC IP core. The Flex IP core does not have TDEST inputs on its s\_axis interfaces. For exact timing details, see Chapter 2.1 of the user guide.
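A FIFO multiplexer of this kind can be sketched behaviorally. The following Python model is an illustration under stated assumptions, not the core's interface definition: each source FIFO holds complete packets of (tdata, tlast) beats, the grant rotates round-robin and switches only after a beat with TLAST set (so packets from different sources are never interleaved), and each source is mapped to a fixed TDEST value chosen by the user. The function name and the TDEST mapping are hypothetical; the exact TDEST timing requirements are in Chapter 2.1 of the user guide.

```python
from collections import deque
from typing import Deque, Iterator, List, Optional, Tuple

Beat = Tuple[int, bool]  # (tdata, tlast)


def tdest_mux(sources: List[Deque[Beat]],
              tdest_of: List[int]) -> Iterator[Tuple[int, bool, int]]:
    """Merge several packet streams onto one AXI Stream interface.

    Yields (tdata, tlast, tdest) beats in output order. The grant rotates
    round-robin and changes only after a beat with tlast set.
    """
    n = len(sources)
    granted: Optional[int] = None
    last = n - 1  # so that arbitration starts at source 0
    while any(sources):
        if granted is None:
            # Next non-empty source after the one served last
            granted = next((last + k) % n for k in range(1, n + 1)
                           if sources[(last + k) % n])
        if not sources[granted]:
            raise ValueError(f"source {granted} ended without TLAST")
        tdata, tlast = sources[granted].popleft()
        yield tdata, tlast, tdest_of[granted]
        if tlast:
            last, granted = granted, None


src0 = deque([(1, False), (2, True), (3, True)])      # packets [1, 2] and [3]
src1 = deque([(10, False), (11, False), (12, True)])  # packet [10, 11, 12]
for beat in tdest_mux([src0, src1], tdest_of=[5, 7]):
    print(beat)
```

Switching only at packet boundaries keeps each packet contiguous on the shared interface, so a single TDEST value applies to the whole packet.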



| Feature                                          | Parameter name                                     | Recommended value | Comment                                                                                                  |
|--------------------------------------------------|----------------------------------------------------|-------------------|----------------------------------------------------------------------------------------------------------|
| Number of upstream interfaces (s_axis)           | Write_Data_Interfaces_in_use_g                     | 1-9               | Higher values may be possible but are not guaranteed.                                                    |
| Number of downstream interfaces (m_axis)         | Read_Data_Interfaces_in_use_g                      | 1-8               | Higher values may be possible but are not guaranteed.                                                    |
| RAM elements for s_axis data FIFOs               | DMA_Write_Fifo_params_c.dFIFO_bram in dma_pkg.vhd  | false             |                                                                                                          |
| RAM elements for m_axis data FIFOs               | DMA_Read_Fifo_params_c.dFIFO_bram in dma_pkg.vhd   | false             |                                                                                                          |
| Address FIFO almost-empty interrupts (upstream)  | DMA_Read_Implement_irq_sg_ae_regs_c in dma_pkg.vhd | false             | false disables these interrupts; the user then has no ring buffer support. true might be possible but is not guaranteed. |
| Address FIFO almost-empty interrupts (downstream)| DMA_Read_Implement_irq_sg_ae_regs_c in dma_pkg.vhd | false             | false disables these interrupts; the user then has no ring buffer support. true might be possible but is not guaranteed. |

It is also recommended to enable physical optimization and a higher place-and-route effort in Vivado / Quartus.

Recommendation for Xilinx devices: do not use Vivado 2020.1; use Vivado 2020.2 or later where possible.

## IP settings for 250 MHz and speed grades marked with X



Tables 1 and 2 mark some speed grades with "X". For these speed grades, the following settings apply:

| Feature                                          | Parameter name                                     | Recommended value | Comment                                                                                                  |
|--------------------------------------------------|----------------------------------------------------|-------------------|----------------------------------------------------------------------------------------------------------|
| Number of upstream interfaces (s_axis)           | Write_Data_Interfaces_in_use_g                     | 1-2               | Higher values might be possible but are not guaranteed.                                                  |
| Number of downstream interfaces (m_axis)         | Read_Data_Interfaces_in_use_g                      | 1-2               | Higher values might be possible but are not guaranteed.                                                  |
| RAM elements for s_axis data FIFOs               | DMA_Write_Fifo_params_c.dFIFO_bram in dma_pkg.vhd  | false             |                                                                                                          |
| RAM elements for m_axis data FIFOs               | DMA_Read_Fifo_params_c.dFIFO_bram in dma_pkg.vhd   | false             |                                                                                                          |
| Address FIFO almost-empty interrupts (upstream)  | DMA_Read_Implement_irq_sg_ae_regs_c in dma_pkg.vhd | false             | false disables these interrupts; the user then has no ring buffer support. true might be possible but is not guaranteed. |
| Address FIFO almost-empty interrupts (downstream)| DMA_Read_Implement_irq_sg_ae_regs_c in dma_pkg.vhd | false             | false disables these interrupts; the user then has no ring buffer support. true might be possible but is not guaranteed. |

It is also recommended to enable physical optimization and a higher place-and-route effort in Vivado / Quartus.
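The recommended values from both settings tables can be checked automatically, for example in a build script. In the following Python sketch the configuration dictionary is hypothetical: it simply mirrors the parameter names from the tables and is not an interface of the IP core. The profile "validated" corresponds to the general settings table, "x_marked" to the table for speed grades marked with X.

```python
from typing import Any, Dict, List

# Limits from the two settings tables: (max write, max read) interfaces
PROFILES = {"validated": (9, 8), "x_marked": (2, 2)}

# Constants recommended to be false in both profiles
MUST_BE_FALSE = (
    "DMA_Write_Fifo_params_c.dFIFO_bram",
    "DMA_Read_Fifo_params_c.dFIFO_bram",
    "DMA_Read_Implement_irq_sg_ae_regs_c",
)


def check_settings(cfg: Dict[str, Any], profile: str) -> List[str]:
    """Return deviations from the recommended settings (empty list = OK)."""
    max_write, max_read = PROFILES[profile]
    issues: List[str] = []
    if not 1 <= cfg["Write_Data_Interfaces_in_use_g"] <= max_write:
        issues.append(f"Write_Data_Interfaces_in_use_g: recommended 1-{max_write}")
    if not 1 <= cfg["Read_Data_Interfaces_in_use_g"] <= max_read:
        issues.append(f"Read_Data_Interfaces_in_use_g: recommended 1-{max_read}")
    for name in MUST_BE_FALSE:
        if cfg.get(name, False):
            issues.append(f"{name}: recommended false")
    return issues


cfg = {
    "Write_Data_Interfaces_in_use_g": 4,
    "Read_Data_Interfaces_in_use_g": 2,
    "DMA_Write_Fifo_params_c.dFIFO_bram": False,
    "DMA_Read_Fifo_params_c.dFIFO_bram": False,
    "DMA_Read_Implement_irq_sg_ae_regs_c": False,
}
print(check_settings(cfg, "validated"))  # []
print(check_settings(cfg, "x_marked"))   # ['Write_Data_Interfaces_in_use_g: recommended 1-2']
```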



#### Software settings that keep FIFO depths low

When several S\_AXIS stream interfaces are used, the channels are served in round-robin fashion: each channel may transmit the amount of data given by its associated IncrementLineOffset register. If the IncrementLineOffset registers are set to high values, the FIFO buffers also need a higher capacity, because each FIFO must keep accepting data without becoming full until its channel is selected again and the FIFO can be drained.
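A rough estimate shows why large IncrementLineOffset values require deeper FIFOs. The following Python sketch assumes that IncrementLineOffset values are given in bytes, that all other channels always have a full quantum ready (worst case), and that the core drains data at a constant rate; the function name and the rates in the example are hypothetical.

```python
from typing import Sequence


def worst_case_fill_bytes(offsets: Sequence[int], input_rates: Sequence[float],
                          drain_rate: float, ch: int) -> float:
    """Bytes that channel `ch` must buffer while the round-robin scheduler
    serves all other channels.

    offsets:     IncrementLineOffset of each channel (assumed to be bytes)
    input_rates: rate at which user logic writes into each FIFO, bytes/ns
    drain_rate:  rate at which the core transfers data to the host, bytes/ns
    """
    # Worst case: channel `ch` has just been served and must wait for one
    # full quantum of every other channel before it is selected again.
    waiting_bytes = sum(o for j, o in enumerate(offsets) if j != ch)
    wait_ns = waiting_bytes / drain_rate
    return input_rates[ch] * wait_ns


# Four equal channels at 1 byte/ns each, core drains 4 bytes/ns (example values)
print(worst_case_fill_bytes([0x200] * 4, [1.0] * 4, 4.0, 0))   # 384.0
print(worst_case_fill_bytes([0x2000] * 4, [1.0] * 4, 4.0, 0))  # 6144.0
```

In this model, raising all IncrementLineOffset values by a factor of 16 raises the required buffering by the same factor.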

If more than one interface is active, we recommend the following settings:

- Set the IncrementLineOffset registers for DMA Write to 0x200.
  Note for video applications: if 0x200 does not correspond to a complete line, use "stream" mode.
- Channels that transmit data only from time to time can be set to lower values, but each value must be a power of 2.
- Higher-priority channels, or channels that carry twice as much data as the others, can be set to 0x400.
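These rules can be checked before the registers are programmed. The following Python sketch is a hypothetical helper: it treats 0x400 as the largest recommended value, following the guidance for higher-priority channels, and flags any value that is not a power of 2.

```python
from typing import Dict, List

RECOMMENDED_MAX = 0x400  # largest value suggested for higher-priority channels


def review_offsets(offsets: Dict[str, int]) -> List[str]:
    """Flag IncrementLineOffset values that break the guidelines above."""
    findings: List[str] = []
    for name, value in offsets.items():
        # A power of 2 has exactly one bit set, so value & (value - 1) == 0
        if value <= 0 or value & (value - 1):
            findings.append(f"{name}: 0x{value:X} is not a power of 2")
        elif value > RECOMMENDED_MAX:
            findings.append(f"{name}: 0x{value:X} is above 0x400; deeper FIFOs may be needed")
    return findings


print(review_offsets({"Ch0": 0x200, "Ch1": 0x400, "Ch2": 0x80}))  # []
print(review_offsets({"Aux": 0x300}))  # ['Aux: 0x300 is not a power of 2']
```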

Example: video data transmission with Y (16-bit), Cr (8-bit), Cb (8-bit), and audio data:

- Y channel: 0x400
- Cr channel: 0x200
- Cb channel: 0x200
- Audio channel: 0x100 or 0x200
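When all channels have data pending, each channel receives a share of the round-robin cycle proportional to its IncrementLineOffset. The following Python sketch (hypothetical function name, assuming every channel always has a full quantum available) shows that these settings give the Y channel twice the share of Cr and Cb, matching its twice-as-wide samples (16-bit versus 8-bit).

```python
from typing import Dict


def round_robin_shares(offsets: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each round-robin cycle used by each channel, assuming
    every channel always has a full IncrementLineOffset of data pending."""
    total = sum(offsets.values())
    return {name: value / total for name, value in offsets.items()}


shares = round_robin_shares({"Y": 0x400, "Cr": 0x200, "Cb": 0x200, "Audio": 0x200})
print(shares)  # {'Y': 0.4, 'Cr': 0.2, 'Cb': 0.2, 'Audio': 0.2}
```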