Abstract

The current trend for many applications requiring data converters is to move closer and closer to a full SDR (Software Defined Radio) system. While an SDR architecture brings many benefits in terms of flexibility and SWaP-C (Size, Weight, Power and Cost), it usually translates into a higher bandwidth requirement, which is directly linked to the data converter sampling speed through the Shannon-Nyquist theorem. This complicates the interface between the FPGA (Field Programmable Gate Array) and the data converter. Indeed, the speed at which an FPGA processes information is very limited compared to the amount of data generated by a high-speed data converter. This is usually dealt with through massive parallel processing. However, transmitting and receiving this huge amount of data has become the system bottleneck, as data needs to be transmitted in larger and larger quantities, faster and faster. This paper covers and compares the two means of interfacing at high speed between FPGA and data converter in use today: the high-speed LVDS parallel interface and the high-speed serial interface. It considers multiple aspects, ranging from RF concerns such as trace length and signal integrity to system-level concerns such as cost and ease of development. It starts by introducing these two types of interface, comparing them and identifying their benefits and drawbacks. It then discusses the FPGA design of a high-speed parallel interface at 1.5 Gbps, focusing on a transmission from an FPGA to a DAC (Digital to Analog Converter) using the example of an Arria V FPGA from Altera interfacing with an EV12DS460A from e2v. Before concluding, it covers a high-speed serial interface FPGA design at 6 Gbps using the ESIstream (Efficient Serial Interface) protocol, focusing on a transmission from an ADC (Analog to Digital Converter) to an FPGA using the example of an EV12AD500A from e2v interfacing with a Virtex 7 from Xilinx.

Keywords— ADC; DAC; FPGA; EV12AD500; EV12DS460; Arria V; Virtex 7; ESIstream; serial interface; parallel interface;

I. Introduction

The use of high-speed data converters is increasing rapidly as more and more applications look to them as a way to improve the performance and capabilities of their systems by an order of magnitude: from communications (ground and satellite based) to high-energy physics (accelerators, synchrotrons), including defence (electronic warfare, radar, radar jamming), industrial (cellphone test lines, tank container monitoring), test and measurement (oscilloscopes, spectrum analyzers, mass spectrometers), and earth observation (synthetic aperture radar) applications. Each of these application domains brings its own constraints and requirements, among which is the choice of interface at these high speeds. The traditional way of interfacing with a data converter has been a parallel interface, as it is straightforward in terms of PCB and firmware design: each bit of the sample has its own path. However, at high speed, above 1 GHz, many parameters that are negligible at lower speed start to limit the performance of the interface. Serial interface options thus started to appear about 10 years ago and are now preferred in most applications. This paper aims at explaining how both interfaces work through a comparison of their benefits and drawbacks, followed by two examples, one of each interface solution.

II. Comparison of parallel and serial interface

A parallel interface is defined simply as an interface using a certain number of lanes to transmit data, plus one lane to transmit the clock, between the transmitter (TX) and the receiver (RX), as can be seen in Figure 1.a. A serial interface is an interface using a certain number of lanes to transmit encoded data between TX and RX through high-speed transceivers, comprising a serializer on the TX side and its counterpart, a deserializer, on the RX side, as shown in Figure 1.b.

Figure 1: (a): Schema of a parallel interface; (b): Schema of a serial interface

A few differences can be noted from the definitions of these two types of interface:

  • The clock signal is not transmitted for the serial interface. Indeed, it is not transmitted directly but recovered on the RX side through a CDR (Clock and Data Recovery) system, which brings a few advantages mentioned later in this paper;
  • For the serial interface, encoding/decoding of the data is mandatory. It should be noted that encoding/decoding can be applied and be useful in the case of a parallel interface, but it is not mandatory for it to function. For a serial interface, on the other hand, not having encoding/decoding results in a BER (Bit Error Rate) degradation of the transmission due to multiple effects introduced later in this paper.

Historically, the parallel interface used to be the only solution available, thanks to its simplicity and direct implementation approach: with each digital clock cycle a bit value is transmitted on a lane, and since the RX has access to the clock signal it recovers the data easily. This is true at low data rates, but when the data rate increases, many difficulties add complexity to developing such an interface. To satisfy the increasing demand for bandwidth, the parallel interface solution quickly becomes limited in terms of data rate per lane and is left only with the option of increasing the number of lanes to increase the bandwidth. Today the parallel interface is, in most cases, being replaced by the serial interface because of the bandwidth capabilities such an interface allows reaching. A simple comparison of the data rates achievable for these two interfaces on devices from the main FPGA manufacturers, Xilinx and Altera/Intel, shows the benefit of the serial interface: today parallel interfaces in FPGAs are limited to 1.6 Gbps per lane, while high-speed serial transceivers can reach 32 Gbps and even higher.

Looking at the digital side, it is visible that the parallel interface is a lot simpler than the serial interface, which needs the encoding/decoding and transceiver stages, even if parallel interfaces today also offer some of these functions to improve performance. This translates into a huge latency advantage for the parallel interface, which makes it vital for applications like electronic warfare, where a few nanoseconds can be the difference between being spotted and remaining invisible to an enemy radar system. The stages used for the serial interface also mean that it requires more resources from the FPGA (LUTs and FIFOs or elastic buffers). This footprint is generally small enough not to be an issue, but it can complicate timing closure within the FPGA when the application requires a large quantity of resources or high-speed digital design.

In terms of RF considerations, a high-speed serial interface, running faster, needs more care. Following the Shannon-Nyquist theorem, a transmission at a data rate of $D$ Gbps contains frequencies up to $D/2$ GHz. The insertion loss of a medium over the frequency band from DC to $D/2$ GHz is not flat: it can be assimilated to a low-pass filter whose response depends on the frequency and on the transmission medium used. If the overall attenuation at high frequency is too high, bits will be seen wrongly by the reception stage. This relates to the Inter-Symbol Interference (ISI) effect. Solutions built into the transceivers exist to cope with this effect, namely emphasis and equalization, but careful PCB layout of the fast serial lanes also helps.
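As a concrete instance of this relation, the 6 Gbps lane rate used in the serial example of section IV corresponds to a signal bandwidth of:

$$f_{max} = \frac{D}{2} = \frac{6\ \text{Gbps}}{2} = 3\ \text{GHz}$$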

Considering the timing, a few factors need to be taken into account. First of all, timing uncertainties are the limiting factor for the parallel interface speed. When the timing uncertainties are small enough compared to the data period minus the meta-stable zone (setup and hold time) at the RX input, it is possible to configure a parallel interface; this relates to the opening of the eye at the RX input. Equation (1) below shows this relation when using an SDR (Single Data Rate) interface and equation (2) when using a DDR (Double Data Rate) interface:

$$T_{clk} - (t_{su} + t_{hold}) > t_{skew} + t_{jitter} + t_{PVT} + t_{rf} + t_{dcd} \quad (1)$$

$$\frac{T_{clk}}{2} - (t_{su} + t_{hold}) > t_{skew} + t_{jitter} + t_{PVT} + t_{rf} + t_{dcd} \quad (2)$$

With $T_{clk}$ the clock period; $t_{su}$ the input setup time; $t_{hold}$ the input hold time; $t_{skew}$ the package, PCB and bit-to-bit skews; $t_{jitter}$ the clock and data jitter; $t_{PVT}$ the PVT (Process-Voltage-Temperature) variations; $t_{rf}$ the rise and fall times of the data; and $t_{dcd}$ the clock duty cycle distortion. This induces a few constraints to respect. Firstly, the bit-to-bit skew should be reduced as much as possible; this can be done by matching the PCB trace lengths of the data and/or by adding a controlled delay independently on each bit lane. Secondly, the alignment between the clock and the data at the RX input needs to be controlled to prevent the data from arriving within the meta-stability zone. This can be achieved by delaying either the clock or the data relative to the other. This timing aspect is where the main complexity of a high-speed parallel interface comes from, and it limits the maximum speed achievable.
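As an illustration of how this budget is used in practice, the sketch below evaluates equation (2) for a DDR lane; the numerical values are purely illustrative assumptions and are not taken from any datasheet.

```python
# Minimal sketch (not from the paper): evaluates the eye-opening budget of
# equation (2) for a DDR parallel lane. All numerical values below are
# illustrative assumptions, not figures from any device datasheet.

def ddr_eye_margin_ps(bit_rate_gbps, t_su, t_hold, t_skew, t_jitter, t_pvt, t_rf, t_dcd):
    """Return the remaining eye margin in ps for a DDR parallel interface."""
    bit_period = 1e3 / bit_rate_gbps          # bit period in ps (DDR: one bit per clock edge)
    uncertainties = t_skew + t_jitter + t_pvt + t_rf + t_dcd
    return bit_period - (t_su + t_hold) - uncertainties

# Illustrative 1.5 Gbps DDR lane: the bit period is about 667 ps.
margin = ddr_eye_margin_ps(1.5, t_su=100, t_hold=100, t_skew=150,
                           t_jitter=50, t_pvt=50, t_rf=60, t_dcd=30)
print(f"remaining eye margin: {margin:.0f} ps")  # a positive value means the interface can be timed
```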

On the other hand, a high-speed serial interface RX recovers the clock from the data stream using the CDR. This means that the recovered clock has experienced the same timing effects as the data up to the CDR stage. The timing effects on the data and on the clock thus cancel each other, allowing much higher speeds. This is a simple way of stating the benefit of the serial interface and the CDR; for more detailed information, multiple articles, papers and presentations are available that discuss CDR benefits, architectures and functions. Moreover, increasing the data rate per lane allows sending the same amount of data over a smaller number of lanes, saving PCB real estate and easing the PCB layout. The protocol is also used to digitally align multiple serial lanes together, which means that there is no need to match the trace lengths for a serial interface.

Finally, at the system level, high-speed transceivers are more costly resources. Even if their wide adoption has brought prices down, systems targeting low cost will prefer working with a parallel interface or a slow serial interface.

To conclude this section: nowadays, the serial interface is the interface of choice in most applications thanks to the benefits it brings in terms of bandwidth capability, but parallel interfaces are still necessary and used for latency-constrained applications. Low-cost or low-speed applications could consider both solutions depending on factors like development time, reuse of already developed sub-systems and other application requirements.

III. Example of parallel interface between FPGA and DAC

A.    System overview

This example of a parallel interface is based on the evaluation board of the EV12DS460A. It is composed of the EV12DS460A, a 12-bit, 6 GSps DAC [1], and the Arria V from Altera/Intel [2]. Figure 2 below shows a block diagram of the system.

Figure 2: Block diagram of the FPGA-DAC interface

In this example, the FPGA corresponds to the TX and the DAC to the RX. The blue path corresponds to the data path from the FPGA to the DAC conversion core, the green path is the clock network, and the red part concerns the control and settings. Due to the high sampling speed of the DAC at 6 GSps, 4 input ports (ports A to D in Figure 2) at 1.5 Gbps each are used to interface with the FPGA, and an internal multiplexer (MUX 4:1) provides the data to the conversion core at 6 GSps.

To prevent meta-stability issues at the DAC input, a solution is implemented with a detection/correction loop. First of all, the fast clock is provided to the DAC (CLK); it is then divided by 4 internally to the DAC (DIV /4) and goes through a digitally controlled delay (τ) controlled through the PSS setting. The output of the digitally controlled delay (DSP) is transmitted to the FPGA clock system (PLL), which then generates the clock used to output the data from the FPGA. At the input of the DAC, a detection system continuously checks for a meta-stability issue, which is indicated to the FPGA through the TVF bit. Upon such a detection, the FPGA CONTROL resets the PSS bits to change the delay between the data at the DAC input and the clock that latches them. This method is used to identify where the meta-stability zone is and helps in setting up the interface.
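The sketch below gives a behavioral view of one possible way of exploiting the TVF indication, assuming hypothetical read_tvf() and write_pss() register-access helpers; it is not the actual FPGA CONTROL firmware, only an illustration of a sweep-and-centre strategy built on the detection loop described above.

```python
# Behavioral sketch (assumption, not the actual firmware): sweep the PSS delay
# setting and keep a value for which the DAC TVF flag reports no timing
# violation. read_tvf() and write_pss() are hypothetical register helpers.

def _runs(values):
    """Group consecutive integers into runs, e.g. [1, 2, 3, 7, 8] -> [[1, 2, 3], [7, 8]]."""
    runs = [[values[0]]]
    for v in values[1:]:
        runs[-1].append(v) if v == runs[-1][-1] + 1 else runs.append([v])
    return runs

def align_interface(read_tvf, write_pss, pss_steps=range(32)):
    """Sweep PSS and centre it in the widest window where TVF stays clear."""
    valid = []
    for pss in pss_steps:
        write_pss(pss)            # apply the candidate clock/data delay
        if not read_tvf():        # TVF clear: no meta-stability detected
            valid.append(pss)
    if not valid:
        raise RuntimeError("no PSS setting gives a stable eye")
    widest = max(_runs(valid), key=len)
    centre = widest[len(widest) // 2]
    write_pss(centre)             # settle on the middle of the valid window
    return centre
```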

B.    Layout consideration

Given the DAC resolution of 12 bits and its 4:1 MUX, there are 48 bits to interface. Moreover, at this speed, differential signaling is necessary, so a total of 96 PCB traces are laid out between the FPGA and the DAC across multiple layers. In addition, in order to limit the package skew, which is different for each bit, every group of 12 bits (24 traces with differential signaling) is routed to the same bank of the FPGA. This adds a slight complexity to the layout but is required to reach such high speeds with a parallel interface.

As mentioned in section II, the trace lengths of the data lanes also need to be matched to limit the bit-to-bit skew, and this is complicated to achieve with 96 traces. A photo of the board is shown in Figure 3. Meanders are generally the solution used to match the trace lengths (the same principle as for an interface to a DDR3 memory, for example).

Figure 3: Electrical board

Note: The distance between the FPGA and the DAC is due to one of the board requirements. This board has been used to characterize the DAC, and as such it was necessary to be able to control the DAC temperature without affecting the surrounding components. In a real application, even though the requirement to match the trace lengths can impact the FPGA-DAC distance, the two devices would still be much closer.

This shows the complexity that the layout phase can reach when working with a parallel interface at a huge bandwidth (in this case 72 Gbps are transmitted between the FPGA and the DAC). Nonetheless, the latency achieved is impressive, as there are only 3 clock cycles (500 ps at 6 GSps) between the DAC digital input and the analog output.

C. FPGA consideration

A lot of consideration is necessary before choosing the FPGA. When working with a high-speed parallel interface, it is necessary to spend time looking at the timing uncertainties and to make sure that the right FPGA and the right speed grade are targeted.

The SERDES component is used to output the data from the FPGA at the required speed of 1.5 Gbps. It is a serializer (the same principle as for a serial interface, but at lower speed) which, in this example, takes chunks of 8 bits at 187.5 MHz and outputs them at 1.5 Gbps. This also isolates the FPGA fabric from the fast transmission and makes it possible to do some processing within the FPGA at a reasonable clock speed. Depending on the processing being done, a serialization factor of 4 instead of 8 would have been possible, but it would have required more care to meet the timings within the FPGA; it would, however, be a way of improving the latency performance. The SERDES component is split into four parts, each one linked to one of the ports of the DAC. As such, the reset signal needs to be sent synchronously to all SERDES blocks to ensure that the data are aligned between the different ports of the DAC.
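A minimal behavioral model of this 8:1 serialization is sketched below, written in Python purely for illustration; the real implementation is an FPGA SERDES primitive, and the MSB-first bit order shown here is an assumption.

```python
# Behavioral model (illustrative assumption, not the Altera SERDES primitive)
# of the 8:1 serialization: the fabric presents 8 bits per lane at 187.5 MHz
# and the SERDES shifts them out at 1.5 Gbps on that lane.

def serialize_8to1(parallel_words):
    """parallel_words: iterable of 8-bit integers, one per 187.5 MHz fabric cycle.
    Yields the individual line-rate bits (MSB first, illustrative order)."""
    for word in parallel_words:
        for i in range(7, -1, -1):
            yield (word >> i) & 1

# Example: two fabric-clock cycles worth of data on one lane.
bits = list(serialize_8to1([0xA5, 0x3C]))
print(bits)  # 16 line-rate bits: 1,0,1,0,0,1,0,1, 0,0,1,1,1,1,0,0
```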

Another feature, which allows adding an independent digital delay on each bit path, can be used to help correct the bit-to-bit skew. Both static and dynamic solutions exist depending on the FPGA. The benefit of this feature is that it can help open the eye even further and thus allows faster transmission speeds. However, it can be hard to implement: to be effective it should be tuned independently for each board/system, and it is thus not viable for an industrial-scale application without automation.

IV. Example of ESIstream serial interface between ADC and FPGA

A. System overview

This example of a serial interface is based on the evaluation system of the EV12AD500A. It is composed of the EV12AD500A, a 12-bit dual-channel 1.5 GSps ADC [3], and the Virtex 7 from Xilinx [4]. Figure 4 below shows a block diagram of the system.

Figure 4: Block diagram of the ADC-FPGA interface

The serial link in this example runs at 6 Gbps and 8 links are used, transmitting 48 Gbps in total. Within the data transmitted, 13.5% is overhead due to the protocol. Comparing with the parallel interface, and extrapolating to the case where both transmit the same amount of useful data, the gain in the number of lanes to lay out is 75%.
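From the figures above, the raw and useful throughputs of the serial link work out as:

$$8 \times 6\ \text{Gbps} = 48\ \text{Gbps}, \qquad 48\ \text{Gbps} \times (1 - 0.135) \approx 41.5\ \text{Gbps of useful data}$$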

B. Serial interface protocol

It was mentioned earlier that, without a protocol, a serial interface would see a huge BER degradation. This is due in part to the CDR and in part to the AC coupling. The CDR objective is to recover the clock from the data stream. In order to do that, it needs to see edges in the data, otherwise it cannot identify the clock. Hence, long series of successive '1's or '0's need to be avoided in a serial interface. The interface between TX and RX also has to be AC coupled: at these high speeds, a small difference in the common mode, added to all the other effects, could contribute to a degradation of the BER. This AC coupling means that if the transmitted data stream is not DC balanced (as many '1's as '0's transmitted), the AC coupling capacitor will charge up and corrupt the data seen by the reception stage. The protocol's objective is to prevent both long series of '0's or '1's and a non-DC-balanced transmission from happening.
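The two properties the protocol must guarantee can be expressed as simple checks on the transmitted bit stream. The sketch below is only an illustration, not part of any protocol specification: it measures the maximum run length (which the CDR cares about) and the running disparity (which the AC coupling cares about).

```python
# Illustrative sketch (assumption, not from any protocol spec) of the two
# stream properties discussed above: bounded run length so the CDR sees enough
# edges, and DC balance so the AC-coupling capacitors do not drift.

def max_run_length(bits):
    """Longest run of identical consecutive bits."""
    longest = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def running_disparity(bits):
    """Count of '1's minus count of '0's; close to zero means DC balanced."""
    return sum(1 if b else -1 for b in bits)

stream = [1, 0, 1, 1, 0, 0, 1, 0] * 8   # illustrative pattern
print(max_run_length(stream), running_disparity(stream))  # 2, 0
```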

The ESIstream protocol used in this example is one of the available serial interface protocols, like JESD204B, Interlaken and many others. It brings interesting benefits such as optimized efficiency and latency, simplicity and ease of design. Documentation and example designs for this protocol can be found on its dedicated website [5].

C.    Design consideration

As mentioned earlier, when working with a serial interface, the encoding/decoding process requires digital resources to be implemented. Generally speaking, the objective is to reduce this impact as much as possible so that the resources can be saved for the intended processing of the application. With this objective, the FPGA design needs to be oriented toward resource optimization while still being able to transmit the data and detect whenever an issue occurs. The ratio between optimization and issue detection depends strongly on the transmission data rate: a transmission at 6 Gbps does not require the same detection and correction process as a transmission at 32 Gbps to reach a similar BER.

Looking in more detail at the FPGA design of the serial interface, Figure 5 below shows the architecture of the reception stage in the Virtex 7.

Figure 5: FPGA RX architecture

To understand this architecture, some information on the ESIstream protocol is required. This protocol uses a two-stage encoding with 2 overhead bits per frame: one stage is a scrambling process, the other a disparity process. The scrambling process uses a Linear Feedback Shift Register (LFSR) to generate the scrambling/de-scrambling pseudo-random sequence. A synchronization sequence is used to initialize the link and align the TX encoding and RX decoding through the frame alignment component. Finally, the decoding takes place, where the descrambling is realized and the disparity processed. For more information, refer to the ESIstream website [5].
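A minimal sketch of an LFSR-based scrambler of this kind is given below. The polynomial used here (x^17 + x^3 + 1, i.e. a PRBS-17 generator), the seed and the bit ordering are illustrative assumptions; the exact ESIstream parameters are specified in [5].

```python
# Illustrative sketch of an LFSR-based scrambler; polynomial, seed and bit
# order are assumptions, not the normative ESIstream definition (see [5]).

def lfsr_sequence(seed, length, taps=(17, 3)):
    """Generate `length` pseudo-random bits from a 17-bit Fibonacci LFSR."""
    state = seed & 0x1FFFF            # 17-bit state, must be non-zero
    out = []
    for _ in range(length):
        bit = ((state >> (taps[0] - 1)) ^ (state >> (taps[1] - 1))) & 1
        out.append(bit)
        state = ((state << 1) | bit) & 0x1FFFF
    return out

def scramble(data_bits, seed=0x1ABCD):
    """XOR the data with the LFSR sequence; the RX descrambles by XOR-ing again
    with the same sequence once synchronized on the TX seed."""
    prbs = lfsr_sequence(seed, len(data_bits))
    return [d ^ p for d, p in zip(data_bits, prbs)]

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]   # one 12-bit sample, as an example
assert scramble(scramble(data)) == data       # descrambling restores the data
```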

The resources and latency required by this specific implementation can be found in Figure 6 below.

Figure 6: Resource and latency figure for serial interface

The values concern the FPGA implementation used in this example for the RX. The TX values correspond to an FPGA implementation equivalent to what the ADC integrates. The difference is that the ADC process allows it to run faster than the FPGA, and thus the TX latency is 10 ns in the ADC compared to the 30 ns reached using an FPGA.

It can be noted that, in terms of latency, the serial interface extends to a total of 60 ns (using the ADC value for the TX). Compared to the parallel interface presented earlier, this is a loss of two orders of magnitude, which would be incompatible with some applications.
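As a worked ratio from the two figures quoted (60 ns for the serial link, 500 ps for the parallel link):

$$\frac{60\ \text{ns}}{500\ \text{ps}} = 120 \approx 10^{2}$$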

V. Conclusion

This paper has presented the differences between the parallel and serial interfaces and the key factors to consider before choosing between them, accompanied by two examples to introduce a quantitative comparison. Even though the serial interface has become the ubiquitous solution in the past years, a few applications still have requirements forcing them to use the parallel interface. This is bound to keep evolving, and modulated serial interfaces [6] are already starting to appear today to reach the next step in high-speed data transmission, bringing new problems to solve.

References

[1]   e2v EV12DS460 product page: 
http://www.e2v.com/products/semiconductors/dac/ev12ds460/

[2]   Altera Arria V product page: 
https://www.altera.com/products/fpga/arria-series/arria-v/overview.html

[3]   e2v EV12AD500 product page:
http://www.e2v.com/products/semiconductors/adc/ev12ad500/

[4]   Xilinx Virtex 7 product page: 
https://www.xilinx.com/products/silicon-devices/fpga/virtex-7.html

[5]   ESIstream protocol website:
www.ESIstream.com

[6]   PAM-4 Design Challenges and the Implications on Test, Keysight Technologies


Authors:

Marc Stackler, e2v semiconductors division
#309,Building1, Coastal business center, Xinghua Road,
Shekou, Nanshan District, Shenzhen, 518000 China
marc.stackler@e2v.com
Tel: +86 186 8034 2372

Andrew Glascott-Jones, e2v semiconductors division
andrew.glascott-jones@e2v.com

Romain Pilard, e2v semiconductors division
romain.pilard@e2v.com

Nicolas Chantier, e2v semiconductors division
nicolas.chantier@e2v.com