Debugging High-Speed SERDES Issues in Multi-board Interconnect Systems

This paper is an Outstanding Paper Award winner from EDI CON USA 2017.

High-speed SERDES interfaces operating at data rates in excess of 5 Gbps are common today, and their implementation on single printed circuit boards and in backplane systems is well understood. Design of such systems has been facilitated by the use of link budgets, s-parameter metrics and channel operating margin (COM) parameters [1-2]. In most applications, it is possible to have a high level of confidence that the implementation will operate error free.

The process typically consists of obtaining s-parameters for all the elements that make up the high-speed link. For computational ease, it is common practice to partition a complex driver-to-receiver interconnect into smaller sections and to compute or measure the s-parameters of each section in isolation. The most common sections are the device package, breakout, PCB traces in the device pin field, PCB routing outside pin fields, PCB routing approaching connectors, AC coupling capacitors, vias and board-to-board connectors. The s-parameters of each section are checked for reciprocity, passivity and causality, and then cascaded together to produce a composite link s-parameter data file. These values are then compared with s-parameter metrics, namely insertion loss (IL), insertion loss deviation (ILD), return loss (RL) and insertion loss to crosstalk ratio (ICR). If the link values lie in or close to the region termed the high confidence region, eye diagram simulation using IBIS-AMI models or a computation of COM (dB) will confirm error-free link operation. If the channel s-parameters are marginal or violate generally accepted requirements, it may be easy to identify the section or sections responsible and take corrective action.
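The reciprocity and passivity checks mentioned above can be sketched as follows. This is a minimal illustration that assumes each section's s-parameters are already held in a complex NumPy array of shape (n_freq, n_ports, n_ports). Reading them from a Touchstone file is not shown. Causality checking needs a more involved method and is left out.

```python
import numpy as np


def is_reciprocal(S, tol=1e-6):
    """True if S equals its transpose at every frequency point (within tol).

    S: complex array of shape (n_freq, n_ports, n_ports).
    """
    return bool(np.allclose(S, np.transpose(S, (0, 2, 1)), atol=tol))


def is_passive(S, tol=1e-6):
    """True if the largest singular value of S is <= 1 at every frequency.

    A passive network cannot return more power than it receives, which is
    equivalent to every singular value of S being at most 1.
    """
    sigma = np.linalg.svd(S, compute_uv=False)  # shape (n_freq, n_ports), descending
    return bool(np.all(sigma[:, 0] <= 1.0 + tol))


# Hypothetical example: an ideal matched 6 dB attenuator at three frequency points.
n_freq = 3
S_att = np.zeros((n_freq, 2, 2), dtype=complex)
S_att[:, 0, 1] = S_att[:, 1, 0] = 0.5  # |S21| = 0.5, about 6 dB of loss

# A deliberately unphysical "section" with gain and a non-reciprocal transfer.
S_bad = S_att.copy()
S_bad[:, 1, 0] = 1.5

print(is_reciprocal(S_att), is_passive(S_att))  # True True
print(is_reciprocal(S_bad), is_passive(S_bad))  # False False
```

A section that fails either check should be re-extracted or re-measured before it is cascaded. Otherwise the composite link file inherits the error.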

A fundamental assumption made in computing the s-parameters of a number of sections in cascade is that the reference plane of each “section” has a continuous low impedance connection to that of the neighboring section. This is true for a single multi-layer PCB where solid ground planes are used as the reference for all PCB routing. It is also nearly true for backplane systems, where the backplane connectors have a large number of “ground pins.” In certain cases, however, this assumption holds only approximately. This occurs when dealing with high-speed memory cards, for example the non-volatile memory express solid state drive (NVMe SSD). Also, in some custom designs, signal pins take precedence over ground pins to improve functionality. The connector types and the number of ground pins available may be minimal and, consequently, a low impedance connection to the ground plane can be difficult or impossible to achieve. In such circumstances, it becomes important to ensure that no ground plane resonance occurs in the frequency range of interest, and to minimize the mechanisms by which such a resonance can be effectively excited.

A ground plane resonance shows up as a dip in the single-ended insertion loss curve, provided that a significant part of the complete path is simulated or measured as one entity. It will not show up if the individual sections are simulated or measured separately and then cascaded together. Such a dip is likely to violate the ILD requirement and is by itself an indication of poor link performance. Secondly, skew between the P and N members of a differential pair can affect the amplitude and position of these resonance dips in the differential response, and printed circuit board fiber weave skew [3-4] can make this effect worse. The end result is erratic link performance that can vary from board to board and within a board.

In this paper, a typical SERDES design flow is described to highlight the key parameters involved [1]. Next, the theory of networks in cascade [5] is revisited, and examples illustrating where it does not apply are presented. Measurements and simulations were carried out on a PCB designed to support multiple SERDES interfaces at various data rates. One of the multi-board interfaces that showed errors is investigated in detail to illustrate ground plane resonance and its impact on the eye diagram and the bit error rate (BER). Ansys HFSS and Keysight ADS were used for all simulations. Measurements were made using the built-in BER and eye diagram display utility of the SERDES device.

SERDES Design Flow and Debug Strategy

Some typical statements made after a SERDES design is tested:
•   The system works at 10GBASE-KR (10 Gbps) but fails to operate in the extended mode (11.25 Gbps)
•   The system runs error free at the PCI Express Gen3 data rate but fails to run at Gen4
•   All lanes except two run error free at 25 Gbps
•   Eight out of ten boards appear to run error free
•   It appears to work fine when I press hard on the connector
•   We added ground planes and a lot of ground vias and the problem went away

To address these issues, it is important to understand the SERDES design flow. At the receiving device, the eye width and height are influenced by the following channel s-parameter characteristics [2], as illustrated in Figure 1.

Differential channel insertion loss (IL):
This is simply the loss of signal power arising from the insertion of the channel. Losses occur due to reflection, absorption and radiation and all of them contribute to the insertion loss.

Return Loss (RL):
This is the loss of signal power arising from reflections only, and it is caused by impedance discontinuities in the channel. Differential return loss takes precedence, although some standards also constrain the common mode return loss and the mode conversion loss.

Skew between P and N members of a differential pair:
This is the time delay between the P and N members of a differential interconnect. It can arise from a physical difference in the path length or in the propagation velocity of the P and N parts, for example from routing length and connector pin delay. While these can easily be corrected on a PCB, the fiber weave effect [3-4] is a predominant cause of skew.

Insertion loss deviation (ILD):
The IL of a lossy transmission line, expressed in dB, increases smoothly and almost linearly with frequency. Deviations from this smooth, nearly straight-line behavior occur due to impedance mismatch and other factors, and it is important to limit them. ILD is defined as the maximum deviation of IL from the best-fit attenuation vs. frequency characteristic.
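A sketch of the ILD calculation fits a smooth curve to IL over the band of interest and then takes the residual. The fit form a0 + a1·√f + a2·f + a3·f² is an assumption here. It resembles the fitted-attenuation form used in IEEE 802.3 backplane channel specifications, but a given standard may prescribe a different form, band or weighting. The loss curve in the example is synthetic.

```python
import numpy as np


def insertion_loss_deviation(freq_hz, il_db, fmin_hz, fmax_hz):
    """Return (freq_hz, ild_db) over [fmin_hz, fmax_hz].

    il_db is the insertion loss in dB as a positive number. The fitted curve
    is assumed to be a0 + a1*sqrt(f) + a2*f + a3*f**2 with f in GHz.
    """
    f = np.asarray(freq_hz, dtype=float)
    il = np.asarray(il_db, dtype=float)
    band = (f >= fmin_hz) & (f <= fmax_hz)
    f_ghz = f[band] / 1e9  # GHz keeps the least-squares problem well conditioned
    basis = np.column_stack([np.ones_like(f_ghz), np.sqrt(f_ghz), f_ghz, f_ghz**2])
    coeffs, *_ = np.linalg.lstsq(basis, il[band], rcond=None)
    il_fit = basis @ coeffs
    return f[band], il[band] - il_fit


# Synthetic example: a smooth loss curve plus a narrow 3 dB resonance dip near 6.4 GHz.
freq = np.linspace(0.1e9, 16e9, 400)
f_ghz = freq / 1e9
il_smooth = 1.0 + 2.0 * np.sqrt(f_ghz) + 0.8 * f_ghz
il_dip = il_smooth + 3.0 * np.exp(-(((freq - 6.4e9) / 0.2e9) ** 2))

f_band, ild_db = insertion_loss_deviation(freq, il_dip, 0.1e9, 16e9)
worst = np.argmax(np.abs(ild_db))
print(f"max |ILD| = {abs(ild_db[worst]):.2f} dB at {f_band[worst] / 1e9:.2f} GHz")
```

A smooth curve gives an ILD of essentially zero. A narrow dip, such as one produced by a resonance, survives the fit almost intact, which is why it shows up as an ILD violation.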

Insertion Loss to Crosstalk Ratio (ICR):
This is the ratio of IL to the total crosstalk at the receiver. Total crosstalk is computed as the power sum of the coupled differential s-parameters, namely the FEXT (far-end crosstalk) and NEXT (near-end crosstalk) values from all aggressors.
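The power-sum definition above can be sketched as below. It assumes that the through response and each aggressor's differential NEXT and FEXT terms are complex arrays on a common frequency grid. Some standards apply additional curve fitting or frequency weighting when evaluating ICR, and that is not shown here.

```python
import numpy as np


def icr_db(sdd21, xtalk):
    """Insertion loss to crosstalk ratio in dB at each frequency point.

    sdd21: complex through-channel response, shape (n_freq,)
    xtalk: complex differential NEXT and FEXT responses from every aggressor,
           shape (n_aggressors, n_freq)
    """
    thru_power = np.abs(sdd21) ** 2
    xtalk_power = np.sum(np.abs(xtalk) ** 2, axis=0)  # power sum over all aggressors
    return 10.0 * np.log10(thru_power / xtalk_power)


# Hypothetical single-frequency example: IL = 20 dB (|Sdd21| = 0.1) and two
# aggressors, each coupling at -40 dB (magnitude 0.01).
sdd21 = np.array([0.1 + 0j])
xtalk = np.array([[0.01 + 0j], [0.01 + 0j]])
print(icr_db(sdd21, xtalk))  # 10*log10(0.01 / 2e-4) = 10*log10(50)
```

Each additional aggressor at the same level lowers ICR by a further amount. That is why crosstalk is judged relative to IL rather than in absolute terms.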

Figure 1: Illustration of the main s-parameters of a SERDES link

In addition to the above dominant parameters, mode conversion parameters, such as differential to common mode conversion loss, also require attention for compliance purposes.

Channel physical properties shown in Figure 2 have a direct impact on the s-parameters and the eye diagram. The PCB traces category includes:
•   trace type (microstrip, stripline, edge coupled or broadside coupled), which affects IL
•   trace impedance, which affects ILD and RL
•   trace coupling (loose or tight), which affects IL, RL and ILD
•   trace thickness and surface roughness, which affect IL
•   trace coating, which affects IL, RL and skew
•   trace bends, which affect IL and skew
•   trace spacing, which affects ICR
•   trace reference planes, which affect ILD

PCB material affects IL and skew. PCB vias affect RL, ILD and ICR. AC coupling capacitors affect IL, ILD and RL. The PCB stackup, connectors and the device pin breakout affect all five parameters.

Given good knowledge of the available options and a cost budget, a typical design flow is as shown in Figure 3. The maximum data rate and signalling type determine the Nyquist frequency. For binary signalling, it is simply half of the maximum data rate; for example, for a data rate of 16 Gbps, the Nyquist frequency is 8 GHz. For most applications involving medium to long links, the maximum frequency of interest can be limited to twice the Nyquist frequency or even less, because a lossy interconnect acts as a low-pass filter. In exceptionally low-loss situations, the frequency of interest can extend to several times the Nyquist frequency.
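The frequency plan above reduces to simple arithmetic for binary (NRZ) signalling. The sketch below also returns the unit interval (UI), which is the natural scale for expressing P-N skew. The factor of two for the maximum frequency of interest is the rule of thumb quoted above and is not a hard limit.

```python
def nrz_frequency_plan(data_rate_bps):
    """Return (unit interval [s], Nyquist frequency [Hz], max frequency of interest [Hz])."""
    unit_interval_s = 1.0 / data_rate_bps  # one bit per UI for binary (NRZ) signalling
    f_nyquist_hz = data_rate_bps / 2.0     # half the data rate for binary signalling
    f_max_hz = 2.0 * f_nyquist_hz          # rule of thumb for medium to long lossy links
    return unit_interval_s, f_nyquist_hz, f_max_hz


for rate in (10e9, 16e9):
    ui, f_nyq, f_max = nrz_frequency_plan(rate)
    print(f"{rate / 1e9:.0f} Gbps: UI = {ui * 1e12:.1f} ps, "
          f"Nyquist = {f_nyq / 1e9:.1f} GHz, max f of interest = {f_max / 1e9:.1f} GHz")
```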

Figure 2: Illustration of PCB factors that affect s-parameters and Eye opening

Figure 3: Illustration of a typical SERDES design flow.

For a chosen transmitting and receiving device, one can determine a maximum allowable insertion loss with an adequate margin. This information can be obtained from the device manufacturer or from a time domain simulation using IBIS-AMI models. A typical value is 25 dB, although some of the more sophisticated devices with multiple levels of pre-emphasis, amplification and equalization can extend this to 40 dB. Once this number is known, the IL of the actual system is estimated or computed and compared with the maximum allowable IL. It is important to include the entire interconnect from the Tx die to the Rx die.

The package of a large ASIC, an AC coupling capacitor and a typical connector can together amount to an insertion loss of 3 dB or more at 10 GHz. Neglecting them will therefore lead to an underestimation of IL. If this requirement is not met with an adequate margin, it becomes necessary to break the link at a convenient location and use a re-timer or a re-driver. A re-driver can overcome the insertion loss limitation but will contribute jitter. A re-timer is a better choice because the signal is regenerated. Insertion loss is the most critical of all the parameters, and every effort must be made to reduce it where possible. Many designs fail due to excessive insertion loss, and this is the first clue to consider while debugging.
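A minimal insertion loss budget check is sketched below. All section values, the 3 dB design margin and the section names are hypothetical placeholders, and the 25 dB limit is the typical value quoted above. Adding the dB losses of the sections is itself an approximation, because it ignores mismatch interactions between sections. The second printed line shows how leaving out the packages, capacitor and connector overstates the margin.

```python
# Hypothetical per-section insertion losses at the Nyquist frequency, in dB.
# These numbers are placeholders for illustration, not measured values.
sections_db = {
    "Tx package": 1.0,
    "Tx breakout and main-board traces": 9.0,
    "AC coupling capacitor": 0.5,
    "board-to-board connector": 1.0,
    "daughter-card traces": 4.0,
    "Rx package": 1.0,
}

MAX_IL_DB = 25.0           # assumed device limit (the typical value from the text)
REQUIRED_MARGIN_DB = 3.0   # hypothetical design margin


def il_margin_db(sections, max_il_db):
    """Margin left in the IL budget; positive means the summed IL is within the limit."""
    return max_il_db - sum(sections.values())


margin = il_margin_db(sections_db, MAX_IL_DB)
# Counting only the PCB traces leaves out packages, capacitor and connector.
traces_only = {name: loss for name, loss in sections_db.items() if "traces" in name}

print(f"full-path margin: {margin:.1f} dB")
print(f"traces-only margin (misleading): {il_margin_db(traces_only, MAX_IL_DB):.1f} dB")
print("budget met" if margin >= REQUIRED_MARGIN_DB
      else "budget not met: consider a re-timer or re-driver")
```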

Next, the ILD parameter should be checked. It is closely related to the differential return loss. Poor return loss arises from discontinuities in the interconnect path. PCB vias and connectors are predominant causes and need careful attention during implementation. Via stub reduction by back drilling or by the use of blind/buried vias becomes necessary. So does optimization of via transitions through ground via placement and anti-pad sizing. Further, reference plane resonance, as described later, can contribute to excessive ILD, and the system should be checked to ensure that ILD is within the specified bounds.
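The benefit of stub reduction can be estimated with the common first-order model that treats a via stub as an open-ended quarter-wave resonator, f ≈ c / (4·L·√ε_eff). The stub lengths and ε_eff = 4.0 below are hypothetical. Pad and anti-pad capacitance usually pull the real resonance lower, so this is only a first estimate ahead of a 3D EM simulation.

```python
import math

C0_M_PER_S = 299_792_458.0  # speed of light in vacuum


def via_stub_resonance_hz(stub_length_mil, eps_eff):
    """Quarter-wave estimate of the first via stub resonance.

    stub_length_mil: unused via barrel length below the signal layer, in mils.
    eps_eff: assumed effective dielectric constant around the via.
    """
    stub_length_m = stub_length_mil * 25.4e-6  # 1 mil = 25.4 micrometres
    return C0_M_PER_S / (4.0 * stub_length_m * math.sqrt(eps_eff))


for stub_mil in (100, 50, 10):  # hypothetical lengths, e.g. before and after back drilling
    f_res = via_stub_resonance_hz(stub_mil, eps_eff=4.0)
    print(f"{stub_mil:3d} mil stub -> first resonance near {f_res / 1e9:.1f} GHz")
```

Shortening the stub pushes its resonance well above the frequency range of interest, which is the goal of back drilling.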

Crosstalk is the next parameter of interest, and what matters is its magnitude relative to the IL. In general, crosstalk between traces on the PCB is easier to control. Crosstalk in connectors and between PCB vias is usually dominant and should be verified to ensure that the ICR requirement is met. Lastly, other parameters such as skew, mode conversion loss and eye diagram simulation results should be checked for adequate margin.

While debugging a SERDES performance issue, each of the steps in Figure 3 needs to be examined to find the cause.

Re-examining Cascaded Network s-parameters

For an electrical circuit comprising lumped elements, it is known that the chain matrix (ABCD) representation of a cascade connection is simply the product of the chain matrices of the individual elements [5]. This is termed “cascade theory” in this paper. The result has been extended to uniform transmission lines and is an excellent approximation wherever a lumped element representation holds. It has even been extended to any structure for which an s-parameter file is available. Simulation of complex multi-board interconnects is invariably carried out by cascading the chain matrices of individual subsections, and this again is an excellent approximation in most practical situations.
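The chain-matrix cascade can be sketched as below. It uses the standard S-to-ABCD conversions for a 2-port with the same reference impedance Z0 at both ports, assumed here to be 50 Ω. Real SERDES channels are 4-port differential networks that would be cascaded with a block-matrix (for example T-parameter) form of the same idea; the 2-port form keeps the sketch short. The conversion only sees port voltages and currents, so nothing in it can represent a reference plane that is discontinuous between sections. That is exactly the assumption examined in this section.

```python
import numpy as np

Z0 = 50.0  # assumed common reference impedance at every port


def s_to_abcd(S, z0=Z0):
    """Convert 2-port S-parameters of shape (n_freq, 2, 2) to ABCD matrices."""
    s11, s12, s21, s22 = S[:, 0, 0], S[:, 0, 1], S[:, 1, 0], S[:, 1, 1]
    a = ((1 + s11) * (1 - s22) + s12 * s21) / (2 * s21)
    b = z0 * ((1 + s11) * (1 + s22) - s12 * s21) / (2 * s21)
    c = ((1 - s11) * (1 - s22) - s12 * s21) / (2 * s21 * z0)
    d = ((1 - s11) * (1 + s22) + s12 * s21) / (2 * s21)
    return np.stack([np.stack([a, b], axis=-1), np.stack([c, d], axis=-1)], axis=-2)


def abcd_to_s(T, z0=Z0):
    """Convert ABCD matrices of shape (n_freq, 2, 2) back to S-parameters."""
    a, b, c, d = T[:, 0, 0], T[:, 0, 1], T[:, 1, 0], T[:, 1, 1]
    den = a + b / z0 + c * z0 + d
    s11 = (a + b / z0 - c * z0 - d) / den
    s12 = 2 * (a * d - b * c) / den
    s21 = 2 / den
    s22 = (-a + b / z0 - c * z0 + d) / den
    return np.stack([np.stack([s11, s12], axis=-1), np.stack([s21, s22], axis=-1)], axis=-2)


def cascade(*sections):
    """'Cascade theory': multiply the chain matrices of the sections in order."""
    total = s_to_abcd(sections[0])
    for S in sections[1:]:
        total = total @ s_to_abcd(S)
    return abcd_to_s(total)


def matched_line(theta):
    """Hypothetical ideal matched lossless line with electrical length theta (radians)."""
    S = np.zeros((theta.size, 2, 2), dtype=complex)
    S[:, 0, 1] = S[:, 1, 0] = np.exp(-1j * theta)
    return S


theta1 = np.linspace(0.1, 3.0, 5)
theta2 = 0.5 * theta1
S_total = cascade(matched_line(theta1), matched_line(theta2))
print(np.round(np.angle(S_total[:, 1, 0]), 3))  # phases add, as expected for a cascade
```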

There are some exceptions to this convenient approach. One situation arises while dealing with the connection of two transmission lines of similar impedance but with vastly different physical dimensions [6]. Wherever there is an abrupt discontinuity between two sections, their s-parameters must not be computed in isolation. The discontinuous region must be simulated as one entity.

Another important exception is illustrated in Figure 4. Here, a uniform 50 Ohm microstrip transmission line (17 mils wide, 2 mils thick) on an air dielectric substrate (4 mils high) is first simulated over a continuous rectangular plane to obtain a reference (red curve). The reference plane is then divided into three parts. The reference plane in part A has the same width as that in part C, while its width in part B is 20 mils. The s-parameters of each part are computed separately using 3D EM software and then cascaded, and the result is indistinguishable from the red curve. Next, the entire structure is simulated as a whole. The IL result, shown by the blue curve in Figure 4, displays a noticeable resonance at ~6 GHz.

Figure 4. Illustration of a resonance in the Insertion loss profile

Figure 5. Illustration of the “ground plane impedance”

For a closer examination, the microstrip trace of Figure 4 is removed and the ground plane structure comprising sections A, B and C is simulated on its own. A lumped gap excitation is placed at the intersection of A and B to obtain an “impedance” of parts B and C with respect to A. The return loss at this port is plotted in Figure 5 for three different cases. In the first case (blue curve), which is the structure used in Figure 4, three resonance dips can be seen, and only one of them significantly affects the insertion loss. From this, we can conclude that multiple resonances may be present, but not all of them are harmful.

In the second example, a copy of section B is placed close to it to provide an additional bridge between A and C, mimicking an additional ground connection. The result, shown by the green curve in Figure 5, still shows the same resonances but at a much lower amplitude. This implies that adding another ground path lowers the “impedance.”

In the last example, multiple copies of section B are introduced between A and C to mimic a number of discrete ground connections. The result, shown by the red curve in Figure 5, now shows a very low impedance, which is the desired condition.
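Reading the Figure 5 return loss as an impedance uses the standard lumped-port relation Z = Z_ref·(1 + S11)/(1 − S11). The 50 Ω reference is an assumption, because the port impedance used in the simulation is not stated. The second function is a crude idealization of the blue, green and red trend: N identical, uncoupled ground paths in parallel divide the impedance by N. Closely spaced ground paths share mutual inductance, so the real improvement is smaller than this ideal.

```python
import numpy as np


def port_impedance(s11, z_ref=50.0):
    """Impedance seen at a lumped port from its reflection coefficient S11."""
    s11 = np.asarray(s11, dtype=complex)
    return z_ref * (1 + s11) / (1 - s11)


def parallel_paths(z_one_path, n_paths):
    """Idealized impedance of n identical, uncoupled ground paths in parallel."""
    return z_one_path / n_paths


# Hypothetical port reading of S11 = -1/3 against an assumed 50 Ohm reference.
z_one = port_impedance(-1 / 3)
for n in (1, 2, 4, 8):
    print(f"{n} path(s): {abs(parallel_paths(z_one, n)):.2f} Ohm")
```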

It is therefore clear that, for “cascade theory” to be applicable, the reference plane of a section must have a continuous low impedance connection to that of the neighboring section. This condition is hard to guarantee with connectors in which the ground pins are the only means of interconnecting the reference planes of two PCBs.

Measurement and Simulation

The test setup consisted of a multi-board system comprising a PCI Express PCB and an NVM Express SSD module, as shown in Figures 6-7. The module comprises 4 Tx and 4 Rx differential ports, a differential clock input and a number of other signals. All high-speed pins are located on one side of the connector. For convenience in testing the interface, a physical loopback board with the same form factor was designed that simply connected the Tx and Rx lanes together. The interface was operated at a data rate of 16 Gbps. A PRBS-31 bit stream was used as the data, and the driver Tx amplitude was set to its maximum value with no pre- or post-emphasis. The receiver decision feedback equalizer (DFE) was turned off.

Ten different systems were tested. In all cases, the measured eye opening was vastly different on different lanes. One of the lanes showed errors on most of the boards. It was also possible to make it error free by an appropriate combination of Tx drive strength and pre- and post-emphasis levels with the receiver DFE turned on. It also showed no errors with a PRBS-7 pattern. Another lane failed on a few boards. All lanes, however, operated error free at a lower data rate of 10 Gbps. In short, operation at 16 Gbps was inconsistent.
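A PRBS-31 pattern contains runs of up to 31 identical bits, versus 7 for PRBS-7. It therefore carries far more low-frequency content and exercises a much larger set of bit sequences, which is one plausible reason a marginal lane can pass with PRBS-7 and fail with PRBS-31. The sketch below generates both patterns with a Fibonacci linear-feedback shift register. The polynomials x^7 + x^6 + 1 and x^31 + x^28 + 1 are the ones commonly used for these patterns; a particular SERDES device may use an equivalent polynomial or a different bit ordering. A full PRBS-31 period is 2^31 − 1 bits, too long to enumerate in a quick script, so only PRBS-7 is checked below.

```python
from itertools import islice

# (order, tap) pairs for commonly used PRBS generator polynomials (assumed here).
PRBS7 = (7, 6)     # x^7 + x^6 + 1
PRBS31 = (31, 28)  # x^31 + x^28 + 1


def prbs_bits(poly, seed=1):
    """Yield an endless PRBS bit stream from a Fibonacci LFSR; seed must be nonzero."""
    order, tap = poly
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("LFSR seed must be nonzero")
    while True:
        # XOR the two tapped bits (positions `order` and `tap`, counted from 1).
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        yield new_bit


def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    best = run = 0
    previous = None
    for bit in bits:
        run = run + 1 if bit == previous else 1
        previous = bit
        best = max(best, run)
    return best


period7 = 2**7 - 1
prbs7 = list(islice(prbs_bits(PRBS7), 2 * period7))  # two full periods
print("PRBS-7 longest run:", longest_run(prbs7))
```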

The one lane that failed consistently on most of the PCBs was simulated. Its main distinguishing characteristic, compared with the more reliable lanes, was asymmetric routing at the input to the connector. To overcome the limitations of “cascade theory,” the conventional cascade of main board, connector and daughter board was not used. Instead, the loopback module, the connector and a small part of the main board were simulated as one entity (Figure 7) to give an s-parameter model (termed B).

Figure 6. Illustration of the test multi-board system.

Figure 7. Close up view of the NVM Express Module.

The remaining interconnect, including the device, was simulated separately to obtain an s-parameter model (termed A). The s-parameter model of the entire link was created by cascading models A and B.

The computed return loss of this differential pair is shown in Figure 8, with the near-end and far-end values in different colors. The generally accepted limit of differential return loss below -10 dB up to the Nyquist frequency is violated. The computed ILD values shown in Figure 9 also violate the generally accepted limit of 2 dB at ~6.4 GHz. Both results already indicate that this system is not in the “high confidence” region.
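The two limit checks above can be automated as sketched below. The -10 dB and 2 dB values and the "up to Nyquist" band come from the text. Applying the ILD limit over the same band is an assumption, and actual standards define frequency-dependent masks. The data are synthetic and only loosely mimic the failing lane.

```python
import numpy as np


def limit_violations(freq_hz, rl_db, ild_db, f_nyquist_hz,
                     rl_limit_db=-10.0, ild_limit_db=2.0):
    """Return the frequencies (Hz) up to Nyquist where RL or |ILD| breaks its limit.

    rl_db: differential return loss as |Sdd11| in dB (negative numbers).
    ild_db: insertion loss deviation in dB.
    """
    f = np.asarray(freq_hz, dtype=float)
    in_band = f <= f_nyquist_hz
    rl_bad = f[in_band & (np.asarray(rl_db) > rl_limit_db)]
    ild_bad = f[in_band & (np.abs(np.asarray(ild_db)) > ild_limit_db)]
    return rl_bad, ild_bad


# Synthetic data on a 0.1 GHz grid from 0.1 GHz to 16 GHz.
freq = np.arange(1, 161) * 1e8
rl = np.full(freq.shape, -15.0)
rl[freq == 6.4e9] = -8.0   # in-band RL violation
rl[freq == 12e9] = -5.0    # above Nyquist, so not reported
ild = np.zeros(freq.shape)
ild[freq == 6.4e9] = 2.5   # in-band ILD violation

rl_bad, ild_bad = limit_violations(freq, rl, ild, f_nyquist_hz=8e9)
print("RL violations (GHz):", rl_bad / 1e9)
print("ILD violations (GHz):", ild_bad / 1e9)
```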

Figure 8: Computed differential return loss (near and far end)

Figure 9: Computed differential insertion loss deviation.

The computed insertion loss of this selected lane is shown in Figure 10. Multiple resonances can be observed. The sharp one at ~6.4 GHz is of particular interest because it occurs below the Nyquist frequency of 8 GHz.

Figure 10. Computed insertion loss of one multi-board differential pair.

For this differential pair, one of the connector ground pins was connected to ground through a 0 Ohm resistor. At the resonant frequency, the computed current distribution revealed a high current density on the loopback board ground plane and on this connector ground pin.

Simulations of the loopback board ground plane alone showed resonances in the frequency range of interest. However, based on their amplitude and location, it was not possible to correlate them with the observed dip in the insertion loss. In a second simulation, the ground pins adjacent to the differential net that traverses the two boards were excited with a common mode signal. This simulation showed several resonances close to the frequencies where the insertion loss dips were observed. This observation indicates that, even though a ground plane resonance may be present, it is its excitation by a common mode current that is the likely cause of the erratic behavior.

Figure 10 also shows the computed single-ended insertion loss. Note that even when a resonance is present, it has no effect on the differential insertion loss if the signals on nets P and N are identical in magnitude and exactly out of phase. In this case, at ~6.4 GHz, the P and N traces differed in amplitude and were not exactly out of phase. As a result, the minima in the differential insertion loss profile remained, although their magnitudes were somewhat reduced.
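The role of P/N balance follows from the standard single-ended to mixed-mode conversion. The sketch below computes the differential through term Sdd21 and the mode-conversion term Scd21 from a single-ended 4-port. It assumes a port map of 0 = P in, 1 = N in, 2 = P out, 3 = N out; conventions differ between tools. The two-line "channels" are hypothetical. A balanced pair produces no common-mode content, so there is nothing to excite a plane resonance. When a dip affects only the P member, the differential response is only partly reduced and the remainder appears as common-mode content (Scd21 ≠ 0).

```python
import numpy as np


def mixed_mode_thru(S):
    """Differential (Sdd21) and differential-to-common (Scd21) through terms.

    S: single-ended 4-port S-parameters, shape (n_freq, 4, 4), with the
    assumed port map 0 = P in, 1 = N in, 2 = P out, 3 = N out.
    """
    s31, s32 = S[:, 2, 0], S[:, 2, 1]
    s41, s42 = S[:, 3, 0], S[:, 3, 1]
    sdd21 = 0.5 * (s31 - s32 - s41 + s42)
    scd21 = 0.5 * (s31 - s32 + s41 - s42)
    return sdd21, scd21


def two_uncoupled_lines(t_p, t_n):
    """Hypothetical pair of uncoupled, matched lines with through gains t_p and t_n."""
    S = np.zeros((1, 4, 4), dtype=complex)
    S[0, 2, 0] = S[0, 0, 2] = t_p
    S[0, 3, 1] = S[0, 1, 3] = t_n
    return S


# Balanced pair: no conversion to common mode.
print(mixed_mode_thru(two_uncoupled_lines(0.9, 0.9)))
# Dip on P only: Sdd21 drops partially and part of the signal converts to common mode.
print(mixed_mode_thru(two_uncoupled_lines(0.45, 0.9)))
```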

The measured eye diagram of this failing lane is shown in Figure 11. The eye opening is clearly inadequate for error-free operation. Figure 12 shows the measured eye diagram of a neighboring lane that operated error free, which has a substantially larger eye opening.

Figure 11: Measured Eye diagram of the failing lane

Figure 12: Measured Eye diagram of an error free lane

A computed eye diagram of the failing lane, obtained using IBIS-AMI models, device package models and the s-parameters of Figure 8, is shown in Figure 13. It shows some eye opening with some errors, although the degradation is not as severe as in the measured result.

Figure 13: Computed Eye diagram of the failing lane.

Reference [7] showed that skew between P and N traces can degrade SERDES performance. It also showed that P-N skew values approaching half of the unit interval (UI), or more, can be compensated by an adaptive receiver. This was established for a channel with resonance-free insertion loss characteristics.

In the examples considered in the current paper, use of an adaptive receiver alone showed no benefit except in one instance; it actually increased errors in the lanes that worked well without it. It is therefore of interest to examine the effect of P-N skew on a channel with resonant characteristics.

Although methods to model fiber weave effects are available [4], a much simpler approach was used to predict the end result. A time delay was introduced in the single-ended trace P by adding an ideal 50 Ohm transmission line, as in [7]. The differential insertion loss curve of Figure 10 is replotted in Figure 14 over a narrower frequency range for clarity. The red curve labelled 0 ps is identical to that of Figure 10 and represents the case of a uniform dielectric substrate. If the P trace is made longer than the N trace by 20 ps, the insertion loss increases, as shown by the solid green curve. It gets worse when the skew is increased to 30 ps, as shown by the blue curve. If the P trace is made shorter than the N trace by 20 ps, the insertion loss decreases (dashed green curve), which works in favor of the link. Therefore, skew between the P and N traces can have a dramatic effect on the differential insertion loss at a resonance frequency.

The boards manufactured for the present work used single-ply laminates with 1078 weave style, for which fiber weave skew values in the range of 7 ps/inch can be expected [4]. For the trace lengths of the current board, a worst-case fiber weave skew on the order of ~20 ps can be expected. Repeating the eye diagram simulation of Figure 13 with 10 ps of added skew produced a fully closed eye.
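The skew experiment, which adds an ideal 50 Ohm line in series with trace P as in [7], can be reproduced on any single-ended 4-port model by delaying one port. A matched, lossless line adds a phase factor of e^(−j2πfτ) to every transmission into or out of that port, and twice that to the port's own reflection. The sketch assumes the port map 0 = P in, 1 = N in, 2 = P out, 3 = N out. The example pair is a hypothetical ideal, lossless, uncoupled channel, so it shows only the broadband |cos(πfτ)| loss that skew alone causes and not the resonance interaction of Figure 14. The next step would be to apply add_port_delay to an extracted channel model that contains the resonance.

```python
import numpy as np


def add_port_delay(S, freq_hz, port, delay_s):
    """Append an ideal, matched, lossless delay line to one port of S.

    S: single-ended S-parameters, shape (n_freq, n_ports, n_ports).
    A negative delay_s models a relatively shorter trace; it is only
    meaningful as relative skew and is not causal on its own.
    """
    phase = np.exp(-2j * np.pi * np.asarray(freq_hz) * delay_s)[:, None]
    out = S.copy()
    out[:, port, :] *= phase  # waves leaving through the port are delayed
    out[:, :, port] *= phase  # waves entering through the port are delayed
    return out


def sdd21(S):
    """Differential through term for the assumed port map 0/1 = P/N in, 2/3 = P/N out."""
    return 0.5 * (S[:, 2, 0] - S[:, 2, 1] - S[:, 3, 0] + S[:, 3, 1])


# Hypothetical ideal pair (lossless, matched, uncoupled) to isolate the skew effect.
freq = np.linspace(1e9, 16e9, 151)
S_pair = np.zeros((freq.size, 4, 4), dtype=complex)
S_pair[:, 2, 0] = S_pair[:, 0, 2] = 1.0
S_pair[:, 3, 1] = S_pair[:, 1, 3] = 1.0

for skew_ps in (0, 20, 30):
    S_skewed = add_port_delay(S_pair, freq, port=2, delay_s=skew_ps * 1e-12)
    loss_db = -20 * np.log10(np.abs(sdd21(S_skewed)))
    print(f"{skew_ps:2d} ps skew: differential IL at 8 GHz = "
          f"{np.interp(8e9, freq, loss_db):.2f} dB")
```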

Figure 14. Enlarged view of the highlighted region in Figure 10.

Conclusion

In this paper, an attempt was made to explain the cause of variability in multi-board SERDES performance between different lanes and different boards. While some of it is clearly also due to the active devices used, the PCB itself is expected to play an important role. Based on limited experience, it is hypothesized that the variability is caused by a combination of reference plane resonance and P-N skew caused mostly by the fiber weave effect.

Consequently, one mitigation strategy involves two criteria. The first is to ensure a low impedance connection between the ground planes of multiple boards by maximizing the number of ground pins in connectors. Where this is not possible, PCB techniques such as multiple ground vias on pads and between signal pins will help. The second is to enforce strict symmetry of differential routing at the connector input and output, which minimizes the common mode content of differential pairs that traverse multiple boards. This also includes techniques to reduce fiber weave skew [8].

In simulations, it is important to choose sections where both signal and reference planes are approximately continuous. Also, single ended insertion loss data should be examined in addition to the differential insertion loss.

REFERENCES

[1] B. Gore and R. Mellitz, “An exercise in applying Channel Operating Margin for 10GBASE-KR Channel Design”, Proceedings of the IEEE EMC Symposium, Raleigh, NC, 2014, pp. 648-653.

[2] Syed Bokhari, “Signal Integrity considerations for PCB implementation of Multi-Gigabit SERDES Links”, IEEE MTT-S International Conference on Numerical Electromagnetics and Multi-Physics Modeling and Optimization (NEMO), Ottawa, 2015.

[3] E. Bogatin, B. Hargin, V. S. Sai, D. DeGroot, A. Koul, S. Baek, and M. Sapozhnikov, “New Characterization Techniques for Glass Weave Skew (Part 2)”, Proceedings of DesignCon 2017.

[4] Lambert Simonovich, “Practical Fibre Weave Effect Modeling”, White paper, issue 2, Lamsim Enterprises Inc., Oct. 2011.

[5] S. Ramo, J.R. Whinnery, T. Van Duzer, Fields and Waves in Communication Electronics, John Wiley and Sons, Singapore: 1994, Chapter 11.

[6] Heidi Barnes, “The Physical Realities of Cascading S-Parameters for Full-Path Simulations”, CST 5th North American Users Forum, 2008. (https://www.cst.com/Content/Events/NAUF2008/05-Barnes.pdf)

[7] S. Farrahi, V. Kunda, Y. Li, X. Zhang, G. Blando, and I. Novak, “Does skew really degrade SERDES performance?”, Proceedings of DesignCon 2015, pp. 1-19.

[8] J. Loyer, R. Kunze, and Xiaoning Ye, “Fiber Weave Effect: Practical Impact Analysis and Mitigation Strategies”, Proceedings of DesignCon 2007, pp. 1-28.
