This article reviews a lesson I learned many years ago but put into practice during my first week at my new employer, Enterasys Networks, in October 2006. My dad’s favorite expression was “Look before you leap.” In this article, that saying becomes “look before you simulate or measure.” After working at Digital Equipment Corporation (DEC) for just shy of 20 years, followed by a few startups, I decided to try a midsize networking company.
Walking into a new company felt like walking into a new consulting engagement, except I was simultaneously trying to learn the networking business while acclimating to my new company, colleagues, and design environment. I remembered something I had learned from Eric Bogatin: that blindly setting up a simulation or just running tools is not the best way to design or solve problems. Long before I ever saw it published, I had been living by Eric’s Rule #9.
Rule #9: Never Do a Measurement or Simulation Without First Anticipating What You Expect to See
- If you are wrong, there is a reason: either the setup is wrong or your intuition is wrong. Either way, by exploring the difference you will learn something.
- If you are right, you get that nice warm feeling that you understand what is going on.
During my first week at the new gig, one of the senior engineers walked into my office with what he believed was a reproducible SI issue on a slow-speed serial bus. WS_ADDR<1> was getting errors. He started to draw out a complex network of jellybean components with a lot of loads, stubs, and over a foot of etch. Granted, it was a slow-speed bus, but he stated that it had been working for a long time before it suddenly started to exhibit errors.
I stared at the hand-drawn topology and kept asking for more and more detail: "What kind of part is that? How long is that stub? How many vias? How much etch?" Then I started to think, "How am I ever going to find IBIS or SPICE models for all these ancient components? Even if I can find all these models, it’s going to take forever to set up this simulation."
Then I suddenly remembered a lesson I had learned from two brilliant people I had previously worked with: my mentor, John Hackenberg, and our resident electromagnetics expert, Dr. Michael Tsuk. Both always wanted to look at the problem physically. John was one of those practical geniuses, while Michael, a Ph.D., was developing the front-end GUI software (SIMPEST) for a method-of-moments full-wave field solver that Sarkar and Harrington wrote with the help of DEC's research funding. Michael always wanted to hold in his hand the physical part he was modeling, or writing the software to model. He had an amazing grasp of reality and had previously been an SI engineer. I could almost characterize it as a rare stroke of brilliance when I asked the HW engineer to take me out to the lab and physically show me the board with the problem. I was using past lessons to try to avoid simulating this nightmare.
The above figure shows a network card with multiple boards, both horizontal and vertical. One vertical board in the top right corner was a switching power supply with a large inductor physically close to vertical stake pins. In the network world, the boards are packed in three dimensions, much tighter than anything I had previously seen in either the server or storage worlds.
When examining the printed circuit board assembly up close, I noticed the vertical stake pins physically adjacent to the power supply inductor on the vertical daughter card. Pointing at those stake pins, I asked the senior hardware designer if that was where the problem was. He pulled out the schematics and confirmed that it was indeed the serial port signal getting the errors, before asking, "How did you know?" I then grabbed some copper foil, wrapped it around the switching power supply, and soldered it down to ground in a couple of spots; magically, the noise was cut in half and the errors were eliminated.
What I initially couldn’t understand was how this had ever worked in the first place. Then I was informed that they had just re-spun the board and, to make room for something else, had decided to stand up a switching power supply that had previously been lying down. The big toroid on the switching power supply daughter card was coupling noise onto the exposed stake pins.
Without going through the math, you can use a scope probe with a ground loop as an antenna and feed it directly into the scope to quickly get a feeling for how bad things are. I have also used an EFT noise generator with a 2 in. square to test circuit sensitivity.
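For readers who do want a feel for the math, the inductive coupling described above can be roughed out with the textbook relation V = M × dI/dt. The Python sketch below is only an illustration of Rule #9, writing down an expectation before probing. The function names and every number in it (the mutual inductance, the current step, and the edge time) are assumptions chosen for illustration, not values measured on this board.

```python
def induced_noise_voltage(mutual_inductance_h, delta_current_a, edge_time_s):
    """Estimate the peak induced voltage from V = M * dI/dt.

    Assumes a linear current ramp of delta_current_a over edge_time_s.
    """
    if edge_time_s <= 0:
        raise ValueError("edge_time_s must be positive")
    return mutual_inductance_h * delta_current_a / edge_time_s


def matches_expectation(predicted, measured, factor=2.0):
    """Return True if measured is within `factor` times (above or below) predicted."""
    if predicted <= 0 or measured <= 0:
        raise ValueError("predicted and measured must be positive")
    ratio = measured / predicted
    return 1.0 / factor <= ratio <= factor


# Hypothetical inputs, NOT taken from the board in this article.
M_ASSUMED = 1e-9    # assumed mutual inductance, inductor to stake pins: 1 nH
DELTA_I = 1.0       # assumed switching current step: 1 A
EDGE_TIME = 10e-9   # assumed current edge time: 10 ns

expected_v = induced_noise_voltage(M_ASSUMED, DELTA_I, EDGE_TIME)
print(f"Expected induced noise: about {expected_v * 1e3:.0f} mV")
```

If the scope later shows something far outside that estimate, `matches_expectation` returns False, and that is the cue to question either the setup or the intuition behind the assumed numbers.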
My other suggestion is to always try examining the physical structures that you are simulating or measuring, if possible. Try to figure out in your head what the answer should be before you put the scope probe on the signal or run the simulations.
The funny part of this story is that the engineer in the cube right behind me stood up and said, “You’ve done more in one week than the previous SI guy did in a year.” That gave me a good feeling that I understood what was going on. The very next day, a new problem showed up on my desk: a very complex backplane jitter issue that I inherited from a previous SI Engineer. It looked like it was going to take significant time and effort to debug, but we can save that for another time.
The HW engineer followed up with a few coarse measurements, and eventually a robust, manufacturable shield was implemented.
Over years of working on high-speed server, storage, and network systems, I have found that SI problems often show up on some of the slowest networks, such as serial ports and I2C, as well as on the highest-speed networks (SERDES) and in ground return issues (SSO, or Simultaneous Switching Outputs, and/or “ground bounce”).