The Hidden Costs Of Test

Understanding what to test, when to test and what the results really mean can determine the difference between a good chip and a bad one, and a lot of money.

By Ed Sperling
As complexity grows in SoCs, so does the difficulty of testing them accurately. That helps explain why there are so many different types of tests and so much confusion about which ones to use, when to test, and where in the flow to include them. What’s less well known is that tests done improperly can also give false results, labeling good chips as bad or, in some cases, actually killing a good chip.

Some of the problems occur when testing involves SoCs with multiple power islands and voltage rails. In many cases the primary reason is bad test design. The so-called worst-case use scenarios for smartphone chips, for example, have multiple power islands turned on at the same time. In a test scenario, including built-in self test (BiST), all of those islands may switch on at once if the tests are not carefully scheduled.

“You can burn a chip with a bad schedule,” said Yervant Zorian, chief architect at Synopsys. “There are two different modes of operation. In one, the CPU is doing its own work. Then, you have a memory scan or BiST when it’s idle. Throughout the life of the chip, 90% of the time it will do a normal function and there will be no problems. But during test mode the activity is creating excitement in the chip.”

That excitement can mean the chip is drawing maximum power, and there are limits to the amount of power a chip can handle. The best way to avoid this problem is to test in sequence so that all power islands are not on at the same time, but this requires up-front planning, and test is frequently an afterthought in many designs. In addition, while testing is good, over-testing can be very bad.
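
The idea of sequencing tests under a power limit can be sketched in a few lines of Python. This is only an illustration, not any vendor's scheduling algorithm. The island names, the per-test power figures and the 600 mW budget are invented for the example, and the greedy first-fit packing is just one simple heuristic among many.

```python
from typing import Dict, List


def schedule_tests(test_power_mw: Dict[str, float],
                   budget_mw: float) -> List[List[str]]:
    """Group island tests into sequential sessions.

    Tests inside one session run concurrently, so their summed power
    must stay within budget_mw. Sessions run one after another.
    Uses a greedy first-fit-decreasing heuristic (an assumption made
    for this sketch, not a method described in the article).
    """
    sessions: List[List[str]] = []
    loads: List[float] = []  # running power total per session

    # Place the most power-hungry tests first.
    ordered = sorted(test_power_mw.items(), key=lambda kv: kv[1], reverse=True)
    for name, power in ordered:
        if power > budget_mw:
            raise ValueError(f"{name} alone exceeds the power budget")
        for i, load in enumerate(loads):
            if load + power <= budget_mw:
                sessions[i].append(name)
                loads[i] += power
                break
        else:
            # No existing session has room, so open a new one.
            sessions.append([name])
            loads.append(power)
    return sessions


# Hypothetical per-island test power in milliwatts.
islands = {"cpu": 400.0, "gpu": 350.0, "modem": 250.0,
           "sram_bist": 200.0, "dsp": 150.0}
sessions = schedule_tests(islands, budget_mw=600.0)
for number, group in enumerate(sessions, start=1):
    print(f"session {number}: {group}")
```

Running every test at once in this example would draw 1,350 mW, more than twice the assumed budget. The schedule trades that for three shorter sessions, which is the kind of decision that has to be planned up front rather than bolted on late in the design.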

“There are two levels of failure from test,” said Giri Podichetty, technical marketing engineer at Mentor Graphics. “One is at the soft level. The second is in the power supply. You can get a false failure even if the chip is okay. But you also can get a catastrophic failure if there is too much heat or current.”

The flip side is that testing itself can be ineffective, allowing bad chips to reach the market. One of the unique challenges at advanced process nodes is that the amount of power needed is not scaling at the same rate as the transistors in the design. That can greatly impact test, said Podichetty, because small voltage swings with high leakage can cause significant problems.

“Power integrity can be localized, but it may not be what you expected,” he said. “You also may have the chip running slower and it may be hotter.”

Mix and match
Another challenge is the so-called mix-and-match approach of chipmakers. Tools and IP are bought from multiple vendors, with some of that IP developed on different process nodes. In addition, not all methodologies are up to date, because chipmakers frequently continue to push older tools and methodologies.

With soft IP, much of this can be factored into synthesis. But with hard IP, testing requires a real understanding of the IP and how it will be used.

“Some designs can be tested in their entirety, but others need to use a partitioned test approach,” said Robert Ruiz, senior product marketing manager for test automation products at Synopsys. “You need to attack the problem with different methods. ATPG (automatic test pattern generation) can model faults. You may need to do dynamic bridges that exist for a moment vs. a static bridge. But the challenge for test engineers is to establish the operating parameters of frequency and voltage range. Basically what you’re doing is creating shmoo plots, where frequency is one axis and voltage is on the other. Then you try to push the devices to the corners and determine what’s in the spec and what’s not in the spec.”
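
A shmoo plot of the kind Ruiz describes can be sketched as a simple pass/fail grid. The Python below is a toy illustration only. The device model (maximum frequency rising linearly with supply voltage) and every number in it are hypothetical stand-ins for what a real tester would measure by running patterns at each voltage and frequency point.

```python
from typing import Callable, List


def shmoo(passes: Callable[[int, int], bool],
          voltages_mv: List[int],
          freqs_mhz: List[int]) -> List[str]:
    """Return one text row per voltage, highest voltage first.

    Each column is a frequency; 'P' marks a passing point, '.' a failure.
    """
    rows = []
    for v in sorted(voltages_mv, reverse=True):
        cells = "".join("P" if passes(v, f) else "." for f in freqs_mhz)
        rows.append(f"{v:>4}mV {cells}")
    return rows


def toy_device(v_mv: int, f_mhz: int) -> bool:
    # Hypothetical model: the part works up to (v_mv - 500) MHz.
    # A real flow would apply test patterns on the tester instead.
    return f_mhz <= v_mv - 500


voltages = [800, 900, 1000, 1100]     # assumed supply sweep, in mV
freqs = [200, 300, 400, 500, 600]     # assumed clock sweep, in MHz

plot = shmoo(toy_device, voltages, freqs)
for row in plot:
    print(row)

# Check the corners of an assumed spec window: 900-1100 mV at up to 400 MHz.
spec_ok = all(toy_device(v, 400) for v in (900, 1100))
```

The boundary between the passing and failing regions is what test engineers compare against the spec. Points that fail inside the spec window indicate a problem, while passing points outside it show how much margin the device has.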

Engineers also need to make sure the test program is more sensitive to the power budget than in the past, he said. If the budget is set too low, there’s a danger of under-testing.

“Low power is not the best way to handle testing,” he said. “Power-aware is the right approach. Low power actually minimizes the power, so you end up under-testing.”

Conclusions
While test can go a long way toward making chips more reliable, test done wrong can also damage chips or produce false results. In a single die, that can be expensive. In a stacked die, it can be several times more expensive.

It’s important to note that each chip is different, each organization doing the testing is different, and the number of combinations of what to test, where to test and what to use is increasing. That’s why there is so much activity in test these days, with all three of the big EDA vendors and many of the smaller ones working to secure a stronger position in this area. Where there is pain there frequently is opportunity, and there appear to be plenty of both in this area.


