From the IC design viewpoint, reliability modeling leaves a lot to be desired, and first silicon characterization requires step-stress testing to protect the end customer. That is what my comments were about: more and different types of memory are less than robust and almost impossible to model in a predictive manner.
So here is some more bad news, or expensive good news, just my opinions.
No easy answers.
So multiple testing, both at probe and final test, can lead to first silicon learning. If the mechanisms are then studied for each process flow and design-rule node, decisions can be made, but that may often mean moving to a cleaner foundry rather than making any layout or process specification change. Testing at temperature, hot or cold, can help screen these weak devices, but that is costly. And not much up-front device-level testing prior to full IC first silicon will uncover the impact of wafer fabrication aberrations.
But where are the published case studies for these types of issues? Many IEEE Rel Physics papers are all about III-V and bleeding-edge silicon, not about simply making silicon CMOS memory devices more robust instead of using costly screens (but I will look at recent papers if someone can suggest a few).
So design for reliability unfortunately has to include post-test screening. One option is raw parametric data analysis at probe, with outliers (relative to the rest of the same wafer) screened statistically, often weighting outlier distances across multiple tests. Another is testing in package form at two temperatures and screening out outliers vs the rest of the lot, for example. Then a decision can be made if the post-test outlier screen yield is poor for some foundries vs others, or some process tools vs others, and finally, if there are design changes, by comparing pre- and post-change yield of outliers. This is an "automotive market" solution (costly).
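For anyone who wants to experiment with the probe-level outlier screen described above, here is a rough sketch. It is only an illustration under my own assumptions, not anyone's production screen: the function names, the test names (iddq_uA, vmin_mV), the weights, and the limit of 6.0 are all hypothetical, and the choice of median and MAD as the "distance" measure is just one robust option among several.

```python
from statistics import median


def robust_z(values):
    """Distance of each value from the wafer median, in robust sigma units.

    Uses the median absolute deviation (MAD) scaled by 1.4826 so that it
    approximates a standard deviation for normally distributed data.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * mad
    if scale == 0:
        # No spread on this test, so nothing can be called an outlier.
        return [0.0 for _ in values]
    return [(v - med) / scale for v in values]


def screen_wafer(die_data, weights, limit=6.0):
    """Flag die whose weighted outlier distance exceeds `limit`.

    die_data: {die_id: {test_name: measured_value}} for ONE wafer
    weights:  {test_name: weight} (hypothetical; set from failure analysis)
    Returns (scores, flagged), where scores holds the weighted mean of the
    absolute robust z-scores per die.
    """
    die_ids = list(die_data)
    scores = {d: 0.0 for d in die_ids}
    for test, w in weights.items():
        zs = robust_z([die_data[d][test] for d in die_ids])
        for d, z in zip(die_ids, zs):
            scores[d] += w * abs(z)
    total_w = sum(weights.values())
    scores = {d: s / total_w for d, s in scores.items()}
    flagged = [d for d in die_ids if scores[d] > limit]
    return scores, flagged


# Made-up probe data for nine die on one wafer (illustrative values only).
iddq = [10.1, 10.3, 9.9, 10.0, 10.2, 10.4, 9.8, 10.1, 14.0]
vmin = [800, 805, 795, 802, 798, 801, 799, 803, 830]
wafer = {
    f"d{i}": {"iddq_uA": a, "vmin_mV": b}
    for i, (a, b) in enumerate(zip(iddq, vmin))
}
weights = {"iddq_uA": 2.0, "vmin_mV": 1.0}  # assumed weighting

scores, flagged = screen_wafer(wafer, weights)
print("flagged die:", flagged)
```

Every die here would pass a fixed spec limit set wide enough for the whole process, but the last die stands well apart from its neighbors on both tests, which is the kind of part this screen is meant to catch. The same function could be run per lot on two-temperature package test data, and the fraction flagged per foundry or per tool is then the "outlier screen yield" to compare.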
With super high device counts, and ultra-clean foundries charging more than tier 2 foundries, commodity consumer products are at the mercy of end product testing as the only screen for these new memory-intensive products, which have to assure long battery life but not necessarily long product use life or tolerance of adverse outdoor environments.
Supply chain war. Anyone have a decision matrix that shows which foundry tier may be dangerous for certain device types and counts for consumer markets?
"Design" includes profound knowledge of process-device integration, as always. Modeling without failure statistics is perhaps not useful. Do we share real test results vs field results as memory device counts increase? Or as the number of "must be matched" analog devices increases? Fabless design shops are at serious risk, and IDMs, who often get all this "profound knowledge", are not sharing. But that's what makes this industry interesting...and costly for investors at the current rate of change. As far as consumers are concerned, staying one or two generations behind the "bleeding edge" may minimize surprises.
So the world's fastest game machine may also be the most vulnerable to single-bit failures as well as wearout issues from running very hot. And a flash drive beyond 256 bits used for OS, applications, and storage may be better a few years from now, maybe not today? Does anyone have real data? The automotive industry is VERY cautious, and medical device people avoid ICs that are not well screened and well understood. It's the consumer products that get the latest and fastest and perhaps less robust ICs, and you can always get a warranty, but expect to have to use it.
I suspect that memory issues hide behind more problems than we know. My PC, with original memory, has experienced memory errors that defy detection by the internal diagnostics. However, when the memory chips are reseated, the problems have gone away. Furthermore, years ago I experienced compatibility problems with "equivalent" additional memory. I think there is a gray area of performance issues that drag down speed and reliability without being obvious.
My experience is that the many types of MEMORY added to a microcontroller (for example) often require special screening tests, and these are often so proprietary that there is nothing in the literature rating the effectiveness of the various screens. While IP cores are often robust for the particular flows already tested, could the addition of more and more memory and memory types be responsible for serious field failure issues? Any comments?