Occasionally I'll get asked by a friend or family member about some problem they are having with their computer. One of the toughest issues to deal with is a computer that just exhibits random instability. One moment it is working well and the next it is giving you the BSOD.
It's been common to blame the operating system for these kinds of issues. And with Microsoft's quality record over the last ten years, this is a reasonable first guess. Windows 2000 reportedly shipped with more than 63,000 potential known defects. Ironically enough, Windows 2000 was one of Microsoft's best releases.
However, the OS is not always the culprit. Sometimes, the causes and symptoms just can't be narrowed down. Things just perform oddly and crash. This is the point when you should start investigating your computer's RAM.
Many folks labor under the misconception that RAM is one of those items in a computer that either works or does not. That is, they think the quality of RAM is binary: either good (true) or bad (false). Good RAM always works and bad RAM never works.
RAM Quality Is Not Binary
In order to understand why RAM quality is not binary, you first need to understand how RAM works. To begin with, the RAM in your computer is more accurately referred to as Dynamic Random Access Memory, or DRAM. Wikipedia has a good entry on the operation of DRAM. Each bit is stored in an individual capacitor that is charged or discharged in order to represent a value. Because that charge leaks away over time, the capacitors must be "refreshed" approximately every 64 ms to keep a charged capacitor's state from degrading.
This operation is complex and timing-sensitive, and as a result there are many ways that RAM chips might fail to operate correctly.
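To make the refresh idea concrete, here is a toy model in Python. It is only a sketch: it assumes a cell's charge leaks away exponentially, and the retention times and the 50% read threshold are numbers I made up for illustration, not figures from any datasheet. The only value taken from the text above is the 64 ms refresh interval.

```python
import math

REFRESH_INTERVAL_MS = 64.0  # refresh period mentioned in the text
READ_THRESHOLD = 0.5        # assumption: fraction of full charge needed to still read a 1


def charge_remaining(elapsed_ms, retention_ms):
    """Fraction of charge left after elapsed_ms, assuming simple exponential leakage.

    retention_ms is a hypothetical per-cell time constant, not a real DRAM parameter.
    """
    return math.exp(-elapsed_ms / retention_ms)


def survives_refresh(retention_ms):
    """True if the cell still reads as a 1 when the next refresh arrives."""
    return charge_remaining(REFRESH_INTERVAL_MS, retention_ms) > READ_THRESHOLD


# Hypothetical cells: one with plenty of margin, one that leaks too quickly.
for name, retention_ms in (("strong cell", 500.0), ("weak cell", 80.0)):
    left = charge_remaining(REFRESH_INTERVAL_MS, retention_ms)
    print(f"{name}: {left:.0%} charge left, survives refresh: {survives_refresh(retention_ms)}")
```

In this toy model the strong cell still holds about 88% of its charge when the refresh arrives, while the weak cell has dropped to about 45% and would be read back as the wrong value. A cell sitting right at the threshold is the interesting case: it might pass on a cool day and fail on a warm one.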
RAM manufacturers deal with this situation using the concepts of "device test" and "yield". They test every manufactured device, and the percentage that turn out "good" is the yield. An interesting aspect of semiconductor manufacturing is that most of the costs are in the device design and the factory; the cost of manufacturing each individual device is remarkably low. As a result, low yields can be tolerated for devices that can be sold over a long period of time, since once the front-end costs have been recovered, it's almost all profit.
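The yield arithmetic can be sketched in a few lines of Python. The dollar amounts and volumes below are invented purely to show the shape of the calculation, and the function name is my own; real fab economics are far more involved.

```python
def cost_per_good_device(fixed_cost, unit_cost, devices_made, yield_fraction):
    """Average cost of each device that passes device test.

    fixed_cost:     design and factory costs still to be recovered
    unit_cost:      cost of manufacturing one device, good or not
    yield_fraction: share of manufactured devices that test "good" (0 < y <= 1)
    """
    good_devices = devices_made * yield_fraction
    return (fixed_cost + unit_cost * devices_made) / good_devices


# Hypothetical numbers: $1 billion up front, $1 per device, 1 billion devices made.
early = cost_per_good_device(1e9, 1.0, 1e9, yield_fraction=0.5)
# Same product later in its life, once the up-front costs have been paid off.
late = cost_per_good_device(0.0, 1.0, 1e9, yield_fraction=0.5)

print(f"cost per good device while recovering fixed costs: ${early:.2f}")
print(f"cost per good device after fixed costs are recovered: ${late:.2f}")
```

With these made-up inputs, a 50% yield costs $4.00 per good device early on but only $2.00 once the fixed costs are out of the picture. Every rejected device is still money spent, though, which is where the incentive to sell it as something comes from.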
But wait a minute! Didn't we just say that RAM quality isn't binary? We appear to have a manufacturing process that guarantees that all of our RAM devices are "good", right? Well, that's not the whole truth, and the rest of it is shrouded in a bit of mystery.
Making the Grade
Here's the truth that the RAM manufacturers have never made generally known: between the "good" and "bad" categories actually lie at least two more – let's lump them together and call them "ok". In order to raise their yields, manufacturers have defined "ok" to represent devices that sort of work. They just don't work as well as "good" RAM.
In reality, there are at least three "grades" of RAM: Grade A, Grade B and Grade C. Grade A devices have passed device test with flying colors and are "good" by our definition above. Grade B and Grade C devices, on the other hand, fall into a gray area: they aren't "bad", but they aren't good enough to be called "good" either. These devices are sold to less reputable manufacturers, who put them in lower-cost memory modules or cheaper products.
The Truth is Out There
At this point you may have googled "ram memory grades" or something along those lines, trying to confirm this whole blog post. And you've probably come up empty-handed. The Wikipedia page for DRAM doesn't mention the word "grade". There's no web page or blog post out there that documents this situation. You'll just see hints, like a random reference to "B-grade" memory in an HP product brochure or the use of the term "server-grade" in a user forum.
The best reference I've found is an obscure testing article from 2003. Here's the most important part:
"In order to protect the DRAM manufacturers own image, these down grade transactions are usually kept in low profile with a precondition for the buyer to remark and not to expose the origin of the chips."
In other words, when you buy cheap computer RAM, you are likely buying a name-brand (e.g. Samsung) manufacturer's "ok" device repackaged under the name of another company. And that company is bound by the original manufacturer not to reveal the situation, in order to protect the original manufacturer's reputation.
It's not worth going through the different failure modes of RAM, but think of it this way. Your computer's CPU is designed with the presumption that the RAM attached to it is going to work flawlessly every time the CPU accesses it. The reality though is that with "ok" memory, this assumption is going to be violated during normal operations - say once every couple of million operations.
Maybe the result is a random garbled character in a text document. Or maybe your operating system crashes. There's no predicting it. And there's no predicting how much time and energy you might waste trying to find the root cause of your problems.
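To get a feel for what "once every couple of million operations" means, here is a rough Python sketch. It assumes every operation fails independently with the same tiny probability, which is a simplification of mine (real faults tend to cluster around particular cells, temperatures and access patterns), and the two-million figure is just the ballpark from above, not a measurement.

```python
import math


def chance_of_fault(operations, operations_per_fault):
    """Probability of at least one fault in a run of memory operations.

    Assumes each operation fails independently with probability
    1 / operations_per_fault (an illustrative model, not measured data).
    """
    p = 1.0 / operations_per_fault
    # 1 - (1 - p) ** operations, written to stay accurate when p is tiny
    return -math.expm1(operations * math.log1p(-p))


OPS_PER_FAULT = 2_000_000  # "once every couple of million operations"

for operations in (100_000, 2_000_000, 20_000_000):
    chance = chance_of_fault(operations, OPS_PER_FAULT)
    print(f"{operations:>10,} operations: {chance:.1%} chance of at least one fault")
```

Under these assumptions, two million operations already give you roughly a 63% chance of having hit at least one fault, and ten times that makes a fault practically certain. Whether that fault lands somewhere harmless or somewhere critical is the part nobody can predict.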
Demand Good Grades
What do you do if you suspect you ended up with "ok" RAM? First, never trust your computer's RAM self-test, as it is only good enough to detect "bad" RAM. Instead, use an open-source memory test tool like Memtest86+; it is free and amazingly thorough. Short of sophisticated and expensive testing machinery, tools like this are the only practical way to detect "ok" RAM.
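The basic idea behind a memory tester is simple: write known patterns, read them back, and complain about any difference. Here is a minimal Python sketch of that idea. To be clear, this is not how Memtest86+ is implemented, and a script running on top of an operating system cannot test your physical RAM anyway. The FlakyMemory class is a hypothetical stand-in that simulates a module with one faulty bit, so there is something to find.

```python
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, and two alternating-bit patterns


class FlakyMemory:
    """Hypothetical simulated memory. Bits in stuck_mask always read back as 0 at bad_addr."""

    def __init__(self, size, bad_addr=None, stuck_mask=0x00):
        self._cells = bytearray(size)
        self._bad_addr = bad_addr
        self._stuck_mask = stuck_mask

    def __len__(self):
        return len(self._cells)

    def write(self, addr, value):
        self._cells[addr] = value

    def read(self, addr):
        value = self._cells[addr]
        if addr == self._bad_addr:
            value &= ~self._stuck_mask & 0xFF  # force the faulty bits to 0
        return value


def pattern_test(mem):
    """Fill memory with each pattern, read it all back, and collect mismatches."""
    failures = []
    for pattern in PATTERNS:
        for addr in range(len(mem)):
            mem.write(addr, pattern)
        for addr in range(len(mem)):
            got = mem.read(addr)
            if got != pattern:
                failures.append((addr, pattern, got))
    return failures


healthy = FlakyMemory(256)
suspect = FlakyMemory(256, bad_addr=42, stuck_mask=0x04)

print("healthy module failures:", pattern_test(healthy))
for addr, wrote, got in pattern_test(suspect):
    print(f"address {addr}: wrote 0x{wrote:02X}, read back 0x{got:02X}")
```

Notice that the simulated fault only shows up for two of the four patterns, because the other two happen to write a 0 to the faulty bit. That is why a tester needs several patterns. The fault simulated here is also the easy kind, since it fails every time. The "ok" RAM described above fails only now and then, so expect to let a real tester run many passes over a long period before you trust a clean result.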
The moral of this tale of manufacturing intrigue is to always buy quality RAM. I will only buy RAM from Crucial, as in my experience they only sell "good" RAM. If you risk ending up with "ok" RAM, you will likely waste considerably more money detecting and correcting the problem than you will ever save by buying cheaper RAM.