The Chesapeake Systems Storage Primer Part 4 – RAID Interlude
- Nick Gold
As I began to write up the next segment of the Storage Primer, discussing external Direct-Attached Storage (DAS) options for the Mac Pro, I realized that I was starting to talk a lot about RAID solutions on the market, without having discussed in more general terms what exactly “RAID” means, and why it’s important. While I have talked about RAID 0 striped volumes and RAID 1 mirrored volumes in past installments, there is quite a bit more to RAID than that, and it really calls for its own article. Please read on to get a more thorough understanding of RAID storage technology!
Keep in mind that some of my explanations here veer toward using layman’s terms, rather than complex technical language, and that is every bit the intent. For more in-depth information, a simple Google search using the term “RAID” will net you more results than you could ever hope to review. With that said, let’s review the various RAID levels, as well as software-based versus hardware-based RAIDs.
RAID “Defined”
RAID typically is understood to mean Redundant Array of Independent Disks. “Drives” is sometimes substituted for “Disks” but refers to the same thing — a hard drive mechanism, using a spinning platter of magnetic material. This is the most common data storage device in the computer industry. Some people mistakenly refer to this as “memory,” but that is really a misnomer, as memory usually refers to solid state silicon chips with no moving parts.
The word “Redundant” in the term RAID is also at times a misnomer, because there is no inherent redundancy in RAID 0 volumes. In this case, the “R” in RAID can actually be used to mean “Rapid”, because RAID 0 volumes emphasize the performance of the storage volume, and not any type of redundancy.
RAID 0: Speed, Size, No Redundancy
RAID 0 volumes are comprised of two or more hard drive mechanisms acting in unison, typically presented to the user and system as a single volume, or drive icon on your desktop. This can actually be partitioned into multiple sub-volumes, but for our purposes let’s think of a RAID 0 as several physical drives all acting together to make up one big volume. In RAID 0, the capacity of all the constituent drives is added together, and this allows for the creation of potentially very large volumes of data that can far exceed what a single physical hard drive is capable of storing.
RAID 0 is also useful, because the performance of all the constituent drives is added together, so the single resulting volume is much faster for data reading and writing operations than a single hard drive. How is this the case? Hard drives actually look like little record players (yes, the old fashioned kind), with a little arm that moves back and forth, across a magnetic platter that is spinning at VERY fast speeds — thousands of rotations per minute (usually 7200, sometimes 5400 for smaller mobile drives, and up to 10,000RPM or greater for enterprise-class drives). At the end of the little arms are data reading and writing mechanisms that alter the magnetic property of the many “sectors” of the hard drive platter. Each little arm can only read or write data to the spinning platter so quickly. But when data can be “splayed” (for lack of a better term) across multiple hard drives at once, with some bits of a file being written to one hard drive platter, and some bits being written to another physical drive altogether (or vice-versa, for data reading purposes), you can see how this parallelization of writing (or reading) operations would lead to a greater net performance, than if you are limited to one little arm and one spinning platter.
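If it helps to see the "splaying" idea in concrete terms, here is a minimal Python sketch. It is only an illustration and not how any particular RAID product works: the function names `stripe` and `unstripe` and the tiny 4-byte chunk size are made up for this example, and each "drive" is just a Python list.

```python
def stripe(data: bytes, num_drives: int, chunk_size: int = 4) -> list[list[bytes]]:
    """Deal fixed-size chunks of data out to the drives in round-robin order."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        # Chunk 0 goes to drive 0, chunk 1 to drive 1, and so on, wrapping around.
        drives[chunk_index % num_drives].append(data[offset:offset + chunk_size])
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Read the chunks back in the same round-robin order to rebuild the data."""
    pieces = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                pieces.append(drive[row])
    return b"".join(pieces)


original = b"ABCDEFGHIJKLMNOPQRSTUVWX"
drives = stripe(original, num_drives=3)
for number, contents in enumerate(drives):
    print(f"drive {number}: {contents}")
print(unstripe(drives) == original)
```

Each drive ends up holding only every third chunk of the file, so all three can be read or written at once. It also shows why no single drive holds a complete copy of anything.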
But this benefit of RAID 0 volumes can also be its downfall. Because no single file is necessarily stored on only one physical drive (it may be on two, or three, or four, or more), if any single drive that makes up the RAID 0 volume gets corrupted or experiences a mechanical failure, the entire volume and all data stored on it is lost. In fact, because hard drive failures are relatively common in the computer world, the more drives you add together into a single RAID 0 volume, the greater the statistical likelihood that one of them will fail and cause a total data loss. To illustrate: if there is a 10% chance that any “average” hard drive will fail in the course of five years, and you have a volume that is made up of and fully reliant on two constituent drives, there is roughly twice the likelihood (about 19%, since both drives have to survive and each has a 90% chance of doing so, assuming they fail independently) that you will experience a data loss on that volume than if you had just been storing that data on a single hard drive. We are of course talking in statistical terms here, and a ten percent failure rate in five years is something I made up, but it demonstrates the point: for all the convenience and performance of RAID 0, it is not a very reliable storage medium in relative terms.
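The arithmetic behind that statistical point can be sketched in a few lines of Python. The function name is made up for this example, the 10% figure is the same invented number used above, and the sketch assumes each drive fails independently of the others, which real drives in the same chassis may not do.

```python
def volume_failure_probability(per_drive_probability: float, num_drives: int) -> float:
    """Chance that at least one of num_drives independent drives fails.

    A RAID 0 volume survives only if every drive survives, so the volume's
    failure chance is 1 minus the chance that all drives stay healthy.
    """
    return 1 - (1 - per_drive_probability) ** num_drives


# Assumed (invented) figure: a 10% chance that a single drive fails in five years.
for drives_in_set in (1, 2, 4, 8):
    chance = volume_failure_probability(0.10, drives_in_set)
    print(f"{drives_in_set} drive(s): {chance:.1%} chance of losing the volume")
```

With two drives the result is about 19%, and it keeps climbing as drives are added to the set.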
RAID 1: Easy Redundancy (Mirrors)
If you want a relatively inexpensive, simple to set up means of creating a “protected” volume of data, one that does not have to exceed the physical space or performance of a single hard drive mechanism, RAID 1 is your answer. RAID 1 volumes are typically made up of a mirrored pair of hard drives (a pair being two, of course!). Unlike RAID 0, where bits of a file may be spread across the constituent drives, but not redundantly, RAID 1 makes sure every bit of every file is redundantly written to both member drives of the volume. If one of the drives suffers a physical failure, all of the data still resides on the other drive, and can be pulled off of it without issue. Because of this inherent pairing of drives, the overall volume is always equal in space to only one of the two constituent drives. As far as performance goes, RAID 1 volumes perform about the same as, and sometimes even a little slower than, a single one of the constituent drives.
RAIDs Are Not Backups!
Before going too much further into discussing RAID, let me stop and emphasize that RAID volumes, even when data is redundantly spread across multiple drives, ARE NOT BACKUPS! This point cannot be over-emphasized. If your office burns down, a RAID 1 mirrored volume will not help you. Same goes for flooding, vandalism, accidental deletion or modification of files, etc. etc. etc. A RAID 1 (or 3, or 5, or 6) is simply a more robust storage medium than a single hard drive. It is not a proper backup, which can be kept in a separate location entirely and which lets you restore an old file that you deleted or modified destructively.
The Quasi-Mystical Properties of Parity Data
RAID 0 and 1 should both be pretty easy to conceptualize, even if you are a layman, or “average user” versus a computer guru or technician or programmer. All the other common levels of RAID get quite a bit more difficult for us “mere mortals” to understand. Admittedly, I only barely understand exactly how it works. So without outright resorting to a simile such as “it’s just fairies in your hard drives doing magic,” I will talk about parity data in a rudimentary way that is hopefully useful at least in a cursory sense. Remember that Google is your friend, if you want to get deeper into this subject!
To understand how sophisticated RAID storage systems work, you need to understand a little bit about parity data. And before you can understand parity data, you have to understand a little bit about a logical operation called XOR (“exclusive or”). XOR looks at a group of bits and outputs a 1 if an odd number of them are 1, and a 0 otherwise. In a RAID, it is used like this: For every “n” bits of data, an additional bit is generated, called a parity bit. This parity bit is unique, in that it was derived specifically from the original bits of data by XORing them together. So let’s say you have three bits of data, each with a particular value. When these “source” bits are run through an XOR operation, a fourth bit is generated that has a special relationship to the original bits. What’s so special about it?
If any one of the now four total bits of data is somehow lost, the remaining three, no matter which ones they happen to be, can be run through the same XOR operation, and the result will be the missing bit of data.
Now follow this one step further — if you have a lot of “bits” of data, which make up many different files, and you run all of them through this XOR operation to generate parity bits for them all, and then you spread all of the bits of each file onto different physical hard drives, if one of the drives in that RAID volume fails, you can still regenerate that missing bit from the remaining bits that are living on the other drives! Yes, I know it sounds like magic, but it’s true. I was frankly never good enough at math or logic to understand how this works, but this is what RAID controllers do — they are tasked, essentially, with managing XOR operations to generate parity bits, spread all the bits across multiple drives in a RAID set, and, if a drive fails, reconstitute the missing bit from all the remaining bits. It does not matter if the drive that fails happens to have been storing one of the original bits, or the parity bit — because any combination of the remaining bits can be used to reconstitute the missing bit.
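For anyone who wants to see that it really is not magic, here is a minimal Python sketch of the parity trick. The function name `xor_parity` and the three 4-byte "drives" are invented for the illustration, and a real RAID controller works on much larger blocks in dedicated hardware.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per drive, plus a parity block on a fourth drive.
data = [b"RAID", b"is a", b"trip"]
parity = xor_parity(data)

# Pretend the second drive has died: XOR whatever is left, parity included.
survivors = [data[0], data[2], parity]
rebuilt = xor_parity(survivors)
print(rebuilt == data[1])

# The same trick works if the parity drive is the one that died.
print(xor_parity(data) == parity)
```

The reason this works is that XORing a value with itself cancels it out. XOR all four blocks together and you always get zeros, so XORing any three of them leaves exactly the fourth.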
RAID 3, 5 and 6
RAID 3 has a designated drive in the RAID set that stores all of the parity data that is generated. RAID 5, which is much more common these days, spreads the parity data across all the drives in the set, which avoids making a single parity drive the bottleneck for every write. Either way, you give up roughly one drive's worth of capacity to parity data. If you are curious exactly how much space you will lose, remember that this depends on how many drives are in your RAID set, and how large each of them happens to be. RAID calculators can easily be found online that will tell you how much usable space your resulting volume will have, after applying a particular RAID scheme to a set of drives.
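The core arithmetic of such a calculator is simple enough to sketch in Python. This is a rough illustration under stated assumptions: every drive in the set is the same size, no drive is held back as a spare, and formatting overhead is ignored. The function name `usable_capacity_tb` is made up for the example.

```python
# Number of drives' worth of space given up to parity at each level (0 has none).
PARITY_DRIVES = {0: 0, 3: 1, 5: 1, 6: 2}
# Smallest drive count at which each level makes sense.
MINIMUM_DRIVES = {0: 2, 1: 2, 3: 3, 5: 3, 6: 4}


def usable_capacity_tb(level: int, num_drives: int, drive_size_tb: float) -> float:
    """Rough usable space for a set of identical drives (an assumption)."""
    if level not in MINIMUM_DRIVES:
        raise ValueError(f"RAID level {level} is not covered by this sketch")
    if num_drives < MINIMUM_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MINIMUM_DRIVES[level]} drives")
    if level == 1:
        # A mirror holds the same data on every member, so you get one drive's worth.
        return drive_size_tb
    return (num_drives - PARITY_DRIVES[level]) * drive_size_tb


for level in (0, 5, 6):
    print(f"RAID {level}, 8 x 1 TB drives: {usable_capacity_tb(level, 8, 1.0)} TB usable")
```

Real calculators also account for things like hot spares and mixed drive sizes, which this sketch leaves out.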
RAID 6 is similar to RAID 5, except that two parity bits are generated for every “n” original bits of data. This takes up more space in your RAID set for parity data of course, but means your RAID set can suffer the loss of up to two different hard drives before any data is actually lost.
RAID levels 3, 5 and 6 are also faster than individual hard drives. While they are not as fast as RAID 0 volumes, due to the overhead of performing XOR operations to generate parity bits left and right, you do still see a significant aggregation of the performance of the multiple drives in the RAID set.
Rebuilding RAID Sets
When a drive has failed in RAID 1, 3, 5, or 6 sets, usually you are notified of this as a user. That however is not the case with software-based RAID 1 mirrored volumes. But for hardware-based RAIDs, and this by definition means any RAID 3, 5 or 6 volume, usually an alarm goes off, a light is flashing, and you are told that a drive has failed. You can almost always still access your data, and the volume as a whole is still mounted and accessible — but it is operating in a reduced performance mode, and, more importantly, if a second (or in the case of RAID 6, third) drive fails, all data will be lost.
So what you need to do, usually, is pull out the failed drive, and replace it with a new drive of (typically) the same make/model/size as the failed drive, and the RAID set will “rebuild” itself. In RAID 1, this simply means that data from the remaining original drive will get mirrored onto the new empty drive. In RAID 3, 5, or 6, this actually means a recalculation of missing bits (either original or parity bits), and a redistribution of data across the new RAID set. Most modern RAIDs can do this overnight, and until that operation is complete, performance will be sluggish compared to normal performance. But, your data will at least be accessible.
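To make the rebuild step a little less abstract, here is a simplified Python sketch of a parity-based set losing a drive and regenerating its contents. Everything here is an assumption made for illustration: the function names, the tiny 4-byte chunks, and the simple way the parity chunk rotates from drive to drive (real controllers use their own layouts). The sketch also assumes the number of data chunks divides evenly into full stripes.

```python
from functools import reduce


def xor_blocks(blocks) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripes(chunks: list[bytes], num_drives: int) -> list[list[bytes]]:
    """Lay chunks out in stripes of (num_drives - 1) data chunks plus one parity chunk.

    The parity chunk moves to a different drive on each stripe (simplified rotation).
    """
    per_stripe = num_drives - 1
    if len(chunks) % per_stripe != 0:
        raise ValueError("this sketch only handles complete stripes")
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for stripe_number, start in enumerate(range(0, len(chunks), per_stripe)):
        stripe_data = chunks[start:start + per_stripe]
        parity_drive = stripe_number % num_drives
        remaining = iter(stripe_data)
        for drive_number in range(num_drives):
            if drive_number == parity_drive:
                drives[drive_number].append(xor_blocks(stripe_data))
            else:
                drives[drive_number].append(next(remaining))
    return drives


def rebuild_drive(drives: list[list[bytes]], failed: int) -> list[bytes]:
    """Regenerate every chunk of the failed drive from the surviving drives."""
    survivors = [d for number, d in enumerate(drives) if number != failed]
    return [xor_blocks(stripe) for stripe in zip(*survivors)]


chunks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE", b"FFFF"]
drives = write_stripes(chunks, num_drives=4)
replacement = rebuild_drive(drives, failed=2)
print(replacement == drives[2])
```

Notice that the rebuild has to read every stripe from every surviving drive, which is a big part of why a rebuilding RAID set feels sluggish until the job is done.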
Hot and Cold Spares
RAID systems usually have the option of setting up one or more of your drives as a Hot Spare. This drive is not used toward the overall capacity of the RAID set, and does not store either original or parity bits. It is simply used to regenerate the RAID set if one of the real constituent drives of the RAID fails.
A Cold Spare is a spare drive you have sitting on the shelf, not part of the RAID at all, not in a RAID chassis, but available to you if you need to physically pull a failed drive, and replace it with a good one. Obviously this leaves you the capacity of all the drives in the chassis to use toward your volume, but does not offer such an immediate rebuilding of the volume in the case of a failed drive, as having a Hot Spare would allow. And any time between the failure of a drive and the RAID set being fully rebuilt, is time a second drive could die, and this could lead to total data loss.
Hardware Vs. Software RAIDs
Mac OS X, as I stated in a previous Storage Primer article, can create RAID 0 and RAID 1 volumes purely via software. This is an ideal way of setting up RAID 0 and 1 volumes, because you can pop out those hard drives (from inside your Mac Pro, or out of your external drive chassis) and bring them to any other Mac OS X system, mount the drives, and have them come up as the very same RAID 0 or RAID 1 volume. Managing that volume imposes very little overhead on the host system. All you have to use is Disk Utility, and while there are a couple of subtleties you can certainly talk to us at Chesapeake Systems about, it’s quite straightforward. By contrast, if you are using a RAID 0 or 1 volume that relies on a hardware controller chip, such as the RAID 0 chip used in the G-Technology G-RAID, or the RAID 1 chip used in the G-Safe, you will not be able to get at that data if the controller chip itself fails.
RAID 3 and up, however, is where parity data and XOR operations become involved, and this is a pretty “horsepower-intensive” operation. That is why there is a beast known as a RAID controller, and this is the “brain” that is required to manage RAID 3, 5 or 6 sets, manage hot spares, rebuild RAID sets if a drive fails and is replaced, etc. A RAID controller is similar to a CPU in your computer, except that as opposed to the very broad set of tasks your computer’s CPU has to fulfill, a RAID controller just about solely performs XOR operations. Not a sexy job, but someone (or something) has to do it! Sometimes the RAID controller lives on a PCI card that is inside your Mac Pro, and sometimes it’s actually built into the RAID chassis itself that lives outside of your computer. We’ll discuss these issues in more depth in the next installment of the Storage Primer, but for now just know that RAID 3, 5 and 6 require some sort of separate RAID controller device managing the RAID set.
In Conclusion
If you’ve made it this far, you know quite a bit about the different RAID levels, and essentially how they work. At least, you know a bit more than if you thought of it as pure “pixie dust”! I will say again, RAIDs are not backups! It is much better to keep your data, for security’s sake, on a RAID 1, 3, 5 or 6 volume than on a single hard drive or RAID 0 set, but it is still not a proper backup! A future article will address backup in much more depth than we have to date.
Stay tuned for next time, when we get back to talking about actual solutions for storage. The next article will look at the Mac Pro desktop Direct-Attached Storage options that are available on the market, and that we of course sell and support at Chesapeake Systems.
Posted: February 24th, 2009 under Technical Articles.
Tags: raid, storage, storage primer
