RAID on data disks: It is generally not necessary to use RAID for the following reasons:
- Data is replicated across the cluster based on the replication factor you've chosen.
- Starting in version 1.2, Cassandra takes care of disk management itself through its JBOD (just a bunch of disks) support. Because Cassandra reacts properly to a disk failure, either by stopping the affected node or by blacklisting the failed drive, depending on your availability/consistency requirements, you can deploy Cassandra nodes with large disk arrays without the overhead of RAID 10.
RAID on the commit log disk: Generally RAID is not needed for the commit log disk. Replication adequately prevents data loss. If you need the extra redundancy, use RAID 1.
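For illustration, the JBOD setup described above roughly corresponds to a cassandra.yaml fragment like the one below. This is only a sketch: the mount points are made-up placeholders, and the right disk_failure_policy depends on your availability/consistency requirements.

```yaml
# Hypothetical mount points; substitute your own. One entry per physical disk, no RAID.
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
    - /mnt/disk3/cassandra/data

# How the node reacts to a failed data disk:
#   stop        - shut down gossip and client transports, leaving the node effectively dead
#   best_effort - stop using (blacklist) the failed disk and keep serving from the others
disk_failure_policy: stop

# Commit log on its own device (placeholder path); put RAID 1 underneath
# only if you need the extra redundancy.
commitlog_directory: /mnt/commitlog/cassandra
```

Roughly speaking, stop favors consistency (the node goes away and the other replicas answer), while best_effort favors availability at the risk of serving stale data at consistency level ONE.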
Is it recommended to set up Cassandra on RAID-ed disks for per-node reliability, or do people usually just rely on having multiple nodes anyway? Why bother with replicated disks?