I found using a JBOD SSD setup (one SSD per data directory) to be faster than RAID. A JBOD configuration also allows a disk to fail while the remaining disks continue serving reads, if you set disk_failure_policy: best_effort.
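For reference, that JBOD layout with best-effort failure handling is set in cassandra.yaml along these lines (the mount paths here are illustrative, not from the original message):

```yaml
# cassandra.yaml -- one data directory per physical SSD (JBOD, no RAID)
data_file_directories:
    - /mnt/ssd1/cassandra/data
    - /mnt/ssd2/cassandra/data
    - /mnt/ssd3/cassandra/data

# On a disk failure, blacklist the dead directory and keep serving
# reads/writes from the remaining disks instead of shutting down.
disk_failure_policy: best_effort
```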
If you do go for a RAID controller, watch out for any special read/write caching options; I found these also killed performance.
From: Alain RODRIGUEZ [mailto:email@example.com]
Sent: 08 August 2013 09:30
Subject: Re: lots of small nodes vs fewer big nodes
I advise you to have at least 7 GB of RAM (more is better if you have a lot of data per node). But the real difference comes from using SSDs, or RAID arrays of SSDs, since disk throughput is the bottleneck in most cases.
We are in the cloud; we tried a lot of configurations and were comfortable only with nodes having more than 7 GB of RAM (more than 15 GB with a lot of data per node), and we saw a real improvement when we switched to SSDs (latency dropped from 20-40 ms to 3-5 ms), even while reducing the number of nodes from 18 to 3. This was quite impressive, and I recommend SSDs (or a RAID of SSDs) to anyone who can afford them.
2013/8/7 Andrey Ilinykh <firstname.lastname@example.org>
You still have the same total amount of RAM, so you cache the same amount of data; I don't think you gain much there. On the other hand, maintenance procedures (compaction, repair) may hit your 2-CPU boxes hard. I wouldn't do it.
On Wed, Aug 7, 2013 at 10:24 AM, Paul Ingalls <email@example.com> wrote:
Quick question about systems architecture.
Would it be better to run 5 nodes with 7 GB RAM and 4 CPUs, or 10 nodes with 3.5 GB RAM and 2 CPUs?
I'm currently running the former but am considering the latter. My goal is to improve overall performance by spreading the I/O across more disks. My current cluster has low CPU utilization but spends a good amount of time in iowait. Would moving to more, smaller nodes help with that? Or would I run into trouble with the smaller RAM and CPU?
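As a quick sanity check before resizing a cluster, the iowait share can be read straight from /proc/stat on each node. A minimal sketch (Linux-only; field order per the proc(5) man page, not anything from this thread):

```python
def iowait_fraction(stat_path="/proc/stat"):
    """Return the fraction of total CPU time spent in iowait since boot."""
    with open(stat_path) as f:
        # First line: "cpu  user nice system idle iowait irq softirq ..."
        fields = f.readline().split()
    ticks = [int(v) for v in fields[1:]]
    return ticks[4] / sum(ticks)  # iowait is the 5th counter after "cpu"

if __name__ == "__main__":
    print(f"iowait since boot: {iowait_fraction():.1%}")
```

Note this is a cumulative figure since boot; for the instantaneous picture during load, sampling twice and differencing the counters (or just running iostat) gives a better view.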