However, when monitoring the performance of our cluster, we see sustained periods - especially during repair/compaction/cleanup - of several hours where there are >2000 IOPS.
If the IOPS are there, compaction / repair / cleanup will use them if the configuration allows it. If they are not there and the configuration matches the resources, the only issue will be that things take longer (assuming the HW can handle the throughput).

2) Move some nodes to a SAN solution, ensuring that there is a mix of storage, drives,
IMHO you will have a terrible time and regret the decision. Performance in anger rarely matches local disks, and when someone decides the SAN needs to go through a maintenance process, say goodbye to your node. You will also need very good network links.

Cassandra is designed for a shared-nothing architecture; it's best to embrace that.


1) Has anyone moved from SSDs to spinning-platter disks, or managed a cluster that contained both? Do the numbers we're seeing exaggerate the performance hit we'd see if we moved to spinners?
Try to get a feel for the general IOPS used for reads when compaction, repair, etc. are not running.
Also measure the bytes going into the cluster on the RPC / native binary interface.
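
One way to get that baseline is to sample the kernel's per-device I/O counters twice and divide the difference by the interval. This is only a minimal sketch, assuming a Linux host and the /proc/diskstats layout (4th field is reads completed, 8th field is writes completed). The function names, the device name "sda" and the 10 second interval are placeholders of mine, not something from this thread.

```python
import time


def parse_diskstats(text, device):
    """Return (reads_completed, writes_completed) for one block device."""
    for line in text.splitlines():
        fields = line.split()
        # Layout: major minor name reads_completed ... writes_completed ...
        if len(fields) >= 8 and fields[2] == device:
            return int(fields[3]), int(fields[7])
    raise ValueError(f"device {device!r} not found")


def iops(before, after, seconds):
    """Read and write operations per second between two counter samples."""
    reads = (after[0] - before[0]) / seconds
    writes = (after[1] - before[1]) / seconds
    return reads, writes


def sample(device, interval=10.0, path="/proc/diskstats"):
    """Take two samples `interval` seconds apart and return (read, write) IOPS."""
    with open(path) as f:
        first = parse_diskstats(f.read(), device)
    time.sleep(interval)
    with open(path) as f:
        second = parse_diskstats(f.read(), device)
    return iops(first, second, interval)


# Usage on a node, ideally while no compaction or repair is running:
#   read_iops, write_iops = sample("sda")
```

Comparing a quiet-period sample with one taken during compaction or repair should show how much of the >2000 IOPS comes from maintenance rather than client reads.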
 
2) Have you successfully used a SAN or a hybrid SAN solution (some local, some SAN-based) to dynamically add storage to the cluster? What type of SAN have you used, and what issues have you run into?
I've worked with people who have internal SANs and those that have used EBS. I would not describe either solution as optimal. The issues are performance under load, network contention, SLA / consistency.


3) Am I missing a way of economically scaling storage?
Version 1.2+ has better support for fat nodes (nodes with up to 5TB of data) via:

* JBOD: mount each disk independently and add it to data_file_directories. Cassandra will balance the write load between disks and have one flush thread per data directory; I've heard this gives good performance with HDDs. This will give you 100% of the raw disk capacity and mean a single disk failure does not necessitate a node rebuild.
* disk failure: set the disk_failure_policy to best_effort or stop so the node can handle disk failure https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L125
* have good networking in place so you can rebuild a failed node, either completely or from a failed disk. 
* use vnodes so that as the number of nodes grows the time to rebuild a failed node drops. 

I would be a little uneasy about very high node loads with only three nodes. The main concern is how long it will take to replace a node that completely fails. 
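
To put a rough number on that concern, here is a back-of-the-envelope sketch. It assumes the replacement is limited only by a sustained aggregate streaming rate. The rates below are illustrative guesses, not measurements, and the helper name is mine. It ignores the compaction work that follows streaming, and it is worth checking how stream_throughput_outbound_megabits_per_sec is set on the nodes doing the streaming.

```python
def rebuild_hours(data_tb, stream_mbit_per_s):
    """Hours needed to stream data_tb terabytes at a sustained rate in Mbit/s."""
    data_bits = data_tb * 1e12 * 8          # decimal terabytes to bits
    seconds = data_bits / (stream_mbit_per_s * 1e6)
    return seconds / 3600


# Assumed aggregate inbound rates for a node holding 5 TB.
for rate in (200, 1000, 4000):
    print(f"5 TB at {rate} Mbit/s: {rebuild_hours(5, rate):.1f} h")
```

With only three nodes there are few peers to stream from, so the aggregate rate is likely to sit near the low end. That is part of why vnodes and more nodes help.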

I've also seen people have a good time moving from SSD to 12 fast disks in a RAID10 config.
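
For a rough sense of what such an array can deliver on random I/O, a common rule of thumb divides the raw spindle IOPS by a blend of the read fraction and the write fraction times a write penalty of 2 for mirroring. This is a sketch only. The per-disk figure of 175 IOPS and the 50/50 read/write mix are assumptions of mine, and compaction I/O is largely sequential, so treat the result as a pessimistic bound rather than a prediction.

```python
def raid10_random_iops(disks, per_disk_iops, read_fraction):
    """Rule-of-thumb random IOPS for RAID10 (each write lands on two disks)."""
    raw = disks * per_disk_iops
    write_fraction = 1.0 - read_fraction
    write_penalty = 2                        # mirrored writes
    return raw / (read_fraction + write_penalty * write_fraction)


# Assumed: 12 disks, ~175 random IOPS each, half reads and half writes.
print(raid10_random_iops(12, 175, 0.5))
```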

You can mix HDDs and SSDs and have some hot CFs on the SSD and others on the HDD.

Hope that helps. 
 

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 1/11/2013, at 10:01 am, Franc Carter <franc.carter@sirca.org.au> wrote:


I can't comment on the technical question, however one thing I learnt with managing the growth of data is that the $/GB of storage tends to drop at a rate that can absorb a moderate proportion of the increase in cost due to the increase in size of data. I'd recommend having a wet-finger-in-the-air stab at projecting the growth in data sizes versus the historical trends in the decrease in cost of storage.
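
A wet-finger projection of that kind could look like the sketch below. Every number in it (starting size, starting price, annual growth, annual price drop) is a made-up placeholder to be replaced with your own estimates, and fixed annual rates are a simplifying assumption.

```python
def projected_cost(data_gb, price_per_gb, growth, price_drop, years):
    """Storage cost per year when data grows and $/GB falls at fixed annual rates."""
    costs = []
    for year in range(years + 1):
        size = data_gb * (1 + growth) ** year
        price = price_per_gb * (1 - price_drop) ** year
        costs.append(size * price)
    return costs


# Placeholder inputs: 1 TB today at $1/GB, data +50%/year, price -25%/year.
for year, cost in enumerate(projected_cost(1000, 1.0, 0.5, 0.25, 3)):
    print(f"year {year}: ${cost:,.0f}")
```

If the cost column grows much more slowly than the data size, falling prices are absorbing a good share of the growth.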

cheers



On Fri, Nov 1, 2013 at 7:15 AM, Dave Cowen <dave@luciddg.com> wrote:
Hi, all -

I'm currently managing a small Cassandra cluster, several nodes with local SSD storage.

It's difficult to forecast the growth of the Cassandra data over the next couple of years for various reasons, but it is virtually guaranteed to grow substantially.

During this time, there may be times where it is desirable to increase the amount of storage available to each node, but, assuming we are not I/O bound, keep from expanding the cluster horizontally with additional nodes that have local storage. In addition, expanding with local SSDs is costly.

My colleagues and I have had several discussions of a couple of other options that don't involve scaling horizontally or adding SSDs:

1) Move to larger, cheaper spinning-platter disks. However, when monitoring the performance of our cluster, we see sustained periods - especially during repair/compaction/cleanup - of several hours where there are >2000 IOPS. It will be hard to get to that level of performance in each node with spinning platter disks, and we'd prefer not to take that kind of performance hit during maintenance operations.

2) Move some nodes to a SAN solution, ensuring that there is a mix of storage, drives, LUNs and RAIDs so that there isn't a single point of failure. While we're aware that this is frowned on in the Cassandra community due to Cassandra's design, a SAN seems like the obvious way of being able to quickly add storage to a cluster without having to juggle local drives, and provides a level of performance between local spinning platter drives and local SSDs.

So, the questions:

1) Has anyone moved from SSDs to spinning-platter disks, or managed a cluster that contained both? Do the numbers we're seeing exaggerate the performance hit we'd see if we moved to spinners?

2) Have you successfully used a SAN or a hybrid SAN solution (some local, some SAN-based) to dynamically add storage to the cluster? What type of SAN have you used, and what issues have you run into?

3) Am I missing a way of economically scaling storage?

Thanks for any insight.

Dave



--
Franc Carter | Systems architect | Sirca Ltd
franc.carter@sirca.org.au | www.sirca.org.au
Tel: +61 2 8355 2514
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215