If the IOPS are there, compaction / repair / cleanup will use them if the configuration allows it. If they are not there and the configuration matches the resources, the only issue is that those operations take longer (assuming the hardware can handle the throughput).
IMHO you will have a terrible time and regret the decision. Performance in anger rarely matches local disks, and when someone decides the SAN needs to go through a maintenance process, say goodbye to your node. You will also need very good network links.
Cassandra is designed for a shared-nothing architecture; it's best to embrace that.
Try to get a feel for the general IOPS used for reads while compaction, repair, etc. are not running.
Also look at the bytes going into the cluster on the RPC / native binary interface.
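If it helps, below is a rough Linux-only sketch for sampling read IOPS from /proc/diskstats while the cluster is otherwise quiet; the device name "sdb" and the 10 second interval are placeholders, so adjust them to your data volume (iostat -x from sysstat gives you the same numbers with less effort). It only covers the disk side; for bytes arriving on the client interfaces, use whatever network / JMX monitoring you already have.

#!/usr/bin/env python
# Rough read-IOPS sampler (Linux only). Run while reads are being served
# but compaction / repair / cleanup are not running.
import time

DEVICE = "sdb"      # assumption: change to your Cassandra data disk
INTERVAL = 10       # seconds between the two samples

def reads_completed(device):
    # /proc/diskstats: field 3 is the device name, field 4 (index 3)
    # is the number of reads completed since boot.
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == device:
                return int(parts[3])
    raise ValueError("device %s not found in /proc/diskstats" % device)

before = reads_completed(DEVICE)
time.sleep(INTERVAL)
after = reads_completed(DEVICE)
print("approx read IOPS on %s: %.1f" % (DEVICE, (after - before) / float(INTERVAL)))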
I've worked with people who have internal SANs and others who have used EBS. I would not describe either solution as optimal. The issues are performance under load, network contention, and SLA / consistency.
Version 1.2+ has better support for fat nodes (nodes with up to around 5 TB of data) via:
* JBOD: mount each disk independently and add it to data_file_directories (see the sketch after this list). Cassandra will balance the write load between disks and run one flush thread per data directory; I've heard this gives good performance with HDDs. This gives you 100% of the raw disk capacity and means a single disk failure does not necessitate a full node rebuild.
* have good networking in place so you can rebuild a failed node, either completely or from a failed disk.
* use vnodes so that as the number of nodes grows the time to rebuild a failed node drops.
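As a rough illustration of the JBOD and vnodes points above, a cassandra.yaml fragment might look like the following; the mount points are hypothetical examples and num_tokens: 256 is just the commonly used value, not a recommendation for your particular cluster.

# cassandra.yaml fragment (1.2+); mount points are hypothetical
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
    - /mnt/disk3/cassandra/data

# keep the commit log on its own spindle if you can
commitlog_directory: /mnt/disk4/cassandra/commitlog

# enable vnodes so rebuild work is spread across the cluster
num_tokens: 256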
I would be a little uneasy about very high node loads with only three nodes. The main concern is how long it will take to replace a node that completely fails.
I've also seen people have a good time moving from SSD to 12 fast disks in a RAID10 config.
You can mix HDDs and SSDs and have some hot CFs on the SSD and others on the HDD.
Hope that helps.
Co-Founder & Principal Consultant
Apache Cassandra Consulting
I can't comment on the technical question; however, one thing I've learnt from managing the growth of data is that the $/GB of storage tends to drop at a rate that can absorb a moderate proportion of the increase in cost due to the increase in the size of the data. I'd recommend having a wet-finger-in-the-air stab at projecting the growth in data size versus the historical trend in the decrease in the cost of storage.
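For what it's worth, the back-of-envelope version of that projection can be as simple as the sketch below; the growth and price-decline rates are made-up placeholders, so plug in your own historical numbers.

# Back-of-envelope storage cost projection; all inputs are placeholders.
DATA_TB = 10.0          # assumption: current data set size in TB
DATA_GROWTH = 0.50      # assumption: data grows 50% per year
PRICE_PER_GB = 0.10     # assumption: today's $/GB for your storage tier
PRICE_DECLINE = 0.20    # assumption: $/GB drops 20% per year

for year in range(1, 6):
    DATA_TB *= (1 + DATA_GROWTH)
    PRICE_PER_GB *= (1 - PRICE_DECLINE)
    cost = DATA_TB * 1024 * PRICE_PER_GB
    print("year %d: %.1f TB at $%.3f/GB -> ~$%.0f for raw capacity" %
          (year, DATA_TB, PRICE_PER_GB, cost))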