incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Storage management during rapid growth
Date Tue, 05 Nov 2013 07:07:40 GMT
> However, when monitoring the performance of our cluster, we see sustained periods - especially
during repair/compaction/cleanup - of several hours where there are >2000 IOPS.
If the IOPS are there, compaction / repair / cleanup will use them if the configuration allows
it. If they are not there and the configuration matches the resources, the only issue is that
things take longer (assuming the HW can handle the throughput). 
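If you want to cap how much of the disk maintenance can chew through, cassandra.yaml has
throttles for it. A rough sketch (the values shown are just the usual 1.2 defaults, adjust
to taste):

```yaml
# Cap compaction I/O so maintenance can't soak up all the IOPS.
# 16 MB/s is the 1.2 default; 0 disables throttling entirely.
compaction_throughput_mb_per_sec: 16

# Throttle outbound streaming (repair / bootstrap / rebuild) per node.
stream_throughput_outbound_megabits_per_sec: 200
```

You can also change the compaction throttle at runtime with
nodetool setcompactionthroughput, without a restart.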

> 2) Move some nodes to a SAN solution, ensuring that there is a mix of storage, drives,
IMHO you will have a terrible time and regret the decision. Performance in anger rarely matches
that of local disks, and when someone decides the SAN needs to go through a maintenance window, say
goodbye to your node. You will also need very good network links. 

Cassandra is designed for a shared-nothing architecture; it’s best to embrace that. 


> 1) Has anyone moved from SSDs to spinning-platter disks, or managed a cluster that contained
both? Do the numbers we're seeing exaggerate the performance hit we'd see if we moved to spinners?
Try to get a feel for the general IOPS used for reads without compaction etc. running. 
Also measure the bytes coming into the cluster over the rpc / native binary interface. 
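One rough way to get that IOPS number on Linux is to diff two /proc/diskstats samples. A
sketch (the sample lines are made up; field positions follow the standard diskstats layout,
where field 4 is reads completed and field 8 is writes completed):

```python
# Sketch: estimate IOPS from two /proc/diskstats lines for the same device,
# taken dt seconds apart.
def iops_from_samples(sample1, sample2, dt):
    """Return (reads + writes) completed per second between the two samples."""
    f1, f2 = sample1.split(), sample2.split()
    reads = int(f2[3]) - int(f1[3])    # reads completed since sample1
    writes = int(f2[7]) - int(f1[7])   # writes completed since sample1
    return (reads + writes) / dt

# Two fabricated samples 5 seconds apart: 600 reads and 400 writes completed.
s1 = "8 0 sda 1000 0 0 0 500 0 0 0 0 0 0"
s2 = "8 0 sda 1600 0 0 0 900 0 0 0 0 0 0"
print(iops_from_samples(s1, s2, 5.0))  # -> 200.0
```

iostat -x will give you the same thing with less work; the point is to sample while the
cluster is quiet (no compaction / repair) so you see the real read load.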
 
> 2) Have you successfully used a SAN or a hybrid SAN solution (some local, some SAN-based)
to dynamically add storage to the cluster? What type of SAN have you used, and what issues
have you run into?
I’ve worked with people who have internal SANS and those that have used EBS. I would not
describe either solution as optimal. The issues are performance under load, network contention,
SLA / consistency. 


> 3) Am I missing a way of economically scaling storage?
Version 1.2+ has better support for fat nodes (nodes with up to 5TB of data) via:

* JBOD: mount each disk independently and add it to data_file_directories. Cassandra will
balance the write load between disks and use one flush thread per data directory; I’ve
heard this gives good performance with HDDs. This gives you 100% of the raw disk capacity
and means a single disk failure does not necessitate a full node rebuild. 
* disk failure: set the disk_failure_policy to best_effort or stop so the node can handle
disk failure https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L125
* have good networking in place so you can rebuild a failed node, either completely or from
a failed disk. 
* use vnodes so that as the number of nodes grows the time to rebuild a failed node drops.
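Pulling those together, a sketch of the relevant cassandra.yaml settings (the mount points
are made up, and the num_tokens value is just the common 1.2 default):

```yaml
# JBOD: one entry per physical disk, each mounted separately.
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
    - /mnt/disk3/cassandra/data

# Keep the node running (minus the failed disk's data) rather than dying
# outright; use "stop" if you'd rather the node shut down for investigation.
disk_failure_policy: best_effort

# vnodes: many small token ranges per node, so a rebuild streams from many
# peers in parallel and gets faster as the cluster grows.
num_tokens: 256
```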


I would be a little uneasy about very high node loads with only three nodes. The main concern
is how long it will take to replace a node that completely fails. 

I’ve also seen people have a good time moving from SSD to 12 fast disks in a RAID10 config.

You can mix HDDs and SSDs and keep some hot CFs on the SSDs and others on the HDDs. 
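One common trick for that, since 1.2 has no per-CF storage setting, is to symlink a hot
CF’s directory onto the SSD mount while the node is stopped. A sketch using throwaway temp
directories in place of the real paths (the keyspace and CF names are made up):

```python
# Sketch: relocate one hot CF's directory to an SSD mount by symlinking it
# back into the regular data directory. Cassandra follows the symlink, so that
# CF's SSTables land on the SSD while the other CFs stay on the HDDs.
import os
import tempfile

hdd_data = tempfile.mkdtemp()   # stands in for /var/lib/cassandra/data (HDD)
ssd_mount = tempfile.mkdtemp()  # stands in for an SSD mount point

os.makedirs(os.path.join(hdd_data, "myks"))
hot_cf_on_ssd = os.path.join(ssd_mount, "myks", "hot_cf")
os.makedirs(hot_cf_on_ssd)

# Do this while the node is stopped, after moving any existing SSTables over.
link = os.path.join(hdd_data, "myks", "hot_cf")
os.symlink(hot_cf_on_ssd, link)

print(os.path.realpath(link) == os.path.realpath(hot_cf_on_ssd))  # -> True
```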

Hope that helps. 
 

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 1/11/2013, at 10:01 am, Franc Carter <franc.carter@sirca.org.au> wrote:

> 
> I can't comment on the technical question, however one thing I learnt while managing the
growth of data is that $/GB tends to drop at a rate that can absorb a moderate proportion
of the increase in cost due to the increase in the size of the data. I'd recommend having a wet-finger-in-the-air
stab at projecting the growth in data sizes versus the historical trend in the decrease in
the cost of storage.
> 
> cheers
> 
> 
> 
> On Fri, Nov 1, 2013 at 7:15 AM, Dave Cowen <dave@luciddg.com> wrote:
> Hi, all -
> 
> I'm currently managing a small Cassandra cluster, several nodes with local SSD storage.
> 
> It's difficult for us to forecast the growth of the Cassandra data over the next couple
of years for various reasons, but it is virtually guaranteed to grow substantially.
> 
> During this time, there may be times where it is desirable to increase the amount of
storage available to each node, but, assuming we are not I/O bound, keep from expanding the
cluster horizontally with additional nodes that have local storage. In addition, expanding
with local SSDs is costly.
> 
> My colleagues and I have had several discussions of a couple of other options that don't
involve scaling horizontally or adding SSDs:
> 
> 1) Move to larger, cheaper spinning-platter disks. However, when monitoring the performance
of our cluster, we see sustained periods - especially during repair/compaction/cleanup - of
several hours where there are >2000 IOPS. It will be hard to get to that level of performance
in each node with spinning platter disks, and we'd prefer not to take that kind of performance
hit during maintenance operations.
> 
> 2) Move some nodes to a SAN solution, ensuring that there is a mix of storage, drives,
LUNs and RAIDs so that there isn't a single point of failure. While we're aware that this
is frowned on in the Cassandra community due to Cassandra's design, a SAN seems like the obvious
way of being able to quickly add storage to a cluster without having to juggle local drives,
and provides a level of performance between local spinning platter drives and local SSDs.
> 
> So, the questions:
> 
> 1) Has anyone moved from SSDs to spinning-platter disks, or managed a cluster that contained
both? Do the numbers we're seeing exaggerate the performance hit we'd see if we moved to spinners?
> 
> 2) Have you successfully used a SAN or a hybrid SAN solution (some local, some SAN-based)
to dynamically add storage to the cluster? What type of SAN have you used, and what issues
have you run into?
> 
> 3) Am I missing a way of economically scaling storage?
> 
> Thanks for any insight.
> 
> Dave
> 
> 
> 
> -- 
> Franc Carter | Systems architect | Sirca Ltd
> franc.carter@sirca.org.au | www.sirca.org.au
> Tel: +61 2 8355 2514 
> Level 4, 55 Harrington St, The Rocks NSW 2000
> PO Box H58, Australia Square, Sydney NSW 1215
> 

