incubator-cassandra-user mailing list archives

From Dave Cowen <d...@luciddg.com>
Subject Storage management during rapid growth
Date Thu, 31 Oct 2013 20:15:42 GMT
Hi, all -

I'm currently managing a small Cassandra cluster, several nodes with local
SSD storage.

It's difficult for us to forecast the growth of the Cassandra data over the
next couple of years for various reasons, but it is virtually guaranteed to
grow substantially.

During this time, we may want to increase the amount of storage available
to each node but, assuming we are not I/O bound, avoid expanding the cluster
horizontally with additional nodes that have local storage. In addition,
expanding with local SSDs is costly.

My colleagues and I have discussed a couple of other options that don't
involve scaling horizontally or adding SSDs:

1) Move to larger, cheaper spinning-platter disks. However, when monitoring
the performance of our cluster, we see sustained periods - especially
during repair/compaction/cleanup - of several hours where there are >2000
IOPS. It will be hard to reach that level of performance in each node with
spinning-platter disks, and we'd prefer not to take that kind of
performance hit during maintenance operations.

2) Move some nodes to a SAN solution, ensuring that there is a mix of
storage, drives, LUNs and RAIDs so that there isn't a single point of
failure. While we're aware that this is frowned on in the Cassandra
community due to Cassandra's design, a SAN seems like the obvious way of
being able to quickly add storage to a cluster without having to juggle
local drives, and provides a level of performance between local spinning
platter drives and local SSDs.
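
To put a rough number on option 1, here is a back-of-the-envelope sketch of
how many spinning disks a node would need to match the >2000 IOPS we see
during maintenance. The per-disk figure (100 random IOPS) is an assumed
ballpark, not something we have measured, and the function name is purely
illustrative. The sketch also assumes load is striped perfectly across
disks and ignores any RAID write penalty, so the real count would likely be
higher.

```python
import math


def disks_needed(target_iops: float, iops_per_disk: float) -> int:
    """Smallest number of disks whose combined random IOPS meets the target.

    Assumes IOPS add up linearly across disks (ideal striping, no RAID
    write penalty), which is optimistic.
    """
    if target_iops < 0 or iops_per_disk <= 0:
        raise ValueError("target_iops must be >= 0 and iops_per_disk > 0")
    return math.ceil(target_iops / iops_per_disk)


# Sustained rate observed during repair/compaction/cleanup (from monitoring).
SUSTAINED_IOPS = 2000

# ASSUMPTION: ballpark random IOPS for one spinning-platter disk.
# Replace with a measured value for the actual drives under consideration.
ASSUMED_IOPS_PER_SPINNER = 100

if __name__ == "__main__":
    count = disks_needed(SUSTAINED_IOPS, ASSUMED_IOPS_PER_SPINNER)
    print(f"Disks needed per node under these assumptions: {count}")
```

Plugging in a measured per-disk figure for a specific drive model would make
the comparison with SSDs more meaningful than the assumed value used here.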

So, the questions:

1) Has anyone moved from SSDs to spinning-platter disks, or managed a
cluster that contained both? Do the numbers we're seeing exaggerate the
performance hit we'd see if we moved to spinners?

2) Have you successfully used a SAN or a hybrid SAN solution (some local,
some SAN-based) to dynamically add storage to the cluster? What type of SAN
have you used, and what issues have you run into?

3) Am I missing a way of economically scaling storage?

Thanks for any insight.

Dave
