cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: Experiences with Cassandra hardware planning
Date Mon, 25 Oct 2010 15:25:32 GMT
On Mon, Oct 25, 2010 at 11:21 AM, Eric Rosenberry <> wrote:
> Hey Chris-
> That is tough to say as we started out with no data and have been
> continuously loading data into the cluster.  Initially we had less data than
> the amount of RAM in each node (48 gigs) but we have eventually exceeded
> that and now have many times more data on each node than there is RAM in the
> entire cluster.
> Some key points though:
> 1. Upon cold start of the cluster (i.e. nothing in file system cache) disk
> i/o was massive even when the total dataset was less than the RAM in one
> system (this same thing holds true in RDBMS systems of course, though many
> of them are smart about pre-loading data)
> 2. We gave up on using Cassandra's row cache, as loading any reasonable
> amount of data into the cache would take days/weeks with our tiny row size.
> We are using the file system cache instead.
> 3. After switching to SSDs we thought we might be able to get away with less
> RAM (as we were relying on the SSDs to be fast rather than on RAM cache), but
> dropping the nodes to 24 gigs cut the cluster's read capacity by 75%.
> 4. When Cassandra is set to replication factor of three and the read replica
> count is one, data still gets read (for read repair) on all three nodes that
> have a copy of the data.  This brings that data into memory on those
> machines so the amount of total cluster memory available to cache actual
> data is not 192 gigs in my example of four nodes, but only 64 gigs minus OS
> and Cassandra overhead (I divided 192 by three since three copies are stored
> in RAM across the cluster).
> -Eric
> On Mon, Oct 25, 2010 at 7:41 AM, Chris Burroughs <>
> wrote:
>> You mention that you consistently found your boxes IO bound despite the
>> large amount of RAM available for caching.  Could you state roughly what
>> the ratio of RAM to on disk data was?

If I am reading this properly, it looks like you used Linux software RAID
on top of the SSD devices. Can you talk about this? I would think that even
with a simple RAID level this would drive your CPU usage high. But it seems
you may not have other options, since SSD RAID cards probably do not exist.
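
As an aside on point 4 of the quoted message, the cache arithmetic there
can be sketched in a few lines of Python. This is only a rough model of
Eric's reasoning, not anything Cassandra itself computes: the function
name and the overhead parameter are hypothetical, and it assumes every
replica of a row ends up in the file system cache of the node holding it.

```python
def effective_cache_gb(nodes, ram_per_node_gb, replication_factor,
                       overhead_per_node_gb=0.0):
    """Rough estimate of RAM available to cache *distinct* data.

    Assumption (from the quoted message): reads touch every replica
    because of read repair, so each cached row occupies memory on
    replication_factor nodes. overhead_per_node_gb is a hypothetical
    allowance for the OS and the Cassandra process.
    """
    usable_gb = nodes * (ram_per_node_gb - overhead_per_node_gb)
    return usable_gb / replication_factor


# Four nodes with 48 GB each at replication factor three,
# ignoring OS and Cassandra overhead.
print(effective_cache_gb(4, 48, 3))  # 64.0
```

The same function applied to the 24 GB configuration gives half that
figure, before subtracting any overhead.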
