cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Experiences with Cassandra hardware planning
Date Mon, 25 Oct 2010 15:25:32 GMT
On Mon, Oct 25, 2010 at 11:21 AM, Eric Rosenberry <eric@rosenberry.org> wrote:
> Hey Chris-
> That is tough to say as we started out with no data and have been
> continuously loading data into the cluster.  Initially we had less data than
> the amount of RAM in each node (48 gigs) but we have eventually exceeded
> that and now have many times more data on each node than RAM in the entire
> cluster.
> Some key points though:
> 1. Upon cold start of the cluster (i.e. nothing in file system cache) disk
> i/o was massive even when the total dataset was less than the RAM in one
> system (this same thing holds true in RDBMS systems of course, though many
> of them are smart about pre-loading data)
> 2. We gave up on using Cassandra's row cache as loading any reasonable
> amount of data into the cache would take days/weeks with our tiny row size.
>  We instead are using file system cache.
> 3. After switching to SSDs we thought we might be able to get away with less
> RAM (as we were relying on the SSDs to be fast rather than RAM cache) but
> dropping them to 24 gigs cut the cluster's read capacity by 75%.
> 4. When Cassandra is set to replication factor of three and the read replica
> count is one, data still gets read (for read repair) on all three nodes that
> have a copy of the data.  This brings that data into memory on those
> machines so the amount of total cluster memory available to cache actual
> data is not 192 gigs in my example of four nodes, but only 64 gigs minus OS
> and Cassandra overhead (I divided 192 by three since three copies are stored
> in RAM across the cluster).
> -Eric
>
> On Mon, Oct 25, 2010 at 7:41 AM, Chris Burroughs <chris.burroughs@gmail.com>
> wrote:
>>
>> You mention that you consistently found your boxes IO bound despite the
>> large amount of RAM available for caching.  Could you state roughly what
>> the ratio of RAM to on disk data was?
>
>

If I am reading this properly, it looks like you used Linux software RAID
on top of the SSD devices. Can you talk about this? I would think that
even a simple RAID level would drive your CPU usage high. But it seems you
may not have other options, since SSD RAID cards probably do not exist.
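
For anyone following the memory math in Eric's point 4, here is a rough sketch of that arithmetic in Python. It assumes every replica of a row ends up in the file system cache on its node (because read repair touches all replicas), so the unique data the cluster can cache is roughly total RAM divided by the replication factor. The function name and the overhead parameter are made up for illustration; they are not part of Cassandra.

```python
def effective_cache_gb(nodes, ram_per_node_gb, replication_factor,
                       overhead_per_node_gb=0.0):
    """Estimate how much unique data fits in cache cluster-wide.

    Assumes each cached row is held in memory on every replica,
    so total usable RAM is divided by the replication factor.
    overhead_per_node_gb is a hypothetical allowance for the OS
    and the Cassandra heap on each node.
    """
    usable_gb = nodes * (ram_per_node_gb - overhead_per_node_gb)
    return usable_gb / replication_factor


# Eric's example: four nodes, 48 gigs each, replication factor three.
print(effective_cache_gb(4, 48, 3))  # 64.0
```

Passing a nonzero overhead_per_node_gb gives the "minus OS and Cassandra overhead" figure; the actual overhead is not stated in the thread.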
