incubator-cassandra-user mailing list archives

From James Masson <>
Subject Re: Cassandra read throughput with little/no caching.
Date Mon, 24 Dec 2012 12:07:58 GMT
Hi Aaron,

On 23/12/12 20:18, aaron morton wrote:
> First, the non-helpful advice: I strongly suggest changing the data
> model so you do not have 100MB+ rows. They will make life harder.

I don't think we have 100MB+ rows. Column families, yes - but not rows.

>> Write request latency is about 900 microsecs, read request
>> latency is about 4000 microsecs.
> 4 milliseconds to drag 100 to 300 MB of data off a SAN, through your
> network, into C* and out to the client does not sound terrible at first
> glance. Can you benchmark an individual request to get an idea of the
> throughput?

It's a large number of small requests: about 250 writes/sec and about
100 reads/sec. I might look at some tcpdumps to see what it's actually doing...
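Aaron's suggestion to benchmark an individual request could be sketched as a small timing harness. This is a minimal sketch, not a Cassandra tool: `fake_read` is a hypothetical stand-in that just sleeps for about 4 ms, and a real test would swap in a call from whichever client library is in use. Timing requests one at a time also exposes the ceiling a single serial client imposes, since it cannot exceed roughly 1 / latency requests per second.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_requests(request: Callable[[], object], n: int = 1000) -> List[float]:
    """Call `request` n times, one after another, and return each latency in microseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        request()
        latencies.append((time.perf_counter() - start) * 1e6)
    return latencies


def summarize(latencies_us: List[float]) -> Dict[str, float]:
    """Median, approximate 99th percentile, and the serial throughput they imply."""
    ordered = sorted(latencies_us)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return {
        "median_us": statistics.median(ordered),
        "p99_us": p99,
        # A single client issuing requests back to back cannot beat 1 / mean latency.
        "serial_ops_per_sec": 1e6 / statistics.mean(ordered),
    }


def fake_read() -> bytes:
    # Hypothetical stand-in for one Cassandra read; replace with a real client call.
    time.sleep(0.004)
    return b"x" * 512
```

For example, `summarize(time_requests(fake_read, n=100))` should report a median a little above 4000 microseconds and a serial throughput somewhat below 250 ops/sec.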

With a total volume of approx. 400 MB split over 3 nodes, it takes about
30 mins to run through the complete data set. There's near-zero disk I/O
and I/O wait; it's definitely coming out of the Linux disk cache.

That works out at about 0.2 MB/sec in data-crunching terms, and about
0.6 MB/sec of network I/O.
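The throughput figure can be checked directly, and Little's law (requests in flight = arrival rate x latency) gives a rough idea of how busy the cluster is. This is a sketch assuming steady request rates and that "Mb" here means megabytes; the latencies are the 900 and 4000 microsecond figures quoted earlier in the thread. An average well under one request in flight would suggest the nodes spend most of their time waiting for the client rather than being saturated.

```python
def throughput_mb_per_sec(total_mb: float, seconds: float) -> float:
    """Average data rate for a pass over the whole data set."""
    return total_mb / seconds


def in_flight(rate_per_sec: float, latency_sec: float) -> float:
    # Little's law: average requests in the system = arrival rate x time in system.
    return rate_per_sec * latency_sec


# Assumes "Mb" in the thread means megabytes.
data_rate = throughput_mb_per_sec(400, 30 * 60)  # ~0.22 MB/s
reads_in_flight = in_flight(100, 4000e-6)        # ~0.4 reads in flight on average
writes_in_flight = in_flight(250, 900e-6)        # ~0.225 writes in flight on average
```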

> I would recommend removing the SAN from the equation; Cassandra will run
> better with local disks. It also introduces a single point of failure
> into a distributed system.

Understood about the SPoF, but that is negated by good SAN fabric design. I
think a single local disk or two is going to find it hard to compete
with an FC-attached SAN with GBs of dedicated DRAM cache and SSD tiering.
This is all on VMware anyway, so there's no option of local disks.

>> but it's likely in the Linux disk cache, given the sizing of the
>> node/data/jvm.
> Are you sure that the local Linux machine is going to cache files stored
> on the SAN?

Yes. Linux doesn't care (and isn't aware) at the filesystem level whether
the volume is 'local' or not; everything goes through the same caching
strategy. Again, because this is VMware, it appears as a 'local' disk.
In short, disk isn't the limiting factor here.
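Whether a node's page cache is large enough to hold its share of the data can be read from the `Cached` line of `/proc/meminfo`. A minimal sketch: `cache_can_hold` is a hypothetical helper, and the data size passed to it is an assumption the caller supplies (for example the on-disk size of that node's SSTables, which depends on replication factor and is not stated here).

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: first numeric value} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def cache_can_hold(meminfo_text: str, data_mb: float) -> bool:
    """True if the page cache ("Cached", in kB) is at least data_mb megabytes."""
    cached_kb = parse_meminfo(meminfo_text)["Cached"]
    return cached_kb / 1024 >= data_mb
```

On a live node, `open("/proc/meminfo").read()` supplies the text.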


James M
