incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: High performance disk io
Date Thu, 23 May 2013 22:49:57 GMT
>  I am currently trying to really study the effect of the width of a row (being in multiple
sstables) vs its 95th percentile read time.
I'd be interested to see your findings. 

I use 3+ SSTables per read (from cfhistograms) as a warning sign to dig deeper into the
data model. The type of query also affects the number of SSTables per read: queries by
column name can short circuit and may be served from (say) 0 or 1 SSTables even if the row
is spread out. 
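A rough sketch of turning that rule of thumb into a scripted check, assuming you have already parsed the "SSTables" column of `nodetool cfhistograms <keyspace> <cf>` output into (sstables_touched, read_count) pairs — the parsing itself and the sample numbers are invented for illustration:

```python
# Flag a column family whose reads typically touch 3+ SSTables.
# Input: (sstables_touched, read_count) pairs from cfhistograms output.

def sstables_per_read_warning(histogram, threshold=3):
    """Return True if the average SSTables touched per read meets the threshold."""
    total_reads = sum(count for _, count in histogram)
    if total_reads == 0:
        return False
    weighted = sum(sstables * count for sstables, count in histogram)
    return weighted / total_reads >= threshold

# Hypothetical histogram: most reads hit 1 SSTable, a tail hits 4-5.
sample = [(1, 8000), (2, 1000), (4, 600), (5, 400)]
print(sstables_per_read_warning(sample))
```

A per-CF check like this is easy to run after each repair or compaction-strategy change to see whether rows are fragmenting across more SSTables over time.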

> -We don’t change anything and just keep upping our keycache.
> 

800MB is a very high key cache and may result in poor GC performance, which will ultimately
hurt your read latency. Pay attention to what GC is doing, both ParNew and CMS, and reduce
the key cache if needed. When ParNew runs the server is stalled. 
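If you do dial the key cache back, in 1.2 it is a single cassandra.yaml setting — the values below are illustrative, not a recommendation:

```yaml
# cassandra.yaml (1.2.x) -- example values only.
# Shrinking the key cache reduces old-gen pressure; watch the GC log after each change.
key_cache_size_in_mb: 512        # down from 800 if ParNew/CMS pauses are hurting
key_cache_save_period: 14400     # default: save keys to disk every 4 hours
```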

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/05/2013, at 3:16 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> I have used both rotational disks with lots of RAM as well as SSD devices. An important
thing to consider is that SSD devices are not magic. The big-O costs show up in several places:

> 1) more data, larger bloom filters
> 2) more data (larger key caches), more JVM overhead
> 3) more requests, more young-gen JVM overhead
> 4) more data, longer compaction (even with SSD)
> 5) more writes, more memtable flushing
> Bottom line: more data, more disk seeks
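Point 1 is easy to quantify: for a target false-positive rate p, an optimally sized Bloom filter needs about -ln(p)/(ln 2)^2 bits per key, so filter memory grows linearly with the number of keys. The formula is standard Bloom filter math; the key counts below are made up for illustration:

```python
import math

def bloom_bits_per_key(fp_rate):
    """Optimal Bloom filter size: -ln(p) / (ln 2)^2 bits per key."""
    return -math.log(fp_rate) / (math.log(2) ** 2)

def bloom_filter_mb(num_keys, fp_rate=0.01):
    """Approximate filter size in MB for num_keys keys at the given false-positive rate."""
    return num_keys * bloom_bits_per_key(fp_rate) / 8 / 1024 / 1024

# Hypothetical: 100M vs 1B keys at a 1% false-positive rate.
for keys in (100_000_000, 1_000_000_000):
    print(f"{keys:>13,} keys -> {bloom_filter_mb(keys):7.0f} MB")
```

At roughly 9.6 bits per key for a 1% false-positive rate, a node holding a billion keys carries on the order of a gigabyte of bloom filter alone, regardless of how fast the disks are.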
> 
> We have used both mid-level SSDs as well as the costly Fusion-io. Fitting in RAM/VFS cache
delivers better, more predictable low latency; even with very fast disks the average, 95th,
and 99th percentiles can end up very far apart. I am currently trying to really study the
effect of the width of a row (being in multiple sstables) vs its 95th percentile read time.
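The spread between the average and the tail percentiles is easy to see with a small sketch — the latency values here are invented purely to illustrate a long-tailed distribution:

```python
# On a long-tailed latency distribution the mean says little about the tail.

def percentile(sorted_vals, p):
    """Nearest-rank percentile of a pre-sorted list."""
    idx = max(0, int(round(p / 100 * len(sorted_vals))) - 1)
    return sorted_vals[idx]

# 1000 reads: most are sub-millisecond, a small tail hits slow paths.
latencies = sorted([0.9] * 950 + [10.0] * 40 + [80.0] * 10)
mean = sum(latencies) / len(latencies)
print(f"mean={mean:.2f}ms p95={percentile(latencies, 95)}ms p99={percentile(latencies, 99)}ms")
```

With this made-up sample the mean sits around 2 ms while the 99th percentile is an order of magnitude higher — which is why monitoring only the average hides exactly the behaviour this thread is about.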
> 
> 
> On Thu, May 23, 2013 at 10:43 AM, Christopher Wirt <chris.wirt@struq.com> wrote:
> Hi Igor,
> 
>  
> 
> I was talking about the 99th percentile from the Cassandra histograms when I said ‘1 or
2 ms for most CFs’.
> 
>  
> 
> But we have measured client side too and generally get a couple of ms added on top, as
one might expect.
> 
>  
> 
> Anyone interested -
> 
> On disk IO (my original question): we have tried out the multiple-SSD setup and found it
works well and reduces the impact of a repair on node performance.
> 
> We ended up going with a single data directory in cassandra.yaml and mounting one SSD
against that, then a dedicated SSD per large column family.
> 
> We’re now moving all of our nodes to the same setup.
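As a sketch of that mount layout — device names, keyspace, and column family names below are invented; in Cassandra 1.2 each column family lives under `<data_dir>/<keyspace>/<columnfamily>`:

```
# /etc/fstab -- illustrative layout, one SSD for the main data directory
# plus a dedicated SSD mounted over each large column family's directory.
/dev/sdb1  /var/lib/cassandra/data                        ext4  noatime  0 2
/dev/sdc1  /var/lib/cassandra/data/MyKeyspace/ColFamily1  ext4  noatime  0 2
/dev/sdd1  /var/lib/cassandra/data/MyKeyspace/ColFamily2  ext4  noatime  0 2
```

One consequence of this layout is that a repair or compaction storm on one large column family contends only with its own device, which matches the reduced repair impact described above.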
> 
>  
> 
>  
> 
> Chris
> 
>  
> 
> From: Igor [mailto:igor@4friends.od.ua] 
> Sent: 23 May 2013 15:00
> To: user@cassandra.apache.org
> Subject: Re: High performance disk io
> 
>  
> 
> Hello Christopher,
> 
> BTW, are you talking about 99th percentiles on client side, or about percentiles from
cassandra histograms for CF on cassandra side?
> 
> Thanks!
> 
> On 05/22/2013 05:41 PM, Christopher Wirt wrote:
> 
> Hi Igor,
> 
>  
> 
> Yea, same here, 15 ms for the 99th percentile is our max. Currently getting one or two ms for
most CFs. It goes up at peak times, which is what we want to avoid.
> 
>  
> 
> We’re using Cass 1.2.4 w/vnodes and our own barebones driver on top of thrift. Needed
to be .NET so Hector and Astyanax were not options.
> 
>  
> 
> Do you use SSDs or multiple SSDs in any kind of configuration or RAID?
> 
>  
> 
> Thanks
> 
>  
> 
> Chris
> 
>  
> 
> From: Igor [mailto:igor@4friends.od.ua] 
> Sent: 22 May 2013 15:07
> To: user@cassandra.apache.org
> Subject: Re: High performance disk io
> 
>  
> 
> Hello
> 
> What level of read performance do you expect? We have a limit of 15 ms for the 99th percentile,
with average read latency near 0.9 ms. For some CFs the 99th percentile is actually 2 ms, for
others 10 ms; it depends on the data volume you read in each query.
> 
> Tuning read performance involved cleaning up the data model, tuning cassandra.yaml, switching
from Hector to Astyanax, and tuning OS parameters.
> 
> On 05/22/2013 04:40 PM, Christopher Wirt wrote:
> 
> Hello,
> 
>  
> 
> We’re looking at deploying a new ring where we want the best possible read performance.
> 
>  
> 
> We’ve set up a cluster with 6 nodes, replication factor 3, 32GB of memory, an 8GB heap, an
800MB key cache, each node holding 40-50GB of data on a 200GB SSD, with a 500GB SATA disk for the OS and commitlog.
> 
> Three column families
> 
> ColFamily1 50% of the load and data
> 
> ColFamily2 35% of the load and data
> 
> ColFamily3 15% of the load and data
> 
>  
> 
> At the moment we are still seeing around 20% disk utilisation, and occasionally as high
as 40-50% on some nodes at peak time. We are conducting some semi-live testing.
> 
> CPU looks fine, memory is fine, and the key cache hit rate is about 80% (could be better, so maybe
we should increase the key cache size?).
> 
>  
> 
> Anyway, we’re looking into what we can do to improve this.
> 
>  
> 
> One conversation we are having at the moment is around the SSD disk setup.
> 
>  
> 
> We are considering moving to have 3 smaller SSD drives and spreading the data across
those.
> 
>  
> 
> The possibilities are:
> 
> -We have a RAID0 of the smaller SSDs and hope that improves performance.
> 
> Will this actually yield better throughput?
> 
>  
> 
> -We mount the SSDs to different directories and define multiple data directories in cassandra.yaml.
> Will not having a layer of RAID controller improve the throughput?
> 
>  
> 
> -We mount the SSDs at the individual column family directories and have a single data directory
declared in cassandra.yaml.
> 
> We think this is quite an attractive idea.
> 
> What are the drawbacks? System column families will be on the main SATA?
> 
>  
> 
> -We don’t change anything and just keep upping our keycache.
> 
> -Anything you guys can think of.
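For reference, the multiple-data-directories option above is a small cassandra.yaml change — the mount points here are invented for illustration:

```yaml
# cassandra.yaml -- one data directory per SSD (paths are examples).
data_file_directories:
    - /mnt/ssd1/cassandra/data
    - /mnt/ssd2/cassandra/data
    - /mnt/ssd3/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog   # stays on the SATA disk
```

Note that Cassandra balances SSTables across these directories itself, so unlike RAID0 a single large SSTable still lives on one device.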
> 
>  
> 
> Ideas and thoughts welcome. Thanks for your time and expertise.
> 
>  
> 
> Chris