incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Huge query Cassandra limits
Date Wed, 17 Jul 2013 10:12:51 GMT
> In our tests, we found there's a significant performance difference between various
> configurations and we are studying a policy to optimize it. Our concern is that if the need
> to issue multiple requests is caused only by a fixable implementation detail, this study
> would be pointless.

If you provide your numbers, we can see whether you are getting the expected results.

There are some limiting factors. Using the Thrift API, the maximum message size is 15 MB. And each
row you ask for becomes roughly RF (replication factor) tasks in the thread pools on the replicas.
With an RF of 3, asking for 1,000 rows creates roughly 3,000 tasks on the replicas. If other
clients are trying to do reads at the same time, this can delay their reads.
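A back-of-the-envelope sketch of the two limits above. It assumes every row serializes to about the same number of bytes, that "15 MB" means 15 × 2^20 bytes and has to hold the whole response, and it ignores protocol overhead. The function names are illustrative, not part of any Cassandra client.

```python
# Rough sizing for one Thrift multiget_slice. Assumptions (not from the thread):
# rows are of roughly equal size, "15 MB" means 15 * 2**20 bytes, and
# Thrift framing/protocol overhead is ignored.
MAX_MESSAGE_BYTES = 15 * 1024 * 1024


def replica_tasks(rows: int, replication_factor: int) -> int:
    """Each requested row becomes roughly RF read tasks on the replicas."""
    return rows * replication_factor


def max_rows_per_request(avg_row_bytes: int, limit: int = MAX_MESSAGE_BYTES) -> int:
    """Largest number of rows whose estimated total size fits under the limit."""
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    return limit // avg_row_bytes


print(replica_tasks(1000, 3))           # 3000, the figure quoted above
print(max_rows_per_request(64 * 1024))  # 240 rows if each row is about 64 KB
```

Fitting under the message limit does not make a request cheap: the replica task count still grows linearly with the number of rows requested.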

Like everything in computing, more is not always better. Run some tests using multigets of
different sizes and see where the improvements in overall throughput begin to decline.
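One way to run the tests suggested above is a small harness that reads the same key set with several batch sizes and reports where throughput stops improving. `fetch_batch` is a hypothetical stand-in for your client's multiget call, and the 5% threshold for "still improving" is an arbitrary assumption; a serious test would also repeat each size and run alongside realistic concurrent load.

```python
import time
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical stand-in for the client's multiget call: takes a list of keys,
# returns the rows it read.
FetchBatch = Callable[[Sequence[str]], List[object]]


def chunks(keys: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Split the full key list into consecutive sub-requests of `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def measure_throughput(keys: Sequence[str], batch_size: int,
                       fetch_batch: FetchBatch) -> float:
    """Read every key, batch_size keys per request, and return rows per second."""
    start = time.perf_counter()
    rows = 0
    for batch in chunks(keys, batch_size):
        rows += len(fetch_batch(batch))
    elapsed = time.perf_counter() - start
    return rows / elapsed if elapsed > 0 else float("inf")


def find_knee(results: Dict[int, float], min_gain: float = 0.05) -> int:
    """Walk batch sizes in increasing order and return the last size that
    beat the previous one by at least min_gain (5% by default)."""
    sizes = sorted(results)
    best = sizes[0]
    for prev, cur in zip(sizes, sizes[1:]):
        if results[cur] < results[prev] * (1 + min_gain):
            break  # throughput has flattened out or started to drop
        best = cur
    return best


def sweep(keys: Sequence[str], batch_sizes: Sequence[int],
          fetch_batch: FetchBatch) -> Tuple[int, Dict[int, float]]:
    """Measure each batch size once and report the knee plus the raw numbers."""
    results = {size: measure_throughput(keys, size, fetch_batch) for size in batch_sizes}
    return find_knee(results), results
```

Called as `knee, results = sweep(keys, [10, 50, 100, 500, 1000], fetch_batch)`, it returns the last batch size that still paid for itself. `measure_throughput` issues requests one after another; to study parallelism as well, vary the number of concurrent requests in the same way.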


Also consider using a newer client with token-aware load balancing and async networking. Again,
though, if you try to read everything at once, you are going to have a bad day.
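A sketch of what token-aware, asynchronous reading can look like. It assumes a hypothetical ring given as (token, node) pairs, where each node owns the token range ending at its own token; a stand-in MD5-based hash rather than the real partitioner function; and a hypothetical async `fetch(node, keys)` call. Keys are grouped by the node that owns them and split into modest batches, and at most `max_in_flight` requests are outstanding at once, so nothing tries to read everything in one go.

```python
import asyncio
import bisect
import hashlib
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Sequence, Tuple

# Hypothetical ring: (token, node) pairs. Each node owns the token range
# (previous token, its own token]; keys past the last token wrap to the first.
Ring = List[Tuple[int, str]]
# Hypothetical async client call: read these keys from this node.
Fetch = Callable[[str, List[str]], Awaitable[List[object]]]


def token_for(key: str) -> int:
    """Stand-in hash: MD5 digest as a non-negative 127-bit integer.
    This is NOT the exact token function of any Cassandra partitioner."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") >> 1


def group_by_owner(ring: Ring, keys: Sequence[str]) -> Dict[str, List[str]]:
    """Map each node to the keys whose token falls in the range it owns."""
    ring = sorted(ring)
    tokens = [token for token, _ in ring]
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        # First node whose token is >= the key's token; wrap around past the end.
        i = bisect.bisect_left(tokens, token_for(key))
        groups[ring[i % len(ring)][1]].append(key)
    return dict(groups)


async def read_all(ring: Ring, keys: Sequence[str], fetch: Fetch,
                   batch_size: int = 100, max_in_flight: int = 8) -> List[object]:
    """Send each node its own keys in batches, with at most max_in_flight
    requests outstanding at any moment."""
    limiter = asyncio.Semaphore(max_in_flight)

    async def one_request(node: str, batch: List[str]) -> List[object]:
        async with limiter:
            return await fetch(node, batch)

    requests = []
    for node, node_keys in group_by_owner(ring, keys).items():
        for start in range(0, len(node_keys), batch_size):
            requests.append(one_request(node, node_keys[start:start + batch_size]))

    rows: List[object] = []
    for batch_rows in await asyncio.gather(*requests):
        rows.extend(batch_rows)
    return rows
```

A real client would learn the ring and the partitioner from the cluster rather than from a hand-written list, and could also route to the other replicas of each range.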

Cheers
  
-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/07/2013, at 8:24 PM, cesare cugnasco <cesare.cugnasco@gmail.com> wrote:

> Hi Rob,
> Of course, we could issue multiple requests, but then we would have to consider the
> optimal way to split the query into smaller ones. Moreover, we would have to choose how many
> sub-queries to run in parallel.
> In our tests, we found there's a significant performance difference between various
> configurations and we are studying a policy to optimize it. Our concern is that if the need
> to issue multiple requests is caused only by a fixable implementation detail, this study
> would be pointless.
> 
> Has anyone done a similar analysis?
> 
> 
> 2013/7/16 Robert Coli <rcoli@eventbrite.com>
> 
> On Tue, Jul 16, 2013 at 4:46 AM, cesare cugnasco <cesare.cugnasco@gmail.com> wrote:
> We are working on porting some life science applications to Cassandra, but we have to
> deal with its limits when handling huge queries. Our queries are usually multiget_slice ones:
> many rows with many columns each.
> 
> You are not getting much "win" by increasing request size in Cassandra, and you expose
> yourself to the kind of "lose" you have experienced.
> 
> Is there some reason you cannot just issue multiple requests?
> 
> =Rob 
> 

