cassandra-user mailing list archives
From aaron morton <aa...@thelastpickle.com>
Subject Re: Large SliceRanges: Reading all results in to memory vs. reading smaller result sub-sets at a time?
Date Thu, 08 Mar 2012 09:19:25 GMT
It is better to fetch a sensible amount per call. Moving a few MBs is OK (see thrift_framed_transport_size_in_mb in cassandra.yaml).

Long-running queries can reduce the overall query throughput. They also churn memory on both the server and the client.

Run some tests on your data and see how long it takes to iterate over all the columns using different slice sizes. More is not always better.
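The iteration the Thrift footnote (quoted below) describes looks roughly like this. Here `get_slice` is only a stand-in that simulates a slice call against an in-memory sorted column map -- the name, signature, and data are illustrative, not the real Thrift API -- but the paging pattern is the same: pass the last column of one page in as the start of the next, and skip the overlapping column since the start of a slice is inclusive.

```python
import bisect

# A simulated wide row: column name -> value, kept in sorted name order,
# standing in for what the server would slice over.
COLUMNS = {f"col{i:06d}": i for i in range(10_000)}
SORTED_NAMES = sorted(COLUMNS)

def get_slice(start, count):
    """Simulate a slice call: up to `count` (name, value) pairs with name >= start."""
    i = bisect.bisect_left(SORTED_NAMES, start)
    return [(n, COLUMNS[n]) for n in SORTED_NAMES[i:i + count]]

def iterate_row(page_size=1000):
    """Yield every column in the row, one bounded page at a time."""
    start = ""          # empty start = beginning of the row
    seen_first = False
    while True:
        raw = get_slice(start, page_size)
        # The start column is inclusive, so every page after the first
        # begins with the column we already yielded; drop that overlap.
        page = raw[1:] if seen_first else raw
        for name, value in page:
            yield name, value
        if len(raw) < page_size:
            break       # a short page means the row is exhausted
        start = raw[-1][0]   # last column of this page starts the next
        seen_first = True
```

Each call holds at most `page_size` columns in memory on either side, instead of materializing the whole multi-million-column result at once.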

Cheers
 
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/03/2012, at 11:56 AM, Kevin wrote:

> When dealing with large SliceRanges, is it better to read all the results into memory (by setting "count" to the largest value possible), or is it better to divide the query into smaller SliceRange queries? Large in this case being on the order of millions of rows.
>  
> There’s a footnote concerning SliceRanges on the main Apache Cassandra project site that reads:
>  
> “…Thrift will materialize the whole result into memory before returning it to the client, so be aware that you may be better served by iterating through slices by passing the last value of one call in as the start of the next instead of increasing count arbitrarily large.”
>  
> … but it doesn’t delve into the reasons why going about things that way is better.
>  
> Can someone shed some light on this? And would the same logic apply to large KeyRanges?
>  

