It is better to read a sensible amount per request than to pull everything at once. Moving a few MB
per call is OK (see thrift_framed_transport_size_in_mb in cassandra.yaml).
Long-running queries can reduce overall query throughput. They also churn memory
on both the server and the client.
Run some tests on your data and see how long it takes to iterate over all the columns using different
slice sizes. More is not always better.
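
As a rough sketch of that kind of test, something like the Python below pages through a row and times a few slice sizes. It does not talk to Cassandra: fetch_slice and the COLUMNS data are hypothetical stand-ins for a real get_slice call on your client, so swap in your own call before reading anything into the timings. It assumes the start column of a slice is inclusive, which is why every page after the first skips its first column.

```python
import bisect
import time

# Hypothetical stand-in for one wide row: sorted (column name, value) pairs.
COLUMNS = [("col%07d" % i, i) for i in range(100_000)]
NAMES = [name for name, _ in COLUMNS]


def fetch_slice(start, count):
    """Hypothetical stand-in for a get_slice call.

    Returns up to `count` columns whose name is >= `start` (inclusive start).
    """
    lo = bisect.bisect_left(NAMES, start)
    return COLUMNS[lo:lo + count]


def iterate_columns(slice_size):
    """Yield every column, fetching at most `slice_size` columns per call."""
    if slice_size < 2:
        # With an inclusive start, a page of one column never advances.
        raise ValueError("slice_size must be at least 2")
    start = ""
    first_page = True
    while True:
        page = fetch_slice(start, slice_size)
        # Pages after the first begin with the last column already seen.
        fresh = page if first_page else page[1:]
        for column in fresh:
            yield column
        if len(page) < slice_size:
            return  # a short page means the row is exhausted
        start = page[-1][0]  # last column name becomes the next start
        first_page = False


def time_iteration(slice_size):
    """Return (columns seen, elapsed seconds) for one full pass."""
    began = time.perf_counter()
    total = sum(1 for _ in iterate_columns(slice_size))
    return total, time.perf_counter() - began


if __name__ == "__main__":
    for size in (100, 1_000, 10_000):
        total, elapsed = time_iteration(size)
        print("slice_size=%d columns=%d seconds=%.4f" % (size, total, elapsed))
```

Against a real cluster the interesting numbers are the elapsed time per slice size and the memory use on both sides, not the in-memory timings this stand-in produces.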
Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 8/03/2012, at 11:56 AM, Kevin wrote:
> When dealing with large SliceRanges, is it better to read all the results into memory (by
> setting “count” to the largest value possible), or is it better to divide the query into
> smaller SliceRange queries? Large in this case being on the order of millions of rows.
>
> There’s a footnote concerning SliceRanges on the main Apache Cassandra project site
> that reads:
>
> “…Thrift will materialize the whole result into memory before returning it to the
> client, so be aware that you may be better served by iterating through slices by passing the
> last value of one call in as the start of the next instead of increasing count arbitrarily
> large.”
>
> … but it doesn’t delve into the reasons why going about things that way is better.
>
> Can someone shed some light on this? And would the same logic apply to large KeyRanges?
>