incubator-cassandra-user mailing list archives

From Cagatay Kavukcuoglu <cagatay.kavukcuo...@gmail.com>
Subject Re: Recommended way to do parallel reads of a large column slice?
Date Tue, 02 Feb 2010 01:32:38 GMT
A large column slice in my case is tens of thousands of columns, each
a few KB in size and processed independently of the others. My plan
was to read slices of a few hundred to a thousand columns and process
them in a pipeline for reduced overall latency. Regardless of my
specific case, though, I thought one of the best ways to get good
performance scaling in Cassandra was to distribute reads and writes to
multiple nodes. Are there situations where that's not a good idea?
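
As a rough sketch of the pipeline described above (not real Cassandra
client code): fetch_slice() is a hypothetical stand-in for a slice read
bounded by a start column name and a count, backed here by an in-memory
dict so the example runs on its own. The row contents, page size, worker
count and process() body are all assumptions. Each read still has to
wait for the previous one to learn its start name, but processing of a
page overlaps with fetching the next.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one wide row: column name -> value.
ROW = {f"col{i:05d}": f"value{i}" for i in range(2500)}
NAMES = sorted(ROW)


def fetch_slice(start, count):
    """Hypothetical slice read: up to `count` columns with name >= start."""
    i = bisect_left(NAMES, start)
    return [(name, ROW[name]) for name in NAMES[i:i + count]]


def process(page):
    """Placeholder for per-column work; each column is independent."""
    return [value.upper() for _, value in page]


def read_and_process(page_size=1000, workers=4):
    futures = []
    start = ""  # empty start name means "from the first column"
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Ask for one extra column; its name starts the next page.
            page = fetch_slice(start, page_size + 1)
            # Hand the page to the pool, then go fetch the next one.
            futures.append(pool.submit(process, page[:page_size]))
            if len(page) <= page_size:
                break  # no extra column came back, so this was the last page
            start = page[page_size][0]
        results = []
        for future in futures:  # collect in page order
            results.extend(future.result())
    return results
```

The reads themselves stay sequential in this sketch; only the
processing runs in parallel, which is the part that does not need
column names in advance.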

CK.

On Mon, Feb 1, 2010 at 6:00 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> No.  Why do you want to do multiple parallel reads instead of one
> sequential read?
>
> On Mon, Feb 1, 2010 at 4:45 PM, Cagatay Kavukcuoglu
> <cagatay@kavukcuoglu.org> wrote:
>> Hi,
>>
>> What's the recommended way to do parallel reads of a large slice of
>> columns when one doesn't know enough about the column names to divide
>> them for parallel reading in a meaningful way? SliceRange allows
>> setting the start and finish column names, but you wouldn't be able to
>> set the start field of the next read until the previous read
>> completed. An offset field for the SliceRange would have worked, but I
>> don't see it. Is there a way to divide the big read query into
>> multiple *parallel* small read queries without requiring advance
>> knowledge of the column names?
>>
>> CK.
>>
>
