cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Recommended way to do parallel reads of a large column slice?
Date Tue, 02 Feb 2010 02:31:30 GMT
If you want to parallelize (a good idea in general) you are best
served by doing so across rows rather than across columns.

(Another possibility if you have a relatively static breakdown of
columns that makes sense is to spread them across different CFs w/ the
same key.)
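A minimal sketch of reading across rows in parallel, with a hypothetical `fetch_row` standing in for the client's per-row read (a real client would issue one RPC per row key; the in-memory `ROWS` dict here is only a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the cluster; in practice each
# fetch_row call would be a get_slice (or equivalent) RPC, and
# independent rows may live on different nodes.
ROWS = {
    "user:%d" % i: {"col%03d" % c: "v" for c in range(100)}
    for i in range(8)
}

def fetch_row(key):
    # One read per row; rows are independent, so these calls can
    # safely run concurrently and spread load across replicas.
    return key, ROWS[key]

def parallel_read(keys, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch_row, keys))

results = parallel_read(list(ROWS))
```

The win comes from the reads landing on different nodes; parallelizing column reads within one row hits the same replica set every time.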


On Mon, Feb 1, 2010 at 7:32 PM, Cagatay Kavukcuoglu
<> wrote:
> A large column slice in my case is tens of thousands of columns, each
> a few KB in size and processed independently of the others. My plan
> was to read slices of a few hundred to a thousand columns and process
> them in a pipeline for reduced overall latency. Regardless of my
> specific case, though, I thought one of the best ways to get good
> performance scaling in Cassandra was to distribute reads and writes to
> multiple nodes. Are there situations where that's not a good idea?
> CK.
> On Mon, Feb 1, 2010 at 6:00 PM, Jonathan Ellis <> wrote:
>> No.  Why do you want to do multiple parallel reads instead of one
>> sequential read?
>> On Mon, Feb 1, 2010 at 4:45 PM, Cagatay Kavukcuoglu
>> <> wrote:
>>> Hi,
>>> What's the recommended way to do parallel reads of a large slice of
>>> columns when one doesn't know enough about the column names to divide
>>> them for parallel reading in a meaningful way? SliceRange allows
>>> setting the start and finish column names, but you wouldn't be able to
>>> set the start field of the next read until the previous read
>>> completed. An offset field for the SliceRange would have worked, but I
>>> don't see it. Is there a way to divide the big read query into
>>> multiple *parallel* small read queries without requiring advance
>>> knowledge of the column names?
>>> CK.
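For the sequential case the thread leaves implicit, the standard workaround for the missing offset is to seed each page's start with the last column name of the previous page. A minimal sketch, assuming a hypothetical `get_slice(start, count)` that returns up to `count` columns with names >= `start` in sorted order (the in-memory `COLUMNS` dict is only a stand-in):

```python
# Hypothetical stand-in for a row's columns; a real get_slice would be
# an RPC against one row with a SliceRange(start, "", count).
COLUMNS = {"c%05d" % i: i for i in range(2500)}

def get_slice(start, count):
    names = sorted(n for n in COLUMNS if n >= start)[:count]
    return [(n, COLUMNS[n]) for n in names]

def page_all(page_size=1000):
    out, start = [], ""
    while True:
        # After the first page, ask for one extra column because the
        # start column itself is returned again.
        page = get_slice(start, page_size + (1 if out else 0))
        if out:
            page = page[1:]  # drop the repeated start column
        if not page:
            break
        out.extend(page)
        start = page[-1][0]  # last column name seeds the next page
    return out
```

This is inherently sequential, since each page's start column is unknown until the previous page returns, which is exactly why parallelizing across rows rather than columns is the better lever.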
