cassandra-user mailing list archives

From Voytek Jarnot <>
Subject Re: Read efficiency question
Date Fri, 30 Dec 2016 19:38:43 GMT
Thank you, Janne.  Yes, these are random-access (scatter) reads, so I've
decided on option 1.  I also considered (as you wrote) that it will
never make sense to query ranges of key3.

On Fri, Dec 30, 2016 at 3:40 AM, Janne Jalkanen <> wrote:

> In practice, the performance you’re getting is likely to be affected by
> your read patterns.  If you do a lot of sequential reads where key1 and
> key2 stay the same and only key3 varies, then you may get better
> performance out of the second option due to hitting the row and disk caches
> more often.  If you are doing a lot of scatter reads, then you’re likely to
> get better performance out of the first option, because the reads will be
> distributed more evenly across multiple nodes.  It also depends on how large
> the rows you’re planning to use are, as this directly affects things like
> compaction, which in turn affects the speed of the entire cluster.  For
> just a few values of key3, I doubt there would be much difference in
> performance, but if key3 has a cardinality of, say, a million, you might be
> better off with option 1.
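
The distribution argument above can be illustrated with a rough sketch.
It is only a toy model, not how Cassandra actually places data: the node
count is hypothetical, MD5 stands in for the real partitioner, and
replication and virtual nodes are ignored.  It simply counts how many
nodes end up holding rows when key3 is, or is not, part of the partition
key.

```python
import hashlib
from collections import Counter

NODES = 6  # hypothetical cluster size, chosen only for illustration


def node_for(partition_key: tuple) -> int:
    """Map a partition key to a node index with a stand-in hash (MD5, not Murmur3)."""
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


def spread(option: int, key3_values: range) -> Counter:
    """Count how many rows sharing key1/key2 land on each node."""
    counts = Counter()
    for key3 in key3_values:
        if option == 1:
            # primary key ((key1, key2, key3)): key3 is part of the partition key
            pk = ("k1", "k2", key3)
        else:
            # primary key ((key1, key2), key3): key3 is only a clustering key
            pk = ("k1", "k2")
        counts[node_for(pk)] += 1
    return counts


option1 = spread(1, range(10_000))
option2 = spread(2, range(10_000))
print("option 1 nodes used:", len(option1))
print("option 2 nodes used:", len(option2))
```

In this model, option 2 always puts every key3 value for a given
(key1, key2) on a single node, while option 1 scatters them, which is
the effect described for scatter reads.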
> As always, the advice is: benchmark your intended use case.  Load a few
> hundred gigs of mock data into a cluster, trigger compactions, and run perf
> tests for different kinds of read/write loads. :-)
> (Though if I didn’t know what my read pattern would be, I’d probably go
> for option 1 purely on a gut feeling if I was sure I would never need range
> queries on key3; shorter rows *usually* are a bit better for performance,
> compaction, etc.  Really wide rows can sometimes be a headache
> operationally.)
> May you have energy and success!
> /Janne
> On 28 Dec 2016, at 16:44, Manoj Khangaonkar <> wrote:
> In the first case, the partitioning is based on key1, key2, key3.
> In the second case, partitioning is based on key1, key2. Additionally you
> have a clustering key, key3. This means that within a partition you can do
> range queries on key3 efficiently. That is the difference.
> regards
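
A minimal sketch of why the second definition allows range queries,
assuming a simplified in-memory model (the names `partitions`, `insert`
and `range_query` are made up for illustration and are not Cassandra
APIs): each partition keeps its key3 values sorted, so a range on key3 is
a contiguous slice of one partition.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Toy model of option 2: partition key (key1, key2) -> sorted key3 values
partitions = defaultdict(list)


def insert(key1, key2, key3):
    """Add key3 to its partition, keeping the clustering order sorted."""
    keys = partitions[(key1, key2)]
    i = bisect_left(keys, key3)
    if i == len(keys) or keys[i] != key3:  # skip duplicates
        keys.insert(i, key3)


def range_query(key1, key2, low, high):
    """Return key3 values with low <= key3 <= high from a single partition."""
    keys = partitions.get((key1, key2), [])
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


for k3 in (50, 10, 40, 20, 30):
    insert("a", "b", k3)

print(range_query("a", "b", 15, 40))  # prints [20, 30, 40]
```

With the first definition there is no such ordered structure to slice:
each (key1, key2, key3) combination is its own partition, so in this
model a range over key3 would mean probing every candidate key
individually.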
> On Tue, Dec 27, 2016 at 7:42 AM, Voytek Jarnot <>
> wrote:
>> Wondering if there's a difference when querying by primary key between
>> the two definitions below:
>> primary key ((key1, key2, key3))
>> primary key ((key1, key2), key3)
>> In terms of read speed/efficiency... I don't have much of a reason
>> otherwise to prefer one setup over the other, so would prefer the most
>> efficient for querying.
>> Thanks.
