cassandra-user mailing list archives

From Oskar Kjellin <oskar.kjel...@gmail.com>
Subject Re: Read efficiency question
Date Tue, 27 Dec 2016 16:17:10 GMT
Yes, sorry, I missed the double parentheses in the first case.

I may be a bit off here, but I don't think the coordinator pinpoints the row, just the
node it needs to go to.
It's more a case of creating smaller partitions, which spreads the load more evenly across
the cluster and means the node doesn't have to read a lot of data into memory only to GC
it later on.

Think of Cassandra as a hash map (which it kind of is): you want the key to be as unique
as possible, so that you don't have to go to a bucket and filter there, or create hot spots.
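
To make the hash map analogy concrete, here is a rough Python sketch. It is not
Cassandra code: the node names, the ToyTable class and the md5-based owner() function
are all made up for illustration, and real clusters use a partitioner over token ranges
rather than a modulo. It only shows how the two key definitions from this thread split
the same 1000 rows into partitions.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def owner(partition_key):
    """Hash the partition key to pick a node (toy stand-in for the partitioner)."""
    raw = "\x00".join(partition_key).encode()
    token = int.from_bytes(hashlib.md5(raw).digest()[:8], "big")
    return NODES[token % len(NODES)]


class ToyTable:
    """Outer map: partition key -> partition. Inner map: clustering key -> value."""

    def __init__(self, partition_cols):
        # Number of leading key columns that form the partition key (assumption of this sketch).
        self.partition_cols = partition_cols
        self.partitions = defaultdict(dict)

    def _split(self, key):
        return key[:self.partition_cols], key[self.partition_cols:]

    def insert(self, key, value):
        pk, ck = self._split(key)
        self.partitions[pk][ck] = value

    def read(self, key):
        pk, ck = self._split(key)
        node = owner(pk)            # coordinator step: partition key -> node
        partition = self.partitions[pk]
        return node, partition[ck]  # replica step: find the row inside the partition


rows = [("k1", "k2", str(i)) for i in range(1000)]
one_row_per_partition = ToyTable(partition_cols=3)  # primary key ((key1, key2, key3))
wide_partition = ToyTable(partition_cols=2)         # primary key ((key1, key2), key3)
for key in rows:
    one_row_per_partition.insert(key, "v" + key[2])
    wide_partition.insert(key, "v" + key[2])

print(len(one_row_per_partition.partitions))  # 1000
print(len(wide_partition.partitions))         # 1
```

Both layouts answer a lookup by the full key. The difference is that with (key1, key2) as
the partition key all 1000 rows land in one bucket on one node, while with all three
columns in the partition key they are spread over 1000 small buckets.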

> On 27 Dec 2016, at 17:12, Voytek Jarnot <voytek.jarnot@gmail.com> wrote:
> 
> Thank you Oskar.  I think you may be missing the double parentheses in the first example
> - difference is between partition key of (key1, key2, key3) and (key1, key2).  With that in
> mind, I believe your answer would be that the first example is more efficient?
> 
> Is this essentially a case of the coordinator node being able to exactly pinpoint a row
> (first example) vs the coordinator node pinpointing the partition and letting the
> partition-owning node refine down to the right row using the clustering key (key3 in the
> second example)?
> 
>> On Tue, Dec 27, 2016 at 10:06 AM, Oskar Kjellin <oskar.kjellin@gmail.com> wrote:
>> The second one will be the most efficient.
>> How much depends on how unique key1 is.
>> 
>> In the first case everything for the same key1 will be on the same partition.  If
>> it's not unique at all that will be very bad.
>> In the second case the combo of key1 and key2 will decide what partition.
>> 
>> If you don't ever have to find all key2 for a given key1 I don't see any reason to
>> do case 1
>> 
>> 
>> > On 27 Dec 2016, at 16:42, Voytek Jarnot <voytek.jarnot@gmail.com> wrote:
>> >
>> > Wondering if there's a difference when querying by primary key between the two
>> > definitions below:
>> >
>> > primary key ((key1, key2, key3))
>> > primary key ((key1, key2), key3)
>> >
>> > In terms of read speed/efficiency... I don't have much of a reason otherwise
>> > to prefer one setup over the other, so would prefer the most efficient for querying.
>> >
>> > Thanks.
> 
