cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Cluster key distribution wrong after upgrading to 0.8.4
Date Sun, 21 Aug 2011 22:01:37 GMT
I'm not sure what the fix is. 

When using an order-preserving partitioner, it's up to you to ensure the ring is correctly
balanced. 

Say you have the following setup…

node : token
1 : a
2 : h
3 : p

If keys are always 1 character, we can say each node owns roughly 33% of the ring, because
we know there are only 26 possible keys. 
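As an illustration (not Cassandra code), here is a rough Python sketch of how ownership can be computed exactly when the key space is known, using the node:token layout above; the node names and the helper are hypothetical:

```python
import string

def ownership(tokens):
    """Fraction of the 26 single-letter keys each node owns.
    tokens: dict mapping node name -> single-character token."""
    alphabet = string.ascii_lowercase
    # Sort nodes by token; each node owns the keys in (previous token, own token],
    # wrapping around the ring.
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    result = {}
    for i, (node, token) in enumerate(ordered):
        prev = ordered[i - 1][1]  # for i == 0 this wraps to the last token
        if prev < token:
            owned = [k for k in alphabet if prev < k <= token]
        else:  # range wraps past 'z' back around to 'a'
            owned = [k for k in alphabet if k > prev or k <= token]
        result[node] = len(owned) / len(alphabet)
    return result

for node, frac in ownership({"1": "a", "2": "h", "3": "p"}).items():
    print(node, f"~{frac:.0%}")
```

With tokens a, h, p the split is not exactly even (node 1 picks up the wrap-around range p..a), but the point is that the fractions are computable at all, because the key space is finite and known.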

With the RandomPartitioner we know the size of the token space: the output of the md5 calculation is a 128 bit
integer, so we can say what fraction of the total each range covers. 
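A simplified sketch of that idea (not the actual RandomPartitioner implementation, which takes the absolute value of the md5 digest as a BigInteger; the modulo here is an assumption to keep the example short):

```python
import hashlib

RING = 2 ** 127  # simplified stand-in for the RandomPartitioner token space

def rp_token(key: bytes) -> int:
    # Map a key to an integer token via md5. Because every token lands in a
    # fixed, known integer space, ownership fractions are well defined.
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING

def range_fraction(left: int, right: int) -> float:
    # Fraction of the ring covered by the interval (left, right],
    # handling wrap-around past the top of the ring.
    return ((right - left) % RING) / RING

print(range_fraction(rp_token(b"a"), rp_token(b"h")))
```

The key contrast with the order-preserving case: here the denominator (the total token space) is fixed by the partitioner, not by whatever keys the application happens to write.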

If, in the example above, keys can be of any length, how many values exist between a and h? 
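Infinitely many, since keys can be arbitrarily long. A small check of that claim, using Python string comparison as a stand-in for the partitioner's key ordering:

```python
# All of these keys fall strictly between tokens "a" and "h", and the last
# one can be made as long as you like, so the interval (a, h] holds
# unboundedly many keys. That is why the order-preserving partitioner cannot
# compute a meaningful ownership fraction from the tokens alone.
samples = ["aa", "ab", "abc", "azzz", "b", "g" + "z" * 50]
assert all("a" < s <= "h" for s in samples)
print("all", len(samples), "sample keys lie in (a, h]")
```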

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 3:33 AM, Thibaut Britz wrote:

> Hi,
> 
> I will wait until this is fixed before I upgrade, just to be sure.
> 
> Shall I open a new ticket for this issue?
> 
> Thanks,
> Thibaut
> 
> On Sun, Aug 21, 2011 at 11:57 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> This looks like an artifact of the way ownership is calculated for the OOP.
>> See https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java#L177
>> It was changed in this ticket:
>> https://issues.apache.org/jira/browse/CASSANDRA-2800
>> The change applied in CASSANDRA-2800 was not applied to the
>> AbstractByteOrderPartitioner. Looks like it should have been. I'll chase
>> that up.
>> 
>> When each node calculates the ownership of the token ranges (for OOP and
>> BOP), it's based on the number of keys the node has in that range, as there
>> is no way for the OOP to know the range of values the keys may take.
>> If you look at the 192 node, it shows the most ownership on 192, 191 and
>> 190 - so I'm assuming RF 3, and that 192 also has data from the ranges owned
>> by 191 and 190.
>> IMHO you can ignore this.
>> You can use the load and the number-of-keys estimate from cfstats to get an
>> idea of what's happening.
>> Hope that helps.
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 19/08/2011, at 9:42 PM, Thibaut Britz wrote:
>> 
>> Hi,
>> 
>> we were using apache-cassandra-2011-06-28_08-04-46.jar so far in
>> production and wanted to upgrade to 0.8.4.
>> 
>> Our cluster was well balanced and we only saved keys with a lower-case
>> md5 prefix (order-preserving partitioner).
>> Each node owned 20% of the tokens, which was also displayed on each
>> node by nodetool -h localhost ring.
>> 
>> After upgrading, our well-balanced cluster shows completely wrong
>> percentages for who owns which keys:
>> 
>> *.*.*.190:
>> Address     DC          Rack   Status State   Load      Owns    Token
>>                                                                 ffffffffffffffff
>> *.*.*.190   datacenter1 rack1  Up     Normal  87.95 GB  34.57%  2a
>> *.*.*.191   datacenter1 rack1  Up     Normal  84.3 GB   0.02%   55
>> *.*.*.192   datacenter1 rack1  Up     Normal  79.46 GB  0.02%   80
>> *.*.*.194   datacenter1 rack1  Up     Normal  68.16 GB  0.02%   aa
>> *.*.*.196   datacenter1 rack1  Up     Normal  79.9 GB   65.36%  ffffffffffffffff
>> 
>> *.*.*.191:
>> Address     DC          Rack   Status State   Load      Owns    Token
>>                                                                 ffffffffffffffff
>> *.*.*.190   datacenter1 rack1  Up     Normal  87.95 GB  36.46%  2a
>> *.*.*.191   datacenter1 rack1  Up     Normal  84.3 GB   26.02%  55
>> *.*.*.192   datacenter1 rack1  Up     Normal  79.46 GB  0.02%   80
>> *.*.*.194   datacenter1 rack1  Up     Normal  68.16 GB  0.02%   aa
>> *.*.*.196   datacenter1 rack1  Up     Normal  79.9 GB   37.48%  ffffffffffffffff
>> 
>> *.*.*.192:
>> Address     DC          Rack   Status State   Load      Owns    Token
>>                                                                 ffffffffffffffff
>> *.*.*.190   datacenter1 rack1  Up     Normal  87.95 GB  38.16%  2a
>> *.*.*.191   datacenter1 rack1  Up     Normal  84.3 GB   27.61%  55
>> *.*.*.192   datacenter1 rack1  Up     Normal  79.46 GB  34.17%  80
>> *.*.*.194   datacenter1 rack1  Up     Normal  68.16 GB  0.02%   aa
>> *.*.*.196   datacenter1 rack1  Up     Normal  79.9 GB   0.02%   ffffffffffffffff
>> 
>> *.*.*.194:
>> Address     DC          Rack   Status State   Load      Owns    Token
>>                                                                 ffffffffffffffff
>> *.*.*.190   datacenter1 rack1  Up     Normal  87.95 GB  0.03%   2a
>> *.*.*.191   datacenter1 rack1  Up     Normal  84.3 GB   31.43%  55
>> *.*.*.192   datacenter1 rack1  Up     Normal  79.46 GB  39.69%  80
>> *.*.*.194   datacenter1 rack1  Up     Normal  68.16 GB  28.82%  aa
>> *.*.*.196   datacenter1 rack1  Up     Normal  79.9 GB   0.03%   ffffffffffffffff
>> 
>> *.*.*.196:
>> Address     DC          Rack   Status State   Load      Owns    Token
>>                                                                 ffffffffffffffff
>> *.*.*.190   datacenter1 rack1  Up     Normal  87.95 GB  0.02%   2a
>> *.*.*.191   datacenter1 rack1  Up     Normal  84.3 GB   0.02%   55
>> *.*.*.192   datacenter1 rack1  Up     Normal  79.46 GB  0.02%   80
>> *.*.*.194   datacenter1 rack1  Up     Normal  68.16 GB  27.52%  aa
>> *.*.*.196   datacenter1 rack1  Up     Normal  79.9 GB   72.42%  ffffffffffffffff
>> 
>> 
>> Interestingly, each server shows something completely different.
>> 
>> Removing the locationInfo files didn't help.
>> -Dcassandra.load_ring_state=false didn't help either.
>> 
>> Our cassandra.yaml is at http://pastebin.com/pCVCt3RM
>> 
>> Any idea on what might cause this? Is it safe to assume that
>> operating under this distribution will cause severe data loss? Or can
>> I safely ignore this?
>> 
>> Thanks,
>> Thibaut
>> 
>> 

