incubator-cassandra-user mailing list archives

From: Oleg Dulin <oleg.du...@gmail.com>
Subject: Re: Unbalanced ring mystery multi-DC issue with 1.1.11
Date: Fri, 27 Sep 2013 10:35:34 GMT
Wanted to add one more thing:

I can also tell that the numbers are not consistent across DCs this way: I have a column family with really wide rows (a couple of million columns per row).

DC1 reports higher column counts than DC2. DC2 only becomes consistent after I run the count a couple of times and trigger a read repair. But why would the nodetool repair logs show that everything is in sync?
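
For reference, this is roughly the kind of check I mean -- counting the columns in one of the wide rows from cassandra-cli, once against a DC1 node and once against a DC2 node (the column family name and row key below are just placeholders):

    $ cassandra-cli -h <dc1-node> -k MyKeySpace
    [default@MyKeySpace] count WideRowCF['some-row-key'];

    $ cassandra-cli -h <dc2-node> -k MyKeySpace
    [default@MyKeySpace] count WideRowCF['some-row-key'];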

Regards,
Oleg

On 2013-09-27 10:23:45 +0000, Oleg Dulin said:

> Consider this output from nodetool ring:
> 
> Address    DC   Rack   Status  State    Load       Effective-Ownership  Token
>                                                                          127605887595351923798765477786913079396
> dc1.5      DC1  RAC1   Up      Normal   32.07 GB   50.00%               0
> dc2.100    DC2  RAC1   Up      Normal   8.21 GB    50.00%               100
> dc1.6      DC1  RAC1   Up      Normal   32.82 GB   50.00%               42535295865117307932921825928971026432
> dc2.101    DC2  RAC1   Up      Normal   12.41 GB   50.00%               42535295865117307932921825928971026532
> dc1.7      DC1  RAC1   Up      Normal   28.37 GB   50.00%               85070591730234615865843651857942052864
> dc2.102    DC2  RAC1   Up      Normal   12.27 GB   50.00%               85070591730234615865843651857942052964
> dc1.8      DC1  RAC1   Up      Normal   27.34 GB   50.00%               127605887595351923798765477786913079296
> dc2.103    DC2  RAC1   Up      Normal   13.46 GB   50.00%               127605887595351923798765477786913079396
> 
> I concealed IPs and DC names for confidentiality.
> 
> All of the data loading was happening against DC1 at a pretty brisk rate of, say, 200K writes per minute.
> 
> Note how my DC2 tokens are offset by 100 from the DC1 tokens. Shouldn't that mean the load on each node should be roughly identical? In DC1 it is roughly 30 GB on each node. In DC2 it is almost a third of the nearest DC1 node by token range.
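> 
> (To spell out the arithmetic: dc2.101's token is 42535295865117307932921825928971026532, which is dc1.6's token plus 100, so within DC2 it covers a range of essentially the same size as dc1.6 covers within DC1, give or take those 100 tokens. The same holds for every DC1/DC2 pair, which is why I would expect each DC2 node to end up with roughly the same amount of data as its DC1 counterpart.)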
> 
> To verify that the nodes are in sync, I ran nodetool -h localhost repair MyKeySpace --partitioner-range on each node in DC2. Watching the logs, I see that the repair finished really quickly and reported all column families in sync!
> 
> I need help making sense of this. Is this because DC1 is not fully compacted? Is it because DC2 is not fully synced and I am not checking correctly? How can I tell whether replication is still in progress? (Note: I started my load yesterday at 9:50am.)
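
(For what it's worth, the only way I can think of to watch replication in flight is something like the commands below, and I am not sure they actually capture cross-DC replication lag.)

    nodetool -h <dc1-node> tpstats     # look for pending HintedHandoff tasks
    nodetool -h <dc2-node> netstats    # look for active or pending streams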


-- 
Regards,
Oleg Dulin
http://www.olegdulin.com


