cassandra-commits mailing list archives

From "Ryan McGuire (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5178) Sometimes repair process doesn't work properly
Date Thu, 18 Apr 2013 17:31:54 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635389#comment-13635389 ]

Ryan McGuire commented on CASSANDRA-5178:
-----------------------------------------

I'm trying to simplify this process a bit from what you've described; so far I have not been
able to reproduce this behaviour on 1.1.7. Here's my process so far:

Bring up a 4-node cluster with two datacenters:
{code}
Address         DC          Rack        Status State   Load            Owns                Token
                                                                                            85070591730234615865843651857942052964
192.168.1.141   dc1         r1          Up     Normal  11.13 KB        50.00%              0
192.168.1.145   dc2         r1          Up     Normal  11.1 KB         0.00%               100
192.168.1.143   dc1         r1          Up     Normal  11.11 KB        50.00%              85070591730234615865843651857942052864
192.168.1.133   dc2         r1          Up     Normal  11.1 KB         0.00%               85070591730234615865843651857942052964
{code}
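For reference, the layout is two nodes each in dc1 and dc2, with the dc2 tokens equal to the dc1 tokens plus 100, mirroring the offset scheme in your report. A hedged sketch of a matching PropertyFileSnitch topology (the snitch choice and file path are assumptions; the IP/DC/rack mapping comes from the ring above):
{code}
# Hypothetical cassandra-topology.properties -- a sketch only; the snitch and the
# /etc/cassandra path are assumptions, not necessarily what this cluster uses.
cat > /etc/cassandra/cassandra-topology.properties <<'EOF'
192.168.1.141=dc1:r1
192.168.1.143=dc1:r1
192.168.1.145=dc2:r1
192.168.1.133=dc2:r1
default=dc1:r1
EOF
{code}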

Manually shut down dc2.
{code}
Address         DC          Rack        Status State   Load            Owns                Token
                                                                                            85070591730234615865843651857942052964
192.168.1.141   dc1         r1          Up     Normal  11.13 KB        50.00%              0
192.168.1.145   dc2         r1          Down   Normal  15.53 KB        0.00%               100
192.168.1.143   dc1         r1          Up     Normal  15.88 KB        50.00%              85070591730234615865843651857942052864
192.168.1.133   dc2         r1          Down   Normal  15.53 KB        0.00%               85070591730234615865843651857942052964
{code}
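(How dc2 is taken down doesn't matter much; one hedged way, assuming a package install where Cassandra runs as a service:)
{code}
# On each dc2 node (192.168.1.145 and 192.168.1.133); killing the JVM works too.
sudo service cassandra stop
{code}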

Create schema:
{code}
CREATE KEYSPACE ryan WITH strategy_class = 'NetworkTopologyStrategy'
  AND strategy_options:dc1 = '2';
CREATE TABLE ryan.test (n int primary key, x int);
{code}
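With strategy_options:dc1 = '2' and no dc2 entry, the keyspace keeps two replicas of every row in dc1 and none in dc2 yet. To double-check the definition from the shell (host choice arbitrary; piping into cqlsh):
{code}
# Show the keyspace definition, including its replication settings.
# (older cqlsh builds may need their CQL 3 switch; omitted here)
echo "DESCRIBE KEYSPACE ryan;" | cqlsh 192.168.1.141
{code}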

Create data to import:
{code}
seq 500000 | sed 's/$/,1/' | split -l 250000 - data_
{code}
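The pipeline above writes the rows "1,1" through "500000,1" and splits them into two 250000-line CSV files, data_aa and data_ab. A quick sanity check:
{code}
wc -l data_aa data_ab    # expect 250000 lines in each file
head -2 data_aa          # expect "1,1" and "2,1"
{code}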

Write the first data set to dc1:
{code}
COPY ryan.test FROM 'data_aa';
{code}

Verify dc1 has all the data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
 count
--------
 250000
{code}
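For these per-datacenter checks I point cqlsh at a node in the datacenter being verified (192.168.1.141/.143 for dc1, 192.168.1.145/.133 for dc2); at the default read consistency the coordinator typically answers from its own DC's replica, and until the ALTER below dc2 simply has no replicas at all. A sketch of the dc1 check, piping the statement into cqlsh:
{code}
# Hedged sketch: run the count through a specific coordinator so the check
# reflects that node's datacenter; hosts are taken from the ring output above.
# (older cqlsh builds may need their CQL 3 switch; omitted here)
echo "SELECT count(*) FROM ryan.test LIMIT 99999999;" | cqlsh 192.168.1.141
{code}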

Bring up dc2, then add it to the replication strategy:
{code}
ALTER KEYSPACE ryan WITH strategy_class = 'NetworkTopologyStrategy'
  AND strategy_options:dc1 = '2' AND strategy_options:dc2 = '2';
{code}
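Side note: the ALTER only changes where new writes go; rows already in dc1 are not streamed to dc2 automatically, which is why repair (the subject of this ticket) is needed. For completeness, 1.1 also offers nodetool rebuild for populating a freshly added datacenter; a hedged sketch, run on the new nodes with the source DC as the argument, though here we deliberately rely on repair:
{code}
# Alternative to repair when bootstrapping a brand-new datacenter:
# stream the pre-existing data from dc1 onto each dc2 node.
nodetool -h 192.168.1.145 rebuild dc1
nodetool -h 192.168.1.133 rebuild dc1
{code}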

Verify dc2 has no data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
 count
--------
 0
{code}

Verify dc1 has all the data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
 count
--------
 250000
{code}

Write the second data set to dc1 with local_quorum consistency:
{code}
COPY ryan.test FROM 'data_ab';
{code}

Ring status after loading the second data set:
{code}
Address         DC          Rack        Status State   Load            Effective-Ownership Token
                                                                                            85070591730234615865843651857942052964
192.168.1.141   dc1         r1          Up     Normal  12.39 MB        100.00%             0
192.168.1.145   dc2         r1          Up     Normal  6.33 MB         100.00%             100
192.168.1.143   dc1         r1          Up     Normal  12.72 MB        100.00%             85070591730234615865843651857942052864
192.168.1.133   dc2         r1          Up     Normal  6.33 MB         100.00%             85070591730234615865843651857942052964
{code}


Verify dc1 has all the data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
 count
--------
 500000
{code}

Verify dc2 has only half the data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
 count
--------
 250000
{code}

Run repair from dc1:
{code}
nodetool repair
{code}
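nodetool repair only covers the ranges replicated by the node it runs on. With RF 2 per two-node DC, every node here effectively owns 100% (see the Effective-Ownership column above), so a single repair from a dc1 node reaches the whole ring; on a larger cluster like the 6+6 setup in the report it has to be run per node. A hedged sketch of that loop, using this cluster's hosts:
{code}
# Repair each node's primary range (-pr) so every range in the ring is repaired
# exactly once; needed when no single node replicates all of the data.
for h in 192.168.1.141 192.168.1.143 192.168.1.145 192.168.1.133; do
  nodetool -h "$h" repair -pr
done
{code}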
Ring status after repair:
{code}
Address         DC          Rack        Status State   Load            Effective-Ownership Token
                                                                                            85070591730234615865843651857942052964
192.168.1.141   dc1         r1          Up     Normal  27.12 MB        100.00%             0
192.168.1.145   dc2         r1          Up     Normal  22.78 MB        100.00%             100
192.168.1.143   dc1         r1          Up     Normal  12.72 MB        100.00%             85070591730234615865843651857942052864
192.168.1.133   dc2         r1          Up     Normal  16.44 MB        100.00%             85070591730234615865843651857942052964
{code}

Verify that dc2 has all the data:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
 count
--------
 500000
{code}

I'll try adding more nodes and settings to more closely approximate your setup.
                
> Sometimes repair process doesn't work properly
> ----------------------------------------------
>
>                 Key: CASSANDRA-5178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5178
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.7
>            Reporter: Vladimir Barinov
>            Assignee: Ryan McGuire
>            Priority: Minor
>
> Pre-conditions:
> 1. We have two separate datacenters, called "DC1" and "DC2" respectively. Each of them consists of 6 nodes.
> 2. DC2 is disabled.
> 3. Tokens for DC1 are calculated via https://raw.github.com/riptano/ComboAMI/2.2/tokentoolv2.py. Tokens for DC2 are the same as for DC1 but with an offset of +100, so for token 0 in DC1 we'll have token 100 in DC2, and so on.
> 4. We have a test data set (1 million keys).
> *Steps to reproduce:*
> *STEP 1:*
> Let's check the current configuration.
> nodetool ring:
> {quote}
> {noformat}
>     <ip>     DC1         RAC1        Up     Normal  44,53 KB        33,33%             0
>     <ip>     DC1         RAC1        Up     Normal  51,8 KB         33,33%             28356863910078205288614550619314017621
>     <ip>     DC1         RAC1        Up     Normal  21,82 KB        33,33%             56713727820156410577229101238628035242
>     <ip>     DC1         RAC1        Up     Normal  21,82 KB        33,33%             85070591730234615865843651857942052864
>     <ip>     DC1         RAC1        Up     Normal  51,8 KB         33,33%             113427455640312821154458202477256070485
>     <ip>     DC1         RAC1        Up     Normal  21,82 KB        33,33%             141784319550391026443072753096570088106
> {noformat}   
> {quote}
> *Current schema:*
> {quote}
> {noformat}
>     create keyspace benchmarks
>       with placement_strategy = 'NetworkTopologyStrategy'
>       *and strategy_options = \{DC1 : 2};*
>     use benchmarks;
>     create column family test_family
>       with compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>       ... 
>       and compaction_strategy_options = \{'sstable_size_in_mb' : '20'}
>       and compression_options = \{'chunk_length_kb' : '32', 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
> {noformat}
> {quote}
> *STEP 2:*
> Write the first part of the test data set (500 000 keys) to DC1 with LOCAL_QUORUM consistency level.
> *STEP 3:*
> Update cassandra.yaml and cassandra-topology.properties with the new IPs from DC2, and update the current keyspace schema with *strategy_options = \{DC1 : 2, DC2 : 0};*
> *STEP 4:*
> Start all nodes in DC2.
> Check that the nodes have started successfully:
> {quote}
> {noformat}
>     <ip>     DC1         RAC1        Up     Normal  11,4 MB         33,33%             0
>     <ip>     DC2         RAC2        Up     Normal  27,7 KB         0,00%              100
>     <ip>     DC1         RAC1        Up     Normal  11,34 MB        33,33%             28356863910078205288614550619314017621
>     <ip>     DC2         RAC2        Up     Normal  42,69 KB        0,00%              28356863910078205288614550619314017721
>     <ip>     DC1         RAC1        Up     Normal  11,37 MB        33,33%             56713727820156410577229101238628035242
>     <ip>     DC2         RAC2        Up     Normal  52,02 KB        0,00%              56713727820156410577229101238628035342
>     <ip>     DC1         RAC1        Up     Normal  11,4 MB         33,33%             85070591730234615865843651857942052864
>     <ip>     DC2         RAC2        Up     Normal  42,69 KB        0,00%              85070591730234615865843651857942052964
>     <ip>     DC1         RAC1        Up     Normal  11,43 MB        33,33%             113427455640312821154458202477256070485
>     <ip>     DC2         RAC2        Up     Normal  42,69 KB        0,00%              113427455640312821154458202477256070585
>     <ip>     DC1         RAC1        Up     Normal  11,39 MB        33,33%             141784319550391026443072753096570088106
>     <ip>     DC2         RAC2        Up     Normal  42,69 KB        0,00%              141784319550391026443072753096570088206
> {noformat}
> {quote}
> *STEP 5:*
> Update the keyspace schema with *strategy_options = \{DC1 : 2, DC2 : 2};*
> *STEP 6:*
> Write the last 500 000 keys of the test data set to DC1 with *LOCAL_QUORUM* consistency level.
> *STEP 7:*
> Check that the first part of the test data set (the first 500 000 keys) was written correctly to DC1.
> Check that the last part of the test data set (the last 500 000 keys) was written correctly to both datacenters.
> *STEP 8:*
> Run *nodetool repair* on each node of DC2 and wait until it completes.
> *STEP 9:*
> Current nodetool ring:
> {quote}
> {noformat}
>     <ip>     DC1         RAC1        Up     Normal  21,45 MB        33,33%             0
>     <ip>     DC2         RAC2        Up     Normal  23,5 MB         33,33%             100
>     <ip>     DC1         RAC1        Up     Normal  20,67 MB        33,33%             28356863910078205288614550619314017621
>     <ip>     DC2         RAC2        Up     Normal  23,55 MB        33,33%             28356863910078205288614550619314017721
>     <ip>     DC1         RAC1        Up     Normal  21,18 MB        33,33%             56713727820156410577229101238628035242
>     <ip>     DC2         RAC2        Up     Normal  23,5 MB         33,33%             56713727820156410577229101238628035342
>     <ip>     DC1         RAC1        Up     Normal  23,5 MB         33,33%             85070591730234615865843651857942052864
>     <ip>     DC2         RAC2        Up     Normal  23,55 MB        33,33%             85070591730234615865843651857942052964
>     <ip>     DC1         RAC1        Up     Normal  21,44 MB        33,33%             113427455640312821154458202477256070485
>     <ip>     DC2         RAC2        Up     Normal  23,46 MB        33,33%             113427455640312821154458202477256070585
>     <ip>     DC1         RAC1        Up     Normal  20,53 MB        33,33%             141784319550391026443072753096570088106
>     <ip>     DC2         RAC2        Up     Normal  23,55 MB        33,33%             141784319550391026443072753096570088206
> {noformat}  
> {quote}
> Check that the full test data set has been written to both datacenters.
> Result:
> The full test data set was successfully written to DC1, but *24448* keys are not present on DC2.
> Repeating *nodetool repair* doesn't help.
> Result:
> It seems that the problem is related to how the keys that must be repaired are identified when the datacenter being repaired already holds some of the keys.
> If we start the empty DC2 nodes only after DC1 has received all 1 000 000 keys, *nodetool repair* works fine, with no missing keys.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
