cassandra-commits mailing list archives

From: Kévin LOVATO (JIRA) <j...@apache.org>
Subject: [jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's
Date: Mon, 03 Jun 2013 16:09:24 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673245#comment-13673245 ]

Kévin LOVATO edited comment on CASSANDRA-5424 at 6/3/13 4:09 PM:
-----------------------------------------------------------------

*[EDIT] I didn't see your latest posts before posting, but I hope the extra data can help anyway.*

You were right to say that I needed to run repair -pr on all three nodes: since I only have one row in the CF (this is a test), I suppose only the -pr run on the node in charge of that key could have repaired it. So I restarted my test and ran the repair on all three nodes, but it still didn't work; here is the output:
{code}
user@cassandra11:~$ nodetool repair -pr Test_Replication
[2013-06-03 13:54:53,948] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 13:54:53,985] Repair session 676c00f0-cc44-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 13:54:53,985] Repair command #1 finished
{code}

{code}
user@cassandra12:~$ nodetool repair -pr Test_Replication
[2013-06-03 17:33:17,844] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 17:33:17,866] Repair session e9f38c50-cc62-11e2-af47-db8ca926a9c5 for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 17:33:17,866] Repair command #1 finished
{code}

{code}
user@cassandra13:~$ nodetool repair -pr Test_Replication
[2013-06-03 17:33:29,689] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 17:33:29,712] Repair session f102f3a0-cc62-11e2-ae98-39da3e693be3 for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 17:33:29,712] Repair command #1 finished
{code}

The data is still not copied to the new datacenter, and I don't understand why the repair is run over those ranges (each spanning a single token; see the sketch after the status output below). It could be an unbalanced cluster as you suggested, but we distributed the tokens as advised (+1 on the nodes of the new datacenter), as you can see in the following nodetool status:

{code}
user@cassandra13:~$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns   Host ID                               Token                                    Rack
UN  cassandra01     102 GB     33.3%  fa7672f5-77f0-4b41-b9d1-13bf63c39122  0                                        RC1
UN  cassandra02     88.73 GB   33.3%  c799df22-0873-4a99-a901-5ef5b00b7b1e  56713727820156410577229101238628035242   RC1
UN  cassandra03     50.86 GB   33.3%  5b9c6bc4-7ec7-417d-b92d-c5daa787201b  113427455640312821154458202477256070484  RC1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns   Host ID                               Token                                    Rack
UN  cassandra11     51.21 GB   0.0%   7b610455-3fd2-48a3-9315-895a4609be42  1                                        RC2
UN  cassandra12     45.02 GB   0.0%   8553f2c0-851c-4af2-93ee-2854c96de45a  56713727820156410577229101238628035243   RC2
UN  cassandra13     36.8 GB    0.0%   7f537660-9128-4c13-872a-6e026104f30e  113427455640312821154458202477256070485  RC2
{code}
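
As far as I can tell, the single-token ranges come straight from this token layout: without vnodes, a node's primary range is simply (previous token in the ring, its own token], so offsetting the dc2 tokens by +1 leaves each dc2 node with a primary range that contains exactly one token. Here is a minimal sketch of that rule applied to the tokens above (plain Java for illustration, not Cassandra code):

{code}
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: the (previousToken, ownToken] rule applied to the six tokens
// from the nodetool status output above.
public class PrimaryRanges {
    public static void main(String[] args) {
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        ring.put(new BigInteger("0"), "cassandra01");
        ring.put(new BigInteger("1"), "cassandra11");
        ring.put(new BigInteger("56713727820156410577229101238628035242"), "cassandra02");
        ring.put(new BigInteger("56713727820156410577229101238628035243"), "cassandra12");
        ring.put(new BigInteger("113427455640312821154458202477256070484"), "cassandra03");
        ring.put(new BigInteger("113427455640312821154458202477256070485"), "cassandra13");

        BigInteger previous = ring.lastKey();   // the ring wraps around
        for (Map.Entry<BigInteger, String> entry : ring.entrySet()) {
            System.out.printf("%-11s primary range = (%s,%s]%n",
                              entry.getValue(), previous, entry.getKey());
            previous = entry.getKey();
        }
    }
}
{code}

Running it prints (0,1] for cassandra11, (...242,...243] for cassandra12 and (...484,...485] for cassandra13, which is exactly what the three repair -pr runs above reported.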

Furthermore, a full repair (without -pr) works, as you can see in this log (a quick coverage comparison follows the log):

{code}
user@cassandra11:~$ nodetool repair  Test_Replication
[2013-06-03 17:44:07,570] Starting repair command #5, repairing 6 ranges for keyspace Test_Replication
[2013-06-03 17:44:07,903] Repair session 6d37b720-cc64-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 17:44:07,903] Repair session 6d3a0110-cc64-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035243,113427455640312821154458202477256070484] finished
[2013-06-03 17:44:07,903] Repair session 6d4d6200-cc64-11e2-bfd5-3d9212e452cc for range (1,56713727820156410577229101238628035242] finished
[2013-06-03 17:44:07,903] Repair session 6d581060-cc64-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 17:44:07,903] Repair session 6d5ea010-cc64-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 17:44:07,934] Repair session 6d604dc0-cc64-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070485,0] finished
[2013-06-03 17:44:07,934] Repair command #5 finished
{code}
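
To put a number on the difference: the full repair above ran a session for every range of the ring, while the three repair -pr runs together only covered the three single-token ranges. A rough coverage check (again plain Java for illustration, not Cassandra code; the 0..2^127 token space is my assumption for the RandomPartitioner, which these tokens suggest we are using):

{code}
import java.math.BigInteger;
import java.util.List;

// Illustration only: how much of the token space the three "repair -pr" runs covered.
public class RepairCoverage {
    public static void main(String[] args) {
        // The three ranges reported by repair -pr on cassandra11, 12 and 13 (logs above).
        List<BigInteger[]> prRanges = List.of(
            new BigInteger[] { new BigInteger("0"), new BigInteger("1") },
            new BigInteger[] { new BigInteger("56713727820156410577229101238628035242"),
                               new BigInteger("56713727820156410577229101238628035243") },
            new BigInteger[] { new BigInteger("113427455640312821154458202477256070484"),
                               new BigInteger("113427455640312821154458202477256070485") });

        BigInteger covered = BigInteger.ZERO;
        for (BigInteger[] range : prRanges)
            covered = covered.add(range[1].subtract(range[0]));   // width of (start,end]

        BigInteger tokenSpace = BigInteger.TWO.pow(127);          // assumed RandomPartitioner range
        System.out.println("repair -pr runs covered " + covered + " of " + tokenSpace + " tokens");
    }
}
{code}

It prints 3 out of 2^127 tokens, whereas the six sessions of the full repair cover the entire ring.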

I hope this information helps. Please let me know if you think it's a configuration issue, in which case I'll take it to the mailing list.
                
> nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's
> ----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5424
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5424
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.7
>            Reporter: Jeremiah Jordan
>            Assignee: Yuki Morishita
>            Priority: Critical
>             Fix For: 1.2.5
>
>         Attachments: 5424-1.1.txt, 5424-v2-1.2.txt, 5424-v3-1.2.txt
>
>
> nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's
> Commands follow, but the TL;DR of it is that range (127605887595351923798765477786913079296,0] doesn't get repaired between the .38 node and the .236 node until I run a repair, no -pr, on .38.
> It seems like primary range calculation doesn't take schema into account, but deciding who to ask for Merkle trees from does.
> {noformat}
> Address         DC          Rack        Status State   Load            Owns    Token
>                                                                                127605887595351923798765477786913079296
> 10.72.111.225   Cassandra   rack1       Up     Normal  455.87 KB       25.00%  0
> 10.2.29.38      Analytics   rack1       Up     Normal  40.74 MB        25.00%  42535295865117307932921825928971026432
> 10.46.113.236   Analytics   rack1       Up     Normal  20.65 MB        50.00%  127605887595351923798765477786913079296
> create keyspace Keyspace1
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {Analytics : 2}
>   and durable_writes = true;
> -------
> # nodetool -h 10.2.29.38 repair -pr Keyspace1 Standard1
> [2013-04-03 15:46:58,000] Starting repair command #1, repairing 1 ranges for keyspace Keyspace1
> [2013-04-03 15:47:00,881] Repair session b79b4850-9c75-11e2-0000-8b5bf6ebea9e for range (0,42535295865117307932921825928971026432] finished
> [2013-04-03 15:47:00,881] Repair command #1 finished
> root@ip-10-2-29-38:/home/ubuntu# grep b79b4850-9c75-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
>  INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,009 AntiEntropyService.java (line 676) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] new session: will sync a1/10.2.29.38, /10.46.113.236 on range (0,42535295865117307932921825928971026432] for Keyspace1.[Standard1]
>  INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,015 AntiEntropyService.java (line 881) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] requesting merkle trees for Standard1 (to [/10.46.113.236, a1/10.2.29.38])
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,202 AntiEntropyService.java (line 211) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Received merkle tree for Standard1 from /10.46.113.236
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,697 AntiEntropyService.java (line 211) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Received merkle tree for Standard1 from a1/10.2.29.38
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,879 AntiEntropyService.java (line 1015) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Endpoints /10.46.113.236 and a1/10.2.29.38 are consistent for Standard1
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,880 AntiEntropyService.java (line 788) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Standard1 is fully synced
>  INFO [AntiEntropySessions:1] 2013-04-03 15:47:00,880 AntiEntropyService.java (line 722) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] session completed successfully
> root@ip-10-46-113-236:/home/ubuntu# grep b79b4850-9c75-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
>  INFO [AntiEntropyStage:1] 2013-04-03 15:46:59,944 AntiEntropyService.java (line 244) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Sending completed merkle tree to /10.2.29.38 for (Keyspace1,Standard1)
> root@ip-10-72-111-225:/home/ubuntu# grep b79b4850-9c75-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
> root@ip-10-72-111-225:/home/ubuntu# 
> -------
> # nodetool -h 10.46.113.236  repair -pr Keyspace1 Standard1
> [2013-04-03 15:48:00,274] Starting repair command #1, repairing 1 ranges for keyspace Keyspace1
> [2013-04-03 15:48:02,032] Repair session dcb91540-9c75-11e2-0000-a839ee2ccbef for range (42535295865117307932921825928971026432,127605887595351923798765477786913079296] finished
> [2013-04-03 15:48:02,033] Repair command #1 finished
> root@ip-10-46-113-236:/home/ubuntu# grep dcb91540-9c75-11e2-0000-a839ee2ccbef /var/log/cassandra/system.log
>  INFO [AntiEntropySessions:5] 2013-04-03 15:48:00,280 AntiEntropyService.java (line 676) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] new session: will sync a0/10.46.113.236, /10.2.29.38 on range (42535295865117307932921825928971026432,127605887595351923798765477786913079296] for Keyspace1.[Standard1]
>  INFO [AntiEntropySessions:5] 2013-04-03 15:48:00,285 AntiEntropyService.java (line 881) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] requesting merkle trees for Standard1 (to [/10.2.29.38, a0/10.46.113.236])
>  INFO [AntiEntropyStage:1] 2013-04-03 15:48:01,710 AntiEntropyService.java (line 211) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Received merkle tree for Standard1 from a0/10.46.113.236
>  INFO [AntiEntropyStage:1] 2013-04-03 15:48:01,943 AntiEntropyService.java (line 211) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Received merkle tree for Standard1 from /10.2.29.38
>  INFO [AntiEntropyStage:1] 2013-04-03 15:48:02,031 AntiEntropyService.java (line 1015) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Endpoints a0/10.46.113.236 and /10.2.29.38 are consistent for Standard1
>  INFO [AntiEntropyStage:1] 2013-04-03 15:48:02,032 AntiEntropyService.java (line 788) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Standard1 is fully synced
>  INFO [AntiEntropySessions:5] 2013-04-03 15:48:02,032 AntiEntropyService.java (line 722) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] session completed successfully
> root@ip-10-2-29-38:/home/ubuntu# grep dcb91540-9c75-11e2-0000-a839ee2ccbef /var/log/cassandra/system.log
>  INFO [AntiEntropyStage:1] 2013-04-03 15:48:01,898 AntiEntropyService.java (line 244) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Sending completed merkle tree to /10.46.113.236 for (Keyspace1,Standard1)
> root@ip-10-72-111-225:/home/ubuntu# grep dcb91540-9c75-11e2-0000-a839ee2ccbef /var/log/cassandra/system.log
> root@ip-10-72-111-225:/home/ubuntu# 
> -------
> # nodetool -h 10.72.111.225  repair -pr Keyspace1 Standard1
> [2013-04-03 15:48:30,417] Starting repair command #1, repairing 1 ranges for keyspace Keyspace1
> [2013-04-03 15:48:30,428] Repair session eeb12670-9c75-11e2-0000-316d6fba2dbf for range (127605887595351923798765477786913079296,0] finished
> [2013-04-03 15:48:30,428] Repair command #1 finished
> root@ip-10-72-111-225:/home/ubuntu# grep eeb12670-9c75-11e2-0000-316d6fba2dbf /var/log/cassandra/system.log
>  INFO [AntiEntropySessions:1] 2013-04-03 15:48:30,427 AntiEntropyService.java (line 676) [repair #eeb12670-9c75-11e2-0000-316d6fba2dbf] new session: will sync /10.72.111.225 on range (127605887595351923798765477786913079296,0] for Keyspace1.[Standard1]
>  INFO [AntiEntropySessions:1] 2013-04-03 15:48:30,428 AntiEntropyService.java (line 681) [repair #eeb12670-9c75-11e2-0000-316d6fba2dbf] No neighbors to repair with on range (127605887595351923798765477786913079296,0]: session completed
> root@ip-10-46-113-236:/home/ubuntu# grep eeb12670-9c75-11e2-0000-316d6fba2dbf /var/log/cassandra/system.log
> root@ip-10-46-113-236:/home/ubuntu# 
> root@ip-10-2-29-38:/home/ubuntu# grep eeb12670-9c75-11e2-0000-316d6fba2dbf /var/log/cassandra/system.log
> root@ip-10-2-29-38:/home/ubuntu# 
> ---
> root@ip-10-2-29-38:/home/ubuntu# nodetool -h 10.2.29.38 repair Keyspace1 Standard1
> [2013-04-03 16:13:28,674] Starting repair command #2, repairing 3 ranges for keyspace Keyspace1
> [2013-04-03 16:13:31,786] Repair session 6bb81c20-9c79-11e2-0000-8b5bf6ebea9e for range (42535295865117307932921825928971026432,127605887595351923798765477786913079296] finished
> [2013-04-03 16:13:31,786] Repair session 6cb05ed0-9c79-11e2-0000-8b5bf6ebea9e for range (0,42535295865117307932921825928971026432] finished
> [2013-04-03 16:13:31,806] Repair session 6d24a470-9c79-11e2-0000-8b5bf6ebea9e for range (127605887595351923798765477786913079296,0] finished
> [2013-04-03 16:13:31,807] Repair command #2 finished
> root@ip-10-2-29-38:/home/ubuntu# grep 6d24a470-9c79-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
>  INFO [AntiEntropySessions:7] 2013-04-03 16:13:31,065 AntiEntropyService.java (line 676) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] new session: will sync a1/10.2.29.38, /10.46.113.236 on range (127605887595351923798765477786913079296,0] for Keyspace1.[Standard1]
>  INFO [AntiEntropySessions:7] 2013-04-03 16:13:31,065 AntiEntropyService.java (line 881) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] requesting merkle trees for Standard1 (to [/10.46.113.236, a1/10.2.29.38])
>  INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,751 AntiEntropyService.java (line 211) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Received merkle tree for Standard1 from /10.46.113.236
>  INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,785 AntiEntropyService.java (line 211) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Received merkle tree for Standard1 from a1/10.2.29.38
>  INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,805 AntiEntropyService.java (line 1015) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Endpoints /10.46.113.236 and a1/10.2.29.38 are consistent for Standard1
>  INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,806 AntiEntropyService.java (line 788) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Standard1 is fully synced
>  INFO [AntiEntropySessions:7] 2013-04-03 16:13:31,806 AntiEntropyService.java (line 722) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] session completed successfully
> root@ip-10-46-113-236:/home/ubuntu# grep 6d24a470-9c79-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
>  INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,665 AntiEntropyService.java (line 244) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Sending completed merkle tree to /10.2.29.38 for (Keyspace1,Standard1)
> {noformat}
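
To make the observation at the top of the report concrete ("primary range calculation doesn't take schema into account, but deciding who to ask for Merkle trees from does"), here is a deliberately simplified sketch of the behaviour the logs show. It is not Cassandra source and the neighbour logic is made up; the point is only which inputs each step consults:

{code}
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified model of the reported behaviour, not Cassandra code.
public class PrimaryRangeVsNeighbours {
    public static void main(String[] args) {
        // Ring from the report; Keyspace1 uses NetworkTopologyStrategy {Analytics : 2},
        // i.e. it is only replicated in the Analytics DC.
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        ring.put(new BigInteger("0"), "10.72.111.225/Cassandra");
        ring.put(new BigInteger("42535295865117307932921825928971026432"), "10.2.29.38/Analytics");
        ring.put(new BigInteger("127605887595351923798765477786913079296"), "10.46.113.236/Analytics");
        Set<String> replicatedDcs = Set.of("Analytics");

        BigInteger previous = ring.lastKey();   // the ring wraps around
        for (Map.Entry<BigInteger, String> entry : ring.entrySet()) {
            String node = entry.getValue();
            String dc = node.substring(node.indexOf('/') + 1);

            // 1) The primary range is taken from the ring alone; the keyspace is not consulted.
            String primaryRange = "(" + previous + "," + entry.getKey() + "]";

            // 2) The neighbours DO depend on the keyspace: a node whose DC is not in the
            //    replication settings holds nothing for Keyspace1 and finds nobody to repair with.
            List<String> neighbours = new ArrayList<>();
            if (replicatedDcs.contains(dc)) {
                for (String other : ring.values()) {
                    if (!other.equals(node) && replicatedDcs.contains(other.substring(other.indexOf('/') + 1))) {
                        neighbours.add(other);
                    }
                }
            }

            System.out.println(node + "  repair -pr range " + primaryRange
                    + (neighbours.isEmpty() ? "  -> no neighbours, nothing repaired" : "  with " + neighbours));
            previous = entry.getKey();
        }
    }
}
{code}

In this toy model, 10.72.111.225 still gets (127605887595351923798765477786913079296,0] as its primary range but ends up with no neighbours (matching the "No neighbors to repair with" line in the log), and since no other node's -pr run covers that range, it is only repaired by the full, non -pr repair shown at the end of the log.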

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
