cassandra-commits mailing list archives

From "Ryan McGuire (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6053) system.peers table not updated after decommissioning nodes in C* 2.0
Date Wed, 18 Dec 2013 18:53:09 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851994#comment-13851994 ]

Ryan McGuire edited comment on CASSANDRA-6053 at 12/18/13 6:52 PM:
-------------------------------------------------------------------

OK, reproduced this by killing -9 one of the nodes and then doing a 'nodetool removenode':

{code}
01:20 PM:~$ kill -9 18961    # PID of node1
01:21 PM:~$ ccm node1 status
Failed to connect to '127.0.0.1:7100': Connection refused
01:21 PM:~$ ccm node2 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Owns   Host ID                               Token                 Rack
DN  127.0.0.1  62.93 KB   20.0%  896644af-8640-4be6-a3ff-e8ed559d851c  -9223372036854775808  rack1
UN  127.0.0.2  51.17 KB   20.0%  d3801466-d36d-428c-b4e5-05ff69fe36c0  -5534023222112865485  rack1
UN  127.0.0.3  62.78 KB   20.0%  cb36c3ad-df45-4f77-bff5-ca93c504ec08  -1844674407370955162  rack1
UN  127.0.0.4  51.17 KB   20.0%  89031a05-a3f6-4ac7-9d29-6caa0c609dbc  1844674407370955161   rack1
UN  127.0.0.5  51.27 KB   20.0%  4909d856-a86e-493a-a7d0-7570d71eb9d8  5534023222112865484   rack1

# Issue removenode on node3 :
01:21 PM:~$ ~/.ccm/t/node1/bin/nodetool -p 7300 removenode 896644af-8640-4be6-a3ff-e8ed559d851c

01:22 PM:~$ ccm node3 cqlsh
Connected to t at 127.0.0.3:9160.
[cqlsh 4.1.0 | Cassandra 2.0.3-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> select * from system.peers;

 peer      | data_center | host_id                              | preferred_ip | rack  | release_version | rpc_address | schema_version                       | tokens
-----------+-------------+--------------------------------------+--------------+-------+-----------------+-------------+--------------------------------------+--------------------------
 127.0.0.2 | datacenter1 | d3801466-d36d-428c-b4e5-05ff69fe36c0 |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.2 | d133398f-f287-3674-83af-a1b04ee29f1f | {'-5534023222112865485'}
 127.0.0.5 | datacenter1 | 4909d856-a86e-493a-a7d0-7570d71eb9d8 |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.5 | d133398f-f287-3674-83af-a1b04ee29f1f |  {'5534023222112865484'}
 127.0.0.4 | datacenter1 | 89031a05-a3f6-4ac7-9d29-6caa0c609dbc |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.4 | d133398f-f287-3674-83af-a1b04ee29f1f |  {'1844674407370955161'}

(3 rows)

# Check node2 peers table:

01:23 PM:~$ ccm node2 cqlsh
Connected to t at 127.0.0.2:9160.
[cqlsh 4.1.0 | Cassandra 2.0.3-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> select * from system.peers;

 peer      | data_center | host_id                              | preferred_ip | rack  | release_version | rpc_address | schema_version                       | tokens
-----------+-------------+--------------------------------------+--------------+-------+-----------------+-------------+--------------------------------------+--------------------------
 127.0.0.3 | datacenter1 | cb36c3ad-df45-4f77-bff5-ca93c504ec08 |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.3 | d133398f-f287-3674-83af-a1b04ee29f1f | {'-1844674407370955162'}
 127.0.0.1 |        null | 896644af-8640-4be6-a3ff-e8ed559d851c |         null |  null |            null |   127.0.0.1 |                                 null |                     null
 127.0.0.5 | datacenter1 | 4909d856-a86e-493a-a7d0-7570d71eb9d8 |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.5 | d133398f-f287-3674-83af-a1b04ee29f1f |  {'5534023222112865484'}
 127.0.0.4 | datacenter1 | 89031a05-a3f6-4ac7-9d29-6caa0c609dbc |         null | rack1 |  2.0.3-SNAPSHOT |   127.0.0.4 | d133398f-f287-3674-83af-a1b04ee29f1f |  {'1844674407370955161'}

(4 rows)

# oh noes!... node2 still has an entry for node1 in peers table.

01:23 PM:~$ ccm node2 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Owns   Host ID                               Token                 Rack
UN  127.0.0.2  51.17 KB   40.0%  d3801466-d36d-428c-b4e5-05ff69fe36c0  -5534023222112865485  rack1
UN  127.0.0.3  62.78 KB   20.0%  cb36c3ad-df45-4f77-bff5-ca93c504ec08  -1844674407370955162  rack1
UN  127.0.0.4  51.17 KB   20.0%  89031a05-a3f6-4ac7-9d29-6caa0c609dbc  1844674407370955161   rack1
UN  127.0.0.5  51.27 KB   20.0%  4909d856-a86e-493a-a7d0-7570d71eb9d8  5534023222112865484   rack1

{code}

Because the removenode was issued on node3, node3 knows about the node being removed and its
peers table is correct. On node2, the status output shows node1 gone, but its peers table
has not been updated and still holds a mostly-null row for 127.0.0.1.
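
To make the mismatch easy to check on other nodes, here is a minimal sketch that compares the addresses a node reports in 'nodetool status' with the addresses in its system.peers table. The function name and the hard-coded lists are just for illustration (the values are copied from the node2 and node3 output above); nothing here talks to a live cluster.

```python
def find_stale_peers(ring_addresses, peer_addresses, local_address):
    """Return peers-table addresses that are not in the node's ring view.

    ring_addresses: addresses listed by 'nodetool status' on this node
    peer_addresses: values of the 'peer' column in this node's system.peers
    local_address:  the node's own address (a node does not list itself in peers)
    """
    expected = set(ring_addresses) - {local_address}
    return sorted(set(peer_addresses) - expected)


# Ring membership as reported by 'ccm node2 status' after the removenode.
ring_after_removal = ["127.0.0.2", "127.0.0.3", "127.0.0.4", "127.0.0.5"]

# 'peer' column values from the two cqlsh sessions.
node2_peers = ["127.0.0.3", "127.0.0.1", "127.0.0.5", "127.0.0.4"]
node3_peers = ["127.0.0.2", "127.0.0.5", "127.0.0.4"]

print(find_stale_peers(ring_after_removal, node2_peers, "127.0.0.2"))  # ['127.0.0.1']
print(find_stale_peers(ring_after_removal, node3_peers, "127.0.0.3"))  # []
```

For node2 this flags 127.0.0.1, and for node3 it flags nothing, which matches what the transcript shows.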


> system.peers table not updated after decommissioning nodes in C* 2.0
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-6053
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6053
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Datastax AMI running EC2 m1.xlarge instances
>            Reporter: Guyon Moree
>            Assignee: Brandon Williams
>         Attachments: peers
>
>
> After decommissioning my cluster from 20 to 9 nodes using opscenter, I found all but one of the nodes had incorrect system.peers tables.
> This became a problem (afaik) when using the python-driver, since it queries the peers table to set up its connection pool, resulting in very slow startup times because of timeouts.
> The output of nodetool didn't seem to be affected. After removing the incorrect entries from the peers tables, the connection issues seem to have disappeared for us.
> Would like some feedback on whether this was the right way to handle the issue or if I'm still left with a broken cluster.
> Attached is the output of nodetool status, which shows the correct 9 nodes. Below that is the output of the system.peers tables on the individual nodes.
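
For reference, the manual cleanup the reporter describes (removing the incorrect entries from the peers tables) could look roughly like the sketch below. It only builds CQL strings; the helper name is made up for illustration, and the report does not say which statements were actually run. Whether deleting rows by hand is the right fix is exactly the open question in this ticket, so treat this as an assumption, not a recommendation. Because system.peers is local to each node, the statements would have to be run on every node that still holds a stale row.

```python
import ipaddress


def build_peer_deletes(stale_addresses):
    """Build one CQL DELETE per stale address in system.peers.

    Each address is parsed first so that only a valid IP literal
    ends up inside the quoted CQL string.
    """
    statements = []
    for address in stale_addresses:
        ip = ipaddress.ip_address(address)  # raises ValueError on bad input
        statements.append("DELETE FROM system.peers WHERE peer = '%s';" % ip)
    return statements


# Stale entry seen on node2 in the comment on this ticket.
for statement in build_peer_deletes(["127.0.0.1"]):
    print(statement)
```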



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
