cassandra-commits mailing list archives

From "Farzad Panahi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9630) Killing cassandra process results in unclosed connections
Date Thu, 28 Jul 2016 01:14:21 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396756#comment-15396756 ]

Farzad Panahi commented on CASSANDRA-9630:
------------------------------------------

I am experiencing a similar issue.

Cassandra version: 3.0.8
Environment: Amazon EC2

Error Case:
When I restart the Cassandra service on a node, after the node comes up it sees some or all of
the other nodes as DN, even though the other nodes see this node as UN.

Here is the output of netstat and nodetool status for this error case:

1. Right after stopping the Cassandra service on node 10.4.68.222:
{code}
--------------------------------------
ip-10-4-54-176
tcp        0      0 10.4.54.176:51268           10.4.68.222:7000            TIME_WAIT   
tcp        0      0 10.4.54.176:56135           10.4.68.222:7000            TIME_WAIT   
tcp        1      0 10.4.54.176:43697           10.4.68.222:7000            CLOSE_WAIT  
tcp        0      0 10.4.54.176:52372           10.4.68.222:7000            TIME_WAIT   
--------------------------------------
--------------------------------------
ip-10-4-54-177
tcp        0      0 10.4.54.177:56960           10.4.68.222:7000            TIME_WAIT   
tcp        0      0 10.4.54.177:54539           10.4.68.222:7000            TIME_WAIT   
tcp        0      0 10.4.54.177:32823           10.4.68.222:7000            TIME_WAIT   
tcp        1      0 10.4.54.177:48985           10.4.68.222:7000            CLOSE_WAIT  
--------------------------------------
--------------------------------------
ip-10-4-68-222
tcp        0      0 10.4.68.222:7000            10.4.54.176:43697           FIN_WAIT2   
tcp        0      0 10.4.68.222:7000            10.4.54.177:48985           FIN_WAIT2   
tcp        0      0 10.4.68.222:7000            10.4.68.222:54419           TIME_WAIT   
tcp        0      0 10.4.68.222:7000            10.4.43.65:43197            FIN_WAIT2   
tcp        0      0 10.4.68.222:7000            10.4.68.221:44149           FIN_WAIT2   
tcp        0      0 10.4.68.222:7000            10.4.68.222:41302           TIME_WAIT   
tcp        0      0 10.4.68.222:7000            10.4.43.66:54321            FIN_WAIT2   
--------------------------------------
--------------------------------------
ip-10-4-68-221
tcp        0      0 10.4.68.221:49599           10.4.68.222:7000            TIME_WAIT   
tcp        0      0 10.4.68.221:55033           10.4.68.222:7000            TIME_WAIT   
tcp        0      0 10.4.68.221:51628           10.4.68.222:7000            TIME_WAIT   
tcp        1      0 10.4.68.221:44149           10.4.68.222:7000            CLOSE_WAIT  
--------------------------------------
--------------------------------------
ip-10-4-43-66
tcp        0      0 10.4.43.66:55930            10.4.68.222:7000            TIME_WAIT   
tcp        1      0 10.4.43.66:54321            10.4.68.222:7000            CLOSE_WAIT  
tcp        0      0 10.4.43.66:60968            10.4.68.222:7000            TIME_WAIT   
tcp        0      0 10.4.43.66:49087            10.4.68.222:7000            TIME_WAIT   
--------------------------------------
--------------------------------------
ip-10-4-43-65
tcp        1      0 10.4.43.65:43197            10.4.68.222:7000            CLOSE_WAIT  
tcp        0      0 10.4.43.65:36467            10.4.68.222:7000            TIME_WAIT   
tcp        0      0 10.4.43.65:53317            10.4.68.222:7000            TIME_WAIT   
tcp        0      0 10.4.43.65:54897            10.4.68.222:7000            TIME_WAIT   
--------------------------------------
{code}

2. A bit after stopping the Cassandra service on node 10.4.68.222:
{code}
--------------------------------------
ip-10-4-54-176
tcp        1      0 10.4.54.176:43697           10.4.68.222:7000            CLOSE_WAIT  
--------------------------------------
--------------------------------------
ip-10-4-54-177
--------------------------------------
--------------------------------------
ip-10-4-68-222
--------------------------------------
--------------------------------------
ip-10-4-68-221
tcp        1      0 10.4.68.221:44149           10.4.68.222:7000            CLOSE_WAIT  
--------------------------------------
--------------------------------------
ip-10-4-43-66
tcp        1      0 10.4.43.66:54321            10.4.68.222:7000            CLOSE_WAIT  
--------------------------------------
--------------------------------------
ip-10-4-43-65
tcp        1      0 10.4.43.65:43197            10.4.68.222:7000            CLOSE_WAIT  
--------------------------------------
{code}

3. After starting the Cassandra service on node 10.4.68.222:
{code}
--------------------------------------
ip-10-4-54-176
tcp        0      0 10.4.54.176:42460           10.4.68.222:7000            ESTABLISHED 
tcp        1 303403 10.4.54.176:43697           10.4.68.222:7000            CLOSE_WAIT  
tcp        0      0 10.4.54.176:42109           10.4.68.222:7000            ESTABLISHED 
--------------------------------------
--------------------------------------
ip-10-4-54-177
tcp        0      0 10.4.54.177:43687           10.4.68.222:7000            ESTABLISHED 
tcp        0      0 10.4.54.177:56107           10.4.68.222:7000            ESTABLISHED 
tcp        0      0 10.4.54.177:39426           10.4.68.222:7000            ESTABLISHED 
--------------------------------------
--------------------------------------
ip-10-4-68-222
tcp        0      0 10.4.68.222:7000            0.0.0.0:*                   LISTEN      
tcp        0      0 10.4.68.222:7000            10.4.54.176:42109           ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.54.177:43687           ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.54.176:42460           ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.43.66:55168            ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.43.65:60239            ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.54.177:39426           ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.43.65:43480            ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.68.221:54490           ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.68.221:59771           ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.54.177:56107           ESTABLISHED 
tcp        0      0 10.4.68.222:7000            10.4.43.66:55581            ESTABLISHED 
--------------------------------------
--------------------------------------
ip-10-4-68-221
tcp        0      0 10.4.68.221:54490           10.4.68.222:7000            ESTABLISHED 
tcp        0      0 10.4.68.221:59771           10.4.68.222:7000            ESTABLISHED 
tcp        1 304316 10.4.68.221:44149           10.4.68.222:7000            CLOSE_WAIT  
--------------------------------------
--------------------------------------
ip-10-4-43-66
tcp        1 322344 10.4.43.66:54321            10.4.68.222:7000            CLOSE_WAIT  
tcp        0      0 10.4.43.66:55581            10.4.68.222:7000            ESTABLISHED 
tcp        0      0 10.4.43.66:55168            10.4.68.222:7000            ESTABLISHED 
--------------------------------------
--------------------------------------
ip-10-4-43-65
tcp        1 376331 10.4.43.65:43197            10.4.68.222:7000            CLOSE_WAIT  
tcp        0      0 10.4.43.65:43480            10.4.68.222:7000            ESTABLISHED 
tcp        0      0 10.4.43.65:60239            10.4.68.222:7000            ESTABLISHED 
--------------------------------------
{code}
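
The leftover sockets in the output above can be picked out mechanically. The following is only a rough sketch, not something I ran on the cluster: it assumes the standard netstat column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the names `parse_netstat` and `stale_close_wait` are made up for illustration. It lists CLOSE_WAIT connections towards port 7000 together with their Send-Q, which is the column that is non-zero for the stale sockets in step 3.

```python
from typing import List, NamedTuple


class Conn(NamedTuple):
    recv_q: int
    send_q: int
    local: str
    remote: str
    state: str


def parse_netstat(text: str) -> List[Conn]:
    """Parse 'netstat -tan' style rows; non-TCP or malformed lines are skipped."""
    conns = []
    for line in text.splitlines():
        parts = line.split()
        # Assumed layout: Proto Recv-Q Send-Q Local Foreign State
        if len(parts) == 6 and parts[0].startswith("tcp"):
            conns.append(Conn(int(parts[1]), int(parts[2]),
                              parts[3], parts[4], parts[5]))
    return conns


def stale_close_wait(conns: List[Conn], port: int = 7000) -> List[Conn]:
    """Return connections stuck in CLOSE_WAIT towards the given remote port."""
    suffix = f":{port}"
    return [c for c in conns
            if c.state == "CLOSE_WAIT" and c.remote.endswith(suffix)]


# Rows copied from the ip-10-4-54-176 section of step 3.
SAMPLE = """\
tcp        0      0 10.4.54.176:42460           10.4.68.222:7000            ESTABLISHED
tcp        1 303403 10.4.54.176:43697           10.4.68.222:7000            CLOSE_WAIT
tcp        0      0 10.4.54.176:42109           10.4.68.222:7000            ESTABLISHED
"""

if __name__ == "__main__":
    for c in stale_close_wait(parse_netstat(SAMPLE)):
        print(f"{c.local} -> {c.remote} send_q={c.send_q}")
```

A non-zero Send-Q on such a row suggests the node is still queuing data on the old socket after the peer has restarted.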

4. Output of nodetool status on all nodes after starting the Cassandra service on node 10.4.68.222:
{code}
ip-10-4-54-176
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.67 GB  256          47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256          46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256          50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.94 GB  256          52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256          50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256          52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-54-177
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.63 GB  256          47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256          46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256          50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.94 GB  256          52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256          50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256          52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-68-222
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
DN  10.4.54.176  127.63 GB  256          47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
DN  10.4.43.65   124.19 GB  256          46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256          50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
DN  10.4.43.66   141.94 GB  256          52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
DN  10.4.68.221  137.12 GB  256          50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256          52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-68-221
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.63 GB  256          47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256          46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256          50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.94 GB  256          52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256          50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256          52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-43-66
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.67 GB  256          47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256          46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256          50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.95 GB  256          52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256          50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256          52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
--------------------------------------
ip-10-4-43-65
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.4.54.176  127.67 GB  256          47.5%             7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.43.65   124.19 GB  256          46.2%             80265afb-8beb-4887-a696-fc9b75956894  1a
UN  10.4.54.177  136.06 GB  256          50.7%             b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.43.66   141.94 GB  256          52.3%             b00fdf10-1075-4953-8a96-caf375221684  1a
UN  10.4.68.221  137.12 GB  256          50.7%             37479ec3-7b6d-4537-975c-f9d95e92ee1d  1d
UN  10.4.68.222  141.89 GB  256          52.7%             8df87657-c39b-405a-ba54-d60b577c1429  1d
--------------------------------------
{code}
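
The one-sided view in step 4 (the restarted node reports peers as DN while those peers report it as UN) can be checked from the collected nodetool status output. This is a hedged sketch, not an official tool: `parse_status` and `one_sided_down` are hypothetical helper names, and it assumes each data row starts with the two-letter status/state code followed by an IPv4 address, as in the output above.

```python
import re
from typing import Dict, List, Tuple

# Two-letter code (Up/Down + Normal/Leaving/Joining/Moving), then an IPv4 address.
ROW = re.compile(r"^([UD][NLJM])\s+(\d+\.\d+\.\d+\.\d+)\b")


def parse_status(text: str) -> Dict[str, str]:
    """Map peer address -> status code for one node's 'nodetool status' output."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            result[m.group(2)] = m.group(1)
    return result


def one_sided_down(views: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (observer, peer) pairs where observer sees peer as down
    while peer sees observer as up."""
    parsed = {obs: parse_status(text) for obs, text in views.items()}
    pairs = []
    for obs, peers in parsed.items():
        for peer, status in peers.items():
            seen_back = parsed.get(peer, {}).get(obs, "")
            if status.startswith("D") and seen_back.startswith("U"):
                pairs.append((obs, peer))
    return sorted(pairs)


# Abbreviated rows taken from step 4, keyed by the node that produced them.
VIEWS = {
    "10.4.68.222": """\
DN  10.4.54.176  127.63 GB  256  47.5%  7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.54.177  136.06 GB  256  50.7%  b9010e24-4e92-4212-8a17-65892ea9ff66  1b
UN  10.4.68.222  141.89 GB  256  52.7%  8df87657-c39b-405a-ba54-d60b577c1429  1d
""",
    "10.4.54.176": """\
UN  10.4.54.176  127.67 GB  256  47.5%  7163bf77-2fef-4e33-81c1-0e61038dece1  1b
UN  10.4.68.222  141.89 GB  256  52.7%  8df87657-c39b-405a-ba54-d60b577c1429  1d
""",
}

if __name__ == "__main__":
    for observer, peer in one_sided_down(VIEWS):
        print(f"{observer} sees {peer} as down, but {peer} sees {observer} as up")
```

The regular expression only looks at the start of each row, so it works whether or not the Rack column has been wrapped onto the next line.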


> Killing cassandra process results in unclosed connections
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9630
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata, Streaming and Messaging
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Minor
>             Fix For: 3.x
>
>
> After upgrading Cassandra from 2.0.12 to 2.0.15, whenever we killed a cassandra process (with SIGTERM), some other nodes maintained a connection with the killed node in the CLOSE_WAIT state on port 7000 for about 5-20 minutes.
> So, when we started the killed node again, other nodes could not establish a handshake because of the connections in the CLOSE_WAIT state, so they remained in the DOWN state to each other until the initial connection expired.
> The problem did not happen if I ran a nodetool disablegossip before killing the node.
> I was able to fix this issue by reverting the CASSANDRA-8336 commits (including CASSANDRA-9238). After reverting this, cassandra now closes connections correctly when killed with -TERM, but leaves connections in the CLOSE_WAIT state if I run nodetool disablethrift before killing the nodes.
> I did not try to reproduce the problem in a clean environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
