cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6590) Gossip does not heal after a temporary partition at startup
Date Tue, 11 Feb 2014 19:14:22 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898166#comment-13898166 ]

Brandon Williams commented on CASSANDRA-6590:
---------------------------------------------

Hmm, so I went to test v4 and this time the ring weirdness came back, so perhaps it's just
intermittent, but here's what it looks like.  There are three nodes, 10.208.8.123, 10.208.35.225,
and 10.208.8.63.  123 is the seed, all nodes were started at the same time, and 63 is blocked
from 225.  The log from 123 looks normal:

{noformat}
 INFO 19:04:26,900 Handshaking version with /10.208.8.63
 INFO 19:04:26,913 Node /10.208.8.63 is now part of the cluster
 INFO 19:04:26,923 Handshaking version with /10.208.8.63
 INFO 19:04:26,963 Node bw-1/10.208.8.123 state jump to normal
 INFO 19:04:27,004 Startup completed! Now serving reads.
 INFO 19:04:27,076 Waiting for gossip to settle before accepting client requests...
 INFO 19:04:27,091 Handshaking version with /10.208.35.225
 INFO 19:04:27,097 Compacted 4 sstables to [/var/lib/cassandra/data/system/local/system-local-jb-5,].  5,846 bytes to 5,684 (~97% of original) in 250ms = 0.021683MB/s.  4 total partitions merged to 1.  Partition merge counts were {4:1, }
 INFO 19:04:27,100 Node /10.208.35.225 is now part of the cluster
 INFO 19:04:27,102 Handshaking version with /10.208.35.225
 INFO 19:04:35,190 Starting listening for CQL clients on bw-1/10.208.8.123:9042...
 INFO 19:04:35,252 Using TFramedTransport with a max frame size of 15728640 bytes.
 INFO 19:04:35,253 Binding thrift service to bw-1/10.208.8.123:9160
 INFO 19:04:35,262 Using synchronous/threadpool thrift server on bw-1 : 9160
 INFO 19:04:35,262 Listening for thrift clients...
{noformat}

It can see the other nodes in status, but can't tell whether they are up or down (the leading ? in the first column):

{noformat}
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.208.8.123   40.9 KB    256     68.1%             fa02838d-c39b-4d44-90db-f21a359deb12  rack1
?N  10.208.8.63    40.85 KB   256     63.0%             90e71b90-9b41-4482-9521-71ba479c964e  rack1
?N  10.208.35.225  40.93 KB   256     68.9%             e2fe818d-5d6c-47f9-8015-4580254cb91f  rack1
{noformat}
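
For anyone reading the two-character flag at the start of each row, here is a minimal Python sketch that decodes it. The letter meanings are taken from the legend lines in the output itself (Status=Up/Down, State=Normal/Leaving/Joining/Moving). Reading `?` as "unknown" is an assumption of this sketch, and `decode_flag` and `parse_rows` are hypothetical helper names, not part of any Cassandra tool.

```python
# Decode the two-character flag that leads each node row in the status output.
# Letter meanings follow the legend printed in the output; treating '?' as
# "Unknown" is an assumption made for this sketch.
STATUS = {"U": "Up", "D": "Down", "?": "Unknown"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def decode_flag(flag):
    """Return (status, state) for a flag such as 'UN' or '?N'."""
    if len(flag) != 2:
        raise ValueError(f"expected a two-character flag, got {flag!r}")
    return STATUS.get(flag[0], "Unknown"), STATE.get(flag[1], "Unknown")


def parse_rows(text):
    """Map each node address to its decoded (status, state) pair."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        # Node rows start with a two-character flag followed by an address;
        # legend, header, and wrapped "rack" lines fail this check.
        if (
            len(fields) >= 2
            and len(fields[0]) == 2
            and fields[0][0] in STATUS
            and fields[0][1] in STATE
        ):
            nodes[fields[1]] = decode_flag(fields[0])
    return nodes


# Rows copied from the seed's view above (trailing columns trimmed).
seed_view = """\
--  Address        Load       Tokens  Owns (effective)
UN  10.208.8.123   40.9 KB    256     68.1%
?N  10.208.8.63    40.85 KB   256     63.0%
?N  10.208.35.225  40.93 KB   256     68.9%
"""

for address, (status, state) in parse_rows(seed_view).items():
    print(address, status, state)
```

Run against the seed's view, this reports the seed itself as Up/Normal and both peers as Unknown/Normal, which matches the symptom described: the state is known, the up/down status is not.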


The other two nodes can't even see anything but themselves:

{noformat}
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.208.35.225  40.93 KB   256     100.0%            e2fe818d-5d6c-47f9-8015-4580254cb91f  rack1

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.208.8.63  40.85 KB   256     100.0%            90e71b90-9b41-4482-9521-71ba479c964e  rack1
{noformat}

Even though both are fully connected to the seed:

{noformat}
tcp        0      0 10.208.8.123:57215      10.208.8.63:7000        ESTABLISHED 16517/java
tcp        0      0 10.208.8.123:7000       10.208.8.63:37973       ESTABLISHED 16517/java
tcp        0      0 10.208.8.123:7000       10.208.35.225:41926     ESTABLISHED 16517/java
tcp        0      0 10.208.8.123:59308      10.208.35.225:7000      ESTABLISHED 16517/java
{noformat}
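
One way to make "fully connected" concrete is to check that each peer has an established connection in both directions on port 7000, the port that appears in the listing. The Python sketch below does that for netstat-style lines. The function names and the inbound/outbound labels are illustrative assumptions, and the column layout is assumed to match the listing above.

```python
STORAGE_PORT = 7000  # the inter-node port seen in the listing above


def split_endpoint(endpoint):
    """Split 'host:port' into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def connection_directions(netstat_text, local_ip):
    """Map each peer IP to the set of directions with an ESTABLISHED socket."""
    directions = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state, pid/program
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
            continue
        local_host, local_port = split_endpoint(fields[3])
        remote_host, remote_port = split_endpoint(fields[4])
        if local_host != local_ip:
            continue
        if remote_port == STORAGE_PORT:
            # We dialed the peer's storage port.
            directions.setdefault(remote_host, set()).add("outbound")
        elif local_port == STORAGE_PORT:
            # The peer dialed our storage port.
            directions.setdefault(remote_host, set()).add("inbound")
    return directions


def fully_connected_peers(netstat_text, local_ip):
    """Return peers that have both an inbound and an outbound connection."""
    return {
        peer
        for peer, seen in connection_directions(netstat_text, local_ip).items()
        if seen == {"inbound", "outbound"}
    }


# Lines copied from the seed's listing above.
seed_netstat = """\
tcp        0      0 10.208.8.123:57215      10.208.8.63:7000        ESTABLISHED 16517/java
tcp        0      0 10.208.8.123:7000       10.208.8.63:37973       ESTABLISHED 16517/java
tcp        0      0 10.208.8.123:7000       10.208.35.225:41926     ESTABLISHED 16517/java
tcp        0      0 10.208.8.123:59308      10.208.35.225:7000      ESTABLISHED 16517/java
"""

print(sorted(fully_connected_peers(seed_netstat, "10.208.8.123")))
```

For the seed's listing, both 10.208.8.63 and 10.208.35.225 come back as connected in both directions, so the missing up/down status is not explained by a missing TCP connection.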

I'm not sure what's going on here yet.

> Gossip does not heal after a temporary partition at startup
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-6590
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6590
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Vijay
>             Fix For: 2.0.6
>
>         Attachments: 0001-CASSANDRA-6590.patch, 0001-logging-for-6590.patch, 6590_disable_echo.txt
>
>
> See CASSANDRA-6571 for background.  If a node is partitioned on startup when the echo
> command is sent, but then the partition heals, the halves of the partition will never mark
> each other up despite being able to communicate.  This stems from CASSANDRA-3533.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
