cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6554) During upgrade from 1.2 -> 2.0, upgraded node sees other nodes as Down
Date Tue, 07 Jan 2014 20:49:51 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864672#comment-13864672 ]

Brandon Williams edited comment on CASSANDRA-6554 at 1/7/14 8:49 PM:
---------------------------------------------------------------------

The strange thing is, it does mark the other non-upgraded nodes as up:

{noformat}
TRACE [GossipStage:1] 2014-01-07 20:07:53,456 GossipDigestAck2VerbHandler.java (line 38) Received a GossipDigestAck2Message from /10.180.236.244
DEBUG [GossipStage:1] 2014-01-07 20:07:53,456 Gossiper.java (line 790) Clearing interval times for /10.180.236.244 due to generation change
TRACE [GossipStage:1] 2014-01-07 20:07:53,457 FailureDetector.java (line 189) reporting /10.180.236.244
DEBUG [GossipStage:1] 2014-01-07 20:07:53,467 Gossiper.java (line 790) Clearing interval times for /10.182.208.161 due to generation change
TRACE [GossipStage:1] 2014-01-07 20:07:53,468 FailureDetector.java (line 189) reporting /10.182.208.161
TRACE [GossipStage:1] 2014-01-07 20:07:53,468 Gossiper.java (line 932) /10.180.236.244local generation 0, remote generation 1389124753
TRACE [GossipStage:1] 2014-01-07 20:07:53,468 Gossiper.java (line 937) Updating heartbeat state generation to 1389124753 from 0 for /10.180.236.244
 INFO [GossipStage:1] 2014-01-07 20:07:53,468 Gossiper.java (line 868) Node /10.180.236.244 has restarted, now UP
TRACE [GossipStage:1] 2014-01-07 20:07:53,468 Gossiper.java (line 873) Adding endpoint state for /10.180.236.244
TRACE [GossipStage:1] 2014-01-07 20:07:53,506 Gossiper.java (line 815) Sending a EchoMessage to /10.180.236.244
 INFO [HANDSHAKE-/10.180.236.244] 2014-01-07 20:07:53,534 OutboundTcpConnection.java (line 386) Handshaking version with /10.180.236.244
TRACE [GossipStage:1] 2014-01-07 20:07:53,559 TokenSerializer.java (line 56) Reading token of 8 bytes
 INFO [GossipStage:1] 2014-01-07 20:07:53,562 StorageService.java (line 1445) Node /10.180.236.244 state jump to normal
TRACE [GossipStage:1] 2014-01-07 20:07:53,570 Gossiper.java (line 932) /10.182.208.161local generation 0, remote generation 1389124753
TRACE [GossipStage:1] 2014-01-07 20:07:53,571 Gossiper.java (line 937) Updating heartbeat state generation to 1389124753 from 0 for /10.182.208.161
 INFO [GossipStage:1] 2014-01-07 20:07:53,571 Gossiper.java (line 868) Node /10.182.208.161 has restarted, now UP
{noformat}

And it never marks them down after that.  Can you get a gms trace from one of the other nodes too?
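
When comparing traces from several nodes, it may help to pull out only the liveness transitions. The following is a minimal sketch, not part of Cassandra: the function name `liveness_events` is hypothetical, the "has restarted, now UP" pattern is taken from the trace above, and the "InetAddress ... is now UP/DOWN" wording is an assumption about other Gossiper messages that should be checked against the actual log.

```python
import re

# Patterns for liveness transitions. The first alternative mirrors the
# "Node /x.x.x.x has restarted, now UP" line in the trace above; the second
# ("InetAddress /x.x.x.x is now UP|DOWN") is an ASSUMED message shape.
EVENT_RE = re.compile(
    r"Node (?P<restart>/[\d.]+)\s+has restarted, now UP"
    r"|InetAddress (?P<addr>/[\d.]+) is now (?P<state>UP|DOWN)"
)


def liveness_events(log_text):
    """Return (endpoint, state) tuples in the order they appear in the log."""
    # Logs pasted through mail or JIRA are often hard-wrapped mid-message,
    # so collapse all whitespace runs into single spaces before matching.
    flat = " ".join(log_text.split())
    events = []
    for m in EVENT_RE.finditer(flat):
        if m.group("restart"):
            events.append((m.group("restart"), "UP"))
        else:
            events.append((m.group("addr"), m.group("state")))
    return events
```

Running this over the log of each node and checking whether any endpoint ever gets a DOWN entry after its UP entry would show whether the gossip state and the `nodetool status` output disagree.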


was (Author: brandon.williams):
The strange thing is, it does mark the other non-upgraded nodes as up:

{{noformat}}
TRACE [GossipStage:1] 2014-01-07 20:07:53,456 GossipDigestAck2VerbHandler.java (line 38) Received a GossipDigestAck2Message from /10.180.236.244
DEBUG [GossipStage:1] 2014-01-07 20:07:53,456 Gossiper.java (line 790) Clearing interval times for /10.180.236.244 due to generation change
TRACE [GossipStage:1] 2014-01-07 20:07:53,457 FailureDetector.java (line 189) reporting /10.180.236.244
DEBUG [GossipStage:1] 2014-01-07 20:07:53,467 Gossiper.java (line 790) Clearing interval times for /10.182.208.161 due to generation change
TRACE [GossipStage:1] 2014-01-07 20:07:53,468 FailureDetector.java (line 189) reporting /10.182.208.161
TRACE [GossipStage:1] 2014-01-07 20:07:53,468 Gossiper.java (line 932) /10.180.236.244local generation 0, remote generation 1389124753
TRACE [GossipStage:1] 2014-01-07 20:07:53,468 Gossiper.java (line 937) Updating heartbeat state generation to 1389124753 from 0 for /10.180.236.244
 INFO [GossipStage:1] 2014-01-07 20:07:53,468 Gossiper.java (line 868) Node /10.180.236.244 has restarted, now UP
TRACE [GossipStage:1] 2014-01-07 20:07:53,468 Gossiper.java (line 873) Adding endpoint state for /10.180.236.244
TRACE [GossipStage:1] 2014-01-07 20:07:53,506 Gossiper.java (line 815) Sending a EchoMessage to /10.180.236.244
 INFO [HANDSHAKE-/10.180.236.244] 2014-01-07 20:07:53,534 OutboundTcpConnection.java (line 386) Handshaking version with /10.180.236.244
TRACE [GossipStage:1] 2014-01-07 20:07:53,559 TokenSerializer.java (line 56) Reading token of 8 bytes
 INFO [GossipStage:1] 2014-01-07 20:07:53,562 StorageService.java (line 1445) Node /10.180.236.244 state jump to normal
TRACE [GossipStage:1] 2014-01-07 20:07:53,570 Gossiper.java (line 932) /10.182.208.161local generation 0, remote generation 1389124753
TRACE [GossipStage:1] 2014-01-07 20:07:53,571 Gossiper.java (line 937) Updating heartbeat state generation to 1389124753 from 0 for /10.182.208.161
 INFO [GossipStage:1] 2014-01-07 20:07:53,571 Gossiper.java (line 868) Node /10.182.208.161 has restarted, now UP
{{noformat}

And never marks them down after that.  Can you get a gms trace from one of the other nodes
too?

> During upgrade from 1.2 -> 2.0, upgraded node sees other nodes as Down
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-6554
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6554
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: EC2 Ubuntu Precise 12.04
> Oracle JRE 1.7_25
> C* 1.2.13 upgrade to 2.0.4
>            Reporter: Michael Shuler
>         Attachments: 6554_trace_system.log
>
>
> During an upgrade from 1.2.13 to 2.0.3/2.0.4, the upgraded node sees the remaining nodes of the cluster as Down.
> {code}
> automaton@ip-10-139-1-113:~$ nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Owns   Host ID                               Token                 Rack
> UN  10.139.1.113    98.94 MB   33.3%  33b1cd06-e17b-4332-8066-0c6c401e0cf3  -9223372036854775808  rack1
> DN  10.139.11.168   97.51 MB   33.3%  ec97c163-8f2d-4019-a3d1-55df5e4037d4  -3074457345618258603  rack1
> DN  10.238.221.115  97.34 MB   33.3%  73a76d3f-73ef-481d-b603-0833c0ff80c2  3074457345618258602   rack1
> automaton@ip-10-139-1-113:~$ nodetool gossipinfo
> /10.238.221.115
>   SEVERITY:0.0
>   RPC_ADDRESS:0.0.0.0
>   DC:datacenter1
>   RELEASE_VERSION:1.2.13
>   LOAD:1.02066255E8
>   STATUS:NORMAL,3074457345618258602
>   SCHEMA:8b351435-81ef-3a14-adf7-8555e2f19ecd
>   NET_VERSION:6
>   RACK:rack1
>   HOST_ID:73a76d3f-73ef-481d-b603-0833c0ff80c2
> /10.139.1.113
>   RPC_ADDRESS:0.0.0.0
>   SEVERITY:0.0
>   DC:datacenter1
>   RELEASE_VERSION:2.0.4
>   LOAD:1.03750451E8
>   STATUS:NORMAL,-9223372036854775808
>   SCHEMA:dfafb212-5b8f-31cb-a80b-2ba58fcef73d
>   NET_VERSION:7
>   RACK:rack1
>   HOST_ID:33b1cd06-e17b-4332-8066-0c6c401e0cf3
> /10.139.11.168
>   SEVERITY:0.0
>   RPC_ADDRESS:0.0.0.0
>   DC:datacenter1
>   RELEASE_VERSION:1.2.13
>   LOAD:1.02245066E8
>   STATUS:NORMAL,-3074457345618258603
>   SCHEMA:8b351435-81ef-3a14-adf7-8555e2f19ecd
>   NET_VERSION:6
>   RACK:rack1
>   HOST_ID:ec97c163-8f2d-4019-a3d1-55df5e4037d4
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
