cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6622) Streaming session failures during node replace using replace_address
Date Mon, 27 Jan 2014 14:18:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882833#comment-13882833 ]

Brandon Williams commented on CASSANDRA-6622:
---------------------------------------------

If the node is already marked dead, the FailureDetector isn't going to convict() again just
because it receives a dead state like hibernate.  It will call onDead again for subscribers,
but StreamSession doesn't care about that.  What it does care about, however, is onRestart
being called since there was a generation change, and that will fail the session.
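The distinction drawn above — onDead being a no-op for streams while onRestart (a generation change) fails the session — can be sketched with a toy subscriber model. The class and method names below are simplified illustrations loosely mirroring Cassandra's IEndpointStateChangeSubscriber, not the actual API:

```java
// Toy model: a stream session subscribed to gossip events ignores onDead
// (e.g. a hibernate state on an already-dead node) but fails itself on
// onRestart, which signals a generation change. Names are illustrative.
import java.util.ArrayList;
import java.util.List;

interface StateChangeSubscriber {
    void onDead(String endpoint);      // endpoint marked down
    void onRestart(String endpoint);   // generation change observed
}

class ToyStreamSession implements StateChangeSubscriber {
    final String peer;
    boolean failed = false;

    ToyStreamSession(String peer) { this.peer = peer; }

    @Override
    public void onDead(String endpoint) {
        // Streams don't care about dead states such as hibernate.
    }

    @Override
    public void onRestart(String endpoint) {
        // A generation change means the peer effectively restarted:
        // the session cannot survive it, so mark it failed.
        if (endpoint.equals(peer))
            failed = true;
    }
}

class ToyGossiper {
    final List<StateChangeSubscriber> subscribers = new ArrayList<>();

    void applyState(String endpoint, boolean alreadyDead, boolean generationChanged) {
        for (StateChangeSubscriber s : subscribers) {
            if (generationChanged)
                s.onRestart(endpoint);   // fails any stream session with that peer
            else if (alreadyDead)
                s.onDead(endpoint);      // re-notified, but no convict(); harmless
        }
    }
}

public class GossipStreamDemo {
    public static void main(String[] args) {
        ToyGossiper gossiper = new ToyGossiper();
        ToyStreamSession session = new ToyStreamSession("10.0.0.1");
        gossiper.subscribers.add(session);

        gossiper.applyState("10.0.0.1", true, false);   // hibernate on a dead node
        System.out.println("after onDead: failed=" + session.failed);

        gossiper.applyState("10.0.0.1", false, true);   // generation change
        System.out.println("after onRestart: failed=" + session.failed);
    }
}
```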

That said, delaying the stream until gossip has propagated should certainly solve the issue,
though I'm not sure streaming should be failing based on gossip/FD events (/cc [~yukim]).
However, instead of sleeping for BROADCAST_INTERVAL we can save half the time and sleep for
RING_DELAY, since if gossip hasn't propagated fully by then there are bigger problems.  WDYT
[~thobbs]?
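The fix being discussed amounts to sleeping before bootstrap streaming begins so peers' gossipers can mark the hibernating node down first. A minimal sketch of that idea follows; the property name mirrors Cassandra's `-Dcassandra.ring_delay_ms` convention (default 30 seconds), but the bootstrap wrapper itself is hypothetical:

```java
// Sketch of the proposed fix: delay the replacing node's bootstrap streaming
// by RING_DELAY so gossip state has time to propagate to peers. If gossip
// hasn't propagated after RING_DELAY, the cluster has bigger problems, so a
// fixed sleep is considered sufficient.
public class ReplaceDelaySketch {
    static int ringDelayMs() {
        // Mirrors Cassandra's convention: overridable via
        // -Dcassandra.ring_delay_ms, defaulting to 30 seconds.
        return Integer.getInteger("cassandra.ring_delay_ms", 30 * 1000);
    }

    // Hypothetical wrapper: sleep for RING_DELAY, then start bootstrap streaming.
    static void bootstrapWithDelay(Runnable bootstrap) throws InterruptedException {
        Thread.sleep(ringDelayMs());
        bootstrap.run();
    }

    public static void main(String[] args) throws InterruptedException {
        System.setProperty("cassandra.ring_delay_ms", "10"); // shrink for the demo
        bootstrapWithDelay(() -> System.out.println("bootstrap streaming started"));
    }
}
```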

> Streaming session failures during node replace using replace_address
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-6622
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6622
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: RHEL6, cassandra-2.0.4
>            Reporter: Ravi Prasad
>         Attachments: 6622-2.0.txt
>
>
> When using replace_address, the Gossiper ApplicationState is set to hibernate, which is a down state. We are seeing that peer nodes receive the streaming plan request even before the Gossiper on them marks the replacing node as dead. As a result, streaming on the peer nodes convicts the replacing node by closing the stream handler.
> I think making the StorageService thread on the replacing node sleep for BROADCAST_INTERVAL before bootstrapping would avoid this scenario.
> Relevant logs from the peer node (note that the Gossiper on the peer node marks the replacing node as down 2 seconds after the streaming init request):
>  INFO [STREAM-INIT-/x.x.x.x:46436] 2014-01-26 20:42:24,388 StreamResultFuture.java (line 116) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Received streaming plan for Bootstrap
> ....
>  INFO [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete
>  WARN [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed
>  INFO [GossipStage:1] 2014-01-26 20:42:25,242 Gossiper.java (line 850) InetAddress /x.x.x.x is now DOWN
> ERROR [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,766 StreamSession.java (line 410) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Streaming error occurred
> java.lang.RuntimeException: Outgoing stream handler has been closed
>         at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:175)
>         at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
>         at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293)
>         at java.lang.Thread.run(Thread.java:722)
>  INFO [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete
>  WARN [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
