cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (CASSANDRA-12008) Allow retrying failed streams (or stop them from failing)
Date Thu, 16 Jun 2016 16:55:05 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Paulo Motta updated CASSANDRA-12008:
------------------------------------
    Comment: was deleted

(was: In this specific case it seems the streaming failed due to low {{streaming_socket_timeout}}
value. We just found out our previous default of 1 hour was too low, and raised that to 24
hours on CASSANDRA-11840, on 3.0.7. Could you try increasing that and see if it helps with
failed decommissions?)

> Allow retrying failed streams (or stop them from failing)
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-12008
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12008
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Lifecycle
>            Reporter: Tom van der Woerdt
>
> We're dealing with large data sets (multiple terabytes per node) and sometimes we need
to add or remove nodes. These operations are very dependent on the entire cluster being up,
so while we're joining a new node (which sometimes takes 6 hours or longer) a lot can go wrong
and in a lot of cases something does.
> It would be great if the ability to retry streams was implemented.
> Example to illustrate the problem :
> {code}
> 03:18 PM   ~ $ nodetool decommission
> error: Stream failed
> -- StackTrace --
> org.apache.cassandra.streaming.StreamException: Stream failed
>         at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>         at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
>         at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>         at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>         at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>         at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>         at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
>         at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
>         at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
>         at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:622)
>         at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:486)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:274)
>         at java.lang.Thread.run(Thread.java:745)
> 08:04 PM   ~ $ nodetool decommission
> nodetool: Unsupported operation: Node in LEAVING state; wait for status to become normal
or restart
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> Streaming failed, probably due to load :
> {code}
> ERROR [STREAM-IN-/<ipaddr>] 2016-06-14 18:05:47,275 StreamSession.java:520 - [Stream
#<streamid>] Streaming error occurred
> java.net.SocketTimeoutException: null
>         at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:211) ~[na:1.8.0_77]
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_77]
>         at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
~[na:1.8.0_77]
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:268)
~[apache-cassandra-3.0.6.jar:3.0.6]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> {code}
> If implementing retries is not possible, can we have a 'nodetool decommission resume'?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message