cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12008) Make decommission operations resumable
Date Sat, 30 Jul 2016 01:09:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400338#comment-15400338 ]

Paulo Motta commented on CASSANDRA-12008:
-----------------------------------------

Thanks for the update! See follow-up below:

bq. I've added a new strategy, please let me know what do you think about it.

Instead of building the {{streamedRangesPerEndpoints}} {{Map<InetAddress, Set<Range<Token>>>}}
manually, maybe it's better to modify {{getStreamedRanges}} to take a description and keyspace
as arguments and return a {{Map<InetAddress, Set<Range<Token>>>}} by querying
{{"SELECT * FROM system.streamed_ranges WHERE operation = ? AND keyspace_name = ?"}}.
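Something like this inside {{SystemKeyspace}} (just a sketch to illustrate the shape; the
{{peer}}/{{ranges}} column names and the {{byteBufferToRange}} helper are what I'd expect by
analogy with {{system.available_ranges}}, so adjust to whatever the patch actually uses):

{code}
// Sketch only: query system.streamed_ranges filtered by operation (description) and
// keyspace, then deserialize each row into the per-endpoint set of streamed ranges.
public static synchronized Map<InetAddress, Set<Range<Token>>> getStreamedRanges(String description, String keyspace, IPartitioner partitioner)
{
    Map<InetAddress, Set<Range<Token>>> result = new HashMap<>();
    String query = "SELECT * FROM system.streamed_ranges WHERE operation = ? AND keyspace_name = ?";
    for (UntypedResultSet.Row row : QueryProcessor.executeInternal(query, description, keyspace))
    {
        InetAddress peer = row.getInetAddress("peer");
        Set<Range<Token>> ranges = new HashSet<>();
        for (ByteBuffer rawRange : row.getSet("ranges", BytesType.instance))
            ranges.add(byteBufferToRange(rawRange, partitioner));
        result.put(peer, ranges);
    }
    return result;
}
{code}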

This way you can query {{getStreamedRanges}} directly to filter already transferred ranges
when iterating {{rangesToStreamByKeyspace}}.
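
For example, the call site in {{StorageService.unbootstrap}} could look roughly like this
("Unbootstrap" as the operation description and the {{Multimap}} iteration are assumptions
based on the current code):

{code}
// Rough usage sketch: drop range/endpoint pairs that were already streamed in a
// previous (failed) attempt before building the new stream plan.
for (Map.Entry<String, Multimap<Range<Token>, InetAddress>> entry : rangesToStreamByKeyspace.entrySet())
{
    String keyspace = entry.getKey();
    Map<InetAddress, Set<Range<Token>>> streamedRanges =
        SystemKeyspace.getStreamedRanges("Unbootstrap", keyspace, DatabaseDescriptor.getPartitioner());

    Iterator<Map.Entry<Range<Token>, InetAddress>> it = entry.getValue().entries().iterator();
    while (it.hasNext())
    {
        Map.Entry<Range<Token>, InetAddress> rangeWithEndpoint = it.next();
        Set<Range<Token>> alreadyStreamed = streamedRanges.get(rangeWithEndpoint.getValue());
        if (alreadyStreamed != null && alreadyStreamed.contains(rangeWithEndpoint.getKey()))
            it.remove(); // this endpoint already received this range, skip it
    }
}
{code}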

bq. So instead we will obtain StreamSession from StreamTransferTask.getSession() when each
StreamTransferTask is complete i.e when StreamStateStore.handleStreamEvent is invoked. All
these means that we are going to only pass its responsible keyspace.

I think we can simplify that: instead of adding {{transferTasks}} to {{SessionCompleteEvent}},
we can simply add the session description and {{transferredRangesPerKeyspace}}, which is all
we need to populate the streamed ranges on {{StreamStateStore}}.
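
Something along these lines (sketch only; the field names, the
{{session.getTransferredRangesPerKeyspace()}} accessor and the {{SystemKeyspace.updateStreamedRanges}}
helper are placeholders for whatever the patch ends up calling them):

{code}
// In StreamEvent.SessionCompleteEvent: carry the description and the transferred
// ranges instead of the transfer tasks.
public static class SessionCompleteEvent extends StreamEvent
{
    public final InetAddress peer;
    public final boolean success;
    public final int sessionIndex;
    public final String description;
    public final Map<String, Set<Range<Token>>> transferredRangesPerKeyspace;

    public SessionCompleteEvent(StreamSession session)
    {
        super(Type.STREAM_COMPLETE, session.planId());
        this.peer = session.peer;
        this.success = session.isSuccess();
        this.sessionIndex = session.sessionIndex();
        this.description = session.description();
        this.transferredRangesPerKeyspace = session.getTransferredRangesPerKeyspace(); // placeholder accessor
    }
}

// In StreamStateStore.handleStreamEvent: persist the transferred ranges once the
// session completes successfully.
public void handleStreamEvent(StreamEvent event)
{
    if (event.eventType == StreamEvent.Type.STREAM_COMPLETE)
    {
        StreamEvent.SessionCompleteEvent se = (StreamEvent.SessionCompleteEvent) event;
        if (se.success)
        {
            for (Map.Entry<String, Set<Range<Token>>> entry : se.transferredRangesPerKeyspace.entrySet())
                SystemKeyspace.updateStreamedRanges(se.description, se.peer, entry.getKey(), entry.getValue()); // placeholder helper
        }
    }
}
{code}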

A minor nit is that the transferred ranges are always being overridden on {{addTransferRanges}},
while you should append to the existing set if it's already present in {{transferredRangesPerKeyspace}}
(see the one-liner below).
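
For example (assuming {{transferredRangesPerKeyspace}} is a plain {{HashMap<String, Set<Range<Token>>>}}):

{code}
// Append to any existing set instead of replacing it on repeated addTransferRanges calls.
transferredRangesPerKeyspace.computeIfAbsent(keyspace, k -> new HashSet<>()).addAll(ranges);
{code}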

bq. Don't know if there's some problem with current implementation or there's something weird
in the set-up, but it skips twice the same range:

This is for different keyspaces, so you should add the keyspace name to the log message so
it's not confusing.
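
For example (message wording is just illustrative):

{code}
logger.debug("Skipping transfer of range {} of keyspace {} to endpoint {} (already streamed)", range, keyspace, endpoint);
{code}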

bq. I think it's the set-up itself since StorageService.getChangedRangesForLeaving is also
returning the same range twice

That's probably happening for the same reason as above, so it would be a good idea to add the
keyspace name to that log as well.

> Make decommission operations resumable
> --------------------------------------
>
>                 Key: CASSANDRA-12008
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12008
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Tom van der Woerdt
>            Assignee: Kaide Mu
>            Priority: Minor
>
> We're dealing with large data sets (multiple terabytes per node) and sometimes we need to add or remove nodes.
> These operations are very dependent on the entire cluster being up, so while we're joining a new node
> (which sometimes takes 6 hours or longer) a lot can go wrong and in a lot of cases something does.
> It would be great if the ability to retry streams was implemented.
> Example to illustrate the problem :
> {code}
> 03:18 PM   ~ $ nodetool decommission
> error: Stream failed
> -- StackTrace --
> org.apache.cassandra.streaming.StreamException: Stream failed
>         at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>         at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
>         at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>         at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>         at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>         at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>         at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
>         at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
>         at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
>         at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:622)
>         at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:486)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:274)
>         at java.lang.Thread.run(Thread.java:745)
> 08:04 PM   ~ $ nodetool decommission
> nodetool: Unsupported operation: Node in LEAVING state; wait for status to become normal or restart
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> Streaming failed, probably due to load :
> {code}
> ERROR [STREAM-IN-/<ipaddr>] 2016-06-14 18:05:47,275 StreamSession.java:520 - [Stream #<streamid>] Streaming error occurred
> java.net.SocketTimeoutException: null
>         at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:211) ~[na:1.8.0_77]
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_77]
>         at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.8.0_77]
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:268) ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> {code}
> If implementing retries is not possible, can we have a 'nodetool decommission resume'?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
