cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12008) Make decommission operations resumable
Date Mon, 01 Aug 2016 16:52:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402388#comment-15402388
] 

Paulo Motta commented on CASSANDRA-12008:
-----------------------------------------

Thanks for the update. This is looking better and we're nearly done; see the follow-up below:
* Code
** Fix indentation of {{logger.debug("DECOMMISSIONING")}} 
** The {{isDecommissioning.get()}} check should use a {{compareAndSet}} to avoid starting simultaneous
decommission sessions. See the {{isRebuilding}} check. Also, add a test to verify that it's not
possible to start multiple decommissions simultaneously, based on the solution in CASSANDRA-11687,
to avoid test flakiness.
** In {{SessionCompleteEvent}}, use {{Collections.unmodifiableMap}} when copying the {{transferredRangesPerKeyspace}}
map to avoid modifications to the map.
** In order to avoid allocating a {{HashSet}} when it's not necessary, change this {noformat}
            Set<Range<Token>> toBeUpdated = new HashSet<>();
            if (transferredRangesPerKeyspace.containsKey(keyspace))
            {
                toBeUpdated = transferredRangesPerKeyspace.get(keyspace);
            }
{noformat} with this: {noformat}
            Set<Range<Token>> toBeUpdated = transferredRangesPerKeyspace.get(keyspace);
            if (toBeUpdated == null)
            {
                toBeUpdated = new HashSet<>();
            }
{noformat}
** {{Error while decommissioning node}} is never printed because the {{ExecutionException}}
is being wrapped in a {{RuntimeException}} in {{unbootstrap}}, so perhaps you can modify {{unbootstrap}}
to throw {{ExecutionException | InterruptedException}} and catch that in {{decommission}} to
wrap it in a {{RuntimeException}}.
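
As a rough illustration of the {{compareAndSet}} and exception-propagation points above, the sketch below shows one way the two could fit together. It is not the actual {{StorageService}} code: the {{DecommissionGuard}} class and the {{Unbootstrap}} callback are hypothetical stand-ins, and {{System.err}} stands in for the real logger.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the relevant part of StorageService; not the actual Cassandra code.
class DecommissionGuard
{
    // Hypothetical callback standing in for unbootstrap(): it propagates the
    // checked exceptions instead of wrapping them itself.
    interface Unbootstrap
    {
        void run() throws ExecutionException, InterruptedException;
    }

    private final AtomicBoolean isDecommissioning = new AtomicBoolean(false);

    void decommission(Unbootstrap unbootstrap)
    {
        // compareAndSet tests and sets the flag in one atomic step, so only one
        // caller can pass; a separate get() followed by set() would leave a race window.
        if (!isDecommissioning.compareAndSet(false, true))
            throw new IllegalStateException("Node is already being decommissioned");

        try
        {
            unbootstrap.run();
        }
        catch (ExecutionException | InterruptedException e)
        {
            // The checked exception reaches this level, so the message can be
            // logged here before wrapping (System.err stands in for the logger).
            System.err.println("Error while decommissioning node: " + e);
            throw new RuntimeException("Error while decommissioning node", e);
        }
        finally
        {
            // Reset the flag so a later decommission attempt is allowed.
            isDecommissioning.set(false);
        }
    }
}
```

Resetting the flag in {{finally}} is a choice made for this sketch so that a failed attempt can be retried; whether the real flag should be reset on failure depends on how resuming is meant to work.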

* dtests
** Simply running {{stress read}} will not fail if the keys are not there; you need to either
compare the retrieved keys or check that there was no failure in the stress process (see {{bootstrap_test}}
for examples).
** When verifying that the retrieved data is correct in {{resumable_decommission_test}}, you
need to stop either node1 or node3 when querying the other; otherwise the data may be on only
one of these nodes (while it must be on both nodes, since RF=2 and N=2).
** Perhaps reduce the number of keys to 10k so the test will be faster.
** In {{resumable_decommission_test}}, set {{stream_throughput_outbound_megabits_per_sec}}
to {{1}} so the streaming will be slower and allow more time for interrupting.
** Perhaps it's better for {{InterruptDecommission}} to watch for {{rebuild from dc}}, since
this is printed before {{"Executing streaming plan for Unbootstrap"}}.
** Instead of counting {{decommission_error}}, you can add a {{self.fail("second rebuild
should fail")}} after {{node2.nodetool('decommission')}} and, in the {{except}} part, perhaps
check that the following message is printed in the logs: {{Error while decommissioning node}}
- see the new version of {{simple_rebuild_test}} from CASSANDRA-11687.
** bq. I found that streamed range skipping behaviour log check-up is not working
*** This is probably because the {{Range (-2556370087840976503,-2548250017122308073] already
in /127.0.0.3, skipping}} message is only printed to {{debug.log}}, so you should pass
{{filename='debug.log'}} to {{watch_log_for}}.

When you modify {{StreamStateStore}} to {{updateStreamedRanges}} for requested ranges (i.e.
bootstrap), there could be a collision between received and transferred ranges for the same
peer. While this collision will not show up in decommission, bootstrap and rebuild, since
we only transfer in one direction, this may be confusing and a source of problems in the future.
So, in order to avoid creating another table to support that later, I think we can
modify {{streamed_ranges}} to include an {{outgoing}} boolean primary key field indicating
whether it's an incoming or outgoing transfer. WDYT [~yukim] [~kdmu]?
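
To make the collision concern concrete, here is a minimal in-memory model of the idea, not the actual {{StreamStateStore}} or system table schema. The {{StreamedRangesModel}} class, its {{Key}} fields, and the use of plain strings for ranges are all assumptions made for illustration; the only point is that once {{outgoing}} is part of the key, received and transferred ranges for the same peer no longer overwrite each other.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// In-memory model only; the real state would live in a system table
// whose schema is not shown in this comment.
class StreamedRangesModel
{
    // Hypothetical key: peer and keyspace plus the proposed direction flag.
    static final class Key
    {
        final String peer;
        final String keyspace;
        final boolean outgoing;

        Key(String peer, String keyspace, boolean outgoing)
        {
            this.peer = peer;
            this.keyspace = keyspace;
            this.outgoing = outgoing;
        }

        @Override
        public boolean equals(Object o)
        {
            if (this == o)
                return true;
            if (!(o instanceof Key))
                return false;
            Key other = (Key) o;
            return outgoing == other.outgoing
                   && peer.equals(other.peer)
                   && keyspace.equals(other.keyspace);
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(peer, keyspace, outgoing);
        }
    }

    // Ranges are plain strings here to keep the sketch self-contained.
    private final Map<Key, Set<String>> ranges = new HashMap<>();

    void updateStreamedRanges(String peer, String keyspace, boolean outgoing, String range)
    {
        ranges.computeIfAbsent(new Key(peer, keyspace, outgoing), k -> new HashSet<>()).add(range);
    }

    Set<String> getStreamedRanges(String peer, String keyspace, boolean outgoing)
    {
        Set<String> found = ranges.get(new Key(peer, keyspace, outgoing));
        return found == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(found);
    }
}
```

With this shape, a lookup for outgoing ranges to a peer returns only what was transferred to it, even if ranges were also received from the same peer for the same keyspace.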

> Make decommission operations resumable
> --------------------------------------
>
>                 Key: CASSANDRA-12008
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12008
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Tom van der Woerdt
>            Assignee: Kaide Mu
>            Priority: Minor
>
> We're dealing with large data sets (multiple terabytes per node) and sometimes we need
to add or remove nodes. These operations are very dependent on the entire cluster being up,
so while we're joining a new node (which sometimes takes 6 hours or longer) a lot can go wrong
and in a lot of cases something does.
> It would be great if the ability to retry streams was implemented.
> Example to illustrate the problem :
> {code}
> 03:18 PM   ~ $ nodetool decommission
> error: Stream failed
> -- StackTrace --
> org.apache.cassandra.streaming.StreamException: Stream failed
>         at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>         at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
>         at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>         at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>         at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>         at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>         at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
>         at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
>         at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
>         at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:622)
>         at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:486)
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:274)
>         at java.lang.Thread.run(Thread.java:745)
> 08:04 PM   ~ $ nodetool decommission
> nodetool: Unsupported operation: Node in LEAVING state; wait for status to become normal
or restart
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> Streaming failed, probably due to load :
> {code}
> ERROR [STREAM-IN-/<ipaddr>] 2016-06-14 18:05:47,275 StreamSession.java:520 - [Stream
#<streamid>] Streaming error occurred
> java.net.SocketTimeoutException: null
>         at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:211) ~[na:1.8.0_77]
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_77]
>         at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
~[na:1.8.0_77]
>         at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
~[apache-cassandra-3.0.6.jar:3.0.6]
>         at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:268)
~[apache-cassandra-3.0.6.jar:3.0.6]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> {code}
> If implementing retries is not possible, can we have a 'nodetool decommission resume'?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
