Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id E7272200B5B for ; Thu, 21 Jul 2016 22:31:22 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id E5ACF160A72; Thu, 21 Jul 2016 20:31:22 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 39806160A73 for ; Thu, 21 Jul 2016 22:31:22 +0200 (CEST) Received: (qmail 19839 invoked by uid 500); 21 Jul 2016 20:31:21 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 19678 invoked by uid 99); 21 Jul 2016 20:31:21 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 21 Jul 2016 20:31:21 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id C169F2C0D64 for ; Thu, 21 Jul 2016 20:31:20 +0000 (UTC) Date: Thu, 21 Jul 2016 20:31:20 +0000 (UTC) From: "Kaide Mu (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Assigned] (CASSANDRA-12008) Make decommission operations resumable MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 21 Jul 2016 20:31:23 -0000 [ https://issues.apache.org/jira/browse/CASSANDRA-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaide Mu reassigned CASSANDRA-12008: ------------------------------------ Assignee: Kaide Mu > Make decommission operations resumable > -------------------------------------- > > Key: CASSANDRA-12008 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12008 > Project: Cassandra > Issue Type: Improvement > Components: Streaming and Messaging > Reporter: Tom van der Woerdt > Assignee: Kaide Mu > Priority: Minor > > We're dealing with large data sets (multiple terabytes per node) and sometimes we need to add or remove nodes. These operations are very dependent on the entire cluster being up, so while we're joining a new node (which sometimes takes 6 hours or longer) a lot can go wrong and in a lot of cases something does. > It would be great if the ability to retry streams was implemented. > Example to illustrate the problem : > {code} > 03:18 PM ~ $ nodetool decommission > error: Stream failed > -- StackTrace -- > org.apache.cassandra.streaming.StreamException: Stream failed > at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) > at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) > at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) > at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) > at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) > at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) > at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210) > at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186) > at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430) > at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:622) > at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:486) > at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:274) > at java.lang.Thread.run(Thread.java:745) > 08:04 PM ~ $ nodetool decommission > nodetool: Unsupported operation: Node in LEAVING state; wait for status to become normal or restart > See 'nodetool help' or 'nodetool help '. > {code} > Streaming failed, probably due to load : > {code} > ERROR [STREAM-IN-/] 2016-06-14 18:05:47,275 StreamSession.java:520 - [Stream #] Streaming error occurred > java.net.SocketTimeoutException: null > at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:211) ~[na:1.8.0_77] > at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_77] > at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.8.0_77] > at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54) ~[apache-cassandra-3.0.6.jar:3.0.6] > at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:268) ~[apache-cassandra-3.0.6.jar:3.0.6] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77] > {code} > If implementing retries is not possible, can we have a 'nodetool decommission resume'? -- This message was sent by Atlassian JIRA (v6.3.4#6332)