Date: Fri, 6 May 2016 14:37:12 +0000 (UTC)
From: "Paulo Motta (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8343) Secondary index creation causes moves/bootstraps to fail

    [ https://issues.apache.org/jira/browse/CASSANDRA-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274123#comment-15274123 ]

Paulo Motta commented on CASSANDRA-8343:
----------------------------------------

While trying to reproduce this once more on 2.2-HEAD
before submitting a final patch, I noticed that bootstrap was not failing when secondary index creation takes longer than {{streaming_socket_timeout_in_ms}}: the stream session fails on the sender side, which closes the socket, but completes successfully on the bootstrapping node. The strange thing is that, although the socket was closed on the sender side after {{streaming_socket_timeout_in_ms}}, the receiver still sent the last {{complete}} message on the "closed" socket without any failure. I'll see what's going on.

> Secondary index creation causes moves/bootstraps to fail
> --------------------------------------------------------
>
>                 Key: CASSANDRA-8343
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8343
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Frisch
>            Assignee: Paulo Motta
>
> Node moves/bootstraps fail if the stream timeout is set to a value too low for secondary index creation to complete. This happens because, at the end of the very last stream, the StreamInSession.closeIfFinished() function calls maybeBuildSecondaryIndexes on every column family. If the stream time plus the index creation time for all CFs exceeds the stream timeout, then the socket closes from the sender's side, the receiver of the stream tries to write to said socket because it's not null, an IOException is thrown but not caught in closeIfFinished(), the exception is caught somewhere and not logged, AbstractStreamSession.close() is never called, and the CountDownLatch is never decremented. This causes the move/bootstrap to continue forever until the node is restarted.
> This problem of stream time + secondary index creation time exists on decommissioning/unbootstrap as well, but since it's on the sending side, the timeout triggers the onFailure() callback, which does decrement the CountDownLatch, leading to completion.
> A cursory glance at the 2.0 code leads me to believe this problem would exist there as well.
> Temporary workaround: set a really high/infinite stream timeout.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
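
The hang described in the issue (an IOException swallowed before the CountDownLatch is decremented) can be illustrated with a minimal, self-contained sketch. This is not Cassandra's code: the class LatchHangSketch and the methods sendComplete() and runSession() are hypothetical names made up for illustration, and the 200 ms wait merely stands in for "waiting forever". It only shows, under those assumptions, why counting the latch down on every exit path releases the waiter while counting down only on success does not.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchHangSketch {

    // Hypothetical stand-in for the receiver's final write on a socket
    // that the sender has already closed after its timeout.
    static void sendComplete() throws IOException {
        throw new IOException("socket closed by sender after timeout");
    }

    /**
     * Simulates one stream session and reports whether the waiter was released.
     *
     * @param countDownInFinally when true, the latch is decremented even if
     *                           the final write fails
     */
    static boolean runSession(boolean countDownInFinally) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);

        Thread session = new Thread(() -> {
            try {
                try {
                    sendComplete();
                    latch.countDown(); // reached only if the write succeeds
                } finally {
                    if (countDownInFinally) {
                        latch.countDown(); // runs on success and on failure
                    }
                }
            } catch (IOException e) {
                // Swallowed on purpose, mirroring "caught somewhere and not logged".
            }
        });
        session.start();
        session.join();

        // A real move/bootstrap would block indefinitely; a short timeout
        // is used here so the sketch terminates.
        return latch.await(200, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("count down on success only: released=" + runSession(false));
        System.out.println("count down in finally:      released=" + runSession(true));
    }
}
```

In the first call the waiter times out because nothing ever decrements the latch; in the second the finally block releases it despite the failed write. This corresponds to the asymmetry noted in the description, where the sending side's onFailure() callback does decrement the latch.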