Message-ID: <549927DA.9010205@avast.com>
Date: Tue, 23 Dec 2014 09:29:14 +0100
From: Jiri Horky
To: user@cassandra.apache.org
CC: FF Systems
Subject: Re: Node down during move
In-Reply-To: <54947A90.5040406@avast.com>

Hi,

just a follow-up. We have now seen this behavior multiple times. It seems
that, right after the streaming is over, the receiving node loses
connectivity to the cluster: it thinks that it is the sole online node,
whereas the rest of the cluster thinks that it is the only offline node.
I am not sure what causes that, but it is reproducible. Restarting the
affected node helps.

We have 3 datacenters (RF=1 for each datacenter) where we are moving the
tokens. This happens in only one of them.

Regards
Jiri Horky

On 12/19/2014 08:20 PM, Jiri Horky wrote:
> Hi list,
>
> we added a new node to an existing 8-node cluster with C* 1.2.9 without
> vnodes, and because we are almost totally out of space, we are shuffling
> the tokens one node after another (not in parallel).
> During one of these move operations, the receiving node died and thus
> the streaming failed:
>
> WARN [Streaming to /X.Y.Z.18:2] 2014-12-19 19:25:56,227 StorageService.java (line 3703) Streaming to /X.Y.Z.18 failed
> INFO [RMI TCP Connection(12940)-X.Y.Z.17] 2014-12-19 19:25:56,233 ColumnFamilyStore.java (line 629) Enqueuing flush of Memtable-local@433096244(70/70 serialized/live bytes, 2 ops)
> INFO [FlushWriter:3772] 2014-12-19 19:25:56,238 Memtable.java (line 461) Writing Memtable-local@433096244(70/70 serialized/live bytes, 2 ops)
> ERROR [Streaming to /X.Y.Z.18:2] 2014-12-19 19:25:56,246 CassandraDaemon.java (line 192) Exception in thread Thread[Streaming to /X.Y.Z.18:2,5,RMI Runtime]
> java.lang.RuntimeException: java.io.IOException: Broken pipe
>     at com.google.common.base.Throwables.propagate(Throwables.java:160)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>
> After restarting the receiving node, we tried to perform the move again,
> but it failed with:
>
> Exception in thread "main" java.io.IOException: target token 113427455640312821154458202477256070486 is already owned by another node.
>     at org.apache.cassandra.service.StorageService.move(StorageService.java:2930)
>
> So we tried to move it to a token just 1 higher, to trigger the
> movement.
> This didn't move anything, but finished successfully:
>
> INFO [Thread-5520] 2014-12-19 20:00:24,689 StreamInSession.java (line 199) Finished streaming session 4974f3c0-87b1-11e4-bf1b-97d9ac6bd256 from /X.Y.Z.18
>
> Now, it is quite improbable that the first streaming had completed and
> the node died just after copying everything, as the ERROR was the last
> message about streaming in the logs. Is there any way to make sure the
> data has really been moved, so that running nodetool cleanup is safe?
>
> Thank you.
> Jiri Horky
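
For illustration only (this is not from the original thread): a minimal Python sketch of why a move to "token + 1" would be expected to stream essentially nothing. It assumes a non-vnode ring in which each node's primary range is (previous token, own token], and a RandomPartitioner-style token space of 2**127. The neighbour token PREV and the helper primary_range_size are hypothetical; only TARGET is taken from the error message quoted above.

```python
# Sketch only: models primary-range ownership on a non-vnode token ring.
# Assumptions (not from the thread): token space of 2**127 and ownership
# of the half-open range (previous token, own token].

RING_SIZE = 2**127  # assumed RandomPartitioner-style token space


def primary_range_size(tokens, token):
    """Return how many tokens lie in (predecessor, token], with wrap-around."""
    ring = sorted(tokens)
    idx = ring.index(token)
    prev = ring[idx - 1]  # index 0 wraps around to the last token on the ring
    # A single-node ring has prev == token and owns the whole ring.
    return (token - prev) % RING_SIZE or RING_SIZE


# Token from the "already owned by another node" error in the quoted mail.
TARGET = 113427455640312821154458202477256070486
# Hypothetical neighbour token, chosen only to make the example concrete.
PREV = 5 * RING_SIZE // 9

before = primary_range_size([PREV, TARGET], TARGET)
after = primary_range_size([PREV, TARGET + 1], TARGET + 1)
gained = after - before  # how many extra tokens the node owns after the move

print("tokens gained by moving to TARGET + 1:", gained)
```

Under these assumptions the node gains a single token out of the whole ring, which would be consistent with the second move finishing without transferring any data. It also suggests that the successful second move by itself says little about whether the data from the first, failed stream actually arrived.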