Date: Mon, 8 Dec 2014 21:13:58 +0200
Subject: Re: Cassandra 2.1.2 node stuck on joining the cluster
From: Omri Bahumi
To: user@cassandra.apache.org

Is there any chance that something along the network path is causing
connectivity issues? What does the network between this node and the
other nodes look like?

Can you try transferring a big file between the two servers? Perhaps you
have an MTU issue that causes TCP path MTU (PMTU) discovery to fail.

Can you send large pings between the servers? Try pinging from both
sides with large packets (5000 and 10000 bytes).

On Mon, Dec 8, 2014 at 3:22 PM, Krzysztof Zarzycki wrote:
> Hi Cassandra users,
>
> I'm trying but failing to join a new (well, old, but wiped
> out/decommissioned) node to an existing cluster.
>
> Currently I have a cluster that consists of 2 nodes and runs C* 2.1.2.
> I start a third node with 2.1.2, it gets to joining state, it
> bootstraps, i.e. streams some data as shown by nodetool netstats, but
> after some time, it gets stuck. From that point nothing gets streamed,
> and the new node stays in joining state. I restarted the node multiple
> times; each time it streamed more data, but then got stuck again.
>
> Other facts:
>
> I don't see any errors in the log on any of the nodes.
> The connectivity seems fine, I can ping, netcat to port 7000 all ways.
> I have ~ 200 GB load per running node, replication 2, 16 tokens.
> Load of a new node got to around 300GBs now.
>
> The bootstrapping process stops in the middle of streaming some table,
> always after sending exactly 10MB of some SSTable, e.g.:
>
> $ nodetool netstats | grep -P -v "bytes\(100"
> Mode: NORMAL
> Bootstrap e0abc160-7ca8-11e4-9bc2-cf6aed12690e
>     /192.168.200.16
>         Sending 516 files, 124933333900 bytes total
>             /home/data/cassandra/data/some_ks/page_view-2a2410103f4411e4a266db7096512b05/some_ks-page_view-ka-13890-Data.db 10485760/167797071 bytes(6%) sent to idx:0/192.168.200.16
> Read Repair Statistics:
> Attempted: 2016371
> Mismatch (Blocking): 0
> Mismatch (Background): 168721
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0       55802918
> Responses                       n/a         0         425963
>
> I've been trying to join this node for several days and I don't know
> what to do with it... I'll be grateful for any help!
>
> Cheers,
> Krzysztof Zarzycki
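
A minimal sketch of the large-ping check suggested in the reply above. It only builds the command lines and does not run them. Assumptions not in the thread: Linux iputils `ping` (where `-s` sets the payload size and `-M do` sets the Don't Fragment bit), plain IPv4 with a 20-byte header, and candidate MTUs of 1500 and 9000. The helper names are hypothetical, and 192.168.200.16 is simply the peer address from the quoted netstats output.

```python
import shlex

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu <= IPV4_HEADER + ICMP_HEADER:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo")
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host: str, payload: int, dont_fragment: bool = True, count: int = 3) -> str:
    """Build an iputils (Linux) ping command line; nothing is executed here."""
    args = ["ping", "-c", str(count), "-s", str(payload)]
    if dont_fragment:
        # With DF set, a packet larger than the path MTU fails instead of
        # being fragmented, which exposes a path MTU problem.
        args += ["-M", "do"]
    args.append(host)
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    host = "192.168.200.16"  # peer address taken from the quoted netstats output
    # Probe the assumed MTUs with DF set.
    for mtu in (1500, 9000):
        print(ping_command(host, icmp_payload_for_mtu(mtu)))
    # The sizes from the reply, with fragmentation allowed.
    for size in (5000, 10000):
        print(ping_command(host, size, dont_fragment=False))
```

Running the printed commands from both servers separates the two cases. If the DF probes fail at a size the interfaces are supposed to carry, that points to a path MTU mismatch. If only the fragmented 5000 and 10000 byte pings fail, something on the path is likely dropping fragments.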