From: Michael Kjellman
To: "user@cassandra.apache.org"
Date: Tue, 19 Feb 2013 08:29:49 -0800
Subject: Re: Long running nodetool repair

This is very normal (unfortunately). Are you doing a repair -pr or a straight-up repair?

Does nodetool netstats show anything? I frequently see repair hang in 1.2.1, and I haven't been able to figure out why yet. Feel free to take a stack dump with jstack on the node doing the repair and see if there are any deadlocks potentially occurring after the Merkle trees are received.

And to help more, do you have the last logs after AntiEntropy? Any streaming sessions from other nodes?

The bug is being tracked here: https://issues.apache.org/jira/browse/CASSANDRA-5146

Best,
Michael

From: Haithem Jarraya <haithem.jarraya@struq.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, February 19, 2013 1:29 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Long running nodetool repair

Hi,

I am new to Cassandra and I am not sure if this is normal behavior, but nodetool repair runs for too long even for a small dataset per node. As I write this, the nodetool repair I started last night at 18:41 is still running at 9:18, and the size of my data is only ~500 MB per node.

We have:
3-node cluster in DC1 with RF 3
1-node cluster in DC2 with RF 1
1-node cluster in DC3 with RF 1

and we are running Cassandra 1.2.1 with 256 vnodes.

In the Cassandra logs I no longer see AntiEntropy entries, only CompactionTask and FlushWriter.

Is this normal behaviour for nodetool repair?
Does the running time grow linearly with the size of the data?

Any help or direction will be much appreciated.

Thanks,

H
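
For anyone following the jstack suggestion in the reply above, here is a minimal sketch of how a saved thread dump might be checked. It assumes the dump was written to a text file beforehand (for example with `jstack <pid> > dump.txt`) and relies on two strings that HotSpot's jstack normally prints: the "Java-level deadlock" banner and the `java.lang.Thread.State:` lines. The function name `summarize_thread_dump` and the sample dump are hypothetical and only for illustration; they do not come from Cassandra or from the thread.

```python
import re
from collections import Counter

# jstack prints each thread's state on a line such as:
#   java.lang.Thread.State: BLOCKED (on object monitor)
THREAD_STATE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def summarize_thread_dump(dump_text):
    """Return (deadlock_reported, Counter of thread states) for a jstack dump.

    deadlock_reported is True only if jstack itself printed its
    "Java-level deadlock" banner; a hang without that banner may still
    be a stall that is not a monitor deadlock.
    """
    deadlock_reported = "Java-level deadlock" in dump_text
    states = Counter(THREAD_STATE.findall(dump_text))
    return deadlock_reported, states


# Hypothetical, heavily shortened dump used only to exercise the function.
SAMPLE_DUMP = '''
"AntiEntropyStage:1" daemon prio=10 tid=0x1 nid=0x2 waiting on condition
   java.lang.Thread.State: WAITING (parking)
"Thread-7" daemon prio=10 tid=0x3 nid=0x4 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
'''

if __name__ == "__main__":
    deadlocked, states = summarize_thread_dump(SAMPLE_DUMP)
    print("deadlock reported:", deadlocked)
    for state, count in states.most_common():
        print(state, count)
```

To use it on a real dump, read the file and pass its contents: `summarize_thread_dump(open("dump.txt").read())`. A large number of BLOCKED threads without a deadlock banner is only a hint about where to look, not a diagnosis.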