Subject: Re: getting status of long running repair
From: Bill Au
To: user@cassandra.apache.org
Date: Mon, 7 May 2012 12:09:25 -0400

I restarted the nodes and then restarted the repair. It is still hanging
like before. Do I keep repeating this until the repair actually finishes?

Bill

On Fri, May 4, 2012 at 2:18 PM, Rob Coli wrote:
> On Fri, May 4, 2012 at 10:30 AM, Bill Au wrote:
> > I know repair may take a long time to run. I am running repair on a node
> > with about 15 GB of data and it is taking more than 24 hours. Is that
> > normal? Is there any way to get status of the repair? tpstats does show 2
> > active and 2 pending AntiEntropySessions. But netstats and compactionstats
> > show no activity.
>
> As indicated by various recent threads to this effect, many versions
> of cassandra (including current 1.0.x release) contain bugs which
> sometimes prevent repair from completing. The other threads suggest
> that some of these bugs result in the state you are in now, where you
> do not see anything that looks like appropriate activity.
> Unfortunately the only solution offered on these other threads is the
> one I will now offer, which is to restart the participating nodes and
> re-start the repair. I am unaware of any JIRA tickets tracking these
> bugs (which doesn't mean they don't exist, of course) so you might
> want to file one. :)
>
> =Rob
>
> --
> =Robert Coli
> AIM&GTALK - rcoli@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb