From: aaron morton
To: user@cassandra.apache.org
Subject: Re: getting status of long running repair
Date: Tue, 8 May 2012 22:04:24 +1200

When you look in the logs, please let me know if you see this error:
https://issues.apache.org/jira/browse/CASSANDRA-4223

I look at nodetool compactionstats (for the Merkle tree phase) and nodetool netstats (for the streaming), and I use this loop to check for streaming progress:

while true; do date; diff <(nodetool -h localhost netstats) <(sleep 5 && nodetool -h localhost netstats); done
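
If you would rather not rely on process substitution, the same idea can be sketched as a small function that keeps the previous snapshot in a temp file and prints a timestamp plus the diff on each round. This is only an illustration: `watch_netstats`, its arguments, and the `NODETOOL_CMD` variable are names made up for this example and are not part of Cassandra. The only nodetool call it uses is the `netstats` one from the loop above.

```shell
#!/usr/bin/env bash
# Sketch only: watch_netstats and NODETOOL_CMD are hypothetical names for this
# example. The one real command used is `nodetool -h <host> netstats`.

NODETOOL_CMD=${NODETOOL_CMD:-nodetool}

# usage: watch_netstats [host] [interval_seconds] [rounds]
# rounds=0 (the default) means loop until interrupted.
watch_netstats() {
    local host="${1:-localhost}"
    local interval="${2:-5}"
    local rounds="${3:-0}"
    local i=0 prev curr

    prev=$(mktemp) || return 1
    curr=$(mktemp) || { rm -f "$prev"; return 1; }

    # Baseline snapshot to compare the first round against.
    "$NODETOOL_CMD" -h "$host" netstats > "$prev"

    while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
        sleep "$interval"
        "$NODETOOL_CMD" -h "$host" netstats > "$curr"
        date
        # diff exits non-zero when the snapshots differ; that is the
        # interesting case here, so do not treat it as a failure.
        diff "$prev" "$curr" || true
        # The newest snapshot becomes the baseline for the next round.
        cp "$curr" "$prev"
        i=$((i + 1))
    done

    rm -f "$prev" "$curr"
}
```

Something like `watch_netstats localhost 5` should behave much like the one-liner. If several rounds in a row print only the date, nothing changed in netstats between snapshots, which is the "no activity" symptom described further down the thread.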

Or use DataStax OpsCenter where possible: http://www.datastax.com/products/opscenter

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/05/2012, at 2:15 PM, Ben Coverston wrote:

Check the log files for warnings or errors. They may indicate why your repair failed.

On Mon, May 7, 2012 at 10:09 AM, Bill Au <bill.w.au@gmail.com> wrote:
I restarted the nodes and then restarted the repair. It is still hanging like before. Do I keep repeating this until the repair actually finishes?

Bill


On Fri, May 4, 2012 at 2:18 PM, Rob Coli <rcoli@palominodb.com> wrote:
On Fri, May 4, 2012 at 10:30 AM, Bill Au <bill.w.au@gmail.com> wrote:
> I know repair may take a long time to run. I am running repair on a node
> with about 15 GB of data and it is taking more than 24 hours. Is that
> normal? Is there any way to get status of the repair? tpstats does show 2
> active and 2 pending AntiEntropySessions. But netstats and compactionstats
> show no activity.

As indicated by various recent threads to this effect, many versions
of Cassandra (including the current 1.0.x release) contain bugs which
sometimes prevent repair from completing. The other threads suggest
that some of these bugs result in the state you are in now, where you
do not see anything that looks like appropriate activity.
Unfortunately the only solution offered on these other threads is the
one I will now offer, which is to restart the participating nodes and
re-start the repair. I am unaware of any JIRA tickets tracking these
bugs (which doesn't mean they don't exist, of course) so you might
want to file one. :)

=Rob

--
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb




--
Ben Coverston
DataStax -- The Apache Cassandra Company

