From: Gregg Ulrich
Subject: Re: nodetool repair hanging
Date: Wed, 25 Apr 2012 15:49:57 +0000

How much data do you have, and how long is "a while"? In my experience, repairs can take a very long time. Check whether validation compactions are running (nodetool compactionstats) or whether files are streaming (nodetool netstats). If either is in progress, your repair should still be running. I've seen 12-node, 50 GB clusters take days to repair to a new data center.

Not sure if 1.0 is different, but in 0.x I don't believe killing the nodetool process stops the repair. When we need to stop a repair, we have bounced all of the participating nodes. I've been told that there is no harm in stopping repairs.

On Apr 24, 2012, at 2:55 PM, Bill Au wrote:

> I am running 1.0.8. I am adding a new data center to an existing cluster. Following the steps outlined in another thread on the mailing list, things went fine except for the last step, which is to run repair on all the nodes in the new data center. Repair seems to be hanging indefinitely. There is no activity in system.log.
> I did notice that the node being repaired is requesting ranges from nodes in both the existing and the new data center. Since there is no data in the new data center initially, I thought that might be why repair is hanging. So I broke out of the repair with Control-C after waiting for a while. I do see data being added to the new nodes. When I ran repair a second time, it hung again.
>
> Why is repair hanging? Is it safe to use Control-C to break out of it? How do I recover from this?
>
> Bill
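The check suggested in the reply (look for validation compactions in `nodetool compactionstats` and for active streams in `nodetool netstats`) could be scripted roughly as below. This is a minimal Python sketch, not something from the thread: it assumes `nodetool` is on the PATH, and it assumes the 1.0.x output looks the way the comments describe (a row whose compaction type is "Validation", and the lines "Not sending any streams." / "Not receiving any streams." on an idle node). The helper names are hypothetical; verify the markers against your own nodetool output first.

```python
import subprocess


def validation_running(compactionstats_output: str) -> bool:
    """Return True if any compaction row has type "Validation".

    Assumption: each active compaction is printed as a row whose first
    column is the compaction type (header and "pending tasks" lines differ).
    """
    for line in compactionstats_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Validation":
            return True
    return False


def streams_active(netstats_output: str) -> bool:
    """Return True unless netstats reports both directions as idle.

    Assumption: an idle node prints both idle markers below; any other
    output is conservatively treated as streaming activity.
    """
    idle_markers = ("Not sending any streams.", "Not receiving any streams.")
    return not all(marker in netstats_output for marker in idle_markers)


def run_nodetool(host: str, command: str) -> str:
    """Run a single nodetool subcommand against `host` and return stdout."""
    result = subprocess.run(
        ["nodetool", "-h", host, command],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def repair_looks_active(host: str = "localhost") -> bool:
    """Combine both checks: either activity suggests repair is progressing."""
    compactions = run_nodetool(host, "compactionstats")
    netstats = run_nodetool(host, "netstats")
    return validation_running(compactions) or streams_active(netstats)
```

For example, calling `repair_looks_active("10.0.0.1")` against each node taking part in the repair (the address is a placeholder) gives a quick yes/no before deciding whether the repair is actually stuck. Treating unrecognized netstats output as "active" errs on the side of not killing a repair that is still working.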