From: aaron morton <aaron@thelastpickle.com>
Subject: Re: Repair hangs when merkle tree request is not acknowledged
Date: Fri, 5 Apr 2013 22:49:25 +0530
To: user@cassandra.apache.org

> A repair on a certain CF will fail; if I run it again and again, eventually it will succeed.

How does it fail?

Can you see the repair start on the other node?
If you are getting errors in the log about streaming failing because a node died, and the FailureDetector is in the call stack, change the phi_convict_threshold. You can set it in the yaml file or via JMX on the FailureDetectorMBean; in either case, boost it from 8 to 16 to get the repair through. This will make it less likely that a node is marked as down. You probably want to run with 8, or only a little higher, normally.
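For example, in cassandra.yaml it is a one line change (the file is only read at startup, so the node needs a restart to pick it up; 16 is just the temporary value suggested above):

    # Failure detector sensitivity. Default is 8; a higher value makes the node
    # slower to mark a peer as down, which helps a long repair survive a flaky
    # inter-DC link. Drop it back towards 8 once the repair completes.
    phi_convict_threshold: 16

If your version exposes it, the same value can also be changed at runtime through the FailureDetector MBean (org.apache.cassandra.net:type=FailureDetector) in jconsole, which avoids the restart.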

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/04/2013, at 6:41 PM, Paul Sudol <paulsudol@gmail.com> wrote:

> Hello,
>
> I have a cluster with 4 nodes, 2 in each of 2 data centers. I had a hardware failure in one DC and had to replace the nodes. I'm running 1.2.3 on all of the nodes now. I was able to run nodetool rebuild on the two replacement nodes, but now I cannot finish a repair on any of them. I have 18 column families; if I run a repair on a single CF at a time, I can get the node repaired eventually. A repair on a certain CF will fail; if I run it again and again, eventually it will succeed.
>
> I've got an RF of 2, 1 copy in each DC, so the repair needs to pull data from the other DC to finish its repair.
>
> The problem seems to be that the merkle tree request sometimes is not received by the node in the other DC. Usually when the merkle tree request is sent, the nodes that it was sent to start a compaction/validation. In certain cases this does not happen: only the node that I ran the repair on will begin compaction/validation and send the merkle tree to itself. Then it waits for a merkle tree from the other node, which it will never get. After about 24 hours it will time out and say the node in question died.
>
> Is there a setting I can use to force the merkle tree request to be acknowledged, or resent if it's not acknowledged? I set up NTPD on all the nodes and tried cross_node_timeout, but that did not help.
>
> Thanks in advance,
>
> Paul
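For reference, the per-CF repair and the timeout setting mentioned above look something like this; the keyspace and column family names are placeholders, not taken from this thread:

    # repair a single column family at a time
    nodetool repair MyKeyspace MyColumnFamily

    # cassandra.yaml; only safe to enable when node clocks are kept in sync (e.g. with NTP)
    cross_node_timeout: true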
