From: Paul Sudol <paulsudol@gmail.com>
Subject: Re: Repair hangs when merkle tree request is not acknowledged
Date: Fri, 5 Apr 2013 13:03:15 -0500
To: user@cassandra.apache.org

> How does it fail?

If I wait 24 hours, the repair command will return an error saying that the node died… but the node didn't really die, I watched it the whole time.

I have DEBUG logging enabled. When the node I'm repairing sends out a merkle tree request, I will normally see {ColumnFamilyStore.java (line 700) forceFlush requested but everything is clean in <COLUMN FAMILY NAME>} in the log of the node that should be generating the merkle tree. (In addition, when I run nodetool -h localhost compactionstats, I see activity.)

When the node that should be generating a merkle tree does not log this message, and shows no activity in nodetool compactionstats, the repair will fail.

There are no errors about streaming; it never even gets to the point of streaming. One node sends requests for merkle trees, and sometimes the node in the other data center just doesn't get the message. At least that's what it looks like.

Should I still try raising phi_convict_threshold?

Thanks!

Paul

On Apr 5, 2013, at 12:19 PM, aaron morton <aaron@thelastpickle.com> wrote:

>> A repair on a certain CF will fail, and I run it again and again; eventually it will succeed.
>
> Can you see the repair start on the other node?
> If you are getting errors in the log about streaming failing because a node died, and the FailureDetector is in the call stack, change the phi_convict_threshold. You can set it in the yaml file or via JMX on the FailureDetectorMBean; in either case, boost it from 8 to 16 to get the repair through. This will make it less likely that a node is marked as down. Normally you probably want to run with 8, or a little bit higher.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/04/2013, at 6:41 PM, Paul Sudol <paulsudol@gmail.com> wrote:
>
>> Hello,
>>
>> I have a cluster with 4 nodes, 2 nodes in each of 2 data centers. I had a hardware failure in one DC and had to replace the nodes. I'm running 1.2.3 on all of the nodes now. I was able to run nodetool rebuild on the two replacement nodes, but now I cannot finish a repair on any of them. I have 18 column families; if I run a repair on a single CF at a time, I can eventually get the node repaired. A repair on a certain CF will fail, and I run it again and again; eventually it will succeed.
>>
>> I've got an RF of 2, with 1 copy in each DC, so the repair needs to pull data from the other DC to finish its repair.
>>
>> The problem seems to be that the merkle tree request sometimes is not received by the node in the other DC. Usually when the merkle tree request is sent, the nodes it was sent to start a compaction/validation. In certain cases this does not happen; only the node that I ran the repair on will begin compaction/validation and send the merkle tree to itself. Then it waits for a merkle tree from the other node, which it will never get. After about 24 hours it will time out and say the node in question died.
>>
>> Is there a setting I can use to force the merkle tree request to be acknowledged, or resent if it's not acknowledged? I set up NTPD on all the nodes and tried cross_node_timeout, but that did not help.
>>
>> Thanks in advance,
>>
>> Paul
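[For anyone finding this thread later: the yaml change Aaron describes would look roughly like the sketch below. This assumes the stock cassandra.yaml, where the property is usually commented out and the default is 8; adjust to taste.]

```yaml
# cassandra.yaml -- raise the failure detector threshold so a node is less
# likely to be marked down during a long-running repair (default is 8).
# Requires a restart to take effect when set here; drop it back toward the
# default once the repair completes.
phi_convict_threshold: 16
```

The same value can also be changed at runtime, without a restart, through JMX on the FailureDetector MBean, as Aaron mentions.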