Message-ID: <4EDACE10.6010804@bnl.gov>
Date: Sat, 03 Dec 2011 20:34:08 -0500
From: Maxim Potekhin
To: user@cassandra.apache.org
Subject: Re: Repair failure under 0.8.6
References: <4EDAAF7E.40502@bnl.gov>

Thank you Peter.

Before I look into the details as you suggest, may I ask what you mean by
"automatically restarted"? The way the box and Cassandra are set up in my
case, the death of either is final.

Also, how do I look for full GC? I just realized that in the latest install,
I might have omitted capping the heap size -- and the nodes have 48GB each.
I guess this could be a problem, precipitating GC death, right?

Thank you
Maxim

On 12/3/2011 7:46 PM, Peter Schuller wrote:
>> quite understand how Cassandra declared a node dead (in the below). Was it a
>> timeout? How do I fix that?
> I was about to respond to say that repair doesn't fail just due to
> failure detection, but this appears to have been broken by
> CASSANDRA-2433 :(
>
> Unless there is a subtle bug the exception you're seeing should be
> indicative that it really was considered Down by the node. You might
> grep the log for references to the node in question (UP or DOWN) to
> confirm. The question is why though. I would check if the node has
> maybe automatically restarted, or went into full GC, etc.
>
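The log check suggested above (grep for UP/DOWN references to the node, and look for long GC pauses) could be sketched roughly as follows. This is only an illustration, not an official tool: the function name scan_log, the 1000 ms threshold, and both regular expressions are assumptions about what the Cassandra system.log lines look like on a given install, so adjust them to match the actual log.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMPTION: state changes mention the node address together with one of
# these words; check your own system.log for the exact wording.
STATE_RE = re.compile(r"\b(UP|DOWN|dead)\b")

# ASSUMPTION: GC pauses are logged as "GC for <collector>: <N> ms".
GC_RE = re.compile(r"GC for (\S+): (\d+) ms")


def scan_log(
    lines: Iterable[str], node: str, gc_threshold_ms: int = 1000
) -> List[Tuple[int, str, str]]:
    """Return (line_number, kind, text) for state changes of `node`
    and for GC pauses of at least `gc_threshold_ms` milliseconds."""
    events = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Plain substring match on the address, like grep would do.
        if node in line and STATE_RE.search(line):
            events.append((number, "state", line))
            continue
        match = GC_RE.search(line)
        if match and int(match.group(2)) >= gc_threshold_ms:
            events.append((number, "gc", line))
    return events
```

To use it, pass an open file handle for the node's system.log and the address of the node that was reported down, then compare the timestamps of any long GC lines with the time the node was marked dead.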