Subject: Re: Problems recovering a dead node
From: Héctor Izquierdo Seliva
To: user@cassandra.apache.org
Date: Wed, 04 May 2011 07:54:30 +0200

Hi Aaron,

It has no data files whatsoever. The upgrade path is 0.7.4 -> 0.7.5. It turns out the initial problem was the software RAID failing silently because of another faulty disk. Now that the storage is working, I brought the node up again with the same IP and the same token, and tried running nodetool repair. All adjacent nodes have finished the streaming session, and the node now has a total of 248 GB of data. Is this normal when the load per node is about 18 GB? There are also 1245 pending tasks, and it has been compacting or rebuilding sstables non-stop for the last 8 hours. There are 2057 sstables in the data folder.

Should I have done things differently, or is this the normal behaviour?

Thanks!

On Wed, 04-05-2011 at 07:54 +1200, aaron morton wrote:
> When you say "it's clean" does that mean the node has no data files?
>
> After you replaced the disk what process did you use to recover?
>
> Also what version are you running and what's the recent upgrade history?
>
> Cheers
> Aaron
>
> On 3 May 2011, at 23:09, Héctor Izquierdo Seliva wrote:
>
> > Hi everyone. One of the nodes in my 6 node cluster died with disk
> > failures. I have replaced the disks, and it's clean. It has the same
> > configuration (same ip, same token).
> >
> > When I try to restart the node it starts to throw mmap underflow
> > exceptions till it closes again.
> >
> > I tried setting io to standard, but it still fails. It gives errors
> > about two decorated keys being different, and the EOFException.
> >
> > Here is an excerpt of the log
> >
> > http://pastebin.com/ZXW1wY6T
> >
> > I can provide more info if needed. I'm at a loss here so any help is
> > appreciated.
> >
> > Thanks all for your time
> >
> > Héctor Izquierdo
> >
>
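
For anyone who wants to check the same figures on their own node (the number of sstables in the data folder and the total size on disk), here is a minimal sketch. It is not from the thread. It assumes that each sstable has exactly one file whose name ends in "-Data.db", and that the data directory is /var/lib/cassandra/data unless another path is passed in. The function name summarize_sstables is made up for illustration.

```python
import os
import sys


def summarize_sstables(data_dir):
    """Return (sstable_count, total_bytes) for everything under data_dir.

    Assumption: one "*-Data.db" file corresponds to one sstable.
    total_bytes covers every file found, not only the Data.db files.
    """
    sstable_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith("-Data.db"):
                sstable_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return sstable_count, total_bytes


if __name__ == "__main__":
    # The default path is an assumption; pass your own data directory as argv[1].
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/data"
    count, size = summarize_sstables(data_dir)
    print("%d sstables, %.1f GB on disk" % (count, size / 1e9))
```

Running it a few times while compaction is in progress gives a rough idea of whether the sstable count and the disk usage are going down.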