Date: Sun, 10 May 2009 11:11:12 +0300
Subject: Re: Namenode failed to start with "FSNamesystem initialization failed" error
From: Tamir Kamara
To: core-user@hadoop.apache.org

Filed HADOOP-5798.

On Wed, May 6, 2009 at 9:53 PM, Raghu Angadi wrote:

> Tamir Kamara wrote:
>
>> Hi Raghu,
>>
>> The thread you posted is my original post, written when this problem
>> first happened on my cluster. I can file a JIRA, but I wouldn't be able
>> to provide information other than what I already posted, and I don't
>> have the logs from that time. Should I still file?
>
> Yes. Jira is a better place for tracking and fixing bugs. I am pretty
> sure what you saw is a bug (either already fixed or needs to be fixed).
>
> Raghu.
>
>> Thanks,
>> Tamir
>>
>> On Tue, May 5, 2009 at 9:14 PM, Raghu Angadi wrote:
>>
>>> Tamir,
>>>
>>> Please file a jira on the problem you are seeing with 'saveLeases'.
>>> In the past there have been multiple fixes in this area (HADOOP-3418,
>>> HADOOP-3724, and more mentioned in HADOOP-3724).
>>>
>>> Also refer to the thread you started:
>>> http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397.html
>>>
>>> I think another user reported the same problem recently.
>>>
>>> These are indeed very serious and very annoying bugs.
>>>
>>> Raghu.
>>>
>>> Tamir Kamara wrote:
>>>
>>>> I didn't have a space problem which led to it (I think). The
>>>> corruption started after I bounced the cluster.
>>>> At the time, I tried to investigate what led to the corruption but
>>>> didn't find anything useful in the logs besides this line:
>>>>
>>>> saveLeases found path /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_000002_0/part-00002 but no matching entry in namespace
>>>>
>>>> I also tried to recover from the secondary name node files, but the
>>>> corruption was too widespread and I had to format.
>>>>
>>>> Tamir
>>>>
>>>> On Mon, May 4, 2009 at 4:48 PM, Stas Oskin wrote:
>>>>
>>>>> Hi.
>>>>>
>>>>> Same conditions - where the space has run out and the fs got
>>>>> corrupted?
>>>>>
>>>>> Or it got corrupted by itself (which is even more worrying)?
>>>>>
>>>>> Regards.
>>>>>
>>>>> 2009/5/4 Tamir Kamara
>>>>>
>>>>>> I had the same problem a couple of weeks ago with 0.19.1. Had to
>>>>>> reformat the cluster too...
>>>>>>
>>>>>> On Mon, May 4, 2009 at 3:50 PM, Stas Oskin wrote:
>>>>>>
>>>>>>> Hi.
>>>>>>>
>>>>>>> After rebooting the NameNode server, I found out the NameNode
>>>>>>> doesn't start anymore.
>>>>>>>
>>>>>>> The logs contained this error:
>>>>>>> "FSNamesystem initialization failed"
>>>>>>>
>>>>>>> I suspected filesystem corruption, so I tried to recover from
>>>>>>> SecondaryNameNode. Problem is, it was completely empty!
>>>>>>>
>>>>>>> I had an issue that might have caused this - the root mount has
>>>>>>> run out of space. But both the NameNode and the SecondaryNameNode
>>>>>>> directories were on another mount point with plenty of space
>>>>>>> there - so it's very strange that they were impacted in any way.
>>>>>>>
>>>>>>> Perhaps the logs, which were located on the root mount and as a
>>>>>>> result could not be written, have caused this?
>>>>>>>
>>>>>>> To get HDFS running again, I had to format the HDFS (including
>>>>>>> manually erasing the files from DataNodes). While this is
>>>>>>> reasonable in a test environment, production-wise it would be
>>>>>>> very bad.
>>>>>>>
>>>>>>> Any idea why it happened, and what can be done to prevent it in
>>>>>>> the future?
>>>>>>>
>>>>>>> I'm using the stable 0.18.3 version of Hadoop.
>>>>>>>
>>>>>>> Thanks in advance!