From: mike anderson
To: common-user@hadoop.apache.org
Date: Thu, 4 Mar 2010 15:22:13 -0500
Subject: Re: can't start namenode

Todd,

That did the trick. Thanks to everyone for the quick responses and
effective suggestions.

-Mike
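For reference, the recovery Todd walks through below boils down to the
following commands. This is only a sketch, assuming an 0.20-era release:
"/" is simply the filesystem root to check, and it is worth reading the
"hadoop fsck" usage output before deleting anything.

    # check whether the namenode is still in safe mode
    hadoop dfsadmin -safemode get

    # force it out of safe mode rather than waiting for the block threshold
    hadoop dfsadmin -safemode leave

    # report files with missing or corrupt blocks
    hadoop fsck /

    # then either move the broken files to /lost+found or drop their metadata
    hadoop fsck / -move
    hadoop fsck / -delete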
On Thu, Mar 4, 2010 at 2:50 PM, Todd Lipcon wrote:

> Hi Mike,
>
> Since you removed the edits, you restored to an earlier version of the
> namesystem. Thus, any files that were deleted since the last checkpoint
> will have come back. But, the blocks will have been removed from the
> datanodes. So, the NN is complaining since there are some files that have
> missing blocks. That is to say, some of your files are corrupt (i.e.
> unreadable because the data is gone but the metadata is still there).
>
> In order to force it out of safe mode, you can run "hadoop dfsadmin
> -safemode leave".
> You should also run "hadoop fsck" in order to determine which files are
> broken, and then probably use the -delete option to remove their metadata.
>
> Thanks
> -Todd
>
> On Thu, Mar 4, 2010 at 11:37 AM, mike anderson wrote:
>
>> Removing edits.new and starting worked, though it didn't seem that
>> happy about it. It started up nonetheless, in safe mode, saying that
>> "The ratio of reported blocks 0.9948 has not reached the threshold
>> 0.9990. Safe mode will be turned off automatically." Unfortunately
>> this is holding up the restart of hbase.
>>
>> About how long does it take to exit safe mode? Is there anything I can
>> do to expedite the process?
>>
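The 0.9990 in that message is the namenode's dfs.safemode.threshold.pct
setting, which defaults to 0.999 in this era's releases. Lowering it in
hdfs-site.xml (and restarting the namenode) lets safe mode exit with fewer
blocks reported, though forcing it out with dfsadmin as above is usually
the quicker route. A sketch, with an illustrative value:

    <!-- hdfs-site.xml: fraction of blocks that must be reported before
         safe mode is exited automatically (0.99 here is only an example) -->
    <property>
      <name>dfs.safemode.threshold.pct</name>
      <value>0.99</value>
    </property>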
>> On Thu, Mar 4, 2010 at 1:54 PM, Todd Lipcon wrote:
>> >
>> > Sorry, I actually meant ls -l from name.dir/current/
>> >
>> > Having only one dfs.name.dir isn't recommended - after you get your
>> > system back up and running I would strongly suggest running with at
>> > least two, preferably with one on a separate server via NFS.
>> >
>> > Thanks
>> > -Todd
>> >
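In configuration terms, Todd's suggestion above looks roughly like the
following. The property name is the same one used in the thread, the first
path matches the name directory that shows up in the log further down, and
the NFS mount point is purely illustrative:

    <!-- hdfs-site.xml: the namenode writes its image and edits to every
         directory listed, so losing one disk no longer loses the metadata -->
    <property>
      <name>dfs.name.dir</name>
      <value>/mnt/hadoop/name,/mnt/nfs/namenode/name</value>
    </property>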
>> > On Thu, Mar 4, 2010 at 9:05 AM, mike anderson wrote:
>> >
>> > > We have a single dfs.name.dir directory, in case it's useful the
>> > > contents are:
>> > >
>> > > [mike@carr name]$ ls -l
>> > > total 8
>> > > drwxrwxr-x 2 mike mike 4096 Mar  4 11:18 current
>> > > drwxrwxr-x 2 mike mike 4096 Oct  8 16:38 image
>> > >
>> > > On Thu, Mar 4, 2010 at 12:00 PM, Todd Lipcon wrote:
>> > >
>> > > > Hi Mike,
>> > > >
>> > > > Was your namenode configured with multiple dfs.name.dir settings?
>> > > >
>> > > > If so, can you please reply with "ls -l" from each dfs.name.dir?
>> > > >
>> > > > Thanks
>> > > > -Todd
>> > > >
>> > > > On Thu, Mar 4, 2010 at 8:57 AM, mike anderson <saidtherobot@gmail.com> wrote:
>> > > >
>> > > > > Our hadoop cluster went down last night when the namenode ran out
>> > > > > of hard drive space. Trying to restart fails with this exception
>> > > > > (see below).
>> > > > >
>> > > > > Since I don't really care that much about losing a day's worth of
>> > > > > data or so, I'm fine with blowing away the edits file if that's
>> > > > > what it takes (we don't have a secondary namenode to restore from).
>> > > > > I tried removing the edits file from the namenode directory, but
>> > > > > then it complained about not finding an edits file. I touched a
>> > > > > blank edits file and I got the exact same exception.
>> > > > >
>> > > > > Any thoughts? I googled around a bit, but to no avail.
>> > > > >
>> > > > > -mike
>> > > > >
>> > > > > 2010-03-04 10:50:44,768 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310
>> > > > > 2010-03-04 10:50:44,772 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: carr.projectlounge.com/10.0.16.91:54310
>> > > > > 2010-03-04 10:50:44,773 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>> > > > > 2010-03-04 10:50:44,774 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>> > > > > 2010-03-04 10:50:44,816 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=pubget,pubget
>> > > > > 2010-03-04 10:50:44,817 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>> > > > > 2010-03-04 10:50:44,817 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
>> > > > > 2010-03-04 10:50:44,823 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>> > > > > 2010-03-04 10:50:44,825 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>> > > > > 2010-03-04 10:50:44,849 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 2687
>> > > > > 2010-03-04 10:50:45,092 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 7
>> > > > > 2010-03-04 10:50:45,095 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 347821 loaded in 0 seconds.
>> > > > > 2010-03-04 10:50:45,104 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /mnt/hadoop/name/current/edits of size 4653 edits # 39 loaded in 0 seconds.
>> > > > > 2010-03-04 10:50:45,114 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: ""
>> > > > >         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>> > > > >         at java.lang.Long.parseLong(Long.java:424)
>> > > > >         at java.lang.Long.parseLong(Long.java:461)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:670)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:997)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>> > > > >
>> > > > > 2010-03-04 10:50:45,115 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>> > > > > /************************************************************
>> > > > > SHUTDOWN_MSG: Shutting down NameNode at carr.projectlounge.com/10.0.16.91
>> > > > > ************************************************************/
>> > > > >
>> > > >
>> > >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
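One footnote on the trace above: the failure is the edits loader trying to
parse an empty value as a number, which lines up with Mike's report that a
freshly touched, zero-length edits file dies the same way. Before removing
or recreating edits by hand, it is worth copying the whole storage directory
aside first; a sketch using the name directory from the log (the backup path
is only an example):

    # snapshot the namenode storage directory before any manual surgery
    cp -a /mnt/hadoop/name /mnt/hadoop/name.bak.20100304

    # current/ normally holds fsimage, edits, fstime and VERSION
    ls -l /mnt/hadoop/name/current/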