hadoop-common-user mailing list archives

From mike anderson <saidtherobot@gmail.com>
Subject Re: can't start namenode
Date Thu, 04 Mar 2010 20:22:13 GMT
Todd, that did the trick. Thanks to everyone for the quick responses
and effective suggestions.
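
For anyone who finds this thread later, Todd's suggestion below boils down to:

  hadoop dfsadmin -safemode leave   # force the NN out of safe mode
  hadoop fsck /                     # see which files are corrupt
  hadoop fsck / -delete             # then remove their metadata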

-Mike


On Thu, Mar 4, 2010 at 2:50 PM, Todd Lipcon <todd@cloudera.com> wrote:
> Hi Mike,
>
> Since you removed the edits, you rolled back to an earlier version of the
> namesystem. Thus, any files that were deleted since the last checkpoint
> will have come back, but their blocks will already have been removed from
> the datanodes. So the NN is complaining because some files have missing
> blocks. That is to say, some of your files are corrupt (i.e. unreadable,
> because the data is gone but the metadata is still there).
>
> In order to force it out of safemode, you can run:
>
>   hadoop dfsadmin -safemode leave
>
> You should also run "hadoop fsck" to determine which files are broken, and
> then probably use the -delete option to remove their metadata.
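>
> For example (just a sketch; -files -blocks simply makes the missing
> blocks easier to spot in the output):
>
>   hadoop fsck / -files -blocks | grep -i missing
>   hadoop fsck / -delete
>
> Bear in mind that -delete permanently removes the metadata for the broken
> files, so do the read-only check first. (The 0.9990 threshold you saw is
> dfs.safemode.threshold.pct, in case you ever want to tune it.)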
>
> Thanks
> -Todd
>
> On Thu, Mar 4, 2010 at 11:37 AM, mike anderson <saidtherobot@gmail.com> wrote:
>
>> Removing edits.new and starting worked, though it didn't seem too happy
>> about it. It started up nonetheless, in safe mode, saying: "The ratio of
>> reported blocks 0.9948 has not reached the threshold 0.9990. Safe mode
>> will be turned off automatically." Unfortunately this is holding up the
>> restart of hbase.
>>
>> About how long does it take to exit safe mode? Is there anything I can
>> do to expedite the process?
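>>
>> (Is "hadoop dfsadmin -safemode get" the right way to watch its progress
>> in the meantime, or is there something better?)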
>>
>>
>>
>> On Thu, Mar 4, 2010 at 1:54 PM, Todd Lipcon <todd@cloudera.com> wrote:
>> >
>> > Sorry, I actually meant ls -l from name.dir/current/
>> >
>> > Having only one dfs.name.dir isn't recommended - after you get your
>> > system back up and running I would strongly suggest running with at
>> > least two, preferably with one on a separate server via NFS.
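>> >
>> > For example, in hdfs-site.xml, something like this (the NFS mount path
>> > here is just an illustration, use whatever you actually mount):
>> >
>> >   <property>
>> >     <name>dfs.name.dir</name>
>> >     <value>/mnt/hadoop/name,/mnt/nfs/namenode-backup/name</value>
>> >   </property>
>> >
>> > The namenode writes the image and edits to every directory in the
>> > list, so a single full disk stops being fatal.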
>> >
>> > Thanks
>> > -Todd
>> >
>> > On Thu, Mar 4, 2010 at 9:05 AM, mike anderson <saidtherobot@gmail.com> wrote:
>> >
>> > > We have a single dfs.name.dir directory; in case it's useful, the
>> > > contents are:
>> > >
>> > > [mike@carr name]$ ls -l
>> > > total 8
>> > > drwxrwxr-x 2 mike mike 4096 Mar  4 11:18 current
>> > > drwxrwxr-x 2 mike mike 4096 Oct  8 16:38 image
>> > >
>> > >
>> > >
>> > >
>> > > > On Thu, Mar 4, 2010 at 12:00 PM, Todd Lipcon <todd@cloudera.com> wrote:
>> > >
>> > > > Hi Mike,
>> > > >
>> > > > Was your namenode configured with multiple dfs.name.dir settings?
>> > > >
>> > > > If so, can you please reply with "ls -l" from each dfs.name.dir?
>> > > >
>> > > > Thanks
>> > > > -Todd
>> > > >
>> > > > On Thu, Mar 4, 2010 at 8:57 AM, mike anderson <saidtherobot@gmail.com> wrote:
>> > > >
>> > > > > Our hadoop cluster went down last night when the namenode ran out
>> > > > > of hard drive space. Trying to restart fails with this exception
>> > > > > (see below).
>> > > > >
>> > > > > Since I don't really care that much about losing a day's worth of
>> > > > > data or so, I'm fine with blowing away the edits file if that's
>> > > > > what it takes (we don't have a secondary namenode to restore from).
>> > > > > I tried removing the edits file from the namenode directory, but
>> > > > > then it complained about not finding an edits file. I touched a
>> > > > > blank edits file and got the exact same exception.
>> > > > >
>> > > > > Any thoughts? I googled around a bit, but to no avail.
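>> > > > >
>> > > > > (Before experimenting further I'm going to snapshot the name dir,
>> > > > > something like "cp -a /mnt/hadoop/name /mnt/hadoop/name.bak", so
>> > > > > the original edits file can be put back if needed.)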
>> > > > >
>> > > > > -mike
>> > > > >
>> > > > >
>> > > > > 2010-03-04 10:50:44,768 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310
>> > > > > 2010-03-04 10:50:44,772 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: carr.projectlounge.com/10.0.16.91:54310
>> > > > > 2010-03-04 10:50:44,773 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>> > > > > 2010-03-04 10:50:44,774 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>> > > > > 2010-03-04 10:50:44,816 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=pubget,pubget
>> > > > > 2010-03-04 10:50:44,817 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>> > > > > 2010-03-04 10:50:44,817 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
>> > > > > 2010-03-04 10:50:44,823 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>> > > > > 2010-03-04 10:50:44,825 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>> > > > > 2010-03-04 10:50:44,849 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 2687
>> > > > > 2010-03-04 10:50:45,092 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 7
>> > > > > 2010-03-04 10:50:45,095 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 347821 loaded in 0 seconds.
>> > > > > 2010-03-04 10:50:45,104 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /mnt/hadoop/name/current/edits of size 4653 edits # 39 loaded in 0 seconds.
>> > > > > 2010-03-04 10:50:45,114 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: ""
>> > > > >         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>> > > > >         at java.lang.Long.parseLong(Long.java:424)
>> > > > >         at java.lang.Long.parseLong(Long.java:461)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:670)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:997)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>> > > > >         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>> > > > >
>> > > > > 2010-03-04 10:50:45,115 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>> > > > > /************************************************************
>> > > > > SHUTDOWN_MSG: Shutting down NameNode at carr.projectlounge.com/10.0.16.91
>> > > > > ************************************************************/
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
