hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: a million log lines from one job tracker startup
Date Wed, 26 Sep 2007 16:00:30 GMT

It looks like you have a problem with insufficient replication or a
corrupted file.  This can happen if you are running with a low replication
count and have lost a datanode or two.  I have also seen it happen after
somewhat aggressive nuking of hadoop jobs or processes, or after an
overfull disk (I am not sure which); in that case, I wound up with missing
blocks for map reduce intermediate output.
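
To see which files are actually affected, fsck can be pointed at the whole
tree.  On an install of this vintage, something like the following should
work (the flags are from memory, so check fsck's usage message):

  bin/hadoop fsck / -files -blocks -locations

Files reported with missing or under-replicated blocks are the ones to
worry about; lost map reduce intermediate output can usually just be
deleted.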

The simplest, but almost always unsatisfactory, repair is to simply nuke
the contents of HDFS and reload cleanly.
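
For completeness, the full wipe on a cluster of this era looks roughly
like this (the usual bin/ scripts; this destroys everything in HDFS, so be
sure that is what you want):

  bin/stop-all.sh                # stop all daemons
  bin/hadoop namenode -format    # wipe the namespace and all block metadata
  bin/start-all.sh               # bring the cluster back up
  bin/hadoop dfs -put mydata /mydata   # reload from your source copies

(mydata is just a stand-in for wherever your source data lives.)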

It is also possible that the namenode will eventually be able to repair the
situation.
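
If you want to just wait that out rather than intervene, dfsadmin can poll
safe mode for you (as far as I know both of these options exist):

  bin/hadoop dfsadmin -safemode get    # report whether safe mode is on
  bin/hadoop dfsadmin -safemode wait   # block until the namenode leaves safe mode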

You may also be able to repair the file system piecemeal if the persistent
problems you are experiencing involve files that you don't care about.  To
do this, run hadoop fsck / to find what the problems really are, turn off
safe mode by hand (warning, Will Robinson, DANGER), and delete the files
that are causing problems.  This is somewhat laborious.  I think that there
is a "force repair" option on fsck, but I was never able to get it to work.
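
Concretely, the piecemeal sequence looks something like this (the file
path is hypothetical; substitute whatever fsck actually flags):

  bin/hadoop fsck /                       # list files with corrupt or missing blocks
  bin/hadoop dfsadmin -safemode leave     # force the namenode out of safe mode
  bin/hadoop dfs -rm /some/broken/file    # remove each file fsck complained about
  bin/hadoop fsck /                       # re-check until the report comes back healthy

(If memory serves, fsck's -move and -delete options, which relocate damaged
files to /lost+found or delete them outright, are the closest thing to that
"force repair".)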

If you are a real cowboy, you can simply turn off safe mode and go forward.
If the goobered files are not important to you, this can let you get some
work done.  This is a really bad idea, of course, since you are
circumventing some really important safeguards.

Having experienced this myself, and having watched files slooowwly acquire
replicas after changing the replication count on a bunch of files, my
impression is that I would love to be able to tell the namenode to be very
aggressive about repairing replication issues.  Normally, the slow pace
used for fixing under-replication is a good thing, since it lets you
continue with other work while replication goes on, but there are
situations where you really want the issues resolved sooner.
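
For reference, the replication change I mentioned is done with setrep;
something like this (the path is just an example) raises the target
replication and kicks off re-replication, which then proceeds at the
namenode's usual leisurely pace:

  bin/hadoop dfs -setrep -R 5 /user/krhodes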


On 9/26/07 7:25 AM, "kate rhodes" <masukomi@gmail.com> wrote:

>> 2007-09-26 09:58:06,472 INFO org.apache.hadoop.mapred.JobTracker:
>> problem cleaning system directory:
>> /home/krhodes/hadoop_files/temp/krhodes/mapred/system
>> org.apache.hadoop.ipc.RemoteException:
>> org.apache.hadoop.dfs.SafeModeException: Cannot delete
>> /home/krhodes/hadoop_files/temp/krhodes/mapred/system. Name node is in
>> safe mode.
>> Safe mode will be turned off automatically.
>>         at 
>> org.apache.hadoop.dfs.FSNamesystem.deleteInternal(FSNamesystem.java:1222)
>>         at org.apache.hadoop.dfs.FSNamesystem.delete(FSNamesystem.java:1200)
>>         at org.apache.hadoop.dfs.NameNode.delete(NameNode.java:399)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at 

