hadoop-common-user mailing list archives

From Balanagireddy Mudiam <balanagiredd...@gmail.com>
Subject HDFS safemode recovery take more than an hour
Date Fri, 07 May 2010 22:17:20 GMT

We are running our cluster on Amazon EC2 and using the Cloudera
scripts to set up Hadoop. On the master node, we start the following
services.

$AS_HADOOP '"$HADOOP_HOME"/bin/hadoop-daemon.sh start namenode'
$AS_HADOOP '"$HADOOP_HOME"/bin/hadoop-daemon.sh start secondarynamenode'
$AS_HADOOP '"$HADOOP_HOME"/bin/hadoop-daemon.sh start jobtracker'
$AS_HADOOP '"$HADOOP_HOME"/bin/hadoop dfsadmin -safemode wait'
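
The last command blocks silently until the NameNode leaves safemode.
To see how long the wait actually takes, a polling loop is one option.
What follows is only a minimal sketch: the helper name
wait_for_safemode_off and the 30-second interval are hypothetical, and
it assumes that "hadoop dfsadmin -safemode get" prints a line containing
"Safe mode is OFF" once recovery has finished.

```shell
# Poll a status command until its output reports that safe mode is off.
# Usage: wait_for_safemode_off <poll-seconds> <status command...>
# (hypothetical helper, not part of the Cloudera scripts)
wait_for_safemode_off() {
    poll_seconds=$1
    shift
    start=$(date +%s)
    while :; do
        # Run the status command passed as the remaining arguments.
        status=$("$@" 2>&1)
        now=$(date +%s)
        echo "$((now - start))s elapsed: $status"
        case "$status" in
            # Assumed wording of the status line when safemode has ended.
            *"Safe mode is OFF"*) return 0 ;;
        esac
        sleep "$poll_seconds"
    done
}

# Example invocation on the master node (commented out so the sketch
# can be sourced without a running cluster):
# wait_for_safemode_off 30 "$HADOOP_HOME"/bin/hadoop dfsadmin -safemode get
```

Passing the status command as arguments keeps the loop independent of
where Hadoop is installed and lets it be tried with a stand-in command.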

On the slave machines, we start the following services.

$AS_HADOOP '"$HADOOP_HOME"/bin/hadoop-daemon.sh start datanode'
$AS_HADOOP '"$HADOOP_HOME"/bin/hadoop-daemon.sh start tasktracker'

The main problem we are facing is that HDFS safemode recovery takes
more than an hour, which delays the completion of our jobs.

Below are the main log messages.

1. domU-12-31-39-0A-34-61.compute-1.internal 10/05/05 20:44:19 INFO
ipc.Client: Retrying connect to server:
ec2-184-73-64-64.compute-1.amazonaws.com/ Already
tried 21 time(s).
2. The reported blocks 283634 needs additional 322258 blocks to reach
the threshold 0.9990 of total blocks 606499. Safe mode will be turned
off automatically.
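
The figures in the second message can be cross-checked with a little
shell arithmetic. This is only a sketch: the variable names are mine,
and writing the 0.9990 threshold as 999/1000 with integer truncation is
an assumption that happens to reproduce the number in the log, not a
statement of how the NameNode computes it.

```shell
# Values copied from the log message above.
total=606499
reported=283634

# Threshold 0.9990 expressed as 999/1000 so that plain integer
# arithmetic can be used (the division truncates).
needed_total=$(( total * 999 / 1000 ))
additional=$(( needed_total - reported ))

echo "blocks required to leave safemode: $needed_total"
echo "additional blocks still needed:    $additional"
echo "share of blocks reported so far:   $(( reported * 100 / total ))%"
```

In other words, fewer than half of the blocks had been reported by the
datanodes when this message was logged.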

The first message appears in the task tracker log because the job
tracker has not started yet. The job tracker does not start because
HDFS is still in safemode.

The second message is logged during the recovery process.

Is there something I am doing wrong?
How long does a normal HDFS safemode recovery take?
Would it speed things up to hold off starting the task trackers until
the job tracker is running?
Are there any known Hadoop problems on Amazon EC2 clusters?

Thanks for your help.

Bala Mudiam
