hadoop-common-user mailing list archives

From "Cagdas Gerede" <cagdas.ger...@gmail.com>
Subject Re: Please Help: Namenode Safemode
Date Thu, 24 Apr 2008 18:55:49 GMT
Hi Dhruba,
Thanks for your answer, but I think you missed what I mentioned: the
extension is already set to 0 in my configuration file.

After spending quite some time on the code, I found the reason:
dfs.blockreport.initialDelay.
If you do not set this in your config file, it defaults to 60,000. Each
datanode chooses a random number between 0 and 60,000 and, when it registers
with the namenode, waits that many milliseconds before sending its block
report. As a result, this delay can be as long as 1 minute. If you want your
namenode to start faster, set dfs.blockreport.initialDelay to a smaller
value.

When I set it to 0, the namenode now starts up in 1-2 seconds.
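A rough way to see why 3 datanodes add close to a minute: if each one picks its delay at random, the namenode cannot leave safemode until the slowest of them has reported, so what matters is the largest of the 3 delays. The toy model below is Python, not Hadoop code; the uniform distribution and all function names are assumptions for illustration only. For n uniform delays on [0, D], the expected maximum is D * n / (n + 1), i.e. about 45 seconds for 3 datanodes and a 60,000 ms setting, with a worst case of the full 60 seconds.

```python
import random


def time_until_all_reports_ms(num_datanodes, initial_delay_ms, seed=None):
    """Time (ms) until the slowest datanode sends its first block report.

    Hypothetical model: each datanode independently waits a uniformly random
    delay in [0, initial_delay_ms] before sending its block report.
    """
    rng = random.Random(seed)
    delays = [rng.uniform(0, initial_delay_ms) for _ in range(num_datanodes)]
    # Safemode cannot be left before the last block report arrives.
    return max(delays, default=0.0)


def expected_wait_ms(num_datanodes, initial_delay_ms):
    """Closed form for the expected maximum: E[max of n Uniform(0, D)] = D * n / (n + 1)."""
    return initial_delay_ms * num_datanodes / (num_datanodes + 1)


def simulated_average_wait_ms(num_datanodes, initial_delay_ms, trials=10_000, seed=0):
    """Monte Carlo estimate of the same expectation, as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(
            (rng.uniform(0, initial_delay_ms) for _ in range(num_datanodes)),
            default=0.0,
        )
    return total / trials


if __name__ == "__main__":
    # 3 datanodes with the 60,000 ms default described above.
    print(expected_wait_ms(3, 60_000))                   # 45000.0
    print(round(simulated_average_wait_ms(3, 60_000)))   # close to 45000
    # With the initial delay set to 0 there is no random wait at all.
    print(time_until_all_reports_ms(3, 0))               # 0.0
```

Because the wait is driven by the maximum, adding more datanodes pushes the expected delay closer to the configured upper bound, which is why lowering the setting (rather than the safemode extension) is what shortens startup here.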


-- 
------------
Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


On Wed, Apr 23, 2008 at 4:44 PM, dhruba Borthakur <dhruba@yahoo-inc.com>
wrote:

>  By default, there is a variable called dfs.safemode.extension set in
> hadoop-default.xml that is set to 30 seconds. This means that once the
> Namenode has one replica of every block, it still waits for 30 more seconds
> before exiting Safemode.
>
>
>
> dhruba
>
>
>  ------------------------------
>
> *From:* Cagdas Gerede [mailto:cagdas.gerede@gmail.com]
> *Sent:* Wednesday, April 23, 2008 4:37 PM
> *To:* core-user@hadoop.apache.org
> *Cc:* dhruba Borthakur
> *Subject:* Please Help: Namenode Safemode
>
>
>
> I have a Hadoop distributed file system with 3 datanodes. I only have 150
> blocks on each datanode. It takes a little more than a minute for the
> namenode to start and leave safemode.
>
> As far as I understand, the namenode startup steps are:
> 1) Each datanode sends a heartbeat to the namenode. The namenode tells the
> datanode to send a block report, piggybacked on the heartbeat response.
> 2) The datanode computes the block report.
> 3) The datanode sends it to the namenode.
> 4) The namenode processes the block report.
> 5) The namenode's safemode monitor thread checks whether it can exit; the
> namenode exits safemode if the threshold is reached and the extension time
> has passed.
>
> Here are my numbers:
> Step 1) Datanodes send heartbeats every 3 seconds.
> Step 2) The datanode computes the block report. (This takes about 20
> milliseconds, as shown in the datanodes' logs.)
> Step 3) No idea. (It depends on the size of the block report; I suspect it
> should not take more than a couple of seconds.)
> Step 4) No idea, but it shouldn't take more than a couple of seconds.
> Step 5) The thread checks every second. The extension value in my
> configuration is 0, so there is no wait once the threshold is reached.
>
> Given these numbers, can anybody explain where the one minute comes from?
> Shouldn't startup take 10-20 seconds?
> Please help. I am very confused.
>
>
>
> --
> ------------
> Best Regards, Cagdas Evren Gerede
> Home Page: http://cagdasgerede.info
>
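The five startup steps in the question can be turned into a back-of-the-envelope timeline that also includes the dfs.blockreport.initialDelay wait identified in the reply above. This is a hypothetical Python model, not namenode code: the step durations are the estimates from the question, the rule "the threshold is reached when the slowest datanode's report is processed" is a simplification, and the function names are made up for illustration.

```python
import math


def safemode_exit_s(threshold_reached_s, extension_s, check_interval_s=1.0):
    """Earliest monitor tick at which the namenode can leave safemode.

    Hypothetical model: the monitor thread wakes every check_interval_s
    seconds and exits safemode once the threshold has been reached AND
    extension_s more seconds have elapsed.
    """
    ready_at = threshold_reached_s + extension_s
    # The monitor only notices on its next tick at or after ready_at.
    return math.ceil(ready_at / check_interval_s) * check_interval_s


def estimate_startup_s(heartbeat_s, initial_delay_s, compute_s, send_s,
                       process_s, extension_s):
    """Add up steps 1-4 for the slowest datanode, then apply step 5."""
    # Assumption: the threshold is reached when the last report is processed.
    last_report_processed = (heartbeat_s + initial_delay_s + compute_s
                             + send_s + process_s)
    return safemode_exit_s(last_report_processed, extension_s)


if __name__ == "__main__":
    # Numbers from the question, with no initial block report delay.
    print(estimate_startup_s(heartbeat_s=3, initial_delay_s=0, compute_s=0.02,
                             send_s=2, process_s=2, extension_s=0))   # 8.0
    # Same numbers, but the slowest datanode waits the full 60 s delay.
    print(estimate_startup_s(heartbeat_s=3, initial_delay_s=60, compute_s=0.02,
                             send_s=2, process_s=2, extension_s=0))   # 68.0
```

Under these assumptions the question's own estimates account for well under 10 seconds, and the random initial block report delay is enough to explain "a little more than a minute". To model dhruba's default dfs.safemode.extension of 30 seconds instead, pass extension_s=30.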
