hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: dfs datanode heartbeats and getBlockwork requests
Date Tue, 04 Apr 2006 16:58:07 GMT
Eric Baldeschwieler wrote:
> If we moved to a scheme where the name node was just given a small  
> number of blocks with each heartbeat, there would be no reason to not  
> start reporting blocks immediately, would there?

There would still be a small storm of unneeded replications on startup.
Say it takes a minute at startup for all data nodes to report their
complete block lists to the name node.  If heartbeats are every 3
seconds, then a data node that reported in early could be handed up to
20 small lists of blocks to start replicating before the last node has
reported.  The switches could be saturated doing a lot of unneeded
transfers, which would slow startup.  Then, for the next minute after
startup, the nodes would be told to delete blocks that are now
over-replicated.  We'd like startup to be as fast and painless as
possible.  Waiting a bit before checking whether blocks are over- or
under-replicated seems a good way to get there.
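
The arithmetic above can be sketched roughly as follows. This is only an
illustration, not code from the DFS: the function name, its parameters,
and the notion of a fixed "grace period" are assumptions made for the
example. It counts the worst case, a data node that reported in at time
zero.

```python
def premature_work_lists(startup_window_s, heartbeat_interval_s, grace_period_s=0):
    """Worst-case count of replication lists handed to one data node
    before every data node has finished reporting its blocks.

    All names here are hypothetical. Times are whole seconds.
    The name node is assumed to hand out one small list per heartbeat,
    but only once grace_period_s has elapsed since startup.
    """
    # Seconds during which the name node hands out work while block
    # reports are still incomplete.
    exposed_s = max(0, startup_window_s - grace_period_s)
    # One list per heartbeat that falls inside that exposed interval.
    return exposed_s // heartbeat_interval_s


# One minute of reporting, 3-second heartbeats, no waiting.
no_wait = premature_work_lists(60, 3)
# Same cluster, but the name node waits out the full minute first.
with_wait = premature_work_lists(60, 3, grace_period_s=60)
print(no_wait, with_wait)
```

With no waiting, the early node sees 20 lists in the first minute; if the
name node holds off until the reporting window has passed, it sees none.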

