hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-774) Datanodes fail to heartbeat when a directory with a large number of blocks is deleted
Date Mon, 04 Dec 2006 18:13:22 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-774?page=comments#action_12455370 ] 
dhruba borthakur commented on HADOOP-774:

1. Option 2 is simple to implement and requires very few code changes.

2. The namenode might not have detailed knowledge about the computing resources of a Datanode.
A Datanode with a faster disk can process a blockInvalidate message faster than a datanode
with slower disk I/O. Thus it is best left to the datanodes to decide how much computing power
they want to spend on deleting blocks. This votes for Option 1.

3. Option 1 also seems more scalable, because the namenode has to do bookkeeping for invalidatedNodes
for less time. The namenode, once it sends out the blockInvalidate command, can get rid
of the blocks from the recentInvalidateSets map immediately, which means this piece
of memory can be reused earlier. If the namenode adopts Option 2, it might take a while before
all the blocks get sent to the Datanode, and the memory for these block objects will hang
around longer.
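
To make the bookkeeping argument concrete, here is a minimal sketch of the capped drain that
Option 2 implies on the namenode side. This is illustrative only: apart from
recentInvalidateSets, the class, method, and field names are hypothetical, and plain block IDs
stand in for the real block objects.

  import java.util.*;

  class InvalidateBookkeeping {
    // The example cap from the issue description.
    static final int MAX_BLOCKS_PER_MESSAGE = 500;

    // storageID of a datanode -> blocks queued for deletion on it
    private final Map<String, Collection<Long>> recentInvalidateSets =
        new HashMap<String, Collection<Long>>();

    // Called while building the reply to a datanode heartbeat. Under Option 2
    // at most MAX_BLOCKS_PER_MESSAGE entries leave the map per heartbeat, so
    // the rest stay resident until later heartbeats; under Option 1 the whole
    // set would be drained (and its memory freed) in one call.
    synchronized long[] nextInvalidateBatch(String storageID) {
      Collection<Long> pending = recentInvalidateSets.get(storageID);
      if (pending == null) {
        return new long[0];
      }
      long[] batch = new long[Math.min(pending.size(), MAX_BLOCKS_PER_MESSAGE)];
      Iterator<Long> it = pending.iterator();
      for (int i = 0; i < batch.length; i++) {
        batch[i] = it.next().longValue();
        it.remove();   // forget the block as soon as it is handed out
      }
      if (pending.isEmpty()) {
        recentInvalidateSets.remove(storageID);
      }
      return batch;
    }
  }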

4. With Option 2, the speed of block invalidation depends on the heartbeat frequency, because
only a capped number of blocks is sent per heartbeat reply. Thus changing the heartbeat
frequency might have an unintended side-effect on the block reclamation speed of the cluster.
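
To make that dependence concrete with illustrative numbers (the 500-block cap is the example
from the issue description; the 3-second heartbeat interval is an assumption): at most 500
blocks per heartbeat and one heartbeat every 3 seconds gives roughly 167 blocks reclaimed per
second, so invalidating a directory with one million blocks would take on the order of 100
minutes, no matter how fast the datanode's disks are.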

5. Option 1 has the disadvantage that the Datanode has to throttle the creation of new threads
(and queue work if needed) if too many individual block-invalidate commands arrive within a
short period of time. This code might be non-trivial; one way to bound it is sketched below.
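
One way to keep that throttling code small is to give Option 1 a single worker thread draining
a queue, so that invalidate commands queue up behind one another instead of each spawning a
thread. A hypothetical sketch (none of these names come from the Hadoop code base):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  class AsyncBlockInvalidator {
    /** Stands in for whatever does the actual on-disk deletion. */
    interface BlockDeleter {
      void delete(long blockId) throws java.io.IOException;
    }

    private final BlockDeleter deleter;
    // A single worker thread: pending invalidate commands queue up inside the
    // executor, so thread creation is bounded by construction.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    AsyncBlockInvalidator(BlockDeleter deleter) {
      this.deleter = deleter;
    }

    /** Called from the heartbeat thread; returns immediately. */
    void invalidateAsync(final long[] blockIds) {
      worker.execute(new Runnable() {
        public void run() {
          for (int i = 0; i < blockIds.length; i++) {
            try {
              deleter.delete(blockIds[i]);
            } catch (java.io.IOException e) {
              // Log and continue; one bad block should not stall the queue.
            }
          }
        }
      });
    }
  }

The heartbeat thread hands the command off and goes straight back to heartbeating, which also
addresses the original timeout.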

Given the above, I would go ahead and implement Option 2.

> Datanodes fail to heartbeat when a directory with a large number of blocks is deleted
> -------------------------------------------------------------------------------------
>                 Key: HADOOP-774
>                 URL: http://issues.apache.org/jira/browse/HADOOP-774
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
> If a user removes a few files that are huge, it causes the namenode to send a BlockInvalidate
> command to the relevant Datanodes. The Datanode processes the blockInvalidate command as part
> of its heartbeat thread. If the number of blocks to be invalidated is huge, the datanode takes
> a long time to process it and stops sending new heartbeats to the namenode. The namenode then
> declares the datanode dead!
> 1. One option is to process the blockInvalidate in a separate thread from the heartbeat
> thread in the Datanode.
> 2. Another option would be to constrain the namenode to send a maximum number of blocks
> (e.g. 500) per blockInvalidate message.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

