hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-641) Name-node should demand a block report from resurrected data-nodes.
Date Thu, 26 Oct 2006 01:40:17 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-641?page=all ]

Konstantin Shvachko updated HADOOP-641:

    Attachment: ResurrectDN.patch

Sorry, I attached the wrong version the first time.

> Name-node should demand a block report from resurrected data-nodes.
> -------------------------------------------------------------------
>                 Key: HADOOP-641
>                 URL: http://issues.apache.org/jira/browse/HADOOP-641
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.1.0, 0.7.2
>            Reporter: Konstantin Shvachko
>         Attachments: ResurrectDN.patch
> 1. This bug contributed to the crash discussed in HADOOP-572.
> The problem is that when the name-node is busy and cannot process all requests from its clients,
> it can consider one of the data-nodes dead and discard its blocks, sending them into neededReplications.
> When it finally gets a heartbeat from this data-node, it resurrects the node but not the data-node's blocks,
> and hence continues to replicate them.
> Of course, the name-node will eventually receive a block report from this data-node, but that could take up
> to 1 hour. During this time it proceeds with unnecessary block replications, which could be avoided if the
> data-node sent its block report right after the resurrection.
> I modified the code so that the name-node requests a block report if it receives a heartbeat from a dead data-node.
> I introduced a new command type in the BlockCommand class.
> I replaced the multiple boolean indicators of the command types with one enum field.
> I changed the DatanodeProtocol version.
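
The behavior described in item 1 can be illustrated with a minimal sketch. This is not the code from ResurrectDN.patch: the class HeartbeatSketch, the Action enum and its constants, and the method names are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic. It only shows the idea of a single enum-valued command and of answering a heartbeat from a node previously marked dead with a request for a block report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names do not come from the Hadoop source or the patch.
class HeartbeatSketch {
    // One enum field stands in for several boolean "command type" flags.
    enum Action { NONE, TRANSFER, INVALIDATE, SEND_BLOCK_REPORT }

    // Minimal per-node state tracked by the (simulated) name-node.
    static final class NodeInfo {
        boolean alive = true;
        long lastHeartbeatMillis;
    }

    private final Map<String, NodeInfo> nodes = new HashMap<>();
    private final long expiryMillis;

    HeartbeatSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void register(String nodeId, long now) {
        NodeInfo info = new NodeInfo();
        info.lastHeartbeatMillis = now;
        nodes.put(nodeId, info);
    }

    // Periodic check: mark nodes dead if they have been silent too long.
    // A real name-node would also schedule their blocks for re-replication here.
    void expireSilentNodes(long now) {
        for (NodeInfo info : nodes.values()) {
            if (now - info.lastHeartbeatMillis > expiryMillis) {
                info.alive = false;
            }
        }
    }

    // A heartbeat from a node that was considered dead resurrects it and
    // asks it to send a full block report immediately.
    Action handleHeartbeat(String nodeId, long now) {
        NodeInfo info = nodes.get(nodeId);
        if (info == null) {
            // Assumption: unregistered nodes are rejected outright.
            throw new IllegalArgumentException("unregistered node: " + nodeId);
        }
        boolean wasDead = !info.alive;
        info.alive = true;
        info.lastHeartbeatMillis = now;
        return wasDead ? Action.SEND_BLOCK_REPORT : Action.NONE;
    }
}
```

In this sketch a caller would switch on the returned Action instead of testing a set of boolean flags, so adding a new command type means adding one enum constant.
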
> 2. This patch also includes a fix for data-node registration. If a data-node times out during registration,
> it silently exits, which is hard to notice with a large number of nodes. This patch places registration in a loop
> so that it can retry.
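
The registration retry in item 2 might look roughly like the following. Again, this is a sketch under assumptions rather than the patch itself: RegistrationRetrySketch, the Registrar interface, and the maxAttempts and retryDelayMillis parameters are hypothetical, and the attempt cap is an addition for the example (the description above only says that registration is placed in a loop).

```java
import java.io.IOException;

// Hypothetical sketch; names do not come from the Hadoop source or the patch.
class RegistrationRetrySketch {
    // Stand-in for the RPC call that registers a data-node with the name-node.
    interface Registrar {
        void register() throws IOException;
    }

    // Retries registration instead of exiting silently on the first failure.
    // Returns the number of attempts it took to succeed.
    static int registerWithRetry(Registrar registrar, int maxAttempts, long retryDelayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                registrar.register();
                return attempt;
            } catch (IOException e) {
                // java.net.SocketTimeoutException is an IOException, so timeouts land here.
                last = e;
                System.err.println("Registration attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    Thread.sleep(retryDelayMillis);
                }
            }
        }
        // Give up loudly so the failure is visible to the operator.
        throw last;
    }
}
```

Logging each failed attempt addresses the "hard to notice" part of the problem: a node that cannot register leaves a trace instead of disappearing.
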

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

