hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Date Fri, 27 Feb 2009 08:25:14 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677280#action_12677280 ]

Konstantin Shvachko commented on HADOOP-4584:

> Maybe (3) still has some advantage: could you give a specific example that shows the

The example is the one I mentioned before. In (2), while blockReport is scanning directories,
which may take minutes according to Suresh, blockReceived cannot be processed, and the commands
the name-node returns in reply to heartbeats, such as replicate and delete blocks, just
accumulate on the command queue until the block report is done. True, the data-node will not
die, but it will still be frozen in the offerService thread.

I am just proposing to do with block reports the same thing we did with received blocks: when
they arrive, we place them into {{receivedBlockList}}, and offerService sends blockReceived when
the list is not empty. Block reports would be prepared by a separate thread and placed into the
{{readyBlockReport}} member; offerService sends the report whenever that member is not null.
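The hand-off proposed above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the patch attached to HADOOP-4584: only offerService, receivedBlockList, readyBlockReport and blockReceived come from the comment, while the class name, the other methods, the use of String block IDs, and the AtomicReference slot are hypothetical choices. RPCs to the name-node are replaced by entries appended to a log list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch: block reports are built by a separate thread and handed
 * to the offerService loop through a "readyBlockReport" slot, the same way
 * received blocks are handed over through receivedBlockList.
 */
public class BlockReportHandoffSketch {
    // Blocks received since the last blockReceived call (guarded by its own monitor).
    private final List<String> receivedBlockList = new ArrayList<>();
    // Finished block report waiting to be sent; null when no report is ready.
    private final AtomicReference<List<String>> readyBlockReport = new AtomicReference<>();
    // Stand-in for RPCs to the name-node: records what offerService "sent".
    final List<String> sent = new ArrayList<>();

    void notifyBlockReceived(String block) {
        synchronized (receivedBlockList) {
            receivedBlockList.add(block);
        }
    }

    /** Runs on the scanner thread: may be slow, but only publishes at the end. */
    void scanAndPublish(List<String> blocksOnDisk) {
        List<String> report = new ArrayList<>(blocksOnDisk); // stand-in for the slow scan
        readyBlockReport.set(report);
    }

    /** One iteration of the offerService loop; never waits for the scan. */
    void offerServiceOnce() {
        sent.add("heartbeat");
        List<String> received;
        synchronized (receivedBlockList) {
            received = new ArrayList<>(receivedBlockList);
            receivedBlockList.clear();
        }
        if (!received.isEmpty()) {
            sent.add("blockReceived " + received);
        }
        // Take the report if one is ready, leaving the slot empty again.
        List<String> report = readyBlockReport.getAndSet(null);
        if (report != null) {
            sent.add("blockReport " + report);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockReportHandoffSketch dn = new BlockReportHandoffSketch();
        Thread scanner = new Thread(() -> dn.scanAndPublish(List.of("blk_1", "blk_2")));
        scanner.start();
        dn.notifyBlockReceived("blk_3");
        dn.offerServiceOnce(); // heartbeat goes out whether or not the scan has finished
        scanner.join();
        dn.offerServiceOnce(); // the report is sent on whichever iteration finds it ready
        System.out.println(dn.sent);
    }
}
```

The design choice mirrors the comment: the slow scan never runs on the offerService thread, so heartbeats and blockReceived calls keep flowing, and a finished report is simply picked up on the next loop iteration.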

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>         Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch
> Sometimes, due to disk or other problems, the datanode takes minutes or even tens of minutes
to generate a block report. This prevents the datanode from sending a heartbeat to the NameNode
every 3 seconds. In the worst case, the NameNode detects lost heartbeats and wrongly decides
that the datanode is dead.
> It would be nice to have two threads instead. One thread scans the data directories, generates
block reports, and executes the requests sent by the NameNode; the other sends heartbeats and
block reports and picks up the requests from the NameNode. With this split, sending heartbeats
will not be delayed by a slow block report or by slow execution of NameNode requests.
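The two-thread split described in the issue can be sketched as a producer/consumer pair: the heartbeat thread only talks to the NameNode and enqueues the commands it gets back, while a worker thread takes them off a blocking queue and executes them. This is a hypothetical illustration, not Hadoop code: the class and method names, the String commands, and the "SHUTDOWN" sentinel used to stop the worker are all assumptions, and the heartbeat RPC itself is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of the two-thread design: the heartbeat thread hands
 * NameNode commands to a worker thread through a blocking queue.
 */
public class TwoThreadDataNodeSketch {
    private final BlockingQueue<String> commandQueue = new LinkedBlockingQueue<>();
    // Commands the worker has executed (written by the worker, read by others).
    final List<String> executed = Collections.synchronizedList(new ArrayList<>());

    /** Heartbeat thread: send a heartbeat (omitted here), then enqueue returned commands. */
    void heartbeatOnce(List<String> commandsFromNameNode) {
        commandQueue.addAll(commandsFromNameNode); // never waits for command execution
    }

    /** Worker thread: executes commands; a slow command delays only this thread. */
    void workerLoop() throws InterruptedException {
        while (true) {
            String cmd = commandQueue.take(); // blocks until a command is available
            if (cmd.equals("SHUTDOWN")) {     // hypothetical sentinel to end the loop
                return;
            }
            executed.add(cmd); // stand-in for replicate/delete work or a directory scan
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TwoThreadDataNodeSketch dn = new TwoThreadDataNodeSketch();
        Thread worker = new Thread(() -> {
            try {
                dn.workerLoop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        dn.heartbeatOnce(List.of("replicate blk_1", "delete blk_2"));
        dn.heartbeatOnce(List.of("SHUTDOWN"));
        worker.join(TimeUnit.SECONDS.toMillis(5));
        System.out.println(dn.executed);
    }
}
```

Because heartbeatOnce only appends to an unbounded queue, its latency does not depend on how long the worker spends on any single command.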

This message is automatically generated by JIRA.
