hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Date Tue, 24 Feb 2009 19:57:01 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676391#action_12676391 ]

Konstantin Shvachko commented on HADOOP-4584:

So what is wrong with simply going with the original proposal in this jira, that is: prepare
block reports in a separate thread without delaying heartbeats and other commands, and send
them via {{offerService()}} as soon as they are ready? This seems to be the mission declared
by the issue. Changing block reports to be memory-based is an add-on that is not required
to solve the stated problem.
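
A minimal Python sketch of the proposed split, for illustration only. A scanner thread publishes a finished report into a one-slot hand-off while the service loop keeps sending heartbeats and ships the report as soon as a round finds it ready. `ReportSlot`, `scan_blocks` and `offer_service` are hypothetical names (the last only loosely mirrors {{offerService()}}), and the sleeps stand in for real disk scans and RPCs.

```python
import threading
import time


class ReportSlot:
    """Hypothetical hand-off between the scanner thread and the service loop.
    Holds at most one finished block report."""

    def __init__(self):
        self._lock = threading.Lock()
        self._report = None

    def put(self, report):
        with self._lock:
            self._report = report

    def take(self):
        # Return the finished report (or None) and clear the slot atomically.
        with self._lock:
            report, self._report = self._report, None
            return report


def scan_blocks(slot, scan_seconds):
    # Stand-in for a slow scan of the data directories.
    time.sleep(scan_seconds)
    slot.put(["blk_1", "blk_2", "blk_3"])


def offer_service(slot, heartbeat_interval, rounds):
    """Hypothetical service loop: a heartbeat goes out every round regardless
    of the scan; a block report is sent in the first round that finds one ready."""
    sent = []
    for _ in range(rounds):
        sent.append("heartbeat")        # stands in for the heartbeat RPC
        if slot.take() is not None:
            sent.append("blockReport")  # stands in for the block report RPC
        time.sleep(heartbeat_interval)
    return sent


if __name__ == "__main__":
    slot = ReportSlot()
    scanner = threading.Thread(target=scan_blocks, args=(slot, 0.2))
    scanner.start()
    print(offer_service(slot, heartbeat_interval=0.05, rounds=10))
    scanner.join()
```

The heartbeats keep their period while the scan runs; the slow scan only delays when the report itself goes out.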
I understand Dhruba's concerns about reliability. I would add that memory-based reports
can also slow down the removal of unnecessary blocks from disk, which may be critical if the
data-node is close to running out of disk space.
My approach would be to drop the in-memory block report part and commit the rest. The in-memory
reports can be discussed in a subsequent issue.
I think that would be a big enough change by itself, because there may be a dangerous race
condition between {{blockReceived()}} and {{blockReport()}} if it is not done right, as we
have seen before.
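
To make the race concrete, here is a deliberately simplified replay in Python. It assumes (our simplification, not the actual NameNode code) that a block report replaces the NameNode's view of the node, and that the {{blockReceived()}} for a block written mid-scan arrives before the slow report. Merging the blocks received since the scan started is shown only as one possible mitigation, not as what the patch does. All function names are hypothetical.

```python
def namenode_view(events):
    """Replay DataNode -> NameNode messages in arrival order and return the
    blocks a simplified NameNode believes the node holds. A block report is
    treated as authoritative: it replaces the stored set."""
    known = set()
    for kind, payload in events:
        if kind == "blockReceived":
            known.add(payload)
        elif kind == "blockReport":
            known = set(payload)
    return known


def report_naive(scanned):
    # Report exactly what the scan saw; blocks written during the scan are lost.
    return ("blockReport", frozenset(scanned))


def report_with_pending(scanned, received_since_scan_start):
    # Merge blocks received after the scan began, so the report is never
    # older than a blockReceived() that was already sent.
    return ("blockReport", frozenset(scanned) | frozenset(received_since_scan_start))


# Timeline: the scan snapshots {blk_1, blk_2}; blk_3 is written mid-scan and
# its blockReceived() reaches the NameNode before the slow report does.
scanned = {"blk_1", "blk_2"}
received_during_scan = {"blk_3"}

naive = [("blockReceived", "blk_3"), report_naive(scanned)]
fixed = [("blockReceived", "blk_3"), report_with_pending(scanned, received_during_scan)]

print(sorted(namenode_view(naive)))  # ['blk_1', 'blk_2']  (blk_3 forgotten)
print(sorted(namenode_view(fixed)))  # ['blk_1', 'blk_2', 'blk_3']
```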

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>         Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch
> Sometimes, due to disk or other problems, the datanode takes minutes or tens of minutes
> to generate a block report. This prevents the datanode from sending a heartbeat to the NameNode
> every 3 seconds. In the worst case, the NameNode detects lost heartbeats and wrongly
> decides that the datanode is dead.
> It would be nice to have two threads instead. One thread scans the data directories,
> generates the block report, and executes the requests sent by the NameNode; the other thread
> sends heartbeats and block reports and picks up requests from the NameNode. With
> these two threads, the sending of heartbeats will not be delayed by a slow block report
> or slow execution of NameNode requests.
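
A back-of-the-envelope sketch in Python of the failure described above. It compares the longest silence between heartbeats for a single-threaded loop that generates one slow report inline against a loop that never blocks on the report. The 12-minute report cost is an illustrative value. The 630 s expiry (2 × a 5-minute recheck interval + 10 × the 3-second heartbeat) is our recollection of the NameNode default of this era, so treat it as an assumption and check your configuration.

```python
HEARTBEAT_S = 3                            # heartbeat period stated in the issue
EXPIRE_S = 2 * 5 * 60 + 10 * HEARTBEAT_S   # assumed NameNode expiry: 630 s
REPORT_COST_S = 12 * 60                    # illustrative: a 12-minute block report


def heartbeat_times(heartbeat_s, report_cost_s, report_start_s, horizon_s, single_threaded):
    """Return the times (s) at which heartbeats leave the DataNode.

    Single-threaded model: the report is generated inline the first time the
    loop reaches report_start_s, postponing the next heartbeat by
    report_cost_s. Two-thread model: generation happens elsewhere, so the
    heartbeats keep their period."""
    times = []
    t = 0
    report_done = False
    while t <= horizon_s:
        times.append(t)
        t += heartbeat_s
        if single_threaded and not report_done and t >= report_start_s:
            t += report_cost_s
            report_done = True
    return times


def longest_gap(times):
    # Longest silence the NameNode observes between consecutive heartbeats.
    return max(b - a for a, b in zip(times, times[1:]))


single = longest_gap(heartbeat_times(HEARTBEAT_S, REPORT_COST_S, 60, 1000, True))
split = longest_gap(heartbeat_times(HEARTBEAT_S, REPORT_COST_S, 60, 1000, False))
print(single, single > EXPIRE_S)  # 723 True: the node would be declared dead
print(split, split > EXPIRE_S)    # 3 False
```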

This message is automatically generated by JIRA.