hadoop-common-dev mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Date Wed, 08 Apr 2009 20:56:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697199#action_12697199 ]

Suresh Srinivas commented on HADOOP-4584:

Will take care of other comments.
bq. 1. The default scan period is one hour (same as before). I think it should be much less
often (maybe 6 to 24 hours).
I wanted to retain the old behavior of scanning a directory every hour for now. I will change
it to 6 hours if no one raises concerns.

bq.   2. Since there is no throttling of the directory scan, it is better to randomize the start
time. The datanodes are usually started at the same time, so the whole cluster could slow down
at the same time.
How about randomizing the start time between 0 and the directory scan period?
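
The randomized start suggested above could be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class name RandomizedScanStart and its methods are hypothetical, and the scan itself is passed in as a plain Runnable.

```java
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop: delays the first directory scan by a
// random amount so that datanodes started together do not all scan together.
final class RandomizedScanStart {

    /** Picks an initial delay in [0, periodMs), roughly uniformly. */
    static long initialDelayMs(long periodMs, Random rng) {
        if (periodMs <= 0) {
            throw new IllegalArgumentException("scan period must be positive");
        }
        long delay = (long) (rng.nextDouble() * periodMs);
        // Guard against floating-point rounding for very large periods.
        return Math.min(periodMs - 1, delay);
    }

    /** Runs the scan every periodMs, with the first run after a random delay. */
    static ScheduledExecutorService schedule(Runnable scan, long periodMs, Random rng) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(
                scan, initialDelayMs(periodMs, rng), periodMs, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

With a 6 hour period, each datanode would then begin its first scan at some point within the first 6 hours after startup and repeat every 6 hours from there.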

bq.   5. At patchfile:834 : It updates generation stamp with 'diskGS' without moving the meta
file from prev directory to memBlock's directory. Could that result in block and meta files
in different directories?
I am not sure that I should be moving files. I think it is better to use the meta file only if it
exists in the same directory as the block file. Otherwise, update the GS to grandfather generation

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>         Attachments: 4584.brthread.2.patch, 4584.brthread.3.patch, 4584.brthread.3.patch,
4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.4.patch,
4584.brthread.4.patch, 4584.brthread.4.patch, 4584.hbthread.patch, 4584.patch, 4584.patch,
4584.patch, 4584.patch, 4584.patch, 4584.patch, Design.pdf, Design.pdf
> Sometimes, due to disk or other problems, a datanode takes minutes or tens of minutes
to generate a block report. This prevents the datanode from sending a heartbeat to the NameNode
every 3 seconds. In the worst case, the NameNode detects a lost heartbeat and wrongly
decides that the datanode is dead.
> It would be nice to have two threads instead. One thread scans the data directories,
generates the block report, and executes the requests sent by the NameNode. The other thread
sends heartbeats and block reports and picks up the requests from the NameNode. With
these two threads, the sending of heartbeats will not be delayed by a slow block report
or by slow execution of NameNode requests.
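
A minimal sketch of the two-thread idea described above, under stated assumptions: the class TwoThreadDataNodeSketch and its fields are hypothetical and are not the DataNode code or the attached patches, the slow directory scan is simulated with a sleep, and "sending" is reduced to incrementing counters.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only: one thread produces the block report,
// another sends heartbeats and forwards a report once it is ready.
final class TwoThreadDataNodeSketch {
    final AtomicInteger heartbeatsSent = new AtomicInteger();
    final AtomicInteger reportsSent = new AtomicInteger();

    // Hand-off slot: written by the scan thread, drained by the heartbeat thread.
    private final AtomicReference<String> pendingReport = new AtomicReference<>();
    private final ScheduledExecutorService heartbeatThread =
            Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService scanThread = Executors.newSingleThreadExecutor();

    void start(long heartbeatMs, long scanMs) {
        // Scan thread: a slow directory scan, simulated here by sleeping.
        scanThread.execute(() -> {
            try {
                Thread.sleep(scanMs);
                pendingReport.set("block report");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Heartbeat thread: never waits for the scan; it only checks the slot.
        heartbeatThread.scheduleAtFixedRate(() -> {
            heartbeatsSent.incrementAndGet();
            String report = pendingReport.getAndSet(null);
            if (report != null) {
                reportsSent.incrementAndGet();
            }
        }, 0, heartbeatMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        heartbeatThread.shutdownNow();
        scanThread.shutdownNow();
    }
}
```

Because the heartbeat thread only polls the hand-off slot, heartbeats continue at their fixed rate however long the scan takes, and the report goes out with the first heartbeat after the scan completes.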

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
