hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Date Wed, 04 Mar 2009 20:19:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678896#action_12678896 ]

Raghu Angadi commented on HADOOP-4584:
--------------------------------------

> Does that make me a supporter of "Option 3"?
It looks like it.

It is a good point about losing "throttling" and "indicating slow datanodes to the admin".
But fundamentally, that is not the job of a heartbeat. Those are a couple of useful things
piggybacked on the current heartbeats. Strictly, it is better to have the HB report a number
indicating the backlog at the DataNode rather than delaying the HB. An HB should only mean
"can this datanode be used or not".
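A minimal sketch of that alternative, in Java: the heartbeat is built immediately and carries a backlog count, and the receiving side throttles on that number instead of inferring slowness from a late heartbeat. The class BacklogHeartbeat, its fields and the commandsToSend policy are hypothetical illustrations, not Hadoop's DatanodeProtocol.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a heartbeat that reports backlog instead of being delayed by it. */
public class BacklogHeartbeat {

    /** Immutable heartbeat payload (hypothetical fields, not Hadoop's). */
    public static final class Heartbeat {
        private final String datanodeId;
        private final boolean usable;    // "can this datanode be used or not"
        private final int pendingWork;   // backlog at the DataNode

        public Heartbeat(String datanodeId, boolean usable, int pendingWork) {
            this.datanodeId = datanodeId;
            this.usable = usable;
            this.pendingWork = pendingWork;
        }

        public String datanodeId() { return datanodeId; }
        public boolean usable() { return usable; }
        public int pendingWork() { return pendingWork; }
    }

    // Work queued on the DataNode, e.g. block deletions or a block report scan.
    private final AtomicInteger pendingWork = new AtomicInteger();
    private final String datanodeId;

    public BacklogHeartbeat(String datanodeId) { this.datanodeId = datanodeId; }

    public void enqueueWork() { pendingWork.incrementAndGet(); }

    public void finishWork() { pendingWork.decrementAndGet(); }

    /** Builds a heartbeat right away; it never waits for the backlog to drain. */
    public Heartbeat buildHeartbeat(boolean disksHealthy) {
        return new Heartbeat(datanodeId, disksHealthy, pendingWork.get());
    }

    /**
     * Hypothetical NameNode-side policy: throttle a busy but usable node by
     * sending fewer new commands, rather than declaring it dead.
     */
    public static int commandsToSend(Heartbeat hb, int maxCommands, int backlogLimit) {
        if (!hb.usable() || hb.pendingWork() >= backlogLimit) {
            return 0;
        }
        return Math.min(maxCommands, backlogLimit - hb.pendingWork());
    }

    public static void main(String[] args) {
        BacklogHeartbeat dn = new BacklogHeartbeat("dn-1");
        for (int i = 0; i < 5; i++) {
            dn.enqueueWork();
        }
        Heartbeat hb = dn.buildHeartbeat(true);
        System.out.println(hb.pendingWork() + " " + commandsToSend(hb, 10, 8)); // prints "5 3"
    }
}
```

Throttling and the "slow datanode" signal survive in this shape; they just move from the timing of the heartbeat into its payload.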

That said, I am fine with not having a separate HB thread, as long as we move hardware-dependent
expensive operations like deleting blocks out of the offerService/HB thread
(here or in a new jira).
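A minimal sketch of moving block deletion off the heartbeat thread, assuming a hypothetical AsyncBlockDeleter class (not Hadoop code): the heartbeat thread only enqueues deletion tasks on a single-threaded ExecutorService and returns, while a background thread does the slow disk work. The 50 ms sleep is a stand-in for a slow disk.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: block deletions run on their own thread, not the HB thread. */
public class AsyncBlockDeleter implements AutoCloseable {
    private final ExecutorService deleter = Executors.newSingleThreadExecutor();
    private final AtomicInteger deleted = new AtomicInteger();

    /** Called from the heartbeat thread; only enqueues work, never touches the disk. */
    public void scheduleDeletes(List<String> blockIds) {
        for (String id : blockIds) {
            deleter.execute(() -> deleteBlockFile(id));
        }
    }

    // Stand-in for the slow, hardware-dependent file deletion.
    private void deleteBlockFile(String blockId) {
        try {
            Thread.sleep(50); // simulate a slow disk
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        deleted.incrementAndGet();
    }

    public int deletedCount() { return deleted.get(); }

    /** Lets queued deletions finish, then stops the deleter thread. */
    @Override
    public void close() {
        deleter.shutdown();
        try {
            deleter.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncBlockDeleter d = new AsyncBlockDeleter();
        long start = System.nanoTime();
        d.scheduleDeletes(List.of("blk_1", "blk_2", "blk_3"));
        long blockedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("heartbeat thread blocked for ~" + blockedMs + " ms");
        d.close();
        System.out.println("deleted " + d.deletedCount() + " blocks");
    }
}
```

Done synchronously, the three deletions would hold the heartbeat thread for at least 150 ms; here the heartbeat thread only pays for queuing them.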
 

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch,
>                      4584.patch, 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens of minutes
> to generate a block report. This prevents the datanode from sending a heartbeat to the
> NameNode every 3 seconds. In the worst case, the NameNode detects a lost heartbeat and
> wrongly decides that the datanode is dead.
> It would be nice to have two threads instead. One thread scans data directories,
> generates block reports, and executes the requests sent by the NameNode; the other
> sends heartbeats and block reports and picks up requests from the NameNode. With these
> two threads, sending heartbeats will not be delayed by a slow block report or by slow
> execution of NameNode requests.
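The two-thread design described in the issue can be sketched with java.util.concurrent executors. This is a hypothetical illustration, not the attached patch: TwoThreadDataNode and its methods are invented names, a "heartbeat" here only increments a counter, and the millisecond intervals are scaled down from the 3-second heartbeat in the description.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: heartbeats and slow block-report work on separate threads. */
public class TwoThreadDataNode implements AutoCloseable {
    // Thread 1: only sends heartbeats (here it just counts them).
    private final ScheduledExecutorService heartbeatThread =
            Executors.newSingleThreadScheduledExecutor();
    // Thread 2: scans data directories, builds block reports, runs NameNode requests.
    private final ExecutorService workerThread = Executors.newSingleThreadExecutor();
    private final AtomicInteger heartbeatsSent = new AtomicInteger();
    private final AtomicInteger reportsDone = new AtomicInteger();

    /** Starts periodic heartbeats that never wait on the worker thread. */
    public void start(long heartbeatIntervalMs) {
        heartbeatThread.scheduleAtFixedRate(
                heartbeatsSent::incrementAndGet, 0, heartbeatIntervalMs, TimeUnit.MILLISECONDS);
    }

    /** Queues a simulated slow directory scan on the worker thread. */
    public void submitBlockReport(long scanMs) {
        workerThread.execute(() -> {
            try {
                Thread.sleep(scanMs); // stand-in for a slow disk scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            reportsDone.incrementAndGet();
        });
    }

    public int heartbeatsSent() { return heartbeatsSent.get(); }

    public int reportsDone() { return reportsDone.get(); }

    /** Waits for queued work to finish, then stops the heartbeat thread. */
    @Override
    public void close() {
        workerThread.shutdown();
        try {
            workerThread.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        heartbeatThread.shutdownNow();
    }

    public static void main(String[] args) {
        TwoThreadDataNode dn = new TwoThreadDataNode();
        dn.start(10);              // 10 ms stands in for the 3 s heartbeat
        dn.submitBlockReport(200); // a "slow" block report
        dn.close();
        System.out.println("heartbeats during report: " + dn.heartbeatsSent());
    }
}
```

While the simulated 200 ms scan runs, the heartbeat thread keeps firing on its own schedule, which is exactly the property the issue asks for.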

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

