hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Date Wed, 04 Mar 2009 02:28:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678553#action_12678553 ]

Konstantin Shvachko commented on HADOOP-4584:
---------------------------------------------

Separating the HB thread from the main offerService thread has the following disadvantages:
# It does not remove the contention caused by block report processing.
That is, the data-node is still blocked while preparing the block report and cannot do anything
useful, like sending blockReceived or processing commands from the name-node. The only good thing
is that it is not declared dead.
# We lose automatic throttling of data-node activity.
While the data-node is busy it still sends heartbeats, and the name-node replies with
commands, which pile up in the queue because the DN cannot process them.
This can probably be solved with smart command-queue maintenance or by adjusting the heartbeat
frequency to the length of the queue, but that will require more work and very thorough
tuning.
# Related to the previous point: administrators will no longer be able to judge that a data-node
is in trouble just by looking at its heartbeat interval.
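
To make the second point concrete, here is a rough sketch of what "adjusting the heartbeat frequency to the length of the queue" might look like. The class name, the cap, and the step size are all made up for illustration; only the 3-second base interval comes from the issue description, and the tuning caveat above applies in full.

```java
/** Hypothetical helper, not part of the DataNode: stretches the heartbeat
 *  interval as unprocessed name-node commands pile up in the queue. */
final class AdaptiveHeartbeat {
    static final long BASE_INTERVAL_MS = 3_000;  // 3 s heartbeat from the issue description
    static final long MAX_INTERVAL_MS = 60_000;  // assumed cap, not a Hadoop setting
    static final int BACKLOG_STEP = 10;          // assumed: every 10 queued commands add one base interval

    /** Returns the delay before the next heartbeat for the given backlog size. */
    static long nextIntervalMs(int queuedCommands) {
        long extra = (long) (queuedCommands / BACKLOG_STEP) * BASE_INTERVAL_MS;
        return Math.min(BASE_INTERVAL_MS + extra, MAX_INTERVAL_MS);
    }
}
```

With an empty queue the interval stays at the base value; it grows with the backlog, so a lengthening interval would again signal a busy data-node.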

So I would argue for keeping HB processing in the main offerService loop and instead moving
block report processing into a separate thread.
In general we should keep all heavy-weight operations, like delete-blocks, away from the
offerService loop. They can be done in separate threads.
Does that make me a supporter of "Option 3"?
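
A minimal sketch of that arrangement, assuming a single background thread for the scan. Every name here is hypothetical and none of it is taken from the DataNode source or the attached patches; the "sent" list merely stands in for RPCs to the name-node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model of the service loop: heartbeats stay in the main loop,
 *  the slow block scan runs on its own thread. */
final class OfferServiceSketch {
    final List<String> sent = new ArrayList<>();      // stands in for calls to the name-node
    private final ExecutorService scanner = Executors.newSingleThreadExecutor();
    private final Callable<long[]> blockScanner;      // the slow data-directory scan
    private Future<long[]> pendingReport;             // null when no scan is in flight

    OfferServiceSketch(Callable<long[]> blockScanner) {
        this.blockScanner = blockScanner;
    }

    /** One iteration of the main loop; it never waits for the scan to finish. */
    void iterate(boolean reportDue) throws Exception {
        sent.add("heartbeat");
        if (reportDue && pendingReport == null) {
            pendingReport = scanner.submit(blockScanner);  // start the scan in the background
        }
        if (pendingReport != null && pendingReport.isDone()) {
            // get() returns immediately here because the scan has already completed
            sent.add("blockReport:" + pendingReport.get().length);
            pendingReport = null;
        }
    }

    void shutdown() {
        scanner.shutdown();
    }
}
```

Because the heartbeat and command handling stay on one thread, a data-node that is slow for any other reason still shows it in its heartbeat interval; only the scan itself is taken off the loop.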

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch,
4584.patch, 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens of minutes
to generate a block report. This leaves the datanode unable to send a heartbeat to the NameNode
every 3 seconds. In the worst case, the NameNode detects a lost heartbeat and wrongly
decides that the datanode is dead.
> It would be nice to have two threads instead. One thread scans the data directories,
generates the block report, and executes the requests sent by the NameNode; the other thread
sends heartbeats and block reports and picks up the requests from the NameNode. With
these two threads, the sending of heartbeats will not be delayed by a slow block report
or slow execution of NameNode requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

