hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6764) DN heartbeats may become clumped together
Date Tue, 29 Jul 2014 16:01:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077877#comment-14077877 ]

Daryn Sharp commented on HDFS-6764:

Great minds think alike.  Skipping missed intervals + entropy is exactly what Nathan and I
considered as the solution.  We haven't had a chance to verify.  The entropy required is probably
a random delay for the first heartbeat.  That'll spread out DNs that all connected and blocked
while the NN is blocked during something like lengthy BR processing.
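
A minimal sketch of the "skip missed intervals + entropy" idea, written as a standalone Python model rather than actual DataNode code. The function names and the uniform random offset are assumptions made for illustration; the change has not been verified against HDFS.

```python
import random


def first_heartbeat_time(start, interval, rng):
    """Pick the first heartbeat at a random offset within one interval.

    This is the assumed form of the "entropy": each DN gets its own phase.
    """
    return start + rng.uniform(0.0, interval)


def next_heartbeat_time(scheduled, now, interval):
    """Return the next send time, skipping any intervals already missed.

    The result stays on the DN's own phase (scheduled + k * interval) and is
    strictly later than `now`, so a slow NN response never causes an
    immediate back-to-back heartbeat.
    """
    missed = int((now - scheduled) // interval) if now > scheduled else 0
    return scheduled + (missed + 1) * interval


def reschedule_after_stall(scheduled_times, stall_end, interval):
    """Hypothetical helper: reschedule every DN once the NN stall ends."""
    return [next_heartbeat_time(s, stall_end, interval) for s in scheduled_times]
```

For example, seeding several DNs with `first_heartbeat_time(0.0, 3.0, random.Random(0))` and then calling `reschedule_after_stall(..., stall_end=20.0, interval=3.0)` leaves each DN on its original offset within the interval instead of moving all of them to the moment the stall ended.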

The more interesting part of the puzzle, which the aforementioned changes will probably mask/fix,
is what causes heartbeats to an active NN to clump together in a semi-rhythmic cycle?  I suspect
full BRs but I think I saw a similar jagged pattern on another cluster...  Will double check
when I have time.

> DN heartbeats may become clumped together
> -----------------------------------------
>                 Key: HDFS-6764
>                 URL: https://issues.apache.org/jira/browse/HDFS-6764
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Daryn Sharp
>         Attachments: Screen Shot 2014-07-28 at 11.12.06 AM.png
> DNs send heartbeats on a fixed schedule based on the last time a heartbeat was sent.
> If the NN takes longer to respond than the heartbeat interval then DNs do not sleep until
> the next interval.  Instead, another heartbeat is immediately sent and all DNs begin heartbeating
> on the same schedule.
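
The behavior in the description can be reproduced with a toy model. This is a sketch under simplifying assumptions, not DataNode code: the function name is hypothetical, every in-flight heartbeat is assumed to be answered at the moment the NN stall ends, and the NN is assumed to respond instantly afterwards.

```python
def simulate_stall(last_sent_times, stall_end, interval, rounds=3):
    """Return the send times of each DN after an NN stall ending at stall_end.

    Models the schedule from the description: the next heartbeat is due at
    last_sent + interval, and an overdue heartbeat is sent immediately.
    """
    schedules = []
    for last_sent in last_sent_times:
        sends = []
        now = stall_end  # the blocked heartbeat returns when the stall ends
        for _ in range(rounds):
            # Overdue heartbeats go out right away instead of waiting.
            send_at = max(last_sent + interval, now)
            sends.append(send_at)
            last_sent = send_at
            now = send_at  # assumed: the NN answers instantly from here on
        schedules.append(sends)
    return schedules
```

With `last_sent_times=[0.0, 0.7, 1.5, 2.2]`, `interval=3.0` and `stall_end=10.0`, every DN is overdue when the stall ends, so all of them send at 10.0 and then stay on the same 13.0, 16.0 schedule. With a stall shorter than the interval (for example `stall_end=2.5`), each DN keeps its own offset.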

This message was sent by Atlassian JIRA