hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9911) TestDataNodeLifeline Fails intermittently
Date Fri, 09 Dec 2016 06:20:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734468#comment-15734468 ]

Vinayakumar B commented on HDFS-9911:
-------------------------------------

I think [~tasanuma0829]'s analysis makes sense. There is a chance that LifelineSender sends a
lifeline before BPServiceActor sends the first heartbeat, which is what would postpone the next lifeline.
I think the problem is that in {{BPServiceActor#Scheduler}} the initial value of {{nextLifelineTime}}
is the same as that of {{nextHeartbeatTime}}, namely {{monotonicNow()}}, so whichever thread runs first
will send its message. The first lifeline should instead wait at least {{lifelineIntervalMs}} or
{{heartbeatIntervalMs}}, so that the heartbeat can go first. Once the heartbeat has been sent successfully,
subsequent lifeline messages will be scheduled properly.

So I hope the following change in {{BPServiceActor}} should fix this:
{code}
@@ -1063,7 +1068,7 @@ private void sendLifeline() throws IOException {
     volatile long nextHeartbeatTime = monotonicNow();
 
     @VisibleForTesting
-    volatile long nextLifelineTime = monotonicNow();
+    volatile long nextLifelineTime;
 
     @VisibleForTesting
     volatile long lastBlockReportTime = monotonicNow();
@@ -1086,6 +1091,7 @@ private void sendLifeline() throws IOException {
       this.heartbeatIntervalMs = heartbeatIntervalMs;
       this.lifelineIntervalMs = lifelineIntervalMs;
       this.blockReportIntervalMs = blockReportIntervalMs;
+      scheduleNextLifeline(monotonicNow());
     }
 
     // This is useful to make sure NN gets Heartbeat before Blockreport
{code}
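
To make the start-up race concrete, here is a small self-contained model. It is hypothetical and is not the real {{BPServiceActor.Scheduler}}: it assumes that {{scheduleNextLifeline(base)}} sets {{nextLifelineTime}} to {{base + lifelineIntervalMs}} and that sending a heartbeat reschedules the lifeline relative to the next heartbeat, as described above. The 3 s / 9 s intervals and the simulated millisecond clock are made up, and the lifeline check runs first on every tick, which is the unlucky ordering.

```java
// Hypothetical, simplified model of the scheduling fields in
// BPServiceActor.Scheduler. Names mirror the diff, but this is NOT Hadoop code.
final class LifelineRaceModel {
    final long heartbeatIntervalMs;
    final long lifelineIntervalMs;
    long nextHeartbeatTime;
    long nextLifelineTime;

    LifelineRaceModel(long now, long heartbeatIntervalMs,
                      long lifelineIntervalMs, boolean applyFix) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.lifelineIntervalMs = lifelineIntervalMs;
        this.nextHeartbeatTime = now;   // first heartbeat is due immediately
        if (applyFix) {
            scheduleNextLifeline(now);  // proposed: first lifeline waits one interval
        } else {
            this.nextLifelineTime = now; // old: due at the same instant as the heartbeat
        }
    }

    // Assumed behaviour: next lifeline is one lifeline interval after baseTime.
    void scheduleNextLifeline(long baseTime) {
        nextLifelineTime = baseTime + lifelineIntervalMs;
    }

    // Assumed behaviour: a successful heartbeat also pushes the lifeline out.
    void scheduleNextHeartbeat(long now) {
        nextHeartbeatTime = now + heartbeatIntervalMs;
        scheduleNextLifeline(nextHeartbeatTime);
    }

    // Steps a simulated clock one millisecond at a time. On each tick the
    // lifeline sender checks first, then the heartbeat sender; heartbeats are
    // always on time. Returns how many lifelines were sent.
    static int countLifelines(boolean applyFix, long durationMs) {
        LifelineRaceModel s = new LifelineRaceModel(0, 3_000, 9_000, applyFix);
        int lifelines = 0;
        for (long now = 0; now < durationMs; now++) {
            // Overflow-tolerant "is due" comparison on a monotonic clock.
            if (now - s.nextLifelineTime >= 0) {
                lifelines++;
                s.scheduleNextLifeline(now);
            }
            if (now - s.nextHeartbeatTime >= 0) {
                s.scheduleNextHeartbeat(now);
            }
        }
        return lifelines;
    }

    public static void main(String[] args) {
        // prints "old init: 1 lifeline(s)" then "fixed:    0 lifeline(s)"
        System.out.println("old init: " + countLifelines(false, 60_000) + " lifeline(s)");
        System.out.println("fixed:    " + countLifelines(true, 60_000) + " lifeline(s)");
    }
}
```

In this model the old initialization sends exactly one lifeline at start-up even though every heartbeat is on time, which has the same shape as the {{expected:<0> but was:<1>}} failure; with the proposed initialization it sends none.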


> TestDataNodeLifeline  Fails intermittently
> ------------------------------------------
>
>                 Key: HDFS-9911
>                 URL: https://issues.apache.org/jira/browse/HDFS-9911
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.8.0
>            Reporter: Anu Engineer
>            Assignee: Chris Nauroth
>             Fix For: 2.8.0
>
>
> In the HDFS-1312 branch, we have a failure in this test:
> {{org.apache.hadoop.hdfs.server.datanode.TestDataNodeLifeline.testNoLifelineSentIfHeartbeatsOnTime}}
> {noformat}
> Error Message
> Expect metrics to count no lifeline calls. expected:<0> but was:<1>
> Stacktrace
> java.lang.AssertionError: Expect metrics to count no lifeline calls. expected:<0> but was:<1>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeLifeline.testNoLifelineSentIfHeartbeatsOnTime(TestDataNodeLifeline.java:256)
> {noformat}
> Details can be found here.
> https://builds.apache.org/job/PreCommit-HDFS-Build/14726/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeLifeline/testNoLifelineSentIfHeartbeatsOnTime/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)