hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
Date Wed, 01 Jun 2016 00:53:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308980#comment-15308980
] 

Colin Patrick McCabe edited comment on HDFS-9466 at 6/1/16 12:52 AM:
---------------------------------------------------------------------

Thanks for the explanation.  It sounds like the race condition is that the ShortCircuitRegistry
on the DN needs to be informed about the client's decision that short-circuit is not working
for the block, and this RPC takes time to arrive.  That background process races with completing
the TCP read successfully and checking the number of slots in the unit test.

{code}
   public static interface Visitor {
-    void accept(HashMap<ShmId, RegisteredShm> segments,
+    boolean accept(HashMap<ShmId, RegisteredShm> segments,
                 HashMultimap<ExtendedBlockId, Slot> slots);
   }
{code}
I don't think it makes sense to change the return type of the visitor.  While you might find
a boolean convenient, some other potential users of the interface might not find it useful.
 Instead, just have your closure modify a {{final MutableBoolean}} declared nearby.

{code}
+    }, 100, 10000);
{code}
It seems like we could lower the latency here (perhaps check every 10 ms) and lengthen the
timeout.  Since the test timeouts are generally 60s, I don't think it makes sense to make
this timeout shorter than that.

+1 once that's addressed.  Thanks, [~jojochuang].  Sorry for the delay in reviews.


was (Author: cmccabe):
Thanks for the explanation.  It sounds like the race condition is that the ShortCircuitRegistry
on the DN needs to be informed about the client's decision that short-circuit is not working
for the block, and this RPC takes time to arrive.  That background process races with completing
the TCP read successfully and checking the number of slots in the unit test.

{code}
   public static interface Visitor {
-    void accept(HashMap<ShmId, RegisteredShm> segments,
+    boolean accept(HashMap<ShmId, RegisteredShm> segments,
                 HashMultimap<ExtendedBlockId, Slot> slots);
   }
{code}
I don't think it makes sense to change the return type of the visitor.  While you might find
a boolean convenient, some other potential users of the interface would have no use for it.
 Instead, just have your closure modify a {{final MutableBoolean}} declared nearby.

{code}
+    }, 100, 10000);
{code}
No reason to make this shorter than the test limit, surely?

+1 once that's addressed.  Thanks, [~jojochuang].  Sorry for the delay in reviews.

> TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
> --------------------------------------------------------------------
>
>                 Key: HDFS-9466
>                 URL: https://issues.apache.org/jira/browse/HDFS-9466
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fs, hdfs-client
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt
>
>
> This test is flaky and fails quite frequently in trunk.
> Error Message
> expected:<1> but was:<2>
> Stacktrace
> {noformat}
> java.lang.AssertionError: expected:<1> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636)
> 	at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395)
> 	at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631)
> 	at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684)
> {noformat}
> Thanks to [~xiaochen] for identifying the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message