hbase-issues mailing list archives

From "Konstantin Ryakhovskiy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14422) Fix TestFastFailWithoutTestUtil
Date Sun, 03 Jul 2016 16:09:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360593#comment-15360593
] 

Konstantin Ryakhovskiy commented on HBASE-14422:
------------------------------------------------

[~stack] The issue reproduces when an additional Thread.sleep(..) is placed before latch.await().
I added this Thread.sleep(..) to simulate sufficiently slow hardware, for example a long context switch.
The log: https://builds.apache.org/job/PreCommit-HBASE-Build/2506/testReport/org.apache.hadoop.hbase.client/TestFastFailWithoutTestUtil/testPreemptiveFastFailException50Times/
At iteration #8 (see the line "Time-limited test #7"), Thread2 is in FastFail mode (TT-2 difference=1).
This means that by the time the code is in the method PreemptiveFastFailInterceptor#inFastFail(),
EnvironmentEdge.currentTimeMillis is 1 ms greater than (time of the first failure + fast
fail threshold).
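
To illustrate how a single millisecond can flip the result, here is a minimal sketch. It assumes the check is a strict greater-than comparison against (time of the first failure + fast fail threshold), as described above. The class name, method signature, and numeric values are hypothetical and are not taken from the HBase source.

```java
// Hypothetical, self-contained sketch; this is NOT the HBase implementation.
class FastFailTimingSketch {

    // Assumed rule: fast-fail mode begins once the clock is strictly past
    // (time of first failure + fast fail threshold).
    static boolean inFastFail(long nowMs, long firstFailureMs, long thresholdMs) {
        return nowMs > firstFailureMs + thresholdMs;
    }

    public static void main(String[] args) {
        long firstFailureMs = 10_000L; // illustrative value only
        long thresholdMs = 60_000L;    // illustrative value only
        long boundaryMs = firstFailureMs + thresholdMs;

        // Exactly at the boundary: not yet in fast-fail mode.
        System.out.println(inFastFail(boundaryMs, firstFailureMs, thresholdMs));     // false
        // One millisecond later (difference=1): in fast-fail mode.
        System.out.println(inFastFail(boundaryMs + 1, firstFailureMs, thresholdMs)); // true
    }
}
```

Under this assumption, a delay as small as one context switch between reading the clock and reaching the check is enough to change which branch a thread takes.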

To make the test more robust, we can increment the done counter unconditionally. Instead
of the line:
if (pffe) done.incrementAndGet();
we can write:
done.incrementAndGet();

Would that work from your perspective?
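
A rough sketch of the difference between the two variants follows. It is not the actual test code: the class, the method names, and the boolean array standing in for per-iteration outcomes are all hypothetical, and it assumes the test later compares done against the number of iterations.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names and structure are NOT taken from the HBase test.
class DoneCounterSketch {

    // Current behaviour: count an iteration only if the timing-dependent flag was set.
    static int conditional(boolean[] sawPffe) {
        AtomicInteger done = new AtomicInteger();
        for (boolean pffe : sawPffe) {
            if (pffe) done.incrementAndGet();
        }
        return done.get();
    }

    // Proposed behaviour: count every iteration, regardless of the flag.
    static int unconditional(boolean[] sawPffe) {
        AtomicInteger done = new AtomicInteger();
        for (boolean pffe : sawPffe) {
            done.incrementAndGet();
        }
        return done.get();
    }

    public static void main(String[] args) {
        // One iteration misses the flag, e.g. because of a slow context switch.
        boolean[] outcomes = {true, true, false, true};
        System.out.println(conditional(outcomes));   // 3
        System.out.println(unconditional(outcomes)); // 4
    }
}
```

With the conditional form, one unlucky iteration leaves the counter short of the expected total. The unconditional form always reaches it, at the cost of no longer verifying the flag at that point.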


> Fix TestFastFailWithoutTestUtil
> -------------------------------
>
>                 Key: HBASE-14422
>                 URL: https://issues.apache.org/jira/browse/HBASE-14422
>             Project: HBase
>          Issue Type: Task
>          Components: test
>            Reporter: stack
>            Assignee: Konstantin Ryakhovskiy
>            Priority: Minor
>              Labels: beginner
>         Attachments: HBASE-14422.master.001.patch, HBASE-14422.master.002.patch, HBASE-14422.master.003.patch,
HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, HBASE-14422.master.006.patch,
HBASE-14422.master.007.patch, HBASE-14422.master.008.patch, HBASE-14422.master.009.patch,
HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, HBASE-14422.master.012.patch,
HBASE-14422.master.013.patch, HBASE-14422.master.014.patch, HBASE-14422.master.015.patch,
HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does testInterceptorIntercept50Times. Usually it passes, but on occasion the latching between thread 1 and thread 2 goes awry and the test hangs. It depends on the hardware, but it seems to happen about one in four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to time out, do a thread dump, and just let the test keep going.
> This issue is about digging in to figure out why the hang-up on latches happens and then fixing it so the test doesn't have to have the latch timeout. Hopefully the thread dump helps.
> This one could be hard to fix since it is not easy to reproduce. Marking it beginner anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
