hadoop-common-issues mailing list archives

From "Jason Lowe (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7888) TestFailoverProxy fails intermittently on trunk
Date Wed, 07 Dec 2011 23:22:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164829#comment-13164829 ]

Jason Lowe commented on HADOOP-7888:
------------------------------------

Before I submitted the patch, I stepped through the code in the debugger to confirm that the
two threads were synchronizing within invokeMethod(), so I have high confidence that the patch
addresses the issue.  Regarding the failure rate: the failure was very intermittent when running
the test directly in Eclipse, but on my machine it reproduces nearly 100% of the time (e.g.,
34 out of 35 tries) with this build command:

mvn test -Dtest=TestFailoverProxy

With the patch I've never seen it fail, either from within Eclipse or from the build command,
even when run in a test loop.
                
> TestFailoverProxy fails intermittently on trunk
> -----------------------------------------------
>
>                 Key: HADOOP-7888
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7888
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.24.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: hadoop-7888.patch
>
>
> TestFailoverProxy can fail intermittently, with the failures occurring in testConcurrentMethodFailures().
> The test has a race condition where the two threads may invoke the unreliable interface
> sequentially rather than concurrently.  Currently the proxy provider's getProxy() method contains
> the thread synchronization to enforce a concurrent invocation, but examining the source of
> RetryInvocationHandler.invoke() shows that the call to getProxy() during failover comes too late
> to enforce a truly concurrent invocation.
> For this particular test, one thread could race ahead and block on the CountDownLatch
> in getProxy() before the other thread even enters RetryInvocationHandler.invoke().  If that
> happens, the second thread will cache the newly updated value of proxyProviderFailoverCount,
> since the failover has mostly been processed by the original thread.  Therefore the second
> thread ends up assuming no other thread is present, performs a failover, and the test fails
> because two failovers occurred instead of one.
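
The race described above can be illustrated with a small, self-contained sketch. This is a
simplified model under stated assumptions, not the actual RetryInvocationHandler code or the
attached patch: the class FailoverRaceSketch, the Handler type, and its failoverCount field are
hypothetical stand-ins. Each caller snapshots the failover count on entry, the simulated call
always fails, and a caller fails over only if the count is unchanged since its snapshot. When
the rendezvous happens inside the invoked method, both callers take their snapshot before
either one fails over.

```java
import java.util.concurrent.CountDownLatch;

public class FailoverRaceSketch {
    // Hypothetical stand-in for the retry handler's failover bookkeeping.
    static class Handler {
        private int failoverCount = 0;

        // barrier == null models callers that never rendezvous. Otherwise each
        // caller waits inside the "invoked method" until all callers have entered.
        void invoke(CountDownLatch barrier) throws InterruptedException {
            int seen;
            synchronized (this) {
                seen = failoverCount; // snapshot taken on entry
            }
            if (barrier != null) {
                barrier.countDown();
                barrier.await(); // returns only after every caller has its snapshot
            }
            // The simulated call fails here. Fail over only if nobody else has.
            synchronized (this) {
                if (seen == failoverCount) {
                    failoverCount++;
                }
            }
        }

        synchronized int failovers() {
            return failoverCount;
        }
    }

    // Runs two failing invocations and returns how many failovers resulted.
    static int run(boolean concurrent) throws InterruptedException {
        Handler handler = new Handler();
        if (!concurrent) {
            // Second caller enters after the first failover has completed.
            handler.invoke(null);
            handler.invoke(null);
            return handler.failovers();
        }
        CountDownLatch barrier = new CountDownLatch(2);
        Runnable call = () -> {
            try {
                handler.invoke(barrier);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread t1 = new Thread(call);
        Thread t2 = new Thread(call);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return handler.failovers();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sequential failovers: " + run(false)); // prints 2
        System.out.println("concurrent failovers: " + run(true));  // prints 1
    }
}
```

In this model the sequential case yields two failovers because the second caller's snapshot
already reflects the first failover, which matches the failure mode in the description. With
the latch inside the invoked method, only the first caller to reach the check fails over.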

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
