hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3894) QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching
Date Thu, 13 Sep 2012 22:51:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455393#comment-13455393 ]

Aaron T. Myers commented on HDFS-3894:
--------------------------------------

+1, the patch looks good to me.
                
> QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching
> --------------------------------------------------------------------------
>
>                 Key: HDFS-3894
>                 URL: https://issues.apache.org/jira/browse/HDFS-3894
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3894.txt
>
>
> TestQJMWithFaults.testRecoverAfterDoubleFailures fails very occasionally. Looking into
> it, the issue seems to be that, by random chance, an IPC server port can be reused
> between two different iterations of the test loop. The client will then pick up and
> re-use the existing IPC connection to the old server. However, the old server was shut down
> and restarted, so the old IPC connection is stale (i.e. disconnected). This causes the new
> client to get an EOF when it sends the "format()" call.
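
The failure mode described above can be sketched with a small in-memory model. This is only an illustration, not Hadoop code and not the contents of the attached patch: FakeServer, FakeConnection, ConnectionCache, and the port number are all hypothetical names invented for the example, and clearing the cache is shown as one conceivable mitigation rather than the fix actually applied.

```java
import java.io.EOFException;
import java.util.HashMap;
import java.util.Map;

class StaleIpcCacheSketch {

    /** Hypothetical stand-in for an IPC server bound to a port. */
    static final class FakeServer {
        final int port;
        private boolean running = true;

        FakeServer(int port) { this.port = port; }

        void shutdown() { running = false; }

        boolean isRunning() { return running; }
    }

    /** Hypothetical connection, tied to the server instance it was opened against. */
    static final class FakeConnection {
        private final FakeServer peer;

        FakeConnection(FakeServer peer) { this.peer = peer; }

        String call(String method) throws EOFException {
            if (!peer.isRunning()) {
                // The peer is gone, so the caller sees EOF, as in the report.
                throw new EOFException("connection to port " + peer.port + " is stale");
            }
            return method + " ok";
        }
    }

    /** Cache keyed only by port: it cannot tell a restarted server from the old one. */
    static final class ConnectionCache {
        private final Map<Integer, FakeConnection> byPort = new HashMap<>();

        FakeConnection get(FakeServer server) {
            return byPort.computeIfAbsent(server.port, p -> new FakeConnection(server));
        }

        void clear() { byPort.clear(); }
    }

    /** Runs two test iterations that happen to land on the same port. */
    static String secondIterationResult(boolean clearCacheBetweenIterations) {
        ConnectionCache cache = new ConnectionCache();
        int port = 12345; // arbitrary; models a port reused by chance
        try {
            FakeServer first = new FakeServer(port);
            cache.get(first).call("format");   // iteration 1 succeeds
            first.shutdown();
            if (clearCacheBetweenIterations) {
                cache.clear();                 // assumed mitigation, for illustration only
            }
            FakeServer second = new FakeServer(port);
            return cache.get(second).call("format"); // may hit the stale entry
        } catch (EOFException e) {
            return "EOF";
        }
    }

    public static void main(String[] args) {
        System.out.println("cache kept:    " + secondIterationResult(false));
        System.out.println("cache cleared: " + secondIterationResult(true));
    }
}
```

In this model, the second iteration only fails when the cached entry for the reused port survives the server restart, which mirrors why the real test fails only when a port happens to be reused.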

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
