hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3894) QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching
Date Thu, 13 Sep 2012 22:45:07 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3894:
------------------------------

    Attachment: hdfs-3894.txt

The attached patch fixes the issue:
- Previously, the fault-injecting proxy ignored close() calls. The right behavior is
to pass them through, just without injecting any errors.
- I had to add QJM.close() calls in a few places in the test case.
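
The first point can be illustrated with a small sketch. This is not the code from
the patch: JournalClient, RecordingClient, and wrap() are hypothetical names made up
for this example, and a plain java.lang.reflect.Proxy stands in for whatever wrapper
the test actually uses. It only shows the idea of injecting errors into every call
except close(), which always reaches the wrapped object.

```java
import java.io.Closeable;
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

public class FaultInjectingProxyDemo {
    /** Hypothetical stand-in for the real client-side protocol interface. */
    public interface JournalClient extends Closeable {
        void format() throws IOException;
        @Override void close() throws IOException;
    }

    /** Trivial delegate that records which calls actually reached it. */
    public static class RecordingClient implements JournalClient {
        public boolean closed = false;
        public int formatCalls = 0;
        @Override public void format() { formatCalls++; }
        @Override public void close() { closed = true; }
    }

    /**
     * Wraps a delegate so that every call except close() fails with an
     * injected IOException when injectFaults is true. close() is always
     * forwarded, so the delegate can release its resources.
     */
    public static JournalClient wrap(JournalClient delegate, boolean injectFaults) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean isClose = method.getName().equals("close")
                && method.getParameterCount() == 0;
            if (!isClose && injectFaults) {
                throw new IOException("Injected fault in " + method.getName());
            }
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the delegate's own exception
            }
        };
        return (JournalClient) Proxy.newProxyInstance(
            JournalClient.class.getClassLoader(),
            new Class<?>[] { JournalClient.class },
            handler);
    }

    public static void main(String[] args) throws IOException {
        RecordingClient delegate = new RecordingClient();
        JournalClient client = wrap(delegate, true);
        try {
            client.format();
        } catch (IOException e) {
            System.out.println("format() failed as injected: " + e.getMessage());
        }
        client.close();
        System.out.println("delegate closed: " + delegate.closed);
    }
}
```

If close() were swallowed by the wrapper instead, the delegate would never release
its connection, which is the kind of leftover state the second point cleans up.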

I tested this by temporarily changing MiniJournalCluster so that it always started the nodes
on ports 10001, 10002, and 10003 instead of ephemeral ports. This caused the failure to
reproduce every time I ran the test. With the patch applied, the test no longer failed.
                
> QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching
> --------------------------------------------------------------------------
>
>                 Key: HDFS-3894
>                 URL: https://issues.apache.org/jira/browse/HDFS-3894
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3894.txt
>
>
> TestQJMWithFaults.testRecoverAfterDoubleFailures fails very rarely. Looking into it, the
> issue seems to be that an IPC server port can, by random chance, be reused between two
> different iterations of the test loop. The client then picks up and reuses the existing
> IPC connection to the old server. However, the old server was shut down and restarted,
> so the old IPC connection is stale (i.e. disconnected). This causes the new client to
> get an EOF when it sends the "format()" call.
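
The failure mode described above can be modelled in a few lines. The sketch below is
a simplified simulation, not Hadoop's IPC client: FakeServer, Connection, and
ConnectionCache are hypothetical classes, and the cache is keyed only by port. It
assumes a cached connection stays bound to the server instance it was opened against,
so a new server on the same port gets handed the stale connection unless the client
closed it first.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class StaleConnectionCacheDemo {
    /** Hypothetical server: each instance is one lifetime bound to a port. */
    static class FakeServer {
        final int port;
        boolean running = true;
        FakeServer(int port) { this.port = port; }
        void shutdown() { running = false; }
    }

    /** Hypothetical connection tied to the server instance it was opened against. */
    static class Connection {
        final FakeServer server;
        boolean closed = false;
        Connection(FakeServer server) { this.server = server; }
        String call(String method) throws IOException {
            if (closed || !server.running) {
                throw new EOFException("stale connection during " + method + "()");
            }
            return method + " ok";
        }
    }

    /** Hypothetical client-side cache keyed by port only. */
    static class ConnectionCache {
        private final Map<Integer, Connection> byPort = new HashMap<>();
        Connection get(FakeServer server) {
            // Reuses any existing entry for this port, even if it points at a dead server.
            return byPort.computeIfAbsent(server.port, p -> new Connection(server));
        }
        void close(int port) {
            Connection c = byPort.remove(port);
            if (c != null) {
                c.closed = true;
            }
        }
    }

    /**
     * Runs two test iterations against the same port and reports whether the
     * second iteration's format() call fails with an EOF.
     */
    public static boolean secondIterationFails(boolean closeClientAfterFirst) {
        ConnectionCache cache = new ConnectionCache();
        int port = 10001; // fixed port, so reuse happens on every run
        try {
            FakeServer first = new FakeServer(port);
            cache.get(first).call("format");
            if (closeClientAfterFirst) {
                cache.close(port); // drop the cached connection before restarting
            }
            first.shutdown();

            FakeServer second = new FakeServer(port);
            cache.get(second).call("format");
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without close: fails = " + secondIterationFails(false));
        System.out.println("with close:    fails = " + secondIterationFails(true));
    }
}
```

With ephemeral ports the two iterations rarely collide on the same key, which matches
the occasional nature of the failure; pinning the port makes the collision certain.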

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
