hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk
Date Tue, 03 Feb 2015 15:49:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303471#comment-14303471 ]

Jason Lowe commented on YARN-1778:

Thanks for the analysis and patch, [~zxu]!  I'm wondering if the test is trying to tell us
there really is a problem with FSRMStateStore retries, and therefore fixing the test is actually
masking a real problem that needs to be fixed in the main code.  If I understand the intent
of the test correctly, it's trying to verify that FSRMStateStore will not throw an exception
while namenodes are down or coming back up.  However, if we make the test wait until the namenodes
are back up before trying to connect, then that defeats most of the point of the test.

I think the critical question is: should the "Namenode still not started" exception be retried
by either the DFSClient layer or by FSRMStateStore?  I think it should; otherwise a client
of FSRMStateStore is going to see this exception in a similar, real-world scenario where the
Namenode was restarted and wonder why the framework didn't auto-retry.
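
To make the question concrete, here is a minimal sketch of the kind of bounded retry being discussed. It is not the FSRMStateStore or DFSClient code. The class RetrySketch, the helper withRetries, and its parameters are hypothetical names for illustration, and it assumes the "Namenode still not started" failure surfaces to the caller as an IOException.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; none of these names come from Hadoop.
final class RetrySketch {

    private RetrySketch() {}

    /**
     * Runs op, retrying when it throws IOException, up to maxAttempts
     * attempts in total, sleeping sleepMillis between attempts.
     * Assumption: a transient "Namenode still not started" failure
     * reaches this layer as an IOException.
     */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                // Remember the failure and retry unless attempts are exhausted.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        // Every attempt failed: surface the most recent failure to the caller.
        throw last;
    }
}
```

With something like this in either layer, a caller would only see the exception if the namenode stayed unavailable for all attempts, which is the behavior the test appears to be checking for.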

> TestFSRMStateStore fails on trunk
> ---------------------------------
>                 Key: YARN-1778
>                 URL: https://issues.apache.org/jira/browse/YARN-1778
>             Project: Hadoop YARN
>          Issue Type: Test
>            Reporter: Xuan Gong
>            Assignee: zhihai xu
>         Attachments: YARN-1778.000.patch

This message was sent by Atlassian JIRA
