hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
Date Tue, 16 Jul 2013 22:02:49 GMT

    [ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710344#comment-13710344 ]

Bikas Saha commented on YARN-875:

If users have done their homework then the library's catch statement will not be executed and we
are fine. If users have not done their homework and we ignore their exceptions, then we can
get into bad cases where an allocation from the RM is lost due to an exception in onContainersAllocated(),
and the app hangs because it is waiting for that allocation to happen. That is
not acceptable IMO. These libraries are all freshly written and IMO it's better to fail fast
and expose issues than to silently ignore them. If we see a common case of innocuous exceptions
then we can choose to ignore them, but we first need to see them in real-life usage.

We should fix the circular exception. The last patch attached has a bug in that regard.
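
A minimal sketch of the fail-fast behavior argued for above, in plain Java with no
Hadoop dependency. CallbackHandler, CallbackThread and runDemo are hypothetical names
used only for illustration; this is not the AMRMClientAsync code or any of the attached
patches. The callback thread catches Throwable from the user callback, reports it once
through onError(), and stops. An exception thrown by onError() itself is only logged and
never routed back into onError(), which is one possible way to avoid a circular exception.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

final class FailFastCallbackDemo {

    /** Hypothetical stand-in for the async client's user-supplied callbacks. */
    interface CallbackHandler {
        void onContainersAllocated(List<String> containers);
        void onError(Throwable t);
    }

    /** Delivers queued allocations to the handler on a dedicated thread. */
    static final class CallbackThread extends Thread {
        private final BlockingQueue<List<String>> allocations;
        private final CallbackHandler handler;

        CallbackThread(BlockingQueue<List<String>> allocations, CallbackHandler handler) {
            super("callback-thread");
            this.allocations = allocations;
            this.handler = handler;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                List<String> batch;
                try {
                    batch = allocations.take();
                } catch (InterruptedException e) {
                    return; // treated as a normal shutdown request
                }
                try {
                    handler.onContainersAllocated(batch);
                } catch (Throwable t) {
                    // Fail fast: surface the user's bug instead of dying silently.
                    reportOnce(t);
                    return;
                }
            }
        }

        private void reportOnce(Throwable original) {
            try {
                handler.onError(original);
            } catch (Throwable fromOnError) {
                // Do not call onError() again here; that would loop.
                System.err.println("onError() itself failed: " + fromOnError);
            }
        }
    }

    /** Runs one buggy callback and returns what the handler observed. */
    static List<String> runDemo() {
        final List<String> events = new CopyOnWriteArrayList<>();
        BlockingQueue<List<String>> allocations = new LinkedBlockingQueue<>();
        CallbackHandler buggyHandler = new CallbackHandler() {
            @Override
            public void onContainersAllocated(List<String> containers) {
                throw new IllegalStateException("user bug while handling " + containers);
            }

            @Override
            public void onError(Throwable t) {
                events.add("onError: " + t.getMessage());
            }
        };
        CallbackThread thread = new CallbackThread(allocations, buggyHandler);
        thread.start();
        allocations.add(Collections.singletonList("container_1"));
        try {
            thread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        events.add("callback thread alive: " + thread.isAlive());
        return events;
    }

    public static void main(String[] args) {
        for (String event : runDemo()) {
            System.out.println(event);
        }
    }
}
```

In this sketch the application learns about the lost allocation through onError() and
can shut down or recover, rather than waiting forever on a callback thread that has died.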

Changing to Throwable will not be incompatible because the async library has not yet been
officially released. It did not go out in 2.0.4-alpha.
> Application can hang if AMRMClientAsync callback thread has exception
> ---------------------------------------------------------------------
>                 Key: YARN-875
>                 URL: https://issues.apache.org/jira/browse/YARN-875
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch
> Currently that thread will die and then never call back. App can hang. Possible solution
> could be to catch Throwable in the callback and then call client.onError().

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
