hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-337) RM handles killed application tracking URL poorly
Date Thu, 08 Aug 2013 15:12:48 GMT

     [ https://issues.apache.org/jira/browse/YARN-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-337:
----------------------------

    Attachment: YARN-337.patch

This patch sets the tracking URL to the RM app page when an AM attempt is killed.  It also
refactors the places where this was already done for FAILED attempts so that they better cover
all the various ways an AM attempt can fail.

As for the failed unregister attempt described in the issue, I'm tempted to leave that as-is,
since there will always be races between YARN-level kill/fail and apps unregistering.  As long
as we point to the RM app page when something goes wrong, the user at least has somewhere to
start diagnosing the problem rather than a bad link to nowhere.
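
As a rough illustration of the behavior described above (this is not the code in
YARN-337.patch), the following Java sketch picks the URL to expose for an attempt based on
how it ended.  The class, enum, and method names are hypothetical, and the
"/cluster/app/<appId>" path and the example host are assumptions made for the sketch.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; these are not the ResourceManager's real classes.
class TrackingUrlSketch {

    // Simplified stand-in for the attempt states discussed in the issue.
    enum AttemptState { RUNNING, FINISHED, FAILED, KILLED }

    // States in which the AM can no longer serve its own tracking URL.
    private static final Set<AttemptState> ABNORMAL =
        EnumSet.of(AttemptState.FAILED, AttemptState.KILLED);

    // Builds the RM web page URL for an application (path layout is an assumption).
    static String rmAppPageUrl(String rmWebAddress, String appId) {
        return "http://" + rmWebAddress + "/cluster/app/" + appId;
    }

    // Returns the AM-provided URL for healthy attempts, and the RM app page
    // for attempts that were killed or failed.
    static String resolveTrackingUrl(AttemptState state, String amTrackingUrl,
                                     String rmWebAddress, String appId) {
        if (ABNORMAL.contains(state)) {
            return rmAppPageUrl(rmWebAddress, appId);
        }
        return amTrackingUrl;
    }

    public static void main(String[] args) {
        String rm = "rm.example.com:8088";               // assumed RM web address
        String app = "application_1357575694478_28614";  // illustrative application id
        String amUrl = "http://am-host.example.com:12345/";

        for (AttemptState s : AttemptState.values()) {
            System.out.println(s + " -> " + resolveTrackingUrl(s, amUrl, rm, app));
        }
    }
}
```

Funnelling both KILLED and FAILED through one check mirrors the refactoring goal mentioned
above: every abnormal exit ends up at the same fallback instead of each failure path setting
the URL on its own.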
                
> RM handles killed application tracking URL poorly
> -------------------------------------------------
>
>                 Key: YARN-337
>                 URL: https://issues.apache.org/jira/browse/YARN-337
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>              Labels: usability
>         Attachments: YARN-337.patch
>
>
> When the ResourceManager kills an application, it leaves the proxy URL redirecting to
> the original tracking URL for the application even though the ApplicationMaster is no longer
> there to service it.  It should redirect it somewhere more useful, like the RM's web page
> for the application, where the user can find that the application was killed and links to
> the AM logs.
> In addition, sometimes the AM during teardown from the kill can attempt to unregister
> and provide an updated tracking URL, but unfortunately the RM has "forgotten" the AM due to
> the kill and refuses to process the unregistration.  Instead it logs:
> {noformat}
> 2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_000001
> {noformat}
> It should go ahead and process the unregistration to update the tracking URL since the
> application offered it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
