hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14689) AM container might leak
Date Wed, 31 Jul 2019 13:46:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897191#comment-16897191 ]

Wei-Chiu Chuang commented on HDFS-14689:

You can actually click on More -> Move to convert this into a YARN JIRA.

> AM container might leak
> -----------------------
>                 Key: HDFS-14689
>                 URL: https://issues.apache.org/jira/browse/HDFS-14689
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Tao Yang
>            Priority: Major
> There is a risk that the AM container might leak when the NM exits unexpectedly while the AM
> container is localizing, if the AM expiry interval (conf-key: yarn.am.liveness-monitor.expiry-interval-ms)
> is less than the NM expiry interval (conf-key: yarn.nm.liveness-monitor.expiry-interval-ms).
>  RMAppAttempt state changes as follows:
> {noformat}
> LAUNCHED/RUNNING – event:EXPIRED(FinalSavingTransition) 
>  --> FINAL_SAVING – event:ATTEMPT_UPDATE_SAVED(FinalStateSavedTransition / ExpiredTransition: send AMLauncherEventType.CLEANUP)  --> FAILED
> {noformat}
> AMLauncherEventType.CLEANUP is handled by AMLauncher#cleanup, which internally calls
> ContainerManagementProtocol#stopContainer to stop the AM container by communicating with the NM.
> If the NM can't be reached, it just skips the stop without logging anything.
> I think in this case we can complete the AM container in the scheduler when stopping it fails,
> so that it will have a chance to be stopped when the NM reconnects with the RM.
> Hope to hear your thoughts. Thank you!
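
The fallback proposed in the description above can be sketched roughly as follows. This is a minimal, self-contained Java model and not YARN code: NodeManagerClient, Scheduler, cleanup and leakWindowExists are hypothetical names introduced only for illustration, and the interval values in main are illustrative rather than YARN defaults. It assumes that a failed stop call surfaces as an IOException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the proposed cleanup fallback; not actual YARN classes. */
final class AmCleanupSketch {

    /** Stand-in for the NM-facing stop call made during cleanup (hypothetical interface). */
    interface NodeManagerClient {
        void stopContainer(String containerId) throws IOException;
    }

    /** Stand-in for the scheduler; it only records containers marked as completed. */
    static final class Scheduler {
        final List<String> completed = new ArrayList<>();

        void completeContainer(String containerId) {
            completed.add(containerId);
        }
    }

    /** The description's precondition: AM expiry interval shorter than NM expiry interval. */
    static boolean leakWindowExists(long amExpiryMs, long nmExpiryMs) {
        return amExpiryMs < nmExpiryMs;
    }

    /**
     * Tries to stop the AM container on the NM. If the NM cannot be reached,
     * the failure is logged and the container is completed in the scheduler
     * instead of being skipped silently.
     *
     * @return true if the NM accepted the stop call, false if the fallback ran
     */
    static boolean cleanup(String containerId, NodeManagerClient nm,
                           Scheduler scheduler, List<String> log) {
        try {
            nm.stopContainer(containerId);
            return true;
        } catch (IOException e) {
            log.add("Failed to stop AM container " + containerId + ": " + e.getMessage());
            scheduler.completeContainer(containerId);
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative values only, not YARN defaults.
        long amExpiryMs = 300_000L;
        long nmExpiryMs = 600_000L;
        System.out.println("Leak window possible: " + leakWindowExists(amExpiryMs, nmExpiryMs));

        Scheduler scheduler = new Scheduler();
        List<String> log = new ArrayList<>();
        // Simulates an NM that exited unexpectedly and cannot be contacted.
        NodeManagerClient unreachableNm = id -> {
            throw new IOException("NM unreachable");
        };

        boolean stopped = cleanup("container_am_1", unreachableNm, scheduler, log);
        System.out.println("Stopped on NM: " + stopped);
        System.out.println("Completed in scheduler: " + scheduler.completed);
        System.out.println("Log: " + log);
    }
}
```

The point of the sketch is only the catch branch: the failure leaves a log entry and the container is handed to the scheduler as completed, which is what would give it a chance to be stopped once the NM reconnects with the RM.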

This message was sent by Atlassian JIRA
