Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Mon, 16 Feb 2015 21:28:12 +0000 (UTC)
From: "Xuan Gong (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12725948.1404800986000.58507.1424122092831@Atlassian.JIRA>
In-Reply-To: <JIRA.12725948.1404800986000@Atlassian.JIRA>
References: <JIRA.12725948.1404800986000@Atlassian.JIRA>
 <JIRA.12725948.1404800986322@arcas>
Subject: [jira] [Commented] (YARN-2261) YARN should have a way to run
 post-application cleanup


    [ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323321#comment-14323321 ] 

Xuan Gong commented on YARN-2261:
---------------------------------

Thanks for the comments, Steve.

bq. Maybe the cleanup containers could have lower limits on allocation: 1 vcore max... I'd advocate less memory, but if pmem limits are turned on that's dangerous.

bq. would there be any actual/best effort offerings of the interval between AM termination and clean up scheduling?

I thought about this. 
* Request the resource for the clean-up container separately, after the application has finished, failed, or been killed. In this case, the clean-up container can have its own resource requirement. But, as Vinod commented, the clean-up container may not get resources, because the cluster may have become busy after the final AM exited.
* Request the resource for the clean-up container at the same time as the resource for the AM container, and reserve it; after the final AM exits, we use this reserved resource to launch the clean-up container. In this case, too, the clean-up container can have its own resource requirement. But this option is not ideal, because the AM does not know whether it is the final attempt. Even the RM does not know whether the current attempt is the final one; it only learns that the previous attempt was final when it decides whether to launch the next attempt. So we would need to request the clean-up container's resource every time we request resources for an AM container, and whenever the current AM container is not the final one, that resource is wasted.
* Reuse the AM container's resource, as I proposed. Once the container-resizing feature is ready, we could definitely let the clean-up container have its own resource requirement.

Those are all the options I can think of for scheduling the clean-up container, which is why I propose that we simply reuse the AM container's resource.
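To make the trade-off between the three options concrete, here is a toy Python model. It is a hedged sketch, not ResourceManager code: `Resource`, `Strategy`, `Outcome`, `plan_cleanup` and the `free_after_exit` parameter are hypothetical names, and it assumes that an up-front reservation (option 2) always succeeds at AM launch time and that, without container resizing, option 3 only works when the clean-up work fits in the AM container's allocation.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a YARN resource vector (memory, vcores)."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def times(self, n: int) -> "Resource":
        return Resource(self.memory_mb * n, self.vcores * n)


NO_RESOURCE = Resource(0, 0)


class Strategy(Enum):
    REQUEST_AFTER_EXIT = 1  # option 1: ask the scheduler only after the final AM exits
    RESERVE_WITH_AM = 2     # option 2: reserve clean-up resources with every AM attempt
    REUSE_AM_CONTAINER = 3  # option 3: run clean-up in the final AM container's allocation


@dataclass(frozen=True)
class Outcome:
    launched: bool    # did the clean-up container get resources?
    wasted: Resource  # reserved resources that were never used


def plan_cleanup(strategy: Strategy, am: Resource, cleanup: Resource,
                 attempts: int, free_after_exit: Resource) -> Outcome:
    """Toy model of the three options (hypothetical; not RM code).

    attempts: number of AM attempts the application actually ran.
    free_after_exit: capacity still free once the final AM has exited
    (other applications may already have claimed what the AM released).
    """
    if attempts < 1:
        raise ValueError("an application runs at least one AM attempt")
    if strategy is Strategy.REQUEST_AFTER_EXIT:
        # Own resource requirement, but the cluster may be busy by now.
        return Outcome(cleanup.fits_in(free_after_exit), NO_RESOURCE)
    if strategy is Strategy.RESERVE_WITH_AM:
        # Nobody knows which attempt is final, so every attempt reserves;
        # only the last reservation is ever used.
        return Outcome(True, cleanup.times(attempts - 1))
    # REUSE_AM_CONTAINER: until containers can be resized, the clean-up
    # work has to fit in the AM container's allocation.
    return Outcome(cleanup.fits_in(am), NO_RESOURCE)


if __name__ == "__main__":
    am, cleanup = Resource(1024, 1), Resource(512, 1)
    busy = Resource(256, 0)  # the cluster filled up after the final AM exited
    for s in Strategy:
        print(s.name, plan_cleanup(s, am, cleanup, attempts=2, free_after_exit=busy))
```

With two AM attempts and a busy cluster, option 1 fails to launch, option 2 launches but leaves one 512 MB / 1 vcore reservation unused for the non-final attempt, and option 3 launches with nothing wasted.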

bq. My token concern is related to long-lived apps: what tokens will they get?

For now, we could just give the clean-up container all the latest tokens that the AM has. I understand that this is not enough for LRS (long-running service) apps. But I think the AM has a similar token renewal/update issue, so we could fix those together.

bq. How does this mix up with pre-emption?

This is a good point. The clean-up container's resources still count as the application's resources. I think we could either:
* make clean-up containers non-preemptible,
OR
* if a clean-up container is pre-empted, simply stop the clean-up process without retrying and mark it as a clean-up failure.
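The two pre-emption alternatives can be written down as a small policy switch. This is a hedged Python sketch rather than scheduler code: `Policy`, `CleanupState`, `Container`, `preemption_candidates` and `on_preempted` are hypothetical names, and "ordinary containers keep their usual handling" stands in for whatever the scheduler already does today.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    NEVER_PREEMPT = 1       # clean-up containers are exempt from pre-emption
    FAIL_WITHOUT_RETRY = 2  # pre-emptible, but the clean-up then fails with no retry


class CleanupState(Enum):
    RUNNING = 1
    FAILED = 2


@dataclass
class Container:
    """Hypothetical container record; not a YARN class."""
    container_id: str
    is_cleanup: bool
    cleanup_state: CleanupState = CleanupState.RUNNING


def preemption_candidates(containers: list[Container], policy: Policy) -> list[Container]:
    """Containers a pre-emption pass may pick from under the given policy."""
    if policy is Policy.NEVER_PREEMPT:
        return [c for c in containers if not c.is_cleanup]
    return list(containers)


def on_preempted(container: Container, policy: Policy) -> None:
    """React to a pre-empted container according to the clean-up policy."""
    if not container.is_cleanup:
        return  # ordinary containers keep their usual handling
    if policy is Policy.NEVER_PREEMPT:
        raise RuntimeError(f"{container.container_id}: clean-up containers must not be pre-empted")
    # FAIL_WITHOUT_RETRY: stop the clean-up process and record the failure.
    container.cleanup_state = CleanupState.FAILED


if __name__ == "__main__":
    app = [Container("c1", is_cleanup=False), Container("c2", is_cleanup=True)]
    print([c.container_id for c in preemption_candidates(app, Policy.NEVER_PREEMPT)])
    on_preempted(app[1], Policy.FAIL_WITHOUT_RETRY)
    print(app[1].cleanup_state)
```

The first policy shrinks the pool the pre-emption pass can draw from; the second keeps the pool unchanged and moves the cost to the clean-up outcome, which is reported as a failure instead of being retried.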


> YARN should have a way to run post-application cleanup
> ------------------------------------------------------
>
>                 Key: YARN-2261
>                 URL: https://issues.apache.org/jira/browse/YARN-2261
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> See MAPREDUCE-5956 for context. Specific options are at https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)