Date: Wed, 29 Jul 2015 05:35:04 +0000 (UTC)
From: "prakhar jauhari (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Comment Edited] (SPARK-9396) Spark yarn allocator does not call "removeContainerRequest" for allocated container requests, resulting in a bloated ask[] to the YARN RM.

    [ https://issues.apache.org/jira/browse/SPARK-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643822#comment-14643822 ]

prakhar jauhari edited comment on SPARK-9396 at 7/29/15 5:35 AM:
-----------------------------------------------------------------

This happens because YARN's AMRMClient does not remove an old container request from its internal table until the application's AM calls removeContainerRequest for the fulfilled request.

Spark 1.2: Spark's AM never calls removeContainerRequest for fulfilled container requests.
Spark 1.3: calls removeContainerRequest for the container requests it can match to an allocation. I tried the same test case of killing one executor on Spark 1.3, and the ask[] in that case was for 1 container.

As long as the cluster is large enough to satisfy the bloated container requests, the containers come back to Spark's YARN allocator in the allocate response; the allocator launches only the missing number of executors and releases the extra containers. The problem gets worse for a long-running job with large executor memory requirements: whenever an executor is killed, the next ask to the YARN ResourceManager (RM) is for n+1 containers. The RM may serve it if it still has enough resources; otherwise it starts reserving cluster resources for containers that Spark never needed in the first place, wasting cluster capacity.

I have added changes that remove fulfilled container requests in the Spark 1.2 code, and will be creating a PR for the same.
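For reference, the Spark 1.3 behaviour boils down to matching each allocated container back to one outstanding request and removing it. Below is a minimal sketch of that kind of change (not the exact patch), assuming the AMRMClient API from hadoop-yarn-client 2.4 (getMatchingRequests / removeContainerRequest); the helper name removeMatchingRequest is hypothetical:

{code}
import org.apache.hadoop.yarn.api.records.{Container, ResourceRequest}
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Sketch only: call this for every container YARN hands back, so the
// AMRMClient's internal request table is decremented and the next
// allocate() heartbeat no longer re-asks for an already-fulfilled request.
def removeMatchingRequest(amClient: AMRMClient[ContainerRequest],
                          container: Container): Unit = {
  val matching = amClient.getMatchingRequests(
    container.getPriority, ResourceRequest.ANY, container.getResource)
  if (!matching.isEmpty && !matching.get(0).isEmpty) {
    // Remove exactly one outstanding request per allocated container.
    amClient.removeContainerRequest(matching.get(0).iterator().next())
  }
}
{code}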
> Spark yarn allocator does not call "removeContainerRequest" for allocated container requests, resulting in a bloated ask[] to the YARN RM.
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-9396
>                 URL: https://issues.apache.org/jira/browse/SPARK-9396
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.2.1
>        Environment: Spark 1.2.1 on a hadoop-yarn-2.4.0 cluster. All servers in the cluster run Linux kernel 2.6.32.
>            Reporter: prakhar jauhari
>
> Note: the attached logs contain lines I added (on the Spark YARN allocator side and the YARN client side) for debugging purposes.
> My Spark job is configured for 2 executors; on killing 1 executor, the ask is for 3.
> Resource-request logs on killing an executor:
> ************* Killed container: ask for 3 containers, instead of 1 ***********
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: Will allocate 1 executor containers, each with 2432 MB memory including 384 MB overhead
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: numExecutors: 1
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: host preferences is empty
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: Container request (host: Any, priority: 1, capability: <memory:2432, vCores:4>)
> 15/07/15 10:49:01 INFO impl.AMRMClientImpl: prakhar : AMRMClientImpl : allocate: this.ask = [{Priority: 1, Capability: <memory:2432, vCores:4>, # Containers: 3, Location: *, Relax Locality: true}]
> 15/07/15 10:49:01 INFO impl.AMRMClientImpl: prakhar : AMRMClientImpl : allocate: allocateRequest = ask { priority { priority: 1 } resource_name: "*" capability { memory: 2432 virtual_cores: 4 } num_containers: 3 relax_locality: true } blacklist_request { } response_id: 354 progress: 0.1
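To make the accumulation concrete, here is a toy model of the pending-request bookkeeping (an assumption for illustration only; the real AMRMClientImpl keys its table on priority, resource name, and capability) showing how the ask reaches 3 containers in the log above:

{code}
import scala.collection.mutable

object AskBloatDemo {
  // Toy stand-in for AMRMClientImpl's remote-requests table,
  // keyed here only by (priority, location) for brevity.
  private val pending = mutable.Map.empty[(Int, String), Int].withDefaultValue(0)

  def addContainerRequest(prio: Int, loc: String): Unit = pending((prio, loc)) += 1
  def removeContainerRequest(prio: Int, loc: String): Unit = pending((prio, loc)) -= 1

  def main(args: Array[String]): Unit = {
    addContainerRequest(1, "*")
    addContainerRequest(1, "*")   // initial ask: 2 executors
    // Both are allocated, but (as in Spark 1.2) removeContainerRequest is never called.
    addContainerRequest(1, "*")   // re-ask after one executor is killed
    println(s"next ask num_containers = ${pending((1, "*"))}")  // prints 3, as in the log
  }
}
{code}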