spark-issues mailing list archives

From "prakhar jauhari (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-9396) Spark yarn allocator does not call "removeContainerRequest" for allocated Container requests, resulting in bloated ask[] toYarn RM.
Date Wed, 29 Jul 2015 05:35:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643822#comment-14643822
] 

prakhar jauhari edited comment on SPARK-9396 at 7/29/15 5:35 AM:
-----------------------------------------------------------------

This happens because YARN's AM client does not remove an old container request from its
internal request map until the application's AM calls removeContainerRequest for the fulfilled request.

Spark-1.2 : Spark's AM does not call removeContainerRequest for fulfilled container requests.

Spark-1.3 : calls removeContainerRequest for the container requests it can map to fulfilled ones.
Running the same test case of killing one executor with spark-1.3, the ask[] in this case
was for 1 container.

As long as the cluster is large enough to allocate the bloated container requests, the containers
are handed to Spark's YARN allocator in the allocate response; the allocator uses the missing-executor
count to launch new executors and releases the extra allocated containers.

The problem worsens for long-running jobs with large executor memory requirements.
Whenever an executor gets killed, the next ask to the YARN ResourceManager (RM)
is for n+1 containers, which the RM may serve if it still has enough resources; otherwise
the RM starts reserving cluster resources for containers that Spark never needed
in the first place. This leads to inefficient utilization of cluster resources.
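The growth of the ask[] described above can be modeled with a short, self-contained sketch. All class and function names below are hypothetical stand-ins, not Spark or Hadoop code; the sketch only assumes the behavior described in this comment (the client keeps every request until it is explicitly removed):

```python
class AMClient:
    """Mimics YARN's AMRMClient: it keeps every container request it has
    been given until the application explicitly removes it."""
    def __init__(self):
        self.outstanding = []          # requests still in the client's map

    def add_container_request(self, req):
        self.outstanding.append(req)

    def remove_container_request(self, req):
        self.outstanding.remove(req)

    def ask_size(self):
        # The ask[] sent to the RM covers every outstanding request.
        return len(self.outstanding)


def replace_executor(client, old_req, remove_fulfilled):
    """One executor died: request a replacement. The spark-1.2 path never
    removes the fulfilled request; the spark-1.3 path does."""
    if remove_fulfilled:
        client.remove_container_request(old_req)
    new_req = object()                 # stand-in for a ContainerRequest
    client.add_container_request(new_req)
    return new_req


# spark-1.2-like behavior: the ask grows by one on every executor loss.
client = AMClient()
req = object()
client.add_container_request(req)      # initial executor
for _ in range(2):                     # two executor losses over the job's life
    req = replace_executor(client, req, remove_fulfilled=False)
print(client.ask_size())               # prints 3: asks for 3 containers, 1 needed

# spark-1.3-like behavior: the fulfilled request is removed first.
client = AMClient()
req = object()
client.add_container_request(req)
for _ in range(2):
    req = replace_executor(client, req, remove_fulfilled=True)
print(client.ask_size())               # prints 1: the ask never bloats
```

The first run reproduces the "ask for 3 containers, instead for 1" symptom from the attached logs; the second shows why removing fulfilled requests keeps the ask bounded.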

I have added changes to remove fulfilled container requests in the spark 1.2 code, and will
be creating a PR for the same.




> Spark yarn allocator does not call "removeContainerRequest" for allocated Container requests,
resulting in bloated ask[] toYarn RM.
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-9396
>                 URL: https://issues.apache.org/jira/browse/SPARK-9396
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.2.1
>         Environment: Spark-1.2.1 on hadoop-yarn-2.4.0 cluster. All servers in cluster
running Linux version 2.6.32.
>            Reporter: prakhar jauhari
>
> Note : The attached logs contain log lines that I added (on the Spark YARN allocator side and the
YARN client side) for debugging purposes.
> My Spark job is configured for 2 executors; on killing 1 executor, the ask is
for 3.
> On killing an executor - resource request logs :
> *************Killed container: ask for 3 containers, instead for 1***********
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: Will allocate 1 executor containers,
each with 2432 MB memory including 384 MB overhead
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: numExecutors: 1
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: host preferences is empty
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: Container request (host: Any, priority:
1, capability: <memory:2432, vCores:4>
> 15/07/15 10:49:01 INFO impl.AMRMClientImpl: prakhar : AMRMClientImpl : allocate: this.ask
= [{Priority: 1, Capability: <memory:2432, vCores:4>, # Containers: 3, Location: *,
Relax Locality: true}]
> 15/07/15 10:49:01 INFO impl.AMRMClientImpl: prakhar : AMRMClientImpl : allocate: allocateRequest
= ask { priority{ priority: 1 } resource_name: "*" capability { memory: 2432 virtual_cores:
4 } num_containers: 3 relax_locality: true } blacklist_request { } response_id: 354 progress:
0.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


