Date: Wed, 29 Jul 2015 05:35:04 +0000 (UTC)
From: "prakhar jauhari (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Comment Edited] (SPARK-9396) Spark yarn allocator does not call "removeContainerRequest" for allocated container requests, resulting in a bloated ask[] to the YARN RM.

    [ https://issues.apache.org/jira/browse/SPARK-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643822#comment-14643822 ]

prakhar jauhari edited comment on SPARK-9396 at 7/29/15 5:35 AM:
-----------------------------------------------------------------

This happens because YARN's AMRMClient does not remove an old container request from its internal table until the application's AM calls removeContainerRequest for the fulfilled request.

Spark 1.2: Spark's AM never calls removeContainerRequest for fulfilled container requests.
Spark 1.3: calls removeContainerRequest for the container requests it can match to an allocation. I tried the same test case of killing one executor on Spark 1.3, and the ask[] in that case was for 1 container.

As long as the cluster is large enough to satisfy the bloated container requests, the containers come back to Spark's YARN allocator in the allocate response; the allocator launches only the missing number of executors and releases the extra containers. The problem gets worse for a long-running job with large executor memory requirements: whenever an executor is killed, the next ask to the YARN ResourceManager (RM) is for n+1 containers. The RM may serve it if it still has enough resources; otherwise it starts reserving cluster resources for containers that Spark never needed in the first place, wasting cluster capacity.

I have added changes that remove fulfilled container requests in the Spark 1.2 code, and will be creating a PR for the same.
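For reference, the Spark 1.3 behaviour boils down to matching each allocated container back to one outstanding request and removing it. Below is a minimal sketch of that kind of change (not the exact patch), assuming the AMRMClient API from hadoop-yarn-client 2.4 (getMatchingRequests / removeContainerRequest); the helper name removeMatchingRequest is hypothetical:

{code}
import org.apache.hadoop.yarn.api.records.{Container, ResourceRequest}
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Sketch only: call this for every container YARN hands back, so the
// AMRMClient's internal request table is decremented and the next
// allocate() heartbeat no longer re-asks for an already-fulfilled request.
def removeMatchingRequest(amClient: AMRMClient[ContainerRequest],
                          container: Container): Unit = {
  val matching = amClient.getMatchingRequests(
    container.getPriority, ResourceRequest.ANY, container.getResource)
  if (!matching.isEmpty && !matching.get(0).isEmpty) {
    // Remove exactly one outstanding request per allocated container.
    amClient.removeContainerRequest(matching.get(0).iterator().next())
  }
}
{code}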
> Spark yarn allocator does not call "removeContainerRequest" for allocated container requests, resulting in a bloated ask[] to the YARN RM.
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-9396
>                 URL: https://issues.apache.org/jira/browse/SPARK-9396
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.2.1
>        Environment: Spark 1.2.1 on a hadoop-yarn-2.4.0 cluster. All servers in the cluster run Linux kernel 2.6.32.
>            Reporter: prakhar jauhari
>
> Note: the attached logs contain lines I added (on the Spark YARN allocator side and the YARN client side) for debugging purposes.
> My Spark job is configured for 2 executors; on killing 1 executor, the ask is for 3.
> Resource-request logs on killing an executor:
> ************* Killed container: ask for 3 containers, instead of 1 ***********
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: Will allocate 1 executor containers, each with 2432 MB memory including 384 MB overhead
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: numExecutors: 1
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: host preferences is empty
> 15/07/15 10:49:01 INFO yarn.YarnAllocationHandler: Container request (host: Any, priority: 1, capability: <memory:2432, vCores:4>)
> 15/07/15 10:49:01 INFO impl.AMRMClientImpl: prakhar : AMRMClientImpl : allocate: this.ask = [{Priority: 1, Capability: <memory:2432, vCores:4>, # Containers: 3, Location: *, Relax Locality: true}]
> 15/07/15 10:49:01 INFO impl.AMRMClientImpl: prakhar : AMRMClientImpl : allocate: allocateRequest = ask { priority { priority: 1 } resource_name: "*" capability { memory: 2432 virtual_cores: 4 } num_containers: 3 relax_locality: true } blacklist_request { } response_id: 354 progress: 0.1
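To make the accumulation concrete, here is a toy model of the pending-request bookkeeping (an assumption for illustration only; the real AMRMClientImpl keys its table on priority, resource name, and capability) showing how the ask reaches 3 containers in the log above:

{code}
import scala.collection.mutable

object AskBloatDemo {
  // Toy stand-in for AMRMClientImpl's remote-requests table,
  // keyed here only by (priority, location) for brevity.
  private val pending = mutable.Map.empty[(Int, String), Int].withDefaultValue(0)

  def addContainerRequest(prio: Int, loc: String): Unit = pending((prio, loc)) += 1
  def removeContainerRequest(prio: Int, loc: String): Unit = pending((prio, loc)) -= 1

  def main(args: Array[String]): Unit = {
    addContainerRequest(1, "*")
    addContainerRequest(1, "*")   // initial ask: 2 executors
    // Both are allocated, but (as in Spark 1.2) removeContainerRequest is never called.
    addContainerRequest(1, "*")   // re-ask after one executor is killed
    println(s"next ask num_containers = ${pending((1, "*"))}")  // prints 3, as in the log
  }
}
{code}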