spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-21383) YARN can allocate too many executors
Date Mon, 17 Jul 2017 08:02:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21383:
------------------------------------

    Assignee:     (was: Apache Spark)

> YARN can allocate too many executors
> ------------------------------------
>
>                 Key: SPARK-21383
>                 URL: https://issues.apache.org/jira/browse/SPARK-21383
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.0.0
>            Reporter: Thomas Graves
>
> The YarnAllocator doesn't properly track containers that are being launched but are not yet running. If it takes time to launch the containers on the NM, they don't show up in numExecutorsRunning, but they are already out of the pending list, so if allocateResources is called again it can conclude that executors are missing even when they aren't (they just haven't been launched yet).
> This was introduced by SPARK-12447.
> The check for missing executors is here:
> https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L297
> numExecutorsRunning is only updated after the NM has started the container:
> https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L524
> Thus, if the NM or the network is slow, the allocator can miscount and start additional executors.
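
The miscount described above can be illustrated with a toy model. This is only a sketch of the accounting, not the actual YarnAllocator code: the class ToyAllocator, its fields, and the track_launching switch are all hypothetical names made up for this illustration. It assumes the allocator computes "missing" as the target minus pending requests and running executors, and shows what happens when a second allocation pass runs while granted containers are still being launched.

```python
from dataclasses import dataclass


@dataclass
class ToyAllocator:
    """Toy model of the accounting; all names here are hypothetical."""
    target: int                    # executors the application wants
    pending: int = 0               # requests not yet granted by the RM
    launching: int = 0             # granted, but launch on the NM not finished
    running: int = 0               # executors the NM has actually started
    track_launching: bool = False  # hypothetical switch: count in-flight launches

    def missing(self) -> int:
        # Executors the allocator believes it still has to request.
        known = self.pending + self.running
        if self.track_launching:
            known += self.launching
        return max(self.target - known, 0)

    def allocate_resources(self) -> int:
        # Request containers for whatever looks missing; return the count.
        requested = self.missing()
        self.pending += requested
        return requested

    def grant_pending(self) -> None:
        # The RM grants every pending request; launches begin but do not finish.
        self.launching += self.pending
        self.pending = 0

    def finish_launches(self) -> None:
        # The NM finally reports the launched containers as running.
        self.running += self.launching
        self.launching = 0


def total_requested(track_launching: bool) -> int:
    # Two allocation passes with a slow NM in between.
    alloc = ToyAllocator(target=4, track_launching=track_launching)
    total = alloc.allocate_resources()   # first pass requests the full target
    alloc.grant_pending()                # containers granted, launch still in flight
    total += alloc.allocate_resources()  # second pass runs before launches finish
    alloc.grant_pending()
    alloc.finish_launches()
    return total


if __name__ == "__main__":
    print("without tracking in-flight launches:", total_requested(False))
    print("with tracking in-flight launches:", total_requested(True))
```

In this model, the second pass requests a full extra set of containers unless in-flight launches are counted. Counting them is one possible remedy under these assumptions; the sketch does not claim to describe how the issue was actually resolved in Spark.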



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
