spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-20540) Dynamic allocation constantly requests and kills executors
Date Sun, 30 Apr 2017 22:39:04 GMT

     [ https://issues.apache.org/jira/browse/SPARK-20540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-20540:
------------------------------------

    Assignee:     (was: Apache Spark)

> Dynamic allocation constantly requests and kills executors
> ----------------------------------------------------------
>
>                 Key: SPARK-20540
>                 URL: https://issues.apache.org/jira/browse/SPARK-20540
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 2.0.2, 2.1.0, 2.2.0
>            Reporter: Ryan Blue
>
> We are seeing some strange behavior with dynamic allocation: in some cases the
> driver gets into a state where it constantly kills idle executors while requesting
> new executors. This happens at the end of a stage, once all tasks are assigned, and
> never stops, even when there are no tasks to run.
> From the YarnAllocator logs, it looks like the allocator is getting lots of requests
> from the driver, even though the timeout between requests should be 5s:
> {code:title=Yarn allocator logs}
> 17/04/20 19:52:05 INFO dispatcher-event-loop-49 YarnAllocator: Driver requested a total
number of 227 executor(s).
> 17/04/20 19:52:05 INFO dispatcher-event-loop-30 YarnAllocator: Driver requested a total
number of 213 executor(s).
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 1 executor containers, each
with 2 cores and 7168 MB memory including 2048 MB overhead
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality
no longer needed)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any,
capability: <memory:7168, vCores:2>)
> spark://CoarseGrainedScheduler@100.74.39.143:10895,  executorHostname: ip-100-74-34-230.ec2.internal
> spark://CoarseGrainedScheduler@100.74.39.143:10895,  executorHostname: ip-100-74-47-57.ec2.internal
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 2 containers from YARN, launching
executors on 2 of them.
> 17/04/20 19:52:05 INFO dispatcher-event-loop-11 YarnAllocator: Driver requested a total
number of 195 executor(s).
> 17/04/20 19:52:05 INFO dispatcher-event-loop-55 YarnAllocator: Driver requested a total
number of 174 executor(s).
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 2 executor containers, each
with 2 cores and 7168 MB memory including 2048 MB overhead
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests (locality
no longer needed)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any,
capability: <memory:7168, vCores:2>)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request (host: Any,
capability: <memory:7168, vCores:2>)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 4 containers from YARN, launching
executors on 4 of them.
> {code}
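
One way to check how often the driver is sending requests is to count the "Driver requested a total number of N executor(s)" lines per timestamp. The following is a minimal Python sketch, not part of the original report; the function name requests_per_second and the SAMPLE constant are hypothetical, and the regular expression assumes the log layout shown in the excerpt above (including lines wrapped after "total").

```python
import re
from collections import Counter

# Matches "Driver requested a total number of N executor(s)" entries.
# \s+ between "total" and "number" tolerates the line wrap seen in the excerpt.
REQUEST_RE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO \S+ YarnAllocator: "
    r"Driver requested a total\s+number of (\d+) executor"
)


def requests_per_second(log_text):
    """Count driver executor-total requests per one-second timestamp."""
    counts = Counter()
    for timestamp, _total in REQUEST_RE.findall(log_text):
        counts[timestamp] += 1
    return dict(counts)


# A few lines copied from the allocator log excerpt above.
SAMPLE = """\
17/04/20 19:52:05 INFO dispatcher-event-loop-49 YarnAllocator: Driver requested a total
number of 227 executor(s).
17/04/20 19:52:05 INFO dispatcher-event-loop-30 YarnAllocator: Driver requested a total
number of 213 executor(s).
17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 1 executor containers, each
with 2 cores and 7168 MB memory including 2048 MB overhead
17/04/20 19:52:05 INFO dispatcher-event-loop-11 YarnAllocator: Driver requested a total
number of 195 executor(s).
17/04/20 19:52:05 INFO dispatcher-event-loop-55 YarnAllocator: Driver requested a total
number of 174 executor(s).
"""

if __name__ == "__main__":
    print(requests_per_second(SAMPLE))
```

For these sample lines the count is four requests within the single second 19:52:05, which is far more frequent than one request every 5s.
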
> I think the allocator cancels what requests it can, but it is still receiving
> containers that were already requested, and the number of executors keeps growing
> because of requests from the driver. Here are 5 seconds from the log:
> {code}
> 17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total
number of 185 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-48 YarnAllocator: Driver requested a total
number of 193 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-24 YarnAllocator: Driver requested a total
number of 192 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-60 YarnAllocator: Driver requested a total
number of 195 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-53 YarnAllocator: Driver requested a total
number of 205 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total
number of 202 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-17 YarnAllocator: Driver requested a total
number of 232 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-45 YarnAllocator: Driver requested a total
number of 243 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver requested a total
number of 254 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-42 YarnAllocator: Driver requested a total
number of 263 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-20 YarnAllocator: Driver requested a total
number of 271 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total
number of 280 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-61 YarnAllocator: Driver requested a total
number of 289 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-22 YarnAllocator: Driver requested a total
number of 305 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-28 YarnAllocator: Driver requested a total
number of 310 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-0 YarnAllocator: Driver requested a total
number of 313 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-28 YarnAllocator: Driver requested a total
number of 315 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-40 YarnAllocator: Driver requested a total
number of 316 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-13 YarnAllocator: Driver requested a total
number of 317 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total
number of 311 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-40 YarnAllocator: Driver requested a total
number of 308 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-4 YarnAllocator: Driver requested a total
number of 301 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-23 YarnAllocator: Driver requested a total
number of 294 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-46 YarnAllocator: Driver requested a total
number of 287 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-8 YarnAllocator: Driver requested a total
number of 285 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-63 YarnAllocator: Driver requested a total
number of 283 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-35 YarnAllocator: Driver requested a total
number of 281 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-63 YarnAllocator: Driver requested a total
number of 278 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-3 YarnAllocator: Driver requested a total
number of 277 executor(s).
> 17/04/20 19:52:33 INFO dispatcher-event-loop-38 YarnAllocator: Driver requested a total
number of 276 executor(s).
> 17/04/20 19:52:34 INFO dispatcher-event-loop-51 YarnAllocator: Driver requested a total
number of 273 executor(s).
> 17/04/20 19:52:34 INFO dispatcher-event-loop-31 YarnAllocator: Driver requested a total
number of 271 executor(s).
> 17/04/20 19:52:34 INFO dispatcher-event-loop-44 YarnAllocator: Driver requested a total
number of 270 executor(s).
> {code}
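
To make the swing in the five-second excerpt easier to see, the requested totals can be summarized numerically. This is a rough Python sketch, not part of the original report: the totals are copied by hand from the log lines above, and the names TOTALS and summarize are hypothetical.

```python
# Requested executor totals, in log order, copied from the excerpt
# covering 19:52:30 through 19:52:34.
TOTALS = [
    185, 193, 192, 195, 205,                            # 19:52:30
    202, 232, 243, 254, 263, 271, 280, 289,             # 19:52:31
    305, 310, 313, 315, 316, 317, 311,                  # 19:52:32
    308, 301, 294, 287, 285, 283, 281, 278, 277, 276,   # 19:52:33
    273, 271, 270,                                      # 19:52:34
]


def summarize(totals):
    """Return basic statistics about a sequence of requested totals."""
    # Change between each request and the one before it.
    deltas = [later - earlier for earlier, later in zip(totals, totals[1:])]
    return {
        "requests": len(totals),
        "min_total": min(totals),
        "max_total": max(totals),
        "largest_increase": max(deltas),
        "largest_decrease": min(deltas),
    }


if __name__ == "__main__":
    for name, value in summarize(TOTALS).items():
        print(f"{name}: {value}")
```

On this data the summary shows 33 requests in five seconds, with the requested total ranging from 185 to 317 and a largest single jump of 30 executors between consecutive requests.
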



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


