spark-issues mailing list archives

From "Stavros Kontopoulos (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-23485) Kubernetes should support node blacklist
Date Thu, 22 Feb 2018 22:23:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373558#comment-16373558 ]

Stavros Kontopoulos edited comment on SPARK-23485 at 2/22/18 10:22 PM:
-----------------------------------------------------------------------

When an executor fails, all cases are covered via handleDisconnectedExecutors, which is scheduled
at some rate and calls removeExecutor in CoarseGrainedSchedulerBackend, which in turn updates
the blacklist info. When we want to launch new executors, TaskSchedulerImpl will terminate an
executor that has already started on a blacklisted node. IMHO the Kubernetes Spark scheduler should
fail fast and constrain which nodes pods are launched on, since it already knows that
some nodes are not an option. For example, this could be done with taints and tolerations: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
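A minimal sketch of the fail-fast idea (the names NodeBlacklist and filter_candidates are illustrative, not actual Spark or Kubernetes scheduler APIs): keep a set of bad nodes and exclude them up front when choosing where executor pods may launch, instead of starting an executor and terminating it afterwards:

```python
# Hypothetical sketch of fail-fast node filtering for executor pod placement.
# NodeBlacklist / filter_candidates are illustrative names, not Spark APIs.

class NodeBlacklist:
    def __init__(self):
        self._bad_nodes = set()

    def add(self, node):
        # Would be driven by the removeExecutor-style handling that marks
        # a node as bad after repeated executor failures.
        self._bad_nodes.add(node)

    def filter_candidates(self, nodes):
        # Fail fast: never offer a blacklisted node as a launch target.
        return [n for n in nodes if n not in self._bad_nodes]

blacklist = NodeBlacklist()
blacklist.add("node-2")
candidates = blacklist.filter_candidates(["node-1", "node-2", "node-3"])
```

The same exclusion could equally be enforced cluster-side with taints and tolerations, per the link above.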

 


was (Author: skonto):
I guess everything is covered via handleDisconnectedExecutors which is scheduled at some rate
and then it calls removeExecutor in CoarseGrainedSchedulerBackend which updates blacklist info.

 

> Kubernetes should support node blacklist
> ----------------------------------------
>
>                 Key: SPARK-23485
>                 URL: https://issues.apache.org/jira/browse/SPARK-23485
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes, Scheduler
>    Affects Versions: 2.3.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not use for running
> tasks (e.g., because of bad hardware).  When running in yarn, this blacklist is used to avoid
> ever allocating resources on blacklisted nodes: https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this is incorrect
> -- but I didn't see any references to {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}},
> so it seems this is missing.  Thought of this while looking at SPARK-19755, a similar issue
> on mesos.
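One way the missing piece could look (purely illustrative; this is not the actual KubernetesClusterSchedulerBackend code, and the helper name is hypothetical): translate the blacklist into a node-affinity stanza on the executor pod spec, using the standard `kubernetes.io/hostname` label with a `NotIn` operator so blacklisted nodes are excluded at scheduling time:

```python
# Illustrative only: build a Kubernetes nodeAffinity stanza that keeps
# executor pods off blacklisted nodes. The backend wiring around this
# helper is hypothetical; the affinity semantics are standard Kubernetes.

def affinity_for_blacklist(blacklisted_nodes):
    # "NotIn" on kubernetes.io/hostname hard-excludes the listed nodes.
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": "kubernetes.io/hostname",
                        "operator": "NotIn",
                        "values": sorted(blacklisted_nodes),
                    }]
                }]
            }
        }
    }

affinity = affinity_for_blacklist({"node-5", "node-2"})
```

A dict like this would be merged into the executor pod template's `spec.affinity` before the pod is created.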



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

