spark-issues mailing list archives

From "Yinan Li (JIRA)" <>
Subject [jira] [Commented] (SPARK-23485) Kubernetes should support node blacklist
Date Fri, 23 Feb 2018 17:37:00 GMT


Yinan Li commented on SPARK-23485:

In the YARN case, yes, it's possible that a node is missing a jar commonly needed by applications.
In Kubernetes mode, this will never be the case, because either all containers have a particular
jar locally or none of them do. An image missing a dependency is a problem in itself.
This consistency is one of the benefits of being containerized. As for node problems,
detecting them and avoiding scheduling pods onto problematic nodes are the concerns
of the kubelets and the scheduler. Applications should not need to worry about whether nodes are
healthy. Node problems that occur at runtime cause pods to be evicted from the problematic
nodes and rescheduled somewhere else. Making applications responsible for keeping track
of problematic nodes and maintaining a blacklist means unnecessarily taking over the business
of the kubelets and the scheduler.



> Kubernetes should support node blacklist
> ----------------------------------------
>                 Key: SPARK-23485
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes, Scheduler
>    Affects Versions: 2.3.0
>            Reporter: Imran Rashid
>            Priority: Major
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not use for running
tasks (e.g., because of bad hardware).  When running on YARN, this blacklist is used to avoid
ever allocating resources on blacklisted nodes:
> I'm just beginning to poke around the Kubernetes code, so apologies if this is incorrect
-- but I didn't see any references to {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}},
so it seems this is missing.  I thought of this while looking at SPARK-19755, a similar issue
on Mesos.
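
For illustration only, here is one way a blacklist of node names could be handed to the Kubernetes scheduler: as a required node anti-affinity on the pod spec. This is a rough sketch, not code from Spark or from this issue. The function name node_anti_affinity is hypothetical, and the sketch assumes each blacklist entry matches the value of the node's kubernetes.io/hostname label.

```python
def node_anti_affinity(blacklisted_nodes):
    """Build a pod-spec 'affinity' fragment that keeps a pod off the given nodes.

    Hypothetical helper: assumes each entry equals the node's
    kubernetes.io/hostname label value.
    """
    # De-duplicate and sort so the generated spec is stable across calls.
    hosts = sorted(set(blacklisted_nodes))
    if not hosts:
        # Nothing is blacklisted, so no scheduling constraint is needed.
        return {}
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "NotIn",
                                "values": hosts,
                            }
                        ]
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    import json

    # Example blacklist with a duplicate entry; the node names are made up.
    print(json.dumps(node_anti_affinity(["node-b", "node-a", "node-b"]), indent=2))
```

The returned dictionary would go under the "affinity" field of an executor pod spec. Whether the application should do this at all, rather than leaving node health to the kubelets and the scheduler, is the question under discussion in this issue.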
