spark-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.
Date Tue, 10 Apr 2018 15:31:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432467#comment-16432467 ]

Thomas Graves commented on SPARK-16630:
---------------------------------------

Sorry, I don't follow. The list we get from the blacklist tracker is all nodes that are
currently blacklisted and haven't yet reached the expiry to unblacklist them. You just union
them with the YARN allocator list. There is obviously a race condition there if one of the
nodes is just about to be unblacklisted, but I don't see that as a major issue; the next
allocation will not have it. Is there something I'm missing?
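
A minimal sketch of the union described above, written in Python for brevity rather
than taken from Spark's Scala code. The names active_blacklist, nodes_to_exclude and
the expiry map are hypothetical and only illustrate the idea; they are not Spark's API.

```python
from typing import Dict, Set


def active_blacklist(expiry_by_node: Dict[str, int], now_ms: int) -> Set[str]:
    """Hypothetical tracker view: nodes whose blacklist expiry has not yet passed."""
    return {node for node, expiry in expiry_by_node.items() if expiry > now_ms}


def nodes_to_exclude(tracker_nodes: Set[str], allocator_nodes: Set[str]) -> Set[str]:
    """Union the tracker's blacklist with the allocator's own list of bad nodes."""
    return tracker_nodes | allocator_nodes


# node2's expiry (900) is already in the past at now_ms=1000, so it drops out.
tracker = active_blacklist({"node1": 2_000, "node2": 900}, now_ms=1_000)
excluded = nodes_to_exclude(tracker, {"node3"})
print(sorted(excluded))
```

Because the set would be recomputed on every allocation request, a node that expires
just after one request simply stops appearing in the next one, which is why the race
looks harmless.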

> Blacklist a node if executors won't launch on it.
> -------------------------------------------------
>
>                 Key: SPARK-16630
>                 URL: https://issues.apache.org/jira/browse/SPARK-16630
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 1.6.2
>            Reporter: Thomas Graves
>            Priority: Major
>
> On YARN, it's possible that a node is messed up or misconfigured such that a container won't
> launch on it. For instance, the Spark external shuffle handler didn't get loaded on it,
> or maybe it's just some other hardware issue or Hadoop configuration issue.
> It would be nice if we could recognize this happening and stop trying to launch executors
> on it, since that could end up causing us to hit our max number of executor failures and then
> kill the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

