spark-issues mailing list archives

From "Sandy Ryza (JIRA)" <>
Subject [jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored
Date Wed, 28 Oct 2015 20:39:27 GMT


Sandy Ryza commented on SPARK-2089:

Dynamic allocation may not currently be used for batch workloads, but is there any reason
not to do so in the future?  Is there anything about static allocation that's better suited
for batch workloads?

> With YARN, preferredNodeLocalityData isn't honored 
> ---------------------------------------------------
>                 Key: SPARK-2089
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Critical
> When running in YARN cluster mode, apps can pass preferred locality data when constructing
a Spark context that will dictate where to request executor containers.
> This is currently broken because of a race condition.  The Spark-YARN code runs the user
class and waits for it to start up a SparkContext.  During its initialization, the SparkContext
will create a YarnClusterScheduler, which notifies a monitor in the Spark-YARN code that it's
ready.  The Spark-YARN code then immediately fetches the preferredNodeLocationData from the SparkContext
and uses it to start requesting containers.
> But in the SparkContext constructor that takes the preferredNodeLocationData, setting
preferredNodeLocationData comes after the rest of the initialization, so, if the Spark-YARN
code comes around quickly enough after being notified, the data it fetches is the empty,
unset version.  This occurred during all of my runs.
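> The ordering bug can be sketched with a toy model. This is an illustrative sketch only: names like MiniContext, init_buggy, and yarn_side are made up for the demo and are not Spark's actual internals; a second event stands in for "the rest of the initialization" so the lost-update outcome is deterministic.

```python
import threading

class MiniContext:
    """Toy stand-in for SparkContext, sketching the SPARK-2089 race.
    All names here are illustrative, not Spark's real internals."""

    def __init__(self):
        self.preferredNodeLocationData = {}
        self.scheduler_ready = threading.Event()  # the "monitor" notification
        self.data_fetched = threading.Event()     # stands in for remaining init time

    def init_buggy(self, data):
        # Buggy ordering: notify first, assign the locality data afterwards.
        self.scheduler_ready.set()
        self.data_fetched.wait()                  # "rest of initialization" runs here
        self.preferredNodeLocationData = data     # too late: already fetched

    def init_fixed(self, data):
        # Fixed ordering: assign the locality data before notifying.
        self.preferredNodeLocationData = data
        self.scheduler_ready.set()
        self.data_fetched.wait()

def yarn_side(ctx, out):
    # Mimics the Spark-YARN code: wake on notification, fetch immediately.
    ctx.scheduler_ready.wait()
    out.append(dict(ctx.preferredNodeLocationData))
    ctx.data_fetched.set()

def run(init_method):
    ctx = MiniContext()
    out = []
    t = threading.Thread(target=yarn_side, args=(ctx, out))
    t.start()
    init_method(ctx, {"host1": ["rack1"]})
    t.join()
    return out[0]

print(run(MiniContext.init_buggy))   # {} -- locality data lost
print(run(MiniContext.init_fixed))   # {'host1': ['rack1']}
```

> With the buggy ordering the YARN side always observes the empty, unset map; moving the assignment ahead of the notification is the minimal fix the description implies.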

This message was sent by Atlassian JIRA

