spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-2669) Hadoop configuration is not localised when submitting job in yarn-cluster mode
Date Thu, 02 Apr 2015 22:28:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-2669:
-----------------------------------

    Assignee:     (was: Apache Spark)

> Hadoop configuration is not localised when submitting job in yarn-cluster mode
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-2669
>                 URL: https://issues.apache.org/jira/browse/SPARK-2669
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Maxim Ivanov
>
> I'd like to propose a fix for a problem where the Hadoop configuration is not localized when
> a job is submitted in yarn-cluster mode. Here is the description from the GitHub pull request: https://github.com/apache/spark/pull/1574
> This patch fixes a problem where a Spark driver running in a container
> managed by the YARN ResourceManager inherits its configuration from the
> NodeManager process, which can differ from the Hadoop
> configuration present on the client (the submitting machine). The problem is
> most visible when the fs.defaultFS property differs between the two.
> Hadoop MR solves this by serializing the client's Hadoop configuration into
> job.xml in the application staging directory and then making the Application
> Master use it. That guarantees that, regardless of the execution nodes'
> configurations, all application containers use the same configuration as
> the client side.
> This patch uses a similar approach. YARN ClientBase serializes the
> configuration and adds it to ClientDistributedCacheManager under the
> "job.xml" link name. ClientDistributedCacheManager then uses the
> Hadoop localizer to deliver it to every container started by this
> application, including the one running the Spark driver.
> YARN ClientBase also adds the "SPARK_LOCAL_HADOOPCONF" env variable to the AM
> container request, which SparkHadoopUtil.newConfiguration then uses
> to trigger the new behavior: the machine-wide Hadoop configuration is merged
> with the application-specific job.xml (exactly as it is done in Hadoop MR).
> SparkContext then follows the same approach, adding the
> SPARK_LOCAL_HADOOPCONF env variable to all spawned containers so that they use
> the client-side Hadoop configuration.
> Also, all references to "new Configuration()" that might be executed
> on the YARN cluster side are changed to use SparkHadoopUtil.get.conf.
> Please note that this fixes only core Spark, the part which I am
> comfortable testing and verifying. I didn't descend into the
> streaming/shark directories, so things might need to be changed there too.
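
The merge described above can be illustrated with a small standalone sketch. This is not the code from the pull request: it is plain Python with no Spark or Hadoop dependency, and the helper names (write_conf_xml, read_conf_xml, new_configuration) are hypothetical. It also assumes that SPARK_LOCAL_HADOOPCONF holds the path of the localized job.xml; the description only says the variable triggers the new behavior, so its actual value may differ.

```python
import os
import xml.etree.ElementTree as ET


def write_conf_xml(props, path):
    """Client side: serialize key/value pairs into a Hadoop-style XML file."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def read_conf_xml(path):
    """Container side: parse a Hadoop-style XML file back into a dict."""
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}


def new_configuration(node_conf, env=None):
    """Merge the machine-wide configuration with the client's job.xml.

    Assumption: SPARK_LOCAL_HADOOPCONF contains the path to job.xml.
    Client-side values win over node-local ones, so a property such as
    fs.defaultFS is the same in every container.
    """
    env = os.environ if env is None else env
    merged = dict(node_conf)
    job_xml = env.get("SPARK_LOCAL_HADOOPCONF")
    if job_xml and os.path.exists(job_xml):
        merged.update(read_conf_xml(job_xml))
    return merged
```

When the variable is unset, new_configuration returns the node-local values unchanged, which corresponds to the behavior before the patch.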



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org