spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <>
Subject [jira] [Assigned] (SPARK-21917) Remote http(s) resources is not supported in YARN mode
Date Tue, 19 Sep 2017 14:28:00 GMT


Wenchen Fan reassigned SPARK-21917:

    Assignee: Saisai Shao

> Remote http(s) resources is not supported in YARN mode
> ------------------------------------------------------
>                 Key: SPARK-21917
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit, YARN
>    Affects Versions: 2.2.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>            Priority: Minor
>             Fix For: 2.3.0
> In current Spark, submitting an application on YARN with remote resources ({{./bin/spark-shell --master yarn-client -v}}) fails with:
> {noformat}
> No FileSystem for scheme: http
> 	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
> 	at org.apache.hadoop.fs.FileSystem.createFileSystem(
> 	at org.apache.hadoop.fs.FileSystem.access$200(
> 	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(
> 	at org.apache.hadoop.fs.FileSystem.get(
> 	at org.apache.hadoop.fs.Path.getFileSystem(
> 	at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:354)
> 	at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478)
> 	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:600)
> 	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:599)
> 	at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
> 	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:599)
> 	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:598)
> 	at scala.collection.immutable.List.foreach(List.scala:381)
> 	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:598)
> 	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:848)
> 	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:173)
> {noformat}
> This is because {{YARN#client}} assumes that resources are on a Hadoop-compatible FS, and the NM
> likewise uses only a Hadoop-compatible FS to download resources. As a result, Spark on YARN
> cannot support remote http(s) resources.
> To solve this problem, there might be several options:
> * Download the remote http(s) resources to the local machine and add the downloaded copies
> to the dist cache. The downside of this option is that the remote resources are then uploaded again.
> * Filter out the remote http(s) resources and add them through spark.jars or spark.files, so that
> Spark's internal file server distributes them. The problem with this solution is that it may not
> work for resources that must be available before the application starts.
> * Leverage Hadoop's support for an http(s) file system. This only works in Hadoop 2.9+, and I think
> that even if we implemented a similar one in Spark, it would not work.
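
The second option above comes down to splitting the resource list by URI scheme before the YARN client handles it. The following is a minimal sketch of that filtering step, written in Python rather than Spark's Scala for brevity. The function name, the constant, and the example URIs are all hypothetical and are not taken from the Spark code base; the sketch only classifies URIs and does not download or distribute anything.

```python
from urllib.parse import urlsplit

# Assumption: only these two schemes are treated as "remote http(s) resources".
REMOTE_HTTP_SCHEMES = {"http", "https"}


def split_by_scheme(resources):
    """Partition resource URIs into (http_resources, other_resources).

    http_resources would be handed to spark.jars / spark.files,
    other_resources would go through the usual YARN distribution path.
    """
    http_resources, other_resources = [], []
    for uri in resources:
        # A plain local path has an empty scheme and falls into "other".
        scheme = urlsplit(uri).scheme.lower()
        if scheme in REMOTE_HTTP_SCHEMES:
            http_resources.append(uri)
        else:
            other_resources.append(uri)
    return http_resources, other_resources


# Hypothetical resource list, for illustration only.
resources = [
    "https://example.com/libs/a.jar",
    "hdfs:///user/spark/b.jar",
    "/opt/jars/c.jar",
]
http_res, other_res = split_by_scheme(resources)
print(http_res)
print(other_res)
```

As the issue notes, resources routed this way may not be available before the application starts, so a split like this would not help for files that are needed at launch time.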
