spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Commented] (SPARK-22587) Spark job fails if fs.defaultFS and application jar are different url
Date Thu, 23 Nov 2017 10:20:00 GMT


Sean Owen commented on SPARK-22587:

Hm, but if the src and dest FS are different, it overwrites destPath to be a path relative
to destDir. I am not sure if that is the actual problem.
Is it that compareFs believes incorrectly that these represent the same FS?
If so then I do wonder if it makes sense to always set {{destPath = new Path(destDir, destName.getOrElse(srcPath.getName()))}}

This is some old logic from Sandy; maybe [~vanzin] or [~steve_l] has an opinion on the logic
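For illustration, here is a minimal sketch of a compareFs-style check, assuming it reduces to comparing the scheme and authority of the two filesystem URIs with plain java.net.URI — a hypothetical stand-in, not the actual Client.scala implementation, and the wasb account names are placeholders, not the XXX/YYY from the report:

```scala

// Hypothetical stand-in for the Yarn Client's compareFs: two filesystems are
// considered "the same" only if both scheme and authority (storage account) match.
object CompareFsSketch {
  def sameFs(src: URI, dest: URI): Boolean = {
    val sameScheme = Option(src.getScheme).map(_.toLowerCase) ==
      Option(dest.getScheme).map(_.toLowerCase)
    val sameAuthority = Option(src.getAuthority).map(_.toLowerCase) ==
      Option(dest.getAuthority).map(_.toLowerCase)
    sameScheme && sameAuthority
  }

  def main(args: Array[String]): Unit = {
    // Same scheme, different storage accounts: these are different filesystems,
    // so the jar must be copied, not qualified against the default FS.
    val src  = new URI("wasb://")
    val dest = new URI("wasb://")
    assert(!sameFs(src, dest))
    // Same scheme and authority: no copy needed.
    assert(sameFs(new URI("wasb://"),
                  new URI("wasb://")))
  }
}
```

If compareFs instead answers "same" for these two URIs, destPath keeps pointing at the source URI and the later makeQualified call against the default FS throws the "Wrong FS" error shown below.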

> Spark job fails if fs.defaultFS and application jar are different url
> ---------------------------------------------------------------------
>                 Key: SPARK-22587
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.6.3
>            Reporter: Prabhu Joseph
> Spark job fails if fs.defaultFS and the URL where the application jar resides are different but have the same scheme, e.g.:
> spark-submit  --conf spark.master=yarn-cluster wasb://XXX/tmp/
> In core-site.xml, fs.defaultFS is set to wasb:///YYY. Hadoop listing (hadoop fs -ls) works for both URLs, XXX and YYY.
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: wasb://XXX/tmp/, expected: wasb://YYY
> at org.apache.hadoop.fs.FileSystem.checkPath(
> at
> at org.apache.hadoop.fs.FileSystem.makeQualified(
> at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:396)
> at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:507)
> at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:660)
> at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:912)
> at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
> at
> at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1307)
> at org.apache.spark.deploy.yarn.Client.main(Client.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> at java.lang.reflect.Method.invoke(
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> The code in Client.copyFileToRemote tries to resolve the path of the application jar (XXX) from the FileSystem object created using the fs.defaultFS URL (YYY) instead of the actual URL of the application jar:
> {code}
> val destFs = destDir.getFileSystem(hadoopConf)
> val srcFs = srcPath.getFileSystem(hadoopConf)
> {code}
> getFileSystem creates the filesystem based on the URL of the path, so this part is fine. But the lines below try to qualify srcPath (XXX URL) against destFs (YYY URL), and that is what fails:
> {code}
> var destPath = srcPath
> val qualifiedDestPath = destFs.makeQualified(destPath)
> {code}
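The suggestion in the comment above — when the filesystems differ, rebuild destPath under destDir instead of reusing srcPath — can be sketched with plain java.net.URI. This is a hypothetical illustration of the intended behavior (the helper names, the URI-based sameFs check, and the paths are all assumptions, not the actual Hadoop Path/FileSystem code):

```scala

// Sketch of the distribute/copyFileToRemote destination decision: qualify the
// source against the FS that owns it, and only when src and dest FS genuinely
// match may the source URI be used as the destination directly.
object DistributeSketch {
  private def sameFs(a: URI, b: URI): Boolean =
    Option(a.getScheme).map(_.toLowerCase) == Option(b.getScheme).map(_.toLowerCase) &&
      Option(a.getAuthority).map(_.toLowerCase) == Option(b.getAuthority).map(_.toLowerCase)

  // destName mirrors the optional rename discussed above; both are illustrative.
  def destFor(srcUri: URI, destDir: URI, destName: Option[String] = None): URI =
    if (sameFs(srcUri, destDir)) srcUri
    else destDir.resolve(destName.getOrElse(srcUri.getPath.split('/').last))

  def main(args: Array[String]): Unit = {
    val src     = new URI("wasb://")
    val destDir = new URI("hdfs://namenode/user/stage/") // trailing slash so resolve appends
    // Different filesystems: the jar is copied, so the dest path lives under destDir.
    assert(destFor(src, destDir).toString == "hdfs://namenode/user/stage/app.jar")
    // Same filesystem: the source path is already usable as-is.
    val local = new URI("hdfs://namenode/tmp/app.jar")
    assert(destFor(local, destDir) == local)
  }
}
```

The key point the sketch encodes: destFs.makeQualified is only ever applied to a path that actually belongs to destFs, which is exactly the invariant the reported wasb://XXX vs wasb://YYY case violates.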

This message was sent by Atlassian JIRA

