spark-issues mailing list archives

From "Prabhu Joseph (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-22587) Spark job fails if fs.defaultFS and application jar are different url
Date Thu, 23 Nov 2017 06:18:00 GMT
Prabhu Joseph created SPARK-22587:
-------------------------------------

             Summary: Spark job fails if fs.defaultFS and application jar are different url
                 Key: SPARK-22587
                 URL: https://issues.apache.org/jira/browse/SPARK-22587
             Project: Spark
          Issue Type: Bug
          Components: Spark Submit
    Affects Versions: 1.6.3
            Reporter: Prabhu Joseph


Spark job fails if fs.defaultFS and the URL where the application jar resides are different, even though they share the same scheme:

spark-submit --conf spark.master=yarn-cluster wasb://XXX/tmp/test.py

In core-site.xml, fs.defaultFS is set to wasb://YYY. Hadoop listing (hadoop fs -ls) works
for both URLs, XXX and YYY.

{code}
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: wasb://XXX/tmp/test.py, expected: wasb://YYY
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:665)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.checkPath(NativeAzureFileSystem.java:1251)
	at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:485)
	at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:396)
	at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:507)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:660)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:912)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:172)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1248)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1307)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}

Client.copyFileToRemote qualifies the path of the application jar (XXX) against the FileSystem object created from the fs.defaultFS URL (YYY) instead of the jar's own FileSystem:

{code}
val destFs = destDir.getFileSystem(hadoopConf)
val srcFs = srcPath.getFileSystem(hadoopConf)
{code}

getFileSystem creates the filesystem from the URL of each path, so these two lines are fine. But the lines below qualify srcPath (XXX URL) against destFs (YYY URL), and the checkPath validation fails:

{code}
var destPath = srcPath
val qualifiedDestPath = destFs.makeQualified(destPath)
{code}
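Hadoop's FileSystem.checkPath essentially compares the scheme and authority of the path's URI against those of the filesystem's root URI, and makeQualified calls it before resolving the path. A minimal Python model of that logic (a sketch, not Hadoop's actual code; ModelFileSystem and its methods are simplified stand-ins) shows why qualifying the XXX path against the YYY filesystem fails while qualifying it against its own filesystem succeeds:

```python
from urllib.parse import urlparse

class ModelFileSystem:
    """Toy stand-in for a Hadoop FileSystem rooted at a single URI."""

    def __init__(self, root_uri):
        self.root = urlparse(root_uri)

    def check_path(self, path):
        # Hadoop's FileSystem.checkPath compares scheme and authority;
        # a mismatch raises IllegalArgumentException("Wrong FS: ...").
        p = urlparse(path)
        if p.scheme and (p.scheme, p.netloc) != (self.root.scheme, self.root.netloc):
            raise ValueError(
                f"Wrong FS: {path}, expected: {self.root.scheme}://{self.root.netloc}")

    def make_qualified(self, path):
        # makeQualified validates the path against this filesystem first.
        self.check_path(path)
        return path

dest_fs = ModelFileSystem("wasb://YYY")   # built from fs.defaultFS
src_fs = ModelFileSystem("wasb://XXX")    # built from the jar's own URL

jar = "wasb://XXX/tmp/test.py"

# Qualifying against the jar's own filesystem works:
print(src_fs.make_qualified(jar))

# Qualifying against the defaultFS filesystem reproduces the failure:
try:
    dest_fs.make_qualified(jar)
except ValueError as e:
    print(e)  # Wrong FS: wasb://XXX/tmp/test.py, expected: wasb://YYY
```

This suggests the fix direction: the destination path should be qualified with the filesystem derived from the source path's own URL (srcFs) rather than unconditionally with destFs.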

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

