spark-dev mailing list archives

From: Cheng Lian <lian.cs....@gmail.com>
Subject: Re: How to support dependency jars and files on HDFS in standalone cluster mode?
Date: Thu, 11 Jun 2015 06:31:57 GMT
Oh sorry, I mistook --jars for --files. Yeah, for jars we need to add 
them to the classpath, which is different from regular files.
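
For context, here is a minimal sketch of that difference (the file name "app.conf" and the class "com.example.Helper" are hypothetical): a file shipped with --files only needs to be fetched and opened by path, while a jar's classes must be loadable from the classpath.

    import org.apache.spark.SparkFiles

    // A file distributed with --files: resolve the fetched local copy by name.
    val confPath: String = SparkFiles.get("app.conf")

    // A jar distributed with --jars: its classes have to be on the classpath,
    // so merely downloading the jar somewhere is not enough.
    val helperClass: Class[_] = Class.forName("com.example.Helper")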

Cheng

On 6/11/15 2:18 PM, Dong Lei wrote:
>
> Thanks Cheng,
>
> If I do not use --jars, how can I tell Spark to find the jars (and 
> files) on HDFS?
>
> Do you mean that the driver will not need to set up an HTTP file server 
> for this scenario, and that the workers will fetch the jars and files 
> from HDFS?
>
> Thanks
>
> Dong Lei
>
> *From:* Cheng Lian [mailto:lian.cs.zju@gmail.com]
> *Sent:* Thursday, June 11, 2015 12:50 PM
> *To:* Dong Lei; dev@spark.apache.org
> *Cc:* Dianfei (Keith) Han
> *Subject:* Re: How to support dependency jars and files on HDFS in 
> standalone cluster mode?
>
> Since the jars are already on HDFS, you can access them directly in 
> your Spark application without using --jars
>
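> For example, a minimal sketch of what I mean (the NameNode address and 
> jar path are hypothetical): SparkContext.addJar also accepts HDFS URIs, 
> and each executor then fetches the jar itself and adds it to the task 
> classpath.
>
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     val sc = new SparkContext(new SparkConf().setAppName("hdfs-jars"))
>
>     // addJar with an hdfs:// URI: executors download the jar directly
>     // from HDFS instead of going through the driver's HTTP file server.
>     sc.addJar("hdfs://namenode:9000/libs/1.jar")
>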
> Cheng
>
> On 6/11/15 11:04 AM, Dong Lei wrote:
>
>     Hi spark-dev:
>
>     I cannot use an HDFS location for the “--jars” or “--files” option
>     when doing a spark-submit in standalone cluster mode. For example:
>
>         spark-submit  …  --jars hdfs://ip/1.jar  …  hdfs://ip/app.jar
>         (standalone cluster mode)
>
>     will not download 1.jar to the driver’s HTTP file server (but
>     app.jar will be downloaded to the driver’s directory).
>
>     I figured out that the reason Spark does not download the jars is
>     that when sc.addJar puts them on the HTTP file server, the function
>     called is Files.copy, which does not support remote locations.
>
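>     To illustrate (the paths here are made up): Guava’s Files.copy only
>     accepts local java.io.File arguments, so an hdfs:// URI cannot name
>     the source.
>
>         import java.io.File
>         import com.google.common.io.Files
>
>         // "hdfs://ip/1.jar" is a URI, not a local path; the copy fails
>         // when it tries to open the nonexistent local source file.
>         Files.copy(new File("hdfs://ip/1.jar"), new File("/tmp/httpd/1.jar"))
>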
>     And I think that even if Spark downloaded the jars and added them to
>     the HTTP file server, the classpath would still not be set correctly,
>     because it would contain the remote locations.
>
>     So I’m trying to make this work and have come up with two options,
>     but neither of them seems elegant, and I would like to hear your
>     advice:
>
>     Option 1:
>
>     Modify HTTPFileServer.addFileToDir so that it recognizes an “hdfs”
>     prefix.
>
>     This is not good because I think it oversteps the scope of the HTTP
>     file server.
>
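>     A rough sketch of the idea (hypothetical code, not the actual Spark
>     method): fetch remote paths with the Hadoop FileSystem API before
>     serving them, and keep the local-file path as it is.
>
>         import java.io.File
>         import org.apache.hadoop.conf.Configuration
>         import org.apache.hadoop.fs.Path
>
>         def addFileToDir(path: String, dir: File): File = {
>           val dest = new File(dir, new Path(path).getName)
>           if (path.startsWith("hdfs://")) {
>             // Download the remote file into the server directory first.
>             val src = new Path(path)
>             src.getFileSystem(new Configuration())
>               .copyToLocalFile(src, new Path(dest.getAbsolutePath))
>           } else {
>             com.google.common.io.Files.copy(new File(path), dest)
>           }
>           dest
>         }
>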
>     Option 2:
>
>     Modify DriverRunner.downloadUserJar so that it downloads all the
>     “--jars” and “--files” along with the application jar.
>
>     This sounds more reasonable than option 1 for downloading files. But
>     then I would need to read “spark.jars” and “spark.files” in
>     downloadUserJar or DriverRunner.start and replace them with local
>     paths. How can I do that?
>
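>     A rough sketch of the rewriting step (localize is a made-up helper
>     that would fetch one hdfs:// URI into a local directory and return
>     the local path):
>
>         import java.io.File
>         import org.apache.spark.SparkConf
>
>         def localize(uri: String, localDir: File): String = ???
>
>         def rewriteToLocal(conf: SparkConf, localDir: File): Unit = {
>           for (key <- Seq("spark.jars", "spark.files")) {
>             val rewritten = conf.get(key, "")
>               .split(",").filter(_.nonEmpty)
>               .map(uri => if (uri.startsWith("hdfs://")) localize(uri, localDir) else uri)
>             conf.set(key, rewritten.mkString(","))
>           }
>         }
>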
>     Do you have a more elegant solution, or is there a plan to support
>     this in the future?
>
>     Thanks
>
>     Dong Lei
>

