spark-issues mailing list archives

From "Michael Schmitz (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-13202) Jars specified with --jars do not exist on the worker classpath.
Date Thu, 04 Feb 2016 20:47:39 GMT

     [ https://issues.apache.org/jira/browse/SPARK-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Michael Schmitz updated SPARK-13202:
------------------------------------
    Description: 
I have a Spark Scala 2.11 application.  To deploy it to the cluster, I create a jar of the
dependencies and a jar of the project (although this problem still manifests if I create a
single jar with everything).  I will focus on problems specific to Spark Shell, but I'm pretty
sure they also apply to Spark Submit.
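
For concreteness, the invocation described here would look roughly like the following (the jar
names and master URL below are placeholders, not the actual ones from this report):

    spark-shell --master spark://<master-host>:7077 \
      --jars /path/to/my-dependencies.jar,/path/to/my-project.jar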

I can get Spark Shell to work with my application, but only if I set spark.executor.extraClassPath.
From reading the documentation (http://spark.apache.org/docs/latest/configuration.html#runtime-environment),
it sounds like I shouldn't need to set this option ("Users typically should not need to set
this option."). After reading about --jars, I understand that --jars should put the jars that
are synced to the worker machines on the executor classpath.

When I don't set spark.executor.extraClassPath, I get a Kryo registrator exception whose root
cause is that a class cannot be found.

    java.io.IOException: org.apache.spark.SparkException: Failed to register classes with Kryo
    java.lang.ClassNotFoundException: org.allenai.common.Enum
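
For context, a Kryo registrator is a small class along the lines of the sketch below (a
hypothetical registrator, not the application's actual one). Spark instantiates the class named
by spark.kryo.registrator on each executor when it sets up serialization, so every class it
registers must be loadable from the executor classpath; that is why a jar missing from the
executors surfaces as "Failed to register classes with Kryo" wrapping a ClassNotFoundException.

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical registrator, for illustration only. Spark constructs it on the executors,
    // so org.allenai.common.Enum must resolve from the executor classpath at this point.
    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        kryo.register(Class.forName("org.allenai.common.Enum"))
      }
    }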

If I SSH into the workers, I can see that Spark did create directories that contain the jars
specified by --jars.

    /opt/data/spark/worker/app-20160204212742-0002/0
    /opt/data/spark/worker/app-20160204212742-0002/1

Now, if I re-run spark-shell but with `--conf spark.executor.extraClassPath=/opt/data/spark/worker/app-20160204212742-0002/0/myjar.jar`,
my job will succeed.  In other words, if I put my jars at a location that is available to
all the workers and specify that as an extra executor class path, the job succeeds.
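
One way to check what the executors can actually see (a sketch to paste into spark-shell; it
asks each task whether the class from the stack trace above is loadable) is:

    // Each task reports whether the executor running it can load the class that Kryo
    // failed to register. If the jar is not visible to the executors, the job may instead
    // fail with the same Kryo registration error during result serialization.
    sc.parallelize(1 to 4, 4).map { _ =>
      try { Class.forName("org.allenai.common.Enum"); "found" }
      catch { case _: ClassNotFoundException => "missing" }
    }.collect()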

Unfortunately, this means that the jars are copied to the workers but never put on the classpath.
How can I get --jars to add the jars it copies to the workers to the classpath?

  was:
I have a Spark Scala 2.11 application.  To deploy it to the cluster, I create a jar of the
dependencies and a jar of the project (although this problem still manifests if I create a
single jar with everything).  I will focus on problems specific to Spark Shell, but I'm pretty
sure they also apply to Spark Submit.

I can get Spark Shell to work with my application, but only if I set spark.executor.extraClassPath.
From reading the documentation (http://spark.apache.org/docs/latest/configuration.html#runtime-environment),
it sounds like I shouldn't need to set this option ("Users typically should not need to set
this option."). After reading about --jars, I understand that --jars should put the jars that
are synced to the worker machines on the executor classpath.

When I don't set spark.executor.extraClassPath, I get a Kryo registrator exception whose root
cause is that a class cannot be found.

    java.io.IOException: org.apache.spark.SparkException: Failed to register classes with Kryo
    java.lang.ClassNotFoundException: org.allenai.common.Enum

If I SSH into the workers, I can see that Spark did create directories that contain the jars
specified by --jars.

    /opt/data/spark/worker/app-20160204212742-0002/0
    /opt/data/spark/worker/app-20160204212742-0002/1

Now, if I re-run spark-shell but with `--conf spark.executor.extraClassPath=/opt/data/spark/worker/app-20160204212742-0002/0`,
my job will succeed.  In other words, if I put my jars at a location that is available to
all the workers and specify that as an extra executor class path, the job succeeds.

Unfortunately, this means that the jars are copied to the workers but never put on the classpath.
How can I get --jars to add the jars it copies to the workers to the classpath?


> Jars specified with --jars do not exist on the worker classpath.
> ----------------------------------------------------------------
>
>                 Key: SPARK-13202
>                 URL: https://issues.apache.org/jira/browse/SPARK-13202
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Michael Schmitz
>
> I have a Spark Scala 2.11 application.  To deploy it to the cluster, I create a jar of
the dependencies and a jar of the project (although this problem still manifests if I create
a single jar with everything).  I will focus on problems specific to Spark Shell, but I'm
pretty sure they also apply to Spark Submit.
> I can get Spark Shell to work with my application, but only if I set spark.executor.extraClassPath.
From reading the documentation (http://spark.apache.org/docs/latest/configuration.html#runtime-environment),
it sounds like I shouldn't need to set this option ("Users typically should not need to set
this option."). After reading about --jars, I understand that --jars should put the jars that
are synced to the worker machines on the executor classpath.
> When I don't set spark.executor.extraClassPath, I get a Kryo registrator exception whose
root cause is that a class cannot be found.
>     java.io.IOException: org.apache.spark.SparkException: Failed to register classes with Kryo
>     java.lang.ClassNotFoundException: org.allenai.common.Enum
> If I SSH into the workers, I can see that Spark did create directories that contain the
jars specified by --jars.
>     /opt/data/spark/worker/app-20160204212742-0002/0
>     /opt/data/spark/worker/app-20160204212742-0002/1
> Now, if I re-run spark-shell but with `--conf spark.executor.extraClassPath=/opt/data/spark/worker/app-20160204212742-0002/0/myjar.jar`,
my job will succeed.  In other words, if I put my jars at a location that is available to
all the workers and specify that as an extra executor class path, the job succeeds.
> Unfortunately, this means that the jars are copied to the workers but never put on the classpath.
How can I get --jars to add the jars it copies to the workers to the classpath?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

