spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1520) Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
Date Thu, 17 Apr 2014 20:00:16 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973326#comment-13973326
] 

Xiangrui Meng commented on SPARK-1520:
--------------------------------------

A quick fix may be to remove fastutil.

In RDD#countApproxDistinct, we use HyperLogLog from com.clearspring.analytics:stream, which
depends on fastutil. If this is the only place that pulls in the fastutil dependency, we should
implement HyperLogLog ourselves and remove fastutil completely from Spark's dependencies.
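
To give a sense of how small a dependency-free HyperLogLog can be, here is a minimal sketch in Python. It is illustrative only and is not the stream-lib implementation or anything from Spark: the class name, the SHA-1 based hash, and the default precision `p=12` are all assumptions made for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative sketch only)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers; standard error is about 1.04 / sqrt(m).
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1.0 + 1.079 / self.m)

    def add(self, item):
        # Assumption: hash the item's string form and keep 64 bits of the digest.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                    # top p bits choose the register
        rest = h & ((1 << width) - 1)       # remaining bits feed the rank
        # Rank is the 1-based position of the leftmost 1-bit in `rest`;
        # an all-zero remainder gets the maximum rank, width + 1.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: use linear counting while some registers are empty.
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)
        return raw
```

Feeding it 10,000 distinct values and calling `estimate()` should return a figure within a few percent of 10,000 at this precision. A production version would also need register merging and serialization, which are left out here.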

> Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-1520
>                 URL: https://issues.apache.org/jira/browse/SPARK-1520
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, Spark Core
>            Reporter: Patrick Wendell
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> This is a real doozy: when a Spark assembly is compiled with JDK7, the produced jar does
> not work with JRE6. I confirmed that the bytecode being produced is JDK 6 compatible (major
> version 50). What happens is that the JRE silently fails to load any class files from the
> assembled jar.
> {code}
> $> sbt/sbt assembly/assembly
> $> /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
> usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
> $> /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
> Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
> {code}
> I also noticed that if the jar is unzipped and the classpath is set to the current directory,
> it "just works". Finally, if the assembly jar is compiled with JDK6, it also works. The error
> is seen with any class, not just UIWorkloadGenerator. Also, this error doesn't exist in
> branch 0.9, only in master.
> *Isolation*
> -I ran a git bisection and this appeared after the MLlib sparse vector patch was merged:-
> https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
> SPARK-1212
> -I narrowed this down specifically to the inclusion of the breeze library. Just adding
> breeze to an older (unaffected) build triggered the issue.-
> I've found that if I just unpack and re-pack the jar (using `jar` from java 6 or 7) it
> always works:
> {code}
> $ cd assembly/target/scala-2.10/
> $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
> $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
> $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
> $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
> {code}
> I also noticed something that may be relevant. The Breeze package contains single directories that
> have huge numbers of files in them (e.g. 2000+ class files in one directory). It's possible
> we are hitting some weird bug or corner case in the compatibility of the jar's internal
> storage format.
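
One way to probe the storage-format hypothesis above is to count the entries in the assembly jar, both in total and per directory. The sketch below does that with Python's standard `zipfile` module. The function names are hypothetical, and the message does not establish that the entry count is the cause. The threshold it checks is a property of the ZIP format itself: the classic end-of-central-directory record stores the entry count in a 16-bit field, so an archive with more than 65535 entries needs the Zip64 extension.

```python
import zipfile
from collections import Counter

# Largest entry count the classic (non-Zip64) ZIP central directory can record.
CLASSIC_ZIP_MAX_ENTRIES = 0xFFFF


def summarize_jar(path, top=5):
    """Return (total entry count, the `top` directories holding the most files)."""
    with zipfile.ZipFile(path) as jar:
        names = jar.namelist()
    per_dir = Counter(
        name.rsplit("/", 1)[0] if "/" in name else ""
        for name in names
        if not name.endswith("/")  # skip explicit directory entries
    )
    return len(names), per_dir.most_common(top)


def exceeds_classic_zip_limit(entry_count):
    """True when an archive with this many entries requires Zip64."""
    return entry_count > CLASSIC_ZIP_MAX_ENTRIES
```

Calling `summarize_jar("spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar")` on both the failing jar and the re-packed one, then passing each total to `exceeds_classic_zip_limit`, would show whether the two archives differ in entry count or sit near the limit.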



--
This message was sent by Atlassian JIRA
(v6.2#6252)
