spark-user mailing list archives

From Vipul Pandey <vipan...@gmail.com>
Subject Re: LZO support in Spark 1.0.0 - nothing seems to work
Date Thu, 18 Sep 2014 00:51:42 GMT
It works for me with these settings:


export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native

export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/hadoop-lzo-cdh4-0.4.15-gplextras.jar
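
Before exporting, it may be worth confirming that the directory actually contains the native library, since the UnsatisfiedLinkError in the quoted message complains about "gplcompression". Below is a minimal shell sketch of that check. It assumes the file follows the usual Linux naming, libgplcompression.so*, and that the parcel path is the one used above. The names has_gplcompression, append_path and LZO_NATIVE are made up for illustration. append_path also avoids the leading ":" (an empty path entry) that "$VAR:/new" produces when VAR is empty.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR holds a libgplcompression.so* file,
# which is what System.loadLibrary("gplcompression") looks for on Linux.
has_gplcompression() {
    for f in "$1"/libgplcompression.so*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: join an existing colon-separated path with a new
# entry, without a leading ":" when the existing value is empty.
append_path() {
    printf '%s\n' "${1:+$1:}$2"
}

# Assumption: same parcel location as in the exports above; adjust as needed.
LZO_NATIVE=/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native

if has_gplcompression "$LZO_NATIVE"; then
    export JAVA_LIBRARY_PATH="$(append_path "${JAVA_LIBRARY_PATH:-}" "$LZO_NATIVE")"
    export LD_LIBRARY_PATH="$(append_path "${LD_LIBRARY_PATH:-}" "$LZO_NATIVE")"
    export SPARK_LIBRARY_PATH="$(append_path "${SPARK_LIBRARY_PATH:-}" "$LZO_NATIVE")"
else
    echo "no libgplcompression.so* found in $LZO_NATIVE" >&2
fi
```

If the check fails on a worker node, the exports would not help there either, so it is probably worth running on every host that runs executors.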


I hope you are also adding this to your code:

    val conf = sc.hadoopConfiguration
    conf.set("io.compression.codecs","com.hadoop.compression.lzo.LzopCodec")



Vipul

On Sep 17, 2014, at 5:40 PM, rogthefrog <roger@amino.com> wrote:

> I have an HDFS cluster managed with CDH Manager. Version is CDH 5.1 with
> matching GPLEXTRAS parcel. LZO works with Hive and Pig, but I can't make it
> work with Spark 1.0.0. I've tried:
> 
> * Setting this:
> 
> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS -Djava.library.path=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/"
> 
> * Setting this in spark-env.sh. I tried with and without "export". I tried
> in CDH Manager and manually on the host.
> 
> export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
> export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/
> 
> * Setting this in /etc/spark/conf/spark-defaults.conf:
> 
> spark.executor.extraLibraryPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
> spark.spark.executor.extraClassPath /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar
> 
> * Adding this in CDH manager:
> 
> export LD_LIBRARY_PATH=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native
> 
> * Hardcoding
> -Djava.library.path=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native in
> the Spark command 
> 
> * Symlinking the gpl compression binaries into
> /opt/cloudera/parcels/CDH/lib/hadoop/lib/native
> 
> * Symlinking the gpl compression binaries into /usr/lib
> 
> And nothing worked. When I run pyspark I get this:
> 
> 14/09/17 20:38:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 
> and when I try to run a simple job on an LZO file in HDFS I get this:
> 
> distFile.count()
> 14/09/17 13:51:54 ERROR GPLNativeCodeLoader: Could not load native gpl library
> java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
>       at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
>       at java.lang.Runtime.loadLibrary0(Runtime.java:849)
>       at java.lang.System.loadLibrary(System.java:1088)
>       at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
>       at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
> 
> Can anybody help please? Many thanks.
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/LZO-support-in-Spark-1-0-0-nothing-seems-to-work-tp14494.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 

