systemml-dev mailing list archives

From Deron Eriksson <deroneriks...@gmail.com>
Subject Re: Fixed hadoop configuration to run dml on large dataset
Date Thu, 04 Feb 2016 21:55:15 GMT
Ethan, thank you for posting the fix to the LZO configuration issue.

Deron


On Thu, Feb 4, 2016 at 9:45 AM, Ethan Xu <ethanxu@us.ibm.com> wrote:

> Thanks to help from the team, we fixed a Hadoop classpath configuration
> issue so that the DML script successfully invokes MapReduce jobs.
>
> I'm carrying the discussion over here in case other people run into the
> same problem.
>
> ----Problem description----
> I was running a simple DML script to carry out data transformation on a
> Hadoop cluster (Hadoop 2.0.0, CDH 4.2.1). The script ran successfully on
> 1GB of data, but threw an error on ~30GB of data.
>
> It looks like SystemML didn't need to invoke MapReduce jobs on the small
> data set (console output: 'Number of executed MR Jobs: 0'). On the
> larger data it attempted to run MR and threw the following error:
>
> ...
> Caused by: java.lang.ClassNotFoundException: Class
> com.hadoop.compression.lzo.LzoCodec not found
>         at
>
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>         at
>
> org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:127)
>         ... 38 more
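>
> Before changing any configuration, a quick way to tell the two usual
> causes apart (jar missing entirely vs. jar present but not on the
> classpath) is a sketch like the following; the jar path below is
> hypothetical, so point it at wherever your hadoop-lzo jar lives:
>
> ```shell
> # Is any LZO jar already visible on the Hadoop client classpath?
> hadoop classpath | tr ':' '\n' | grep -i lzo
>
> # Does a candidate jar actually contain the missing class?
> # (path is a placeholder; substitute your own hadoop-lzo jar)
> jar tf /opt/hadoop/lib/hadoop-lzo-0.4.15.jar \
>     | grep 'com/hadoop/compression/lzo/LzoCodec.class'
> ```
>
> If the second command prints the class entry but the first shows no LZO
> jar, you are in the "present but not on the classpath" case described
> below.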
>
>
> ----Solution----
> The missing class com.hadoop.compression.lzo.LzoCodec is contained in the
> hadoop-lzo jar file:
>
> http://search.maven.org/#search%7Cga%7C1%7Cfc%3A%22com.hadoop.compression.lzo.LzoCodec%22
>
> Installation and configuration information for the LZO parcel can be
> found here:
>
> http://www.cloudera.com/documentation/archive/manager/4-x/4-7-3/Cloudera-Manager-Installation-Guide/cmig_install_LZO_Compression.html
> and in this Stack Overflow answer:
>
> http://stackoverflow.com/questions/23441142/class-com-hadoop-compression-lzo-lzocodec-not-found-for-spark-on-cdh-5
>
> In my case it turned out we had the LZO jar, but it was not included in
> the classpath. Explicitly pointing to the jar at DML job submission via
> -libjars (https://hadoop.apache.org/docs/r1.2.1/commands_manual.html#jar)
> did the trick:
>
> hadoop jar ./SystemML.jar -libjars <path to lzo jar>/hadoop-lzo-0.4.15.jar
> -f ./transform.dml -nvargs X=<path on HDFS>/file-to-transform
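>
> If you would rather not pass -libjars on every submission, one
> alternative (sketched below with a hypothetical jar path) is to export
> the jar on HADOOP_CLASSPATH; note the caveat in the comments:
>
> ```shell
> # Make the codec visible to the client JVM for all hadoop invocations.
> # Caveat: HADOOP_CLASSPATH only affects the client side; -libjars is
> # still what ships the jar out to the map/reduce tasks, so keep it for
> # jobs that decompress LZO inside the tasks themselves.
> export HADOOP_CLASSPATH="/opt/hadoop/lib/hadoop-lzo-0.4.15.jar:${HADOOP_CLASSPATH}"
>
> hadoop jar ./SystemML.jar -libjars /opt/hadoop/lib/hadoop-lzo-0.4.15.jar \
>     -f ./transform.dml -nvargs X=<path on HDFS>/file-to-transform
> ```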
>
> Ethan
>
>
