hadoop-mapreduce-user mailing list archives

From Gordon Wang <gw...@gopivotal.com>
Subject Re: MR2 Job over LZO data
Date Mon, 10 Mar 2014 03:06:05 GMT
Can you run plain MR jobs (not Pig jobs) that take LZO files as input?

If you cannot run MR jobs, you may want to check the LZO compression
configuration in core-site.xml. Also make sure the native library is in
HADOOP_HOME/lib/native/

Here is an FAQ on how to configure LZO:
https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/wiki/FAQ?redir=1
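For reference, a minimal core-site.xml fragment registering the hadoop-lzo codecs usually looks like the sketch below. The codec class names assume a standard hadoop-lzo build; verify them against the FAQ above and the jar you actually deploy.

```xml
<!-- Register the LZO codecs with Hadoop (sketch; verify the class names
     against the hadoop-lzo build you deploy). -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```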

On Sat, Mar 8, 2014 at 12:04 AM, Viswanathan J
<jayamviswanathan@gmail.com> wrote:

> Hi,
>
> I am getting the below error while running a Pig job on Hadoop 2.x:
>
> Caused by: java.io.IOException: No codec for file found
>   at com.twitter.elephantbird.mapreduce.input.MultiInputFormat.determineFileFormat(MultiInputFormat.java:176)
>   at com.twitter.elephantbird.mapreduce.input.MultiInputFormat.createRecordReader(MultiInputFormat.java:88)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:256)
>
> I have copied the relevant LZO jars to the lib folders, but am still facing this issue.
>
> Please help.
>
>
>
> On Fri, Mar 7, 2014 at 7:53 PM, German Florez-Larrahondo <
> german.fl@samsung.com> wrote:
>
>> King
>>
>> Here is my raw log of installing Hadoop LZO. This works on 2.2.0 and 2.3.0.
>>
>>
>>
>> I hope this helps
>>
>>
>>
>> ./g
>>
>>
>>
>>
>>
>> *Where to get Hadoop LZO*
>>
>> https://github.com/twitter/hadoop-lzo
>>
>>
>>
>>
>> http://asmarterplanet.com/studentsfor/blog/2013/11/hadoop-cluster-module-lzo-compression.html
>>
>>
>>
>> *Requirements*
>>
>> On CentOS:
>>
>> sudo yum install lzo*  --> /usr/lib64/liblzo2.so.2....
>>
>>
>>
>> On ubuntu:
>>
>> sudo apt-get install liblzo2-dev -->  on x86: /usr/lib64/liblzo2.so.2
>>
>>
>>
>> *Clone:*
>>
>> git clone https://github.com/twitter/hadoop-lzo.git
>>
>>
>>
>> Follow instructions on README.md from this github site, basically
>>
>>
>>
>>  cd hadoop-lzo
>>
>>      mvn clean package test
>>
>>
>>
>> *To enable this at run time do:*
>>
>> a.       Copy the jar to hadoop/share/hadoop/common (if you don't want
>> to modify classpaths by putting the library somewhere else):
>>
>>
>>
>> cp hadoop-lzo/target/hadoop-lzo-0.4.20-SNAPSHOT.jar  ../hadoop/share/hadoop/common/
>>
>>
>>
>> b.       Copy /usr/lib64/liblzo2.so.2 to ../hadoop/lib/native/
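The two copy steps above can be sketched as a small script. This is a sketch, not the poster's exact commands: HADOOP_HOME, the jar name, and the library path are assumptions to adjust. When HADOOP_HOME is unset, it stages a scratch tree with stand-in files so the commands can be dry-run anywhere.

```shell
# Sketch of the two copy steps above. HADOOP_HOME, JAR, and LIB are
# assumptions -- point them at your real install. With HADOOP_HOME unset,
# a scratch tree plus stand-in files are created for a safe dry run.
set -eu
HADOOP_HOME="${HADOOP_HOME:-$(mktemp -d)}"
mkdir -p "$HADOOP_HOME/share/hadoop/common" "$HADOOP_HOME/lib/native"

JAR="${JAR:-$HADOOP_HOME/hadoop-lzo-0.4.20-SNAPSHOT.jar}"  # jar built by mvn above
LIB="${LIB:-$HADOOP_HOME/liblzo2.so.2}"                    # system lzo library
[ -f "$JAR" ] || touch "$JAR"   # stand-in file for the dry run
[ -f "$LIB" ] || touch "$LIB"   # stand-in file for the dry run

# Copy the hadoop-lzo jar onto the default classpath:
cp "$JAR" "$HADOOP_HOME/share/hadoop/common/"

# Copy the native lzo library next to Hadoop's other native libraries:
cp "$LIB" "$HADOOP_HOME/lib/native/"
```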
>>
>>
>>
>>
>>
>> *From:* Gordon Wang [mailto:gwang@gopivotal.com]
>> *Sent:* Thursday, March 06, 2014 11:50 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: MR2 Job over LZO data
>>
>>
>>
>> You can try to get the source code from https://github.com/twitter/hadoop-lzo
>> and compile it against Hadoop 2.2.0.
>>
>>
>>
>> As I recall, as long as you rebuild it, LZO should work with Hadoop 2.2.0.
>>
>>
>>
>> On Thu, Mar 6, 2014 at 6:29 PM, KingDavies <kingdavies@gmail.com> wrote:
>>
>> Running on Hadoop 2.2.0
>>
>>
>>
>> The Java MR2 job works as expected on an uncompressed data source using
>> the TextInputFormat.class.
>>
>> But when using the LZO format the job fails:
>>
>> import com.hadoop.mapreduce.LzoTextInputFormat;
>>
>> job.setInputFormatClass(LzoTextInputFormat.class);
>>
>>
>>
>> Dependencies from the maven repository:
>>
>> http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.19/
>>
>> Also tried with elephant-bird-core 4.4
>>
>>
>>
>> The same data can be queried fine from within Hive (0.12) on the same
>> cluster.
>>
>>
>>
>>
>>
>> The exception:
>>
>> Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
>> interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>>   at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:62)
>>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
>>   at com.hadoop.mapreduce.LzoTextInputFormat.getSplits(LzoTextInputFormat.java:101)
>>   at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
>>   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
>>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
>>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
>>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
>>   at com.cloudreach.DataQuality.Main.main(Main.java:42)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>
>>
>>
>> I believe the issue is related to the API changes in Hadoop 2, but where
>> can I find a Hadoop 2-compatible version?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> --
>>
>> Regards
>>
>> Gordon Wang
>>
>
>
>
> --
> Regards,
> Viswa.J
>



-- 
Regards
Gordon Wang
