hadoop-mapreduce-user mailing list archives

From JOAQUIN GUANTER GONZALBEZ <x...@tid.es>
Subject RE: LzopCodec and SequenceFile?
Date Mon, 18 Jun 2012 06:34:33 GMT
Hi Harsh,

Thanks for the super-quick answer! It would be great to have this noted somewhere
in the official documentation, since neither the SequenceFile documentation nor the
LzopCodec documentation mentions that SequenceFile cannot be used with LzopCodec.

Thanks again!
Ximo.

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Friday, 15 June 2012 12:59
To: mapreduce-user@hadoop.apache.org
Subject: Re: LzopCodec and SequenceFile?

Hey Joaquin,

When using SequenceFiles, use LzoCodec. The reason is that SequenceFile is a container format
of its own, just like LZOP files are, so it does not make sense to combine the two.
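As a minimal sketch, the fix on Ximo's setup would be switching the codec property to LzoCodec in mapred-site.xml. Only mapred.output.compression.codec is named in this thread; the companion mapred.output.compress and mapred.output.compression.type properties are my assumption about a typical CDH3-era configuration:

```xml
<!-- mapred-site.xml: compress SequenceFile output with raw LZO
     (LzoCodec), not the LZOP container format (LzopCodec) -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
```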

For reading sequence files, use the SequenceFile.Reader class
(http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/io/SequenceFile.Reader.html)
and it will automatically handle decompressing the K/V fields for you. You don't have to run
lzop or similar tools first to be able to read it, because the compression is applied internally
rather than over the entire file.
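As a hedged sketch, reading such a file with the SequenceFile.Reader class from the Javadoc linked below might look like this (the path and the key/value types are made-up placeholders; substitute whatever your jobs actually emit):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqFileDump {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical path; point this at the output of your first MR job.
        Path path = new Path("/user/ximo/job1-output/part-00000");

        // The reader picks up the codec recorded in the file header and
        // decompresses keys/values internally -- no external lzop step needed.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            LongWritable key = new LongWritable();  // assumed key type
            Text value = new Text();                // assumed value type
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}
```

Note this needs the Hadoop jars (and the hadoop-lzo native libraries, if the file is LZO-compressed) on the classpath to compile and run.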

Here is also a good link on the difference at Quora:
http://www.quora.com/Whats-the-difference-between-the-LzoCodec-and-the-LzopCodec-in-Hadoop-LZO

On Fri, Jun 15, 2012 at 11:34 AM, JOAQUIN GUANTER GONZALBEZ <ximo@tid.es> wrote:
> Hello,
>
>
>
> I have a sequence of MR Jobs that are using the SequenceFile for their
> output and input format. If I run them without any compression enabled
> they work fine. If I use the LzoCodec they also work just fine (but
> then the output is not Lzop compatible which is inconvenient).
>
>
>
> If I try using the LzopCodec, then the first MR job (which reads from
> a TextFile and outputs to a SequenceFile) runs OK, but when the second
> job tries to read what the first job wrote, I get the following exception:
>
>
>
> java.io.EOFException: Premature EOF from inputStream
>         at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
>         at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
>         at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
>         at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1591)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1493)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1480)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>         at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:451)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>         at org.apache.ha
>
>
>
> Does anyone know why this could be happening? I'm using the latest
> Cloudera CDH3 distribution and I'm configuring the compression through
> the mapred.output.compression.codec property in the mapred-site.xml file.
>
>
>
> Thanks!
>
> Ximo.
>
>
> ________________________________
> This message is intended exclusively for its addressee. We only send
> and receive email on the basis of the terms set out at
> http://www.tid.es/ES/PAGINAS/disclaimer.aspx



--
Harsh J
