hadoop-user mailing list archives

From Jiayu Ji <jiayu...@gmail.com>
Subject Re: How does mapreduce job determine the compress codec
Date Mon, 16 Dec 2013 00:28:28 GMT
Thanks Tao. I know I can tell it is an LZO file from the magic number.
What I'm curious about is which class in Hadoop a MapReduce job uses to
determine a file's compression algorithm. Ultimately, I'm trying to figure
out whether all the inputs of a MapReduce job have to be compressed with
the same algorithm.
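For reference, the extension-to-codec mapping described in the Definitive Guide quote below can be sketched roughly like this. This is a minimal self-contained sketch, not Hadoop's actual CompressionCodecFactory; the map and codec names here are illustrative stand-ins:

```java
import java.util.Map;
import java.util.TreeMap;

public class CodecByExtension {
    // Illustrative extension-to-codec table (Hadoop builds its registry
    // from the codecs configured in io.compression.codecs).
    static final Map<String, String> CODECS = new TreeMap<>();
    static {
        CODECS.put(".gz", "GzipCodec");
        CODECS.put(".bz2", "BZip2Codec");
        CODECS.put(".lzo", "LzopCodec");
    }

    // Pick a codec purely by filename suffix, as getCodec() does.
    static String getCodec(String filename) {
        for (Map.Entry<String, String> e : CODECS.entrySet()) {
            if (filename.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null; // no matching extension: treated as uncompressed
    }

    public static void main(String[] args) {
        System.out.println(getCodec("part-00000.lzo")); // LzopCodec
        System.out.println(getCodec("part-00000"));     // null
    }
}
```

Note that with a purely extension-based lookup, a file with no recognized suffix would get no codec at all, which is why the behavior described in the quoted question below is surprising.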

On Fri, Dec 13, 2013 at 11:16 PM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:

> I suggest you download the LZO-compressed file, whether or not its name has
> a .lzo extension, open it as hex bytes with a tool like UltraEdit, and have
> a look at its leading bytes.
> 2013/12/14 Jiayu Ji <jiayu.ji@gmail.com>
>> Hi,
>> I have a question about how a MapReduce job determines the compression
>> codec of its input on HDFS. From what I read in the Definitive Guide (page
>> 86), "the CompressionCodecFactory provides a way of mapping a filename
>> extension to a CompressionCodec using its getCodec() method". I did a test
>> with an LZO-compressed file that had no .lzo extension, yet the MapReduce
>> job was still able to pick the right codec. Does anyone know why? Thanks
>> in advance.
>> Jiayu
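The byte-level check Tao suggests can also be done programmatically. A minimal sketch, assuming the standard lzop container magic sequence (89 4C 5A 4F 00 0D 0A 1A 0A), which lzop-format files begin with:

```java
import java.util.Arrays;

public class LzopMagicCheck {
    // lzop file-format magic bytes: 0x89 'L' 'Z' 'O' 0x00 0x0D 0x0A 0x1A 0x0A
    static final byte[] LZOP_MAGIC = {
        (byte) 0x89, 'L', 'Z', 'O', 0x00, 0x0D, 0x0A, 0x1A, 0x0A
    };

    // Returns true if the given leading bytes match the lzop magic.
    static boolean isLzop(byte[] header) {
        return header.length >= LZOP_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, LZOP_MAGIC.length),
                             LZOP_MAGIC);
    }

    public static void main(String[] args) {
        // In practice you would read the first 9 bytes of the file;
        // here we just demonstrate with a hard-coded header.
        byte[] header = {
            (byte) 0x89, 'L', 'Z', 'O', 0x00, 0x0D, 0x0A, 0x1A, 0x0A
        };
        System.out.println(isLzop(header) ? "lzop file" : "not lzop");
    }
}
```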

Jiayu (James) Ji,

Cell: (312)823-7393
