hadoop-common-user mailing list archives

From Hong Tang <ht...@yahoo-inc.com>
Subject Re: Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop
Date Tue, 18 May 2010 18:11:42 GMT

See my comments inline.

Thanks, Hong

On May 18, 2010, at 8:44 AM, stan lee wrote:

> Hi Guys,
> I am trying to use compression to reduce the IO workload when running a
> job, but it failed. I have several questions that need your help.
> For lzo compression, I found a guide at
> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it say
> "Note that you must have both 32-bit and 64-bit liblzo2 installed"? I am
> not sure whether it means that we also need 32-bit liblzo2 installed even
> when we are on a 64-bit system. If so, why?

That answer on the wiki page addresses how to set up the native
libraries so that both 32-bit AND 64-bit Java work. If you run the
same flavor of Java (for example, 64-bit only) across the whole
cluster, you only need the matching liblzo2, and that note does not
apply to you.
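
To check which flavor of Java a node is actually running, something
like the following could be run on each machine. This is only a sketch:
the class and method names are made up for illustration, and
"sun.arch.data.model" is a HotSpot-specific property that other JVMs
may not set, so the code falls back to the standard "os.arch" property.

```java
public class JvmBitness {
    /** Returns "32", "64", or "unknown" for the running JVM. */
    static String dataModel() {
        // HotSpot-specific property; may be absent on other JVMs.
        String model = System.getProperty("sun.arch.data.model");
        if ("32".equals(model) || "64".equals(model)) {
            return model;
        }
        // Fallback: architecture names such as amd64, x86_64, or aarch64
        // contain "64". Anything else is reported as unknown.
        String arch = System.getProperty("os.arch", "");
        return arch.contains("64") ? "64" : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel());
        System.out.println("os.arch: " + System.getProperty("os.arch"));
    }
}
```

If every node reports the same data model, only the native libraries
for that flavor should be needed.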

> Also, if I don't use lzo compression and instead try to use gzip to
> compress the final reduce output file, I just set the value below in
> mapred-site.xml, but it doesn't seem to work. (How can I find the final
> compressed .gz file? I used "hadoop dfs -l <dir>" and didn't find it.)
> My question: can we use gzip to compress the final result when it's not a
> streaming job? How can we ensure that compression has been enabled during
> job execution?
> <property>
>       <name>mapred.output.compress</name>
>       <value>true</value>
> </property>

This option is honored by the OutputFormat implementation your job
uses, so it works for non-streaming jobs too. If you use
TextOutputFormat, you should see files like "part-xxxx.gz" in the
output directory, which is also how you can confirm that compression
was enabled. If you write your own output format class, follow the
implementation of TextOutputFormat or SequenceFileOutputFormat to set
up compression properly.
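
To illustrate what "honoring the option" means, here is a rough sketch
of the pattern an output format follows: read the flag, pick the file
name, and wrap the raw stream. It is not Hadoop's actual code. The
class and method names are hypothetical, java.util.Properties stands in
for the job configuration, and java.util.zip.GZIPOutputStream stands in
for the gzip codec. Only the property name "mapred.output.compress"
comes from the question above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.GZIPOutputStream;

public class CompressedOutputSketch {
    /** Reads the flag; compression is off unless it is set to "true". */
    static boolean compressOutput(Properties conf) {
        return Boolean.parseBoolean(
                conf.getProperty("mapred.output.compress", "false"));
    }

    /** Appends ".gz" to the part file name only when compression is on. */
    static String partFileName(Properties conf, int partition) {
        String base = String.format("part-%05d", partition);
        return compressOutput(conf) ? base + ".gz" : base;
    }

    /** Wraps the raw stream in a gzip stream only when compression is on. */
    static OutputStream wrap(Properties conf, OutputStream raw)
            throws IOException {
        return compressOutput(conf) ? new GZIPOutputStream(raw) : raw;
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty("mapred.output.compress", "true");

        // In-memory sink standing in for a file in the output directory.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = wrap(conf, sink)) {
            out.write("key\tvalue\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(partFileName(conf, 0) + ": "
                + sink.size() + " bytes written");
    }
}
```

An output format that never performs this check will write
uncompressed files no matter what the property says, which is why a
custom class needs to handle it explicitly.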
