hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: how to compress..!
Date Sat, 11 Jul 2009 17:28:57 GMT
Here is the set of configuration parameters for compression from 0.19.

You can enable mapred.compress.map.output and mapred.output.compress,
and set mapred.output.compression.type to BLOCK, for a good set of
defaults.
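
As a rough sketch of those overrides, a fragment like the following could go in hadoop-site.xml (or in the job's own configuration). It assumes the 0.19 property names listed further down; the <configuration> wrapper is the usual root element of that file, so merge the properties into your existing file rather than replacing it.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Compress intermediate map outputs before they cross the network -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <!-- Compress the final job outputs -->
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <!-- Only relevant when the job outputs are written as SequenceFiles -->
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
</configuration>
```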

The compression codecs vary substantially by release, so I won't go into
that.
BZip2 is slow, gzip is medium, and lzo is fast; the compression ratios seem
to move inversely with the compression speed.

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  <description>A list of the compression codec classes that can be used
               for compression/decompression.</description>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>false</value>
  <description>Should the job outputs be compressed?
  </description>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>RECORD</value>
  <description>If the job outputs are to be compressed as SequenceFiles, how
               should they be compressed? Should be one of NONE, RECORD or BLOCK.
  </description>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>If the job outputs are compressed, how should they be
               compressed?
  </description>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>false</value>
  <description>Should the outputs of the maps be compressed before being
               sent across the network. Uses SequenceFile compression.
  </description>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>If the map outputs are compressed, how should they be
               compressed?
  </description>
</property>
<property>
  <name>io.seqfile.compress.blocksize</name>
  <value>1000000</value>
  <description>The minimum block size for compression in block compressed
                  SequenceFiles.
  </description>
</property>
<property>
  <name>io.seqfile.lazydecompress</name>
  <value>true</value>
  <description>Should values of block-compressed SequenceFiles be
               decompressed only when necessary.
  </description>
</property>
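
To pick a codec other than the default, the two codec properties above can be overridden in the same way. A minimal sketch, assuming GzipCodec (one of the classes in the io.compression.codecs list above) is available and suitable in your release; the choice of codec here is an illustration, not a recommendation.

```xml
<!-- Codec for the final job outputs (assumed choice: gzip) -->
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<!-- Codec for the intermediate map outputs (assumed choice: gzip) -->
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```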


On Thu, Jul 9, 2009 at 10:50 AM, Alex Loddengaard <alex@cloudera.com> wrote:

> A few comments before I answer:
> 1) Each time you send an email, we receive two emails.  Is your mail client
> misconfigured?
> 2) You already asked this question in another thread :).  See my response
> there.
>
> Short answer:
> <http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html>
>
> Alex
>
> On Thu, Jul 9, 2009 at 1:11 AM, Sugandha Naolekar <sugandha.n87@gmail.com>
> wrote:
>
> > Hello!
> >
> > How do I compress data using the Hadoop APIs?
> >
> > I want to write Java code to compress the core files (the data I am going
> > to dump in HDFS) and then place them in HDFS. So, the API usage is
> > sufficient. What about making related changes in the hadoop-site.xml file?
> >
> >
> > --
> > Regards!
> > Sugandha
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
