incubator-hcatalog-user mailing list archives

From Feng Peng <>
Subject Re: Saving data using Lzo compression using HCatalog from Pig
Date Thu, 31 Jan 2013 17:39:48 GMT
Hi Tim,

You can try the following:

1. Create a HiveLzoTextOutputFormat that extends
HiveIgnoreKeyTextOutputFormat<K, V>; the only thing it does is turn on
compressed output on the JobConf before delegating to the parent class:

  public class HiveLzoTextOutputFormat<K extends WritableComparable, V extends Writable>
      extends HiveIgnoreKeyTextOutputFormat<K, V> {

    @Override
    public RecordWriter getHiveRecordWriter(JobConf jc, Path outPath,
        Class<? extends Writable> valueClass, boolean isCompressed,
        Properties tableProperties, Progressable progress) throws IOException {
      FileOutputFormat.setCompressOutput(jc, true);
      // Force isCompressed to true regardless of what the caller passed in.
      return super.getHiveRecordWriter(jc, outPath, valueClass, true,
          tableProperties, progress);
    }

    @Override
    public org.apache.hadoop.mapred.RecordWriter<K, V> getRecordWriter(
        FileSystem ignored, JobConf jc, String name, Progressable progress)
        throws IOException {
      FileOutputFormat.setCompressOutput(jc, true);
      return super.getRecordWriter(ignored, jc, name, progress);
    }
  }
2. When you create the table:

    set the InputFormat class to
    set the OutputFormat class to the new HiveLzoTextOutputFormat
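As a sketch, the DDL for step 2 could look like the following. The input
format class name (DeprecatedLzoTextInputFormat from the hadoop-lzo
project) and the package of the custom output format are assumptions on
my part, not something stated above, so substitute whatever matches your
cluster:

```sql
-- Sketch only: both fully-qualified class names below are assumptions
-- (a typical hadoop-lzo install plus a hypothetical package for the
-- custom output format from step 1).
CREATE TABLE my_lzo_table (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'com.example.hive.HiveLzoTextOutputFormat';
```

With the table declared this way, writes through HCatalog should go
through the compressing output format, and reads should use an input
format that knows the data is LZO-compressed.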

Hope this helps.

On Wed, Jan 30, 2013 at 1:36 PM, Timothy Potter <> wrote:

> Been struggling with this one for a bit ... Lzo compression is enabled
> by default for my Hadoop cluster. If I forget to turn off compression
> from my Pig scripts that create data in Hive using HCatalog, then the
> partitions get created but I can't read the data back. I don't have
> the error handy but it looks like the read-side doesn't treat the data
> as compressed.
> So I've resorted to adding the following to my scripts:
> SET mapreduce.output.compress false;
> SET mapred.output.compress false;
> SET output.compression.enabled false;
> One of those seems to do the trick ;-)
> I'd really like to store my Hive data compressed but haven't figured
> out how to enable this with HCatalog. Seems like it's either not
> supported yet or I'm missing something simple in my HQL DDL table
> declaration.
> Cheers,
> Tim
