incubator-hcatalog-user mailing list archives

From Feng Peng <pengf...@gmail.com>
Subject Re: Saving data using Lzo compression using HCatalog from Pig
Date Thu, 31 Jan 2013 17:39:48 GMT
Hi Tim,

You can try the following:

1. Create a HiveLzoTextOutputFormat that extends
HiveIgnoreKeyTextOutputFormat<K, V>. The only thing it does is enable
compressed output on the JobConf:

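  // Parameter lists are elided; they match the overridden parent methods.
  @Override
  public RecordWriter getHiveRecordWriter(...) {
      // Mark the job output as compressed, then delegate to the parent,
      // passing true for the compressed flag.
      FileOutputFormat.setCompressOutput(jc, true);
      return super.getHiveRecordWriter(jc, outPath, valueClass, true,
              tableProperties, progress);
  }

  @Override
  public org.apache.hadoop.mapred.RecordWriter<K, V> getRecordWriter(...) {
      // Same idea for the plain mapred code path.
      FileOutputFormat.setCompressOutput(jc, true);
      return super.getRecordWriter(ignored, jc, name, progress);
  }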

2. When you create the table:

    - set the InputFormat class to DeprecatedLzoTextInputFormat
      <https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapred/input/DeprecatedLzoTextInputFormat.java>
    - set the OutputFormat class to the new HiveLzoTextOutputFormat
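
As a rough sketch of step 2, the table DDL might look something like the
following. The table name, columns, partition key, and delimiter are
placeholders, and com.example is an assumed package for the class from
step 1. The input format class name is inferred from the elephant-bird
source path, so check it against the jar you actually deploy.

```sql
-- Hypothetical table; only the INPUTFORMAT/OUTPUTFORMAT clauses matter here.
-- com.example.HiveLzoTextOutputFormat is an assumed name for the step 1 class.
CREATE TABLE lzo_events (
  event_id STRING,
  payload  STRING
)
PARTITIONED BY (ds STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT  'com.twitter.elephantbird.mapred.input.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'com.example.HiveLzoTextOutputFormat';
```

Both classes would need to be on the classpath of Hive and of the Pig job
that writes through HCatalog.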

Hope this helps.
Feng


On Wed, Jan 30, 2013 at 1:36 PM, Timothy Potter <thelabdude@gmail.com> wrote:

> Been struggling with this one for a bit ... Lzo compression is enabled
> by default for my Hadoop cluster. If I forget to turn off compression
> from my Pig scripts that create data in Hive using HCatalog, then the
> partitions get created but I can't read the data back. I don't have
> the error handy but it looks like the read-side doesn't treat the data
> as compressed.
>
> So I've resorted to adding the following to my scripts:
>
> SET mapreduce.output.compress false;
> SET mapred.output.compress false;
> SET output.compression.enabled false;
>
> One of those seems to do the trick ;-)
>
> I'd really like to store my Hive data compressed but haven't figured
> out how to enable this with HCatalog. Seems like it's either not
> supported yet or I'm missing something simple in my HQL DDL table
> declaration.
>
> Cheers,
> Tim
>
