hbase-dev mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2225) Enable compression in HBase Export
Date Sat, 13 Feb 2010 21:00:28 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833453#action_12833453

Ted Yu commented on HBASE-2225:

Using a command-line switch is fine.
I think we can make this feature more versatile by naming the switch no_compression_export,
meaning that GzipCodec is used for Export by default.

We detect the compression mode of the table first. If the table is already compressed, we
don't apply GzipCodec. Otherwise we apply GzipCodec unless no_compression_export is specified.
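
The decision rule above could be sketched roughly as follows. This is only an illustration
of the proposed logic, not code from Export: the class ExportCompressionPolicy and its
methods are hypothetical, the exact spelling of the switch on the command line is an
assumption, and how the table's compression mode is detected is left out.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the proposed rule; not part of HBase. */
final class ExportCompressionPolicy {
    /** Assumed spelling of the opt-out switch proposed in this comment. */
    static final String NO_COMPRESSION_SWITCH = "no_compression_export";

    private ExportCompressionPolicy() {}

    /** Returns true if the opt-out switch appears among the command-line arguments. */
    static boolean optOutRequested(String[] args) {
        return Arrays.asList(args).contains(NO_COMPRESSION_SWITCH);
    }

    /**
     * Gzip is applied only when the table is not already compressed
     * and the user has not opted out.
     */
    static boolean shouldApplyGzip(boolean tableCompressed, boolean optOut) {
        return !tableCompressed && !optOut;
    }
}
```

When shouldApplyGzip returns true, createSubmittableJob() would make the two
FileOutputFormat calls quoted in the issue description; otherwise it would skip them.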

Since SequenceFileInputFormat is able to handle GzipCodec, this won't cause a regression for
the Import class.

> Enable compression in HBase Export
> ----------------------------------
>                 Key: HBASE-2225
>                 URL: https://issues.apache.org/jira/browse/HBASE-2225
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.20.1
>         Environment: OS agnostic
>            Reporter: Ted Yu
>            Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
> org.apache.hadoop.hbase.mapreduce.Export should set compression codec
> In createSubmittableJob(), the following should be added:
>     FileOutputFormat.setCompressOutput(job, true);
>     FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.GzipCodec.class);
> From my experiment, 10% to 50% reduction in Export output has been observed.
> SequenceFileInputFormat used by the Import tool is able to detect GzipCodec - there is
> no change for Import class.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
