hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3166) HBase exporter should compress output files by default (or at least allow this as an option)
Date Thu, 28 Oct 2010 22:40:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925977#action_12925977 ]

Kannan Muthukkaruppan commented on HBASE-3166:

Yes, I ran into this recently.

Turns out the compression part is already possible. The "export" uses the GenericOptionsParser,
which allows passing a bunch of settings as -D options.

 bin/hadoop jar <pathToHBaseJar> export \
   -D mapred.output.compress=true \
   -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
   -D mapred.output.compression.type=BLOCK \
   <tablename> <outputdirname>
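
For anyone wondering why the -D route works: the parser moves each -D key=value pair into the job configuration and hands the remaining arguments (table name, output directory) back to the tool. The Java sketch below is only a rough, stdlib-only illustration of that split. DashDOptions and its parse method are hypothetical names made up for this example and are not Hadoop or HBase code; the real GenericOptionsParser handles more generic options than just -D, and the table name and output path in main are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Hadoop or HBase: a simplified stand-in
// for the -D handling that GenericOptionsParser performs.
class DashDOptions {

    /**
     * Copies every "-D key=value" (or "-Dkey=value") pair into props and
     * returns the remaining positional arguments in their original order.
     */
    static List<String> parse(String[] args, Map<String, String> props) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String pair = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                pair = args[++i];          // "-D key=value": the pair is the next token
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                pair = arg.substring(2);   // "-Dkey=value": the pair is attached
            }
            if (pair == null) {
                remaining.add(arg);        // not a -D option: keep as positional
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + pair);
            }
            props.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return remaining;
    }

    public static void main(String[] ignored) {
        String[] args = {
            "-D", "mapred.output.compress=true",
            "-D", "mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
            "-D", "mapred.output.compression.type=BLOCK",
            "mytable", "/export/mytable"   // placeholder table name and output directory
        };
        Map<String, String> props = new LinkedHashMap<>();
        List<String> remaining = parse(args, props);
        System.out.println(props);       // the three compression settings
        System.out.println(remaining);   // the two positional arguments
    }
}
```

In the real export job the settings end up in the job's Configuration rather than a plain map, which is where the output format looks them up when deciding whether and how to compress.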

We need to improve the documentation around this.

As part of finding this, I have also added support for exporting just a specific column family
as well as turning block cache off during export. Will create a separate JIRA for the same
and post a patch.

> HBase exporter should compress output files by default (or at least allow this as an option)
> --------------------------------------------------------------------------------------------
>                 Key: HBASE-3166
>                 URL: https://issues.apache.org/jira/browse/HBASE-3166
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.20.6
>            Reporter: Josh Rosenblum
>            Priority: Minor
> The HBase exporter puts (key, Result) pairs as keys and values into an output sequence file.
> There could be significant savings at low cost if at least default compression were enabled
> on this output sequence file.
> In createSubmittableJob(), this might be as simple as adding the following:
>         SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
>         SequenceFileOutputFormat.setCompressOutput(job, true);
>         FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
