hbase-issues mailing list archives

From "Lars George (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-2225) Enable compression in HBase Export
Date Tue, 27 Apr 2010 07:05:49 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George updated HBASE-2225:

    Attachment: HBASE-2225-v2-trunk.patch

Patch v2 uses JOpt-Simple to gather the various command line parameters for the Export.
It also incorporates part of HBASE-2434, namely the setCaching() option, and adds a secondary
option for specifying time ranges.

What I need here is an OK that JOpt-Simple is the way to go and that I used it properly. Note
that I did not go down the "double brace" initializer route, mainly because even if you specify
an "ofType(Class)", JOpt will still return only an "Object". Furthermore, unlike other packages
of the same kind, it does *not* treat the long and short names of an option as the same option
that way. Only if you use an OptionSpec instance do you get the proper types back, since it
uses generics, and it also combines options with long and short names.
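
To illustrate the point about generics, here is a minimal stand-alone sketch. It is
NOT JOpt-Simple code and not part of the patch: the names TypedOptionsSketch, Spec,
Parsed and parse are hypothetical. It only models why a lookup by option name can
return nothing more specific than "Object", while a lookup through a typed spec handle
returns the proper type and covers both the short and the long name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-alone model of typed option handles; not JOpt-Simple's API.
final class TypedOptionsSketch {

    /** A typed handle for one option that may have several names (short and long). */
    public static final class Spec<T> {
        final List<String> names;
        final Class<T> type;
        final Function<String, T> convert;

        public Spec(List<String> names, Class<T> type, Function<String, T> convert) {
            this.names = names;
            this.type = type;
            this.convert = convert;
        }
    }

    /** Parsed values, stored under every name of the spec that matched. */
    public static final class Parsed {
        final Map<String, Object> values = new HashMap<>();

        // Lookup by name: the compiler cannot know the value type, so callers must cast.
        public Object valueOf(String name) {
            return values.get(name);
        }

        // Lookup by spec: the type parameter of the handle carries the value type.
        public <T> T valueOf(Spec<T> spec) {
            return spec.type.cast(values.get(spec.names.get(0)));
        }
    }

    /** Parses "-name value" or "--name value" pairs; unknown names are ignored. */
    public static Parsed parse(List<? extends Spec<?>> specs, String... args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i + 1 < args.length; i += 2) {
            String name = args[i].replaceFirst("^--?", "");
            for (Spec<?> spec : specs) {
                if (spec.names.contains(name)) {
                    Object value = spec.convert.apply(args[i + 1]);
                    for (String alias : spec.names) {
                        parsed.values.put(alias, value);
                    }
                }
            }
        }
        return parsed;
    }
}
```

With a handle declared as Spec<Integer> for the names "c" and "caching", valueOf(spec)
yields an Integer without a cast no matter which of the two names was used on the
command line, whereas valueOf("caching") still has to be cast by the caller.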

Output is now:

$java org.apache.hadoop.hbase.mapreduce.Export

Option                                  Description                            
------                                  -----------                            
-?, -h, --help                          Show this help                         
-c, --caching <Integer>                 Number of rows for caching             
-e, --endtime <Long>                    End time as long value                 
--enddate <yyyyMMddHHmm>                End date (alternative to --endtime)    
-n, --versions <Integer>                Maximum versions                       
-o, --outputdir                         Output directory                       
-s, --starttime <Long>                  Start time as long value               
--startdate <yyyyMMddHHmm>              Start date (alternative to --starttime)
-t, --tablename                         Table name                             
-z, --compress                          Enable compression of output files     
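
As a rough sketch of how the date options listed above could map onto the long-valued
time options, the following stand-alone helper converts a yyyyMMddHHmm string to epoch
milliseconds. It is not taken from the patch: the class and method names are
hypothetical, interpreting the date as UTC is an assumption, and so is the rule that an
explicit long value takes precedence over the date form.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the patch.
final class ExportTimeRange {

    // Same pattern as shown in the help text for --startdate and --enddate.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Converts a yyyyMMddHHmm string to epoch milliseconds (assumption: UTC). */
    public static long toMillis(String date) {
        return LocalDateTime.parse(date, FORMAT)
            .toInstant(ZoneOffset.UTC)
            .toEpochMilli();
    }

    /**
     * Resolves one bound of the time range. Assumed precedence: the explicit
     * long value first, then the date form, then the supplied default.
     */
    public static long resolve(Long time, String date, long defaultValue) {
        if (time != null) {
            return time;
        }
        if (date != null) {
            return toMillis(date);
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        // Start given as a date, end left unspecified.
        long start = resolve(null, "201004270000", 0L);
        long end = resolve(null, null, Long.MAX_VALUE);
        System.out.println("start=" + start + " end=" + end);
    }
}
```

A malformed date makes LocalDateTime.parse throw a DateTimeParseException, which a
command line tool would typically catch and turn into a usage message.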

Note: Another gripe I have is that the sort order of the options, for example in the "-h"
output, is hardcoded. That is also the case for commons-cli. Not sure what that is about, but
I would rather list options in the order I add them in the code, as they belong together. Ah

> Enable compression in HBase Export
> ----------------------------------
>                 Key: HBASE-2225
>                 URL: https://issues.apache.org/jira/browse/HBASE-2225
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.20.1
>         Environment: OS agnostic
>            Reporter: Ted Yu
>            Assignee: Lars George
>            Priority: Minor
>         Attachments: HBASE-2225-trunk.patch, HBASE-2225-v2-trunk.patch
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
> org.apache.hadoop.hbase.mapreduce.Export should set compression codec
> In createSubmittableJob(), the following should be added:
>     FileOutputFormat.setCompressOutput(job, true);
>     FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.GzipCodec.class);
> From my experiment, 10% to 50% reduction in Export output has been observed.
> SequenceFileInputFormat used by the Import tool is able to detect GzipCodec - there is
> no change for Import class.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
