crunch-dev mailing list archives

From "Chao Shi (JIRA)"
Subject [jira] [Updated] (CRUNCH-212) Need target wrapper for HFileOuptutFormat
Date Fri, 19 Jul 2013 09:48:50 GMT


Chao Shi updated CRUNCH-212:

    Attachment: crunch-212-draft.patch

Sorry for the late reply. I finally decided not to use HBase's HFileOutputFormat, and instead to implement
our own version as a thin wrapper around HFile.Writer, because I think the former is too hacky
and is not a pure output format.

My plan is to leverage Crunch's own sorting and multi-stage MR support rather than relying on HFileOutputFormat's.
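Doing the sort in Crunch means each reducer still has to receive a contiguous row range that falls inside a single region; as far as I know, HFileOutputFormat arranges this by configuring a TotalOrderPartitioner seeded with the table's region start keys. Below is a minimal plain-Java sketch of that routing, assuming the region start keys are already known. `RegionRangePartitioner` is a hypothetical name, not a Crunch or HBase class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical range partitioner (not a Crunch or HBase class): routes a row key
 * to the index of the region whose [startKey, nextStartKey) range contains it.
 */
public class RegionRangePartitioner {
    // Unsigned lexicographic byte order, the order HBase uses for row keys.
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Region start keys in sorted order; the first region starts at the empty key.
    private final byte[][] startKeys;

    public RegionRangePartitioner(byte[][] startKeys) {
        this.startKeys = startKeys.clone();
        Arrays.sort(this.startKeys, UNSIGNED);
    }

    /** Returns the index of the last start key that is <= row. */
    public int getPartition(byte[] row) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always makes progress
            if (UNSIGNED.compare(startKeys[mid], row) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        byte[][] starts = {
            new byte[0],
            "g".getBytes(StandardCharsets.UTF_8),
            "p".getBytes(StandardCharsets.UTF_8)
        };
        RegionRangePartitioner p = new RegionRangePartitioner(starts);
        // prints "apple -> 0", "grape -> 1", "zebra -> 2", one per line
        for (String row : new String[] {"apple", "grape", "zebra"}) {
            System.out.println(row + " -> " + p.getPartition(row.getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```

With three regions starting at the empty key, "g" and "p", each reducer ends up holding exactly one region's rows, so a single sorted HFile per reducer could be bulk-loaded without splitting.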

Moreover, I see that HFileOutputFormat uses a TreeSet to sort KVs in memory on the reducer side. I
don't think that is necessary for us, because we can sort them during the shuffle (HFileOutputFormat
sorts only on the row in the shuffle; I don't know why).
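If the shuffle sorts on the full cell key rather than the row alone, cells reach the reducer already in the order an HFile has to be written in, and the per-row TreeSet becomes unnecessary. Below is a rough plain-Java sketch of such a key ordering. It assumes the usual HBase cell order: row, family and qualifier ascending as unsigned bytes, then timestamp descending so the newest version comes first. HBase's own KeyValue comparator also orders by key type, which is omitted here. `CellKey`, `CELL_ORDER` and `isSorted` are illustrative names only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CellKeyOrder {
    /** Hypothetical composite shuffle key: the full cell coordinates, not just the row. */
    static final class CellKey {
        final byte[] row;
        final byte[] family;
        final byte[] qualifier;
        final long timestamp;

        CellKey(String row, String family, String qualifier, long timestamp) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.family = family.getBytes(StandardCharsets.UTF_8);
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return new String(row, StandardCharsets.UTF_8) + "/"
                + new String(family, StandardCharsets.UTF_8) + "/"
                + new String(qualifier, StandardCharsets.UTF_8) + "/" + timestamp;
        }
    }

    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    /** Row, family, qualifier ascending as unsigned bytes; then newest timestamp first. */
    static final Comparator<CellKey> CELL_ORDER =
        Comparator.<CellKey, byte[]>comparing(k -> k.row, UNSIGNED)
            .thenComparing(k -> k.family, UNSIGNED)
            .thenComparing(k -> k.qualifier, UNSIGNED)
            .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    /** True if the cells arrive in the order a sorted-file writer would accept them. */
    static boolean isSorted(List<CellKey> cells) {
        for (int i = 1; i < cells.size(); i++) {
            if (CELL_ORDER.compare(cells.get(i - 1), cells.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<CellKey> cells = new ArrayList<>(List.of(
            new CellKey("row2", "cf", "a", 1L),
            new CellKey("row1", "cf", "b", 5L),
            new CellKey("row1", "cf", "a", 3L),
            new CellKey("row1", "cf", "a", 7L)));
        // Stand-in for the shuffle: one sort on the full key, no per-row TreeSet.
        cells.sort(CELL_ORDER);
        // prints row1/cf/a/7, row1/cf/a/3, row1/cf/b/5, row2/cf/a/1 (one per line)
        cells.forEach(System.out::println);
        System.out.println("sorted: " + isSorted(cells));
    }
}
```

In a real job this comparator would be the shuffle's sort order (for example via a secondary sort on a composite key), so the reducer could stream cells straight into the writer.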
> Need target wrapper for HFileOuptutFormat
> -----------------------------------------
>                 Key: CRUNCH-212
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Chao Shi
>         Attachments: crunch-212-draft.patch
> I need to import data into HBase from MR. I found HFileOutputFormat to be ~5x more efficient
> than HTableOutputFormat, so maybe we need a target wrapper for it.
> Furthermore, is it possible to have HBase load the HFiles automatically after they are generated?

