crunch-dev mailing list archives

From "Chao Shi (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-212) Need target wrapper for HFileOuptutFormat
Date Wed, 17 Jul 2013 04:22:50 GMT


Chao Shi commented on CRUNCH-212:

Exactly, Micah. I have been working on 1) for a while (I have been busy with other things these
days, so my tests do not pass yet; I will post it here once they do). The API should be able to
translate a PCollection<KeyValue> into a set of HFiles. There are two problems:
1. HFileOutputFormat only accepts PTable<ImmutableBytesWritable, KeyValue> rather than
PCollection<KeyValue>. Since the ImmutableBytesWritable (the row key) can be extracted
from the KeyValue, we may prefer the cleaner PCollection<KeyValue> approach. For now,
I have copied HFileOutputFormat and modified it slightly.
2. We may need to specify a total order partitioner for bulk load. Although this should arguably
be part of the BulkLoader, it has to be done before the HFiles are generated.
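
To illustrate the two points above, here is a minimal sketch in plain Java. It is not the patch
and it does not use the HBase or Crunch APIs: Cell is a hypothetical stand-in for KeyValue,
and compareRows and partitionFor are illustrative names. It assumes row keys are ordered as
unsigned bytes and that partition boundaries are given as sorted split keys, where a row equal
to a split key falls into the partition that starts at that key.

```java
import java.nio.charset.StandardCharsets;

public class TotalOrderSketch {
    // Hypothetical stand-in for KeyValue: only the row key and the value are modeled.
    static final class Cell {
        final byte[] row;
        final byte[] value;

        Cell(String row, String value) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.value = value.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Point 1: the row key is derived from the cell itself, so a separate key is not required.
    static byte[] rowKeyOf(Cell cell) {
        return cell.row;
    }

    // Lexicographic comparison that treats each byte as unsigned.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Point 2: total order partitioning. Returns the number of split keys that are
    // less than or equal to the row, so partition i only holds rows that sort
    // before every row in partition i + 1. Split keys must be sorted ascending.
    static int partitionFor(byte[] row, byte[][] splits) {
        int lo = 0;
        int hi = splits.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareRows(row, splits[mid]) >= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        byte[][] splits = {
            "g".getBytes(StandardCharsets.UTF_8),
            "p".getBytes(StandardCharsets.UTF_8)
        };
        Cell[] cells = { new Cell("a", "1"), new Cell("k", "2"), new Cell("z", "3") };
        for (Cell cell : cells) {
            String row = new String(rowKeyOf(cell), StandardCharsets.UTF_8);
            System.out.println(row + " -> partition " + partitionFor(rowKeyOf(cell), splits));
        }
    }
}
```

With the partitions ordered this way, each reducer writes a contiguous, non-overlapping key
range, which is why the partitioning has to be settled before the HFiles are written.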
> Need target wrapper for HFileOuptutFormat
> -----------------------------------------
>                 Key: CRUNCH-212
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Chao Shi
> I need to import data into HBase from MR. I found that HFileOutputFormat is ~5x more efficient
> than HTableOutputFormat, so maybe we need a target wrapper for it.
> Furthermore, is it possible to have HBase load the HFiles automatically after they are generated?
