crunch-dev mailing list archives

From "Chao Shi (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-212) Need target wrapper for HFileOuptutFormat
Date Sun, 09 Jun 2013 09:15:20 GMT


Chao Shi commented on CRUNCH-212:

I think I hit the issue Gabriel mentioned, something that does not map cleanly onto the
Crunch world:

HFileOutputFormat is not really a target, because it needs a sort phase on row keys as well
as a custom partitioner. So I have to make it a utility method instead:
HBaseBulkLoadUtils#save(String tableName, PCollection<Put or Delete>), which performs the
sort in an additional stage. It can also take advantage of handleOutputs() to call HBase to
load the generated HFiles.

Any better ideas?
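
To make the "sort phase plus custom partitioner" requirement concrete, here is a minimal
plain-Java sketch. It is not Crunch or HBase code: the class RegionPartitionSketch, its
methods, and the example split keys are all hypothetical, and it only assumes that each
output file must cover one region's key range with rows in unsigned byte order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Crunch or HBase.
final class RegionPartitionSketch {

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison of two row keys.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // splitKeys must be sorted ascending; splitKeys.get(i) is the first row of
    // partition i + 1. Returns the number of split keys <= row (binary search).
    static int partitionFor(byte[] row, List<byte[]> splitKeys) {
        int lo = 0;
        int hi = splitKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareRows(splitKeys.get(mid), row) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // The "custom partitioner" step followed by the "sort phase": route every
    // row to its key range, then sort each range independently.
    static List<List<byte[]>> partitionAndSort(List<byte[]> rows, List<byte[]> splitKeys) {
        List<List<byte[]>> parts = new ArrayList<>();
        for (int i = 0; i <= splitKeys.size(); i++) {
            parts.add(new ArrayList<>());
        }
        for (byte[] row : rows) {
            parts.get(partitionFor(row, splitKeys)).add(row);
        }
        for (List<byte[]> part : parts) {
            part.sort(RegionPartitionSketch::compareRows);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<byte[]> splits = Arrays.asList(bytes("g"), bytes("p"));
        List<byte[]> rows = Arrays.asList(
            bytes("zebra"), bytes("apple"), bytes("mango"), bytes("grape"), bytes("kiwi"));
        List<List<byte[]>> parts = partitionAndSort(rows, splits);
        for (int i = 0; i < parts.size(); i++) {
            StringBuilder line = new StringBuilder("partition " + i + ":");
            for (byte[] row : parts.get(i)) {
                line.append(' ').append(new String(row, StandardCharsets.UTF_8));
            }
            System.out.println(line);
        }
    }
}
```

A plain output target only writes whatever records reach it, which is why the routing and
ordering above would have to happen in an extra stage before the files are written.
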
> Need target wrapper for HFileOuptutFormat
> -----------------------------------------
>                 Key: CRUNCH-212
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Chao Shi
> I need to import data into HBase from MR. I found HFileOutputFormat is ~5x more efficient
> than HTableOutputFormat. So maybe we need a target wrapper for it.
> Furthermore, is it possible to call HBase to load it automatically after HFiles are generated?
