crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-212) Need target wrapper for HFileOuptutFormat
Date Sat, 08 Jun 2013 09:37:20 GMT


Gabriel Reid commented on CRUNCH-212:

Yeah, writing directly to HFiles is *way* faster than HTableOutputFormat. Having this in Crunch
would be really great. I looked at it quickly in the past, and at the time it would have been
non-trivial due to the way output file handling is done in Crunch, but I think that situation
may have changed a bit recently, which could make it easier to integrate with Crunch.

It is indeed possible to call HBase to load the data in automatically. Coincidentally (or
maybe not) I did just this a while back at work -- it's also Apache-licensed, and the relevant
code is here:
> Need target wrapper for HFileOuptutFormat
> -----------------------------------------
>                 Key: CRUNCH-212
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Chao Shi
> I need to import data into HBase from MR. I found HFileOutputFormat is ~5x more efficient
> than HTableOutputFormat. So maybe we need a target wrapper for it.
> Furthermore, is it possible to call HBase to load it automatically after HFiles are generated?
