crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-212) Need target wrapper for HFileOutputFormat
Date Sat, 08 Jun 2013 20:48:20 GMT


Gabriel Reid commented on CRUNCH-212:

That's correct about the total ordering -- actually, I believe the util method in HFileOutputFormat
sets up a (total order) partitioner based on existing regions in the target HTable so that
the outputs match up with the regions. That's all based on Writables (HBase Puts or KeyValues),
and I think that it's probably pretty reusable by us (except that it excludes Avro, which
will be the case regardless).
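
To make the region-aligned partitioning idea concrete, here is a minimal sketch in plain Java. It is not the HBase or Crunch code: the class name `RegionPartitionSketch`, the method `partitionFor`, and the sample split keys are all hypothetical, and it assumes row keys are compared as unsigned bytes in lexicographic order. It only illustrates how a sorted list of region start keys can route each row key to the partition (and hence the output file) that lines up with one region.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; not part of HBase or Crunch.
public class RegionPartitionSketch {

    // Unsigned lexicographic comparison of byte arrays (assumed row-key ordering).
    public static final Comparator<byte[]> ROW_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    // splitPoints holds the sorted start keys of every region except the first
    // one (whose start key is empty). With N split points there are N + 1 partitions.
    public static int partitionFor(byte[] rowKey, byte[][] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, rowKey, ROW_ORDER);
        // Exact hit: the row is the first key of the region starting at that split point.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the insertion point
        // is the index of the partition that contains the key.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up split points: regions are [ "", "g" ), [ "g", "p" ), [ "p", end ).
        byte[][] splits = { bytes("g"), bytes("p") };
        for (String row : new String[] { "apple", "g", "grape", "zebra" }) {
            System.out.println(row + " -> partition " + partitionFor(bytes(row), splits));
        }
    }
}
```

Because every key in partition i sorts before every key in partition i + 1, the reducer outputs are totally ordered across files, which is the property the comment above refers to.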

There are a few other clever things that happen in the util methods of HFileOutputFormat,
and it will likely be a challenge to map those cleanly to Crunch -- however, I still really
think that this would be super-useful.

> Need target wrapper for HFileOutputFormat
> -----------------------------------------
>                 Key: CRUNCH-212
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Chao Shi
> I need to import data into HBase from MR. I found that HFileOutputFormat is ~5x more efficient
> than HTableOutputFormat, so maybe we need a target wrapper for it.
> Furthermore, is it possible to have HBase load the HFiles automatically after they are generated?
