crunch-dev mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-308) Upgrade to Hadoop 2.2.0 and HBase 0.96
Date Fri, 06 Dec 2013 02:10:35 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840828#comment-13840828 ]

Chao Shi commented on CRUNCH-308:
---------------------------------

bq. The only real functional change is in how we do the sort for HBase bulk loads, where I
took what we had, which was doing the partitioned sort on the Writable KeyValue objects, and
changed it to do the sort on the ImmutableBytesWritable form of KeyValue.getRow(). This was
how this was done in HBase 0.96:

I think this approach has to be used together with KeyValueSortReducer, which sorts the KVs
of the same row in the reducer's memory [1]. This is the main reason we introduced our own
implementation of the output format for HFiles, which sorts KVs in MR's shuffle phase instead.

[1] https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/KeyValueSortReducer.java
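
To make the difference concrete, here is a minimal, dependency-free sketch of the two strategies. It is not the Crunch or HBase code: `Cell`, `HFileSortSketch`, and both method names are hypothetical stand-ins, and the "shuffle" is simulated with in-process collections. The first method follows the KeyValueSortReducer pattern (shuffle orders rows only, the reducer buffers each row's KVs in a TreeSet); the second puts the full KV order into the shuffle sort, so the reducer would only need to stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class HFileSortSketch {
    /** Hypothetical stand-in for an HBase KeyValue: a row key plus a column qualifier. */
    static final class Cell {
        final String row;
        final String qualifier;

        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier;
        }
    }

    /** Full cell order: by row first, then by qualifier within the row. */
    static final Comparator<Cell> CELL_ORDER =
        Comparator.comparing((Cell c) -> c.row).thenComparing(c -> c.qualifier);

    /** Strategy A: the shuffle orders rows only; each row's cells are sorted in reducer memory. */
    static List<Cell> sortByRowThenInReducer(List<Cell> cells) {
        // Simulated shuffle: group cells by row, with rows in sorted order.
        Map<String, List<Cell>> groups = new TreeMap<>();
        for (Cell c : cells) {
            groups.computeIfAbsent(c.row, r -> new ArrayList<>()).add(c);
        }
        List<Cell> out = new ArrayList<>();
        for (List<Cell> group : groups.values()) {
            // Simulated reducer: every cell of one row is held in memory at once.
            // A TreeSet also drops cells that compare as equal.
            TreeSet<Cell> sorted = new TreeSet<>(CELL_ORDER);
            sorted.addAll(group);
            out.addAll(sorted);
        }
        return out;
    }

    /** Strategy B: the sort key is the whole cell, so ordering is complete after the shuffle. */
    static List<Cell> sortInShuffle(List<Cell> cells) {
        List<Cell> out = new ArrayList<>(cells);
        out.sort(CELL_ORDER); // stands in for the MR shuffle sort
        return out;
    }

    public static void main(String[] args) {
        List<Cell> input = Arrays.asList(
            new Cell("r2", "b"), new Cell("r1", "z"),
            new Cell("r2", "a"), new Cell("r1", "c"));
        System.out.println(sortByRowThenInReducer(input));
        System.out.println(sortInShuffle(input));
    }
}
```

Both methods yield the same order on duplicate-free input. The practical difference is where the memory pressure lands: strategy A needs a whole row's KVs to fit in the reducer's heap, which is the concern with very wide rows, while strategy B leaves the ordering to the framework's sort.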

> Upgrade to Hadoop 2.2.0 and HBase 0.96
> --------------------------------------
>
>                 Key: CRUNCH-308
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-308
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-HBASE96.patch
>
>
> As discussed on dev@crunch, we should update Crunch to run against the new mainline releases
> of Hadoop (2.2.0) and HBase (0.96).
> There isn't a good way to maintain a shim between HBase 0.94 and HBase 0.96 due to a
> number of API changes, so this change means that support for HBase 0.94 will remain in the
> 0.8.x sequence of Crunch releases, and 0.96 will be the supported version from 0.9.0 onwards.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
