hbase-dev mailing list archives

From "Ted Malaska (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-14339) HBase Bulk Load and super wide rows
Date Sun, 30 Aug 2015 21:51:45 GMT
Ted Malaska created HBASE-14339:

             Summary: HBase Bulk Load and super wide rows
                 Key: HBASE-14339
                 URL: https://issues.apache.org/jira/browse/HBASE-14339
             Project: HBase
          Issue Type: Bug
            Reporter: Ted Malaska
            Priority: Minor

This may not be a huge issue, but it does come up.  If a row has too many columns,
KeyValueSortReducer will fail with an out-of-memory exception, because it uses
a TreeMap to sort the row's columns within the memory of the reducer.

A solution would be to add the column family and qualifier to the shuffle key, so that the
shuffle handles the sort instead of the reducer.

The partitioner would still partition only on the row key, but the sort order would apply to
the row key, column family, and column qualifier.
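
The idea above can be sketched as a small simulation. This is a rough illustration only,
written in plain Python and not against the HBase or MapReduce APIs: the names partition,
shuffle and reduce_stream are hypothetical, and CRC32 is an assumed stand-in for whatever
hash the real partitioner would use.

```python
import zlib

def partition(row_key: bytes, num_reducers: int) -> int:
    # Partition on the row key only, so every cell of a row
    # lands on the same reducer (CRC32 is an assumed stand-in hash).
    return zlib.crc32(row_key) % num_reducers

def shuffle(cells, num_reducers):
    """Simulate the shuffle for (row, family, qualifier, value) byte tuples."""
    buckets = [[] for _ in range(num_reducers)]
    for row, family, qualifier, value in cells:
        # Composite key: ordering covers row, family and qualifier.
        composite_key = (row, family, qualifier)
        buckets[partition(row, num_reducers)].append((composite_key, value))
    for bucket in buckets:
        # The framework sorts by the full composite key; Python compares
        # bytes lexicographically, so tuples of bytes sort field by field.
        bucket.sort(key=lambda kv: kv[0])
    return buckets

def reduce_stream(bucket):
    # Cells arrive already ordered, so the reducer can emit them one at a
    # time instead of buffering a whole row in a sorted map.
    for composite_key, value in bucket:
        yield composite_key, value

if __name__ == "__main__":
    cells = [
        (b"r1", b"cf", b"q3", b"c"),
        (b"r2", b"cf", b"q1", b"d"),
        (b"r1", b"cf", b"q1", b"a"),
        (b"r1", b"cf", b"q2", b"b"),
    ]
    for index, bucket in enumerate(shuffle(cells, 3)):
        for key, value in reduce_stream(bucket):
            print(index, key, value)
```

In this sketch the reducer's memory use no longer depends on how wide a row is, since the
sort work moves into the shuffle.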

See the Spark bulk load (HBASE-14150) for an example.

