hbase-dev mailing list archives

From "Ted Malaska (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-14150) Add BulkLoad functionality to HBase-Spark Module
Date Thu, 23 Jul 2015 00:39:04 GMT
Ted Malaska created HBASE-14150:
-----------------------------------

             Summary: Add BulkLoad functionality to HBase-Spark Module
                 Key: HBASE-14150
                 URL: https://issues.apache.org/jira/browse/HBASE-14150
             Project: HBase
          Issue Type: New Feature
            Reporter: Ted Malaska
            Assignee: Ted Malaska


Build on the work done in HBASE-13992 by adding functionality to perform a bulk load from a given
RDD.

This will do the following:
1. Determine the number of regions, then sort and partition the data so that it can be written
out correctly to HFiles.
2. Unlike the MR bulk load, I would like the columns to be sorted in the shuffle stage rather
than in the memory of the reducer. This will allow the design to support very wide records
without running out of memory.
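A minimal sketch of the two steps above, written in plain Python so it runs without Spark or HBase. It assumes cells are simplified (row, family, qualifier, value) byte tuples and that the sorted region start keys are already known. The helper names `region_partition` and `shuffle_sort` are hypothetical and are not part of the hbase-spark module; real HBase cell ordering also considers timestamp and cell type, which this sketch ignores.

```python
import bisect
from typing import List, Tuple

# Hypothetical simplified cell: (row_key, family, qualifier, value), all bytes.
Cell = Tuple[bytes, bytes, bytes, bytes]


def region_partition(row_key: bytes, start_keys: List[bytes]) -> int:
    """Return the index of the region whose key range contains row_key.

    start_keys must be sorted, with the first region starting at b"".
    Python compares bytes lexicographically by unsigned byte value,
    which matches HBase row-key ordering.
    """
    # bisect_right returns the position of the first start key strictly
    # greater than row_key; the owning region is the one just before it.
    return bisect.bisect_right(start_keys, row_key) - 1


def shuffle_sort(cells: List[Cell], start_keys: List[bytes]) -> List[List[Cell]]:
    """Simulate a shuffle that partitions by region and sorts by full cell key.

    Step 1: route every cell to the partition of the region that owns its row.
    Step 2: sort each partition by (row, family, qualifier), not just by row,
    so a writer could stream cells to an HFile in order without buffering
    a whole (possibly very wide) row to sort its columns.
    """
    partitions: List[List[Cell]] = [[] for _ in start_keys]
    for cell in cells:
        partitions[region_partition(cell[0], start_keys)].append(cell)
    for part in partitions:
        part.sort(key=lambda c: (c[0], c[1], c[2]))
    return partitions


# Example: three regions starting at "", "g" and "p".
start_keys = [b"", b"g", b"p"]
cells = [
    (b"zebra", b"cf", b"b", b"2"),
    (b"apple", b"cf", b"q2", b"x"),
    (b"apple", b"cf", b"q1", b"y"),
    (b"kiwi", b"cf", b"a", b"1"),
]
parts = shuffle_sort(cells, start_keys)
for i, part in enumerate(parts):
    print(i, [(c[0], c[2]) for c in part])
```

In Spark, the same idea would roughly correspond to a region-aware `Partitioner` combined with `repartitionAndSortWithinPartitions`, keyed on the full (row, family, qualifier) tuple, so that the shuffle performs the column sort instead of the task that writes the HFile.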





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
