hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14150) Add BulkLoad functionality to HBase-Spark Module
Date Wed, 12 Aug 2015 18:59:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694013#comment-14694013 ]

Hudson commented on HBASE-14150:
--------------------------------

FAILURE: Integrated in HBase-TRUNK #6718 (See [https://builds.apache.org/job/HBase-TRUNK/6718/])
HBASE-14150 Add BulkLoad functionality to HBase-Spark Module (Ted Malaska) (tedyu: rev 72a48a1333f6c01c46cd244439198ccce3f941ac)
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/FamilyHFileWriteOptions.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/KeyFamilyQualifier.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/BulkLoadPartitioner.scala
* hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/BulkLoadSuite.scala


> Add BulkLoad functionality to HBase-Spark Module
> ------------------------------------------------
>
>                 Key: HBASE-14150
>                 URL: https://issues.apache.org/jira/browse/HBASE-14150
>             Project: HBase
>          Issue Type: New Feature
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14150.1.patch, HBASE-14150.2.patch, HBASE-14150.3.patch, HBASE-14150.4.patch, HBASE-14150.5.patch
>
>
> Add on to the work done in HBASE-13992 to add functionality to do a bulk load from a given RDD.
> This will do the following:
> 1. Figure out the number of regions, then sort and partition the data correctly so it can be written out to HFiles.
> 2. Unlike the MR bulk load, I would like the columns to be sorted in the shuffle stage and not in the memory of the reducer. This will allow the design to support very wide records without running out of memory.
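
The two steps in the description can be illustrated with a small, dependency-free sketch. It is written in plain Python for brevity and is not the module's Scala API: the names `region_index` and `partition_and_sort` and the sample start keys are hypothetical. It assumes region start keys are sorted and that the first region starts at the empty key. The in-memory `sorted` call only stands in for the ordering a shuffle would produce. In Spark, an operation such as `repartitionAndSortWithinPartitions` performs that sort during the shuffle.

```python
from bisect import bisect_right
from collections import defaultdict


def region_index(start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    Assumes start_keys is sorted and start_keys[0] == b"" (first region).
    A row equal to a start key belongs to the region that starts there.
    """
    return bisect_right(start_keys, row_key) - 1


def partition_and_sort(cells, start_keys):
    """Group cells by target region, then order each group by
    (row, family, qualifier), the order an HFile writer expects."""
    partitions = defaultdict(list)
    for row, family, qualifier, value in cells:
        idx = region_index(start_keys, row)
        partitions[idx].append((row, family, qualifier, value))
    # Sorting on the composite key also orders the columns within a row,
    # so no per-row column buffer is needed on the writing side.
    return {idx: sorted(group, key=lambda c: c[:3])
            for idx, group in partitions.items()}


# Hypothetical table with three regions: ["", "g"), ["g", "n"), ["n", end)
start_keys = [b"", b"g", b"n"]
cells = [
    (b"m1", b"cf", b"b", b"2"),
    (b"a1", b"cf", b"z", b"1"),
    (b"m1", b"cf", b"a", b"3"),
    (b"z9", b"cf", b"a", b"4"),
]
result = partition_and_sort(cells, start_keys)
```

Because the sort key includes the family and qualifier, a very wide row arrives at the writer already in column order and can be streamed one cell at a time.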



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
