Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hbase.apache.org
Date: Sun, 30 Aug 2015 21:59:45 +0000 (UTC)
From: "Ted Malaska (JIRA)" <jira@apache.org>
To: dev@hbase.apache.org
Message-ID: <JIRA.12860586.1440971959000.206218.1440971985719@Atlassian.JIRA>
In-Reply-To: <JIRA.12860586.1440971959000@Atlassian.JIRA>
References: <JIRA.12860586.1440971959000@Atlassian.JIRA>
 <JIRA.12860586.1440971959795@arcas>
Subject: [jira] [Created] (HBASE-14340) Add second bulk load option to Spark
 Bulk Load to send puts as the value
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Ted Malaska created HBASE-14340:
-----------------------------------

             Summary: Add second bulk load option to Spark Bulk Load to send puts as the value
                 Key: HBASE-14340
                 URL: https://issues.apache.org/jira/browse/HBASE-14340
             Project: HBase
          Issue Type: New Feature
          Components: spark
            Reporter: Ted Malaska
            Assignee: Ted Malaska
            Priority: Minor


The initial bulk load option for Spark Bulk Load sends values through the shuffle one at a time. This is similar to how the original MR bulk load worked.
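
As a rough illustration of this first option, the sketch below models each cell as a hypothetical `(row, family, qualifier, value)` tuple and emits one shuffle record per cell, keyed by row, family, and qualifier. It only simulates the record layout in plain Python and does not use the Spark or HBase APIs; the function name `per_cell_records` is made up for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical cell model: (row key, column family, qualifier, value), all raw bytes.
Cell = Tuple[bytes, bytes, bytes, bytes]
# Hypothetical shuffle record for the per-cell option: ((row, family, qualifier), value).
CellRecord = Tuple[Tuple[bytes, bytes, bytes], bytes]


def per_cell_records(cells: Iterable[Cell]) -> List[CellRecord]:
    """Emit one shuffle record per cell, as in the one-value-at-a-time option."""
    records = [((row, fam, qual), value) for row, fam, qual, value in cells]
    # HFiles must be written in sorted key order; this plain sort stands in
    # for the partition-and-sort work the shuffle would do.
    records.sort(key=lambda rec: rec[0])
    return records


cells = [
    (b"row1", b"cf", b"b", b"2"),
    (b"row1", b"cf", b"a", b"1"),
    (b"row2", b"cf", b"a", b"3"),
]
records = per_cell_records(cells)
print(len(records))  # 3: one shuffle record per cell
```

Note that the row key is repeated in every record, so a row with many columns produces many shuffle records.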

However, the MR bulk loader has more than one bulk load option. A second option allows all the column families, qualifiers, and values of a row to be combined on the map side.

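This only works if rows are not super wide, because all of a row's cells travel through the shuffle together as a single value.

But when rows are not super wide, sending each row as one value through the shuffle reduces the amount of data and work the shuffle has to deal with: there are fewer records to sort, and each row key is sent once per row rather than once per cell.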
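
A minimal sketch of the row-combined option, under the same assumptions as a plain-Python simulation (no Spark or HBase APIs): cells are grouped per row on the map side into a hypothetical `{(family, qualifier): value}` payload, only row keys are sorted in the shuffle, and the reduce side expands each row back into cells in column order. The names `per_row_records` and `expand_row` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical cell model: (row key, column family, qualifier, value), all raw bytes.
Cell = Tuple[bytes, bytes, bytes, bytes]
# Hypothetical per-row shuffle record: (row key, {(family, qualifier): value}).
RowRecord = Tuple[bytes, Dict[Tuple[bytes, bytes], bytes]]


def per_row_records(cells: Iterable[Cell]) -> List[RowRecord]:
    """Map side: combine all of a row's cells into one shuffle record."""
    rows: Dict[bytes, Dict[Tuple[bytes, bytes], bytes]] = defaultdict(dict)
    for row, fam, qual, value in cells:
        # In this sketch a later value for the same column overwrites an earlier one.
        rows[row][(fam, qual)] = value
    # The shuffle only has to sort by row key.
    return sorted(rows.items(), key=lambda rec: rec[0])


def expand_row(record: RowRecord) -> List[Cell]:
    """Reduce side: turn one row record back into cells in family/qualifier order."""
    row, columns = record
    return [(row, fam, qual, columns[(fam, qual)]) for fam, qual in sorted(columns)]


cells = [
    (b"row1", b"cf", b"b", b"2"),
    (b"row1", b"cf", b"a", b"1"),
    (b"row2", b"cf", b"a", b"3"),
]
records = per_row_records(cells)
print(len(records))  # 2: one shuffle record per row instead of one per cell
cells_out = [c for rec in records for c in expand_row(rec)]
print(cells_out[0])  # (b'row1', b'cf', b'a', b'1')
```

The trade-off described above shows up directly here: each row's columns are held in memory as one value, so a very wide row becomes one very large shuffle record.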


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)