hbase-issues mailing list archives

From "Ted Malaska (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value
Date Tue, 17 Nov 2015 23:06:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009797#comment-15009797 ]

Ted Malaska commented on HBASE-14340:

Thank you, Andrew.  Let me know if there are any other JIRAs you would like me to
look at.

Thanks again

On Tuesday, November 17, 2015, Andrew Purtell (JIRA) <jira@apache.org>

> Add second bulk load option to Spark Bulk Load to send puts as the value
> ------------------------------------------------------------------------
>                 Key: HBASE-14340
>                 URL: https://issues.apache.org/jira/browse/HBASE-14340
>             Project: HBase
>          Issue Type: New Feature
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>             Fix For: 2.0.0
>         Attachments: HBASE-14340.1.patch, HBASE-14340.2.patch
> The initial bulk load option for Spark bulk load sends values one by one through
the shuffle.  This is similar to how the original MR bulk load worked.
> However, the MR bulk loader has more than one bulk load option.  There is a second option
that allows all the Column Families, Qualifiers, and Values of a row to be combined on
the map side.
> This only works if the row is not super wide.
> But if the row is not super wide, this method of sending values through the shuffle will
reduce the data and work the shuffle has to deal with.
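
To illustrate the difference between the two options described above, here is a minimal sketch in plain Python. It does not use Spark or the HBase API: the sample cells and the function names `cell_by_cell_records` and `row_combined_records` are hypothetical, and the lists only stand in for the records that would cross the shuffle. It is not the code from the attached patches.

```python
from collections import defaultdict

# Hypothetical sample cells: (row_key, column_family, qualifier, value)
cells = [
    ("row1", "cf1", "a", "1"),
    ("row1", "cf1", "b", "2"),
    ("row1", "cf2", "c", "3"),
    ("row2", "cf1", "a", "4"),
]


def cell_by_cell_records(cells):
    """First option: one shuffle record per cell, keyed by the full cell coordinates."""
    return [((row, family, qualifier), value)
            for row, family, qualifier, value in cells]


def row_combined_records(cells):
    """Second option: one shuffle record per row, carrying all of its cells as the value."""
    rows = defaultdict(list)
    for row, family, qualifier, value in cells:
        rows[row].append((family, qualifier, value))
    # Sort rows, and cells within each row, so a writer could emit them in order.
    return [(row, sorted(row_cells)) for row, row_cells in sorted(rows.items())]


per_cell = cell_by_cell_records(cells)
per_row = row_combined_records(cells)
print(len(per_cell), "records when sent cell by cell")
print(len(per_row), "records when combined by row")
```

In this sketch each combined record holds an entire row in memory, which is why the approach only suits rows that are not very wide.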
