hbase-issues mailing list archives

From "Ted Malaska (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value
Date Sun, 15 Nov 2015 19:20:11 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated HBASE-14340:
--------------------------------
    Attachment: HBASE-14340.2.patch

Fixed copy paste issue.

It was my mistake.  The code was right on my laptop, but I must have made the patch while out of sync
or something.

Thanks for finding that.

> Add second bulk load option to Spark Bulk Load to send puts as the value
> ------------------------------------------------------------------------
>
>                 Key: HBASE-14340
>                 URL: https://issues.apache.org/jira/browse/HBASE-14340
>             Project: HBase
>          Issue Type: New Feature
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14340.1.patch, HBASE-14340.2.patch
>
>
> The initial bulk load option for Spark bulk load sends values over one by one through
> the shuffle.  This is similar to how the original MR bulk load worked.
> However, the MR bulk loader has more than one bulk load option.  There is a second option
> that allows all the Column Families, Qualifiers, and Values of a row to be combined on
> the map side.
> This only works if the row is not super wide.
> But if the row is not super wide, this method of sending values through the shuffle will
> reduce the data and the work the shuffle has to deal with.
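
To make the difference between the two options concrete, here is a minimal sketch in plain Python. It is not the code from the attached patch and does not use the HBase or Spark APIs. The cell tuples and the function names `cell_per_record` and `row_per_record` are hypothetical, chosen only to show how combining a row's cells before the shuffle reduces the number of records the shuffle has to move.

```python
from collections import defaultdict

# Hypothetical input cells: (row_key, column_family, qualifier, value)
cells = [
    ("row1", "cf", "a", "1"),
    ("row1", "cf", "b", "2"),
    ("row1", "cf2", "c", "3"),
    ("row2", "cf", "a", "4"),
]

def cell_per_record(cells):
    """First option: one shuffle record per cell, keyed by (row, family, qualifier)."""
    return [((row, fam, qual), val) for row, fam, qual, val in cells]

def row_per_record(cells):
    """Second option: combine all cells of a row first, one shuffle record per row."""
    rows = defaultdict(list)
    for row, fam, qual, val in cells:
        rows[row].append((fam, qual, val))
    # Each value holds the whole row, so it must fit in memory at once.
    return [(row, sorted(vals)) for row, vals in sorted(rows.items())]

thin = cell_per_record(cells)
wide = row_per_record(cells)
print(len(thin), len(wide))
```

In this sketch the row key travels once per row instead of once per cell. The cost is that the combined value grows with the number of columns in the row, which is why the approach only suits rows that are not super wide.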



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
