hbase-issues mailing list archives

From "Michael Segel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
Date Fri, 16 Jan 2015 16:00:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280440#comment-14280440 ]

Michael Segel  commented on HBASE-12853:

First, let's get away from using the term salted.
Salts do have a specific meaning, and it's associated with cryptography. While we're clearly
not talking about cryptography, the term implies that the prefix is orthogonal to the data set
and that the number of salted values is bounded by the width of the prefix.

The term bucketing would be more appropriate because, in this example, you're
assigning a prefix using a round-robin approach.
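
To make the round-robin assignment concrete, here is a minimal sketch in plain Python. It does not use any HBase API; the class name RoundRobinBucketer, the "-" separator, and the zero-based bucket numbering are all assumptions made for illustration.

```python
from itertools import count


class RoundRobinBucketer:
    """Hypothetical helper: prefixes each row key with a bucket id chosen round-robin."""

    def __init__(self, num_buckets):
        if num_buckets < 1:
            raise ValueError("num_buckets must be >= 1")
        self.num_buckets = num_buckets
        self._counter = count()  # 0, 1, 2, ... one tick per write
        # Zero-pad the bucket id so every prefix has the same width.
        self._width = len(str(num_buckets - 1))

    def bucketed_key(self, row_key):
        # The bucket depends on write order, not on the key itself.
        bucket = next(self._counter) % self.num_buckets
        return f"{bucket:0{self._width}d}-{row_key}"
```

Because the bucket comes from a counter rather than from the key, a later lookup of a single row cannot compute which bucket holds it and would have to probe each bucket in turn.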

I have to apologize, I don't play with HBase that much these days... my work is client driven.
With respect to client/server, the delineation between client and server appears
to be a bit different from what I would expect from other databases. In HBase, the client
creates a scan, and then the HMaster manages the scan and returns a pointer to the
result set?

With respect to the client side code... you're missing the point. You want to abstract the
bucketing from the client, so that the same scan will run against a bucketed table and an
un-bucketed table. The only exposed difference is that the metadata for the table will specify
the number of buckets, which defaults to 1 (no bucketing).

> distributed write pattern to replace ad hoc 'salting'
> -----------------------------------------------------
>                 Key: HBASE-12853
>                 URL: https://issues.apache.org/jira/browse/HBASE-12853
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Michael Segel 
>            Priority: Minor
> In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is that while
> 'salting' alleviated regional hot spotting, it increased the complexity required to utilize
> the data.
> Through the use of coprocessors, it should be possible to offer a method which distributes
> the data on write across the cluster and then manages reading the data, returning a sort-ordered
> result set and abstracting the underlying process.
> On table creation, a flag is set to indicate that this is a parallel table.
> On insert into the table, if the flag is set to true then a prefix is added to the key,
> e.g. <region server #>- or <region server #>||, where the region server # is an integer
> between 1 and the number of region servers defined.
> On read (scan), for each region server defined, a separate scan is created adding the
> prefix. Since each scan will be in sort order, it's possible to strip the prefix and return
> the lowest value key from each of the subsets.
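
The read path described in the issue (one sorted scan per prefix, strip the prefix, repeatedly take the lowest key) amounts to a k-way merge. The sketch below simulates it in plain Python rather than with coprocessors or any HBase API: the table is an ordinary dict, and the names scan_bucket, bucketed_scan, and SEP are hypothetical.

```python
import heapq

SEP = "-"  # assumed separator between the bucket prefix and the original key


def scan_bucket(table, bucket):
    """Yield (original_key, value) pairs for one bucket, in key order."""
    prefix = f"{bucket}{SEP}"
    # Stand-in for a prefix-bounded scan; a real scan would already be sorted.
    for stored_key in sorted(k for k in table if k.startswith(prefix)):
        yield stored_key[len(prefix):], table[stored_key]


def bucketed_scan(table, num_buckets):
    """Merge the per-bucket scans into one stream ordered by original key."""
    # Buckets are numbered 1..num_buckets, as in the issue description.
    streams = [scan_bucket(table, b) for b in range(1, num_buckets + 1)]
    # heapq.merge lazily emits the smallest head among the sorted streams.
    return heapq.merge(*streams, key=lambda kv: kv[0])
```

The merge is lazy, so only one pending row per bucket is held at a time, and the caller sees the same ordered stream whatever the number of buckets.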

This message was sent by Atlassian JIRA
