hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
Date Fri, 31 Jul 2015 16:47:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649455#comment-14649455 ]

Lars Hofhansl commented on HBASE-12853:
---------------------------------------

Most committers have well-paying jobs and won't risk leaving them either. The employer also
would be exposed to the very same risk (amplified, because there's more money to make).
I have personally had many discussions with our legal team(s) about this. So I do know what I
am talking about.

Most people fail to calculate the cost of legal risk and assume it to be infinite.

I get consulting gigs offered all the time _because_ I commit to open source (since I am employed
I cannot accept such gigs, but that's not the point here). It's all about how you set it up
with your customers. 

Sorry you feel this way. Contributing is what makes open source work. If everybody thought
like you, there would be no open source.

In any case this is not the right place to discuss this.


> distributed write pattern to replace ad hoc 'salting'
> -----------------------------------------------------
>
>                 Key: HBASE-12853
>                 URL: https://issues.apache.org/jira/browse/HBASE-12853
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Michael Segel 
>             Fix For: 2.0.0
>
>
> In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is that while
> 'salting' alleviated regional hot spotting, it increased the complexity required to utilize
> the data.
> Through the use of coprocessors, it should be possible to offer a method which distributes
> the data on write across the cluster and then manages reading the data, returning a
> sort-ordered result set and abstracting the underlying process.
> On table creation, a flag is set to indicate that this is a parallel table.
> On insert into the table, if the flag is set to true, then a prefix is added to the key,
> e.g. <region server #>- or <region server #>||, where the region server # is an integer
> between 1 and the number of region servers defined.
> On read (scan), a separate scan is created for each region server defined, adding the
> prefix. Since each scan will be in sort order, it's possible to strip the prefix and return
> the lowest-value key from each of the subsets.
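
The write-prefix and merged-read pattern in the issue description can be sketched roughly as follows. This is only an illustration in Python, with an in-memory dict standing in for an HBase table. The class name ParallelTable, the CRC32-based choice of prefix, and the zero-padded "NNN-" prefix format are all assumptions made for the example; the issue does not say how the prefix is chosen, and none of this is an HBase API.

```python
import heapq
import zlib


class ParallelTable:
    """Toy stand-in for a 'parallel table'; not an HBase API."""

    def __init__(self, num_buckets):
        # num_buckets plays the role of "the number of region servers defined".
        self.num_buckets = num_buckets
        self.rows = {}  # stored (prefixed) key -> value

    def _prefix(self, bucket):
        # Assumption: fixed-width prefix so prefixed keys sort by bucket first.
        return f"{bucket:03d}-"

    def put(self, key, value):
        # Assumption: the bucket (1..num_buckets) is derived from a hash of the key.
        bucket = zlib.crc32(key.encode("utf-8")) % self.num_buckets + 1
        self.rows[self._prefix(bucket) + key] = value

    def _scan_bucket(self, bucket):
        # One scan per prefix; each yields rows in sort order with the prefix stripped.
        prefix = self._prefix(bucket)
        for stored_key in sorted(self.rows):
            if stored_key.startswith(prefix):
                yield stored_key[len(prefix):], self.rows[stored_key]

    def scan(self):
        # Merge the per-bucket scans, always returning the lowest key next.
        scans = [self._scan_bucket(b) for b in range(1, self.num_buckets + 1)]
        return heapq.merge(*scans, key=lambda row: row[0])


table = ParallelTable(num_buckets=4)
for i in reversed(range(10)):
    table.put(f"row{i:04d}", f"value{i}")

for key, value in table.scan():
    print(key, value)
```

The caller only ever sees unprefixed keys in sorted order. The prefix exists solely in the stored keys, which is the abstraction the description asks for.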



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
