hbase-dev mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: avoiding hot spot for timestamp prefix key
Date Fri, 22 May 2015 09:43:28 GMT
This is why I created HBASE-12853. 

So you don’t have to specify a custom split policy. 

Of course the simple solutions are often passed over because of NIH.  ;-) 

To be blunt… You encapsulate the bucketing code so that you have a single API into HBase
regardless of the type of storage underneath. 
KISS is maintained and you stop people from attempting to do stupid things.   (cc’ing dev@hbase)
As a product owner (read: PMC / committers), you want to keep people from mucking about in
the internals.  While it’s true that it’s open source, and you will have some who want to muck
around, you also have to consider the corporate users who need something that is reliable
and less customized so that it’s supportable.  This is the vendor’s dilemma (hint: Cloudera,
Hortonworks, IBM, MapR).  You’re selling support for HBase, and if a customer starts to overload
internals with their own code, good luck supporting it.  This is why you do things like
HBASE-12853: it makes your life easier. 
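
For readers unfamiliar with bucketing, the idea can be sketched as a thin key-mapping layer between the application and the table. This is only an illustration under stated assumptions: the class `BucketedKey`, its method names, the one-byte prefix, and the hash choice are all hypothetical, and it is neither the HBASE-12853 design nor an HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: prepends a hash-derived bucket byte to each row key. */
final class BucketedKey {
    private final int numBuckets;

    BucketedKey(int numBuckets) {
        // One prefix byte is assumed, so at most 256 buckets are possible.
        if (numBuckets < 1 || numBuckets > 256) {
            throw new IllegalArgumentException("numBuckets must be in 1..256");
        }
        this.numBuckets = numBuckets;
    }

    /** The bucket depends only on the logical key, so point gets can recompute it. */
    int bucketOf(byte[] logicalKey) {
        return Math.floorMod(Arrays.hashCode(logicalKey), numBuckets);
    }

    /** Stored key = bucket byte followed by the unchanged logical key. */
    byte[] toStoredKey(byte[] logicalKey) {
        byte[] stored = new byte[logicalKey.length + 1];
        stored[0] = (byte) bucketOf(logicalKey);
        System.arraycopy(logicalKey, 0, stored, 1, logicalKey.length);
        return stored;
    }

    /** Strips the bucket byte so callers only ever see logical keys. */
    byte[] toLogicalKey(byte[] storedKey) {
        return Arrays.copyOfRange(storedKey, 1, storedKey.length);
    }

    /** A range scan has to fan out: one start key per bucket. */
    List<byte[]> scanStartKeys(byte[] logicalStart) {
        List<byte[]> starts = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            byte[] k = new byte[logicalStart.length + 1];
            k[0] = (byte) b;
            System.arraycopy(logicalStart, 0, k, 1, logicalStart.length);
            starts.add(k);
        }
        return starts;
    }
}
```

With a timestamp#guid key, consecutive timestamps hash to different buckets, so writes spread across as many key ranges as there are buckets. The cost is that a time-range scan becomes one scan per bucket whose results must be merged, which is the kind of detail a single encapsulating API would hide from callers.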

This isn’t a sexy solution. It’s core engineering work. 

HTH

-Mike

> On May 22, 2015, at 4:22 AM, Shushant Arora <shushantarora09@gmail.com> wrote:
> 
> Since the custom split policy is based on the second part, i.e. the guid, how
> will it be identified which region holds a key whose first part is
> 2015-05-22 00:01:02?
> 
> 
> On Fri, May 22, 2015 at 1:12 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> 
>> The custom split policy needs to respect the fact that timestamp is the
>> leading part of the rowkey.
>> 
>> This would avoid the overlap you mentioned.
>> 
>> Cheers
>> 
>> 
>> 
>>> On May 21, 2015, at 11:55 PM, Shushant Arora <shushantarora09@gmail.com> wrote:
>>> 
>>> The guid changes with every key. The pattern is:
>>> 2015-05-22 00:02:01#AB12EC77778888945
>>> 2015-05-22 00:02:02#CD9870001234AB457
>>> 
>>> When we specify a custom split algorithm, can it happen that keys in the same
>>> sort-order range, say (1-7), lie in region R1 as well as in region R2?
>>> Then how will the .META. table handle lookups at read time? Say I search
>>> for key 3: will it search in both regions R1 and R2?
>>> 
>>>> On Fri, May 22, 2015 at 10:48 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> 
>>>> Does guid change with every key ?
>>>> 
>>>> bq. use second part of key
>>>> 
>>>> I don't think so. Suppose first row in the parent region is
>>>> '1432104178817#321'. After split, the first row in first daughter region
>>>> would still be '1432104178817#321'. Right ?
>>>> 
>>>> Cheers
>>>> 
>>>> On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <shushantarora09@gmail.com>
>>>> wrote:
>>>> 
>>>>> Can I avoid hotspotting a region with a custom region split policy in HBase
>>>>> 0.96?
>>>>> 
>>>>> The key is of the form timestamp#guid.
>>>>> Can I have a custom region split policy that uses the second part of the key
>>>>> (i.e. the guid) as the split criterion, and so avoid the hot spot?
>>>> 
>> 

