hbase-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Region Splits
Date Wed, 23 Nov 2011 01:47:19 GMT
Ok so this would be "short scans"?

In my use case this would be unnecessary, so I think I'm going to run with
the reversed id technique. I'm actually surprised I've never heard of
anyone using this over non-predictable hashing.
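
To make the trade-off concrete, here is a minimal sketch in plain Python (no HBase client involved) comparing the reversed id with a predictable hash prefix. The names NUM_BUCKETS, ID_WIDTH, reversed_key, salted_key and salted_range_scan are all hypothetical, crc32 is only a stand-in for whatever predictable hash one might pick, and a sorted list stands in for a table ordered by row key.

```python
import bisect
import heapq
import zlib

NUM_BUCKETS = 4   # hypothetical number of salt prefixes
ID_WIDTH = 10     # hypothetical zero-padded width, keeps ids sortable as strings


def reversed_key(seq_id):
    """Reverse the decimal digits of a sequential id, e.g. 510911 -> '119015'."""
    return str(seq_id)[::-1]


def salted_key(seq_id):
    """Prefix the padded id with a bucket that can be recomputed from the id."""
    padded = str(seq_id).zfill(ID_WIDTH)
    bucket = zlib.crc32(padded.encode("ascii")) % NUM_BUCKETS
    return "%02d-%s" % (bucket, padded)


def salted_range_scan(sorted_keys, start_id, stop_id):
    """Scan ids in [start_id, stop_id): one short scan per prefix, merged client-side."""
    start = str(start_id).zfill(ID_WIDTH)
    stop = str(stop_id).zfill(ID_WIDTH)
    per_bucket = []
    for bucket in range(NUM_BUCKETS):
        lo = bisect.bisect_left(sorted_keys, "%02d-%s" % (bucket, start))
        hi = bisect.bisect_left(sorted_keys, "%02d-%s" % (bucket, stop))
        per_bucket.append(sorted_keys[lo:hi])
    # Each slice is already ordered by id, so a k-way merge restores id order.
    return list(heapq.merge(*per_bucket, key=lambda k: k.split("-", 1)[1]))


ids = range(510900, 510920)
table = sorted(salted_key(i) for i in ids)   # stand-in for rows sorted by row key
print([reversed_key(i) for i in range(510911, 510915)])
print(salted_range_scan(table, 510911, 510915))
```

With reversed keys, adjacent ids end up far apart in key order, so a small id range has no contiguous key range to scan. With a predictable prefix, the same range costs one short scan per bucket plus a merge at the client.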

On 11/22/11 5:35 PM, Sam Seigal wrote:
> If you are prefixing your keys with predictable hashes, you can do
> range scans - i.e. create a scanner for each prefix and then merge
> results at the client. With unpredictable hashes and key reversals,
> this might not be entirely possible.
>
> I remember someone on the mailing list mentioning that Mozilla Socorro
> uses a similar technique. I haven't had a chance to look at their code
> yet, but that is something you might want to look at.
>
> On Tue, Nov 22, 2011 at 5:11 PM, Mark <static.void.dev@gmail.com> wrote:
>> What do you mean by "short scans"?
>>
>> I understand that scans will not be possible with this method but neither
>> would they be if I hashed them so it seems like I'm in the same boat anyway.
>>
>> On 11/22/11 5:00 PM, Amandeep Khurana wrote:
>>> Mark
>>>
>>> Key designs depend on expected access patterns and use cases. From a
>>> theoretical standpoint, what you are saying will work to distribute
>>> writes but if you want to access a small range, you'll need to fan out
>>> your reads and can't leverage short scans.
>>>
>>> Amandeep
>>>
>>> On Nov 22, 2011, at 4:55 PM, Mark <static.void.dev@gmail.com> wrote:
>>>
>>>> I just thought of something.
>>>>
>>>> In cases where the id is sequential couldn't one simply reverse the id to
>>>> get more of a uniform distribution?
>>>>
>>>> 510911 => 119015
>>>> 510912 => 219015
>>>> 510913 => 319015
>>>> 510914 => 419015
>>>>
>>>> That seems like a reasonable alternative that doesn't require prefixing
>>>> each row key with an extra 16 bytes. Am I wrong in thinking this could work?
>>>>
>>>>
>>>> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>>>>> If you increase the region size to 2GB, then all regions (current and
>>>>> new) will avoid a split until their aggregate StoreFile size reaches
>>>>> that limit.  Reorganizing the regions for a uniform growth pattern is
>>>>> really a schema design problem.  There is the capability to merge two
>>>>> adjacent regions if you know that your data growth pattern is
>>>>> non-uniform.  StumbleUpon & other companies have more experience with
>>>>> those utilities than I do.
>>>>>
>>>>> Note: With the introduction of HFileV2 in 0.92, you'll definitely want
>>>>> to lean towards increasing the region size.  HFile scalability code is
>>>>> more mature/stable than the region splitting code.  Plus, automatic
>>>>> region splitting is harder to optimize & debug when failures occur.
>>>>>
>>>>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>>>>> <Srikanth_Shreenivas@mindtree.com> wrote:
>>>>>
>>>>>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>>>>>
>>>>>> What will happen if we increased the region size, say from the
>>>>>> current value of 256 MB to a new value of 2GB?
>>>>>> Will existing regions continue to use only 256 MB space?
>>>>>>
>>>>>> Is there a way to reorganize the regions so that each region grows to
>>>>>> 2GB size?
>>>>>>
>>>>>> Thanks,
>>>>>> Srikanth
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Nicolas Spiegelberg [mailto:nspiegelberg@fb.com]
>>>>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>>>>> To: user@hbase.apache.org
>>>>>> Subject: Re: Region Splits
>>>>>>
>>>>>> No.  The purpose of major compactions is to merge & dedupe within a
>>>>>> region boundary.  Compactions will not alter region boundaries, except
>>>>>> in the case of splits where a compaction is necessary to filter out
>>>>>> any rows from the parent region that are no longer applicable to the
>>>>>> daughter region.
>>>>>>
>>>>>> On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
>>>>>> <Srikanth_Shreenivas@mindtree.com> wrote:
>>>>>>
>>>>>>> Will major compactions take care of merging "older" regions or adding
>>>>>>> more key/values to them as the number of regions grows?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Srikanth
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Amandeep Khurana [mailto:amansk@gmail.com]
>>>>>>> Sent: Monday, November 21, 2011 7:25 AM
>>>>>>> To: user@hbase.apache.org
>>>>>>> Subject: Re: Region Splits
>>>>>>>
>>>>>>> Mark,
>>>>>>>
>>>>>>> Yes, your understanding is correct. If your keys are sequential
>>>>>>> (timestamps etc), you will always be writing to the end of the table
>>>>>>> and "older" regions will not get any writes. This is one of the
>>>>>>> arguments against using sequential keys.
>>>>>>>
>>>>>>> -ak
>>>>>>>
>>>>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark <static.void.dev@gmail.com> wrote:
>>>>>>>
>>>>>>>> Say we have a use case that has sequential row keys and we have rows
>>>>>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is
>>>>>>>> a split it will split at the halfway mark so there will be two
>>>>>>>> regions as follows:
>>>>>>>>
>>>>>>>> Region1 [START-49]
>>>>>>>> Region2 [50-END]
>>>>>>>>
>>>>>>>> So now at this point all inserts will be writing to Region2 only,
>>>>>>>> correct? Now at some point Region2 will need to split and it will
>>>>>>>> look like the following before the split:
>>>>>>>>
>>>>>>>> Region1 [START-49]
>>>>>>>> Region2 [50-150]
>>>>>>>>
>>>>>>>> After the split it will look like:
>>>>>>>>
>>>>>>>> Region1 [START-49]
>>>>>>>> Region2 [50-99]
>>>>>>>> Region3 [100-END]
>>>>>>>>
>>>>>>>> And this pattern will continue, correct? My question is: when there
>>>>>>>> is a use case that has sequential keys, how would any of the older
>>>>>>>> regions ever receive any more writes? It seems like they would always
>>>>>>>> be stuck at MaxRegionSize/2. Can someone please confirm or clarify
>>>>>>>> this issue?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
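
The split pattern described in the original question can be reproduced with a toy model. This is only a sketch under simplifying assumptions: region size is counted in rows rather than StoreFile bytes, a split always cuts exactly at the midpoint, and the function name simulate_sequential_writes is hypothetical. It is not how HBase itself decides when or where to split.

```python
def simulate_sequential_writes(num_rows, split_size):
    """Return per-region row counts after inserting num_rows sequential keys."""
    regions = [0]  # row counts in key order; the table starts with one region
    for _ in range(num_rows):
        regions[-1] += 1  # a sequential key always sorts into the last region
        if regions[-1] >= split_size:
            full = regions.pop()
            # Midpoint split: the first daughter never receives writes again.
            regions.extend([full // 2, full - full // 2])
    return regions


print(simulate_sequential_writes(330, 100))
```

In this model every region except the last stays at half the split size, which matches the MaxRegionSize/2 observation in the question and Amandeep's confirmation.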
