directory-dev mailing list archives

From Alex Karasulu <akaras...@apache.org>
Subject Re: HBase partition integration in trunks ?
Date Tue, 16 Aug 2011 13:43:41 GMT
On Tue, Aug 16, 2011 at 4:38 PM, Emmanuel Lécharny <elecharny@apache.org> wrote:

> On 8/16/11 3:27 PM, Alex Karasulu wrote:
>
>> On Tue, Aug 16, 2011 at 10:53 AM, Emmanuel Lécharny <elecharny@apache.org>
>> wrote:
>>
>>  On 8/15/11 5:59 PM, Stefan Seelmann wrote:
>>>
>>>  Now I have to update the parts that are a bit special, let me explain:
>>>> In HBase partition I didn't use one-level and sub-level indices, but
>>>> use the RDN index table instead. I also extended the search engine in
>>>> that way that one-level and sub-level cursors get the search filter in
>>>> order to perform filtering within the store instead of returning all
>>>> candidates and evaluate them.
>>>>
>>>>  Some thoughts about this one-level/sub-level index.
>>>
>>> Using the Rdn index makes perfect sense : we have the Rdn -> parent
>>> relation plus the parent -> children relation in this index, so there is
>>> no need to have a one level index (all the children are already listed in
>>> the RDN index for a specific entry). I'm a bit more concerned about the
>>> sub-level processing : we have to recurse on all the children to get all
>>> the candidates. That's fine, we can easily implement that (and you already
>>> did), but what concerns me is that we don't have the count of all the
>>> entries, so we will have to compute it. This count is necessary in the
>>> search engine to select the index we will use to walk the entries.
>>>
>>> One solution would be to store two more elements in the ParentIdAndRdn
>>> data structure : the number of children directly below the RDN, and the
>>> number of children and descendants. That would probably solve the issue
>>> I'm mentioning. Of course, that also means we will have to update all the
>>> RDN hierarchy from top to bottom (but affecting only the RDN part of the
>>> entry DN) each time we add/move/delete an entry. Note that we already do
>>> that for the oneLevel and Sublevel index.
>>>
>>>
>>>  Good idea Emmanuel.
>>
>
> Note that I just rephrased Stefan's idea here. It's not mine initially.
>
>
>> This would be a neat solution to handling the sub level count problem.
>> Let's experiment with this and see if it does in fact lead to a speedup,
>> which I think it should, but it's good to just see. I wish we had a nice
>> lab for this.
>>
> HBase work done by Stefan is already an excellent lab :)
>
>
Yes it is a very good exercise which shows the interfaces and design are
holding up pretty well if these relatively minor issues are all we have to
worry about.

But what I meant was a lab where machines are ready to run tests on our
nightly builds, not just the experience of writing this partition :-). It
would be neat to see the progression with these changes over time.
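
To make the counter idea from the quoted discussion concrete, here is a rough
Java sketch of what storing the two counts next to each RDN index entry might
look like. Everything in it is an assumption for illustration: the class
RdnCountSketch, its Node type, and the field names nbChildren and
nbDescendants are hypothetical and are not the actual ParentIdAndRdn class or
any existing ApacheDS API. It uses an in-memory map instead of a real index
table, handles only add and leaf delete (no move), and walks from the changed
entry up to the root to adjust the counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an RDN index that also keeps counts.
final class RdnCountSketch {

    // One index entry: parent id, RDN, plus the two proposed counters.
    static final class Node {
        final Long parentId;   // null for the context entry (root)
        final String rdn;
        int nbChildren;        // entries directly below this one
        int nbDescendants;     // children plus all deeper descendants

        Node(Long parentId, String rdn) {
            this.parentId = parentId;
            this.rdn = rdn;
        }
    }

    private final Map<Long, Node> index = new HashMap<>();

    // Adds an entry and increments the counters of its ancestors.
    void add(long id, Long parentId, String rdn) {
        if (index.containsKey(id)) {
            throw new IllegalArgumentException("duplicate id " + id);
        }
        if (parentId != null && !index.containsKey(parentId)) {
            throw new IllegalArgumentException("unknown parent " + parentId);
        }
        index.put(id, new Node(parentId, rdn));
        adjustAncestors(parentId, 1);
    }

    // Deletes a leaf entry and decrements the counters of its ancestors.
    void delete(long id) {
        Node node = index.get(id);
        if (node == null) {
            throw new IllegalArgumentException("unknown id " + id);
        }
        if (node.nbChildren != 0) {
            throw new IllegalStateException("only leaf entries can be deleted");
        }
        index.remove(id);
        adjustAncestors(node.parentId, -1);
    }

    // The direct parent gets both counters adjusted; every ancestor above it
    // only gets its descendant counter adjusted.
    private void adjustAncestors(Long parentId, int delta) {
        Node current = (parentId == null) ? null : index.get(parentId);
        if (current != null) {
            current.nbChildren += delta;
        }
        while (current != null) {
            current.nbDescendants += delta;
            current = (current.parentId == null) ? null : index.get(current.parentId);
        }
    }

    // Count a one-level scope search below 'id' would have to walk.
    int oneLevelCount(long id) {
        return index.get(id).nbChildren;
    }

    // Count a sub-level scope search below 'id' would have to walk.
    int subLevelCount(long id) {
        return index.get(id).nbDescendants;
    }
}
```

With counts kept this way, the search engine could read the one-level or
sub-level candidate count in a single lookup and compare it against the counts
of the other indices before choosing which one to walk. The cost is one update
per ancestor on every add or delete, which is the trade-off mentioned in the
quoted text.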

-- 
Best Regards,
-- Alex
