directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lécharny <elecha...@apache.org>
Subject Re: HBase partition integration in trunks ?
Date Tue, 16 Aug 2011 13:33:29 GMT
On 8/16/11 3:14 PM, Kiran Ayyagari wrote:
> On Tue, Aug 16, 2011 at 6:27 PM, Emmanuel Lecharny<elecharny@gmail.com>  wrote:
>> On 8/16/11 2:22 PM, Kiran Ayyagari wrote:
>>> On Tue, Aug 16, 2011 at 1:23 PM, Emmanuel Lécharny<elecharny@apache.org>
>>>   wrote:
>>>> On 8/15/11 5:59 PM, Stefan Seelmann wrote:
>>>>> Now I have to update the parts that are a bit special, let me explain:
>>>>> In HBase partition I didn't use one-level and sub-level indices, but
>>>>> use the RDN index table instead. I also extended the search engine in
>>>>> that way that one-level and sub-level cursors get the search filter in
>>>>> order to perform filtering within the store instead of returning all
>>>>> candidates and evaluate them.
>>>> Some toughts about this one-level/sub-level index.
>>>>
>>>> Using the Rdn index makes perfect sense : we have the Rdn ->    parent
>>>> relation
>>>> plus the parent ->    children relation in this index, so there is no
need
>>>> to
>>>> have a one level index (all the children are already listed in the RDN
>>>> index
>>>> for a specific entry). I'm a bit more concerned about the sub-level
>>>> processing : we have to recurse on all the children to get all the
>>>> candidates. That's fine, we can easily implement that (and you already
>>>> did),
>>>> but what concerns me is that we don't have the count of all the entries,
>>>> we
>>>> will have to compute them. This count is necessary in the search engine
>>>> to
>>>> select the index we will use to walk the entries.
>>>>
>>>> One solution would be to store two more elements in the ParentIdAndRdn
>>>> data
>>>> structure : the number of children directly below the RDN, and the number
>>>> of
>>>> children and descendant. That would probably solve the issue I'm
>>>> mentioning.
>>>> Of course, that also means we wil have to update all the RDN hierarchy
>>>> from
>>>> top to bottom (but affecting only the RDN part of the entry DN) each time
>>>> we
>>>> add/move/delete an entry. Note that we already do that for the oneLevel
>>>> and
>>>> Sublevel index.
>>>>
>>> Just to make a point:
>>> I think, in the case of achieving SubLevel index evaluation with RDN
>>> index it becomes a costly and complex operation
>>> (recursive scanning and updating) where as with the current sublevel
>>> index it takes O(1) to fetch all the sublevel children of
>>> an entry.
>>> Not sure if HBase has any features to solve in an efficient manner
>> If you think about what is done when we add an entry in the current code
>> base :
>> - add the entry in the MasterTable
>> - add the RDN/parent into the RdnIndex
>> - update the one-level index with the newly added entry reference,
>> increasing the number of children for the parent
>> - for each RDN in the parent, update the sub-level index with the newly
>> added entry reference, increasing the number of children for the parent
>>
>> If we compare what we would do if we remove the one-level/sub-level index :
>> - add the entry in the MasterTable
>> - add the RDN/parent into the RdnIndex
>> - for each RDN in the parent, update the rdn index with the newly added
>> entry reference, increasing the number of children for the parent
>>
> this count needs to be updated in more than one ParentAndRdn entry(for
> sublevel indexing feature only) depending on the level
> at which the child entry was added(cause each entry in the ancestor
> chain is a valid base to perform sublevel search)
> and also when it comes to fetch all the sublevel entries we need to
> scan recursively.
Right, and this is why I added a 'for each RDN ' in the last operation. 
Btw, this is exactly the same kind of cost we already have to pay to 
update the sub-level index.


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Mime
View raw message