directory-dev mailing list archives

From Stefan Seelmann <seelm...@apache.org>
Subject Re: HBase partition integration in trunks ?
Date Tue, 16 Aug 2011 13:14:05 GMT
On Tue, Aug 16, 2011 at 9:53 AM, Emmanuel Lécharny <elecharny@apache.org> wrote:
> On 8/15/11 5:59 PM, Stefan Seelmann wrote:
>>
>> Now I have to update the parts that are a bit special; let me explain:
>> In the HBase partition I didn't use the one-level and sub-level indices,
>> but used the RDN index table instead. I also extended the search engine
>> so that the one-level and sub-level cursors get the search filter, in
>> order to perform the filtering within the store instead of returning all
>> candidates and evaluating them.
>
> Some thoughts about this one-level/sub-level index.
>
> Using the Rdn index makes perfect sense: we have the Rdn -> parent relation
> plus the parent -> children relation in this index, so there is no need to
> have a one-level index (all the children of a specific entry are already
> listed in the RDN index). I'm a bit more concerned about the sub-level
> processing: we have to recurse on all the children to get all the
> candidates. That's fine, we can easily implement that (and you already did),
> but what concerns me is that we don't have the count of all the entries in
> the subtree; we will have to compute it. This count is necessary in the
> search engine to select the index we will use to walk the entries.
>
> One solution would be to store two more elements in the ParentIdAndRdn data
> structure: the number of children directly below the RDN, and the number of
> children and descendants. That would probably solve the issue I'm mentioning.

Yes, that is exactly what I did for the HBase partition. I also made
some changes in the xdbm-partition code. There are two counters that
track the one-level and sub-level children counts. I'll create a branch
and commit what I have tonight.
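To make the counter idea concrete, here is a minimal in-memory sketch in Java (the language ApacheDS is written in). It assumes a simplified RDN index keyed by entry ID; the names `ParentIdAndRdn`, `oneLevelCount`, `subLevelCount`, `RdnIndexSketch`, `add` and `countDescendantsRecursively` are hypothetical illustrations and do not reflect the actual xdbm-partition API. Adding an entry bumps the one-level counter of its parent and the sub-level counter of every ancestor up to the root, so a scope count can be read directly instead of recursing over the subtree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical RDN index record carrying the two counters.
final class ParentIdAndRdn {
    final long parentId;
    final String rdn;
    long oneLevelCount; // direct children only
    long subLevelCount; // children plus all deeper descendants

    ParentIdAndRdn(long parentId, String rdn) {
        this.parentId = parentId;
        this.rdn = rdn;
    }
}

// Hypothetical in-memory stand-in for the RDN index.
final class RdnIndexSketch {
    static final long ROOT_ID = 0L;
    static final long NO_PARENT = -1L;
    final Map<Long, ParentIdAndRdn> byId = new HashMap<>();
    final Map<Long, List<Long>> childrenOf = new HashMap<>(); // parent -> children

    RdnIndexSketch() {
        byId.put(ROOT_ID, new ParentIdAndRdn(NO_PARENT, ""));
    }

    // Assumes the parent already exists in the index.
    void add(long id, long parentId, String rdn) {
        byId.put(id, new ParentIdAndRdn(parentId, rdn));
        childrenOf.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
        // The direct parent gains one child...
        byId.get(parentId).oneLevelCount++;
        // ...and every ancestor up to the root gains one descendant.
        for (long cur = parentId; cur != NO_PARENT; cur = byId.get(cur).parentId) {
            byId.get(cur).subLevelCount++;
        }
    }

    // What the search engine would have to do without stored counters.
    long countDescendantsRecursively(long id) {
        long count = 0;
        for (long child : childrenOf.getOrDefault(id, List.of())) {
            count += 1 + countDescendantsRecursively(child);
        }
        return count;
    }
}

final class RdnCounterDemo {
    public static void main(String[] args) {
        RdnIndexSketch idx = new RdnIndexSketch();
        idx.add(1, RdnIndexSketch.ROOT_ID, "dc=example");
        idx.add(2, 1, "ou=people");
        idx.add(3, 2, "uid=alice");
        idx.add(4, 2, "uid=bob");
        // Stored counter and recursive walk agree: prints "3 3".
        System.out.println(idx.byId.get(1L).subLevelCount + " " + idx.countDescendantsRecursively(1));
    }
}
```

The recursive method is kept only to show what the counters replace; in this sketch its cost grows with the subtree size, while reading `subLevelCount` is a single lookup.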

> Of course, that also means we will have to update the whole RDN hierarchy
> from top to bottom (but affecting only the RDN part of the entry DN) each
> time we add/move/delete an entry. Note that we already do that for the
> oneLevel and subLevel indexes.

Yes. Only the ParentIdAndRdn objects in the RDN index need to be
updated, from the modified entry up to the root. If we expect a flat
tree, only a few updates are necessary. I also think that those
"branches" are hot spots, i.e. they are accessed often, and should be
cached.
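The cost argument can be sketched as follows, under hypothetical names (`CounterNode`, `CounterPath`, `adjustAncestors`, `addLeaf`, `removeLeaf`) and with a plain map standing in for the RDN index. Add and delete share one walk that applies a +1 or -1 delta from the parent up to the root, and the walk returns how many records it rewrote, which equals the depth of the entry. Caching of the hot upper records is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical counter-carrying RDN index record.
final class CounterNode {
    final long parentId;
    long oneLevelCount;
    long subLevelCount;

    CounterNode(long parentId) {
        this.parentId = parentId;
    }
}

final class CounterPath {
    static final long NO_PARENT = -1L;
    final Map<Long, CounterNode> rdnIndex = new HashMap<>();

    CounterPath() {
        rdnIndex.put(0L, new CounterNode(NO_PARENT)); // root entry
    }

    // Applies delta to the parent's one-level counter and to the sub-level
    // counter of every ancestor; returns how many records were rewritten.
    int adjustAncestors(long parentId, long delta) {
        rdnIndex.get(parentId).oneLevelCount += delta;
        int touched = 0;
        for (long cur = parentId; cur != NO_PARENT; cur = rdnIndex.get(cur).parentId) {
            rdnIndex.get(cur).subLevelCount += delta;
            touched++;
        }
        return touched;
    }

    int addLeaf(long id, long parentId) {
        rdnIndex.put(id, new CounterNode(parentId));
        return adjustAncestors(parentId, +1);
    }

    // Assumes the entry has no children of its own.
    int removeLeaf(long id) {
        CounterNode removed = rdnIndex.remove(id);
        return adjustAncestors(removed.parentId, -1);
    }
}

final class CounterPathDemo {
    public static void main(String[] args) {
        CounterPath p = new CounterPath();
        p.addLeaf(1, 0);
        p.addLeaf(2, 1);
        System.out.println(p.addLeaf(3, 2)); // 3: entries 2, 1 and the root
        System.out.println(p.addLeaf(4, 0)); // 1: only the root
        System.out.println(p.removeLeaf(3)); // 3 again on the way out
    }
}
```

In a flat tree most entries sit one or two levels below the root, so each add or delete rewrites only one or two records, and those same upper records are the natural candidates for caching.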

A special case is the rename/move operation. In that case we can't
just drop and re-add the RDN index entry, because then we would lose
the counters. Instead, the counters must be copied to the new RDN
index entry.
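A rename or move can be sketched the same way, again under hypothetical names (`RdnRecord`, `MoveSketch`, `move`). The old RDN index record is replaced by a new one carrying the new parent and RDN, and the moved entry's own counters are copied over so they are not reset. Because the whole subtree moves, this sketch also assumes the old ancestors lose `1 + subLevelCount` descendants and the new ancestors gain the same amount; that ancestor adjustment is an inference, not something spelled out above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical RDN index record: key part (parent, rdn) plus counters.
final class RdnRecord {
    final long parentId;
    final String rdn;
    long oneLevelCount;
    long subLevelCount;

    RdnRecord(long parentId, String rdn) {
        this.parentId = parentId;
        this.rdn = rdn;
    }
}

final class MoveSketch {
    static final long NO_PARENT = -1L;
    final Map<Long, RdnRecord> rdnIndex = new HashMap<>();

    MoveSketch() {
        rdnIndex.put(0L, new RdnRecord(NO_PARENT, "")); // root entry
    }

    // Adjusts the parent's one-level counter and every ancestor's sub-level counter.
    private void adjust(long parentId, long oneDelta, long subDelta) {
        rdnIndex.get(parentId).oneLevelCount += oneDelta;
        for (long cur = parentId; cur != NO_PARENT; cur = rdnIndex.get(cur).parentId) {
            rdnIndex.get(cur).subLevelCount += subDelta;
        }
    }

    void add(long id, long parentId, String rdn) {
        rdnIndex.put(id, new RdnRecord(parentId, rdn));
        adjust(parentId, 1, 1);
    }

    // Rename and/or move: replace the record but keep the entry's own counters.
    // Assumes newParentId is not inside the moved subtree.
    void move(long id, long newParentId, String newRdn) {
        RdnRecord old = rdnIndex.get(id);
        long movedEntries = 1 + old.subLevelCount; // the entry plus its subtree
        adjust(old.parentId, -1, -movedEntries);

        RdnRecord moved = new RdnRecord(newParentId, newRdn);
        moved.oneLevelCount = old.oneLevelCount; // copy, do not reset
        moved.subLevelCount = old.subLevelCount;
        rdnIndex.put(id, moved);

        adjust(newParentId, 1, movedEntries);
    }
}

final class MoveDemo {
    public static void main(String[] args) {
        MoveSketch m = new MoveSketch();
        m.add(1, 0, "ou=a");
        m.add(2, 0, "ou=b");
        m.add(3, 1, "ou=team");
        m.add(4, 3, "uid=alice");
        m.add(5, 3, "uid=bob");
        m.move(3, 2, "ou=team"); // move ou=team and its two children under ou=b
        System.out.println(m.rdnIndex.get(3L).subLevelCount); // 2: counters survived
        System.out.println(m.rdnIndex.get(2L).subLevelCount); // 3: ou=team plus two children
    }
}
```

A pure rename under the same parent subtracts and re-adds the same amount along the same path, so ancestor counts end up unchanged while the entry's own counters are preserved.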

> All in all, I do think this is feasible, and you have probably already
> implemented such logic in the HBase partition.
>
> Can you tell me if what I wrote above makes sense not only for HBase but
> also for the whole system?

Yes, I think it makes sense for all kinds of partitions.

> If we could get rid of the one-level/sub-level indexes, we would speed up
> the add/move/delete operations greatly (as we would spare two index
> updates), probably saving 25% of the time needed to update the backend (we
> would have just 5 indexes to update instead of 7). It might also speed up
> searches marginally, as we won't have to do a look-up in the one-level or
> sub-level index to build the scope filter.
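As a rough check of the 25% figure, assume (a simplification not stated in the thread) that every index update costs about the same and that index updates dominate the backend write time. Dropping 2 of the 7 index updates then saves

```latex
\frac{7 - 5}{7} = \frac{2}{7} \approx 0.29
```

of that time, so "probably 25%" is a reasonable, slightly conservative estimate once fixed per-operation costs outside the index updates are counted.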

Kind Regards,
Stefan
