On Tue, Aug 16, 2011 at 4:38 PM, Emmanuel Lécharny <elecharny@apache.org> wrote:
On 8/16/11 3:27 PM, Alex Karasulu wrote:
On Tue, Aug 16, 2011 at 10:53 AM, Emmanuel Lécharny <elecharny@apache.org> wrote:

On 8/15/11 5:59 PM, Stefan Seelmann wrote:

Now I have to update the parts that are a bit special; let me explain:
in the HBase partition I didn't use the one-level and sub-level indices,
but used the RDN index table instead. I also extended the search engine
so that the one-level and sub-level cursors receive the search filter, in
order to perform filtering within the store instead of returning all
candidates and evaluating them afterwards.
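To make the filtering change concrete, here is a minimal sketch of a one-level search that walks the parent -> children relation of an RDN index and evaluates the filter inside the store, returning only matching ids. Everything in it is hypothetical (OneLevelSketch, a HashMap standing in for the RDN index table, a Predicate standing in for the LDAP filter); it is not the HBase partition's or ApacheDS's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified model; not the ApacheDS or HBase partition API.
class OneLevelSketch {
    // parentId -> ids of direct children, as the RDN index would list them
    final Map<Long, List<Long>> children = new HashMap<>();
    // id -> entry attributes (a flat map, for illustration only)
    final Map<Long, Map<String, String>> entries = new HashMap<>();

    void add(long id, long parentId, Map<String, String> attrs) {
        entries.put(id, attrs);
        children.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
    }

    // One-level search: the filter is evaluated while walking the children,
    // so non-matching candidates never leave the store.
    List<Long> oneLevel(long baseId, Predicate<Map<String, String>> filter) {
        List<Long> result = new ArrayList<>();
        for (long childId : children.getOrDefault(baseId, List.of())) {
            if (filter.test(entries.get(childId))) {
                result.add(childId);
            }
        }
        return result;
    }
}
```

For example, with two children under entry 1, only the one whose `ou` is `people` comes back from `oneLevel(1L, e -> "people".equals(e.get("ou")))`.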

Some thoughts about this one-level/sub-level index.

Using the Rdn index makes perfect sense: we have the Rdn -> parent
relation plus the parent -> children relation in this index, so there is no
need for a one-level index (all the children of a given entry are already
listed in the RDN index). I'm a bit more concerned about the sub-level
processing: we have to recurse over all the children to get all the
candidates. That's fine, we can easily implement that (and you already did),
but what concerns me is that we don't have the count of all the entries; we
will have to compute it. This count is necessary in the search engine to
select the index we will use to walk the entries.
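A minimal sketch of the sub-level walk described above, assuming a hypothetical in-memory map for the parent -> children relation (SubLevelSketch is not an ApacheDS class). It shows why the count is the expensive part: without a stored value, the only way to know how many candidates a sub-level scope yields is to visit the whole subtree. The base entry is included, as in an LDAP subtree scope.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the parent -> children relation held in the RDN index.
class SubLevelSketch {
    final Map<Long, List<Long>> children = new HashMap<>();

    void add(long id, long parentId) {
        children.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
    }

    // Sub-level candidates: the base entry plus all of its descendants,
    // found by walking the children relation with an explicit stack.
    List<Long> subLevel(long baseId) {
        List<Long> result = new ArrayList<>();
        Deque<Long> toVisit = new ArrayDeque<>();
        toVisit.push(baseId);
        while (!toVisit.isEmpty()) {
            long id = toVisit.pop();
            result.add(id);
            for (long child : children.getOrDefault(id, List.of())) {
                toVisit.push(child);
            }
        }
        return result;
    }

    // Without a stored count, counting means walking the whole subtree.
    int subLevelCount(long baseId) {
        return subLevel(baseId).size();
    }
}
```

An explicit stack is used instead of recursion so that deep trees don't overflow the call stack; either way the cost is proportional to the size of the subtree.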

One solution would be to store two more elements in the ParentIdAndRdn data
structure: the number of children directly below the RDN, and the total
number of descendants (children and their own descendants). That would
probably solve the issue I'm mentioning. Of course, that also means we will
have to update the whole RDN hierarchy from top to bottom (but affecting only
the RDN part of the entry DN) each time we add/move/delete an entry. Note that
we already do that for the oneLevel and subLevel indices.
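A sketch of the proposed counters, under clearly simplified assumptions: ParentIdAndRdnSketch and CountingRdnIndex are hypothetical names, entries are identified by long ids with 0 as the partition root, and the real ParentIdAndRdn class may look quite different. On add, delete and move, only the direct parent gets its child count changed, while the descendant count is adjusted on every ancestor along the entry's DN, which matches the "affecting only the RDN part of the entry DN" remark.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ParentIdAndRdn carrying the two proposed counters.
class ParentIdAndRdnSketch {
    long parentId;
    final String rdn;
    int nbChildren;     // entries directly below this one
    int nbDescendants;  // all entries below this one, at any depth

    ParentIdAndRdnSketch(long parentId, String rdn) {
        this.parentId = parentId;
        this.rdn = rdn;
    }
}

class CountingRdnIndex {
    static final long ROOT = 0L;  // hypothetical id of the root; it has no node
    final Map<Long, ParentIdAndRdnSketch> nodes = new HashMap<>();

    // Assumes the parent has already been added.
    void add(long id, long parentId, String rdn) {
        nodes.put(id, new ParentIdAndRdnSketch(parentId, rdn));
        adjustAncestors(parentId, 1, 1);
    }

    // Deleting a leaf simply undoes what add() did.
    void deleteLeaf(long id) {
        ParentIdAndRdnSketch node = nodes.remove(id);
        adjustAncestors(node.parentId, -1, -1);
    }

    // Moves a whole subtree (1 + nbDescendants entries) under a new parent.
    // Assumes newParentId is not inside the moved subtree.
    void move(long id, long newParentId) {
        ParentIdAndRdnSketch node = nodes.get(id);
        int size = 1 + node.nbDescendants;
        adjustAncestors(node.parentId, -1, -size);
        node.parentId = newParentId;
        adjustAncestors(newParentId, 1, size);
    }

    // The direct parent gains/loses a child; it and every ancestor up to
    // the root gain/lose descendants.
    private void adjustAncestors(long parentId, int childDelta, int descendantDelta) {
        ParentIdAndRdnSketch parent = nodes.get(parentId);
        if (parent != null) {
            parent.nbChildren += childDelta;
        }
        for (long cur = parentId; cur != ROOT; ) {
            ParentIdAndRdnSketch n = nodes.get(cur);
            n.nbDescendants += descendantDelta;
            cur = n.parentId;
        }
    }

    int oneLevelCount(long id) { return nodes.get(id).nbChildren; }

    // Descendants only; add 1 if the scope includes the base entry itself.
    int subLevelCount(long id) { return nodes.get(id).nbDescendants; }
}
```

With these counters the search engine can read a one-level or sub-level count in constant time instead of walking the subtree; the trade-off is one update per ancestor on every write, i.e. a cost proportional to the depth of the DN.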


Good idea Emmanuel.

Note that I just rephrased Stefan's idea here. It was not originally mine.


This would be a neat solution to the sub-level count problem. Let's
experiment with this and see whether it does in fact lead to a speedup, which I
think it should, but it's good to actually measure it. I wish we had a nice lab for
this.
The HBase work done by Stefan is already an excellent lab :)


Yes it is a very good exercise which shows the interfaces and design are holding up pretty well if these relatively minor issues are all we have to worry about.

But what I meant was a lab where machines are ready to run tests on our nightly builds, not just the experience of writing this partition :-). It would be neat to see the progression of these changes over time.
 
--
Best Regards,
-- Alex