directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel L├ęcharny <>
Subject Re: About reverse index ...
Date Mon, 10 May 2010 08:45:15 GMT
On 5/13/10 8:08 AM, Alex Karasulu wrote:
> Hi Emmanuel,
> On Sun, May 9, 2010 at 6:42 PM, Emmanuel Lecharny<>wrote:
> ...
> As soon as you consider that a delete operation will need to update all the
>> index the deleted entry uses, you see that you can benefit from having such
>> reverse index : you don't have to grab the entry from the backend, as you
>> already have it's ID, which is all what you need. Thus, a delete operation
>> is just about doing :
>> - get the entry ID
>> - for each index, if we have an<Entry ID>  stored, then grab the associated
>> values, and for each value, delete them from the forward index.
>> - delete the entry from the MT
> Yep this is one benefit of the reverse index.
At this point, I'm questionning the fact that it's an advantage. We can 
do almost the same thing using the forward index, assuming we know the 
attribute we are dealing with, and we also have the value in the filter, 
so we can check that the entry we are looking at is present in the index 
for this attribuet. Let me explain with some example :

filter : (cn=value), cn is indexed.
you are looking for entry E(x)
The reverse index will contain :
E(x) --> value
the forward index will contain :
vaue --> E(1), E(5), ... E(x)...E(N)

so you can still in a O(log(N)) find the entry in the forward index 
without fetching the entry first.

> There are other uses for the
> reverse index. To start let's just list the various index methods leveraging
> the reverse index:
>      boolean reverse( ID id ) throws Exception;
>      boolean reverse( ID id, K attrVal ) throws Exception;
>      boolean reverseGreaterOrEq( ID id ) throws Exception;
>      boolean reverseGreaterOrEq( ID id, K attrVal ) throws Exception;
>      boolean reverseLessOrEq( ID id ) throws Exception;
>      boolean reverseLessOrEq( ID id, K attrVal ) throws Exception;
In fact, those methods are not existing anymore (don't ask me why, I 
just grepped all the code). I don't know where you get them from ...

The only place where the reverse index is used atm is when checking that 
an entry has a value.
> These various reverse lookups are used all over the XDBM search code base
> with various expression Evaluators and Cursors.
Well, not anymore :)
>   They are used to evaluate
> assertion expressions in filters during search. Without reverse indices
> entries must be accessed and deserialized from the master table for every
> potential candidate before full filter evaluation takes place.
Nope. You can still use the forward index. But even, it might be 
valuable to grab the entry from the disk and check directly against the 
filter in memory, instead of reading index. My perception is that we can 
save a lot of time assuming that the set of returned entries is low, and 
if you have more than a couple of filter expressions, as reading index 
might lead to disk access.

Now, it all depends on the memory size. If all the index and entries are 
in memory (optimal solution), then fetching an entry cost nothing, but 
we save all the index brwosing by checking the entry against the filter 

Of course, if we have disk access, then reading entries from the disk is 
costly, but so is the index browsing, as you will have to hit the disk 
too. Considering that you have N entry candidates, 1/3 of them being 
valid candidate (ie willbe returned to the client), then the other 
filter expressions will be used to eliminate those 2/3 of bad entries. 
If you have, say, M filter expressions, then you will potentially hit 
the disk while reading the (M-1) index, then when the entry fits, read 
the entry from the disk too. If the entry does not fit, you just save 
the MT access. So for valid entries you have an extra cost, as you read 
it from disk anyway, plus all the index potential disk acces. For 
invalid entries, you still have the index reads.

Of course, this is all but proven, I'm talking from a theorical POV. I'm 
just trying to rationalise the agorithm, in order to have more than one 
perspective about the search performance.

By no mean I think that the proposed solution is faster, I just think 
that it might be a good idea to test it.

Now, an even better option : all thise should be optionnal. If we can 
fit all the data in memory, then I don't think that we should use 
reverse indexes. If the databas is big, and if it's proven that having 
reverse indexes has benefit, then we should create them.

In all cases, we should not close any door.
> This could be
> massively expensive and would cause cache over-turn and most searches.  The
> performance would degrade considerably and the whole search engine must be
> altered to handle things like alias dereferencing differently which relies
> heavily on reverse indices.
> Certainly we can have improvements in the algorithm; it is by no means
> perfect.  However please realize that this search engine design and db
> structure has suited our needs for over 8 years with minor tweaks here and
> there.  I'm fairly confident that the removal of reverse tables in indices
> will be disastrous but in now way would I say we should not give it a try.
>   But please experiment in another branch. This is by far one of the most
> critical aspects of the LDAP server.
Well, I didn't intend to change *anything* atm. It's far too premature, 
and I just wanted to push those ideas here for future evaluations.

Those kind of experimentation deserver obviously to be done in a branch.

What I expect at the end of the day is that we find the best possible 
algotithme, assuming that we don't have a fit-all solution. If only we 
were able to tune the search algorithme depending on the memory sise, 
the data size, and the requests we receive, then it would be great !

In any case, it's an interesting discussion!

Emmanuel L├ęcharny

View raw message