directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Karasulu <akaras...@apache.org>
Subject Re: Index reverse tables : are they useful ?
Date Fri, 24 Jun 2011 07:51:16 GMT
On Fri, Jun 24, 2011 at 10:47 AM, Emmanuel Lécharny
<elecharny@apache.org> wrote:
> On 6/24/11 9:40 AM, Alex Karasulu wrote:
>>
>> On Thu, Jun 23, 2011 at 7:20 PM, Emmanuel Lecharny<elecharny@gmail.com>
>>  wrote:
>>>
>>> Hi guys,
>>>
>>> as I was reviewing the Table interface, and the Index hierarchy, I had a
>>> bit
>>> of time to try to understand this part of the server I was not
>>> comfortable
>>> with. And I saw that each index is using two tables : a forward and a
>>> reverse table.
>>>
>>> The forward table is obviously mandatory. It is used to retrieve the list
>>> of
>>> entry's ID from an AT value. Fine. But what about the reverse table ?
>>>
>>> Alex told me that it's useful when dealing with such filters :
>>> (&(cn=A)(sn=B)).
>>>
>>> The optimization engine will get the index that provides the smallest set
>>> of
>>> candidate (say, CN here). We then grab from the CN forward table all the
>>> entry's ID associated with the 'A' value. Let's call it the CN-IDs set.
>>>
>>> Now, if we were to grab all the entries from the MasterTable to compare
>>> their SN attribute with the 'B' value, t would be quite costly. We use
>>> the
>>> SN's reverse table. For each entry's ID, we check if the associated value
>>> in
>>> the SN's reverse table exists or not. If it exists, we have a valid
>>> candidate, otherwise, we simply discard the candidate.
>>>
>>> As we can see, we never fetch the entry unless we have a valid candidate
>>> (of
>>> course, if the SN and CN AT are indexed).
>>>
>>> However, I do think this reverse table is spurious. We can get the same
>>> result by using the SN's forward table : for each entry's ID, check in
>>> the
>>> SVn's forward table if the set of values (SN-IDs) contains the ID we
>>> pulled
>>> from the CN-IDs set.
>>>
>>> We can even do an SN-IDs ^ CN-IDs ( '^'<=>  intersection) to get the list
>>> of
>>> valid candidates. No need to do N fetches in the reverse table (where N
>>> is
>>> the number of ID in CN-IDs).
>>>
>>> I think we can remove those reverse table, and I expect that it can speed
>>> up
>>> the search a bit, but mainly speed up greatly any modification operation
>>> (add, delete, modify) and even decrease the disk consumption.
>>
>> I think this is possible. It would be interesting to test these two
>> different systems using different kinds of data. The number of
>> duplicate keys will obviously impact the performance: it will be a
>> trade off.
>>
>> To test an Evaluator (i.e. sn=foo) on a candidate entry say id 37, we
>> need to perform a reverse lookup on the sn index. The btree key is 37
>> and we check if it contains a tuple with a value of 'foo'. If it does
>> then the Evaluator returns true if not it returns false. Now since we
>> know we're looking for tuple ('foo', 37) and we only have the forward
>> table we can lookup the 'foo' entries, and the values from duplicate
>> keys would be sorted. We would just need to check if 37 were in these
>> values.
>>
>> The reverse index has no duplicate keys. The only way to get a
>> duplicate key in the reverse index is if the same entry (i.e. 37)
>> contained the same value ('foo') for the same (sn) attribute. And this
>> we know is not possible. So the lookups against the reverse table will
>> be faster.
>
> I was thinking about something a bit different : as soon as you have grabbed
> the list of entry's ID from the first index, looking into the other indexes
> will also return a list of Entry's ID. Checking if those IDs are valid
> candidate can then be done in one shot : do the intersection of the two sets
> (they are ordered, so it's a O(n) operation) and just get the matching
> entries.
>
> Compared to the current processing (ie, accessing the reverse index for
> *each* candidate), this will be way faster, IMO.

This is a VERY interesting idea. Maybe we should create a separate
thread for this and drive deeper into it. You got something I think
here.

> Note that it's true for the AND connector, but even for the OR connector, we
> don't have this kind of issue.

Sorry don't understand connector? You probably mean the cursors?

Regards,
Alex

Mime
View raw message