felix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Karasulu <akaras...@apache.org>
Subject Re: Needed: LDAP expression evaluation optimization
Date Fri, 26 Jan 2007 01:51:41 GMT
Richard S. Hall wrote:
> On Jan 25, 2007, at 5:30 PM, Alex Karasulu wrote:
>> BJ Hargrave wrote:
>>> Another solution to this is to cache resolve state results across 
>>> invocations. You will only need to invalidate the cache if the set of 
>>> installed bundles changes or some other event triggers the resolver 
>>> to modify the state.
>> IMO this is the only option.  Indices have value if the set of 
>> properties you have are relatively constant.  Let me explain:
>> If bundles represent entries with properties which you need to test 
>> for filter evaluation then you need some kind of master table with an 
>> id to properties mapping.  Then you need to build indices mapping into 
>> that master table where the key of the index is a property value, and 
>> the value is the id into the master.  For example you may have 
>> something like this.
>> Master
>> ======
>> 1 | Bundle A Properties
>> 2 | Bundle B Properties
>> 3 | Bundle C Properties
>> 4 | Bundle D Properties
>> 5 | Bundle E Properties
>> ObjectClass Property Index
>> =================
>> Foo | 3
>> Bar | 4
>> XYZ Property Index
>> =================
>> ... etc
>> Then search would use these indices to evaluate the filter.
>> The problem is the properties you'll encounter are unbounded (I maybe 
>> wrong) so the number indices you'll need will be as large as the 
>> number of unique properties used.
>> So I think BJ your idea may be the best option.
> No, it is not for the other reasons I explained in response to BJ's 
> message.
> I think you hit the nail on the head, we don't need to index all of the 
> properties, we only need to index the important ones, like package name 
> and bundle symbolic name, for starters. The set of important properties 
> should be pretty easy to figure out and could possibly be configurable 
> for those who have different use cases. For non-indexed properties, then 
> you can resort to the slow method.

Ok then that's not so bad.  Because essentially ApacheDS has code we can 
use to optimize the search by leveraging indices to reduce the search 
set even when we do not have an index.

To illustrate this with an example take the following search:

(& (abc=foo) (xyz=bar) )

An optimizer will take a scan count of the indices if present and 
annotate the leaf nodes of the AST representation of this filter which 
looks like this:

        & [10]
       / \
      /   \
     /     \
    /       \
abc=foo  xyz=bar
  [20]      [10]

If both indices are present then we get scan counts on both.  The parent 
node inherits the scan count of the lesser of the two child nodes with 
AND operations.  The search engine will loop through all the index 
entries with xyz=bar using a cursor on index xyz and do lookups on the 
abc index to see if each candidate also has xyz=bar.  This way 10 loop 
iterations with 10 lookups is cheaper than 20 loop iterations with 20 
lookups.  If one index is missing:

        & [20]
       / \
      /   \
     /     \
    /       \
abc=foo  xyz=bar
  [20]      [INT_MAX]

In this case the engine will position a cursor on the abc index and 
iterate through all the index entries where the key = foo.  Then each 
value corresponding to an id into the master table is used to lookup the 
properties of the bundle.  The xyz property is pulled out and compared 
to see if we have a match.  So with a missing index we have a partial 
scan of the master table.

With both indices missing we have a full scan of the master table.

I think a partial scan is just fine ... I was just afraid of having too 
many indices without bounds which would make this solution useless.

I might tinker a bit with this.


  • Unnamed multipart/mixed (inline, None, 0 bytes)
View raw message