jackrabbit-dev mailing list archives

From Christoph Kiehl <christ...@sulu3000.de>
Subject Re: Query Performance and Optimization
Date Wed, 14 Mar 2007 18:09:12 GMT
Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>>> I've created a jira issue: http://issues.apache.org/jira/browse/JCR-791
>>
>> Are you working on this issue? Or should I try to implement something?
> 
> I just started working on it ;)

Great news ;)

Now that you are implementing this cache on a per index reader basis, 
I have another suggestion for improvement ;)

As I understand it, DescendantSelfAxisQuery.DescendantSelfAxisScorer uses the 
contextHits to filter the subHits result so that it only includes nodes of the 
given context. The context is something like /foo/bar//*, which means all 
descendants of /foo/bar. Is that right?
In our application the context for most of our queries is the same, so it would 
make a lot of sense to cache the contextHits for this context. There is already 
a todo in the constructor of DescendantSelfAxisScorer which probably aims at this.
I would go even further and not only cache these contextHits, but cache 
contextHits per _node_ in a hierarchy, which means there is a BitSet for 
/foo/bar/bla[1], /foo/bar/bla[2] and so on. If I need the BitSet for /foo/bar//* 
I could just join the BitSets of the descendants. This would allow reusing the 
BitSets for different contexts. What do you think about this? It should improve 
performance a lot, and more so the larger the result set is and the less 
specific the context is.
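
To make the idea concrete, here is a rough sketch of what I have in mind. It is 
not Jackrabbit code: the class ContextHitsCache and its methods put, join and 
filter are names I made up for illustration, and it assumes that each cached 
BitSet holds the document numbers of one node and everything below it.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of Jackrabbit: one cached BitSet per node path.
class ContextHitsCache {

    // Assumption: path -> document numbers of that node and all its descendants.
    private final Map<String, BitSet> hitsByPath = new HashMap<>();

    // Store a defensive copy so later changes by the caller do not leak in.
    void put(String path, BitSet hits) {
        hitsByPath.put(path, (BitSet) hits.clone());
    }

    // Build the context hits for a parent by OR-ing the cached child BitSets.
    BitSet join(List<String> childPaths) {
        BitSet result = new BitSet();
        for (String path : childPaths) {
            BitSet cached = hitsByPath.get(path);
            if (cached == null) {
                // A real cache would compute the missing entry here instead.
                throw new IllegalStateException("no cached hits for " + path);
            }
            result.or(cached);
        }
        return result;
    }

    // Keep only those sub query hits that lie inside the context.
    static BitSet filter(BitSet subHits, BitSet contextHits) {
        BitSet filtered = (BitSet) subHits.clone();
        filtered.and(contextHits);
        return filtered;
    }
}
```

With entries for /foo/bar/bla[1] and /foo/bar/bla[2] in the cache, the BitSet 
for /foo/bar//* would be join() over those two paths, and the same entries 
could be reused for any other context that contains them. Invalidation when 
the index changes is left out here on purpose.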

>> It seems like if I rewrite the following query from
>>
>> /foo/*[@foo:bar!='john' and @foo:bar!='doe']
>>
>> to
>>
>> /foo/*[not(@foo:bar='john' or @foo:bar='doe')]
>>
>> I get a better performance. Can you confirm this?
> 
> Yes, I can. Basically because any != comparison is translated into: get 
> all nodes with the given property, then exclude the ones that match the 
> literal. Which is obviously much more expensive than just: get all nodes 
> that match a given literal.

Wouldn't it make sense to rewrite all @foo:bar!='john' queries to 
not(@foo:bar='john') by default instead of creating a MatchAllQuery?
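
One thing to keep in mind with such a rewrite: as far as I can tell the two 
forms do not select exactly the same nodes, because a node without the 
property satisfies not(@foo:bar='john') but not @foo:bar!='john'. The sketch 
below shows the two evaluations as plain BitSet operations. The class 
NotEqualsSketch and its methods are made up for illustration and are not how 
Jackrabbit actually builds its Lucene queries.

```java
import java.util.BitSet;

// Hypothetical sketch, not Jackrabbit code: both forms as BitSet operations.
class NotEqualsSketch {

    // "!=": all nodes that have the property, minus those matching the literal.
    static BitSet notEquals(BitSet hasProperty, BitSet matchesLiteral) {
        BitSet result = (BitSet) hasProperty.clone();
        result.andNot(matchesLiteral);
        return result;
    }

    // "not(=)": all nodes (document numbers 0..maxDoc-1), minus those
    // matching the literal.
    static BitSet notOfEquals(int maxDoc, BitSet matchesLiteral) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        result.andNot(matchesLiteral);
        return result;
    }
}
```

So a default rewrite would probably also have to require that the property 
exists in order to keep the result identical.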

Cheers,
Christoph

