jackrabbit-dev mailing list archives

From Christoph Kiehl <christ...@sulu3000.de>
Subject Re: Query Performance and Optimization
Date Wed, 14 Mar 2007 14:33:09 GMT
Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>> Christoph Kiehl wrote:
>>
>>> I was digging a bit into Jackrabbit today and found another place 
>>> where some caching did provide a substantial performance gain to 
>>> queries which check one attribute for more than one value (like 
>>> /foo/*[@foo:bar='john' or @foo:bar='doe']). The BitSet in 
>>> calculateDocFilter() is right now created twice for the query above. 
>>> On large repositories this takes about 200ms per BitSet on my machine 
>>> for a particular field. Caching these BitSets per IndexReader and 
>>> field in a WeakHashMap with the IndexReader as a key gave me some 
>>> real improvements. 
> 
> agreed, this should definitely be cached per index segment and is 
> doable with reasonable effort.
> 
> I've created a jira issue: http://issues.apache.org/jira/browse/JCR-791

Are you working on this issue? Or should I try to implement something?
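
For reference, a minimal sketch of the caching idea described above. This is not the actual Jackrabbit code: the class DocFilterCache, its get() method and the type parameter R (standing in for Lucene's IndexReader so the sketch has no dependencies) are all hypothetical, and it uses modern Java syntax.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical cache of per-field doc filters, keyed weakly by index reader.
 * R is a stand-in for Lucene's IndexReader (an assumption of this sketch).
 */
class DocFilterCache<R> {
    // Weak keys: an entry can be collected once its reader is no longer referenced.
    private final Map<R, Map<String, BitSet>> cache = new WeakHashMap<>();

    /**
     * Returns the filter for (reader, field), computing it at most once.
     * The returned BitSet is shared, so callers must treat it as read-only.
     */
    synchronized BitSet get(R reader, String field, Supplier<BitSet> compute) {
        Map<String, BitSet> perField =
                cache.computeIfAbsent(reader, r -> new HashMap<>());
        return perField.computeIfAbsent(field, f -> compute.get());
    }
}
```

With something like this, the second lookup for the same reader and field would return the BitSet built by the first one instead of rebuilding it.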

>> - I was referring to calculateDocFilter() in 
>> org.apache.jackrabbit.core.query.lucene.MatchAllScorer
>> - The achieved performance improvement varied between 30-60% depending 
>> on the actual query
> 
> but that means your query is rather:
> 
> /foo/*[@foo:bar]
> 
> right?

Actually it's /foo/*[@foo:bar!='john']

> @foo:bar='john' should be translated into a term query.

You are right. "="-comparisons are translated into term queries, whereas 
"!="-comparisons are translated into MatchAllQueries.

It seems that if I rewrite the following query from

/foo/*[@foo:bar!='john' and @foo:bar!='doe']

to

/foo/*[not(@foo:bar='john' or @foo:bar='doe')]

I get better performance. Can you confirm this?
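
To illustrate the two forms, here is a toy model over plain BitSets. It is only a sketch and not how Jackrabbit evaluates queries: the class NotEqualsRewrite and its methods are hypothetical, and it assumes that "!=" only matches documents that have the field while not(...) is taken over all documents.

```java
import java.util.BitSet;

class NotEqualsRewrite {
    /** @foo:bar!='john' and @foo:bar!='doe': two "has field" filters, each minus one term. */
    static BitSet bothNotEquals(BitSet hasField, BitSet isJohn, BitSet isDoe) {
        BitSet left = (BitSet) hasField.clone();
        left.andNot(isJohn);
        BitSet right = (BitSet) hasField.clone();
        right.andNot(isDoe);
        left.and(right);
        return left;
    }

    /** not(@foo:bar='john' or @foo:bar='doe'): all docs minus the union of two term matches. */
    static BitSet notEither(int maxDoc, BitSet isJohn, BitSet isDoe) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        BitSet either = (BitSet) isJohn.clone();
        either.or(isDoe);
        result.andNot(either);
        return result;
    }
}
```

In this model the second form never needs the "has field" filter, which is the expensive part. Under the stated assumption the two forms agree only on documents that actually have the property, so the results may differ if some nodes lack it.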


Cheers,
Christoph

