jackrabbit-dev mailing list archives

From "Tom Quellenberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (JCR-3513) Slower range query execution
Date Fri, 08 Feb 2013 08:05:13 GMT

    [ https://issues.apache.org/jira/browse/JCR-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574307#comment-13574307 ]

Tom Quellenberg commented on JCR-3513:

Hello Alex,

My stack trace looks like this:

Term.compareTo(Term) line: 114	
TermInfosReader.get(Term, boolean) line: 212	
TermInfosReader.get(Term) line: 179	
SegmentTermDocs.seek(Term) line: 57	
DirectoryReader$MultiTermDocs.termDocs(int) line: 1224	
DirectoryReader$MultiTermDocs.read(int[], int[]) line: 1177	
ReadOnlyIndexReader$FilteredTermDocs.read(int[], int[]) line: 257	
DirectoryReader$MultiTermDocs.read(int[], int[]) line: 1182	
MultiTermQueryWrapperFilter<Q>.getDocIdSet(IndexReader) line: 122	
ConstantScoreQuery$ConstantScorer.<init>(ConstantScoreQuery, Similarity, IndexReader, Weight) line: 122	
ConstantScoreQuery$ConstantWeight.scorer(IndexReader, boolean, boolean) line: 86	
BooleanQuery$BooleanWeight.scorer(IndexReader, boolean, boolean) line: 306	
JackrabbitIndexSearcher(IndexSearcher).search(Weight, Filter, Collector) line: 210	
JackrabbitIndexSearcher(Searcher).search(Query, Collector) line: 67	

My code also ends up in TermInfosReader. The conclusion that the Lucene code does not use a cache sounds reasonable to me.

I see two possible solutions:
# Change the code so that Lucene uses a cached reader (I have no idea how to achieve this).
# Avoid using the MultiTermQueryWrapperFilter.
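To make the first option a little more concrete: a "cached reader" would essentially memoize term-dictionary lookups, so repeated seeks for the same term skip the expensive search. The following is only a minimal, generic sketch of a bounded LRU cache placed in front of an arbitrary lookup function, assuming the lookup never returns null. `CachedTermLookup` and the `String::length` stand-in lookup are hypothetical; this is not Lucene's TermInfosReader API and does not show how to plug such a cache into Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a bounded LRU cache in front of an expensive lookup
// (standing in for a term-dictionary seek). Not Lucene's API.
class CachedTermLookup<K, V> {
    private final Function<K, V> lookup; // the expensive operation being cached
    private final Map<K, V> cache;
    private int misses = 0;              // calls that had to reach the underlying lookup

    CachedTermLookup(Function<K, V> lookup, int capacity) {
        this.lookup = lookup;
        // accessOrder=true orders entries from least to most recently used,
        // so removeEldestEntry evicts the LRU entry once capacity is exceeded.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    V get(K key) {
        V value = cache.get(key);        // a hit also refreshes the entry's recency
        if (value == null) {             // assumes the lookup never returns null
            misses++;
            value = lookup.apply(key);
            cache.put(key, value);
        }
        return value;
    }

    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        CachedTermLookup<String, Integer> terms = new CachedTermLookup<>(String::length, 2);
        terms.get("2013-02-07");
        terms.get("2013-02-07");         // served from the cache
        terms.get("2013-02-08");
        System.out.println("misses = " + terms.misses()); // prints "misses = 2"
    }
}
```

An access-ordered LinkedHashMap with removeEldestEntry is the usual JDK idiom for a small LRU cache; a cache shared by concurrent queries would additionally need synchronization.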

We went with the second solution and removed the method org.apache.jackrabbit.core.query.lucene.RangeQuery.rewrite(IndexReader). The superclass implementation of this method returns 'this', so the Jackrabbit RangeQuery is always used. I'm not sure whether this will solve your problem.
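A simplified model of why removing the override is enough: as we understand it, Lucene's searcher keeps calling rewrite on a query until it returns the same object, and the base Query.rewrite simply returns this. The classes below (RewriteDemo, SketchQuery, LuceneTermRange, RangeWithOverride, RangeWithoutOverride) are hypothetical stand-ins, not the real Lucene or Jackrabbit types, and the IndexReader parameter is omitted.

```java
// Hypothetical model of the rewrite contract; these classes only stand in
// for the Lucene/Jackrabbit types discussed above.
class RewriteDemo {
    // Rewrite until a fixed point is reached, i.e. rewrite() returns the same object.
    static SketchQuery rewriteFully(SketchQuery original) {
        SketchQuery query = original;
        for (SketchQuery r = query.rewrite(); r != query; r = query.rewrite()) {
            query = r;
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(rewriteFully(new RangeWithOverride()).getClass().getSimpleName());    // LuceneTermRange
        System.out.println(rewriteFully(new RangeWithoutOverride()).getClass().getSimpleName()); // RangeWithoutOverride
    }
}

abstract class SketchQuery {
    // Default behaviour: a query that needs no rewriting returns itself.
    SketchQuery rewrite() {
        return this;
    }
}

// Stand-in for the Lucene query the range query used to be rewritten into.
class LuceneTermRange extends SketchQuery { }

// Stand-in for the unpatched range query: its override hands the work to Lucene.
class RangeWithOverride extends SketchQuery {
    @Override
    SketchQuery rewrite() {
        return new LuceneTermRange();
    }
}

// Stand-in for the patched range query: no override, so the inherited rewrite() returns this.
class RangeWithoutOverride extends SketchQuery { }
```

With the override removed, the fixed point is the original object, so the searcher ends up executing the Jackrabbit query itself.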
> Slower range query execution
> ----------------------------
>                 Key: JCR-3513
>                 URL: https://issues.apache.org/jira/browse/JCR-3513
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>    Affects Versions: 2.4.3
>            Reporter: Tom Quellenberg
>            Assignee: Alex Parvulescu
> After switching from Jackrabbit 1.6.4 to 2.4.3 we experienced extremely slow query execution. All range queries on date fields are often 10 times slower than before.
> Our repositories store more than 1 million documents, all of which contain, for example, a creation date. Typical queries look like this:
> //element(*, sophora-nt:story)[@sophora:creationDate > ...]
> Jackrabbit has its own RangeQuery implementation, which is used when Lucene throws a TooManyBooleanClauses exception (and in some other situations, too). This worked well in Jackrabbit 1.6. Newer versions use a different Lucene library, which never throws TooManyBooleanClauses exceptions. Instead, it has its own fall-back for situations where a BooleanQuery does not work. This fall-back, based on MultiTermQueryWrapperFilter, seems much slower to us than the fall-back implementation in Jackrabbit (does anybody know why?). The situation is the same in Jackrabbit 2.6.0 (with Lucene 3.6.0).
> We patched org.apache.jackrabbit.core.query.lucene.RangeQuery to never use org.apache.lucene.search.TermRangeQuery and always use the Jackrabbit implementation instead. With this patch, queries execute as fast as in older Jackrabbit versions.
> Do other people experience this problem? Are there any drawbacks to always using the Jackrabbit implementation for range queries?
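The two strategies described in the issue can be modeled without Lucene: expanding a range into one clause per term and refusing beyond a clause limit (Lucene's BooleanQuery allows 1024 clauses by default), versus a filter-style walk that never gives up but visits every term in the range and sets a bit per matching document. This is a hypothetical sketch over an in-memory TreeMap, not Lucene's MultiTermQueryWrapperFilter; RangeStrategiesSketch, expandToClauses and rangeFilter are illustrative names. It only shows that the filter's work grows with the number of distinct terms in the range, which can be large when, for example, most documents carry a distinct creation date.

```java
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model, not Lucene code: a sorted term dictionary mapping each
// indexed term (e.g. an encoded date) to the ids of the documents containing it.
class RangeStrategiesSketch {

    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int n) {
            super("range expands to " + n + " terms");
        }
    }

    // Expansion strategy: one clause per term in (lower, upper]; throws when the
    // clause limit is exceeded so that the caller can switch to another strategy.
    static int expandToClauses(NavigableMap<String, int[]> dict,
                               String lowerExclusive, String upperInclusive, int maxClauses) {
        int n = dict.subMap(lowerExclusive, false, upperInclusive, true).size();
        if (n > maxClauses) {
            throw new TooManyClausesException(n);
        }
        return n; // number of clauses a boolean query would contain
    }

    // Filter strategy: never throws; walks every term in (lower, upper] and
    // sets a bit for each document listed under that term.
    static BitSet rangeFilter(NavigableMap<String, int[]> dict,
                              String lowerExclusive, String upperInclusive) {
        BitSet docs = new BitSet();
        for (int[] postings : dict.subMap(lowerExclusive, false, upperInclusive, true).values()) {
            for (int doc : postings) {
                docs.set(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        // Worst case for expansion: every document has its own distinct term.
        NavigableMap<String, int[]> dict = new TreeMap<>();
        for (int doc = 0; doc < 2000; doc++) {
            dict.put(String.format("t%06d", doc), new int[] { doc });
        }
        try {
            expandToClauses(dict, "t000000", "t001999", 1024);
        } catch (TooManyClausesException e) {
            System.out.println("expansion refused: " + e.getMessage()); // range expands to 1999 terms
        }
        System.out.println("filter matched " + rangeFilter(dict, "t000000", "t001999").cardinality()); // 1999
    }
}
```

In the real code each visited term also costs a seek into every index segment, which is the per-term TermInfosReader work visible in the stack trace above.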

This message is automatically generated by JIRA.
