jackrabbit-dev mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: DescendantSelfAxisWeight ChildAxisQuery performance
Date Fri, 30 Nov 2007 11:14:26 GMT

> > This is not my point. Whether you have an order by or not, Lucene will
> > compute the score of all hits anyway. So no order by does not mean
> > that Lucene does not order: it orders on score (but of course you
> > already know that :-) ). So my point holds with and without order by.

> Marcel Reutegger wrote:
> WRT Lucene this is correct, but the same is not true for JCR.
> If there is no order by, the implementation is free to return
> the nodes in any order.

True, but Lucene will sort it by score anyway, AFAICS. So the
implementation is free to return the nodes in any order, but AFAIU
Lucene still returns the Hits sorted by score. And this is of course
important in the case of text searches with contains().
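A rough illustration of why relevance ordering is expensive: to return even the first n hits sorted by score, every matching document has to be scored, because the last one visited could still be the best. This is plain Java without Lucene; `Hit` and `ScoreOrderSketch.topN` are hypothetical names, and breaking ties on the lower doc id is my assumption of how equal scores are ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a scored hit (not a Lucene class). */
final class Hit {
    final int doc;
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

final class ScoreOrderSketch {
    // Final result order: higher score first, then lower doc id first.
    static final Comparator<Hit> BY_RELEVANCE =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                      .thenComparingInt(h -> h.doc);

    /** Returns the n most relevant hits; every candidate has to be visited. */
    static List<Hit> topN(List<Hit> matches, int n) {
        // Reversed comparator: the head of the heap is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(BY_RELEVANCE.reversed());
        for (Hit h : matches) {      // no early exit: a later hit may score higher
            heap.offer(h);
            if (heap.size() > n) {
                heap.poll();         // drop the weakest hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(BY_RELEVANCE);
        return result;
    }
}
```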

> I did a quick test and wrote a custom IndexSearcher (see
> below) which collects only the first n matching documents.
> The test query then executed much faster because the number
> of DescendantSelfAxisScorer.isValid() calls dropped drastically.
> There is one drawback, though: you don't know the total number
> of results. In this case it might be OK to return -1 for
> RangeIterator.getSize().
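The gain of stopping early can be shown without Lucene. This is a hypothetical sketch, not Marcel's searcher: an `IntPredicate` stands in for DescendantSelfAxisScorer.isValid(), candidates are visited in doc-id order, and `getSize()` follows the JCR convention of returning -1 when the total is unknown. `FirstNCollector` is an illustrative name only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch: keep the first n candidates that pass an expensive check. */
final class FirstNCollector {
    int checks;          // how many times the expensive check ran in the last collect()
    private long size;   // total number of matches if known, otherwise -1

    List<Integer> collect(int[] candidates, IntPredicate isValid, int n) {
        checks = 0;
        List<Integer> docs = new ArrayList<>();
        int i = 0;
        // Test the limit before advancing, so no candidate is checked needlessly.
        while (docs.size() < n && i < candidates.length) {
            int doc = candidates[i++];
            checks++;
            if (isValid.test(doc)) {
                docs.add(doc);
            }
        }
        // Only when every candidate was examined is the total known.
        size = (i == candidates.length) ? docs.size() : -1;
        return docs;
    }

    /** Mirrors RangeIterator.getSize(): -1 means "size unknown". */
    long getSize() {
        return size;
    }
}
```

With 1,000 candidates of which every tenth matches, asking for n = 3 runs the check 21 times instead of 1,000 and getSize() returns -1; asking for all of them runs it 1,000 times and getSize() returns 100.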

Yes, true. And we have to take into account that people might have an

> The order by is more difficult to solve. What we could try is
> to order the result of the sub query first and then run the
> descendant axis test against the context nodes. Because
> DescendantSelfAxisQuery does not add nodes to the sub query
> but only limits the set, subsequent ordering can be skipped.
> This requires that we pass along ordering information
> with the scorer, e.g. index order, relevance, property.
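A minimal sketch of the idea, assuming plain path strings instead of Lucene documents: `Ordering`, `OrderedResult` and `isDescendantOrSelf` are hypothetical names. Because the descendant test only removes entries, the survivors keep the order the sub query result already had, so the ordering tag can be passed along unchanged and no second sort is needed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical ordering tag that would travel along with a scorer. */
enum Ordering { INDEX_ORDER, RELEVANCE, PROPERTY }

/** Hypothetical result: node paths plus the ordering they are already in. */
final class OrderedResult {
    final List<String> paths;
    final Ordering ordering;

    OrderedResult(List<String> paths, Ordering ordering) {
        this.paths = paths;
        this.ordering = ordering;
    }

    /** Stand-in for ordering the sub query result first. */
    static OrderedResult sorted(List<String> paths, Comparator<String> order, Ordering tag) {
        List<String> copy = new ArrayList<>(paths);
        copy.sort(order);
        return new OrderedResult(copy, tag);
    }

    /** Removes entries only, so relative order and the ordering tag stay valid. */
    OrderedResult filter(Predicate<String> keep) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (keep.test(p)) {
                kept.add(p);
            }
        }
        return new OrderedResult(kept, ordering);
    }

    /** Simplified descendant-or-self test on path strings. */
    static boolean isDescendantOrSelf(String path, String context) {
        return path.equals(context) || path.startsWith(context + "/");
    }
}
```

Sorting first and filtering afterwards gives the same list as filtering first and sorting afterwards, which is what makes it safe to skip the final sort.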

That is what I meant by the 'lazy filter', in which we start filtering
by path *after* Lucene has returned the fast initial result set.

> In any case we should create a JIRA issue for it.

I can take some snippets of this discussion and add them to JCR-1196
[Queries for DescendantSelfAxisWeight/ChildAxisQuery are currently very
heavy and become slow pretty quickly], or do you want a new issue?


> regards
>   marcel
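> import java.io.IOException;
>
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.search.Filter;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Scorer;
> import org.apache.lucene.search.TopDocCollector;
> import org.apache.lucene.search.TopDocs;
> import org.apache.lucene.search.Weight;
>
> /**
>  * Collects only the first nDocs matching documents (in document order)
>  * instead of visiting every hit. Note: the filter argument is ignored.
>  */
> public class JackrabbitIndexSearcher extends IndexSearcher {
>      private final IndexReader reader;
>      public JackrabbitIndexSearcher(IndexReader r) {
>          super(r);
>          this.reader = r;
>      }
>      // inherit javadoc
>      public TopDocs search(Weight weight, Filter filter, int nDocs)
>              throws IOException {
>          TopDocCollector collector = new TopDocCollector(nDocs);
>          Scorer scorer = weight.scorer(reader);
>          if (scorer != null) {
>              // check the limit first, so the scorer is not advanced
>              // one extra time after nDocs hits have been collected
>              while (nDocs-- > 0 && scorer.next()) {
>                  collector.collect(scorer.doc(), scorer.score());
>              }
>          }
>          return collector.topDocs();
>      }
> }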
