lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Klaas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup
Date Wed, 07 Nov 2007 22:29:50 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540913
] 

Mike Klaas commented on LUCENE-693:
-----------------------------------

Paul wrote:
> As just discussed on java-dev, the creation of an object during the call to sort could
well be due to the creation
> of a new comparator for each call to sort This might be fixed by keeping a single comparator
around.
> I wouldn't expect any java library sort to create a copy of its argument, but I'm not
sure.

According to http://java.sun.com/j2se/1.5.0/docs/api/java/util/Arrays.html#sort(java.lang.Object[])
, java is using mergesort for this method.  I can't imagine it copying the individual _elements_,
but mergesort does require 2N space in general and so some array copying takes place.

Unfortunately, top-level conjunctions are an important case.

Perhaps one way to proceed is to incorporate some of the refactoring improvements (namely,
determining all scorers at constructor-time) and do some trivial optimizations to the existing
sortScorers method (lift out the ad-hoc Comparator, use insertion sort for N < 5, etc.).
 It might be worthwhile to code versions for common cases, like N=2, with a factory to choose
among them.

> ConjunctionScorer - more tuneup
> -------------------------------
>
>                 Key: LUCENE-693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-693
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.1
>         Environment: Windows Server 2003 x64, Java 1.6, pretty large index
>            Reporter: Peter Keegan
>         Attachments: conjunction.patch, conjunction.patch, conjunction.patch, conjunction.patch.nosort1
>
>
> (See also: #LUCENE-443)
> I did some profile testing with the new ConjuctionScorer in 2.1 and discovered a new
bottleneck in ConjunctionScorer.sortScorers. The java.utils.Arrays.sort method is cloning
the Scorers array on every sort, which is quite expensive on large indexes because of the
size of the 'norms' array within, and isn't necessary. 
> Here is one possible solution:
>   private void sortScorers() {
> // squeeze the array down for the sort
> //    if (length != scorers.length) {
> //      Scorer[] temps = new Scorer[length];
> //      System.arraycopy(scorers, 0, temps, 0, length);
> //      scorers = temps;
> //    }
>     insertionSort( scorers,length );
>     // note that this comparator is not consistent with equals!
> //    Arrays.sort(scorers, new Comparator() {         // sort the array
> //        public int compare(Object o1, Object o2) {
> //          return ((Scorer)o1).doc() - ((Scorer)o2).doc();
> //        }
> //      });
>   
>     first = 0;
>     last = length - 1;
>   }
>   private void insertionSort( Scorer[] scores, int len)
>   {
>       for (int i=0; i<len; i++) {
>           for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j--
) {
>               swap (scores, j, j-1);
>           }
>       }
>       return;
>   }
>   private void swap(Object[] x, int a, int b) {
>     Object t = x[a];
>     x[a] = x[b];
>     x[b] = t;
>   }
>  
> The squeezing of the array is no longer needed. 
> We also initialized the Scorers array to 8 (instead of 2) to avoid having to grow the
array for common queries, although this probably has less performance impact.
> This change added about 3% to query throughput in my testing.
> Peter

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message