lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-693) ConjunctionScorer - more tuneup
Date Tue, 24 Oct 2006 05:17:19 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-693?page=all ]

Yonik Seeley updated LUCENE-693:
--------------------------------

    Attachment: conjunction.patch

Here's a patch that:
1) nails things down in the constructor (removes incremental add code)
2) removes sorting
3) always skips to the highest docid seen as opposed to "target"

It should be faster in almost all cases, but I'll make a performance test to verify tomorrow.

Notes: the docs[] array could be removed... I started out with it because you can't currently
depend on doc() to tell you the position because the javadoc says it's undefined at first
(as opposed to -1).  So I switched to using a docs[] array and initialized that to -1, but
then learned that calling skipTo() before calling next() doesn't always work.

> ConjunctionScorer - more tuneup
> -------------------------------
>
>                 Key: LUCENE-693
>                 URL: http://issues.apache.org/jira/browse/LUCENE-693
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.1
>         Environment: Windows Server 2003 x64, Java 1.6, pretty large index
>            Reporter: Peter Keegan
>         Attachments: conjunction.patch
>
>
> (See also: #LUCENE-443)
> I did some profile testing with the new ConjuctionScorer in 2.1 and discovered a new
bottleneck in ConjunctionScorer.sortScorers. The java.utils.Arrays.sort method is cloning
the Scorers array on every sort, which is quite expensive on large indexes because of the
size of the 'norms' array within, and isn't necessary. 
> Here is one possible solution:
>   private void sortScorers() {
> // squeeze the array down for the sort
> //    if (length != scorers.length) {
> //      Scorer[] temps = new Scorer[length];
> //      System.arraycopy(scorers, 0, temps, 0, length);
> //      scorers = temps;
> //    }
>     insertionSort( scorers,length );
>     // note that this comparator is not consistent with equals!
> //    Arrays.sort(scorers, new Comparator() {         // sort the array
> //        public int compare(Object o1, Object o2) {
> //          return ((Scorer)o1).doc() - ((Scorer)o2).doc();
> //        }
> //      });
>   
>     first = 0;
>     last = length - 1;
>   }
>   private void insertionSort( Scorer[] scores, int len)
>   {
>       for (int i=0; i<len; i++) {
>           for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc();j--
) {
>               swap (scores, j, j-1);
>           }
>       }
>       return;
>   }
>   private void swap(Object[] x, int a, int b) {
>     Object t = x[a];
>     x[a] = x[b];
>     x[b] = t;
>   }
>  
> The squeezing of the array is no longer needed. 
> We also initialized the Scorers array to 8 (instead of 2) to avoid having to grow the
array for common queries, although this probably has less performance impact.
> This change added about 3% to query throughput in my testing.
> Peter

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message