lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/
Date Wed, 16 Nov 2005 20:36:57 GMT
On Tuesday 15 November 2005 23:45, Yonik Seeley wrote:
> Totally untested, but here is a hack at what the scorer might look
> like when the number of terms is large.
> 
> -Yonik
> 
> 
> package org.apache.lucene.search;
> 
> import org.apache.lucene.index.TermEnum;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.TermDocs;
> 
> import java.io.IOException;
> 
> /**
>  * @author yonik
>  * @version $Id$
>  */
> public class MultiTermScorer extends Scorer{
>   protected final float[] scores;
>   protected int pos;
>   protected float docScore;
> 
>   public MultiTermScorer(Similarity similarity, IndexReader reader,
> Weight w, TermEnum terms, byte[] norms, boolean include_idf, boolean
> include_tf) throws IOException {
>     super(similarity);
>     float weightVal = w.getValue();
>     int maxDoc = reader.maxDoc();
>     this.scores = new float[maxDoc];
>     float[] normDecoder = Similarity.getNormDecoder();
> 
>     TermDocs tdocs = reader.termDocs();

This part is only needed at the top level of the query, so
one could implement it in this optimization hook of BooleanScorer:

  /** Expert: Collects matching documents in a range.
   * <br>Note that {@link #next()} must be called once before this method is
   * called for the first time.
   * @param hc The collector to which all matching documents are passed through
   * {@link HitCollector#collect(int, float)}.
   * @param max Do not score documents past this.
   * @return true if more matching documents may remain.
   */
  protected boolean score(HitCollector hc, int max) throws IOException {
...
  }
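
A rough sketch of how such a hook could walk a precomputed scores[] array in ranges. This is untested against Lucene itself: Collector is a hypothetical stand-in for HitCollector, max is assumed to be exclusive, and a score of 0 is assumed to mean "no match". Since the array is filled up front, this sketch does not need the initial next() call that the javadoc above mentions.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: stand-in types, not the Lucene classes. */
final class RangeScoreSketch {

  /** Hypothetical stand-in for HitCollector#collect(int, float). */
  interface Collector {
    void collect(int doc, float score);
  }

  private final float[] scores; // one slot per document; 0 is assumed to mean "no match"
  private int pos = 0;          // next document number to look at

  RangeScoreSketch(float[] scores) {
    this.scores = scores;
  }

  /**
   * Passes every matching document below max (assumed exclusive) to the collector.
   * Returns true if more matching documents may remain.
   */
  boolean score(Collector hc, int max) {
    int end = Math.min(max, scores.length);
    while (pos < end) {
      if (scores[pos] > 0.0f) {
        hc.collect(pos, scores[pos]);
      }
      pos++;
    }
    return pos < scores.length;
  }

  public static void main(String[] args) {
    float[] scores = {0f, 1.5f, 0f, 0f, 2.0f, 0.5f};
    RangeScoreSketch scorer = new RangeScoreSketch(scores);
    List<String> hits = new ArrayList<>();
    Collector hc = (doc, score) -> hits.add(doc + ":" + score);
    boolean more = scorer.score(hc, 3);         // documents 0..2 only
    System.out.println(hits + " more=" + more);
    more = scorer.score(hc, Integer.MAX_VALUE); // everything that is left
    System.out.println(hits + " more=" + more);
  }
}
```

The position is kept between calls, so a caller can collect the index in slices without rescanning earlier documents.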

>     while (terms.next()) {
>       tdocs.seek(terms);

tdocs.seek(terms.term()), iirc.

>       float termScore = weightVal;
>       if (include_idf) {
>         termScore *= similarity.idf(terms.docFreq(),maxDoc);
>       }
>       while (tdocs.next()) {
>         int doc = tdocs.doc();
>         float subscore = termScore;
>         if (include_tf) subscore *= tdocs.freq();

subscore *= getSimilarity().tf(tdocs.freq());

>         if (norms!=null) subscore *= normDecoder[norms[doc] & 0xff];
>         scores[doc] += subscore;

The scores[] array is the pain point, but when it can be used
this can be generalized to DisjunctionSumScorer, so it would
work for all disjunctions, not only terms.

I think it is possible to implement this hook for
DisjunctionSumScorer with a scores[] array, iterating over the
subscorers one by one.
Getting that hook called through BooleanScorer2 is no problem
when the coordination factor can be left out.
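
To make the idea concrete, here is a minimal sketch of summing a disjunction into a scores[] array by exhausting the subscorers one at a time. It is not the DisjunctionSumScorer code: SubScorer and ArrayScorer are hypothetical stand-ins that only mimic the next()/doc()/score() style of iteration, and no coordination factor is applied.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: SubScorer is a hypothetical stand-in, not a Lucene class. */
final class DisjunctionArraySketch {

  /** Minimal iterator-style scorer: next() advances, doc() and score() read. */
  interface SubScorer {
    boolean next();
    int doc();
    float score();
  }

  /** Trivial SubScorer over parallel arrays, used only for the demo. */
  static final class ArrayScorer implements SubScorer {
    private final int[] docs;
    private final float[] vals;
    private int i = -1;

    ArrayScorer(int[] docs, float[] vals) {
      this.docs = docs;
      this.vals = vals;
    }

    public boolean next() { return ++i < docs.length; }
    public int doc() { return docs[i]; }
    public float score() { return vals[i]; }
  }

  /** Exhausts each subscorer in turn, summing its scores per document. */
  static float[] sumScores(List<SubScorer> subScorers, int maxDoc) {
    float[] scores = new float[maxDoc]; // the costly part: one float per document
    for (SubScorer s : subScorers) {
      while (s.next()) {
        scores[s.doc()] += s.score();
      }
    }
    return scores;
  }

  public static void main(String[] args) {
    SubScorer a = new ArrayScorer(new int[] {0, 2}, new float[] {1.0f, 0.5f});
    SubScorer b = new ArrayScorer(new int[] {2, 3}, new float[] {0.25f, 2.0f});
    System.out.println(Arrays.toString(sumScores(Arrays.asList(a, b), 4)));
  }
}
```

Because each subscorer is drained independently, the loop never tracks how many subscorers matched a given document. Supporting a coordination factor would need a second per-document counter array, which this sketch leaves out.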

Regards,
Paul Elschot


