lucene-dev mailing list archives

From Dmitry Serebrennikov
Subject Re: Token retrieval question
Date Wed, 10 Oct 2001 17:56:31 GMT
Doug, thanks for posting these. I may end up going in this direction in 
the next few days and will use this as a blueprint. Maybe I'll put in a 
first-pass implementation, and you can tune it further when you get 
to it.

Question on term numbers though: what would be an approach for merging 
these across multiple IndexReaders for the purposes of MultiSearcher?

Doug Cutting wrote:

>Right now, Lucene does not have good support for what you're doing.  Lucene
>as it stands is designed to support basic search, not other statistical text
>processing.  However there are two features that I would like to add to
>Lucene that would help you.

>This would add two methods to IndexReader:
>  public TermFreqVector getTermFreqVector(int docNumber);
>  public Term getTerm(int termNumber);
>The TermFreqVector class would be defined something like:
>  public class TermFreqVector {
>    public int[] getTermNumbers();
>    public int[] getTermFrequencies();
>  }
>The term number array would be sorted.  The frequency of the term numbered
>getTermNumbers()[i] is getTermFrequencies()[i].
>Another class that would be useful is something like:
>  public class TermWeightVector {
>    public int[] getTermNumbers();
>    public float[] getTermWeights();
>    public void add(TermWeightVector other);
>    public float distance(TermWeightVector other);
>  }
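
The two classes quoted above are only outlined, so here is a minimal sketch of how they might be filled in. It is not Lucene code. The constructors, the fromFrequencies helper, the use of raw frequencies as weights, and the choice of Euclidean distance are all assumptions made for illustration; the quoted message says only that the term number array is sorted and that the two arrays line up by index.

```java
import java.util.Arrays;

/** Sparse per-document term frequencies: parallel arrays, term numbers ascending. */
final class TermFreqVector {
    private final int[] termNumbers;
    private final int[] termFrequencies;

    // Hypothetical constructor; the quoted proposal does not define one.
    TermFreqVector(int[] termNumbers, int[] termFrequencies) {
        if (termNumbers.length != termFrequencies.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        this.termNumbers = termNumbers.clone();
        this.termFrequencies = termFrequencies.clone();
    }

    public int[] getTermNumbers() { return termNumbers.clone(); }
    public int[] getTermFrequencies() { return termFrequencies.clone(); }
}

/** Sparse weighted vector over the same term numbering. */
final class TermWeightVector {
    private int[] termNumbers;
    private float[] termWeights;

    // Hypothetical constructor; term numbers are expected in ascending order.
    TermWeightVector(int[] termNumbers, float[] termWeights) {
        if (termNumbers.length != termWeights.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        this.termNumbers = termNumbers.clone();
        this.termWeights = termWeights.clone();
    }

    /** Assumption: the raw frequency is used as the weight. */
    static TermWeightVector fromFrequencies(TermFreqVector v) {
        int[] freqs = v.getTermFrequencies();
        float[] weights = new float[freqs.length];
        for (int i = 0; i < freqs.length; i++) {
            weights[i] = freqs[i];
        }
        return new TermWeightVector(v.getTermNumbers(), weights);
    }

    public int[] getTermNumbers() { return termNumbers.clone(); }
    public float[] getTermWeights() { return termWeights.clone(); }

    /** Adds other into this vector in one merge pass over both sorted arrays. */
    public void add(TermWeightVector other) {
        int[] nums = new int[termNumbers.length + other.termNumbers.length];
        float[] wts = new float[nums.length];
        int i = 0, j = 0, n = 0;
        while (i < termNumbers.length || j < other.termNumbers.length) {
            // Long sentinels mark an exhausted side without colliding with any int.
            long a = i < termNumbers.length ? termNumbers[i] : Long.MAX_VALUE;
            long b = j < other.termNumbers.length ? other.termNumbers[j] : Long.MAX_VALUE;
            long t = Math.min(a, b);
            float w = 0f;
            if (a == t) w += termWeights[i++];
            if (b == t) w += other.termWeights[j++];
            nums[n] = (int) t;
            wts[n++] = w;
        }
        termNumbers = Arrays.copyOf(nums, n);
        termWeights = Arrays.copyOf(wts, n);
    }

    /** Assumption: Euclidean distance; a term missing on one side has weight 0. */
    public float distance(TermWeightVector other) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < termNumbers.length || j < other.termNumbers.length) {
            long a = i < termNumbers.length ? termNumbers[i] : Long.MAX_VALUE;
            long b = j < other.termNumbers.length ? other.termNumbers[j] : Long.MAX_VALUE;
            long t = Math.min(a, b);
            double d = 0.0;
            if (a == t) d += termWeights[i++];
            if (b == t) d -= other.termWeights[j++];
            sum += d * d;
        }
        return (float) Math.sqrt(sum);
    }
}
```

Keeping the term numbers sorted is what makes both add and distance a single linear merge, with no hashing or searching. Note that term numbers are only meaningful within the IndexReader that assigned them, so vectors from different readers cannot be combined this way without first mapping them to a common numbering.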
