lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: indexing help
Date Thu, 08 Jul 2004 17:01:59 GMT
John Wang wrote:
>      The solution you proposed is still a derivative of creating a
> dummy document stream. Taking the same example, java (5), lucene (6),
> VectorTokenStream would create a total of 11 Tokens whereas only 2 are
> necessary.

That's easy to fix.  We just need to reuse the token:

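import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class VectorTokenStream extends TokenStream {
   private final String[] terms;   // distinct terms, in order
   private final int[] freqs;      // freqs[i] = occurrences of terms[i]
   private int term = -1;          // index of the current term
   private int freq = 0;           // occurrences of the current term left to emit
   private Token token;            // reused for every occurrence of the current term
   public VectorTokenStream(String[] terms, int[] freqs) {
     this.terms = terms;
     this.freqs = freqs;
   }
   public Token next() {
     // Advance to the next term; the loop skips terms with a non-positive frequency.
     while (freq <= 0) {
       term++;
       if (term >= terms.length)
         return null;
       token = new Token(terms[term], 0, 0);
       freq = freqs[term];
     }
     freq--;
     return token;
   }
}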

Then only two tokens are created, as you desire.
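
To check the count without a Lucene jar on the classpath, here is a rough standalone sketch of the same reuse logic.  SimpleToken, ReusingStream and drain() are made-up stand-ins for illustration, not Lucene classes, and the loop that skips non-positive frequencies is my own assumption about how such input should be handled:

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class TokenReuseDemo {
    // Hypothetical stand-in for Lucene's Token; only the term text matters here.
    static final class SimpleToken {
        final String text;
        SimpleToken(String text) { this.text = text; }
    }

    // The token-reuse logic, without the Lucene base class.
    static final class ReusingStream {
        private final String[] terms;
        private final int[] freqs;
        private int term = -1;      // index of the current term
        private int freq = 0;       // occurrences of the current term left to emit
        private SimpleToken token;  // reused for each occurrence of the current term

        ReusingStream(String[] terms, int[] freqs) {
            this.terms = terms;
            this.freqs = freqs;
        }

        SimpleToken next() {
            // Assumption: terms with a non-positive frequency are skipped.
            while (freq <= 0) {
                term++;
                if (term >= terms.length) return null;
                token = new SimpleToken(terms[term]);
                freq = freqs[term];
            }
            freq--;
            return token;
        }
    }

    // Drains the stream; returns {tokens returned, distinct token objects seen}.
    public static int[] drain(String[] terms, int[] freqs) {
        ReusingStream stream = new ReusingStream(terms, freqs);
        Map<SimpleToken, Boolean> distinct = new IdentityHashMap<>();
        int returned = 0;
        for (SimpleToken t = stream.next(); t != null; t = stream.next()) {
            returned++;
            distinct.put(t, Boolean.TRUE);  // keyed by identity, not by text
        }
        return new int[] { returned, distinct.size() };
    }

    public static void main(String[] args) {
        int[] counts = drain(new String[] {"java", "lucene"}, new int[] {5, 6});
        System.out.println(counts[0] + " returned, " + counts[1] + " distinct objects");
    }
}
```

For the java (5), lucene (6) example, next() still returns a token 11 times, but only 2 distinct objects are ever handed out.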

If for some reason you don't want to create a dummy document stream,
then you could instead implement an IndexReader that delivers a
synthetic index for a single document.  Then use
IndexWriter.addIndexes() to turn this into a real, FSDirectory-based
index.  However, that would be a lot more work and only very marginally
faster.  So I'd stick with the approach I've outlined above.  (Note:
this code has not been compiled or run.  It may have bugs.)

Doug
