lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Wang <john.w...@gmail.com>
Subject Re: indexing help
Date Thu, 08 Jul 2004 19:15:52 GMT
Thanks Doug. I will do just that.

Just for my education, can you maybe elaborate on using the
"implement an IndexReader that delivers a
synthetic index" approach?

Thanks in advance

-John

On Thu, 08 Jul 2004 10:01:59 -0700, Doug Cutting <cutting@apache.org> wrote:
> John Wang wrote:
> >      The solution you proposed is still a derivative of creating a
> > dummy document stream. Taking the same example, java (5), lucene (6),
> > VectorTokenStream would create a total of 11 Tokens whereas only 2 is
> > neccessary.
> 
> That's easy to fix.  We just need to reuse the token:
> 
> public class VectorTokenStream extends TokenStream {
>   private int term = -1;
>   private int freq = 0;
>   private Token token;
>   public VectorTokenStream(String[] terms, int[] freqs) {
>     this.terms = terms;
>     this.freqs = freqs;
>   }
>   public Token next() {
>     if (freq == 0) {
>       term++;
>       if (term >= terms.length)
>         return null;
>       token = new Token(terms[term], 0, 0);
>       freq = freqs[term];
>     }
>     freq--;
>     return token;
>   }
> }
> 
> Then only two tokens are created, as you desire.
> 
> If you for some reason don't want to create a dummy document stream,
> then you could instead implement an IndexReader that delivers a
> synthetic index for a single document.  Then use
> IndexWriter.addIndexes() to turn this into a real, FSDirectory-based
> index.  However that would be a lot more work and only very marginally
> faster.  So I'd stick with the approach I've outlined above.  (Note:
> this code has not been compiled or run.  It may have bugs.)
> 
> 
> 
> Doug
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message