lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject pieces missing in reusable analyzers?
Date Mon, 10 Aug 2009 22:10:05 GMT
I had thought that implementing reusable analyzers in Solr was going
to be a piece of cake... but either I'm missing something, or Lucene is
missing something.

Here's the way that one used to create custom analyzers:

class CustomAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowerCaseFilter(
        new NGramTokenFilter(new StandardTokenizer(reader)));
  }
}


Now let's try to make this reusable:

class CustomAnalyzer2 extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowerCaseFilter(
        new NGramTokenFilter(new StandardTokenizer(reader)));
  }

  @Override
  public TokenStream reusableTokenStream(String fieldName, Reader reader)
      throws IOException {
    TokenStream ts = getPreviousTokenStream();
    if (ts == null) {
      ts = tokenStream(fieldName, reader);
      setPreviousTokenStream(ts);
      return ts;
    } else {
      // uh... how do I reset a token stream?
      return ts;
    }
  }
}


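See the missing piece?  The cached chain is still wired to the Reader
from the first call, and the analyzer only holds the outermost
TokenStream, so there is no way to point it at the new Reader.  Seems
like TokenStream needs a reset(Reader r) method or something?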
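
To make the gap concrete, here is a rough, self-contained sketch of what
I'd want to be able to write. Everything in it is a hypothetical
stand-in (SketchTokenStream, SketchWhitespaceTokenizer,
SketchLowerCaseFilter, SketchReusableAnalyzer), not Lucene's actual
classes, and the whitespace tokenizer is only there to keep it short.
The assumption is that the analyzer caches both ends of the chain: the
source tokenizer, so it can be re-pointed at the new Reader, and the
outermost filter, which is what gets handed back to the caller.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token stream: next() returns null at end of input.
abstract class SketchTokenStream {
    abstract String next() throws IOException;
}

// Source of the chain; the only stage that holds the Reader.
class SketchWhitespaceTokenizer extends SketchTokenStream {
    private Reader input;

    SketchWhitespaceTokenizer(Reader input) { this.input = input; }

    // The "missing piece": re-point an existing tokenizer at new input.
    void reset(Reader newInput) { this.input = newInput; }

    @Override
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace((char) c)) {
                if (sb.length() > 0) break;   // end of the current token
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// A stateless filter wrapping another stream.
class SketchLowerCaseFilter extends SketchTokenStream {
    private final SketchTokenStream in;

    SketchLowerCaseFilter(SketchTokenStream in) { this.in = in; }

    @Override
    String next() throws IOException {
        String t = in.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

class SketchReusableAnalyzer {
    // Holds both ends of the chain so the source can be reset on reuse.
    private static final class SavedStreams {
        SketchWhitespaceTokenizer source;
        SketchTokenStream result;
    }

    // One cached chain per thread (assumed here; a chain is not thread-safe).
    private final ThreadLocal<SavedStreams> previous = new ThreadLocal<SavedStreams>();

    SketchTokenStream reusableTokenStream(String fieldName, Reader reader) {
        SavedStreams saved = previous.get();
        if (saved == null) {
            // First call on this thread: build the chain and remember it.
            saved = new SavedStreams();
            saved.source = new SketchWhitespaceTokenizer(reader);
            saved.result = new SketchLowerCaseFilter(saved.source);
            previous.set(saved);
        } else {
            // Later calls: keep the objects, swap only the input.
            saved.source.reset(reader);
        }
        return saved.result;
    }
}

class ReusableAnalyzerSketch {
    // Drains the reusable stream for one piece of text into a list.
    static List<String> analyze(SketchReusableAnalyzer a, String text) {
        try {
            SketchTokenStream ts = a.reusableTokenStream("body", new StringReader(text));
            List<String> out = new ArrayList<String>();
            for (String t = ts.next(); t != null; t = ts.next()) {
                out.add(t);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchReusableAnalyzer a = new SketchReusableAnalyzer();
        System.out.println(analyze(a, "Hello World"));  // [hello, world]
        System.out.println(analyze(a, "Second DOC"));   // [second, doc]
    }
}
```

Note that this only covers the easy case. A filter that buffers state
between tokens (an n-gram filter presumably does) would also need some
way to clear that state when the input changes, so resetting the source
alone is probably not enough for the chain above.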

-Yonik
http://www.lucidimagination.com
