lucene-java-user mailing list archives

From Michael Sokolov <>
Subject Re: Seemingly very difficult to wrap an Analyzer with CharFilter
Date Thu, 13 Jun 2013 00:26:12 GMT
On 6/12/2013 7:02 PM, Steven Schlansker wrote:
> On Jun 12, 2013, at 3:44 PM, Michael Sokolov <> wrote:
>> You may not have noticed that CharFilter extends Reader.  The expected pattern here
>> is that you chain instances together -- your CharFilter should act as *input* to the
>> Analyzer, I think.  Don't think in terms of extending these analysis classes (except the
>> base ones designed for it): compose them so that each consumes the one before it.
> Hi Mike,
> Hm, that may work out.  I am a little surprised because I thought the intention is that
> you set the Analyzer up as part of the configuration, and when you add documents, the
> analyzer takes care of all text processing.  In particular this means that now I have to
> ensure that the same transformation is done at query time, and I thought the analyzer
> abstraction was supposed to avoid this.
> But if this is how it should be done, it could work.  Thanks for the pointer.
> Steven
Um I'm sorry I was in a hurry and forgot to think... I went back and 
looked at my code and found the pattern was different from what I was 
thinking.  I have:

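import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// IndexConfiguration is application code (not shown) that holds the Lucene Version constant.
public final class DefaultAnalyzer extends Analyzer {

     @Override
     protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
         // The tokenizer reads the (possibly char-filtered) reader.
         Tokenizer tokenizer = new StandardTokenizer(IndexConfiguration.LUCENE_VERSION, reader);
         // Each token filter consumes the stream before it.
         TokenStream tokenStream = new LowerCaseFilter(IndexConfiguration.LUCENE_VERSION, tokenizer);
         // ASCIIFoldingFilter
         // Stemming
         return new TokenStreamComponents(tokenizer, tokenStream);
     }
}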

You were exactly right that subclassing Analyzer and overriding 
initReader is the way to go.
The composition I was talking about can happen among filters.  I guess 
you have to duplicate the internals of StandardAnalyzer, but I don't 
think there's all that much in there?
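
To make the initReader idea concrete without tying it to a particular Lucene version, here is a minimal sketch in plain java.io. It is an analogy, not Lucene code: MiniAnalyzer, DashSplittingAnalyzer and DashToSpaceReader are hypothetical names, and the one-argument initReader hook is a simplification of Lucene's method. It only shows the shape of the pattern: the char filter is itself a Reader, and the analyzer subclass wraps the incoming reader with it before tokenizing, so the same transformation applies at index time and query time.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a CharFilter: a Reader that rewrites characters from another Reader. */
class DashToSpaceReader extends FilterReader {
    DashToSpaceReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c == '-' ? ' ' : c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of input, in which case the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (buf[i] == '-') {
                buf[i] = ' ';
            }
        }
        return n;
    }
}

/** Hypothetical stand-in for an analyzer that exposes an initReader-style hook. */
class MiniAnalyzer {
    /** Hook: subclasses may wrap the raw reader; the default passes it through unchanged. */
    protected Reader initReader(Reader reader) {
        return reader;
    }

    /** Reads the text through the hook, lowercases it, and splits on whitespace. */
    public final List<String> analyze(String text) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = initReader(new StringReader(text))) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> tokens = new ArrayList<>();
        for (String t : sb.toString().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}

/** Subclass that only overrides the hook; the tokenizing logic is inherited untouched. */
class DashSplittingAnalyzer extends MiniAnalyzer {
    @Override
    protected Reader initReader(Reader reader) {
        return new DashToSpaceReader(reader);
    }
}

class InitReaderSketch {
    public static void main(String[] args) {
        System.out.println(new MiniAnalyzer().analyze("Char-Filter Demo"));          // [char-filter, demo]
        System.out.println(new DashSplittingAnalyzer().analyze("Char-Filter Demo")); // [char, filter, demo]
    }
}
```

In real Lucene code the equivalent override would return a CharFilter wrapping the reader, and the exact initReader signature depends on the Lucene version in use.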

I used AnalyzerWrapper for something -- um switching between multiple 
analyzers based on the input.  But it doesn't allow you to do anything 
with the internals of the analyzer(s) it wraps.

