lucene-java-user mailing list archives

From Steven Schlansker
Subject Seemingly very difficult to wrap an Analyzer with CharFilter
Date Tue, 11 Jun 2013 23:52:59 GMT
Hi everyone,

I am trying to add a CharFilter to my Analyzer.  I started with a StandardAnalyzer wrapped
with an ASCIIFoldingFilter.  Then I realized that it does not handle searches well for names
that include punctuation.  For example, I want a PrefixQuery "pf" to match "P.F. Chang's" and
"zaras" to match "Zara's".

It seems that the easiest approach here is to strip all punctuation before analysis.
Per the Analyzer package documentation, that means I should use a CharFilter.
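
To make the idea concrete, here is a minimal sketch of the kind of pre-tokenization
stripping I have in mind.  It is written against plain java.io so that it does not depend
on any Lucene API.  The class name PunctuationStrippingReader, the strip() helper, and the
rule for what counts as punctuation are all my own assumptions for illustration.  A real
Lucene CharFilter would additionally have to correct offsets, which this sketch ignores.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: drops punctuation characters from the wrapped Reader. */
class PunctuationStrippingReader extends FilterReader {

    PunctuationStrippingReader(Reader in) {
        super(in);
    }

    // Assumption: "punctuation" is anything that is not a letter, digit,
    // or whitespace. Surrogates are kept so supplementary characters survive.
    private static boolean isPunctuation(char c) {
        return !Character.isLetterOrDigit(c)
                && !Character.isWhitespace(c)
                && !Character.isSurrogate(c);
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = super.read();
        } while (c != -1 && isPunctuation((char) c));
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int written = 0;
        // Keep reading until at least one character survives or input ends,
        // so that a chunk made only of punctuation never yields a 0-length read.
        while (written == 0) {
            int n = super.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            for (int i = off; i < off + n; i++) {
                if (!isPunctuation(buf[i])) {
                    buf[off + written++] = buf[i];
                }
            }
        }
        return written;
    }

    /** Convenience method for trying the reader out on a String. */
    static String strip(String text) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new PunctuationStrippingReader(new StringReader(text))) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With this, PunctuationStrippingReader.strip("P.F. Chang's") gives "PF Changs", so after
lowercasing a prefix query for "pf" would have something to match.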

However, it seems next to impossible to actually insert a CharFilter into the analyzer!

The JavaDoc for Analyzer.initReader says "Override this if you want to insert a CharFilter".

If my code extends Analyzer, I can override initReader, but I cannot delegate createComponents
to my base StandardAnalyzer, because that method is protected.  I cannot delegate tokenStream to
my base analyzer either, because it is final.  So a subclass of Analyzer seemingly cannot use
another Analyzer to do its dirty work.

There is an AnalyzerWrapper class that seems perfect for what I want!  I can provide a base
analyzer and only override the pieces that I want.  Except … initReader is overridden already
to delegate to the base analyzer, and this override is "final"!  Bummer!

I guess I could put my Analyzer in the org.apache.lucene.analysis package so that I
can access the protected createComponents method, but this seems like a disgustingly hacky
way to bypass the public API that I really should use.

Am I missing something glaring here?  How can I amend a StandardAnalyzer to use a custom CharFilter?

Thanks for any guidance,
