lucene-java-user mailing list archives

From Michael Sokolov <>
Subject Re: Seemingly very difficult to wrap an Analyzer with CharFilter
Date Wed, 12 Jun 2013 22:44:48 GMT
You may not have noticed that CharFilter extends Reader.  The expected 
pattern here is that you chain instances together -- your CharFilter 
should act as *input* to the Analyzer, I think.  Don't think in terms of 
extending these analysis classes (except the base ones designed for it): 
compose them so that each consumes the one before it.
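
A minimal sketch of that chaining idea, using only java.io so it runs without Lucene on the classpath. PunctuationStrippingReader and CharFilterChainingDemo are hypothetical names, and the FilterReader here is only a stand-in for a real CharFilter (which would also have to correct token offsets). With Lucene available, the filtered Reader would presumably be handed to the analyzer, e.g. via Analyzer.tokenStream(fieldName, reader), instead of being drained into a String.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/**
 * Hypothetical stand-in for a CharFilter: wraps another Reader and drops
 * every character that is not a letter, digit, or whitespace.
 * Works per UTF-16 char, so supplementary characters are dropped too.
 */
class PunctuationStrippingReader extends FilterReader {
    PunctuationStrippingReader(Reader in) {
        super(in);
    }

    private static boolean keep(int c) {
        return Character.isLetterOrDigit(c) || Character.isWhitespace(c);
    }

    @Override
    public int read() throws IOException {
        int c;
        // Skip over punctuation until a kept character or end of input.
        while ((c = in.read()) != -1) {
            if (keep(c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }
}

class CharFilterChainingDemo {
    /** Reads everything from r; stands in for whatever consumes the Reader. */
    static String drain(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The filter is the *input* to the next stage, not a subclass of it.
        Reader filtered = new PunctuationStrippingReader(new StringReader("P.F. Chang's"));
        // Assumption: with Lucene, this Reader would go to analyzer.tokenStream(field, filtered).
        System.out.println(drain(filtered)); // prints "PF Changs"
    }
}
```

The point of the design is that nothing here extends or reaches into the downstream consumer; the filter only needs to be a Reader, so any analyzer that accepts a Reader can be left untouched.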


On 6/11/2013 7:52 PM, Steven Schlansker wrote:
> Hi everyone,
> I am trying to add a CharFilter to my Analyzer.  I started with a StandardAnalyzer wrapped
> with an ASCIIFoldingFilter.  Then I realized that it does not handle searches for names that
> include punctuation well, for example I want a PrefixQuery "pf" to match "P.F. Chang's" or
> "zaras" to match "Zara's".
> It seems that the easiest plan of attack here is to filter out all punctuation before
> analysis.  Per the Analyzer package documentation, that means I should use a CharFilter.
> However, it seems next to impossible to actually insert a CharFilter into the analyzer!
> The JavaDoc for Analyzer.initReader says "Override this if you want to insert a CharFilter".
> If my code extends Analyzer, I can extend initReader but I cannot delegate createComponents
> to my base StandardAnalyzer, as it is protected.  I cannot delegate tokenStream to my base
> analyzer, because it is final.  So a subclass of Analyzer seemingly cannot use another Analyzer
> to do its dirty work.
> There is an AnalyzerWrapper class that seems perfect for what I want!  I can provide
> a base analyzer and only override the pieces that I want.  Except … initReader is overridden
> already to delegate to the base analyzer, and this override is "final"!  Bummer!
> I guess I could have my Analyzer be in the org.apache.lucene.analyzers package and then
> I can access the protected createComponents method, but this seems like a disgustingly hacky
> way to bypass the public API that I really should use.
> Am I missing something glaring here?  How can I amend a StandardAnalyzer to use a custom
> Thanks for any guidance,
> Steven
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
