lucene-java-user mailing list archives

From Steven Schlansker <>
Subject Using an AnalyzerWrapper with ASCIIFoldingFilter
Date Fri, 15 Mar 2013 18:18:26 GMT
Hi everyone,

I am trying to port forward to 4.2 some Lucene 3.2-era code that uses the ASCIIFoldingFilter.
The token stream handling has changed significantly since then, and I cannot figure out what
I am doing wrong.

It seems that I should extend AnalyzerWrapper so that I can intercept the TokenStream and
filter it with the ASCIIFoldingFilter.

I have written the following code:

public final class TokenFilterAnalyzerWrapper extends AnalyzerWrapper {
    private final Analyzer baseAnalyzer;
    private final TokenFilterFactory tokenFilterFactory;

    public TokenFilterAnalyzerWrapper(Analyzer baseAnalyzer, TokenFilterFactory tokenFilterFactory) {
        this.baseAnalyzer = baseAnalyzer;
        this.tokenFilterFactory = tokenFilterFactory;
    }

    @Override
    public void close() {
        baseAnalyzer.close();
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
        return baseAnalyzer;
    }

    @Override
    protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
        return new TokenStreamComponents(components.getTokenizer(),
                tokenFilterFactory.create(components.getTokenStream()));
    }
}
and the following test case:

public class TokenFilterAnalyzerWrapperTest {
    @Test
    public void testFilter() throws Exception {
        char[] expected = {'a', 'e', 'i', 'o', 'u'};
        try (Analyzer analyzer = new TokenFilterAnalyzerWrapper(
                new StandardAnalyzer(Version.LUCENE_42), new ASCIIFoldingFilterFactory())) {
            TokenStream stream = analyzer.tokenStream("test", new StringReader("a é î ø ü"));

            for (int i = 0; i < 5; i++) {
                assertTrue(stream.incrementToken());
                assertEquals(Character.toString(expected[i]),
                        stream.getAttribute(CharTermAttribute.class).toString());
            }
        }
    }
}

but all I can produce is this NullPointerException:

	at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java)
	at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java)
	at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java)
	at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java)
	at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java)
	at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java)
	at org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter.incrementToken(ASCIIFoldingFilter.java)

The failing line in zzRefill is:

    /* finally: fill the buffer with new input */
    int numRead = zzReader.read(zzBuffer, zzEndRead, zzBuffer.length - zzEndRead);

The reader is clearly the unexpectedly null value; however, I cannot figure out how to set
it correctly.

Through experimentation, it seems that I can work around some of the problems by calling reset()
and setReader() at various points.
However, I always end up with some other exception buried deep inside Lucene, so I believe I am
still missing a piece of the puzzle.
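
For reference, here is the consume pattern I believe the 4.x TokenStream javadocs prescribe
(reset() before the first incrementToken(), then end() and close()); the field name and input
text are just placeholders:

    TokenStream stream = analyzer.tokenStream("test", new StringReader("a é î ø ü"));
    try {
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();                  // mandatory in 4.x before the first incrementToken()
        while (stream.incrementToken()) {
            System.out.println(term.toString());
        }
        stream.end();                    // record final offset state
    } finally {
        stream.close();
    }

My test above skips reset(), which I suspect is related, but adding it only moves the failure
elsewhere for me.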

Any help greatly appreciated!


