lucene-java-user mailing list archives

From Steven Schlansker <ste...@likeness.com>
Subject Using an AnalyzerWrapper with ASCIIFoldingFilter
Date Fri, 15 Mar 2013 18:18:26 GMT
Hi everyone,

I am trying to port some Lucene 3.2-era code that uses the ASCIIFoldingFilter forward to 4.2.
The token stream handling has changed significantly since then, and I cannot figure out what
I am doing wrong.

It seems that I should extend AnalyzerWrapper so that I can intercept the TokenStream and
filter it with the ASCIIFoldingFilter.

I have written the following code:

public final class TokenFilterAnalyzerWrapper extends AnalyzerWrapper {
    private final Analyzer baseAnalyzer;
    private final TokenFilterFactory tokenFilterFactory;

    public TokenFilterAnalyzerWrapper(Analyzer baseAnalyzer, TokenFilterFactory tokenFilterFactory)
    {
        this.baseAnalyzer = baseAnalyzer;
        this.tokenFilterFactory = tokenFilterFactory;
    }

    @Override
    public void close() {
        baseAnalyzer.close();
        super.close();
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName)
    {
        return baseAnalyzer;
    }

    @Override
    protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components)
    {
        return new TokenStreamComponents(components.getTokenizer(), tokenFilterFactory.create(components.getTokenStream()));
    }
}

and the following test case:

public class TokenFilterAnalyzerWrapperTest
{
    @Test
    public void testFilter() throws Exception
    {
        char[] expected = {'a', 'e', 'i', 'o', 'u'};
        try (Analyzer analyzer = new TokenFilterAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_42), new ASCIIFoldingFilterFactory())) {
            TokenStream stream = analyzer.tokenStream("test", new StringReader("a é î ø ü"));

            for (int i = 0; i < 5; i++) {
                assertTrue(stream.incrementToken());
                assertEquals(Character.toString(expected[i]), stream.getAttribute(CharTermAttribute.class).toString());
            }

            assertFalse(stream.incrementToken());
        }
    }
}

but all I can produce is this NullPointerException:
java.lang.NullPointerException
	at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923)
	at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
	at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:180)
	at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:49)
	at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
	at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
	at org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter.incrementToken(ASCIIFoldingFilter.java:71)
	at xyz.search.lucene.TokenFilterAnalyzerWrapperTest.testFilter(TokenFilterAnalyzerWrapperTest.java:27)

StandardTokenizerImpl.java:923 is
    /* finally: fill the buffer with new input */
    int numRead = zzReader.read(zzBuffer, zzEndRead,
                                            zzBuffer.length-zzEndRead);

The reader (zzReader) is clearly the unexpectedly null value; however, I cannot figure out how
to set it correctly.

Through experimentation, it seems that I can evade some problems by calling reset() and setReader()
at various points.
However, I always end up hitting some other exception buried deep in the analysis chain, so I
believe I am still missing some piece of the puzzle.
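For reference, my understanding from the 4.x TokenStream javadoc is that a consumer is now
required to follow a fixed workflow: fetch attributes, call reset() before the first
incrementToken(), then end() and close() when done. Here is a sketch of how I have been trying
to drive the stream (shown against a plain StandardAnalyzer rather than my wrapper, to keep it
self-contained); please correct me if I have the contract wrong:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ConsumeExample {
    public static void main(String[] args) throws Exception {
        try (Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_42)) {
            TokenStream stream = analyzer.tokenStream("test", new StringReader("a é î ø ü"));
            // Attributes are fetched from the stream before consuming it.
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                 // mandatory before the first incrementToken()
            while (stream.incrementToken()) {
                System.out.println(term.toString());
            }
            stream.end();                   // set final offset state
            stream.close();                 // release resources
        }
    }
}
```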

Any help greatly appreciated!

Thanks,
Steven


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

