Hi everyone,
I am trying to forward-port to 4.2 some Lucene 3.2-era code that uses ASCIIFoldingFilter.
Token stream handling has changed significantly since then, and I cannot figure out what
I am doing wrong.
It seems that I should extend AnalyzerWrapper so that I can intercept the TokenStream and
filter it with the ASCIIFoldingFilter.
I have written the following code:
public final class TokenFilterAnalyzerWrapper extends AnalyzerWrapper {

    private final Analyzer baseAnalyzer;
    private final TokenFilterFactory tokenFilterFactory;

    public TokenFilterAnalyzerWrapper(Analyzer baseAnalyzer, TokenFilterFactory tokenFilterFactory) {
        this.baseAnalyzer = baseAnalyzer;
        this.tokenFilterFactory = tokenFilterFactory;
    }

    @Override
    public void close() {
        baseAnalyzer.close();
        super.close();
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
        return baseAnalyzer;
    }

    @Override
    protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
        return new TokenStreamComponents(components.getTokenizer(),
                tokenFilterFactory.create(components.getTokenStream()));
    }
}
and the following test case:
public class TokenFilterAnalyzerWrapperTest {

    @Test
    public void testFilter() throws Exception {
        char[] expected = {'a', 'e', 'i', 'o', 'u'};
        try (Analyzer analyzer = new TokenFilterAnalyzerWrapper(
                new StandardAnalyzer(Version.LUCENE_42),
                new ASCIIFoldingFilterFactory())) {
            TokenStream stream = analyzer.tokenStream("test", new StringReader("a é î ø ü"));
            for (int i = 0; i < 5; i++) {
                assertTrue(stream.incrementToken());
                assertEquals(Character.toString(expected[i]),
                        stream.getAttribute(CharTermAttribute.class).toString());
            }
            assertFalse(stream.incrementToken());
        }
    }
}
but all I can produce is this NullPointerException:
java.lang.NullPointerException
at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923)
at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:180)
at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:49)
at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
at org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter.incrementToken(ASCIIFoldingFilter.java:71)
at xyz.search.lucene.TokenFilterAnalyzerWrapperTest.testFilter(TokenFilterAnalyzerWrapperTest.java:27)
StandardTokenizerImpl.java:923 is:

    /* finally: fill the buffer with new input */
    int numRead = zzReader.read(zzBuffer, zzEndRead,
                                zzBuffer.length - zzEndRead);
The zzReader is clearly the unexpectedly null value; however, I cannot figure out how to
set it correctly.
Through experimentation, it seems that I can evade some of these problems by calling reset()
and setReader() at various points, but I always end up with some other exception buried deep
in the stack, so I believe I am still missing a piece of the puzzle.
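For reference, this is the consumption pattern I have been experimenting with. It is my best
guess at the new 4.x stream lifecycle (reset() before the first incrementToken(), then end()
and close() when done); the placement of those calls is exactly the part I am unsure about:

```java
TokenStream stream = analyzer.tokenStream("test", new StringReader("a é î ø ü"));
// Grab the term attribute up front, before consuming the stream.
CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
// My understanding is that reset() is now mandatory before the first incrementToken().
stream.reset();
while (stream.incrementToken()) {
    System.out.println(term.toString());
}
stream.end();   // signal end-of-stream (final offset bookkeeping?)
stream.close(); // release the underlying Reader
```

Even with this ordering I still hit exceptions, so perhaps my wrapper is interfering with
how tokenStream() wires the Reader into the Tokenizer.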
Any help greatly appreciated!
Thanks,
Steven
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org