lucene-dev mailing list archives

From Spyros Kapnissis <ska...@yahoo.com>
Subject WhitespaceTokenizer 4.0 issue
Date Thu, 08 Nov 2012 13:20:46 GMT
Hello,


We noticed the following issue during our recent code migration to LUCENE_40. The test below
fails with an ArrayIndexOutOfBoundsException: -1. It passes only if tokenizer.reset()
is called before the first call to incrementToken().

@Test
public void whitespaceTokTest() throws IOException {
    String text = "a b c d";
    Tokenizer tokenizer = new WhitespaceTokenizer(Version.LUCENE_40, new StringReader(text));
    List<String> tokens = new ArrayList<String>();
    while (tokenizer.incrementToken()) {
        tokens.add(tokenizer.getAttribute(CharTermAttribute.class).toString());
    }
    assertEquals(tokens, Arrays.asList(new String[]{"a","b","c","d"}));
}

This used to work, at least until LUCENE_33. Is this a bug, or am I missing something? 
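
For comparison, here is a rough sketch of the variant that passes. It assumes the consumer workflow described in the TokenStream javadoc (reset(), then the incrementToken() loop, then end() and close()) and that lucene-core and lucene-analyzers-common 4.0 are on the classpath. The class and method names are made up for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class, not part of Lucene.
public class WhitespaceTokSketch {

    // Splits the text on whitespace, calling reset() before the first incrementToken().
    static List<String> tokenize(String text) throws IOException {
        Tokenizer tokenizer = new WhitespaceTokenizer(Version.LUCENE_40, new StringReader(text));
        // Fetch the attribute once, up front, instead of on every iteration.
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        try {
            tokenizer.reset();                    // the call missing from the failing test
            while (tokenizer.incrementToken()) {
                tokens.add(term.toString());
            }
            tokenizer.end();                      // end-of-stream bookkeeping
        } finally {
            tokenizer.close();                    // releases the underlying reader
        }
        return tokens;
    }
}
```

The only change that matters for the failure above is the reset() call. The end() and close() calls are included because the javadoc lists them as part of the same workflow.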

Thank you, 
Spyros