lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: WhitespaceTokenizer 4.0 issue
Date Thu, 08 Nov 2012 13:25:55 GMT
Read the documentation of the TokenStream API. It's been this way since 2.9.

I changed the code intentionally to throw this exception so people
would be forced to fix their code.
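
For reference, the consumer workflow the TokenStream javadocs describe is: reset(), then incrementToken() in a loop, then end() and close(). A minimal sketch of that workflow against the same WhitespaceTokenizer setup as the quoted test (the class name TokenStreamConsumer and the tokenize helper are illustrative, not part of Lucene):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamConsumer {
    public static List<String> tokenize(String text) throws IOException {
        Tokenizer tokenizer = new WhitespaceTokenizer(Version.LUCENE_40,
                new StringReader(text));
        // Grab the attribute once, outside the loop.
        CharTermAttribute termAtt =
                tokenizer.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        tokenizer.reset();  // mandatory before the first incrementToken();
                            // skipping it causes the exception reported below
        try {
            while (tokenizer.incrementToken()) {
                tokens.add(termAtt.toString());
            }
            tokenizer.end();    // record end-of-stream state (final offset)
        } finally {
            tokenizer.close();  // release the underlying Reader
        }
        return tokens;
    }
}
```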

On Thu, Nov 8, 2012 at 8:20 AM, Spyros Kapnissis <skapni@yahoo.com> wrote:
> Hello,
>
> We noticed the following issue during our recent code migration to LUCENE_40.
> The test below fails with an ArrayIndexOutOfBoundsException: -1. It passes
> only if tokenizer.reset() is called before incrementing the tokens.
>
> @Test
> public void whitespaceTokTest() throws IOException {
>     String text = "a b c d";
>     Tokenizer tokenizer = new WhitespaceTokenizer(Version.LUCENE_40,
>             new StringReader(text));
>     List<String> tokens = new ArrayList<String>();
>     while (tokenizer.incrementToken()) {
>         tokens.add(tokenizer.getAttribute(CharTermAttribute.class).toString());
>     }
>     assertEquals(tokens, Arrays.asList(new String[]{"a", "b", "c", "d"}));
> }
>
> This used to work, at least until LUCENE_33. Is this a bug, or am I missing
> something?
>
> Thank you,
> Spyros

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

