lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil
Date Wed, 21 Mar 2012 00:52:39 GMT


Robert Muir commented on LUCENE-3894:

I think we have bugs in some tokenizers. It's currently very hard to reproduce, and we get no random seed :(

I think the issue is maxWordLength=20. That's not long enough to catch bugs in tokenizers; we should exceed whatever buffer size they use, for example.

So I think we need to refactor this logic so that the multithreaded tests take maxWordLength, and ensure this parameter is always respected.

This way, tests for things like tokenizers can bump it up to something like CharTokenizer.IO_BUFFER_SIZE*2, or whatever makes sense to them, to ensure we really test them well.
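The idea above can be sketched in standalone form (this class and its IO_BUFFER_SIZE constant are illustrative stand-ins, not Lucene's actual test utilities): if maxWordLength is set to twice the tokenizer's buffer size, random runs routinely produce words that cross a buffer boundary, which maxWordLength=20 never can.

```java
import java.util.Random;

// Illustrative sketch: choose maxWordLength relative to the tokenizer's
// buffer size so that random words are long enough to cross buffer
// boundaries. IO_BUFFER_SIZE is a stand-in for CharTokenizer.IO_BUFFER_SIZE.
public class EvilWordLengths {
    static final int IO_BUFFER_SIZE = 4096; // assumed buffer size

    // Generate one random lowercase "word" of length 1..maxWordLength.
    static String randomWord(Random r, int maxWordLength) {
        int len = 1 + r.nextInt(maxWordLength);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) ('a' + r.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random r = new Random(42); // a real test would report its seed
        int maxWordLength = IO_BUFFER_SIZE * 2; // exceed the buffer, per the comment
        boolean crossedBuffer = false;
        for (int i = 0; i < 100; i++) {
            if (randomWord(r, maxWordLength).length() > IO_BUFFER_SIZE) {
                crossedBuffer = true;
            }
        }
        System.out.println("generated a word longer than the buffer: " + crossedBuffer);
    }
}
```

With 100 draws of uniform length up to 2x the buffer size, roughly half the words exceed the buffer, so the refill path gets exercised many times per run.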

I don't like the fact that only my stupid trivial test (testHugeDoc) found the IO-311 bug; what if we didn't have that silly test?

I'll add a patch.
> Make BaseTokenStreamTestCase a bit more evil
> --------------------------------------------
>                 Key: LUCENE-3894
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.6, 4.0
>         Attachments: LUCENE-3894.patch, LUCENE-3894.patch, LUCENE-3894.patch
> Throw an exception from the Reader while tokenizing, stop after not consuming all tokens,
> sometimes spoon-feed chars from the reader...
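The "spoon-feed" idea in the issue description can be sketched as a Reader wrapper that hands back only a few chars per read() call, stressing a tokenizer's buffer-refill logic; this SpoonFeedReader class is an illustrative sketch, not Lucene's actual test wrapper.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Random;

// Illustrative: a Reader that "spoon-feeds" at most a few chars per read(),
// so a consumer's buffer-refill path runs many times even for short input.
public class SpoonFeedReader extends Reader {
    private final Reader in;
    private final Random random;

    public SpoonFeedReader(Reader in, Random random) {
        this.in = in;
        this.random = random;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // Return 1..3 chars at a time instead of filling the caller's buffer.
        int n = Math.min(len, 1 + random.nextInt(3));
        return in.read(cbuf, off, n);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        Reader r = new SpoonFeedReader(new StringReader("hello world"), new Random(0));
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        // The full input still arrives, just in tiny increments.
        System.out.println(sb);
    }
}
```

A correct tokenizer must produce identical output under this wrapper; any divergence points at a bug in its refill handling.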

This message is automatically generated by JIRA.
