lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4343) Clear up more Tokenizer.setReader/TokenStream.reset issues
Date Thu, 30 Aug 2012 11:44:08 GMT


Michael McCandless commented on LUCENE-4343:


We had a lot of tokenizers abusing setReader for stuff they should be doing in reset!

It would be really nice to make setReader final, but that sounds like a challenge...

> Clear up more Tokenizer.setReader/TokenStream.reset issues
> ----------------------------------------------------------
>                 Key: LUCENE-4343
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>            Reporter: Robert Muir
>         Attachments: LUCENE-4343.patch
> spinoff from user-list thread.
> I think the rename helps, but the javadocs still have problems: they seem to only describe a totally wacky case (CachingTokenFilter) and not the normal case.
> Ideally setReader would be final, I think, but there are a few crazy tokenstreams to fix before I could make that work. It would also need something hackish so MockTokenizer's state machine is still functional.
> But I worked on fixing up the mess in our various tokenstreams, which is easy for the most part.
> As part of this I found that throwing best-effort exceptions when the consumer is broken, where doing so costs nothing, was really useful for flushing out test bugs (in tests that don't use MockTokenizer, which they really should).
> For example:
> {noformat}
> -  private int offset = 0, bufferIndex = 0, dataLen = 0, finalOffset = 0;
> +  // note: bufferIndex is -1 here to best-effort AIOOBE consumers that don't call reset()
> +  private int offset = 0, bufferIndex = -1, dataLen = 0, finalOffset = 0;
> {noformat}
> I think this is worth exploring more... this was really effective at finding broken tests etc. We should see if we can be more thorough, and ideally throw better exceptions, when consumers are broken and it's free.
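
For anyone following the thread, here is a minimal, self-contained sketch of the bufferIndex = -1 idea from the quoted diff. It assumes a simplified whitespace tokenizer of my own invention: SketchCharTokenizer, its next() method, and the 16-char buffer are all hypothetical and are not Lucene classes or the code in the attached patch. Per-stream state is initialized only in reset(), so a consumer that calls setReader() and then skips reset() indexes the buffer at -1 and fails immediately, at no extra cost on the normal path.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a char-buffering tokenizer; NOT a Lucene class. */
class SketchCharTokenizer {
    private final char[] ioBuffer = new char[16];
    private Reader input;
    // Sentinel: -1 until reset() runs, so a consumer that skips reset()
    // reads ioBuffer[-1] and gets an ArrayIndexOutOfBoundsException.
    private int bufferIndex = -1;
    private int dataLen = 0;

    /** Only swaps the input; per-stream state is left to reset(). */
    void setReader(Reader reader) {
        this.input = reader;
    }

    /** Initializes per-stream state; must be called before next(). */
    void reset() {
        bufferIndex = 0;
        dataLen = 0;
    }

    /** Returns the next whitespace-delimited token, or null at end of input. */
    String next() {
        StringBuilder token = new StringBuilder();
        while (true) {
            if (bufferIndex >= dataLen) { // false when bufferIndex is still -1
                try {
                    dataLen = input.read(ioBuffer, 0, ioBuffer.length);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                bufferIndex = 0;
                if (dataLen <= 0) { // end of stream
                    dataLen = 0;
                    return token.length() > 0 ? token.toString() : null;
                }
            }
            char c = ioBuffer[bufferIndex++]; // fails fast if reset() was skipped
            if (Character.isWhitespace(c)) {
                if (token.length() > 0) {
                    return token.toString();
                }
            } else {
                token.append(c);
            }
        }
    }
}

class ResetSentinelDemo {
    public static void main(String[] args) {
        // Correct consumer: setReader(), then reset(), then consume.
        SketchCharTokenizer good = new SketchCharTokenizer();
        good.setReader(new StringReader("hello world"));
        good.reset();
        for (String t = good.next(); t != null; t = good.next()) {
            System.out.println(t);
        }

        // Broken consumer: forgets reset() and trips the sentinel.
        SketchCharTokenizer broken = new SketchCharTokenizer();
        broken.setReader(new StringReader("hello world"));
        try {
            broken.next();
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("consumer did not call reset()");
        }
    }
}
```

The check is best-effort only: in this sketch the sentinel is armed once at construction, so it catches a consumer that never calls reset(), but not one that reuses the instance with a new reader and forgets reset() the second time.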
