From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (LUCENE-4343) Clear up more Tokenizer.setReader/TokenStream.reset issues
Date Thu, 30 Aug 2012 19:20:08 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-4343.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
                   5.0
    
> Clear up more Tokenizer.setReader/TokenStream.reset issues
> ----------------------------------------------------------
>
>                 Key: LUCENE-4343
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4343
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>            Reporter: Robert Muir
>             Fix For: 5.0, 4.0
>
>         Attachments: LUCENE-4343.patch, LUCENE-4343.patch, LUCENE-4343.patch, LUCENE-4343.patch
>
>
> Spinoff from a user-list thread.
> I think the rename helps, but the javadocs still have problems: they seem to describe only a totally wacky case (CachingTokenFilter) and not the normal case.
> Ideally setReader would be final, I think, but there are a few crazy tokenstreams to fix before I could make that work. It would also need something hackish so that MockTokenizer's state machine still functions.
> But I worked on fixing up the mess in our various tokenstreams, which is easy for the most part.
> As part of this, I found that best-effort exceptions, thrown when the consumer is broken and the check costs nothing, were really useful for flushing out test bugs (in tests that don't use MockTokenizer, which they really should).
> For example:
> {noformat}
> -  private int offset = 0, bufferIndex = 0, dataLen = 0, finalOffset = 0;
> +  // note: bufferIndex is -1 here to best-effort AIOOBE consumers that don't call reset()
> +  private int offset = 0, bufferIndex = -1, dataLen = 0, finalOffset = 0;
> {noformat}
> I think this is worth exploring more... this was really effective at finding broken tests etc. We should see if we can be more thorough, and ideally throw better exceptions, when consumers are broken and it's free.
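
A minimal, self-contained sketch of the trick in the diff above, assuming a refill loop of the shape "if bufferIndex >= dataLen, read more input". SketchTokenizer and its next() method are hypothetical stand-ins for illustration only, not Lucene classes or APIs. Because bufferIndex starts at -1, a consumer that skips reset() bypasses the refill check and indexes the buffer at -1, which fails at no extra cost to correct consumers.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical whitespace tokenizer; illustrates the idea, not Lucene's code. */
class SketchTokenizer {
    private final char[] buffer = new char[1024];
    private Reader input;
    // note: bufferIndex is -1 here to best-effort AIOOBE consumers that don't call reset()
    private int bufferIndex = -1, dataLen = 0;

    /** Supplies new input; the consumer must call reset() before reading. */
    void setReader(Reader reader) {
        input = reader;
        bufferIndex = -1;
        dataLen = 0;
    }

    /** Makes the tokenizer usable: the next read starts with a buffer refill. */
    void reset() {
        bufferIndex = 0;
        dataLen = 0;
    }

    /** Returns the next whitespace-delimited token, or null at end of input. */
    String next() {
        StringBuilder token = new StringBuilder();
        while (true) {
            if (bufferIndex >= dataLen) { // false when bufferIndex == -1
                int read;
                try {
                    read = input.read(buffer, 0, buffer.length);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                if (read <= 0) { // end of input: emit any pending token
                    bufferIndex = 0;
                    dataLen = 0;
                    return token.length() > 0 ? token.toString() : null;
                }
                dataLen = read;
                bufferIndex = 0;
            }
            // Throws ArrayIndexOutOfBoundsException if reset() was skipped.
            char c = buffer[bufferIndex];
            bufferIndex++;
            if (Character.isWhitespace(c)) {
                if (token.length() > 0) {
                    return token.toString();
                }
            } else {
                token.append(c);
            }
        }
    }
}
```

In this sketch, calling setReader(...) and then next() without reset() fails immediately, while the setReader(...), reset(), next() sequence tokenizes normally. The exception is only best-effort: it names no cause, which is why the description asks for better exceptions where they are free.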

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

