lucene-dev mailing list archives

From Eran Sevi <erans...@gmail.com>
Subject Re: Bug in StandardAnalyzer + StopAnalyzer?
Date Sun, 15 Nov 2009 17:10:27 GMT
OK.
Thanks for the clarification.
It's a bit confusing - maybe the comments should be updated so other people
won't make the same mistake I did.

On Sun, Nov 15, 2009 at 7:02 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

>  I checked again, reset() on the top filter does not need to be there, as
> the indexer calls it automatically as the first call after
> reusableTokenStream. For reuse, only reset(Reader) must be called. It's a
> little bit strange that both methods have the same name; the reset(Reader)
> one has a completely different meaning.
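The distinction Uwe draws between the two same-named methods can be sketched with a minimal model of the 2.9-era API. The class names below mirror Lucene's, but the bodies are simplified stand-ins written for illustration, not the real implementations:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Minimal model of the 2.9-era API: class names mirror Lucene's, but the
// bodies are simplified stand-ins, not the real implementations.
public class ResetDemo {

    abstract static class TokenStream {
        abstract String next() throws IOException; // null = exhausted
        // Resets per-stream iteration state. The indexer calls this itself
        // right after reusableTokenStream(), so callers need not.
        void reset() throws IOException {}
    }

    abstract static class Tokenizer extends TokenStream {
        Reader input;
        Tokenizer(Reader input) { this.input = input; }
        // Entirely different meaning despite the shared name: install the
        // next document's Reader so this instance can be reused.
        void reset(Reader input) throws IOException { this.input = input; }
    }

    // Splits the Reader's entire content on whitespace.
    static class SimpleTokenizer extends Tokenizer {
        private List<String> tokens;
        private int pos;

        SimpleTokenizer(Reader input) { super(input); }

        @Override
        void reset(Reader input) throws IOException {
            super.reset(input);
            tokens = null; // force a re-read of the new Reader
            pos = 0;
        }

        @Override
        String next() throws IOException {
            if (tokens == null) {
                StringBuilder sb = new StringBuilder();
                for (int c = input.read(); c != -1; c = input.read()) {
                    sb.append((char) c);
                }
                tokens = new ArrayList<String>();
                for (String t : sb.toString().split("\\s+")) {
                    if (!t.isEmpty()) tokens.add(t);
                }
            }
            return pos < tokens.size() ? tokens.get(pos++) : null;
        }
    }

    static List<String> drain(TokenStream ts) throws IOException {
        List<String> out = new ArrayList<String>();
        for (String t = ts.next(); t != null; t = ts.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        SimpleTokenizer tok = new SimpleTokenizer(new StringReader("first doc"));
        System.out.println(drain(tok)); // [first, doc]

        // Reuse for the next document: reset(Reader) swaps in the new input.
        tok.reset(new StringReader("second doc"));
        System.out.println(drain(tok)); // [second, doc]
    }
}
```

In this sketch, reset() is about per-stream iteration state, while reset(Reader) is about rebinding the tokenizer to the next document's input - which is why the caller of reusableTokenStream must invoke the latter but not the former.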
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>   ------------------------------
>
> *From:* Uwe Schindler [mailto:uwe@thetaphi.de]
> *Sent:* Sunday, November 15, 2009 5:56 PM
>
> *To:* java-dev@lucene.apache.org
> *Subject:* RE: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> It should be there... But it's unimplemented in the TokenFilters used by
> Standard/StopAnalyzer. But for consistency it should be there. I'll talk
> with Robert Muir about it.
>
>
>
> Uwe
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>   ------------------------------
>
> *From:* Eran Sevi [mailto:eransevi@gmail.com]
> *Sent:* Sunday, November 15, 2009 5:51 PM
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> Good point. I missed that part :) Since only the tokenizer uses the reader,
> we must call it directly.
>
> So was the reset() on the filteredTokenStream omitted on purpose because
> there's no underlying implementation, or is it really missing?
>
> On Sun, Nov 15, 2009 at 6:30 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
> It must call both reset() on the top-level TokenStream and reset(Reader) on
> the Tokenizer. If the latter is not done, how would the TokenStream get
> its new Reader?
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>   ------------------------------
>
> *From:* Eran Sevi [mailto:eransevi@gmail.com]
> *Sent:* Sunday, November 15, 2009 5:19 PM
> *To:* java-dev@lucene.apache.org
> *Subject:* Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> Hi,
> when changing my code to support the not-so-new reusableTokenStream, I
> noticed that in cases where a SavedStreams class is used in an analyzer
> (Standard, Stop, and maybe others as well), the reset() method is called on
> the tokenizer instead of on the filter.
>
> The filter implementation of reset() calls the inner filters' and input's
> reset() methods, but the tokenizer's reset() method can't do that.
> I think this bug hasn't caused any errors so far, since none of the filters
> used in these analyzers overrides the reset() method, but it might cause
> problems if the implementation changes in the future.
>
> The fix is very simple, for example (in StandardAnalyzer):
>
> if (streams == null) {
>   streams = new SavedStreams();
>   setPreviousTokenStream(streams);
>   streams.tokenStream = new StandardTokenizer(matchVersion, reader);
>   streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
>   streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
>   streams.filteredTokenStream = new StopFilter(
>       StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>       streams.filteredTokenStream, stopSet);
> } else {
>   streams.tokenStream.reset(reader);
> }
>
> should become:
>
> if (streams == null) {
>   streams = new SavedStreams();
>   setPreviousTokenStream(streams);
>   streams.tokenStream = new StandardTokenizer(matchVersion, reader);
>   streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
>   streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
>   streams.filteredTokenStream = new StopFilter(
>       StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>       streams.filteredTokenStream, stopSet);
> } else {
>   streams.filteredTokenStream.reset(); // changed line
> }
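To see why resetting only the tokenizer could go wrong once a filter does carry state, here is a hypothetical stateful filter in the same minimal-model style - these are not Lucene's classes, only illustrative stand-ins. Resetting the chain from the top cascades correctly, while resetting only the tokenizer leaves stale filter state behind:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins written for illustration; these are not Lucene's
// classes, only a minimal model of a tokenizer-plus-filter chain.
public class FilterResetDemo {

    abstract static class TokenStream {
        abstract String next(); // null = exhausted
        void reset() {}
    }

    static class ListTokenizer extends TokenStream {
        private final List<String> tokens;
        private Iterator<String> it;

        ListTokenizer(List<String> tokens) {
            this.tokens = tokens;
            this.it = tokens.iterator();
        }

        @Override String next() { return it.hasNext() ? it.next() : null; }
        @Override void reset() { it = tokens.iterator(); }
    }

    // A filter with per-stream state: it drops the first token of each
    // stream. Its reset() clears that state AND cascades to its input.
    static class SkipFirstFilter extends TokenStream {
        private final TokenStream input;
        private boolean skipped;

        SkipFirstFilter(TokenStream input) { this.input = input; }

        @Override
        String next() {
            if (!skipped) {
                skipped = true;
                input.next(); // swallow the first token
            }
            return input.next();
        }

        @Override
        void reset() {
            skipped = false;
            input.reset();
        }
    }

    static List<String> drain(TokenStream ts) {
        List<String> out = new ArrayList<String>();
        for (String t = ts.next(); t != null; t = ts.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        ListTokenizer tok = new ListTokenizer(Arrays.asList("a", "b", "c"));
        TokenStream chain = new SkipFirstFilter(tok);
        System.out.println(drain(chain)); // [b, c]

        // Correct reuse: reset the top of the chain, which cascades down.
        chain.reset();
        System.out.println(drain(chain)); // [b, c] again

        // Buggy reuse: resetting only the tokenizer leaves the filter's
        // 'skipped' flag set, so "a" is no longer dropped.
        tok.reset();
        System.out.println(drain(chain)); // [a, b, c]
    }
}
```

As the thread notes, the filters in Standard/StopAnalyzer at the time did not override reset(), so the shortcut happened to be harmless there; this model only shows what would break if one of them ever did.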
>
>
> What do you think?
>
> Eran.
>
>
>
