lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Bug in StandardAnalyzer + StopAnalyzer?
Date Sun, 15 Nov 2009 18:33:41 GMT
Uwe, I will throw another twist at it: I do not like the name of the
Tokenizer.reset(Reader) method.
I wish it were called .setReader() or something else. I think it is confusing
for Tokenizer to have .reset() and .reset(Reader), when the latter should
hardly ever be overridden.

A great example of how insane this is right now is StandardTokenizer:

  @Override
  public void reset(Reader reader) throws IOException {
    super.reset(reader);
    reset();
  }

LOL!
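(Aside: the confusion is easy to demonstrate outside Lucene itself. Below is a
minimal standalone sketch -- the class is invented for illustration, not the real
Tokenizer -- showing the two overloads doing unrelated jobs under one name:)

```java
import java.io.Reader;
import java.io.StringReader;

// Illustrative stand-in only -- not the real Lucene Tokenizer.
class SketchTokenizer {
    protected Reader input;
    private int position;

    SketchTokenizer(Reader input) {
        this.input = input;
    }

    // Rewinds per-stream state so the same input can be consumed again.
    public void reset() {
        position = 0;
    }

    // An entirely different job: installs a new input for reuse.
    // A name like setReader(Reader) would make the distinction obvious.
    public void reset(Reader newInput) {
        this.input = newInput;
        reset(); // the new input starts at the beginning, too
    }

    public int position() {
        return position;
    }

    public void advance() {
        position++;
    }
}

public class ResetOverloads {
    public static void main(String[] args) {
        SketchTokenizer t = new SketchTokenizer(new StringReader("first doc"));
        t.advance();
        t.advance();
        t.reset(new StringReader("second doc")); // reuse with a new Reader
        System.out.println(t.position()); // 0: both kinds of "reset" ran
    }
}
```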

On Sun, Nov 15, 2009 at 12:45 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

>  Yes, but on the other hand it does not hurt to automatically reset in the
> analyzer.... **krr** I do not know how to proceed. I think we should keep
> it as it was since the beginning of Lucene (call to reset inside analyzer,
> QP) and document it correctly.
>
>
>
> You are right, at the beginning, BaseTokenStreamTestCase and many other
> hardcoded tests did not call reset. Now, the test also calls end() and
> close().
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>   ------------------------------
>
> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> *Sent:* Sunday, November 15, 2009 6:40 PM
>
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> OK, at one point I do not think BaseTokenStreamTestCase did.
>
> If this is the case, then it's the consumer's responsibility to call reset,
> and we should remove the extra reset()s inside reusableTokenStream() from
> analyzers that have it... and probably improve the docs of this contract.
>
> On Sun, Nov 15, 2009 at 12:17 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
> Even QueryParser calls reset() as its first call. Also BaseTokenStreamTestCase
> does it.
>
>
>
>
> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> *Sent:* Sunday, November 15, 2009 6:14 PM
>
>
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> Uwe, I'm not so sure it doesn't need to be there; what about other consumers
> such as QueryParser?
>
> On Sun, Nov 15, 2009 at 12:02 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
> I checked again: reset() on the top filter does not need to be there, as
> the indexer calls it automatically as the first call after
> reusableTokenStream. For reuse, only reset(Reader) must be called. It’s a
> little bit strange that both methods have the same name; the reset(Reader)
> one has a completely different meaning.
>
>
>
>
> *From:* Uwe Schindler [mailto:uwe@thetaphi.de]
> *Sent:* Sunday, November 15, 2009 5:56 PM
>
>
> *To:* java-dev@lucene.apache.org
>
> *Subject:* RE: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> It should be there... but it's unimplemented in the TokenFilters used by
> Standard/Stop Analyzer. But for consistency it should be there. I’ll talk
> with Robert Muir about it.
>
>
>
> Uwe
>
>
>
>
> *From:* Eran Sevi [mailto:eransevi@gmail.com]
> *Sent:* Sunday, November 15, 2009 5:51 PM
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> Good point, I missed that part :) Since only the tokenizer uses the reader,
> we must call it directly.
>
> So was the reset() on the filteredTokenStream omitted on purpose because
> there's no underlying implementation? Or is it really missing?
>
> On Sun, Nov 15, 2009 at 6:30 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
> It must call both reset() on the top-level TokenStream and reset(Reader) on
> the Tokenizer. If the latter is not done, how should the TokenStream get
> its new Reader?
>
>
>
>
> *From:* Eran Sevi [mailto:eransevi@gmail.com]
> *Sent:* Sunday, November 15, 2009 5:19 PM
> *To:* java-dev@lucene.apache.org
> *Subject:* Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> Hi,
> when changing my code to support the not-so-new reusableTokenStream, I
> noticed that in cases where a SavedStreams class is used in an analyzer
> (Standard, Stop, and maybe others as well), the reset() method is called on
> the tokenizer instead of on the filter.
>
> The filter implementation of reset() calls the inner filters' and input's
> reset() methods, but the tokenizer's reset() method can't do that.
> I think this bug hasn't caused any errors so far, since none of the filters
> used in the analyzers overrides the reset() method, but it might cause
> problems if the implementation changes in the future.
>
> The fix is very simple. For example (in StandardAnalyzer):
>
> if (streams == null) {
>   streams = new SavedStreams();
>   setPreviousTokenStream(streams);
>   streams.tokenStream = new StandardTokenizer(matchVersion, reader);
>   streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
>   streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
>   streams.filteredTokenStream = new StopFilter(
>       StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>       streams.filteredTokenStream, stopSet);
> } else {
>   streams.tokenStream.reset(reader);
> }
>
> should become:
>
> if (streams == null) {
>   streams = new SavedStreams();
>   setPreviousTokenStream(streams);
>   streams.tokenStream = new StandardTokenizer(matchVersion, reader);
>   streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
>   streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
>   streams.filteredTokenStream = new StopFilter(
>       StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>       streams.filteredTokenStream, stopSet);
> } else {
>   streams.filteredTokenStream.reset(); // changed line
> }
>
>
> What do you think?
>
> Eran.
>
>
>
>
>
>



-- 
Robert Muir
rcmuir@gmail.com
