lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Bug in StandardAnalyzer + StopAnalyzer?
Date Sun, 15 Nov 2009 17:17:27 GMT
Even QueryParser calls reset() as its first call. BaseTokenStreamTestCase
does so as well.
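[Editor's note: the consumer contract referenced above can be sketched as follows. This is a hypothetical minimal model, not the real Lucene API; the class and method names are illustrative only. The point is that any consumer (the indexer, QueryParser, test cases) calls reset() before pulling tokens.]

import java.io.IOException;

// Hypothetical stand-in for a token stream: reset() clears per-stream
// state, incrementToken() advances to the next token.
abstract class SimpleTokenStream {
    void reset() throws IOException {}
    abstract boolean incrementToken() throws IOException;
}

public class ConsumerDemo {
    // A well-behaved consumer: reset() is always the first call.
    static int consume(SimpleTokenStream ts) throws IOException {
        ts.reset();
        int count = 0;
        while (ts.incrementToken()) count++;
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A toy stream that yields exactly three tokens.
        SimpleTokenStream three = new SimpleTokenStream() {
            int left = 3;
            boolean incrementToken() { return left-- > 0; }
        };
        System.out.println(consume(three)); // prints 3
    }
}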

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Sunday, November 15, 2009 6:14 PM
To: java-dev@lucene.apache.org
Subject: Re: Bug in StandardAnalyzer + StopAnalyzer?

 

Uwe, I'm not so sure it doesn't need to be there. What about other consumers,
such as QueryParser?

On Sun, Nov 15, 2009 at 12:02 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

I checked again: reset() on the top filter does not need to be there, as the
indexer calls it automatically as the first call after reusableTokenStream.
For reuse, only reset(Reader) must be called. It's a little strange that both
methods have the same name; reset(Reader) has a completely different meaning.

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Uwe Schindler [mailto:uwe@thetaphi.de] 
Sent: Sunday, November 15, 2009 5:56 PM


To: java-dev@lucene.apache.org

Subject: RE: Bug in StandardAnalyzer + StopAnalyzer?

 

It should be there... but it's unimplemented in the TokenFilters used by
StandardAnalyzer/StopAnalyzer. Still, for consistency it should be there.
I'll talk with Robert Muir about it.

 

Uwe

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Eran Sevi [mailto:eransevi@gmail.com] 
Sent: Sunday, November 15, 2009 5:51 PM
To: java-dev@lucene.apache.org
Subject: Re: Bug in StandardAnalyzer + StopAnalyzer?

 

Good point, I missed that part :) Since only the tokenizer uses the reader,
we must call it directly.

So was the reset() on the filteredTokenStream omitted on purpose because
there's no underlying implementation, or is it really missing?

On Sun, Nov 15, 2009 at 6:30 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

It must call both reset() on the top-level TokenStream and reset(Reader) on
the Tokenizer. If the latter is not done, how should the TokenStream get
its new Reader?
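[Editor's note: the two distinct reset calls can be sketched with the hypothetical minimal classes below. These are not the real Lucene classes; the names are illustrative. reset(Reader) hands the Tokenizer its new input, while the no-argument reset() propagates down the filter chain to clear per-stream state.]

import java.io.Reader;
import java.io.StringReader;

// Toy tokenizer: reset(Reader) swaps in new input; reset() clears state.
class DemoTokenizer {
    Reader input;
    boolean resetCalled = false;
    void reset(Reader r) { input = r; }
    void reset() { resetCalled = true; }
}

// Toy filter: its reset() delegates to the wrapped stream.
class DemoFilter {
    final DemoTokenizer input;
    boolean resetCalled = false;
    DemoFilter(DemoTokenizer in) { input = in; }
    void reset() {
        resetCalled = true;
        input.reset();
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        DemoTokenizer tok = new DemoTokenizer();
        DemoFilter chain = new DemoFilter(tok);

        // Reuse requires BOTH calls:
        tok.reset(new StringReader("new document")); // Tokenizer gets the Reader
        chain.reset();                               // filter chain clears state

        System.out.println(tok.input != null);                    // true
        System.out.println(chain.resetCalled && tok.resetCalled); // true
    }
}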

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Eran Sevi [mailto:eransevi@gmail.com] 
Sent: Sunday, November 15, 2009 5:19 PM
To: java-dev@lucene.apache.org
Subject: Bug in StandardAnalyzer + StopAnalyzer?

 

Hi,
when changing my code to support the not-so-new reusableTokenStream, I
noticed that in cases where a SavedStreams class is used in an analyzer
(Standard, Stop, and maybe others as well), the reset() method is called on
the tokenizer instead of on the filter.

The filter implementation of reset() calls the inner filters' and input's
reset() methods, but the tokenizer's reset() method can't do that.
I think this bug hasn't caused any errors so far, since none of the filters
used in these analyzers overrides the reset() method, but it might cause
problems if the implementation changes in the future.

The fix is very simple. For example, in StandardAnalyzer:

if (streams == null) {
  streams = new SavedStreams();
  setPreviousTokenStream(streams);
  streams.tokenStream = new StandardTokenizer(matchVersion, reader);
  streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
  streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
  streams.filteredTokenStream = new StopFilter(
      StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
      streams.filteredTokenStream, stopSet);
} else {
  streams.tokenStream.reset(reader);
}

should become:

if (streams == null) {
  streams = new SavedStreams();
  setPreviousTokenStream(streams);
  streams.tokenStream = new StandardTokenizer(matchVersion, reader);
  streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
  streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
  streams.filteredTokenStream = new StopFilter(
      StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
      streams.filteredTokenStream, stopSet);
} else {
  streams.filteredTokenStream.reset(); // changed line
}


What do you think?

Eran.

 




-- 
Robert Muir
rcmuir@gmail.com

