lucene-java-user mailing list archives

From Steven Schlansker <ste...@likeness.com>
Subject Re: Using an AnalyzerWrapper with ASCIIFoldingFilter
Date Fri, 15 Mar 2013 18:36:00 GMT

On Mar 15, 2013, at 11:25 AM, "Uwe Schindler" <uwe@thetaphi.de> wrote:

> Hi,
> 
> The API did not really change.

The API definitely did change: previously you would override the tokenStream method, which is now final. But you are correct that this was not the root of the problem.
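(For reference, under the 4.x API the customization point moves from tokenStream() to AnalyzerWrapper's wrapComponents(). A minimal sketch of wrapping an analyzer with ASCIIFoldingFilter this way, assuming Lucene 4.2 and StandardAnalyzer as the wrapped analyzer — both are illustrative choices, not from this thread:)

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class FoldingAnalyzerWrapper extends AnalyzerWrapper {
  private final Analyzer delegate = new StandardAnalyzer(Version.LUCENE_42);

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    return delegate;
  }

  @Override
  protected Analyzer.TokenStreamComponents wrapComponents(
      String fieldName, Analyzer.TokenStreamComponents components) {
    // Stack ASCIIFoldingFilter on top of the delegate's token stream
    // instead of overriding the now-final tokenStream() method.
    return new Analyzer.TokenStreamComponents(
        components.getTokenizer(),
        new ASCIIFoldingFilter(components.getTokenStream()));
  }
}
```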

> The bug is in your test:
> If you would carefully read the javadocs of the TokenStream interface, you would notice
> that your consumer does not follow the correct workflow: http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/analysis/TokenStream.html
> 
> In short, before calling incrementToken() the TokenStream must be reset(). This did not
> change and was always the case. In earlier Lucene versions, lots of TokenStreams were behaving
> wrong, so we made the basic Tokenizers "fail" in some way. The Exception is not really helpful
> here, but for performance reasons this was the only way to go.
> 
> Please always take care that the described workflow in the Javadocs is always used from
> top to bottom (including end() and close()), otherwise behavior of TokenStreams is not guaranteed
> to be correct.
> 


Thank you, this was exactly the problem.  It would be nice if the tokenizers did some state
checking to catch issues like this, or at least emitted a clearer error message, but I definitely
was doing this wrong.
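(For anyone finding this in the archives, the full consumer workflow Uwe describes looks roughly like this — a sketch assuming Lucene 4.2, with StandardAnalyzer, the field name, and the input text as placeholder choices:)

```java
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamWorkflow {
  public static void main(String[] args) throws IOException {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
    TokenStream ts = analyzer.tokenStream("myfield", new StringReader("Hello World"));
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    try {
      ts.reset();                       // required before the first incrementToken()
      while (ts.incrementToken()) {
        System.out.println(termAtt.toString());
      }
      ts.end();                         // consume to the end (final offset, etc.)
    } finally {
      ts.close();                       // release resources
    }
  }
}
```

Skipping the reset() call is what triggers the unhelpful exception discussed above.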




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

