lucene-dev mailing list archives

From DM Smith <dmsmith...@gmail.com>
Subject Re: Analyzer forcing tokenStream and reusableTokenStream to be final
Date Tue, 19 Oct 2010 16:17:09 GMT

On Oct 19, 2010, at 11:21 AM, Robert Muir wrote:

> On Tue, Oct 19, 2010 at 11:10 AM, Shai Erera <serera@gmail.com> wrote:
>> Is there real danger in having my analyzer not declaring these methods final
>> - something that can affect Lucene code for example? Or am I only risking my
>> code?
>> 
> 
> There is a real danger: bugs like
> https://issues.apache.org/jira/browse/LUCENE-1678
> 
> I would love for us to re-think the whole
> tokenStream/reusableTokenStream issue...
> 
> If someone doesn't override both (e.g. they just override
> tokenStream), then it wouldn't actually use their subclass's code. So
> then the reflection hack from LUCENE-1678 would force the analyzer to
> never re-use, but instead call tokenStream: but this is very bad for
> indexing performance!
> 
> Are there still real use cases where an analyzer cannot actually
> reuse? For example, all Solr tokenstreams are reused. With an
> application as big and widely used as that having no need for
> non-reusable tokenStream(), I think we should seriously consider
> simplifying the analysis api to be "reusable by default".

I'd be surprised if there are use cases for non-reuse.
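
For reference, the override problem quoted above can be modeled roughly as follows. This is a simplified sketch and not Lucene's actual code: StockAnalyzer, CustomAnalyzer, Stream and the useReflectionGuard flag are hypothetical names, the field-name argument is dropped, and the cache is a plain field rather than per-thread storage. It shows a subclass that overrides only tokenStream being silently bypassed by the inherited reusableTokenStream, and a reflection check in the spirit of LUCENE-1678 that restores correctness by giving up reuse.

```java
import java.io.Reader;
import java.io.StringReader;

public class ReuseSketch {

    /** Stand-in for a token stream: records which code path built it. */
    public static class Stream {
        public final String builtBy;
        public Reader input;

        public Stream(String builtBy, Reader input) {
            this.builtBy = builtBy;
            this.input = input;
        }

        public void reset(Reader input) {
            this.input = input;
        }
    }

    /** Hypothetical stock analyzer that implements both methods itself. */
    public static class StockAnalyzer {
        private final boolean useReflectionGuard;
        private final boolean tokenStreamOverridden;
        private Stream cached; // assumption: one shared slot instead of per-thread storage

        public StockAnalyzer(boolean useReflectionGuard) {
            this.useReflectionGuard = useReflectionGuard;
            this.tokenStreamOverridden = overridesTokenStream(getClass());
        }

        public Stream tokenStream(Reader input) {
            return new Stream("stock", input);
        }

        public Stream reusableTokenStream(Reader input) {
            if (useReflectionGuard && tokenStreamOverridden) {
                // Correct, but a new stream is allocated on every call.
                return tokenStream(input);
            }
            if (cached == null) {
                cached = new Stream("stock", input); // never consults a subclass's tokenStream
            } else {
                cached.reset(input);
            }
            return cached;
        }

        /** True if the runtime class supplies its own tokenStream(Reader). */
        private static boolean overridesTokenStream(Class<?> runtimeClass) {
            try {
                return runtimeClass.getMethod("tokenStream", Reader.class)
                        .getDeclaringClass() != StockAnalyzer.class;
            } catch (NoSuchMethodException e) {
                return false;
            }
        }
    }

    /** A user subclass that overrides only tokenStream. */
    public static class CustomAnalyzer extends StockAnalyzer {
        public CustomAnalyzer(boolean useReflectionGuard) {
            super(useReflectionGuard);
        }

        @Override
        public Stream tokenStream(Reader input) {
            return new Stream("custom", input);
        }
    }

    public static void main(String[] args) {
        // Without the guard, the subclass's code is not used.
        CustomAnalyzer unguarded = new CustomAnalyzer(false);
        System.out.println(unguarded.reusableTokenStream(new StringReader("a")).builtBy);

        // With the guard, the subclass's code runs, but nothing is reused.
        CustomAnalyzer guarded = new CustomAnalyzer(true);
        Stream first = guarded.reusableTokenStream(new StringReader("a"));
        Stream second = guarded.reusableTokenStream(new StringReader("b"));
        System.out.println(first.builtBy + ", reused: " + (first == second));
    }
}
```

Neither outcome is good: the first is a silent bug, the second is the indexing slowdown Robert mentions.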

IIRC: When we started down the reuse path, the goal was reuse only, not just reuse by default.
But in order to bridge the past to the future, there was the possibility of continued non-reuse.
In a sense non-reuse was deprecated, but I'm not sure that @deprecated as a mechanism was
able to clearly indicate that.
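
One way a reuse-only API could look, as a rough sketch under my own assumptions rather than a concrete proposal from this thread: the public entry point is final and always reuses, and subclasses get a single extension point that builds the stream. ReuseOnlyAnalyzer, createStream, CountingAnalyzer and Stream are hypothetical names, and the field-name argument is again dropped.

```java
import java.io.Reader;
import java.io.StringReader;

public class ReuseOnlySketch {

    /** Stand-in for a token stream that can be pointed at new input. */
    public static class Stream {
        public Reader input;

        public void reset(Reader input) {
            this.input = input;
        }
    }

    /** Hypothetical base class: there is no non-reusing code path to override. */
    public abstract static class ReuseOnlyAnalyzer {
        private final ThreadLocal<Stream> cached = new ThreadLocal<>();

        /** The only extension point: called once per thread to build the stream. */
        protected abstract Stream createStream();

        /** Final, so a subclass cannot accidentally override half of the contract. */
        public final Stream tokenStream(Reader input) {
            Stream stream = cached.get();
            if (stream == null) {
                stream = createStream();
                cached.set(stream);
            }
            stream.reset(input);
            return stream;
        }
    }

    /** Example subclass that counts how many streams it had to build. */
    public static class CountingAnalyzer extends ReuseOnlyAnalyzer {
        public int created = 0;

        @Override
        protected Stream createStream() {
            created++;
            return new Stream();
        }
    }

    public static void main(String[] args) {
        CountingAnalyzer analyzer = new CountingAnalyzer();
        Stream first = analyzer.tokenStream(new StringReader("one"));
        Stream second = analyzer.tokenStream(new StringReader("two"));
        System.out.println("same instance: " + (first == second));
        System.out.println("streams created: " + analyzer.created);
    }
}
```

With only one overridable method, the "overrode one but not the other" mistake cannot happen, so no reflection check would be needed.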

DM

