lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Analyzer forcing tokenStream and reusableTokenStream to be final
Date Tue, 19 Oct 2010 16:20:45 GMT
On Tue, Oct 19, 2010 at 12:17 PM, DM Smith <dmsmith555@gmail.com> wrote:

> I'd be surprised if there are use cases for non-reuse.
>
> IIRC: When we started down the reuse path, the goal was reuse only, not just reuse by
default. But in order to bridge the past to the future, there was the possibility of continued
non-reuse. In a sense non-reuse was deprecated, but I'm not sure that @deprecated as a mechanism
was able to clearly indicate that.
>

Exactly: I don't think there's a clear way to detect whether your
tokenStream() method is "reuse-safe" and deprecate it accordingly: for
example, you have to implement reset() correctly in your TokenStreams.
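To make the reset() point concrete, here is a minimal, Lucene-free Java sketch. DedupStream, analyze() and the clearStateOnReset flag are hypothetical names chosen for illustration; this is not Lucene's TokenStream API. The filter remembers which terms it has already emitted, so reusing one instance across documents is only safe if reset() clears that memory:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical, Lucene-free sketch (not Lucene's TokenStream API): a filter
// that drops repeated terms within one document. It keeps per-document state
// (the set of terms already seen), so it is only "reuse-safe" if reset()
// clears that state before the instance is handed the next document.
public class DedupStream {
    private final boolean clearStateOnReset; // false simulates a buggy reset()
    private final Set<String> seen = new HashSet<>();
    private Iterator<String> input = Collections.emptyIterator();
    private String term = null;

    public DedupStream(boolean clearStateOnReset) {
        this.clearStateOnReset = clearStateOnReset;
    }

    /** Prepares this instance for a new document. */
    public void reset(String text) {
        input = Arrays.asList(text.split("\\s+")).iterator();
        term = null;
        if (clearStateOnReset) {
            seen.clear(); // forget terms from the previous document
        }
    }

    /** Advances to the next term not yet seen; returns false at end of input. */
    public boolean incrementToken() {
        while (input.hasNext()) {
            String candidate = input.next();
            if (!candidate.isEmpty() && seen.add(candidate)) {
                term = candidate;
                return true;
            }
        }
        term = null;
        return false;
    }

    public String term() {
        return term;
    }

    /** Runs one document through a (possibly reused) stream. */
    public static List<String> analyze(DedupStream stream, String text) {
        stream.reset(text);
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }

    public static void main(String[] args) {
        DedupStream good = new DedupStream(true);
        DedupStream bad = new DedupStream(false);
        System.out.println(analyze(good, "fast way fast")); // [fast, way]
        System.out.println(analyze(good, "easy way"));      // [easy, way]
        System.out.println(analyze(bad, "fast way fast"));  // [fast, way]
        System.out.println(analyze(bad, "easy way"));       // [easy]
    }
}
```

With the faulty reset(), the second document silently loses "way": nothing throws, the output is just wrong, which is why reuse-safety is hard to check mechanically.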

But let's think about this: for non-experts, making Analyzer "reusable
by default" by removing reusableTokenStream() and reusing
tokenStream() would probably be the single largest indexing
performance improvement we could make. The API is so confusing that
I suspect many people have analyzers that aren't reusing today.
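One possible shape for "reusable by default", sketched in plain Java under the assumption that every stream implements reset() correctly. ReusableByDefaultAnalyzer, Stream, createStream(), terms() and SimpleWhitespaceAnalyzer are hypothetical names, not Lucene classes: the public tokenStream() is final and caches one stream per thread, so a subclass only describes how to build its chain and gets reuse without extra work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Lucene's API: the public entry point is final and
// always reuses a per-thread stream, so subclasses only describe how to build
// their token chain once.
public abstract class ReusableByDefaultAnalyzer {

    /** Minimal stand-in for a token stream that can be pointed at new input. */
    public interface Stream {
        void reset(String text); // must clear all per-document state
        boolean incrementToken();
        String term();
    }

    private final ThreadLocal<Stream> cached = new ThreadLocal<>();
    private final AtomicInteger created = new AtomicInteger(); // streams built so far

    /** Subclasses build their chain here; called at most once per thread. */
    protected abstract Stream createStream();

    /** Final, so subclasses cannot accidentally opt out of reuse. */
    public final Stream tokenStream(String text) {
        Stream s = cached.get();
        if (s == null) {
            s = createStream();
            created.incrementAndGet();
            cached.set(s);
        }
        s.reset(text); // reuse the cached instance for the new input
        return s;
    }

    public int createdCount() {
        return created.get();
    }

    /** Example subclass (hypothetical): whitespace splitting plus lowercasing. */
    public static final class SimpleWhitespaceAnalyzer extends ReusableByDefaultAnalyzer {
        @Override
        protected Stream createStream() {
            return new Stream() {
                private String[] parts = new String[0];
                private int i = 0;
                private String term = null;

                @Override
                public void reset(String text) {
                    String trimmed = text.trim();
                    parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
                    i = 0;
                    term = null;
                }

                @Override
                public boolean incrementToken() {
                    if (i >= parts.length) {
                        term = null;
                        return false;
                    }
                    term = parts[i++].toLowerCase(Locale.ROOT);
                    return true;
                }

                @Override
                public String term() {
                    return term;
                }
            };
        }
    }

    /** Collects the terms for one piece of text. */
    public static List<String> terms(ReusableByDefaultAnalyzer analyzer, String text) {
        Stream s = analyzer.tokenStream(text);
        List<String> out = new ArrayList<>();
        while (s.incrementToken()) {
            out.add(s.term());
        }
        return out;
    }

    public static void main(String[] args) {
        ReusableByDefaultAnalyzer a = new SimpleWhitespaceAnalyzer();
        System.out.println(terms(a, "Fast Way"));     // [fast, way]
        System.out.println(terms(a, "Easy Default")); // [easy, default]
        System.out.println(a.createdCount());         // 1
    }
}
```

Analyzing two inputs on one thread builds the stream only once. The trade-off is that a stream with a sloppy reset() now produces wrong output by default instead of merely running slowly.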

I think it's worth considering a backwards break, especially since, as
Mike mentioned, for the very special (possibly even only theoretical!)
non-reuse case there are still ways to index. But the "fast
way" should be the "easy/default way".


