lucene-dev mailing list archives

From DM Smith <>
Subject Re: Analyzer forcing tokenStream and reusableTokenStream to be final
Date Tue, 19 Oct 2010 16:26:47 GMT

On Oct 19, 2010, at 12:20 PM, Robert Muir wrote:

> On Tue, Oct 19, 2010 at 12:17 PM, DM Smith <> wrote:
>> I'd be surprised if there are use cases for non-reuse.
>> IIRC: When we started down the reuse path, the goal was reuse only, not just reuse by default.
>> But in order to bridge the past to the future, there was the possibility of continued non-reuse.
>> In a sense non-reuse was deprecated, but I'm not sure that @deprecated as a mechanism was able
>> to clearly indicate that.
> Exactly: I don't think there's a clear way to detect that your
> tokenStream() method is "reuse-safe" and deprecate it: e.g. you have
> to implement reset() correctly in your tokenstreams.
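As a rough illustration of what "implement reset() correctly" means, here is a hypothetical toy tokenizer in plain Java. It is not Lucene's Tokenizer API; the class and method names are assumptions made for the sketch. The point is that reset(Reader) has to clear all per-document state, not just swap in the new input, or a reused instance carries state over from the previous document.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tokenizer (not Lucene's Tokenizer class): splits on whitespace
// and can be reused for a new document by calling reset(Reader).
class ToyWhitespaceTokenizer {
    private Reader input;
    private final StringBuilder term = new StringBuilder();
    private int tokenCount; // per-document state that must be cleared on reset

    ToyWhitespaceTokenizer(Reader input) {
        this.input = input;
    }

    // Reuse-safe reset: point at the new input AND clear every piece of
    // per-document state. Forgetting tokenCount or term here is the kind of
    // bug that makes a stream unsafe to reuse.
    void reset(Reader newInput) {
        this.input = newInput;
        this.term.setLength(0);
        this.tokenCount = 0;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    String next() throws IOException {
        term.setLength(0);
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (term.length() > 0) break; // whitespace ends the current token
            } else {
                term.append((char) c);
            }
        }
        if (term.length() == 0) return null;
        tokenCount++;
        return term.toString();
    }

    int tokenCount() {
        return tokenCount;
    }

    // Convenience: drain all remaining tokens into a list.
    List<String> tokens() {
        List<String> out = new ArrayList<>();
        try {
            for (String t = next(); t != null; t = next()) out.add(t);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Usage: tokenize one document, call reset(new StringReader(...)) on the same instance, and the second document's tokens and tokenCount() start from scratch.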
> But let's think about this: for non-experts, making Analyzer "reusable
> by default" by removing reusableTokenStream() and reusing
> tokenStream() would probably be the single largest indexing
> performance improvement we could make... the API is so confusing that
> I think many people probably have analyzers that aren't reusing today.
> I think it's worth considering a backwards break, especially since as
> Mike mentioned, for the very special (possibly even only theoretical!)
> non-reuse case, there are ways they could still index: but the "fast
> way" should be the "easy/default way".
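One way the "reusable by default" idea could look, sketched under stated assumptions: a toy base class (hypothetical, not Lucene's Analyzer) whose public tokenStream() is final, caches one tokenizer per thread in a ThreadLocal, and calls reset() on every later call. Subclasses only implement a hypothetical createTokenizer() hook, so they cannot accidentally bypass reuse.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Minimal hypothetical tokenizer used by the sketch (not a Lucene class).
class ToyTokenizer {
    private Reader input;

    ToyTokenizer(Reader input) { this.input = input; }

    // Point the same instance at a new document (no other state to clear here).
    void reset(Reader newInput) { this.input = newInput; }

    // Read the whole input and split it on whitespace.
    List<String> tokens() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String text = sb.toString().trim();
        return text.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(text.split("\\s+"));
    }
}

// Hypothetical base class: the only public entry point is final and always reuses.
abstract class ToyReusingAnalyzer {
    // One cached tokenizer per thread, since a token stream is not thread-safe.
    private final ThreadLocal<ToyTokenizer> cached = new ThreadLocal<>();

    // Subclass hook (hypothetical name): build a brand-new tokenizer.
    protected abstract ToyTokenizer createTokenizer(Reader reader);

    // Final, so subclasses cannot opt out of reuse by overriding it.
    public final ToyTokenizer tokenStream(Reader reader) {
        ToyTokenizer tok = cached.get();
        if (tok == null) {
            tok = createTokenizer(reader); // first call on this thread
            cached.set(tok);
        } else {
            tok.reset(reader);             // later calls: reuse, relying on reset()
        }
        return tok;
    }
}

// Example subclass that counts how many tokenizers it actually builds.
class CountingWhitespaceAnalyzer extends ToyReusingAnalyzer {
    int created = 0;

    @Override
    protected ToyTokenizer createTokenizer(Reader reader) {
        created++;
        return new ToyTokenizer(reader);
    }
}
```

The design choice here is that the reuse logic lives only in the base class, so the fast path is the only path a subclass author sees; the cost is that every tokenizer must have a correct reset().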

To me, the backwards break is merely a code break. I can't see how it would break an index.
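The argument that the break affects code but not indexes can be checked in miniature: what gets indexed depends only on the tokens emitted, so a stream that resets correctly and is reused across documents should produce exactly the same token lists as fresh instances. The sketch below compares the two paths using a hypothetical toy tokenizer (not a Lucene class); the names are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tokenizer (not a Lucene class): emits lowercased runs of
// letters/digits, and holds no state besides its input, so reset() is trivially correct.
class LowercaseTokenizer {
    private Reader input;

    LowercaseTokenizer(Reader input) { this.input = input; }

    void reset(Reader newInput) { this.input = newInput; }

    List<String> tokens() {
        List<String> out = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isLetterOrDigit(c)) {
                    term.append(Character.toLowerCase((char) c));
                } else if (term.length() > 0) { // separator ends the current token
                    out.add(term.toString());
                    term.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (term.length() > 0) out.add(term.toString()); // flush the last token
        return out;
    }
}

class ReuseEquivalence {
    // Non-reuse path: a brand-new tokenizer for every document.
    static List<List<String>> fresh(List<String> docs) {
        List<List<String>> out = new ArrayList<>();
        for (String doc : docs) out.add(new LowercaseTokenizer(new StringReader(doc)).tokens());
        return out;
    }

    // Reuse path: one tokenizer, reset between documents.
    static List<List<String>> reused(List<String> docs) {
        List<List<String>> out = new ArrayList<>();
        LowercaseTokenizer tok = null;
        for (String doc : docs) {
            if (tok == null) tok = new LowercaseTokenizer(new StringReader(doc));
            else tok.reset(new StringReader(doc));
            out.add(tok.tokens());
        }
        return out;
    }
}
```

If the two lists match for every document, the terms written to the index are the same either way; only the calling code has to change.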

-- DM
