lucene-dev mailing list archives

From Shai Erera
Subject Re: who clears attributes?
Date Mon, 10 Aug 2009 21:12:47 GMT
It sounds like the 'old' API should stay a bit longer than 3.0. We'd like to
give more people a chance to experiment w/ the new API before we claim it is
the new Analysis API in Lucene. And that means more users will have to live
w/ the "bit of slowness" for longer than this thread assumes.

I personally worry a lot about needing to throw away the current API. I'll
have a lot of code to port over and I haven't read anything so far that
convinces me the new API is better. I don't have any problems w/ the current
API today. I feel I have all the flexibility I need w/ indexing fields. I
use payloads, Field.Index constants, write Analyzers, TokenStreams ...
actually I have 0 complaints.

Maybe we should follow what I seem to read from Earwin and Grant - come up
w/ real use cases, try to implement them w/ the current API, then if it's
impossible, discuss how we can make the current API more adaptive. If at the
end of this we get back to the new API, then we'll at least feel better
about it, and be more convinced it is the way to go.

Heck .. maybe we'll be convinced to base the Lucene analysis on UIMA? :)


On Mon, Aug 10, 2009 at 11:54 PM, Uwe Schindler wrote:

> > >> I have serious doubts about releasing this new API until these
> > >> performance issues are resolved and better proven out from a
> > >> usability standpoint.
> > >
> > > I think LUCENE-1796 has fixed the performance problems, which were
> > > caused by a missing reflection cache needed for bw compatibility. I
> > > hope to commit soon!
> > >
> > > 2.9 may be a little bit slower when you mix old and new API and do
> > > not reuse Tokenizers (but Robert is already adding
> > > reusableTokenStream to all contrib analyzers). When the backwards
> > > layer is removed completely or setOnlyUseNewAPI is enabled, there is
> > > no speed impact at all.
> > >
> >
> >
> > The Analysis features of Lucene are the single most common place where
> > people enhance Lucene. Very few add queries, or muck with field
> > caches, but they do write their own Analyzers and TokenStreams,
> > etc. Within that, mixing old and new is likely the most common case
> > for everyone who has made their own customizations, so a "little bit
> > slower" is something I'd rather not live with just for the sake of
> > some supposed goodness in a year or two.
>
> But because of this flexibility, we added the backwards layer. The old
> style with setUseNewAPI was not flexible at all, and nobody would move his
> Tokenizers to the new API without that flexibility (maybe he uses external
> analyzer packages that are not yet updated).
>
> With "a little bit" I mean the cost of wrapping the old and new API is
> really minimal: it is just an if statement and a method call, hopefully
> optimized away by the JVM. In my tests the standard deviation between
> different test runs was much higher than the difference between mixing
> old/new API (on Win32), so it is not really certain that the cost comes
> from the delegation.
>
> The only case that is really slower is when you do not reuse
> TokenStreams: two LinkedHashMaps have to be created and set up in
> TokenStream.<init> (this creation cost has now been minimized). But this
> is not caused by the backwards layer.
>
> Uwe
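
For readers following the thread, here is a minimal sketch of the two costs
discussed in the quoted message: the "if statement and a method call" of a
backwards layer, and the two LinkedHashMaps set up whenever a stream is
created rather than reused. Every name in it (SketchStream, ReuseSketch,
mapsCreated, analyze) is hypothetical and chosen for illustration; this is
not Lucene's actual TokenStream code, only an assumption about the general
shape of the pattern.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a token stream; NOT Lucene's real class.
class SketchStream {
    // Counts the maps allocated by constructors, to make creation cost visible.
    static int mapsCreated = 0;

    // Stand-ins for the two maps that are set up at construction time.
    private final Map<String, Object> attributes = new LinkedHashMap<>();
    private final Map<String, Object> attributeImpls = new LinkedHashMap<>();

    private final boolean onlyUseNewAPI;
    private String[] tokens = new String[0];
    private int pos = 0;

    // "New API" state: the current term is read from here after incrementToken().
    String term;

    SketchStream(boolean onlyUseNewAPI) {
        this.onlyUseNewAPI = onlyUseNewAPI;
        mapsCreated += 2;
    }

    // Reuse path: point the same instance at new text without reallocating.
    void reset(String text) {
        String trimmed = text.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        pos = 0;
    }

    // "Old API" style: returns the next token, or null when exhausted.
    String next() {
        return pos < tokens.length ? tokens[pos++] : null;
    }

    // "New API" style: advances and reports whether a token is available.
    boolean incrementToken() {
        if (!onlyUseNewAPI) {
            // Backwards layer: one branch plus one delegating call.
            term = next();
            return term != null;
        }
        if (pos >= tokens.length) {
            return false;
        }
        term = tokens[pos++];
        return true;
    }
}

class ReuseSketch {
    // Consumes a stream through the new-style method only.
    static List<String> analyze(SketchStream stream, String text) {
        stream.reset(text);
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term);
        }
        return out;
    }

    public static void main(String[] args) {
        SketchStream reused = new SketchStream(false);
        System.out.println(analyze(reused, "who clears attributes"));
        System.out.println(analyze(reused, "old and new API"));
        System.out.println("maps created: " + SketchStream.mapsCreated);
    }
}
```

In this sketch, analyzing two texts with one reused instance allocates the
maps once, while constructing a fresh instance per text would allocate them
each time, independent of whether the delegation branch is taken.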
