lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <>
Subject RE: New Token API was Re: Payloads and TrieRangeQuery
Date Mon, 15 Jun 2009 20:46:20 GMT
Maybe change the deprecation wrapper around next() and next(Token) [the
default impl of incrementToken()] to check, if the retrieved token is not
identical to the attribute and then just copy the contents to the
instance-Token? This would be a slowdown, but only be the case for very rare
TokenStreams that did not reuse token before (and were slow before, too).


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen


From: Michael Busch [] 
Sent: Monday, June 15, 2009 10:39 PM
Subject: Re: New Token API was Re: Payloads and TrieRangeQuery


I have implemented most of that actually (the interface part and Token
implementing all of them).

The problem is a paradigm change with the new API: the assumption is that
there is always only one single instance of an Attribute. With the old API,
it is recommended to reuse the passed-in token, but you don't have to, you
can also return a new one with every call of next().

Now with this change the indexer classes should only know about the
interfaces, if shouldn't know Token anymore, which seems fine when Token
implements all those interfaces. BUT, since there can be more than once
instance of Token, the indexer would have to call getAttribute() for all
Attributes it needs after each call of next(). I haven't measured how
expensive that is, but it seems like a severe performance hit.

That's basically the main reason why the backwards compatibility is ensured
in such a goofy way right now.


On 6/15/09 1:28 PM, Uwe Schindler wrote: 

And I don't like the *useNewAPI*() methods either. I spent a lot of time 
thinking about backwards compatibility for this API. It's tricky to do 
without sacrificing performance. In API patches I find myself spending 
more time for backwards-compatibility than for the actual new feature! :(
I'll try to think about how to simplify this confusing old/new API mix.

One solution to fix this useNewAPI problem would be to change the
AttributeSource in a way, to return classes that implement interfaces (as
you proposed some weeks ago). The good old Token class then do not need to
be deprecated, it could simply implement all 4 interfaces. AttributeSource
then must implement a registry, which classes implement which interfaces. So
if somebody wants a TermAttribute, he always gets the Token. In all other
cases, the default could be a *Impl default class.
In this case, next() could simply return this Token (which is the all 4
attributes). The reuseableToken is simply thrown away in the deprecated API,
the reuseable Token comes from the AttributeSource and is per-instance.
Is this an idea?
To unsubscribe, e-mail:
For additional commands, e-mail:


View raw message