lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: PayloadAttribute behavior change between Lucene 2.9/3.0 and the trunk
Date Sat, 04 Dec 2010 23:22:09 GMT
On Sat, Dec 4, 2010 at 6:05 PM, Teruhiko Kurosaka <> wrote:
> Thank you, Robert, substituting getAttribute with addAttribute worked!
> But I don't understand why.  Could you help me to understand the mechanics?
> In my setting,
> hasAttribute(PayloadAttribute.class) returns false.
> So I thought addAttribute(PayloadAttribute.class) would just
> create a new PayloadAttribute object.  It would remedy the
> Exception, but it wouldn't do any good accessing the payload
> generated upstream.
> But the newly generated PayloadAttribute t is actually
> getting the payload that was generated upstream (by my Tokenizer).
> How is this possible?

Attributes are shared for the entire analysis chain.
It is best to think of getAttribute as "get a reference to an
already-added attribute".

And to think of addAttribute as "if the attribute already exists,
return a reference to it, otherwise add it to the chain and return a
reference to that".
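To make those two descriptions concrete, here is a toy model in which every stage of a chain shares one attribute map. It is not Lucene's real AttributeSource, and the names Stage and PayloadAttr are hypothetical. In this model, getAttribute fails when nothing has been added, while addAttribute returns the existing instance or creates one.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedAttributes {
    // Hypothetical attribute type standing in for PayloadAttribute (not Lucene's class).
    static class PayloadAttr {
        byte[] payload; // null means "no payload set"
    }

    // Toy model of attribute sharing; NOT Lucene's AttributeSource implementation.
    static class Stage {
        final Map<Class<?>, Object> attributes; // one map shared by the whole chain

        Stage() { this.attributes = new HashMap<>(); }             // a tokenizer starts the chain
        Stage(Stage input) { this.attributes = input.attributes; } // a filter reuses its input's map

        boolean hasAttribute(Class<?> type) { return attributes.containsKey(type); }

        // "Get a reference to an already-added attribute": fails if nobody added it.
        <T> T getAttribute(Class<T> type) {
            Object existing = attributes.get(type);
            if (existing == null) {
                throw new IllegalArgumentException(type.getSimpleName() + " was never added");
            }
            return type.cast(existing);
        }

        // "If it already exists, return it; otherwise add it to the chain and return it."
        <T> T addAttribute(Class<T> type) {
            Object existing = attributes.get(type);
            if (existing == null) {
                try {
                    existing = type.getDeclaredConstructor().newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
                attributes.put(type, existing);
            }
            return type.cast(existing);
        }
    }

    public static void main(String[] args) {
        Stage tokenizer = new Stage();
        PayloadAttr upstream = tokenizer.addAttribute(PayloadAttr.class);
        upstream.payload = new byte[] {42};

        Stage filter = new Stage(tokenizer);
        PayloadAttr downstream = filter.addAttribute(PayloadAttr.class);
        System.out.println(downstream == upstream); // true: same shared instance
        System.out.println(downstream.payload[0]);  // 42: sees the tokenizer's payload
    }
}
```

The filter's addAttribute call does not create anything new here; it simply hands back the instance the tokenizer already registered, which is why it sees the upstream payload.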

In other words, in the entire Analyzer there can only be one
PayloadAttribute. Because it is shared, it does not matter who calls
addAttribute first.

So it's best to always use addAttribute in your constructor.

The simplest way to see why this is good: imagine someone were to
use your TokenFilter with, say, a WhitespaceTokenizer that does not add
PayloadAttribute. Your filter would then produce no error; the
PayloadAttribute would simply be empty, as you would expect.
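A sketch of that WhitespaceTokenizer scenario, again using a hypothetical toy model rather than Lucene's classes (PayloadTokenizer, PlainTokenizer and PayloadReadingFilter are made-up names). The filter calls addAttribute in its constructor, so it reads the upstream payload when a tokenizer supplies one and sees an empty attribute otherwise, without ever throwing.

```java
import java.util.HashMap;
import java.util.Map;

public class FilterConstructorDemo {
    // Hypothetical attribute standing in for PayloadAttribute.
    static class PayloadAttr { byte[] payload; }

    // Minimal shared-attribute stage (toy model, not Lucene's AttributeSource).
    static class Stage {
        final Map<Class<?>, Object> attributes;
        Stage() { attributes = new HashMap<>(); }
        Stage(Stage input) { attributes = input.attributes; }

        // Return the shared instance, creating and registering it if absent.
        <T> T addAttribute(Class<T> type) {
            Object existing = attributes.get(type);
            if (existing == null) {
                try { existing = type.getDeclaredConstructor().newInstance(); }
                catch (ReflectiveOperationException e) { throw new IllegalStateException(e); }
                attributes.put(type, existing);
            }
            return type.cast(existing);
        }
    }

    // Hypothetical tokenizer that sets a payload, like the poster's Tokenizer.
    static class PayloadTokenizer extends Stage {
        final PayloadAttr payloadAtt = addAttribute(PayloadAttr.class);
        void emitToken() { payloadAtt.payload = new byte[] {7}; }
    }

    // Hypothetical tokenizer that never touches payloads, like WhitespaceTokenizer.
    static class PlainTokenizer extends Stage { }

    // The filter grabs its attribute with addAttribute in the constructor.
    static class PayloadReadingFilter extends Stage {
        final PayloadAttr payloadAtt;
        PayloadReadingFilter(Stage input) {
            super(input);
            payloadAtt = addAttribute(PayloadAttr.class); // never throws, unlike getAttribute
        }
        String describe() {
            return payloadAtt.payload == null ? "no payload" : "payload " + payloadAtt.payload[0];
        }
    }

    public static void main(String[] args) {
        PayloadTokenizer withPayloads = new PayloadTokenizer();
        PayloadReadingFilter f1 = new PayloadReadingFilter(withPayloads);
        withPayloads.emitToken();
        System.out.println(f1.describe()); // payload 7

        PayloadReadingFilter f2 = new PayloadReadingFilter(new PlainTokenizer());
        System.out.println(f2.describe()); // no payload
    }
}
```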

Your code worked with getAttribute in Lucene 2.9 because, for
backwards compatibility with the Token API, the six attributes from
Token were always added automatically: TermAttribute,
OffsetAttribute, PositionIncrementAttribute, PayloadAttribute,
TypeAttribute, FlagsAttribute. You can see this by looking at
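A toy illustration of the compatibility effect described above, assuming a hypothetical Chain class that can pre-register stand-ins for the six Token attributes. It only mimics the described outcome (getAttribute finds PayloadAttribute even though nobody added it explicitly) and is not how Lucene 2.9 actually implements it.

```java
import java.util.HashMap;
import java.util.Map;

public class LegacyCompatDemo {
    // Hypothetical stand-ins for the six Token attributes named in the message.
    static class TermAttr { }
    static class OffsetAttr { }
    static class PositionIncrementAttr { }
    static class PayloadAttr { }
    static class TypeAttr { }
    static class FlagsAttr { }

    static final Class<?>[] TOKEN_ATTRIBUTES = {
        TermAttr.class, OffsetAttr.class, PositionIncrementAttr.class,
        PayloadAttr.class, TypeAttr.class, FlagsAttr.class
    };

    // Toy chain: with legacyCompat=true it pre-adds all six attributes,
    // mimicking the 2.9-era behavior described in the message.
    static class Chain {
        final Map<Class<?>, Object> attributes = new HashMap<>();

        Chain(boolean legacyCompat) {
            if (legacyCompat) {
                for (Class<?> type : TOKEN_ATTRIBUTES) {
                    attributes.put(type, newInstance(type));
                }
            }
        }

        // Fails unless the attribute was already added to the chain.
        <T> T getAttribute(Class<T> type) {
            Object existing = attributes.get(type);
            if (existing == null) {
                throw new IllegalArgumentException(type.getSimpleName() + " was never added");
            }
            return type.cast(existing);
        }

        private static Object newInstance(Class<?> type) {
            try { return type.getDeclaredConstructor().newInstance(); }
            catch (ReflectiveOperationException e) { throw new IllegalStateException(e); }
        }
    }

    static boolean getAttributeSucceeds(Chain chain) {
        try { chain.getAttribute(PayloadAttr.class); return true; }
        catch (IllegalArgumentException e) { return false; }
    }

    public static void main(String[] args) {
        System.out.println(getAttributeSucceeds(new Chain(true)));  // true: pre-added attributes
        System.out.println(getAttributeSucceeds(new Chain(false))); // false: nobody added PayloadAttr
    }
}
```

Once the automatic registration goes away (the `false` case), only code that uses addAttribute keeps working regardless of which tokenizer sits upstream.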
