From: Robert Muir
Date: Sat, 4 Dec 2010 18:22:09 -0500
Subject: Re: PayloadAttribute behavior change between Lucene 2.9/3.0 and the trunk
To: java-user@lucene.apache.org

On Sat, Dec 4, 2010 at 6:05 PM, Teruhiko Kurosaka wrote:
> Thank you, Robert, substituting getAttribute with addAttribute worked!
>
> But I don't understand why.  Could you help me to understand the mechanics?
>
> In my setting,
> hasAttribute(PayloadAttribute.class) returns false.
>
> So I thought addAttribute(PayloadAttribute.class) would just
> create a new PayloadAttribute object.  It would remedy the
> Exception, but it wouldn't do any good accessing the payload
> generated upstream.
>
> But the newly generated PayloadAttribute t is actually
> getting the payload that was generated upstream (by my Tokenizer).
> How is this possible?

Attributes are shared for the entire analysis chain.

It is best to think of getAttribute as "get a reference to an already-added attribute", and of addAttribute as "if the attribute already exists, return a reference to it; otherwise add it to the chain and return a reference to that".

In other words, in the entire Analyzer, there can only be one PayloadAttribute. Because it is shared, it does not matter who calls addAttribute.

So, it's best to always use addAttribute in your constructor.
The simplest way to see why this is good: imagine someone were to use your TokenFilter with, say, a WhitespaceTokenizer, which does not add PayloadAttribute. Your filter would then not produce any error; the PayloadAttribute would just be empty, as you would expect.

Your code worked with getAttribute in Lucene 2.9 because, for backwards compatibility with the Token API, the six attributes from Token were always added automatically: TermAttribute, OffsetAttribute, PositionIncrementAttribute, PayloadAttribute, TypeAttribute, FlagsAttribute.

You can see this by looking at TokenStream.initTokenWrapper:
http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenStream.java
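
To make the sharing concrete, here is a minimal toy model of the getAttribute/addAttribute semantics described above. It is only a sketch and not Lucene's code: ToyAttributeSource, ToyPayloadAttribute and AttributeSharingDemo are hypothetical names, the Supplier parameter is a simplification that Lucene's addAttribute does not have, and throwing IllegalArgumentException from getAttribute is an assumption standing in for the exception mentioned in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for PayloadAttribute: holds the payload bytes (null means "empty").
final class ToyPayloadAttribute {
    byte[] payload;
}

// Hypothetical stand-in for the attribute holder shared by a whole analysis chain.
// It keeps at most one instance per attribute class.
final class ToyAttributeSource {
    private final Map<Class<?>, Object> attributes = new HashMap<>();

    // "If the attribute already exists, return a reference to it;
    // otherwise add it to the chain and return a reference to that."
    <T> T addAttribute(Class<T> type, Supplier<T> factory) {
        return type.cast(attributes.computeIfAbsent(type, key -> factory.get()));
    }

    // "Get a reference to an already-added attribute"; fails if nobody added it.
    // (The exception type is an assumption of this sketch.)
    <T> T getAttribute(Class<T> type) {
        Object existing = attributes.get(type);
        if (existing == null) {
            throw new IllegalArgumentException("Attribute not present: " + type.getName());
        }
        return type.cast(existing);
    }

    boolean hasAttribute(Class<?> type) {
        return attributes.containsKey(type);
    }
}

final class AttributeSharingDemo {
    public static void main(String[] args) {
        ToyAttributeSource chain = new ToyAttributeSource();

        // The "filter" asks first, so the attribute is created here...
        ToyPayloadAttribute seenByFilter =
                chain.addAttribute(ToyPayloadAttribute.class, ToyPayloadAttribute::new);

        // ...and the "tokenizer" asking later gets the very same instance.
        ToyPayloadAttribute seenByTokenizer =
                chain.addAttribute(ToyPayloadAttribute.class, ToyPayloadAttribute::new);

        // A payload written upstream is therefore visible downstream.
        seenByTokenizer.payload = new byte[] {42};
        System.out.println(seenByFilter == seenByTokenizer); // prints true
        System.out.println(seenByFilter.payload[0]);         // prints 42
    }
}
```

In this model the order of the two addAttribute calls does not matter, which is the point of calling it in the constructor. A chain whose tokenizer never sets a payload simply leaves the payload field null instead of failing.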