From: "Michael McCandless"
To: java-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-1063) Token re-use API breaks back compatibility in certain TokenStream chains
Date: Tue, 20 Nov 2007 13:49:45 -0500
Message-Id: <1195584585.8778.1222398473@webmail.messagingengine.com>
In-Reply-To: <30367549.1195584523226.JavaMail.jira@brutus>
References: <30367549.1195584523226.JavaMail.jira@brutus>

Will do ...

Mike

"Yonik Seeley (JIRA)" wrote:
>
>     [
>     https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544005
>     ]
>
> Yonik Seeley commented on LUCENE-1063:
> --------------------------------------
>
> Could we make this a little more concrete by creating a simple test case
> that fails?
>
> > Token re-use API breaks back compatibility in certain TokenStream chains
> > ------------------------------------------------------------------------
> >
> >                 Key: LUCENE-1063
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1063
> >             Project: Lucene - Java
> >          Issue Type: Bug
> >          Components: Analysis
> >    Affects Versions: 2.3
> >            Reporter: Michael McCandless
> >            Assignee: Michael McCandless
> >             Fix For: 2.3
> >
> >
> > In scrutinizing the new Token re-use API during this thread:
> >   http://www.gossamer-threads.com/lists/lucene/java-dev/54708
> > I realized we now have a back-compatibility break when mixing re-use and
> > non-re-use TokenStreams.
> > The new "reuse" next(Token) API actually allows two different aspects
> > of re-use:
> >   1) "Backwards re-use": the subsequent call to next(Token) is allowed
> >      to change all aspects of the provided Token, meaning the caller
> >      must do all persisting of Token that it needs before calling
> >      next(Token) again.
> >   2) "Forwards re-use": the caller is allowed to modify the returned
> >      Token however it wants.  E.g. the LowerCaseFilter is allowed to
> >      downcase the characters in-place in the char[] termBuffer.
> > The forwards re-use case can break backwards compatibility now.
> > E.g., if a TokenStream X providing only the "non-reuse" next() API is
> > followed by a TokenFilter Y using the "reuse" next(Token) API to pull
> > the tokens, then the default implementation in TokenStream.java for
> > next(Token) will kick in.
> > That default implementation just returns the "private copy" Token
> > returned by next().  But, because of 2) above, this is not
> > legal: if the TokenFilter Y modifies the char[] termBuffer (say), that
> > is actually modifying the cached copy being potentially stored by X.
> > I think the opposite case is handled correctly.
> > A simple way to fix this is to make a full copy of the Token in the
> > next(Token) call in TokenStream, just like we do in the next() method
> > in TokenStream.  The downside is this is a small performance hit.  However
> > that hit only happens at the boundary between a non-reuse and a re-use
> > tokenizer.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
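
A rough sketch of the kind of failing case asked for above, written against simplified stand-in classes rather than the real Lucene code. Token, Stream, CachingStream, LowerCaseInPlaceFilter and the copyAtBoundary flag are all hypothetical names made up for illustration; only the shape of the problem (a non-reuse stream X that keeps its tokens, followed by a filter Y that edits termBuffer in place) is taken from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; NOT the real Lucene classes.
class TokenReuseSketch {

    /** Minimal token: just a mutable term buffer. */
    static final class Token {
        char[] termBuffer;
        Token(String text) { termBuffer = text.toCharArray(); }
        Token copy() { return new Token(new String(termBuffer)); }
        String term() { return new String(termBuffer); }
    }

    /** Base stream offering both the non-reuse and the reuse style of call. */
    static abstract class Stream {
        final boolean copyAtBoundary; // hypothetical switch: false = problem, true = proposed fix
        Stream(boolean copyAtBoundary) { this.copyAtBoundary = copyAtBoundary; }

        /** Non-reuse API: the stream may keep a reference to what it returns. */
        abstract Token next();

        /** Default reuse API: delegates to next(); the reusable token is ignored here. */
        Token next(Token reusable) {
            Token t = next();
            // Without the copy, the caller receives X's own "private copy".
            return (t != null && copyAtBoundary) ? t.copy() : t;
        }
    }

    /** X: implements only next() and keeps references to the tokens it hands out. */
    static final class CachingStream extends Stream {
        final List<Token> cache = new ArrayList<>();
        private int pos = 0;
        CachingStream(boolean copyAtBoundary, String... terms) {
            super(copyAtBoundary);
            for (String term : terms) cache.add(new Token(term));
        }
        @Override Token next() { return pos < cache.size() ? cache.get(pos++) : null; }
    }

    /** Y: pulls tokens via next(Token) and lowercases in place ("forwards re-use"). */
    static final class LowerCaseInPlaceFilter {
        private final Stream input;
        LowerCaseInPlaceFilter(Stream input) { this.input = input; }
        Token next(Token reusable) {
            Token t = input.next(reusable);
            if (t != null) {
                for (int i = 0; i < t.termBuffer.length; i++) {
                    t.termBuffer[i] = Character.toLowerCase(t.termBuffer[i]);
                }
            }
            return t;
        }
    }

    /** Runs X -> Y once and reports what X's cached token looks like afterwards. */
    static String cachedTermAfterFiltering(boolean copyAtBoundary) {
        CachingStream x = new CachingStream(copyAtBoundary, "Hello");
        LowerCaseInPlaceFilter y = new LowerCaseInPlaceFilter(x);
        y.next(new Token(""));
        return x.cache.get(0).term();
    }

    /** Runs X -> Y once and reports the term that Y returns. */
    static String filteredTerm(boolean copyAtBoundary) {
        CachingStream x = new CachingStream(copyAtBoundary, "Hello");
        return new LowerCaseInPlaceFilter(x).next(new Token("")).term();
    }

    public static void main(String[] args) {
        System.out.println(cachedTermAfterFiltering(false)); // X's cached token was modified by Y
        System.out.println(cachedTermAfterFiltering(true));  // X's cached token is untouched
    }
}
```

With copyAtBoundary set to false, Y's in-place lowercasing leaks back into the token that X still holds. With it set to true, Y works on its own copy, at the cost of one extra copy per token at the non-reuse/reuse boundary, which matches the trade-off described in the issue.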