From: "DM Smith (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Date: Tue, 5 Aug 2008 08:18:46 -0700 (PDT)
Subject: [jira] Commented: (LUCENE-1350) SnowballFilter resets the payload
Message-ID: <1062897664.1217949526523.JavaMail.jira@brutus>
In-Reply-To: <496705649.1217925526033.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619925#action_12619925 ]

DM Smith commented on LUCENE-1350:
----------------------------------

When we go to the
reuse pattern across all of Lucene, the problem will be nearly everywhere. The pattern for Token after the deprecations are removed is:

    public Token next(Token token) {
      ...
      token.clear(); // This clears Payload
      token.setTermBuffer(newBuffer);
      ...
    }

In https://issues.apache.org/jira/browse/LUCENE-1333, I've changed SnowballFilter's next(Token token) to follow this pattern.

Using clone is probably not the best approach. The following pattern works:

    public Token next(Token token) {
      ...
      Payload payload = token.getPayload();
      token.clear(); // This clears Payload
      token.setTermBuffer(newBuffer);
      token.setPayload(payload);
      ...
    }

If payload is to be preserved in the face of the reuse pattern, perhaps clear() should not clear Payload. Since Payload is experimental and marked as subject to change, I don't think that this break of backward compatibility should be an issue. If it is, I think there is a better pattern for Token.

The filter order issue concerning payload also pertains to the flags field, which is also marked experimental, and I think it pertains to type as well.

The most typical pattern of Token reuse is:

    token.clear(); // reset everything except startOffset, endOffset and type to their defaults.
    token.setStartOffset(newStartOffset);
    token.setEndOffset(newEndOffset);
    token.setType(Token.DEFAULT_TYPE);
    token.setTermBuffer(newTerm); // or some variation of this.

This is rather tedious, and I think clear() is a bit too aggressive in resetting payload and flags to their defaults.

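For reference, the payload-preserving pattern described earlier in this comment can be sketched in isolation. This is only an illustration under stated assumptions: SketchToken is a hypothetical stand-in, not Lucene's Token, its payload is a plain byte[] rather than a Payload object, and its clear() simply mirrors the behavior described above (term, payload and flags go back to their defaults).

```java
// Hypothetical stand-in for org.apache.lucene.analysis.Token; NOT the real class.
final class SketchToken {
    private String term = "";
    private byte[] payload;   // assumption: a byte[] stands in for Lucene's Payload
    private int flags;

    // Mirrors the clear() behavior described in the comment:
    // term, payload and flags are reset to their defaults.
    void clear() {
        term = "";
        payload = null;
        flags = 0;
    }

    void setTermBuffer(String newTerm) { term = newTerm; }
    String term() { return term; }

    void setPayload(byte[] newPayload) { payload = newPayload; }
    byte[] getPayload() { return payload; }

    void setFlags(int newFlags) { flags = newFlags; }
    int getFlags() { return flags; }
}

final class PayloadPreservingReuse {
    // Reuses the incoming token for a new term while keeping its payload:
    // save the payload, clear the token, set the new term, restore the payload.
    static SketchToken next(SketchToken token, String newTerm) {
        byte[] payload = token.getPayload();
        token.clear();                 // this drops the payload (and the flags)
        token.setTermBuffer(newTerm);
        token.setPayload(payload);     // put the saved payload back
        return token;
    }

    public static void main(String[] args) {
        SketchToken token = new SketchToken();
        token.setTermBuffer("running");
        token.setPayload(new byte[] {42});

        SketchToken out = next(token, "run");
        System.out.println(out.term() + ", payload kept: " + (out.getPayload() != null));
    }
}
```

Note that in this sketch the flags are still lost across the reuse, since only the payload is saved and restored; that is the same gap the comment points out for the flags field.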
I think it would be good to add to Token the following and deprecate clear():

    public void reuse(char[] buffer, int offset, int length, int startOffset, int endOffset, String type) {
      setTermBuffer(buffer, offset, length);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.type = type;
    }

    public void reuse(String buffer, int offset, int length, int startOffset, int endOffset, String type) {
      setTermBuffer(buffer, offset, length);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.type = type;
    }

    public void reuse(String buffer, int startOffset, int endOffset, String type) {
      setTermBuffer(buffer);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.type = type;
    }

    public void reuse(char[] buffer, int offset, int length, int startOffset, int endOffset) {
      setTermBuffer(buffer, offset, length);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }

    public void reuse(String buffer, int offset, int length, int startOffset, int endOffset) {
      setTermBuffer(buffer, offset, length);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }

    public void reuse(String buffer, int startOffset, int endOffset) {
      setTermBuffer(buffer);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }

> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.