From: "Uwe Schindler (JIRA)"
To: java-dev@lucene.apache.org
Date: Tue, 8 Sep 2009 16:01:57 -0700 (PDT)
Subject: [jira] Commented: (LUCENE-1903) Incorrect ShingleFilter behavior when outputUnigrams == false

[ https://issues.apache.org/jira/browse/LUCENE-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752813#action_12752813 ]

Uwe Schindler commented on LUCENE-1903:
---------------------------------------

Chris: Could you test this patch to see whether it also works as expected for you? Maybe the fix I found is only valid for your test case and not for other cases. In my opinion, the code now works identically to the 2.4.1 version (minus the output buffer). Unigrams are simply detected by shingleBufferPosition == 0. Your test code also checks the position increments, and the implementation looks right there too. In principle, the only thing missing was incrementing shingleBufferPosition when unigrams are not output.

Mark: If you want to build RC3 soon, just assign the issue to yourself and commit this fix. I'm going to bed now; I will commit it tomorrow if you haven't. In my opinion, no CHANGES.txt entry is needed, as this is neither a new feature nor a bug present in 2.4.1.

> Incorrect ShingleFilter behavior when outputUnigrams == false
> -------------------------------------------------------------
>
>                 Key: LUCENE-1903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1903
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Chris Harris
>             Fix For: 2.9
>
>         Attachments: LUCENE-1903.patch, LUCENE-1903_testcases.patch, LUCENE-1903_testcases_lucene2_4_1_version.patch, TEST-org.apache.lucene.analysis.shingle.ShingleFilterTest.xml
>
>
> ShingleFilter isn't working as expected when outputUnigrams == false. In particular, it is outputting unigrams at least some of the time when outputUnigrams == false.
> I'll attach a patch to ShingleFilterTest.java that adds some test cases that demonstrate the problem.
> I haven't checked this, but I hypothesize that the behavior for outputUnigrams == false got changed when the class was upgraded to the new TokenStream API?
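To illustrate the fix described in the comment (unigrams are recognized by shingleBufferPosition == 0, and the position must still advance when unigrams are not output), here is a minimal, stand-alone Java sketch. It is not Lucene's ShingleFilter and does not use the TokenStream API; the class ShingleSketch, the method shingles(), and joining tokens with a single space are hypothetical choices made only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of shingle generation (NOT Lucene's ShingleFilter).
 * For each start token, a "shingle buffer position" walks from 0 upward;
 * position 0 means the shingle holds a single token, i.e. a unigram.
 */
public class ShingleSketch {

    public static List<String> shingles(List<String> tokens, int maxShingleSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder shingle = new StringBuilder();
            int shingleBufferPosition = 0;
            while (shingleBufferPosition < maxShingleSize
                    && start + shingleBufferPosition < tokens.size()) {
                if (shingleBufferPosition > 0) {
                    shingle.append(' '); // assumed separator for this sketch
                }
                shingle.append(tokens.get(start + shingleBufferPosition));
                // Unigram detection, as described in the comment: position 0.
                boolean isUnigram = shingleBufferPosition == 0;
                if (!isUnigram || outputUnigrams) {
                    out.add(shingle.toString());
                }
                // Advance in every case, including when a unigram was suppressed.
                // In this sketch, skipping this step would stall at position 0.
                shingleBufferPosition++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("please", "divide", "this", "sentence");
        // prints [please, please divide, divide, divide this, this, this sentence, sentence]
        System.out.println(shingles(tokens, 2, true));
        // prints [please divide, divide this, this sentence]
        System.out.println(shingles(tokens, 2, false));
    }
}
```

With outputUnigrams set to false, the sketch emits only multi-token shingles, which is the behavior the issue's test cases expect from ShingleFilter.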