From: "Yonik Seeley (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Date: Mon, 16 Aug 2010 17:46:16 -0400 (EDT)
Subject: [jira] Commented: (SOLR-2051) analysis.jsp is incorrect for protWords etc
Message-ID: <9442703.377411281995176464.JavaMail.jira@thor>
In-Reply-To: <13937528.370151281981627820.JavaMail.jira@thor>

[ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899109#action_12899109 ]

Yonik Seeley commented on SOLR-2051:
------------------------------------

Ah, yeah, good catch!

bq. i wonder if we can use tee/sinks to do this cleaner? Insert a tap after each filter?

Yeah, might be safer by more closely emulating how the analysis actually works. For example, if someone develops some whacky filters that rely on thread locals to pass info or something.

Since it looks like you've fixed it already, I'd just commit that though.

> analysis.jsp is incorrect for protWords etc
> -------------------------------------------
>
>                 Key: SOLR-2051
>                 URL: https://issues.apache.org/jira/browse/SOLR-2051
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>         Attachments: SOLR-2051.patch, SOLR-2051.patch
>
>
> Analysis.jsp gives incorrect results if you use "protwords.txt" or "stemdict.txt" or the like.
> This is because these are now implemented with KeywordAttribute (so you can easily override any stemmer etc).
> For example, if your schema had "foobars" in protwords.txt, analysis.jsp would show it being stemmed to "foobar", even though this doesn't actually happen.
> The problem is that this jsp is downconverting the entire tokenstream to Token in between processing, so it silently discards KeywordAttribute and you get the wrong result.
> Note: this issue isn't about *displaying* other attributes such as KeywordAttribute (which would be a new feature). It's about not throwing them away so that the analysis actually represents what happens.
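
To make the failure mode in the issue description concrete, here is a minimal sketch in plain Java. It does not use the real Lucene or Solr classes: `Tok`, `markProtected`, `downconvert`, `stem`, and `analyze` are hypothetical stand-ins, the `keyword` flag only models what KeywordAttribute carries, and the toy stemmer just strips one trailing "s". It assumes the lossy copy between stages keeps the term text and nothing else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeywordAttributeSketch {

    // Hypothetical stand-in for a token plus its attributes; not the Lucene API.
    static final class Tok {
        final String term;
        final boolean keyword; // models the flag carried by KeywordAttribute

        Tok(String term, boolean keyword) {
            this.term = term;
            this.keyword = keyword;
        }
    }

    // Models the protected-words stage: flags terms listed in protwords.txt.
    static List<Tok> markProtected(List<String> terms, Set<String> protectedWords) {
        List<Tok> out = new ArrayList<>();
        for (String t : terms) {
            out.add(new Tok(t, protectedWords.contains(t)));
        }
        return out;
    }

    // Models the lossy copy between stages: only the term text survives.
    static List<Tok> downconvert(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(new Tok(t.term, false));
        }
        return out;
    }

    // Toy stemmer: strips one trailing "s" unless the token is flagged as a keyword.
    static List<String> stem(List<Tok> in) {
        List<String> out = new ArrayList<>();
        for (Tok t : in) {
            boolean strip = !t.keyword && t.term.length() > 1 && t.term.endsWith("s");
            out.add(strip ? t.term.substring(0, t.term.length() - 1) : t.term);
        }
        return out;
    }

    // Runs both stages, optionally inserting the lossy copy between them.
    public static List<String> analyze(List<String> terms, Set<String> protectedWords,
                                       boolean lossyCopy) {
        List<Tok> marked = markProtected(terms, protectedWords);
        return stem(lossyCopy ? downconvert(marked) : marked);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("foobars");
        Set<String> prot = new HashSet<>(Arrays.asList("foobars"));
        // With the lossy copy the protected word is stemmed anyway: [foobar]
        System.out.println("lossy copy between stages: " + analyze(terms, prot, true));
        // With the flag preserved the protected word is left alone: [foobars]
        System.out.println("attributes preserved:      " + analyze(terms, prot, false));
    }
}
```

The first call mirrors what the issue says analysis.jsp displays, the second what the real analysis chain does. Any fix, whether copying the full attribute state or tapping the live stream after each filter, needs to keep the flag intact between stages.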