From: "Grant Ingersoll (JIRA)"
To: dev@lucene.apache.org
Date: Wed, 20 Oct 2010 11:24:23 -0400 (EDT)
Subject: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

[ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923004#action_12923004 ]

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

James, you are right. I mislabeled my merge. Still getting used to this merge-from-trunk-to-branch stuff. At any rate, no need for a patch; I will get the merge figured out soon.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
> Key: SOLR-2010
> URL: https://issues.apache.org/jira/browse/SOLR-2010
> Project: Solr
> Issue Type: New Feature
> Components: clients - java, spellchecker
> Affects Versions: 1.4.1
> Environment: Tested against trunk revision 966633
> Reporter: James Dyer
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, solr_2010_3x.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better spell check collator. I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.
> 1. Only return collations that are guaranteed to result in hits if re-queried (also applying the original fq params). This is especially helpful when there is more than one correction per query. The 1.4 behavior does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions.
> 3. Provide extended collation results, including the number of hits re-querying will return and a breakdown of each misspelled word and its correction.
> This patch is similar to what is described in SOLR-507 item #1. Also, this patch provides a viable workaround for the problem discussed in SOLR-1074: a dictionary could be created that combines the terms from the multiple fields.
> The collator would then prune out any spurious suggestions this causes.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum number of collation possibilities to try before giving up. Lower values ensure better performance. Higher values may be necessary to find a collation that can return results. Default is 0, which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum number of collations to return. Default is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing the collations found. Default is false, which maintains backwards-compatible behavior. When true, output is like this (in context):
>
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="…">
>       <int name="numFound">94</int>
>       <int name="startOffset">7</int>
>       <int name="endOffset">11</int>
>       <arr name="suggestion">
>         <str>hope</str>
>         <str>how</str>
>         <str>hope</str>
>         <str>chops</str>
>         <str>hoped</str>
>         etc
>       </arr>
>     </lst>
>     <lst name="…">
>       <int name="numFound">100</int>
>       <int name="startOffset">16</int>
>       <int name="endOffset">21</int>
>       <arr name="suggestion">
>         <str>fall</str>
>         <str>fails</str>
>         <str>fail</str>
>         <str>fill</str>
>         <str>faith</str>
>         <str>all</str>
>         etc
>       </arr>
>     </lst>
>     <lst name="collation">
>       <str name="collationQuery">Title:(how AND fails)</str>
>       <int name="hits">2</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="…">how</str>
>         <str name="…">fails</str>
>       </lst>
>     </lst>
>     <lst name="collation">
>       <str name="collationQuery">Title:(hope AND faith)</str>
>       <int name="hits">2</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="…">hope</str>
>         <str name="…">faith</str>
>       </lst>
>     </lst>
>     <lst name="collation">
>       <str name="collationQuery">Title:(chops AND all)</str>
>       <int name="hits">1</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="…">chops</str>
>         <str name="…">all</str>
>       </lst>
>     </lst>
>   </lst>
> </lst>
>
> In addition, SolrJ is updated to include SpellCheckResponse.getCollatedResults(), which returns the expanded collation format. getCollatedResult(), which returns a single String, is retained for backwards compatibility. Other APIs were not changed but will still work provided that spellcheck.collateExtendedResult is false.
> This likely will not return valid results when using shards. Supporting shards would require a more robust interaction with the index than what exists in SpellCheckCollator.collate().
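The collate-and-verify idea in items 1 and 2, bounded by spellcheck.maxCollationTries and spellcheck.maxCollations, can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the patch's SpellCheckCollator code: the class, records and methods (CollatorSketch, Misspelling, Collation, collate, substitute, advance, run) are hypothetical; combinations are tried in simple odometer order, which may differ from the patch's ranking; the query "Title:(hpoe AND fiall)" is made up to fit the offsets shown above; and a small lookup table stands in for re-running each candidate (with the original fq params) against the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

public class CollatorSketch {

    /** A misspelled token: its [start, end) span in the original query plus ranked suggestions. */
    record Misspelling(int startOffset, int endOffset, List<String> suggestions) {}

    /** A collation: the rewritten query and its hit count (-1 means it was not checked). */
    record Collation(String collationQuery, long hits) {}

    /**
     * Hypothetical collator: re-queries correction combinations until maxCollations collations
     * with hits are found or maxCollationTries combinations have been tried. hitCounter stands in
     * for re-running the query. Assumes misspellings are sorted by startOffset and do not overlap.
     */
    static List<Collation> collate(String query, List<Misspelling> misspellings,
                                   int maxCollationTries, int maxCollations,
                                   ToLongFunction<String> hitCounter) {
        List<Collation> results = new ArrayList<>();
        if (misspellings.isEmpty() || maxCollations <= 0) return results;
        for (Misspelling m : misspellings) {
            if (m.suggestions().isEmpty()) return results; // this word cannot be corrected
        }
        int[] choice = new int[misspellings.size()]; // current suggestion index per word
        if (maxCollationTries <= 0) {
            // Backwards-compatible mode: top suggestion for every word, never re-queried.
            results.add(new Collation(substitute(query, misspellings, choice), -1));
            return results;
        }
        int tries = 0;
        do {
            String candidate = substitute(query, misspellings, choice);
            tries++;
            long hits = hitCounter.applyAsLong(candidate);
            if (hits > 0) results.add(new Collation(candidate, hits)); // keep only verified ones
        } while (tries < maxCollationTries && results.size() < maxCollations
                && advance(choice, misspellings));
        return results;
    }

    /** Replaces spans right to left so that earlier offsets remain valid. */
    static String substitute(String query, List<Misspelling> ms, int[] choice) {
        StringBuilder sb = new StringBuilder(query);
        for (int i = ms.size() - 1; i >= 0; i--) {
            Misspelling m = ms.get(i);
            sb.replace(m.startOffset(), m.endOffset(), m.suggestions().get(choice[i]));
        }
        return sb.toString();
    }

    /** Odometer step (last word varies fastest); returns false after the final combination. */
    static boolean advance(int[] choice, List<Misspelling> ms) {
        for (int i = choice.length - 1; i >= 0; i--) {
            if (++choice[i] < ms.get(i).suggestions().size()) return true;
            choice[i] = 0;
        }
        return false;
    }

    // Hypothetical example data: the query and the hit table are made up for illustration.
    static final String QUERY = "Title:(hpoe AND fiall)";
    static final List<Misspelling> MISSPELLINGS = List.of(
            new Misspelling(7, 11, List.of("hope", "how", "chops", "hoped")),
            new Misspelling(16, 21, List.of("fall", "fails", "fail", "fill", "faith", "all")));
    static final Map<String, Long> HITS = Map.of(
            "Title:(how AND fails)", 2L,
            "Title:(hope AND faith)", 2L,
            "Title:(chops AND all)", 1L);

    static List<Collation> run(int maxCollationTries, int maxCollations) {
        return collate(QUERY, MISSPELLINGS, maxCollationTries, maxCollations,
                q -> HITS.getOrDefault(q, 0L));
    }

    public static void main(String[] args) {
        for (Collation c : run(30, 3)) {
            System.out.println(c.collationQuery() + " hits=" + c.hits());
        }
    }
}
```

With run(30, 3) the sketch reaches Title:(hope AND faith), Title:(how AND fails) and Title:(chops AND all) after 18 tries; with run(10, 5) it gives up after 10 tries holding only the first two. Setting maxCollationTries to 0 returns the top-ranked combination unchecked, mirroring the backwards-compatible default described in the parameter list.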