lucene-dev mailing list archives

From "James Dyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2010) Improvements to SpellCheckComponent Collate functionality
Date Thu, 02 Jun 2011 14:38:47 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042795#comment-13042795 ]

James Dyer commented on SOLR-2010:
----------------------------------

Robert, your changes to fix the resource leak in SpellCheckCollatorTest were merged into 3.x by
Yonik in r1026921. I believe this issue should have been closed long ago, as it was committed
to trunk and 3.x last year.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch, multiple_collations_as_an_array.patch, solr_2010_3x.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better Spell Check Collator. I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.
> 1. Only return collations that are guaranteed to return hits when re-queried (also applying the original fq params). This is especially helpful when there is more than one correction per query. The 1.4 behavior does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions.
> 3. Provide extended collation results, including the # of hits re-querying will return and a breakdown of each misspelled word and its correction.
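A minimal sketch of the verification idea in item 1, in plain Java with no Solr dependency. The `hitCounter` function is a hypothetical stand-in for re-running a candidate collation (with the original fq filters) against the index, and each candidate is simplified to the chosen corrections joined with AND rather than substituted back into the original query text. Combinations are enumerated in a fixed odometer order, not ranked by suggestion score as a real collator would likely do; `maxTries` and `maxCollations` are assumed to play the roles of spellcheck.maxCollationTries and spellcheck.maxCollations described below.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

public class CollationSketch {

    /** A verified collation: the rewritten query, its hit count, and the corrections used. */
    public record Collation(String query, int hits, Map<String, String> corrections) {}

    /**
     * Enumerates combinations of corrections and keeps only those whose re-query returns hits.
     * Assumes every suggestion list is non-empty. hitCounter is a hypothetical stand-in for
     * re-running the query (with the original fq filters) against the index.
     */
    public static List<Collation> collate(List<String> misspellings,
                                          List<List<String>> suggestions,
                                          ToIntFunction<String> hitCounter,
                                          int maxTries,
                                          int maxCollations) {
        List<Collation> result = new ArrayList<>();
        int[] pick = new int[misspellings.size()]; // current suggestion index per misspelling
        int tries = 0;
        while (tries < maxTries && result.size() < maxCollations) {
            // Build the candidate from the currently picked suggestions.
            Map<String, String> corrections = new LinkedHashMap<>();
            for (int i = 0; i < pick.length; i++) {
                corrections.put(misspellings.get(i), suggestions.get(i).get(pick[i]));
            }
            String query = String.join(" AND ", corrections.values());
            tries++;
            int hits = hitCounter.applyAsInt(query); // the "re-query" step
            if (hits > 0) {
                result.add(new Collation(query, hits, corrections));
            }
            // Advance to the next combination like an odometer (last position changes fastest).
            int pos = pick.length - 1;
            while (pos >= 0 && ++pick[pos] == suggestions.get(pos).size()) {
                pick[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                break; // every combination has been tried
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy hit counts standing in for real re-queries (hypothetical data).
        Map<String, Integer> fakeIndex = Map.of(
                "how AND fails", 2, "hope AND faith", 2, "chops AND all", 1);
        List<Collation> collations = collate(
                List.of("hopq", "faill"),
                List.of(List.of("hope", "how", "chops"),
                        List.of("fall", "fails", "faith", "all")),
                q -> fakeIndex.getOrDefault(q, 0),
                20, 3);
        collations.forEach(System.out::println);
    }
}
```

Capping tries bounds the number of extra searches per request, which is why a low maxCollationTries trades recall for speed.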
> This patch is similar to what is described in SOLR-507 item #1. Also, this patch provides a viable workaround for the problem discussed in SOLR-1074: a dictionary could be created that combines the terms from the multiple fields, and the collator would then prune out any spurious suggestions this causes.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up. Lower values give better performance. Higher values may be necessary to find a collation that returns results. Default is 0, which maintains backwards-compatible behavior (collations are not checked).
> 2. spellcheck.maxCollations - maximum # of collations to return. Default is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing the collations found. Default is false, which maintains backwards-compatible behavior. When true, the output looks like this (in context):
> <lst name="spellcheck">
> 	<lst name="suggestions">
> 		<lst name="hopq">
> 			<int name="numFound">94</int>
> 			<int name="startOffset">7</int>
> 			<int name="endOffset">11</int>
> 			<arr name="suggestion">
> 				<str>hope</str>
> 				<str>how</str>
> 				<str>hope</str>
> 				<str>chops</str>
> 				<str>hoped</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="faill">
> 			<int name="numFound">100</int>
> 			<int name="startOffset">16</int>
> 			<int name="endOffset">21</int>
> 			<arr name="suggestion">
> 				<str>fall</str>
> 				<str>fails</str>
> 				<str>fail</str>
> 				<str>fill</str>
> 				<str>faith</str>
> 				<str>all</str>
> 				etc
> 			</arr>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(how AND fails)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">how</str>
> 				<str name="faill">fails</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(hope AND faith)</str>
> 			<int name="hits">2</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">hope</str>
> 				<str name="faill">faith</str>
> 			</lst>
> 		</lst>
> 		<lst name="collation">
> 			<str name="collationQuery">Title:(chops AND all)</str>
> 			<int name="hits">1</int>
> 			<lst name="misspellingsAndCorrections">
> 				<str name="hopq">chops</str>
> 				<str name="faill">all</str>
> 			</lst>
> 		</lst>
> 	</lst>
> </lst>
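As a sketch, a request that turns on the new parameters could be built as below. It only assembles the URL with JDK classes; the base URL `http://localhost:8983/solr/select` is an assumption (host, port and handler depend on the deployment, and the handler must have the SpellCheckComponent configured), and the parameter values are arbitrary examples. `spellcheck` and `spellcheck.collate` are the existing switches that enable spell checking and collation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SpellcheckRequestSketch {

    /** Joins parameters as key=value pairs, URL-encoding each value (keys are plain ASCII here). */
    public static String buildQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "Title:(hopq AND faill)");
        params.put("spellcheck", "true");
        params.put("spellcheck.collate", "true");
        params.put("spellcheck.maxCollationTries", "10");    // > 0 enables verification
        params.put("spellcheck.maxCollations", "3");
        params.put("spellcheck.collateExtendedResult", "true");
        // Hypothetical base URL; adjust host, port and handler path for your deployment.
        String url = "http://localhost:8983/solr/select?" + buildQueryString(params);
        System.out.println(url);
    }
}
```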
> In addition, SolrJ is updated to include SpellCheckResponse.getCollatedResults(), which returns the expanded Collation format. getCollatedResult(), which returns a single String, is retained for backwards compatibility. Other APIs are unchanged and will continue to work provided that spellcheck.collateExtendedResult is false.
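For clients not using SolrJ, the extended collation format can also be read with the JDK's DOM parser. This is a hypothetical sketch, not SolrJ's getCollatedResults(): the `Collation` record and `parse` method are illustrative names, and the code assumes the response layout shown above (each collation is an `lst` named `collation` holding `collationQuery`, `hits` and `misspellingsAndCorrections`).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class CollationParserSketch {

    /** Hypothetical holder mirroring one <lst name="collation"> entry. */
    public record Collation(String query, int hits, Map<String, String> corrections) {}

    /** Collects every <lst name="collation"> in the document, in document order. */
    public static List<Collation> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse response", e);
        }
        List<Collation> out = new ArrayList<>();
        NodeList lists = doc.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element lst = (Element) lists.item(i);
            if (!"collation".equals(lst.getAttribute("name"))) {
                continue; // skip spellcheck, suggestions, per-word lists, etc.
            }
            String query = null;
            int hits = 0;
            Map<String, String> corrections = new LinkedHashMap<>();
            // Walk direct children only; whitespace text nodes are ignored.
            for (Node n = lst.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element child)) {
                    continue;
                }
                switch (child.getAttribute("name")) {
                    case "collationQuery" -> query = child.getTextContent();
                    case "hits" -> hits = Integer.parseInt(child.getTextContent().trim());
                    case "misspellingsAndCorrections" -> {
                        for (Node m = child.getFirstChild(); m != null; m = m.getNextSibling()) {
                            if (m instanceof Element pair) {
                                corrections.put(pair.getAttribute("name"), pair.getTextContent());
                            }
                        }
                    }
                    default -> { }
                }
            }
            out.add(new Collation(query, hits, corrections));
        }
        return out;
    }
}
```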
> This will likely not return valid results when using shards; that would require a more robust interaction with the index than what exists in SpellCheckCollator.collate().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


