lucene-dev mailing list archives

From "Dyer, James" <James.D...@ingrambook.com>
Subject RE: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality
Date Fri, 13 Aug 2010 14:55:20 GMT
Grant,

I saw your comment, and I agree it's probably best to somehow re-query
through a Search Handler: either the existing one with all other
components turned off, or a new one just for this purpose. If
you (or someone else) are not able to implement it this way,
I can probably find a little time in a few weeks.

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Friday, August 13, 2010 7:34 AM
To: dev@lucene.apache.org
Subject: Re: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Hi James,

Did you see my comments on the issue?  

On Aug 11, 2010, at 12:28 AM, Dyer, James wrote:

> Tom,
> 
> I'm also going to need this to work with 1.4.1 within the next month
> or two, so if someone else doesn't back-port it to 1.4.1, I probably
> will. I would also like to see this working with shards. The
> PossibilityIterator class can likely be made a lot simpler. If nobody
> else takes care of these items, I will try to find time to do so myself
> before making it work with 1.4.1.
> 
> James Dyer
> E-Commerce Systems
> Ingram Book Company
> (615) 213-4311
> 
> -----Original Message-----
> From: Tom Phethean (JIRA) [mailto:jira@apache.org] 
> Sent: Tuesday, August 10, 2010 10:01 AM
> To: dev@lucene.apache.org
> Subject: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality
> 
> 
>    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896903#action_12896903 ]
> 
> Tom Phethean commented on SOLR-2010:
> ------------------------------------
> 
> Ok, thanks. Do you know if there is a rough timescale on that?
> 
>> Improvements to SpellCheckComponent Collate functionality
>> ---------------------------------------------------------
>> 
>>                Key: SOLR-2010
>>                URL: https://issues.apache.org/jira/browse/SOLR-2010
>>            Project: Solr
>>         Issue Type: New Feature
>>         Components: clients - java, spellchecker
>>   Affects Versions: 1.4.1
>>        Environment: Tested against trunk revision 966633
>>           Reporter: James Dyer
>>           Assignee: Grant Ingersoll
>>           Priority: Minor
>>        Attachments: SOLR-2010.patch, SOLR-2010.patch
>> 
>> 
>> Improvements to SpellCheckComponent Collate functionality
>> Our project requires a better Spell Check Collator. I'm contributing
>> this as a patch to get suggestions for improvements and in case there
>> is a broader need for these features.
>> 1. Only return collations that are guaranteed to result in hits if
>>    re-queried (also applying the original fq params). This is
>>    especially helpful when there is more than one correction per query.
>>    The 1.4 behavior does not verify that a particular combination will
>>    actually return hits.
>> 2. Provide the option to get multiple collation suggestions
>> 3. Provide extended collation results, including the # of hits
>>    re-querying will return and a breakdown of each misspelled word and
>>    its correction.
>> This patch is similar to what is described in SOLR-507 item #1.
>> It also provides a viable workaround for the problem discussed in
>> SOLR-1074: a dictionary could be created that combines the terms from
>> multiple fields, and the collator would then prune out any spurious
>> suggestions this causes.
>> This patch adds the following spellcheck parameters:
>> 1. spellcheck.maxCollationTries - maximum # of collation possibilities
>>    to try before giving up. Lower values ensure better performance.
>>    Higher values may be necessary to find a collation that can return
>>    results. Default is 0, which maintains backwards-compatible behavior
>>    (do not check collations).
>> 2. spellcheck.maxCollations - maximum # of collations to return.
>>    Default is 1, which maintains backwards-compatible behavior.
>> 3. spellcheck.collateExtendedResult - if true, returns an expanded
>>    response format detailing the collations found. Default is false,
>>    which maintains backwards-compatible behavior. When true, the output
>>    looks like this (in context):
>> <lst name="spellcheck">
>> 	<lst name="suggestions">
>> 		<lst name="hopq">
>> 			<int name="numFound">94</int>
>> 			<int name="startOffset">7</int>
>> 			<int name="endOffset">11</int>
>> 			<arr name="suggestion">
>> 				<str>hope</str>
>> 				<str>how</str>
>> 				<str>hope</str>
>> 				<str>chops</str>
>> 				<str>hoped</str>
>> 				etc
>> 			</arr>
>> 		</lst>
>> 		<lst name="faill">
>> 			<int name="numFound">100</int>
>> 			<int name="startOffset">16</int>
>> 			<int name="endOffset">21</int>
>> 			<arr name="suggestion">
>> 				<str>fall</str>
>> 				<str>fails</str>
>> 				<str>fail</str>
>> 				<str>fill</str>
>> 				<str>faith</str>
>> 				<str>all</str>
>> 				etc
>> 			</arr>
>> 		</lst>
>> 		<lst name="collation">
>> 			<str name="collationQuery">Title:(how AND fails)</str>
>> 			<int name="hits">2</int>
>> 			<lst name="misspellingsAndCorrections">
>> 				<str name="hopq">how</str>
>> 				<str name="faill">fails</str>
>> 			</lst>
>> 		</lst>
>> 		<lst name="collation">
>> 			<str name="collationQuery">Title:(hope AND faith)</str>
>> 			<int name="hits">2</int>
>> 			<lst name="misspellingsAndCorrections">
>> 				<str name="hopq">hope</str>
>> 				<str name="faill">faith</str>
>> 			</lst>
>> 		</lst>
>> 		<lst name="collation">
>> 			<str name="collationQuery">Title:(chops AND all)</str>
>> 			<int name="hits">1</int>
>> 			<lst name="misspellingsAndCorrections">
>> 				<str name="hopq">chops</str>
>> 				<str name="faill">all</str>
>> 			</lst>
>> 		</lst>
>> 	</lst>
>> </lst>
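The verification loop behind spellcheck.maxCollationTries and spellcheck.maxCollations can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the patch's code: `CollationSketch`, `collate` and the `hitCounter` callback are hypothetical names, `hitCounter` stands in for re-running the candidate query (with the original fq params) against the index, and the odometer-style enumeration of suggestion combinations is a simplification that may not match the order PossibilityIterator actually uses. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical sketch of verified collation; not the SOLR-2010 patch implementation.
public class CollationSketch {

    /** A candidate collation and its hit count (-1 means "not verified"). */
    public record Collation(String query, int hits) {}

    public static List<Collation> collate(List<String> queryTokens,
                                          Map<String, List<String>> suggestions, // misspelling -> ranked corrections
                                          ToIntFunction<String> hitCounter,       // assumed stand-in for a re-query with the original fq
                                          int maxCollationTries,
                                          int maxCollations) {
        // Fix an order for the misspelled words (pass a LinkedHashMap for predictable results).
        List<String> words = new ArrayList<>(suggestions.keySet());
        Map<String, Integer> slot = new HashMap<>();
        int[] sizes = new int[words.size()];
        for (int i = 0; i < words.size(); i++) {
            slot.put(words.get(i), i);
            sizes[i] = suggestions.get(words.get(i)).size();
            if (sizes[i] == 0) {
                return List.of(); // a word with no corrections: nothing to collate
            }
        }
        int[] pos = new int[words.size()]; // odometer: current suggestion index per misspelled word

        if (maxCollationTries <= 0) {
            // Unverified mode: top suggestion for every word, hit count unknown.
            return List.of(new Collation(build(queryTokens, suggestions, slot, pos), -1));
        }

        List<Collation> found = new ArrayList<>();
        int tries = 0;
        boolean more = true;
        while (more && tries < maxCollationTries && found.size() < maxCollations) {
            String candidate = build(queryTokens, suggestions, slot, pos);
            tries++;
            int hits = hitCounter.applyAsInt(candidate);
            if (hits > 0) { // keep only combinations that actually return results
                found.add(new Collation(candidate, hits));
            }
            more = advance(pos, sizes);
        }
        return found;
    }

    // Substitute the currently selected correction for each misspelled token.
    private static String build(List<String> tokens, Map<String, List<String>> suggestions,
                                Map<String, Integer> slot, int[] pos) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            Integer i = slot.get(t);
            out.add(i == null ? t : suggestions.get(t).get(pos[i]));
        }
        return String.join(" ", out);
    }

    // Step the odometer to the next combination; false once every combination was visited.
    private static boolean advance(int[] pos, int[] sizes) {
        for (int i = pos.length - 1; i >= 0; i--) {
            if (++pos[i] < sizes[i]) {
                return true;
            }
            pos[i] = 0;
        }
        return false;
    }
}
```

With hopq -> [hope, how], faill -> [fall, fails, faith] and a stand-in index in which only "hope faith" and "how fails" match, maxCollationTries=10 and maxCollations=5 would return both, in that order. With maxCollationTries=0 the sketch skips verification and returns only "hope fall", which mirrors the backwards-compatible default.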
>> In addition, SolrJ is updated to include
>> SpellCheckResponse.getCollatedResults(), which returns the expanded
>> Collation format. getCollatedResult(), which returns a single String,
>> is retained for backwards-compatibility. Other APIs were not changed
>> and will continue to work, provided that
>> spellcheck.collateExtendedResult is false.
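Clients that do not use SolrJ could read the extended format directly. Below is a minimal sketch using only the JDK's DOM parser, assuming the response shape shown above (each `<lst name="collation">` holding collationQuery, hits and misspellingsAndCorrections). `ExtendedCollationReader` and its record are hypothetical names, and any element other than `<lst name="collation">` is ignored. Requires Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical reader for the extended collation format; not part of SolrJ.
public class ExtendedCollationReader {

    /** One collation: rewritten query, hit count, and misspelling -> correction pairs. */
    public record ExtendedCollation(String query, int hits, Map<String, String> corrections) {}

    public static List<ExtendedCollation> read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
        List<ExtendedCollation> out = new ArrayList<>();
        NodeList lists = doc.getElementsByTagName("lst"); // all <lst>, in document order
        for (int i = 0; i < lists.getLength(); i++) {
            Element lst = (Element) lists.item(i);
            if (!"collation".equals(lst.getAttribute("name"))) {
                continue; // only the extended <lst name="collation"> entries are of interest
            }
            String query = null;
            int hits = -1;
            Map<String, String> corrections = new LinkedHashMap<>();
            for (Node c = lst.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (!(c instanceof Element e)) {
                    continue; // skip whitespace text nodes
                }
                switch (e.getAttribute("name")) {
                    case "collationQuery" -> query = e.getTextContent();
                    case "hits" -> hits = Integer.parseInt(e.getTextContent().trim());
                    case "misspellingsAndCorrections" -> {
                        for (Node m = e.getFirstChild(); m != null; m = m.getNextSibling()) {
                            if (m instanceof Element pair) {
                                corrections.put(pair.getAttribute("name"), pair.getTextContent());
                            }
                        }
                    }
                    default -> { }
                }
            }
            out.add(new ExtendedCollation(query, hits, corrections));
        }
        return out;
    }
}
```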
>> This will likely not return valid results when using shards; that
>> would require a more robust interaction with the index than what
>> currently exists in SpellCheckCollator.collate().
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene:
http://www.lucidimagination.com/search



