lucene-solr-user mailing list archives

From "Lochschmied, Alexander" <Alexander.Lochschm...@vishay.com>
Subject RE: Spellchecking and suggesting part numbers
Date Mon, 03 Nov 2014 08:36:42 GMT
Thanks James, this did help a lot.

Is it possible to make DirectSolrSpellChecker prefer suggestions that share the longest
run of matching leading characters with the search term?

Alexander

-----Original Message-----
From: Dyer, James [mailto:James.Dyer@ingramcontent.com]
Sent: Wednesday, September 24, 2014 16:42
To: solr-user@lucene.apache.org
Subject: RE: Spellchecking and suggesting part numbers

Alexander,

You could use a higher value for spellcheck.count, maybe 20 or so, then in your application
pick out the suggestions that make changes on the right side.
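
As a rough sketch, the higher count could be set as a default on the /spell_part handler
from the original message below. The value 20 is only the ballpark figure mentioned above,
and picking out the suggestions that change the right side would still have to happen in
the application.

```xml
<requestHandler name="/spell_part" class="solr.SearchHandler" startup="lazy">
	<lst name="defaults">
		<str name="df">did_you_mean_part</str>
		<str name="spellcheck">on</str>
		<!-- Assumed value: ask for more candidates so the application can filter them -->
		<str name="spellcheck.count">20</str>
	</lst>
	<arr name="last-components">
		<str>spellcheck_part</str>
	</arr>
</requestHandler>
```

The same parameter can also be passed per request (spellcheck.count=20) instead of as a default.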

Another option is to use DirectSolrSpellChecker (usually a better choice anyhow) and set the
"minPrefix" parameter.  This requires the first n characters on the left side to match before
it will make suggestions.  Taking a quick look at the code, it seems to me it won't try to
correct anything in this prefix region either.  So perhaps you can set this to 2-4 (default=1).
See http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
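
A minimal sketch of this second option, adapted from the spellcheck_part component in the
original message below. The minPrefix value of 3 is an assumption picked from the 2-4 range;
it would need tuning to however many leading characters of a part number should stay fixed.

```xml
<searchComponent name="spellcheck_part" class="solr.SpellCheckComponent">
	<lst name="spellchecker">
		<str name="classname">solr.DirectSolrSpellChecker</str>
		<str name="field">did_you_mean_part</str>
		<!-- Assumed value: number of leading characters that must match exactly -->
		<int name="minPrefix">3</int>
	</lst>
</searchComponent>
```

Because DirectSolrSpellChecker reads its terms from the main index, spellcheckIndexDir is
left out here.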

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Lochschmied, Alexander [mailto:Alexander.Lochschmied@vishay.com] 
Sent: Wednesday, September 24, 2014 9:06 AM
To: solr-user@lucene.apache.org
Subject: Spellchecking and suggesting part numbers

Hello Solr Users,

we are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.


The setup is:

	<searchComponent name="spellcheck_part" class="solr.SpellCheckComponent">
		<lst name="spellchecker">
			<str name="classname">solr.IndexBasedSpellChecker</str>
			<str name="spellcheckIndexDir">./spellchecker</str>
			<str name="field">did_you_mean_part</str>
		</lst>
	</searchComponent>
	<requestHandler name="/spell_part" class="solr.SearchHandler" startup="lazy">
		<lst name="defaults">
			<str name="df">did_you_mean_part</str>
			<str name="spellcheck">on</str>
		</lst>
		<arr name="last-components">
			<str>spellcheck_part</str>
		</arr>
	</requestHandler>


	<fieldType name="did_you_mean_part" class="solr.TextField" positionIncrementGap="100">
		<analyzer type="index">
			<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
			<tokenizer class="solr.WhitespaceTokenizerFactory"/>
			<filter class="solr.LowerCaseFilterFactory"/>
			<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
			<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
		</analyzer>
		<analyzer type="query">
			<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
			<tokenizer class="solr.KeywordTokenizerFactory"/>
			<filter class="solr.LowerCaseFilterFactory"/>
			<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
		</analyzer>
	</fieldType>

Can we tweak the setup so that we get more relevant part numbers?

Thanks,
Alexander


