lucene-solr-user mailing list archives

From Stephen Weiss <swe...@stylesight.com>
Subject Re: Destemming snafu
Date Thu, 18 Jun 2009 21:10:39 GMT
Yes, that's exactly what I needed.  I don't know how I missed that.   
Thank you!

--
Steve

On Jun 18, 2009, at 4:49 PM, Brendan Grainger wrote:

> Are you using Porter Stemming? If so I think you can just specify  
> your word in the protwords.txt file (or whatever you've called it).
>
> Check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> and the example config for the Porter Stemmer:
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>   </analyzer>
> </fieldtype>
>
> HTH
> Brendan
>
> On Jun 18, 2009, at 4:38 PM, Stephen Weiss wrote:
>
>> Hi,
>>
>> I've hit a bit of a problem with destemming and could use some  
>> advice.
>>
>> Right now there is a word in the index called "Stylesight" and  
>> another word "Stylesightings", which was just added.  When users  
>> search for "Stylesightings", the client really only wants them to  
>> get results that match "Stylesightings" and not "Stylesight", as  
>> they are two [relatively] unrelated things.  However, I'm guessing  
>> because of the destemmer, "Stylesightings" becomes "Stylesight"  
>> internally... which results in the "wrong" behavior.
>>
>> I really don't want to turn off the destemmer, that's like killing  
>> an ant with a nuke.  I was thinking, perhaps, since we use both  
>> index- and query-time synonyms, I could make a synonym like this:
>>
>> "Stylesightings" =>  "xlkje0r923jjfsdf"
>>
>> or some other random string of un-destemmable junk, that might  
>> work, but I'm not sure and reindexing all the affected documents  
>> will take quite some time so it would be good to know in advance if  
>> this is even a good idea.
>>
>> Of course, if there's another, better idea, I'd be very open to  
>> that too.
>>
>> Thanks for any suggestions!
>>
>> --
>> Steve
>
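
For reference, a minimal sketch of what the protected-words file from Brendan's suggestion might contain. The file name protwords.txt comes from the config quoted above. The lowercase spelling is an assumption: the entry should match the token exactly as it reaches the stemmer, so if a lowercase filter runs earlier in the analyzer chain the entry must be lowercase, and otherwise it should match the indexed casing.

```text
# protwords.txt
# One term per line. Terms listed here are passed through the
# Porter stemmer unchanged.
# Assumption: tokens are lowercased before they reach the stemmer.
stylesightings
```

If stemming is applied at index time, documents that contain the protected word would likely still need to be reindexed before the change shows up in search results.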

