lucene-solr-user mailing list archives

From Mike Sokolov <soko...@ifactory.com>
Subject Re: Automatic synonyms for multiple variations of a word
Date Tue, 26 Apr 2011 20:13:48 GMT
Suppose your analysis stack includes lower-casing, but your synonyms are 
only supposed to apply to upper-case tokens.  For example, "PET" might 
be a synonym of "positron emission tomography", but "pet" wouldn't be.
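
A minimal sketch of this scenario, in plain Python rather than the Lucene/Solr API. Every name here (tokenize, lowercase, build_synonym_map, expand) is a hypothetical stand-in for the corresponding analysis component, and the single PET entry is the only synonym assumed.

```python
# Toy model of an analysis chain; none of this is Lucene/Solr code.

def tokenize(text):
    # Whitespace tokenizer stand-in.
    return text.split()

def lowercase(tokens):
    # Lower-casing filter stand-in.
    return [t.lower() for t in tokens]

def build_synonym_map(entries, filters):
    """Run each synonym key through `filters` before storing it."""
    syn_map = {}
    for key, expansion in entries.items():
        tokens = tokenize(key)
        for f in filters:
            tokens = f(tokens)
        syn_map[" ".join(tokens)] = expansion
    return syn_map

def expand(tokens, syn_map):
    # Keep each token and append its expansion when the token is a key.
    out = []
    for t in tokens:
        out.append(t)
        if t in syn_map:
            out.append(syn_map[t])
    return out

entries = {"PET": "positron emission tomography"}

# Keys kept as written; synonyms are applied before lower-casing.
case_sensitive_map = build_synonym_map(entries, filters=[])
# Keys pushed through the lower-casing filter; synonyms applied after it.
lowercased_map = build_synonym_map(entries, filters=[lowercase])

def analyze_case_sensitive(text):
    return lowercase(expand(tokenize(text), case_sensitive_map))

def analyze_lowercased(text):
    return expand(lowercase(tokenize(text)), lowercased_map)

print(analyze_case_sensitive("my pet"))  # "pet" is left alone
print(analyze_lowercased("my pet"))      # "pet" now picks up the expansion
```

With the keys lower-cased, the ordinary word "pet" is expanded as if it were the acronym, which is the unwanted match described above.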

-Mike

On 04/26/2011 09:51 AM, Robert Muir wrote:
> On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic
> <otis_gospodnetic@yahoo.com>  wrote:
>
>    
>> But somehow this feels bad (well, so does sticking word variations in what's
>> supposed to be a synonyms file), partly because it means that the person adding
>> new synonyms would need to know what they stem to (or always check it against
>> Solr before editing the file).
>>      
> when creating the synonym map from your input file, currently the
> factory actually uses your Tokenizer only to pre-process the synonyms
> file.
>
> One idea would be to use the tokenstream up to the synonymfilter
> itself (including filters). This way if you put a stemmer before the
> synonymfilter, it would stem your synonyms file, too.
>
> I haven't totally thought the whole thing through to see if there's a
> big reason why this wouldn't work (the synonymfilter is complicated,
> sorry). But it does seem like it would produce more consistent
> results... and perhaps the inconsistency isn't so obvious since in the
> default configuration the synonymfilter is directly after the
> tokenizer.
>    
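
The idea quoted above (pre-processing the synonyms file with every filter that precedes the synonym filter, not just the tokenizer) can be sketched in plain Python. This is an illustration under stated assumptions, not the factory's actual behaviour: toy_stem is a hypothetical stemmer that only strips a trailing "s", and the "dogs" rule is made up for the example.

```python
# Toy model only; the real synonym filter is considerably more involved.

def tokenize(text):
    # Whitespace tokenizer stand-in.
    return text.split()

def toy_stem(tokens):
    # Hypothetical stemmer: strips one trailing "s" from longer tokens.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def preprocess(text, filters):
    # Tokenize, then apply each filter in order.
    tokens = tokenize(text)
    for f in filters:
        tokens = f(tokens)
    return " ".join(tokens)

def build_map(rules, filters):
    # Normalize each rule's left-hand side with the given filters.
    return {preprocess(key, filters): value for key, value in rules.items()}

def analyze(text, syn_map):
    # Query/index-time chain: tokenizer -> stemmer -> synonym replacement.
    tokens = toy_stem(tokenize(text))
    return [syn_map.get(t, t) for t in tokens]

rules = {"dogs": "canine"}  # made-up entry, written unstemmed

tokenizer_only = build_map(rules, filters=[])       # key stays "dogs"
full_prefix = build_map(rules, filters=[toy_stem])  # key becomes "dog"

print(analyze("dogs bark", tokenizer_only))  # rule never fires
print(analyze("dogs bark", full_prefix))     # rule fires on the stemmed token
```

In this sketch the person editing the synonyms file no longer needs to know what each word stems to, because the file goes through the same stemmer as the text.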
