lucene-solr-user mailing list archives

From Jonathan Rochkind <rochk...@jhu.edu>
Subject Re: Providing token variants at index time
Date Thu, 22 Jul 2010 20:01:58 GMT
I think the Synonym filter should actually do exactly what you want, no? 

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Hmm, maybe not exactly what you want as you describe it, but it comes close,
maybe close enough. Do you REALLY need to support "I Business M" or "I B
Machines" as source/query? Your spec suggests yes, and the synonym filter
won't easily do that. But if you just want "International Business Machines"
== "IBM", keeping positions intact for subsequent terms, I think the synonym
filter will do it.

If not, I suppose you could look at its source to write your own. Or
maybe there's some way to combine the PositionFilter with something else
to do it, but I can't figure one out.
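
For anyone weighing the "write your own" route, here is a rough sketch of the
behavior being asked for. It is a plain Python model, not Lucene or Solr code:
the function names tokenize_variants and phrase_matches are made up for
illustration, and the assumption is that every pipe-delimited variant is
stored at the same position as the first token of its group (the equivalent
of a position increment of 0 in Lucene terms).

```python
def tokenize_variants(text):
    """Split on whitespace, then on '|'; variants in one group share a position.

    Hypothetical model of the requested tokenizer, not a Lucene API.
    """
    tokens = []
    for position, chunk in enumerate(text.split()):
        for variant in chunk.split("|"):
            if variant:  # skip empty variants such as "A||B"
                tokens.append((variant, position))
    return tokens


def phrase_matches(tokens, phrase):
    """Return True if the phrase terms occur at consecutive positions."""
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term.lower(), set()).add(pos)
    terms = phrase.lower().split()
    if not terms:
        return False
    # Try every position of the first term as a possible phrase start.
    return any(
        all(start + i in positions.get(term, ()) for i, term in enumerate(terms))
        for start in positions.get(terms[0], ())
    )


tokens = tokenize_variants("International|I Business|B Machines|M")
print(tokens)
print(phrase_matches(tokens, "International Business Machines"))
print(phrase_matches(tokens, "I B M"))
print(phrase_matches(tokens, "I Business M"))
```

In this model the mixed forms such as "I Business M" match as phrases because
each variant sits at the same position as the token it stands in for. The
single token "IBM" is not covered here; that case is the one the synonym
filter handles.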

Jonathan

Paul Dlug wrote:
> Is there a tokenizer that supports providing variants of the tokens at
> index time? I'm looking for something that could take a syntax like:
>
> International|I Business|B Machines|M
>
> Which would take each pipe delimited token and preserve its position
> so that phrase queries work properly. The above would result in
> queries for "International Business Machines" as well as "I B M" or
> any variants. The point is that the variants would be generated
> externally as part of the indexing process so they may not be as
> simple as the above.
>
> Any ideas or do I have to write a custom tokenizer to do this?
>
>
> Thanks,
> Paul
>
>   