lucene-solr-user mailing list archives

From "Scott Zientara" <sc...@tekdata.com>
Subject Re: How to use synonyms on a faceted field with multiple words
Date Wed, 18 Aug 2010 18:58:41 GMT
A quick-and-dirty workaround in Solr 1.4 is to replace the spaces in the synonyms file with
some other character or pattern. I used ## (e.g. video => digital##media). Then add a
solr.PatternReplaceFilterFactory after the synonym filter to replace that pattern with a space.
This works, but I'd love to know if there is a better way.
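A minimal sketch of why the workaround works, simulated in Python rather than in Solr. It assumes an analyzer chain of KeywordTokenizerFactory, then the synonym filter, then PatternReplaceFilterFactory replacing "##" with a single space, and models each stage as a plain function. The names parse_rules and analyze are hypothetical, and splitting each rule on whitespace is an assumption about how the synonym file is parsed. Because "digital##media" contains no whitespace, it survives rule parsing as one token, and the replacement step then turns it into the single facet term "digital media".

```python
import re

# Hypothetical synonym rules in the workaround format: "##" stands in for a space
# so that whitespace splitting of the rule never breaks the target into two tokens.
SYNONYM_LINES = [
    "video,audio => digital##media",
    "film => other##stuff",
]


def parse_rules(lines):
    """Map each left-hand term to its right-hand tokens (assumed whitespace split)."""
    rules = {}
    for line in lines:
        lhs, rhs = (side.strip() for side in line.split("=>"))
        targets = rhs.split()  # "digital##media" stays a single token
        for term in lhs.split(","):
            rules[term.strip()] = targets
    return rules


def analyze(value, rules, pattern="##", replacement=" "):
    """Simulate keyword tokenizer -> synonym filter -> pattern replace filter."""
    tokens = [value]  # keyword tokenizer: the whole field value is one token
    mapped = []
    for tok in tokens:
        mapped.extend(rules.get(tok, [tok]))  # synonym filter: replace on a match
    # pattern replace filter: turn the placeholder back into a space
    return [re.sub(pattern, replacement, tok) for tok in mapped]


rules = parse_rules(SYNONYM_LINES)
print(analyze("video", rules))  # ['digital media']
print(analyze("film", rules))   # ['other stuff']
print(analyze("dvd", rules))    # ['dvd']
```

The placeholder only has to be something that never appears in real field values, since every occurrence of it is turned into a space.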

Send reply to:  	solr-user@lucene.apache.org
From:           	"Scott Zientara" <scott@tekdata.com>
Organization:   	Tek Data
To:             	solr-user@lucene.apache.org
Date sent:      	Wed, 18 Aug 2010 12:31:57 -0500
Subject:        	How to use synonyms on a faceted field with multiple words
Send reply to:  	scott@tekdata.com
Priority:       	normal


I am trying to use solr.SynonymFilterFactory on a faceted field in Solr 1.3. 
I am using Solr to index resources from a media library. The data is coming from various 
sources, some of which I do not have control over. I need to be able to map resource 
types in the data to common terms for faceting. For example:
video,audio => digital media
film,laser disc, vhs video => other

I am using solr.KeywordTokenizerFactory for the analyzer, but Solr will not treat 
multiple words as a single token. 
A single-word to single-word map (e.g. film => other) works perfectly.
A single-word to two-word map (e.g. film => other stuff) produces 2 terms, which is unfit
for faceting.
A two-word to single-word map (e.g. vhs video => videotape) doesn't seem to match at all.
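The three cases above are consistent with a simple model, sketched here in Python (this is not Solr code). It assumes the rules in the synonym file are split on whitespace, so "vhs video" becomes a two-token sequence and "digital media" becomes two output tokens, while KeywordTokenizerFactory hands the filter one token holding the whole field value. The function names and the greedy longest-match loop are hypothetical simplifications of what a synonym filter does.

```python
def parse_rules(lines):
    """Parse "a,b c => x y" rules into {("a",): ["x", "y"], ("b", "c"): ["x", "y"]}.

    Assumption: each side of a rule is split on whitespace, so a multi-word
    phrase becomes a sequence of several tokens.
    """
    rules = {}
    for line in lines:
        lhs, rhs = (side.strip() for side in line.split("=>"))
        for phrase in lhs.split(","):
            rules[tuple(phrase.split())] = rhs.split()
    return rules


def synonym_filter(tokens, rules):
    """Greedy longest-match replacement of token sequences (simplified model)."""
    out, i = [], 0
    longest = max((len(k) for k in rules), default=0)
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])  # emit the rule's (possibly multi-token) output
                i += n
                break
        else:
            out.append(tokens[i])  # no rule matched: pass the token through
            i += 1
    return out


def keyword_tokenizer(value):
    return [value]  # the whole field value becomes a single token


RULES = parse_rules([
    "film => other",
    "video => digital media",
    "vhs video => videotape",
])

print(synonym_filter(keyword_tokenizer("film"), RULES))       # ['other']
print(synonym_filter(keyword_tokenizer("video"), RULES))      # ['digital', 'media']
print(synonym_filter(keyword_tokenizer("vhs video"), RULES))  # ['vhs video']
print(synonym_filter("vhs video".split(), RULES))             # ['videotape']
```

In this model the two-word rule only matches when the input is also split on whitespace, which is exactly what a keyword tokenizer avoids.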

I've tried this with and without the tokenizerFactory="solr.KeywordTokenizerFactory"
attribute on the synonym filter element. I've also tried to escape the space in the synonyms
file (e.g. video => digital\bmedia).

Is it possible to use the synonym filter to map multi-word terms for a faceted field? If
so, what am I missing?

