lucene-solr-user mailing list archives

From Andy <angelf...@yahoo.com>
Subject NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?
Date Sat, 02 Oct 2010 09:32:55 GMT
I'm working on a user-generated tagging feature. Some of the tags could be multi-lingual, mixing
languages like English, Chinese, and Japanese.

I'd like to add auto-complete to help users enter the tags, and I want it to match in the
middle of a tag as well as at the start.

For example, if a user types "guit" I want to suggest:
"guitar"
"electric guitar"
"电动guitar"
"guitar英雄"

And if a user types "吉他" I want to suggest:
"吉他Hero"
"electric吉他"
"古典吉他"


I'm thinking about using:

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15" />
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

Would the above setup do what I want to do?

Also, how would I deal with hyphens? For example, I want an input of either "wi-f" or "wif"
to match the tag "wi-fi".

Would adding WordDelimiterFilterFactory to both "index" and "query" accomplish that?
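
To make the hyphen question concrete, here is a rough, untested sketch of what that variant might look like. The field type name "autocomplete_wdf" and the attribute values on WordDelimiterFilterFactory are assumptions chosen for illustration. The idea is that the index side keeps the original "wi-fi" and also emits a catenated "wifi" before n-gramming, but whether this actually behaves that way, and how it interacts with mixed Chinese/English tags, is exactly what needs checking in the analysis page.

```xml
<!-- Hypothetical variant of the "autocomplete" type above; name and attribute values are assumptions -->
<fieldType name="autocomplete_wdf" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <!-- Intended: emit both "wi-fi" (preserveOriginal) and "wifi" (catenateAll), no separate parts -->
   <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="0" generateNumberParts="0"
           catenateAll="1" preserveOriginal="1"/>
   <!-- N-grams are then taken from every token the previous filter emits -->
   <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15" />
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.KeywordTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <!-- Same settings on the query side, as asked above; it may turn out to be unnecessary -->
   <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="0" generateNumberParts="0"
           catenateAll="1" preserveOriginal="1"/>
 </analyzer>
</fieldType>
```

If the index side really does produce grams for both "wi-fi" and "wifi", then a plain query of "wi-f" or "wif" might already match without the query-side filter, so that part is the least certain.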


Thanks.


      
