lucene-solr-user mailing list archives

From Andy <angelf...@yahoo.com>
Subject Multi-lingual auto-complete?
Date Mon, 27 Sep 2010 09:46:55 GMT
I want to provide auto-complete to users when they're inputting tags. The auto-complete tag
suggestions would be based on tags that are already in the system.

Multiple tags are separated by commas. A single tag could contain multiple words such as "Apple
computer".

One issue is that a tag could mix languages, including both languages that use whitespace as a
word separator (e.g. English, French) and languages that don't (e.g. CJK).

An example of such a multi-lingual tag is "Apple 电脑".

If a user types "apple", I'd like the autocomplete suggestions to include both "Apple computer"
(i.e. matches are case insensitive) and "green apple" (i.e. matches aren't restricted to prefixes).
And a user typing "电脑" should match "Apple 电脑".
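
To make the desired behaviour concrete, here is a minimal Python sketch of the matching rules
described above. It is not Solr code: the function name `suggest` and the sample tags are
illustrative assumptions, and it uses plain case-insensitive substring matching, which is looser
than word-based matching (it would also match "apple" inside "pineapple").

```python
def suggest(typed, tags):
    """Return the tags that contain the typed text anywhere, ignoring case.

    Hypothetical helper for illustration only; a real index would not
    scan every tag like this.
    """
    needle = typed.casefold()
    # Substring containment covers both non-prefix matches ("green apple")
    # and CJK text, which has no whitespace to split on.
    return [tag for tag in tags if needle in tag.casefold()]


tags = ["Apple computer", "green apple", "Apple 电脑", "banana"]
print(suggest("apple", tags))
print(suggest("电脑", tags))
```

With these sample tags, "apple" should return all three apple tags and "电脑" should return only
"Apple 电脑".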

Is it possible to do that? I read the article:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

In that article KeywordTokenizerFactory is used. If I changed it to CJKTokenizer, would that
work?

With an input of "Apple 电脑", what would CJKTokenizer produce?

- Is it "Apple", "电", "脑"?
or
- Is it "A", "p", "p", "l", "e", "电", "脑"?

Any help would be greatly appreciated.

Andy


      
