lucene-solr-user mailing list archives

From bing <JSuser1...@hotmail.com>
Subject Can solr-langid(Solr3.5.0) detect multiple languages in one text?
Date Tue, 13 Mar 2012 03:25:50 GMT
Hi, all, 

I am using solr-langid (Solr 3.5.0) for language detection, and I would like
it to detect multiple languages within a single text.

The example text is: 
咖哩起源於印度。印度民間傳說咖哩是佛祖釋迦牟尼所創,由於咖哩的辛辣與香味可以幫助遮掩羊肉的腥騷,此舉即為用以幫助不吃豬肉與牛肉的印度人。在泰米爾語中,「kari」是「醬」的意思。在馬來西亞,kari也稱dal(當在mamak檔)。早期印度被蒙古人所建立的莫臥兒帝國(Mughal
Empire)所統治過,其間從波斯(現今的伊朗)帶來的飲食習慣,從而影響印度人的烹調風格直到現今。
Curry (plural, Curries) is a generic term primarily employed in Western
culture to denote a wide variety of dishes originating in Indian, Pakistani,
Bangladeshi, Sri Lankan, Thai or other Southeast Asian cuisines. Their
common feature is the incorporation of more or less complex combinations of
spices and herbs, usually (but not invariably) including fresh or dried hot
capsicum peppers, commonly called "chili" or "cayenne" peppers.

I would like the text to be split into two parts, with the Chinese part
going to "text_zh-tw" and the English part to "text_en". Is something like
that possible?
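
To make the goal concrete, here is a rough sketch of the kind of splitting I
mean, done in plain Java before the document is sent to Solr. I do not know
whether solr-langid can do this itself. The class name ScriptSplitter, the
per-line majority rule, and the hard-coded field names are all assumptions
for illustration only. It looks at Unicode scripts (Han vs. Latin), so it
cannot tell zh-tw from zh-cn, and it is not real language detection.

```java
import java.lang.Character.UnicodeScript;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or solr-langid.
final class ScriptSplitter {

    // Picks a target field for one line by counting Han vs. Latin code points.
    // Punctuation, digits and spaces belong to other scripts and are ignored.
    // Ties (including lines with neither script) fall through to "text_en".
    static String fieldFor(String line) {
        int han = 0;
        int latin = 0;
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            if (script == UnicodeScript.HAN) {
                han++;
            } else if (script == UnicodeScript.LATIN) {
                latin++;
            }
            i += Character.charCount(cp);
        }
        return han > latin ? "text_zh-tw" : "text_en";
    }

    // Splits the text on line breaks and groups the lines by target field.
    // Assumes each line is predominantly in one language.
    static Map<String, String> split(String text) {
        Map<String, StringBuilder> buckets = new LinkedHashMap<String, StringBuilder>();
        for (String line : text.split("\r?\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String field = fieldFor(trimmed);
            StringBuilder sb = buckets.get(field);
            if (sb == null) {
                sb = new StringBuilder();
                buckets.put(field, sb);
            } else {
                sb.append('\n');
            }
            sb.append(trimmed);
        }
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (Map.Entry<String, StringBuilder> e : buckets.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }
}
```

Calling ScriptSplitter.split(text) on a text like the one above would give a
map with one entry per field, which could then be added to the Solr document
as separate fields. Short Latin words inside a Chinese line (such as "kari")
stay with the Chinese part, because the decision is made per line.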

Thank you. 

Best Regards, 
Bing 


