lucene-solr-user mailing list archives

From Junte Zhang <Junte.Zh...@localsearch.ch>
Subject RE: multi language search engine in solr
Date Mon, 11 Sep 2017 16:32:48 GMT
Having the language already separated makes it a lot easier. 

You could add a language suffix (e.g. the 3-letter ISO 639-2/B code, see https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)
to each field that holds language-specific text. Alternatively, you could copy an entire
field into each of the language-analyzed fields and hope that is good enough for matching.
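
As a rough sketch of the suffix approach, assuming the documents are prepared in Python before being sent to Solr: the helper to_solr_doc, the field names title and body, and the extra language field are made up for illustration, and the schema would still need matching fields (or dynamic fields such as *_eng) with the right analyzer for each suffix.

```python
# Hypothetical helper: route each document's text into language-suffixed fields.
# Suffixes are ISO 639-2/B codes; the field names (title, body) are assumptions.

LANG_SUFFIX = {
    "arabic": "ara",
    "english": "eng",
    "bengali": "ben",
    "hindi": "hin",
    "malay": "may",
}


def to_solr_doc(doc_id, language, fields):
    """Return a dict whose text fields carry the language suffix, e.g. title_eng."""
    suffix = LANG_SUFFIX[language.lower()]
    # "language" is an illustrative extra field for filtering, not a Solr built-in.
    doc = {"id": doc_id, "language": suffix}
    for name, value in fields.items():
        doc[f"{name}_{suffix}"] = value
    return doc


doc = to_solr_doc("42", "Malay", {"title": "Selamat datang", "body": "Contoh teks"})
```

At query time you would then search only the fields for the user's language (title_may, body_may in this example), so each query goes through the matching analyzer.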


I think Malay should be very similar to Indonesian (https://wiki.apache.org/solr/LanguageAnalysis#Indonesian),
so the Indonesian analysis described there may be a reasonable starting point. You could extend
it by adding your own dictionary (keywords) and stopwords (if that is desirable).
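
For the stopwords part, a minimal sketch of how a custom list could be merged and written out, assuming the usual plain-text word list format (one term per line, lines starting with # treated as comments). The function names and the sample words are only illustrative, not a real Malay stopword list.

```python
def build_stopwords(base_words, extra_words):
    """Merge two stopword collections into sorted, lowercased, de-duplicated terms."""
    combined = list(base_words) + list(extra_words)
    merged = {w.strip().lower() for w in combined if w.strip()}
    return sorted(merged)


def to_stopwords_file(words, header="custom Malay stopwords (illustrative)"):
    """Render one term per line, with a '#' comment line on top."""
    return "# " + header + "\n" + "\n".join(words) + "\n"


base = ["yang", "dan", "di"]    # sample entries, not a complete list
extra = ["Dan", "ialah", " "]   # hypothetical additions; duplicates and blanks are dropped
text = to_stopwords_file(build_stopwords(base, extra))
```

The resulting text could be saved next to the schema and referenced from the stop filter of the field type used for the Malay fields.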

/JZ

-----Original Message-----
From: Mugeesh Husain [mailto:mugeesh@gmail.com] 
Sent: Monday, September 11, 2017 3:46 AM
To: solr-user@lucene.apache.org
Subject: Re: multi language search engine in solr

Thank you, Rick, for your response.

Each document is in a single language rather than a mix; the languages are Arabic, English, Bengali,
Hindi, and Malay.

I could not find any tokenizer for Malay. Can you suggest one if you know of any, please?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
