lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From weidong sun <lmcw...@gmail.com>
Subject Re: Question wrt Lucene analyzer for different language
Date Thu, 14 May 2009 15:26:30 GMT
Thanks for the quick answer. :-)

So  can I say, for ArabicAnalyzer, generally it can tokenize the mixed
content with Arabic and English? :-)

I am not really familiar with Arabic language. What do you mean for "change
Arabic tokens"? Does Arabic has something like upper/lower case as English
does?


On Thu, May 14, 2009 at 10:47 AM, Robert Muir <rcmuir@gmail.com> wrote:

> in the case of ArabicAnalyzer it will only change Arabic tokens, and will
> leave english words as-is (it will not convert them to lowercase or
> anything
> like that)
>
> so if you want to have good Arabic and English behavior you would want to
> create a custom analyzer that looks like Arabic analyzer but also invokes
> lowercasefilter, perhaps also some english stemmer, etc etc.
>
> On Thu, May 14, 2009 at 10:11 AM, weidong sun <lmcwesu@gmail.com> wrote:
>
> > Hello,
> >
> > I am a newbie in Lucene world. I might ask some obvious question which
> > unfortunately I don't know the answer. Please help me 'grow'.
> >
> > We have a project intend to use Lucene search engine for search some
> user's
> > info stored our system. The user info might not be in English even it
> will
> > be stored in UTF-8 encoding.
> >
> > My question is, if I use one particular Lucene analyzer for a language
> > other
> > than English (e.g. ChineseAnalyzer or ArabicAnalyzer), can it still able
> to
> > handle it correctly if user info is mixed with English character/word?
> >
> > Really appreciated with any answers.
> >
> > :-)
> >
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message