lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: Question wrt Lucene analyzer for different language
Date Thu, 14 May 2009 14:47:33 GMT
In the case of ArabicAnalyzer, it will only change Arabic tokens and will
leave English words as-is (it will not convert them to lowercase or anything
like that).

So if you want good behavior for both Arabic and English, you would want to
create a custom analyzer that looks like ArabicAnalyzer but also invokes
LowerCaseFilter, and perhaps an English stemmer as well.
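
For illustration, here is a minimal plain-Java sketch of that ordering:
tokenize, lowercase, normalize Arabic, then stem English. It deliberately does
NOT use the Lucene API, and the names MixedAnalyzerSketch, analyze,
normalizeArabic and stemEnglish are hypothetical. In a real Lucene analyzer
these stages might roughly correspond to a letter tokenizer followed by
LowerCaseFilter, ArabicNormalizationFilter and an English stemmer such as
PorterStemFilter. Assumptions: the Arabic step covers only a subset of what
ArabicNormalizationFilter does, Arabic stemming is omitted, and the English
"stemmer" is a naive plural stripper, not Porter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical plain-Java sketch, NOT the Lucene API: each static method stands in
// for one stage of an analyzer chain applied to mixed Arabic/English text.
final class MixedAnalyzerSketch {

    // Stage 1: split on non-letters, but keep Arabic diacritics (non-spacing marks)
    // inside the token so the normalization stage can strip them.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp) || Character.getType(cp) == Character.NON_SPACING_MARK) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Stage 3: a subset of Arabic orthographic normalization; Latin letters pass through.
    static String normalizeArabic(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (char c : token.toCharArray()) {
            switch (c) {
                case '\u0622': // alef with madda
                case '\u0623': // alef with hamza above
                case '\u0625': // alef with hamza below
                    sb.append('\u0627'); // bare alef
                    break;
                case '\u0629': // teh marbuta
                    sb.append('\u0647'); // heh
                    break;
                case '\u0649': // alef maksura
                    sb.append('\u064A'); // yeh
                    break;
                case '\u0640': // tatweel (elongation), dropped
                    break;
                default:
                    // drop harakat (fathatan through sukun), keep everything else
                    if (c < '\u064B' || c > '\u0652') {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Stage 4: hypothetical naive English plural stripping, applied only to a-z tokens.
    static String stemEnglish(String token) {
        if (!token.chars().allMatch(ch -> ch >= 'a' && ch <= 'z')) {
            return token; // not a lowercase Latin token, leave it alone
        }
        if (token.length() > 4 && token.endsWith("ies")) {
            return token.substring(0, token.length() - 3) + "y";
        }
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            // Stage 2: lowercase; Arabic script has no case, so only Latin text changes.
            String t = token.toLowerCase(Locale.ROOT);
            t = normalizeArabic(t);
            t = stemEnglish(t);
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lucene Queries \u0623\u062D\u0645\u062F"));
    }
}
```

With this ordering, analyze("Lucene Queries أحمد") yields lucene, query and
احمد: the English words are lowercased and stemmed, while the Arabic word only
has its alef-with-hamza normalized to a bare alef. Putting the lowercasing
stage before the Arabic stages is harmless because Arabic script has no case.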

On Thu, May 14, 2009 at 10:11 AM, weidong sun <> wrote:

> Hello,
> I am a newbie in the Lucene world, so I might ask some obvious questions
> that I unfortunately don't know the answers to. Please help me 'grow'.
> We have a project that intends to use the Lucene search engine to search
> some user info stored in our system. The user info might not be in English,
> although it will be stored in UTF-8 encoding.
> My question is: if I use a particular Lucene analyzer for a language other
> than English (e.g. ChineseAnalyzer or ArabicAnalyzer), will it still be able
> to handle the text correctly if the user info is mixed with English
> characters/words?
> Any answers would be really appreciated.
> :-)

Robert Muir
