lucene-java-user mailing list archives

From Erick Erickson <>
Subject Re: Analyzer enquiry
Date Mon, 14 Mar 2011 02:06:22 GMT
StandardAnalyzer works well for most European languages. The problem will
be stemming: applying English stemming rules to non-English text gives poor results.

You can go ahead and create a separate field for each language and
use StandardAnalyzer with the appropriate stopwords and stemming for each;
this is a common approach. The Snowball stemmer takes a language parameter.
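
To make the per-language-field idea a bit more concrete, here is a rough sketch in plain Java (no Lucene calls) of how a document's language could select both the field it is indexed into and the stemmer language to use. The class LanguageFields, the "body_<lang>" naming scheme, and the stemmer name spellings are all assumptions for illustration; check the names your Lucene version actually accepts.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: routes text to a per-language
// field and looks up the stemmer language to configure for that field.
public class LanguageFields {

    // Language code -> stemmer language name (assumed spellings; verify
    // against the Snowball stemmers shipped with your Lucene version).
    private static final Map<String, String> STEMMERS = Map.of(
        "en", "English",
        "de", "German",
        "fr", "French",
        "es", "Spanish");

    // Name of the field holding text in the given language, e.g. "body_de".
    public static String fieldFor(String baseField, String lang) {
        return baseField + "_" + normalize(lang);
    }

    // Stemmer language for the given code, or null if this map has no entry
    // (in which case the field would be indexed without stemming).
    public static String stemmerFor(String lang) {
        return STEMMERS.get(normalize(lang));
    }

    private static String normalize(String lang) {
        return lang.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Print the field and stemmer chosen for a few language codes.
        for (String lang : new String[] {"en", "de", "el"}) {
            System.out.println(fieldFor("body", lang) + " -> " + stemmerFor(lang));
        }
    }
}
```

At index time you would write each document's text into fieldFor("body", lang) and attach an analyzer configured with the matching stopwords and stemmer to that field; at search time the query goes against the field for the user's language.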

You need to use specific analyzers for Chinese, Japanese, and Korean (CJK) documents.

Hope that helps

On Sun, Mar 13, 2011 at 7:23 PM, Vasiliki Gkouta <> wrote:
> Hello everybody,
> I have an enquiry about StandardAnalyzer. Can I use it for languages
> other than English? I give the right list of stop words at initialization.
> Is there anything else inside the class that is set to English by default?
> I've found the Analyzers for other languages too, but they seem to be
> deprecated. Moreover, I use English and other languages all together in my
> project, so I would like to ask if there is a way to use either the same
> analyzer class for all of them, or analyzers with the same functionality for
> all the languages. Thanks in advance!
> Best regards,
> Vicky
