lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Analyzer enquiry
Date Mon, 14 Mar 2011 02:06:22 GMT
StandardAnalyzer works well for most European languages. The problem will
be stemming. Applying stemming via English rules to non-English languages
produces...er...interesting results.

You can go ahead and create language-specific fields for each language and
use StandardAnalyzer with the appropriate stopwords and stemming with each,
this is a common approach.. The Snowball stemmer takes a language parameter...

You need to use specific analyzers for Chinese Japanese Korean (CJK) documents
though.

Hope that helps
Erick

On Sun, Mar 13, 2011 at 7:23 PM, Vasiliki Gkouta <vgkouta@csd.auth.gr> wrote:
> Hello everybody,
>
> I have an enquiry about StandardAnalyzer. Can I use it for other languages
> except from English? I give the right list of stop words at initialization.
> Is there anything else inside the class that is by default set in English?
> I've found the Analyzers for other languages too but they where seem to be
> deprecated.. Moreover I use english and other languages, all together in my
> project so I would like to ask if there is a way to use either the same
> class analyzer for all of them, or analyzers of the same functionality for
> all the languages. Thanks in advance!
>
> Best regards,
> Vicky
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message