lucene-java-user mailing list archives

From Alex Murzaku <>
Subject analyzers using snowball
Date Sat, 21 Sep 2002 15:15:38 GMT
Since from time to time we have these questions/discussions about
whether Lucene supports specific natural languages, I adapted a set of
analyzers and filters to use the Snowball
( generated Java stemmers. This could be a
good start for anybody needing to get into more detail in a particular
language (like the existing Russian and German analyzers). It uses the
StandardTokenizer which works fine for the other languages (except

The whole package is located at
and it is about 2.3MB. The reason for this size is that it also
contains all the test dictionaries for the 12 languages supported.
These languages are: Danish, Dutch, English (Porter2), Finnish, French,
German, Italian, Norwegian, Portuguese, Russian, Spanish and Swedish.
Finnish has some minor problems, and I wasn't able to test Russian
properly since I am not familiar with character codesets. But I wouldn't
bother with Russian (or German), since both are already included in the
Lucene package. As for Finnish, I am already in contact with the
Snowball team, and hopefully it will work in Java as well as it does in
the other environments.
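
To give a rough idea of the arrangement described above (a tokenizer
followed by a per-language stemming filter), here is a minimal,
self-contained sketch. It does not use the Lucene or Snowball classes:
the Stemmer interface, ToyEnglishStemmer, tokenize and analyze are all
hypothetical names made up for illustration, and the toy stemmer only
strips two suffixes, so it is not Porter2. A real analyzer would plug a
Snowball generated stemmer in where the toy one sits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StemmingAnalyzerSketch {

    // Hypothetical stand-in for a Snowball generated stemmer:
    // one implementation per language, all with the same shape.
    interface Stemmer {
        String stem(String word);
    }

    // Toy suffix stripper used only for illustration (NOT Porter2).
    static final class ToyEnglishStemmer implements Stemmer {
        @Override
        public String stem(String word) {
            if (word.length() > 5 && word.endsWith("ing")) {
                return word.substring(0, word.length() - 3);
            }
            if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
                return word.substring(0, word.length() - 1);
            }
            return word;
        }
    }

    // Simplified tokenizer: lower-cases the text and splits on anything
    // that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // The "analyzer": tokenize first, then pass every token through
    // the stemmer chosen for the language.
    static List<String> analyze(String text, Stemmer stemmer) {
        List<String> stems = new ArrayList<>();
        for (String token : tokenize(text)) {
            stems.add(stemmer.stem(token));
        }
        return stems;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Testing the analyzers", new ToyEnglishStemmer()));
    }
}
```

Supporting another language in this sketch means supplying another
Stemmer implementation; the tokenizing step stays the same.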

Best regards,